[FFmpeg-devel] [PATCH 3/4] huffyuvencdsp: Add ff_diff_bytes_sse2

James Almer jamrial at gmail.com
Mon Oct 19 22:37:51 CEST 2015


On 10/19/2015 5:00 PM, Timothy Gu wrote:
> 4% to 35% faster depending on the width.
> ---
>  libavcodec/x86/huffyuvencdsp.asm   | 31 ++++++++++++++++++++-----------
>  libavcodec/x86/huffyuvencdsp_mmx.c |  8 +++++++-
>  2 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/libavcodec/x86/huffyuvencdsp.asm b/libavcodec/x86/huffyuvencdsp.asm
> index 97de7e9..9625fbe 100644
> --- a/libavcodec/x86/huffyuvencdsp.asm
> +++ b/libavcodec/x86/huffyuvencdsp.asm
> @@ -27,27 +27,27 @@
>  
>  section .text
>  
> -INIT_MMX mmx
>  ; void ff_diff_bytes_mmx(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
>  ;                        intptr_t w);
> -cglobal diff_bytes, 4,6,0, dst, src1, src2, w, i
> +%macro DIFF_BYTES 0
> +cglobal diff_bytes, 4,6,2, dst, src1, src2, w, i
>      xor               iq, iq
> -    cmp               wq, 16
> +    cmp               wq, mmsize * 2
>          jb        .loop2
> -    sub               wq, 15
> +    sub               wq, mmsize * 2 - 1
>  .loop:
> -    mova              m0, [src2q + iq]
> -    mova              m1, [src1q + iq]
> +    movu              m0, [src2q + iq]
> +    movu              m1, [src1q + iq]

If dst and/or the src buffers can sometimes be aligned, check how
ff_add_hfyu_left_pred (in huffyuvdsp.asm) handles it, so the aligned case
can keep using mova instead of always paying for movu.
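
To illustrate the kind of check I mean, here is a rough C sketch, not the
actual asm. diff_bytes_ref mirrors what the loop computes
(dst[i] = src1[i] - src2[i], wrapping modulo 256), and all_aligned16 is a
hypothetical helper name I made up for this example; it only shows the
pointer test that would select an aligned-load path over the unaligned one.
The 16-byte figure assumes SSE2 registers (mmsize == 16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: byte-wise difference, wrapping modulo 256. */
static void diff_bytes_ref(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, intptr_t w)
{
    assert(w >= 0);
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}

/* Hypothetical helper (not an FFmpeg function): returns nonzero only when
 * all three pointers are 16-byte aligned, i.e. when aligned loads and
 * stores would be safe for 16-byte vectors. */
static int all_aligned16(const void *dst, const void *src1, const void *src2)
{
    return (((uintptr_t)dst | (uintptr_t)src1 | (uintptr_t)src2) & 15) == 0;
}
```

In asm the same decision would be a test on the low four bits of each
pointer before entering the loop, with one loop body using mova and the
other movu.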


