[FFmpeg-devel] [PATCH 4/5] x86: hpeldsp: implement SSE2 versions

James Almer jamrial at gmail.com
Thu May 22 22:13:50 CEST 2014


On 22/05/14 2:48 PM, Christophe Gisquet wrote:
> Those are mostly used in codecs older than H.264, e.g. MPEG-2.
> 
> put16 versions:
>       mmx  mmx2  sse2
> x2:  1888  1185   552
> y2:  1778  1092   510
> 
> avg16 xy2: 3509(mmx2) -> 2169(sse2)
> ---
>  libavcodec/x86/hpeldsp.asm    | 115 +++++++++++++++++++++++++++++++-----------
>  libavcodec/x86/hpeldsp_init.c |  15 ++++++
>  2 files changed, 100 insertions(+), 30 deletions(-)
> 
> diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
> index 2adead2..1d26c45 100644
> --- a/libavcodec/x86/hpeldsp.asm
> +++ b/libavcodec/x86/hpeldsp.asm
> @@ -35,21 +35,39 @@ SECTION_TEXT
>  
>  ; void ff_put_pixels8_x2(uint8_t *block, const uint8_t *pixels, ptrdiff_t line_size, int h)
>  %macro PUT_PIXELS8_X2 0
> +%if cpuflag(sse2)
> +cglobal put_pixels16_x2, 4,5,4
> +%else
>  cglobal put_pixels8_x2, 4,5
> +%endif
>      lea          r4, [r2*2]
>  .loop:
> -    mova         m0, [r1]
> -    mova         m1, [r1+r2]
> -    PAVGB        m0, [r1+1]
> -    PAVGB        m1, [r1+r2+1]
> +    movu         m0, [r1+1]
> +    movu         m1, [r1+r2+1]

I assume movu is needed for the SSE2 version, but unless I'm missing
something there's no need to force it on the MMX version.
AFAIK old CPUs (the kind that don't have SSE2) have slow unaligned
moves, so performance would be degraded exactly where it matters.
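For illustration only (not part of the patch): a C sketch of what put_pixels16_x2 computes, assuming the usual horizontal half-pel semantics where each output byte is the rounded-up average of a pixel and its right neighbour, which is what pavgb does: (a + b + 1) >> 1. The names put_pixels16_x2_c and put_pixels16_x2_sse2 are hypothetical. The SSE2 variant shows why the SSE2 path needs movu: even when pixels is 16-byte aligned, pixels + 1 is not, and a legacy-SSE instruction with a 128-bit memory operand requires alignment.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: block[x] = (pixels[x] + pixels[x + 1] + 1) >> 1,
 * 16 bytes per row, h rows, both pointers advanced by line_size. */
static void put_pixels16_x2_c(uint8_t *block, const uint8_t *pixels,
                              ptrdiff_t line_size, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            block[x] = (uint8_t)((pixels[x] + pixels[x + 1] + 1) >> 1);
        block  += line_size;
        pixels += line_size;
    }
}

/* SSE2 sketch: pixels + 1 is misaligned by one byte even if pixels is
 * aligned, so it is loaded with an unaligned load (movdqu, here
 * _mm_loadu_si128) before the byte average (pavgb, _mm_avg_epu8). */
static void put_pixels16_x2_sse2(uint8_t *block, const uint8_t *pixels,
                                 ptrdiff_t line_size, int h)
{
    for (int y = 0; y < h; y++) {
        __m128i a = _mm_loadu_si128((const __m128i *)pixels);
        __m128i b = _mm_loadu_si128((const __m128i *)(pixels + 1));
        _mm_storeu_si128((__m128i *)block, _mm_avg_epu8(a, b));
        block  += line_size;
        pixels += line_size;
    }
}
```

Both versions read 17 bytes per row, so the source buffer must extend at least one byte past the 16-pixel row.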

