[FFmpeg-devel] [PATCH 1/9] SBR DSP x86: implement SSE qmf_pre_shuffle

Michael Niedermayer michaelni at gmx.at
Fri Apr 5 15:56:14 CEST 2013


On Thu, Apr 04, 2013 at 07:45:45PM +0000, Christophe Gisquet wrote:
> From 253 to 70c on Arrandale and Win64.
> ---
>  libavcodec/x86/sbrdsp.asm    | 33 +++++++++++++++++++++++++++++++++
>  libavcodec/x86/sbrdsp_init.c |  2 ++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
> index 1b7f3a8..2029b45 100644
> --- a/libavcodec/x86/sbrdsp.asm
> +++ b/libavcodec/x86/sbrdsp.asm
> @@ -220,3 +220,36 @@ cglobal sbr_qmf_post_shuffle, 2,3,4,W,z
>      cmp               zq, r2q
>      jl             .loop
>      REP_RET
> +
> +INIT_XMM sse
> +cglobal sbr_qmf_pre_shuffle, 1,4,7,z
> +%define OFFSET  (32*4-2*mmsize)
> +    mov       r3q, OFFSET
> +    lea       r1q, [zq + (32+1)*4]
> +    lea       r2q, [zq + 64*4]
> +    mova       m6, [ps_neg]
> +.loop:
> +    movu       m0, [r1q]
> +    movu       m2, [r1q + mmsize]
> +    movu       m1, [zq + r3q + 4 + mmsize]
> +    movu       m3, [zq + r3q + 4]
> +    xorps      m2, m6
> +    xorps      m0, m6
> +    shufps     m2, m2, q0123
> +    shufps     m0, m0, q0123
> +    mova       m5, m2
> +    mova       m4, m0
> +    unpcklps   m2, m3
> +    unpckhps   m5, m3
> +    unpcklps   m0, m1
> +    unpckhps   m4, m1
> +    mova  [r2q + 2*r3q + 0*mmsize], m2
> +    mova  [r2q + 2*r3q + 1*mmsize], m5
> +    mova  [r2q + 2*r3q + 2*mmsize], m0
> +    mova  [r2q + 2*r3q + 3*mmsize], m4
> +    add       r1q, 2*mmsize
> +    sub       r3q, 2*mmsize
> +    jge      .loop
> +    mova       m2, [zq]
> +    movlps  [r2q], m2

using simpler memory indexing ([r2q + n*mmsize] and [zq])
and incrementing them separately seems to be 1-2 CPU cycles faster here
on Sandy Bridge
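
The suggestion concerns addressing, not arithmetic. The patch derives every load and store address from the one shared index r3q, including the scaled `2*r3q` for the stores. The alternative is to give each stream its own pointer and step it separately, so that each access becomes a plain pointer plus a constant displacement. The minimal C sketch below shows that bookkeeping. `qmf_pre_shuffle_ptrs` and `neg_bits` are hypothetical names, and `ps_neg` is assumed to be a sign-bit mask. C leaves the choice of addressing modes to the compiler, so this shows only the loop structure, not the Sandy Bridge timing.

```c
#include <stdint.h>
#include <string.h>

/* Sign-bit flip, standing in for xorps with ps_neg (assumed sign mask). */
static float neg_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof(u));
    u ^= UINT32_C(0x80000000);
    memcpy(&x, &u, sizeof(x));
    return x;
}

/* Hypothetical pointer-walking variant: each stream keeps its own pointer
 * and is stepped on its own, instead of sharing one scaled index. */
static void qmf_pre_shuffle_ptrs(float *z)
{
    const float *hi  = z + 33;  /* like r1q: walks up, values negated     */
    const float *lo  = z + 25;  /* like zq + r3q + 4: walks down, copied  */
    float       *dst = z + 112; /* like r2q + 2*r3q: walks down           */

    for (;;) {
        /* 8 reversed, negated "hi" samples interleaved with 8 "lo"
         * samples: the 16 floats written by the four mova stores */
        for (int j = 0; j < 8; j++) {
            dst[2 * j]     = neg_bits(hi[7 - j]);
            dst[2 * j + 1] = lo[j];
        }
        if (dst == z + 64)      /* exit on a pointer compare, not r3q */
            break;
        hi  += 8;               /* add r1q, 2*mmsize                   */
        lo  -= 8;               /* separate step instead of via r3q    */
        dst -= 16;              /* separate step instead of 2*r3q      */
    }
    /* the last pass wrote z[64], z[65]; overwrite as the movlps does */
    z[64] = z[0];
    z[65] = z[1];
}
```

Because r3q no longer doubles as the loop counter, the exit test here is a compare against the last destination block. An asm version along these lines would presumably need either that compare or a spare counter register.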

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
