[FFmpeg-devel] [PATCH] sbrdsp.asm: convert all instructions to float/SSE ones.

Reimar Döffinger Reimar.Doeffinger at gmx.de
Wed Mar 7 22:33:53 CET 2012


On Wed, Mar 07, 2012 at 01:29:31PM -0800, Jason Garrett-Glaser wrote:
> On Wed, Mar 7, 2012 at 12:35 PM, Reimar Döffinger
> <Reimar.Doeffinger at gmx.de> wrote:
> > Since the values are floats, using the float operations
> > makes sense, improves performance on some CPUs and
> > makes the code SSE compatible instead of needing SSE2.
> >
> > Based on suggestion by Jason.
> >
> > Signed-off-by: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
> > ---
> >  libavcodec/x86/sbrdsp.asm |   16 ++++++++--------
> >  1 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
> > index c3b559b..31a1c8b 100644
> > --- a/libavcodec/x86/sbrdsp.asm
> > +++ b/libavcodec/x86/sbrdsp.asm
> > @@ -82,14 +82,14 @@ cglobal sbr_hf_g_filt, 5, 6, 5
> >     lea         r0, [r0 + r3*8]
> >     neg         r3
> >  .loop4:
> > -    movq        m0, [r2 + 4*r3 + 0]
> > -    movq        m1, [r2 + 4*r3 + 8]
> > -    movq        m2, [r1 + 0*STEP]
> > -    movq        m3, [r1 + 2*STEP]
> > +    movlps      m0, [r2 + 4*r3 + 0]
> > +    movlps      m1, [r2 + 4*r3 + 8]
> > +    movlps      m2, [r1 + 0*STEP]
> > +    movlps      m3, [r1 + 2*STEP]
> >     movhps      m2, [r1 + 1*STEP]
> >     movhps      m3, [r1 + 3*STEP]
> > -    punpckldq   m0, m0
> > -    punpckldq   m1, m1
> > +    unpcklps    m0, m0
> > +    unpcklps    m1, m1
> 
> Suggestion (not required for this patch) -- if you do an SSE3 version,
> use movddup instead of movlps + unpcklps.

I'll try to remember that if someone writes one.
I probably won't do it myself, since my CPU doesn't support SSE3
and I have no plans to buy a new one so far :-)
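
For reference, here is a rough scalar sketch in C of what the load and
shuffle sequences discussed above leave in a register. It is only a model:
the vec4 type and the model_* helpers are made-up names for illustration,
not anything from libavcodec, and it assumes m0 is loaded with two
consecutive floats as in the patched loop.

```c
/* Hypothetical scalar stand-in for one 4-lane float XMM register. */
typedef struct { float lane[4]; } vec4;

/* movlps dst, [src]: overwrite lanes 0-1 with two floats, keep lanes 2-3. */
static vec4 model_movlps(vec4 dst, const float *src)
{
    dst.lane[0] = src[0];
    dst.lane[1] = src[1];
    return dst;
}

/* unpcklps a, b: interleave the low halves, giving a0 b0 a1 b1. */
static vec4 model_unpcklps(vec4 a, vec4 b)
{
    vec4 r = { { a.lane[0], b.lane[0], a.lane[1], b.lane[1] } };
    return r;
}

/* movddup dst, [src]: load two floats (64 bits) and duplicate them,
 * giving s0 s1 s0 s1. */
static vec4 model_movddup(const float *src)
{
    vec4 r = { { src[0], src[1], src[0], src[1] } };
    return r;
}

/* The patched sequence: movlps m0, [g] followed by unpcklps m0, m0.
 * The upper lanes are zeroed here only to keep the model deterministic;
 * the real movlps leaves them unchanged. */
static vec4 model_load_pairwise(const float *g)
{
    vec4 zero = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    vec4 m = model_movlps(zero, g);
    return model_unpcklps(m, m);
}
```

In this model, movlps + unpcklps turns the pair g0 g1 into g0 g0 g1 g1,
while movddup turns it into g0 g1 g0 g1, so an SSE3 version would have to
take that different lane order into account.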

