[FFmpeg-devel] [PATCH] Dsputilize some functions from APE decode 2/2 - SSE2

Michael Niedermayer michaelni
Thu Jul 10 22:36:07 CEST 2008


On Thu, Jul 10, 2008 at 06:17:59PM +0300, Kostya wrote:
> On Thu, Jul 10, 2008 at 01:20:18PM +0200, Michael Niedermayer wrote:
> > On Thu, Jul 10, 2008 at 01:48:46PM +0300, Kostya wrote:
> > > On Thu, Jul 10, 2008 at 11:48:14AM +0200, Michael Niedermayer wrote:
> > > > On Thu, Jul 10, 2008 at 11:16:01AM +0300, Kostya wrote:
> > > > [...]
> > > > > +static void add_int16_sse2(int16_t * v1, int16_t * v2, int order)
> > > > > +{
> > > > > +    x86_reg o = -(order << 1);
> > > > > +    v1 += order;
> > > > > +    v2 += order;
> > > > > +    asm volatile(
> > > > > +        "1:                       \n\t"
> > > > 
> > > > > +        "movdqu  (%1,%2), %%xmm0  \n\t"
> > > > > +        "paddw   (%0,%2), %%xmm0  \n\t"
> > > > > +        "movdqa  %%xmm0,  (%0,%2) \n\t"
> > > > > +        "add     $16,     %2      \n\t"
> > > > > +        "movdqu  (%1,%2), %%xmm0  \n\t"
> > > > > +        "paddw   (%0,%2), %%xmm0  \n\t"
> > > > > +        "movdqa  %%xmm0,  (%0,%2) \n\t"
> > > > > +        "add     $16,     %2      \n\t"
> > > > 
> > > > is that faster than:
> > > > "movdqu    (%1,%2), %%xmm0  \n\t"
> > > > "paddw     (%0,%2), %%xmm0  \n\t"
> > > > "movdqa  %%xmm0,    (%0,%2) \n\t"
> > > > "movdqu  16(%1,%2), %%xmm0  \n\t"
> > > > "paddw   16(%0,%2), %%xmm0  \n\t"
> > > > "movdqa  %%xmm0,  16(%0,%2) \n\t"
> > > > "add     $32,     %2      \n\t"
> > > > 
> > > > ?
> > >  
> > > It was the first thing I tried. It was slower (on Core2).
> > 
> > and:
> > 
> > "movdqu    (%1,%2), %%xmm0  \n\t"
> > "movdqu  16(%1,%2), %%xmm1  \n\t"
> > "paddw     (%0,%2), %%xmm0  \n\t"
> > "paddw   16(%0,%2), %%xmm1  \n\t"
> > "movdqa  %%xmm0,    (%0,%2) \n\t"
> > "movdqa  %%xmm1,  16(%0,%2) \n\t"
> > "add     $32,     %2      \n\t"
>  
> It's on par. Patch attached for reference.

patch looks ok
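
As a reading aid (not part of the patch): the quoted loop computes v1[i] += v2[i] over `order` 16-bit elements, with the wraparound behaviour of paddw. It moves both pointers past the end and walks a negative byte offset (`o = -(order << 1)`) up towards zero, so the `add` that advances the offset can also drive the loop branch. The quote stops before the branch instruction, so the exit condition below ("stop when the offset reaches zero") is an assumption. A portable C sketch of the semantics and of that loop shape, using the hypothetical names `add_int16_c` and `add_int16_negidx`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: v1[i] += v2[i] for order elements.
 * The sum is formed in int (no overflow possible for two int16 values),
 * then truncated to 16 bits, which matches the modulo-2^16 wrap of paddw. */
static void add_int16_c(int16_t *v1, const int16_t *v2, int order)
{
    for (int i = 0; i < order; i++)
        v1[i] = (int16_t)(uint16_t)(v1[i] + v2[i]);
}

/* Same computation in the loop shape of the asm: point past the end and
 * count a negative index up to zero. The asm uses a byte offset,
 * -(order << 1); an element index is used here instead.
 * Assumption: the loop exits once the offset reaches zero. */
static void add_int16_negidx(int16_t *v1, const int16_t *v2, int order)
{
    assert(order >= 0);
    ptrdiff_t o = -(ptrdiff_t)order;
    v1 += order;
    v2 += order;
    for (; o < 0; o++)
        v1[o] = (int16_t)(uint16_t)(v1[o] + v2[o]);
}
```

Converting the truncated value back to int16_t is implementation-defined before C23, but it wraps on mainstream two's-complement compilers.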

[...]
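
For comparison, here is a sketch of the two-register variant suggested above, written with SSE2 intrinsics from <emmintrin.h> rather than inline asm. The name `add_int16_sse2_intrin` is hypothetical. The operand list of the asm is not quoted, so this assumes %0 is v1 (the destination) and %1 is v2. As the movdqa stores and the paddw memory operands require, it also assumes that v1 is 16-byte aligned and that order is a multiple of 16 (32 bytes per iteration). v2 may be unaligned. The compiler is free to schedule these loads differently, so the Core2 timings reported in the thread do not necessarily carry over.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 */
#include <stdint.h>

/* Hypothetical intrinsics version of the two-register loop:
 * v1[i] += v2[i], 16 int16 elements (two xmm registers) per iteration. */
static void add_int16_sse2_intrin(int16_t *v1, const int16_t *v2, int order)
{
    assert(((uintptr_t)v1 & 15) == 0);  /* movdqa / paddw mem need alignment */
    assert(order % 16 == 0);            /* 32 bytes consumed per iteration */
    for (int i = 0; i < order; i += 16) {
        /* two independent unaligned loads from v2 (movdqu) */
        __m128i a = _mm_loadu_si128((const __m128i *)(v2 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(v2 + i + 8));
        /* paddw against the aligned v1 data, kept in separate registers */
        a = _mm_add_epi16(a, _mm_load_si128((const __m128i *)(v1 + i)));
        b = _mm_add_epi16(b, _mm_load_si128((const __m128i *)(v1 + i + 8)));
        /* aligned stores back into v1 (movdqa) */
        _mm_store_si128((__m128i *)(v1 + i), a);
        _mm_store_si128((__m128i *)(v1 + i + 8), b);
    }
}
```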
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The misfortune of the wise is better than the prosperity of the fool.
-- Epicurus