[FFmpeg-devel] [PATCH] get_pixels_sse2

Baptiste Coudurier baptiste.coudurier
Thu Oct 9 19:47:48 CEST 2008


Hi Michael,

Michael Niedermayer wrote:
> On Wed, Oct 08, 2008 at 06:48:30PM -0700, Baptiste Coudurier wrote:
>> Hi
>>
>> $subject.
>>
>> 1987 dezicycles in get pixels mmx, 131063 runs, 9 skips
>> 2014 dezicycles in get pixels mmx, 262129 runs, 15 skips
>> 2005 dezicycles in get pixels mmx, 524258 runs, 30 skips
>> 2009 dezicycles in get pixels mmx, 1048513 runs, 63 skips
>> 2025 dezicycles in get pixels mmx, 2097009 runs, 143 skips
>>
>> 1820 dezicycles in get pixels sse2, 131061 runs, 11 skips
>> 1828 dezicycles in get pixels sse2, 262125 runs, 19 skips
>> 1819 dezicycles in get pixels sse2, 524259 runs, 29 skips
>> 1814 dezicycles in get pixels sse2, 1048524 runs, 52 skips
>> 1813 dezicycles in get pixels sse2, 2097063 runs, 89 skips
>>
>> -- 
>> Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
>> Smartjog USA Inc.                                http://www.smartjog.com
>> Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
> 
>> Index: libavcodec/i386/dsputilenc_mmx.c
>> ===================================================================
>> --- libavcodec/i386/dsputilenc_mmx.c	(revision 15588)
>> +++ libavcodec/i386/dsputilenc_mmx.c	(working copy)
>> @@ -56,6 +56,40 @@
>>      );
>>  }
>>  
>> +static void get_pixels_sse2(DCTELEM *block, const uint8_t *pixels, int line_size)
>> +{
>> +    asm volatile(
>> +        "pxor %%xmm7,      %%xmm7         \n\t"
>> +        "movq (%0),        %%xmm0         \n\t"
>> +        "movq (%0, %2),    %%xmm1         \n\t"
>> +        "movq (%0, %2,2),  %%xmm2         \n\t"
>> +        "movq (%0, %3),    %%xmm3         \n\t"
>> +        "punpcklbw %%xmm7, %%xmm0         \n\t"
>> +        "punpcklbw %%xmm7, %%xmm1         \n\t"
>> +        "punpcklbw %%xmm7, %%xmm2         \n\t"
>> +        "punpcklbw %%xmm7, %%xmm3         \n\t"
>> +        "movdqa %%xmm0,      (%1)         \n\t"
>> +        "movdqa %%xmm1,    16(%1)         \n\t"
>> +        "movdqa %%xmm2,    32(%1)         \n\t"
>> +        "movdqa %%xmm3,    48(%1)         \n\t"
>> +        "lea (%0,%2,4), %0                \n\t"
>> +        "movq (%0),        %%xmm0         \n\t"
> 
> My gut feeling is that the code would be faster with the lea moved further
> up, but I might be wrong ...

Changed, though I don't see any measurable difference in the benchmark.
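
For anyone not reading the asm: a rough C sketch of what the routine computes,
written with SSE2 intrinsics instead of inline asm. This is only an
illustration, not the patch. The names get_pixels_ref and
get_pixels_sse2_sketch are made up here, and int16_t stands in for DCTELEM.
Each row is loaded as 8 bytes (movq), zero-extended to 16 bits by interleaving
with a zero register (punpcklbw), and stored with an aligned 16-byte store
(movdqa), so the destination block is assumed to be 16-byte aligned.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: widen an 8x8 block of bytes to 16-bit values. */
static void get_pixels_ref(int16_t *block, const uint8_t *pixels, int line_size)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = pixels[y * line_size + x];
}

/* Hypothetical intrinsics version of the same operation. */
static void get_pixels_sse2_sketch(int16_t *block, const uint8_t *pixels,
                                   int line_size)
{
    const __m128i zero = _mm_setzero_si128();          /* pxor xmm7, xmm7 */

    /* The aligned store below faults on a misaligned destination. */
    assert(((uintptr_t)block & 15) == 0);

    for (int y = 0; y < 8; y++) {
        /* movq: load the 8 pixels of one row into the low half */
        __m128i row = _mm_loadl_epi64((const __m128i *)(pixels + y * line_size));
        /* punpcklbw: interleave with zero, giving 8 unsigned 16-bit values */
        row = _mm_unpacklo_epi8(row, zero);
        /* movdqa: aligned 16-byte store of one output row */
        _mm_store_si128((__m128i *)(block + y * 8), row);
    }
}
```

The loop here leaves instruction scheduling to the compiler, so the placement
of the lea discussed above has no direct equivalent in this form.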

> [...]
>> @@ -1332,7 +1366,11 @@
>>              }
>>          }
>>  
>> -        c->get_pixels = get_pixels_mmx;
>> +        if(mm_flags & MM_SSE2)
>> +            c->get_pixels = get_pixels_sse2;
>> +        else
>> +            c->get_pixels = get_pixels_mmx;
>> +
>>          c->diff_pixels = diff_pixels_mmx;
>>          c->pix_sum = pix_sum16_mmx;
> 
> There is already an if(mm_flags & MM_SSE2) below; it could be used instead
> of adding a new if()

Ok, done.
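
Roughly, the init pattern being suggested looks like the sketch below: set the
MMX pointer unconditionally, then override it inside the SSE2 block that
already exists further down. Everything here is a stand-in for illustration
(the struct, the flag values, and the stub functions are hypothetical), not
the actual dsputil code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, standing in for the real MM_* constants. */
#define SKETCH_MM_MMX  0x1
#define SKETCH_MM_SSE2 0x2

typedef void (*get_pixels_fn)(int16_t *block, const uint8_t *pixels,
                              int line_size);

/* Empty stubs standing in for the real MMX and SSE2 routines. */
static void get_pixels_mmx_stub(int16_t *b, const uint8_t *p, int l)
{ (void)b; (void)p; (void)l; }
static void get_pixels_sse2_stub(int16_t *b, const uint8_t *p, int l)
{ (void)b; (void)p; (void)l; }

/* Hypothetical, cut-down function-pointer table. */
struct dsp_sketch {
    get_pixels_fn get_pixels;
};

static void dsp_init_sketch(struct dsp_sketch *c, int mm_flags)
{
    if (mm_flags & SKETCH_MM_MMX) {
        /* Baseline assignment stays unconditional, as before the patch. */
        c->get_pixels = get_pixels_mmx_stub;

        /* ... other MMX assignments ... */

        if (mm_flags & SKETCH_MM_SSE2) {
            /* Existing SSE2 block: override the pointer here rather than
             * adding a second if() next to the MMX assignment. */
            c->get_pixels = get_pixels_sse2_stub;
        }
    }
}
```

The later assignment wins, so the result is the same as the if/else in the
first version of the patch, with one branch fewer in the init code.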

Updated patch attached.

-- 
Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
Smartjog USA Inc.                                http://www.smartjog.com
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_pixels_sse2_2.patch
Type: text/x-diff
Size: 2147 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081009/462ce700/attachment.patch>


