[FFmpeg-devel] MMX version for put_no_rnd_h264_chroma_mc8_c

Christophe GISQUET christophe.gisquet
Sun Nov 25 00:35:12 CET 2007


Good evening,

Michael Niedermayer wrote:
> also the //START_TIMER don't belong in the patch

That was intentional in this case, to show how I compared the versions,
and it seems to have been worth it. Here are the new results.

Before:
VC-1: 2085 dezicycles in rnd, 1047692 runs, 884 skips
h264: 1093 dezicycles in rnd, 2096936 runs, 216 skips

Patch applied:
VC-1: 2106 dezicycles in rnd, 1047537 runs, 1039 skips
      2119 dezicycles in no_rnd, 1047384 runs, 1192 skips
h264: 1097 dezicycles in rnd, 2096867 runs, 285 skips

A global benchmark, run without the *_TIMER macros, shows no
measurable difference.

>>          const int dxy = x ? 1 : stride;
>>  
>>          asm volatile(
>> +            "movq %2, %%mm6\n\t"
>>              "movd %0, %%mm5\n\t"
>>              "movq %1, %%mm4\n\t"
>>              "punpcklwd %%mm5, %%mm5\n\t"
>>              "punpckldq %%mm5, %%mm5\n\t" /* mm5 = B = x */
>> -            "movq %%mm4, %%mm6\n\t"
>>              "pxor %%mm7, %%mm7\n\t"
>>              "psubw %%mm5, %%mm4\n\t"     /* mm4 = A = 8-x */
>> -            "psrlw $1, %%mm6\n\t"        /* mm6 = 4 */
>> -            :: "rm"(x+y), "m"(ff_pw_8));
>> +            "psrlw $3, %%mm6" /* mm6 = rnd */
>> +            :: "rm"(x+y), "m"(ff_pw_8), "m"(*rnd_reg));
> 
> the psrlw can be avoided by shifting the constant right

The bilinear case further down doesn't do that psrlw and uses the
constant as is. Still, I applied your suggestion; you can see it in the
attached patch.

Best regards,
-- 
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h264.2.diff
Type: text/x-patch
Size: 5355 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071125/1ce4589d/attachment.bin>


