[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Christophe GISQUET christophe.gisquet
Sun Nov 18 17:20:35 CET 2007


Michael Niedermayer wrote:
> On Sat, Nov 17, 2007 at 12:33:31PM +0100, Christophe GISQUET wrote:
>> +#define SHIFT2_16B_END_LINE(R)                  \
>> +    "psraw     %5, %%mm"#R"           \n\t"     \
>> +    "movq      %%mm"#R", (%2)         \n\t"     \
>> +    "add       %3, %1                 \n\t"     \
>> +    "add       $24, %2                \n\t"
> 
> the $24 add can be avoided by using an offset for the movq above

Applied. It also made me notice that I wasn't using the
SHIFT2_8B_END_LINE macro.
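
A minimal sketch of what the suggested change could look like, assuming the
macro is expanded once per line in unrolled code so that each store offset is
a compile-time constant. The name SHIFT2_16B_END_LINE_OFF and its OFF
parameter are hypothetical and are not taken from the revised patch. The
strings are only built and inspected here, not passed to asm().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant of SHIFT2_16B_END_LINE: the store takes a byte
 * offset OFF, so the separate "add $24, %2" is no longer needed. */
#define SHIFT2_16B_END_LINE_OFF(R, OFF)             \
    "psraw     %5, %%mm"#R"           \n\t"         \
    "movq      %%mm"#R", "#OFF"(%2)   \n\t"         \
    "add       %3, %1                 \n\t"

/* Two consecutive lines, 24 bytes apart in the destination buffer. */
static const char line0[] = SHIFT2_16B_END_LINE_OFF(3, 0);
static const char line1[] = SHIFT2_16B_END_LINE_OFF(3, 24);

/* Returns 1 if needle occurs in s, 0 otherwise. */
int has(const char *s, const char *needle)
{
    return strstr(s, needle) != NULL;
}

/* Checks that the offset ends up in the movq operand and that no
 * immediate add of 24 remains in the generated text. */
void check_offset_macro(void)
{
    assert(has(line0, "0(%2)"));
    assert(has(line1, "24(%2)"));
    assert(!has(line1, "$24"));
}
```

With this form, one pointer update after the last line would replace the
per-line add, if the surrounding code needs the pointer advanced at all.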

>> +        "movq      %%mm3, %%mm1            \n\t" /* 0,1,1,0*/
>> +        "movq      %%mm4, %%mm2            \n\t" /* 0,1,1,0*/
>> +        "psubw     %%mm5, %%mm3            \n\t" /*-1,1,1,0*/
>> +        "psubw     %%mm6, %%mm4            \n\t" /*-1,1,1,0*/
>> +        "psllw     $3, %%mm1               \n\t" /* 0,8,8,0*/
>> +        "psllw     $3, %%mm2               \n\t" /* 0,8,8,0*/
>> +        "movd      0(%1,%3), %%mm5         \n\t"
>> +        "movd      4(%1,%3), %%mm6         \n\t"
>> +        "paddw     %%mm1, %%mm3            \n\t" /*-1,9,9,0*/
>> +        "paddw     %%mm2, %%mm4            \n\t" /*-1,9,9,0*/
>> +        "punpcklbw %%mm0, %%mm5            \n\t"
>> +        "punpcklbw %%mm0, %%mm6            \n\t"
>> +        "psubw     %%mm5, %%mm3            \n\t" /*-1,9,9,-1*/
>> +        "psubw     %%mm6, %%mm4            \n\t" /*-1,9,9,-1*/
> 
> 
>         "psubw     %%mm3, %%mm5            \n\t" /* 1,-1,-1, 0*/
>         "psubw     %%mm4, %%mm6            \n\t" /* 1,-1,-1, 0*/
>         "psllw     $3, %%mm3               \n\t" /* 0,8,8,0*/
>         "psllw     $3, %%mm4               \n\t" /* 0,8,8,0*/
>         "movd      0(%1,%3), %%mm1         \n\t"
>         "movd      4(%1,%3), %%mm2         \n\t"
>         "psubw     %%mm5, %%mm3            \n\t" /*-1,9,9,0*/
>         "psubw     %%mm6, %%mm4            \n\t" /*-1,9,9,0*/
>         "punpcklbw %%mm0, %%mm1            \n\t"
>         "punpcklbw %%mm0, %%mm2            \n\t"
>         "psubw     %%mm1, %%mm3            \n\t" /*-1,9,9,-1*/
>         "psubw     %%mm2, %%mm4            \n\t" /*-1,9,9,-1*/

Yes, but I've decided to use pmullw here... (see below).
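
For reference, a scalar sketch of the three ways discussed here to compute the
-1,9,9,-1 filter on one 16-bit lane: the original copy-and-shift sequence, the
copy-free reordering proposed in the review, and a multiply by 9 as pmullw
would do. The function names are made up for illustration, and p0..p3 stand
for the four unpacked 8-bit input pixels.

```c
#include <assert.h>

/* Original sequence: copy the middle sum, shift the copy, add it back. */
int filter_copy_shift(int p0, int p1, int p2, int p3)
{
    int acc = p1 + p2;   /*  0,1,1, 0 */
    int tmp = acc;       /* movq copy */
    acc -= p0;           /* -1,1,1, 0 */
    tmp <<= 3;           /*  0,8,8, 0 */
    acc += tmp;          /* -1,9,9, 0 */
    acc -= p3;           /* -1,9,9,-1 */
    return acc;
}

/* Reviewed sequence: subtract into the p0 register, so no copy is needed. */
int filter_no_copy(int p0, int p1, int p2, int p3)
{
    int acc = p1 + p2;   /*  0, 1, 1, 0 */
    int neg = p0 - acc;  /*  1,-1,-1, 0 */
    acc <<= 3;           /*  0, 8, 8, 0 */
    acc -= neg;          /* -1, 9, 9, 0 */
    acc -= p3;           /* -1, 9, 9,-1 */
    return acc;
}

/* Multiply form: one multiply by 9 instead of shift plus add. */
int filter_mul9(int p0, int p1, int p2, int p3)
{
    return 9 * (p1 + p2) - p0 - p3;
}

/* Compares the three forms over a sampled grid of 8-bit pixel values. */
void check_filters(void)
{
    for (int p0 = 0; p0 < 256; p0 += 17)
        for (int p1 = 0; p1 < 256; p1 += 17)
            for (int p2 = 0; p2 < 256; p2 += 17)
                for (int p3 = 0; p3 < 256; p3 += 17) {
                    int ref = filter_mul9(p0, p1, p2, p3);
                    assert(filter_copy_shift(p0, p1, p2, p3) == ref);
                    assert(filter_no_copy(p0, p1, p2, p3) == ref);
                }
}
```

With 8-bit inputs the result stays within -510..4590, so it fits a signed
16-bit lane without wrapping in all three forms.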

>> +     "movq      %%mm1, %%mm3    \n\t"                      \
>> +     "movq      %%mm2, %%mm4    \n\t"                      \
>> +     "paddw     %%mm1, %%mm1    \n\t"                      \
>> +     "paddw     %%mm2, %%mm2    \n\t"                      \
>> +     "paddw     %%mm3, %%mm1    \n\t" /* 3* */             \
>> +     "paddw     %%mm4, %%mm2    \n\t" /* 3* */             \
> 
> have you checked that pmullw with 3 is not faster?

It only improves the horizontal pass (2550 vs 2700 dezicycles, i.e. about 5%).
The others seem improved too, but by less than 1%.

There are 2 reasons why I wanted to avoid pmullw as much as possible:
- here, I couldn't load the factor in a register (this seems less speed
critical than I remembered)
- I have a Core 2 and an Athlon machine; both have a latency of 3 for
pmullw; I think some P4s have a latency of 6.
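
As a scalar illustration of the two alternatives being timed, the sketch below
models one 16-bit lane: tripling with a copy and two adds (the movq/paddw/paddw
sequence) versus a single multiply that keeps the low 16 bits (what pmullw
does). The function names are hypothetical. Both wrap modulo 2^16, so they
give identical results for every input and the choice is purely one of speed.

```c
#include <assert.h>
#include <stdint.h>

/* movq + paddw + paddw: copy, double, then add the copy back. */
uint16_t triple_add(uint16_t x)
{
    uint16_t copy = x;            /* movq  */
    x = (uint16_t)(x + x);        /* paddw: 2*x, wraps mod 2^16 */
    x = (uint16_t)(x + copy);     /* paddw: 3*x, wraps mod 2^16 */
    return x;
}

/* pmullw with a constant 3: low 16 bits of the product. */
uint16_t triple_mul(uint16_t x)
{
    return (uint16_t)(x * 3u);
}

/* Exhaustively compares both forms over all 65536 lane values. */
void check_triple(void)
{
    for (uint32_t v = 0; v <= 0xFFFF; v++)
        assert(triple_add((uint16_t)v) == triple_mul((uint16_t)v));
}
```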

As for the code change you proposed in the previous paragraph, I decided
to retest vc1_put_shift2_mmx with a pmullw. It did improve the speed of
this function on my core2 (around 5%).

Therefore, the attached patch implements both uses of pmullw. It's
faster on my computer; I don't know about P4.

Best regards,
-- 
Christophe GISQUET

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vc1dsp.diff
Type: text/x-patch
Size: 26117 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071118/73891a58/attachment.bin>