[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Alexander Strange astrange
Sun Apr 6 22:41:53 CEST 2008


On Apr 6, 2008, at 4:03 PM, Pascal Massimino wrote:
>  Re,
>
> On Sun, Apr 6, 2008 at 9:39 PM, Pascal Massimino <pascal.massimino at gmail.com> wrote:
>
>>
>>
>>>
>>> [...]
>>>>    "movdqa   %%xmm2, ("dct")         \n\t" \
>>>>    "movdqa   %%xmm3, %%xmm2          \n\t" \
>>>>    "psubsw   %%xmm6, %%xmm3          \n\t" \
>>>>    "paddsw   %%xmm2, %%xmm6          \n\t" \
>>>>    "movdqa   %%xmm6, %%xmm2          \n\t" \
>>>>    "psubsw   %%xmm7, %%xmm6          \n\t" \
>>>>    "paddsw   %%xmm2, %%xmm7          \n\t" \
>>>>    "movdqa   %%xmm3, %%xmm2          \n\t" \
>>>>    "psubsw   %%xmm5, %%xmm3          \n\t" \
>>>>    "paddsw   %%xmm2, %%xmm5          \n\t" \
>>>>    "movdqa   %%xmm5, %%xmm2          \n\t" \
>>>>    "psubsw   %%xmm0, %%xmm5          \n\t" \
>>>>    "paddsw   %%xmm2, %%xmm0          \n\t" \
>>>>    "movdqa   %%xmm3, %%xmm2          \n\t" \
>>>>    "psubsw   %%xmm4, %%xmm3          \n\t" \
>>>>    "paddsw   %%xmm2, %%xmm4          \n\t" \
>>>>    "movdqa  ("dct"), %%xmm2          \n\t" \\
>>
>>
> Oh! Now I recall an optimization: you don't need to
> save and restore xmm2 in "dct", provided you replace
> the first butterfly:
>
>>    "movdqa   %%xmm3, %%xmm2          \n\t" \
>>    "psubsw   %%xmm6, %%xmm3          \n\t" \
>>    "paddsw   %%xmm2, %%xmm6          \n\t" \
>
> by its (non-saturating) sub,add,add equivalent:
>
> psubw %%xmm6,%%xmm3
> paddw %%xmm6,%%xmm6
> paddw %%xmm3,%%xmm6

xmm2 is used as scratch for the other butterflies too, so the
replacement would have to be applied to all of them. It also has more
register dependencies and might change the overflow behavior, since
paddw/psubw wrap where paddsw/psubsw saturate. I don't think it looks
good, but I'll try it. Right now, reordering branches and replacing
shufd look like the best things to try first.
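
To make the overflow concern concrete, here is a rough scalar model of one
16-bit lane of each butterfly. It is only a sketch: the helper names
(sat16, wrap16, butterfly_sat, butterfly_wrap) are made up for illustration
and are not part of the patch, and the real code works on eight lanes per
xmm register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp to int16 range, as paddsw/psubsw do per lane. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical helper: wrap modulo 2^16, as paddw/psubw do per lane. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return (int16_t)(u >= 0x8000 ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Current form: movdqa a->t; psubsw b,a; paddsw t,b (needs scratch t). */
static void butterfly_sat(int16_t *a, int16_t *b)
{
    int16_t t = *a;
    *a = sat16((int32_t)*a - *b);   /* a - b */
    *b = sat16((int32_t)*b + t);    /* a + b */
}

/* Suggested form: psubw b,a; paddw b,b; paddw a,b (no scratch). */
static void butterfly_wrap(int16_t *a, int16_t *b)
{
    *a = wrap16((int32_t)*a - *b);  /* a - b       */
    *b = wrap16((int32_t)*b + *b);  /* 2b          */
    *b = wrap16((int32_t)*b + *a);  /* 2b + (a - b) = a + b */
}

/* Returns 1 when both forms give the same pair of outputs. */
static int butterflies_agree(int16_t a, int16_t b)
{
    int16_t sa = a, sb = b, wa = a, wb = b;
    butterfly_sat(&sa, &sb);
    butterfly_wrap(&wa, &wb);
    return sa == wa && sb == wb;
}

/* A few spot checks; call from main() to run them. */
static void self_check(void)
{
    assert(butterflies_agree(100, 30));        /* no overflow anywhere      */
    assert(butterflies_agree(-10000, 20000));  /* only 2b wraps: harmless   */
    assert(!butterflies_agree(30000, 10000));  /* a + b overflows: differs  */
}
```

In this model the two forms agree whenever a - b and a + b both fit in 16
bits, even if the intermediate 2b wraps. They differ only when a final
result overflows: the current form clamps, the suggested one wraps around.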



