[FFmpeg-devel] [PATCH] SSE dct32() [Was: r23095 - in trunk/libavcodec: ...]

Vitor Sessak vitor1001
Sun Jun 20 13:39:26 CEST 2010


On 06/20/2010 01:33 PM, Måns Rullgård wrote:
> Vitor Sessak<vitor1001 at gmail.com>  writes:
>
>> On 06/20/2010 12:15 PM, Måns Rullgård wrote:
>>> Vitor Sessak<vitor1001 at gmail.com>   writes:
>>>
>>>>>> I don't remember seeing a big difference _for the dct32 code_ between in ==
>>>>>> out and in != out.
>>>>>
>>>>> Now I am confused; I thought the 3% you quoted was about in == out vs.
>>>>> in != out?
>>>>
>>>> No, the 3% slowdown was when converting our general code (using FFT)
>>>> to have in != out.
>>>
>>> And that was due to missed optimisations caused by gcc not knowing
>>> that those pointers don't alias each other.  Marking them restrict is
>>> not good either, since we actually want to pass the same value
>>> sometimes.
>>
>> That, and one extra register being used.
>
> So what do we do?  I see the following options:
>
> 1. Change mp3 decoder to work with inplace transform.

Looks hard to do without a speed loss.

> 2. Copy the block before doing inplace transform.

Speed loss

> 3. Apply magic to remove the slowdown from splitting in/out.
>
> Did I miss anything?

Yes:

4. Have a special function pointer only for the 32-point DCT that accepts
in != out, as in my patch in this thread (dct32_new.diff). Note that for
the 32-point DCT function (and only for it), in != out does not cause a
noticeable speed loss.

-Vitor


