[FFmpeg-devel] [PATCH] SSE dct32()

Vitor Sessak vitor1001
Wed Jun 30 18:59:34 CEST 2010


On 06/28/2010 07:23 AM, Vitor Sessak wrote:
> On 06/23/2010 11:27 PM, Vitor Sessak wrote:
>> On 06/20/2010 07:51 PM, Michael Niedermayer wrote:
>>> On Sun, Jun 20, 2010 at 01:12:48PM +0100, Måns Rullgård wrote:
>>>> Vitor Sessak<vitor1001 at gmail.com> writes:
>>>>
>>>>> On 06/20/2010 01:33 PM, Måns Rullgård wrote:
>>>>>> Vitor Sessak<vitor1001 at gmail.com> writes:
>>>>>>
>>>>>>> On 06/20/2010 12:15 PM, Måns Rullgård wrote:
>>>>>>>> Vitor Sessak<vitor1001 at gmail.com> writes:
>>>>>>>>
>>>>>>>>>>> I don't remember seeing a big difference _for the dct32 code_
>>>>>>>>>>> between in == out and in != out.
>>>>>>>>>>
>>>>>>>>>> Now I am confused; I thought the 3% you quoted was about
>>>>>>>>>> in == out vs. in != out?
>>>>>>>>>
>>>>>>>>> No, the 3% slowdown was when converting our general code (using
>>>>>>>>> FFT) to have in != out.
>>>>>>>>
>>>>>>>> And that was due to missed optimisations caused by gcc not knowing
>>>>>>>> that those pointers don't alias each other. Marking them restrict is
>>>>>>>> not good either, since we actually want to pass the same value
>>>>>>>> sometimes.
>>>>>>>
>>>>>>> That and one extra used register.
>>>>>>
>>>>>> So what do we do? I see the following options:
>>>>>>
>>>>>> 1. Change mp3 decoder to work with inplace transform.
>>>>>
>>>>> Looks hard to do without a speed loss.
>>>>
>>>> Just hard or impossible?
>>>
>>> Hard, not impossible.
>>> Just consider that dct32() trashes its input array.
>>>
>>> Either way, the in != out thing is not a big issue if it's not slower.
>>> What is a big issue is that high-level optimizations have to be done
>>> before asm optimisations.
>>>
>>> Is our dct32() code optimal? If I didn't miscount, mp3lib does 4 fewer
>>> butterflies, but I could have miscounted. Also, our dct32() should be
>>> benchmarked against dct32() code from other mp3 decoders to make sure
>>> our high-level code is OK before one starts writing asm for it.
>>
>> Our C dct32() is faster than the C version in the latest mp3lib svn.
>> Patch to test attached (dct32_test.diff). Don't expect me to test every
>> dct32() implementation on the web...
>>
>> Anyway, how does that affect the patch to move dct32() to shared
>> code? New version attached (dct32_common.diff)...
>
> ping?

No route to host?

-Vitor


