[FFmpeg-devel] [PATCH] SSE dct32()

Vitor Sessak vitor1001
Mon Jun 28 07:23:04 CEST 2010


On 06/23/2010 11:27 PM, Vitor Sessak wrote:
> On 06/20/2010 07:51 PM, Michael Niedermayer wrote:
>> On Sun, Jun 20, 2010 at 01:12:48PM +0100, Måns Rullgård wrote:
>>> Vitor Sessak<vitor1001 at gmail.com> writes:
>>>
>>>> On 06/20/2010 01:33 PM, Måns Rullgård wrote:
>>>>> Vitor Sessak<vitor1001 at gmail.com> writes:
>>>>>
>>>>>> On 06/20/2010 12:15 PM, Måns Rullgård wrote:
>>>>>>> Vitor Sessak<vitor1001 at gmail.com> writes:
>>>>>>>
>>>>>>>>>> I don't remember seeing a big difference _for the dct32 code_
>>>>>>>>>> between in == out and in != out.
>>>>>>>>>
>>>>>>>>> Now I am confused; I thought the 3% you quoted was about
>>>>>>>>> in == out vs in != out?
>>>>>>>>
>>>>>>>> No, the 3% slowdown was when converting our general code (using
>>>>>>>> FFT) to have in != out.
>>>>>>>
>>>>>>> And that was due to missed optimisations caused by gcc not knowing
>>>>>>> that those pointers don't alias each other. Marking them restrict is
>>>>>>> not good either, since we actually want to pass the same value
>>>>>>> sometimes.
>>>>>>
>>>>>> That and one extra used register.
>>>>>
>>>>> So what do we do? I see the following options:
>>>>>
>>>>> 1. Change mp3 decoder to work with inplace transform.
>>>>
>>>> Looks hard to do with no speed loss.
>>>
>>> Just hard or impossible?
>>
>> hard, not impossible
>> just consider that dct32() trashes its input array
>>
>> Either way, the in != out thing is not a big issue if it's not slower;
>> what is a big issue is that high-level optimizations have to be done
>> before asm optimizations.
>>
>> Is our dct32() code optimal? If I didn't miscount, mp3lib does 4
>> fewer butterflies, but I could have miscounted. Also, our dct32() should
>> be benchmarked against dct32() code from other mp3 decoders to make sure
>> our high-level code is OK before one starts writing asm for it.
>
> Our C dct32() is faster than the C version in the latest mp3lib SVN. A
> patch to test this is attached (dct32_test.diff). Don't expect me to test
> every dct32() implementation on the web...
>
> Anyway, how does that affect the patch to move dct32() to shared
> code? New version attached (dct32_common.diff)...
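To illustrate the aliasing point raised above: without restrict, gcc must assume that a store to out[] may change in[], which limits how it can reorder loads and stores, yet the same function stays valid when called with in == out. The sketch below uses a hypothetical butterfly_pass() (an illustration, not FFmpeg's dct32()) that loads each input pair before writing either output, so it works both in place and out of place. Declaring both pointers restrict could give the compiler more freedom, but passing the same buffer for both would then be undefined behavior, which is the trade-off described in the thread.

```c
/* Hypothetical example, not FFmpeg's dct32(): one radix-2 butterfly
 * stage over n floats (n assumed even).
 *
 * The pointers are deliberately NOT restrict-qualified. The compiler
 * must therefore assume that a store to out[] may modify in[], which
 * limits how it can reorder memory accesses, but calling the function
 * with out == in is well defined.
 *
 * Each pair in[i], in[n-1-i] is loaded into locals before either output
 * is stored, and no later iteration reads those indices again, so the
 * in-place call gives the same result as the out-of-place one. */
static void butterfly_pass(float *out, const float *in, int n)
{
    for (int i = 0; i < n / 2; i++) {
        float a = in[i];
        float b = in[n - 1 - i];
        out[i]         = a + b;   /* sum goes to the low half      */
        out[n - 1 - i] = a - b;   /* difference to the mirror slot */
    }
}
```

With in = {1, 2, 3, 4}, butterfly_pass(out, in, 4) fills out[] with {5, 5, -1, -3} and leaves in[] untouched; butterfly_pass(in, in, 4) then overwrites in[] with the same four values.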

ping?

-Vitor
