[FFmpeg-devel] [PATCH] SSE RDFT

Måns Rullgård mans
Sat Mar 20 23:07:14 CET 2010


Alex Converse <alex.converse at gmail.com> writes:

> 2010/3/20 Måns Rullgård <mans at mansr.com>
>
>> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
>>
>> > On Sun, Mar 14, 2010 at 3:23 PM, Alex Converse <alex.converse at gmail.com>
>> wrote:
>> >> I'm sure I've made some embarrassingly amateurish mistakes here.
>> >> Feedback is more than welcome.
>> >>
>> >> --Alex
>> >
>> > In the interests of getting away from discussions about yasm and into
>> > actually reviewing the asm...
>> >
>> > +///sign mask of RDFT sine terms
>> >
>> > Three / ?
>> >
>> > Looking at the asm overall, it looks like there's a huge amount of
>> > moving stuff around and very little actual calculation.  Is there no
>> > better way to organize it?
>> >
>> > +        "movlps     (%4,%0,4), %%xmm4     \n\t"
>> > +        "unpcklps      %%xmm4, %%xmm4     \n\t"
>> > +        "movlps     (%5,%0,4), %%xmm3     \n\t"
>> > +        "unpcklps      %%xmm3, %%xmm3     \n\t"
>> >
>> > This looks like a candidate for movsldup in an SSE3 version.
>>
>> Well?
>>
>
> Sorry, I've been a little tied up trying to finish up PS.
>
> There is a lot of data shuffling in here. One potential reduction is
> reorganizing the trig tables but keeping extra trig tables around is always
> a bit controversial.

FWIW, the NEON FFT uses interleaved trig tables.
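
To illustrate what an interleaved trig table means here, the following is a
minimal sketch in plain C, not the actual FFmpeg or NEON code. The function
names (interleave_trig, rotate_interleaved) and the layout of cosine at even
indices and sine at odd indices are assumptions made for the example. The
point is that both factors for one index sit next to each other in memory, so
a single contiguous load fetches the pair and less shuffling is needed
afterwards.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build a hypothetical interleaved table from two separate tables:
 *   out[2*i]     = cos_tab[i]
 *   out[2*i + 1] = sin_tab[i]
 * out must have room for 2*n floats.
 */
void interleave_trig(const float *cos_tab, const float *sin_tab,
                     float *out, size_t n)
{
    assert(cos_tab && sin_tab && out);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = cos_tab[i];
        out[2 * i + 1] = sin_tab[i];
    }
}

/*
 * Rotate the complex value (*re, *im) by table entry i, reading the
 * cosine and sine from adjacent slots of the interleaved table.
 */
void rotate_interleaved(const float *tab, size_t i, float *re, float *im)
{
    const float c = tab[2 * i];     /* cosine factor */
    const float s = tab[2 * i + 1]; /* sine factor, adjacent in memory */
    const float r = *re * c - *im * s;
    const float m = *re * s + *im * c;
    *re = r;
    *im = m;
}
```

With separate tables, the same rotation reads cos_tab[i] and sin_tab[i] from
two different arrays, which in SIMD code means two loads plus the unpacking
needed to pair the values up. The cost is the one Alex mentions: a second copy
of the table if other code still expects the split layout.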

-- 
Måns Rullgård
mans at mansr.com


