[FFmpeg-devel] [PATCH] split-radix FFT

Benjamin Larsson banan
Sun Jul 27 11:09:27 CEST 2008


Loren Merritt wrote:
> $subject, vaguely based on djbfft.
> Changed from djb:
> * added simd.
> * removed the hand-scheduled pentium-pro code. gcc's output from simple
> C is better on all cpus I have access to.
> * removed the distinction between fft and ifft. they're just
> permutations of each other, so the difference belongs in revtab[] and not
> in the code.
> * removed the distinction between pass() and pass_big(). C can always
> use the memory-efficient version, and simd never does because the
> shuffles are too costly.
> * made an entirely different pass_big(), to avoid store->load aliasing.
> 
> I tried the tangent FFT, but I couldn't make it faster than split-radix.
> Tangent has asymptotically 5% fewer arithmetic ops, but only 1-2% for
> sizes typical of audio codecs, and even a couple extra shuffles or other
> overhead pushes it over.
> 
> I tried an in-place fft_permute, but it wasn't any faster than
> out-of-place + memcpy, and quite a bit more complex.
> 
> 
> benchmarks (cycles):
> 
> 2^4  2^5  2^6  2^7  2^8  2^9  2^10  2^11  2^12  2^13  2^14  2^15  2^16  fft size
> 
> --Loren Merritt

How do I reproduce your benchmarks? I wanna test on Turion and Geode.

MvH
Benjamin Larsson



