[FFmpeg-devel] [PATCH] split-radix FFT
Sun Jul 27 11:09:27 CEST 2008
Loren Merritt wrote:
> $subject, vaguely based on djbfft.
> Changed from djb:
> * added simd.
> * removed the hand-scheduled pentium-pro code. gcc's output from simple
> C is better on all cpus I have access to.
> * removed the distinction between fft and ifft. they're just
> permutations of each other, so the difference belongs in revtab and not
> in the code.
> * removed the distinction between pass() and pass_big(). C can always
> use the memory-efficient version, and simd never does because the
> shuffles are too costly.
> * made an entirely different pass_big(), to avoid store->load aliasing.
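
A rough illustration of the fft/ifft point above: an unnormalized inverse DFT equals a forward DFT whose input is read through the permutation n -> (N - n) mod N, so the transform direction can live entirely in an index table. The sketch below checks that identity on a 4-point transform with exact integer twiddles. All names here (cpx, dft_forward, dft_inverse_by_permutation, ...) are hypothetical, the naive O(N^2) loops only stand in for a real FFT, and this is not claimed to be how the patch builds revtab.

```c
#define N 4

/* Integer complex type: exact for N = 4, where every twiddle is 1, -1, i or -i. */
typedef struct { int re, im; } cpx;

/* Powers of w = exp(-2*pi*i/4) = -i, i.e. 1, -i, -1, i. */
static const cpx W4[N] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };

static cpx cmul(cpx a, cpx b)
{
    cpx r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

/* Naive forward DFT: out[k] = sum_n in[n] * w^(k*n). */
static void dft_forward(const cpx *in, cpx *out)
{
    for (int k = 0; k < N; k++) {
        cpx acc = { 0, 0 };
        for (int n = 0; n < N; n++) {
            cpx t = cmul(in[n], W4[(k * n) % N]);
            acc.re += t.re;
            acc.im += t.im;
        }
        out[k] = acc;
    }
}

/* Naive unnormalized inverse DFT: same sum with twiddles w^(-k*n). */
static void dft_inverse(const cpx *in, cpx *out)
{
    for (int k = 0; k < N; k++) {
        cpx acc = { 0, 0 };
        for (int n = 0; n < N; n++) {
            cpx t = cmul(in[n], W4[(N - (k * n) % N) % N]);
            acc.re += t.re;
            acc.im += t.im;
        }
        out[k] = acc;
    }
}

/* Inverse computed with the forward routine plus an input permutation only. */
static void dft_inverse_by_permutation(const cpx *in, cpx *out)
{
    cpx tmp[N];
    for (int n = 0; n < N; n++)
        tmp[n] = in[(N - n) % N];   /* the only direction-dependent step */
    dft_forward(tmp, out);
}

/* Returns 1 if both inverse routes agree on a fixed test vector, else 0. */
static int inverse_matches_permuted_forward(void)
{
    const cpx x[N] = { {1, 2}, {3, -1}, {0, 5}, {-2, 4} };
    cpx a[N], b[N];
    dft_inverse(x, a);
    dft_inverse_by_permutation(x, b);
    for (int k = 0; k < N; k++)
        if (a[k].re != b[k].re || a[k].im != b[k].im)
            return 0;
    return 1;
}
```

Since an FFT already permutes its input through a table, composing that table with n -> (N - n) mod N costs nothing at run time, which is presumably why the difference can sit in revtab rather than in the butterfly code.
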
> I tried the tangent FFT, but I couldn't make it faster than split-radix.
> Tangent has asymptotically 5% fewer arithmetic ops, but only 1-2% for
> sizes typical of audio codecs, and even a couple extra shuffles or other
> overhead pushes it over.
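
To put numbers on the comparison above, here is a small sketch that evaluates the commonly published real-operation counts for a complex FFT of size N = 2^m: 4*N*m - 6*N + 8 for split-radix, and the closed form for the tangent (modified split-radix) variant. The formulas are taken from the literature, not from this patch, and the function names are hypothetical.

```c
/* Split-radix real-operation count for N = 2^m, m >= 1: 4*N*m - 6*N + 8. */
static long long split_radix_flops(int m)
{
    long long n = 1LL << m;
    return 4 * n * m - 6 * n + 8;
}

/*
 * Tangent FFT count for N = 2^m, m >= 1:
 *   (34/9)*N*m - (124/27)*N - 2*m - (2/9)*(-1)^m*m + (16/27)*(-1)^m + 8
 * Every term is scaled by 27 so the whole expression stays in integers.
 */
static long long tangent_flops(int m)
{
    long long n = 1LL << m;
    long long s = (m & 1) ? -1 : 1;   /* (-1)^m */
    long long scaled = 102 * n * m - 124 * n - 54 * m
                     - 6 * s * m + 16 * s + 216;
    return scaled / 27;
}

/* Saving of tangent over split-radix, in hundredths of a percent (truncated). */
static long long saving_basis_points(int m)
{
    long long sr = split_radix_flops(m);
    return (sr - tangent_flops(m)) * 10000 / sr;
}
```

For 2^6 this gives 1160 versus 1152 operations, and for 2^8 it gives 6664 versus 6552, so the saving at small sizes is well below the asymptotic figure and leaves little room for extra shuffles.
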
> I tried an in-place fft_permute, but it wasn't any faster than
> out-of-place + memcpy, and quite a bit more complex.
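
For reference, the out-of-place variant mentioned above can be sketched as a scatter through the index table into a scratch buffer followed by one memcpy back. This is a minimal sketch under assumptions: the type and function names are hypothetical, and a plain bit-reversal table is used for illustration, whereas the patch's revtab is specific to its split-radix ordering.

```c
#include <string.h>

typedef struct { float re, im; } cpx_f;

/* Reverse the low `bits` bits of i. */
static unsigned bit_reverse(unsigned i, int bits)
{
    unsigned r = 0;
    for (int b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);
        i >>= 1;
    }
    return r;
}

/* Fill revtab[0 .. 2^bits - 1] with a bit-reversal permutation (illustrative). */
static void build_revtab(unsigned short *revtab, int bits)
{
    for (unsigned i = 0; i < (1u << bits); i++)
        revtab[i] = (unsigned short)bit_reverse(i, bits);
}

/*
 * Out-of-place permute: scatter z into tmp through revtab, then copy back.
 * tmp must hold n elements and must not overlap z.
 */
static void permute_out_of_place(cpx_f *z, cpx_f *tmp,
                                 const unsigned short *revtab, int n)
{
    for (int i = 0; i < n; i++)
        tmp[revtab[i]] = z[i];
    memcpy(z, tmp, (size_t)n * sizeof(*z));
}
```

The loop has no data-dependent branches and no swap bookkeeping, which is consistent with the observation that an in-place version adds complexity without a speed gain.
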
> benchmarks (cycles):
> 2^4 2^5 2^6 2^7 2^8 2^9 2^10 2^11 2^12 2^13 2^14 2^15 2^16 fft size
> --Loren Merritt
How do I reproduce your benchmarks? I wanna test on Turion and Geode.