[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Michael Niedermayer michaelni
Thu Aug 23 04:03:28 CEST 2007


Hi

On Wed, Aug 22, 2007 at 08:31:12PM -0400, Marc Hoffman wrote:
> On 8/21/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> > Hi
> >
> > On Tue, Aug 21, 2007 at 05:02:18PM -0400, Marc Hoffman wrote:
> > > Hi,
> > >
> > > On 8/19/07, Justin Ruggles <justinruggles at bellsouth.net> wrote:
> > > >
> > > >                      256    512   1024   2048    4096
> > > > -------------------------------------------------------
> > > > (1) fft32           46.1   101.0  205.0  494.4  1169.3
> > > > (2) fftr2            7.6    16.0   34.1  101.3   258.5
> > > > (3) fft ffmpeg       7.4    16.4   33.8   82.4   187.7
> > > > (4) fftr4/2          6.7    13.6   28.2   74.0   169.8
> > > >
> > > > (1) fixedpoint 32bit simple RAD2 fft
> > > > (2) simple rad2 first stage rad4
> > > > (3) current lavc RAD2, first 2 stages, and middle pulled
> > > > (4) rad4/2 with last stage rad2 if needed
> > >
> > > I'm adding a split radix to the list.  Its a lot harder to optimize
> > > because of the extra pointer manipulation and control structures.  I
> > > did manage to unroll the W0 multipliers out but getting a clean
> > > iteration space has not happened yet.
> >
> > could you also add the split radix fft from
> > liba52-cvs/a52dec/liba52/imdct.c
> > as comparission
> >
> > also djbfft and fftw2 would be interresting
> > IIRC they all use split radix ffts
> >
> > and looking at them might help you with optimizing your code
> 
> Your right if we do those techniques we would have to implement a
> different optimized FFT for every possible data size we need. Unless
> I'm missing something please advise.

each of these implementations needs exactly 4 lines of code (not counting {}
in liba52)

now all the ffts are n^2 based so if we assume that the largest we want
to support is 1mb then we would need 80 lines of code max

and if you want to support all sizes up to 2^64 thats still just 256 lines
and if you use a macro well its just 64+4 lines then

that said iam not saying this aproach should be taken but if you cant find
a faster and simpler solution then it might be a good choice

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070823/478976b2/attachment.pgp>



More information about the ffmpeg-devel mailing list