[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Marc Hoffman mmhoffm
Sat Jul 28 03:56:34 CEST 2007


Hi,

On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> Hi
>
> On Fri, Jul 27, 2007 at 05:40:14PM -0400, mmh wrote:
> [...]
> > +/*
> > +  This is a fixedpoint inplace 32bit FFT which accepts 3 arguments:
> > +
> > +  @param X   - input signal in format 1.31
> > +  @param W   - phase factors in 1.31 format
> > +  @param lgN - log_2(N) where N is the size of the input data set.
> > +
> > +*/
> > +void ff_fft32 (FFTContext *c, FFTComplex32 *X)
> > +{
> > +    FFTComplex16 *W = c->exptab16;
> > +    int lgN    = c->nbits;
> > +    int N      = 1<<lgN;
> > +    int s, j, k, m, w, ws;
> > +    int64_t wwr,wwi;
> > +
> > +    for (s=1; s<=lgN; s++) {
> > +        m  = 1<<s;
> > +        ws = 1<<(lgN-s);
> > +        w=0;
> > +        for (j=0; j<m/2; j++) {
> > +            wwr=W[w].re;
> > +            wwi=W[w].im;
> > +            w+=ws;
> > +            for (k=j; k<N; k+=m) {
> > +                long tr,ti;
> > +                int k2 = k+m/2;
>
> signed and /2 is slow either changed to unsiged or use >>1
>
Compiler should take care of this via strength reduction and to boot
its a constant in the loop. I think m/2 is ok but will change if you
want me too semantics are identical.

>
> > +
> > +                tr        = (X[k2].re*wwr - X[k2].im*wwi + 0x4000)>>15;
> > +                ti        = (X[k2].re*wwi + X[k2].im*wwr + 0x4000)>>15;
> > +
> > +                X[k2].re  = (X[k].re - tr);
> > +                X[k2].im  = (X[k].im - ti);
> > +
> > +                X[k].re   = (X[k].re + tr);
> > +                X[k].im   = (X[k].im + ti);
> > +            }
> > +        }
> > +    }
> > +}
>
> what about implementing a split radix fft?
> (http://en.wikipedia.org/wiki/Split-radix_FFT)

I'm thinking this is the begining of the whole fixedpoint thing.  My
real goal here is to get some of this working with an audio codec and
see what we really need to optimize.

Benjiman was talking about a 640 point FFT he needs for one of the
standards I'm thinking here we might build something special like a
128pt/5pt FFT but for now I wanted to keep it as simple as possible to
get the infastructure correct.

>
> also several iterations have trivial (1/-1/0) w* these should not be done
> in this generic way
?

>
>
> > +
> > +/*
> > +  use same number of bits of precision as floating point.
> > +*/
> > +#define PRECISION 23
> > +#define HALF (1<<PRECISION)
> > +void ff_fft32x32 (FFTContext *c, FFTComplex32 *X)
> > +{
> > +    FFTComplex32 *W = c->exptab32;
> > +    int lgN = c->nbits;
> > +    int N   = 1<<lgN;
> > +    int s, j, k, m, w, ws;
> > +    int64_t wwr,wwi;
> > +
> > +
> > +    for (s=1; s<=lgN; s++) {
> > +        m  = 1<<s;
> > +        ws = 1<<(lgN-s);
> > +        w=0;
> > +        for (j=0; j<m/2; j++) {
> > +            wwr=W[w].re;
> > +            wwi=W[w].im;
> > +            w+=ws;
> > +            for (k=j; k<N; k+=m) {
> > +                long tr,ti;
> > +                int k2 = k+m/2;
> > +
> > +                tr        = (X[k2].re*wwr - X[k2].im*wwi + HALF)>>PRECISION;
> > +                ti        = (X[k2].re*wwi + X[k2].im*wwr + HALF)>>PRECISION;
>
> what about a
>
> re+=re;
> im+=im;
> tr= ((re*wwr)>>32) - ((im*wwi)>>32);
> ti= ((re*wwi)>>32) + ((im*wwr)>>32);
>
> this should be much faster if a 32*32->high 32 instruction is available
>
yep I can do that no problem. I was really just trying to match the
fft-test thing, think I could easily ajust the precision later when it
goes into use.

>
>
> > +
> > +                X[k2].re  = (X[k].re - tr);
> > +                X[k2].im  = (X[k].im - ti);
> > +
> > +                X[k].re   = (X[k].re + tr);
> > +                X[k].im   = (X[k].im + ti);
> > +            }
> > +        }
> > +    }
> > +}
> > +
> > +
> > +
>
> > +
>
> trailing whitespace
>
k

>
> > +/*
> > +  Generate 16b coefficient table from a predefined floating point table.
> > +*/
>
> not a proper doxygen comment

I probably need to do that for every one of the functions good point thanks.

>
>
> > +static FFTComplex16 *stwids (FFTComplex *w, int n) {
> > +    int i;
> > +    FFTComplex16 *v = av_malloc (sizeof (short)*n);
> > +    for (i = 0; i < n/2; i++) {
> > +        v[i].re = w[i].re*32767;
> > +        v[i].im = w[i].im*32767;
> > +    }
>
> this should be 32768 with proper cliping

ok good point.

>
>
> [...]
> > +  s->fft_calc = (void *)ff_fft32;
>
> unneeded cast or your code is buggy
>
removed.

>
> > +}
>
> inconsistant indention
>
>
> [...]
> > +/* complex multiplication: p = a * b */
> > +#define CMUL(pre, pim, are, aim, bre, bim) \
> > +{\
> > +    int32_t _are = (are);\
> > +    int32_t _aim = (aim);\
> > +    int64_t _bre = (bre);\
> > +    int64_t _bim = (bim);\
> > +    (pre) = (_are * _bre - _aim * _bim + 0x4000)>>15;\
> > +    (pim) = (_are * _bim + _aim * _bre + 0x4000)>>15;\
>
> stuff begining with _ is reserved

I coppied this I will check the imdct code because if your saying this
is wrong then its wrong there as well.

>
>
> [...]
>
> > -
> > +#undef random
> >  float frandom(void)
>
Takis fixed this already.


Thanks for your review.
Marc




More information about the ffmpeg-devel mailing list