[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Mon Jul 30 00:02:19 CEST 2007

On 7/29/07, Trent Piepho <xyzzy at speakeasy.org> wrote:
> On Sun, 29 Jul 2007, Marc Hoffman wrote:
> > On 7/29/07, Trent Piepho <xyzzy at speakeasy.org> wrote:
> > > On Sun, 29 Jul 2007, Loren Merritt wrote:
> > > > On Sat, 28 Jul 2007, Marc Hoffman wrote:
> > > > > On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > > >> On Fri, Jul 27, 2007 at 09:56:34PM -0400, Marc Hoffman wrote:
> > > > >>> On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > > >>>> On Fri, Jul 27, 2007 at 05:40:14PM -0400, mmh wrote:
> > > > >>>> [...]
> > > > >>>>> +/*
> > > > >>>>> +  This is a fixedpoint inplace 32bit FFT which accepts 3 arguments:
> > > > >>>>> +
> > > > >>>>> +  @param X   - input signal in format 1.31
> > > > >>>>> +  @param W   - phase factors in 1.31 format
> > > > >>>>> +  @param lgN - log_2(N) where N is the size of the input data set.
> > > > >>>>> +
> > > > >>>>> +*/
> > > > >>>>> +void ff_fft32 (FFTContext *c, FFTComplex32 *X)
> > > > >>>>> +{
> > > > >>>>> +    FFTComplex16 *W = c->exptab16;
> > > > >>>>> +    int lgN    = c->nbits;
> > > > >>>>> +    int N      = 1<<lgN;
> > > > >>>>> +    int s, j, k, m, w, ws;
> > > > >>>>> +    int64_t wwr,wwi;
> > > > >>>>> +
> > > > >>>>> +    for (s=1; s<=lgN; s++) {
> > > > >>>>> +        m  = 1<<s;
> > > > >>>>> +        ws = 1<<(lgN-s);
> > > > >>>>> +        w=0;
> > > > >>>>> +        for (j=0; j<m/2; j++) {
> > > > >>>>> +            wwr=W[w].re;
> > > > >>>>> +            wwi=W[w].im;
> > > > >>>>> +            w+=ws;
> > > > >>>>> +            for (k=j; k<N; k+=m) {
> > > > >>>>> +                long tr,ti;
> > > > >>>>> +                int k2 = k+m/2;
> > > > >>>>
> > > > >>>> signed and /2 is slow either changed to unsiged or use >>1
> > > > >>>>
> > > > >>> Compiler should take care of this via strength reduction and to boot
> > > > >>> its a constant in the loop. I think m/2 is ok but will change if you
> > > > >>> want me too semantics are identical.
> > > > >>
> > > > >> you put a lot of trust in the compiler, this is not a simple optimization
> > > > >> it has to proof m can not reach values where m>>1 != m/2
> > > > >
> > > > > Come on this is trivial...
> > > >
> > > > Nonetheless, gcc 4.1.2 fails to make this optimization, even in cases much
> > > > more trivial than yours.
> > >
> > > For signed values, it's not a valid optimization, unless gcc is somehow
> > > supposed to know that it can't be negative.  Change it to unsigned and gcc
> > > will use shifting.
> > >
> > > I'd be tempted to try lgN-- before the first loop and change s to start at
> > > 0.  ws will be the same, but m will be 1/2 its previous value.  Now all the
> > > m/2 can be changed to just m, and k+=m becomes k+=m+m (which can be
> > > calculated in a single lea instruction, e.g. "lea (%[k],%[m],2), %[k]").
> > > Rather than worry about gcc turning the division into a shift, just get rid
> > > of the operation entirely.
> >
> > Trent thanks I'm shocked gcc doesn't know the value is never negative
> > it actually knows everything about this particular variable.  But I
> > guess as Loren points out with the output of the compiler the signed
> > arithmetic is inefficient.
>
> In your code, m could be negative, if c->nbits is 31.  Though in Loren's
> example gcc does know the value is never negative, but the strength
> reduction code doesn't seem to take that into account.  If you do:
>
> for(i=0;i<100;i++) if(i>1000) bar();
>
> gcc is smart enough to completely optimize the if statement away.

So much stuff to keep in mind when coding C for performance. Thanks
for helping me understand this in more detail.