[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Mon Aug 27 08:37:14 CEST 2007

On 27 August 2007, Mike Giacomelli wrote:

> > As opposed to recent x86 chips, where 32x32 mul is 9 times slower than
> > add?
>
> Modern x86 chips have pipelined adders and multipliers, so the add and
> multiply rate is the same (at least assuming they have equal numbers
> of each).  I believe Intel has been doing this since the pentium pro
> in the mid 90s, and AMD since the K7 in the late 90s.
>
> > Moreover, at least ARM9E and ARM11 cores execute 32x32->64 MAC in 3 cycles
>
> Which is still 3x slower versus adds than a desktop PC made in the
> last 10-15 years, hence my point about not generalizing from x86 too
> quickly.

Yes, that's a good point about not generalizing benchmark results from one
platform to another.

But you discarded the second part of my statement, about 16x32
multiplications, which offer the best precision while still keeping a
throughput of 1 instruction per cycle. Is there any chance of making
use of them? Or would the precision be too poor for an FFT?
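
To make the question concrete, here is a minimal C sketch of the two kinds
of multiply I am comparing. The function names and the Q15/Q31 coefficient
formats are only assumptions for illustration, they are not taken from any
existing FFmpeg code. The 32x16 variant models the kind of operation that
ARMv5TE's SMULW<y> instructions provide (which shift by 16 rather than 15):
the data stays 32 bits wide and only the coefficient (twiddle factor) is
reduced to 16 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 * Both assume that >> on a negative signed value is an arithmetic shift,
 * which holds on common compilers but is implementation-defined in C.
 */

/* Full precision: 32-bit data times 32-bit Q31 coefficient, 64-bit product. */
static inline int32_t mul_32x32_q31(int32_t x, int32_t coef_q31)
{
    return (int32_t)(((int64_t)x * coef_q31) >> 31);
}

/* Reduced precision: 32-bit data times 16-bit Q15 coefficient, 48-bit product. */
static inline int32_t mul_32x16_q15(int32_t x, int16_t coef_q15)
{
    return (int32_t)(((int64_t)x * coef_q15) >> 15);
}
```

The only precision lost in the second form comes from rounding the
coefficient to 16 bits; the data path itself keeps its full 32 bits, which
is why I would expect it to be more accurate than a plain 16x16 multiply.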

For example, ffmp3 could be accelerated with some (acceptable) loss of
precision:
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2006-October/016567.html

All in all, maybe it really is a good idea to make a list of possible
targets for a fixed-point FFT implementation, and to check what
capabilities they have and which operations are the most efficient on each?