[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Trent Piepho xyzzy
Mon Aug 27 20:43:00 CEST 2007


On Mon, 27 Aug 2007, Loren Merritt wrote:
> On Sun, 26 Aug 2007, Mike Giacomelli wrote:
> > Modern x86 chips have pipelined adders and multipliers, so the add and
> > multiply rate is the same (at least assuming they have equal numbers
> > of each).  I believe Intel has been doing this since the pentium pro
> > in the mid 90s, and AMD since the K7 in the late 90s.
>
> But they don't have equal numbers of each.
> Sorry, I screwed up my throughput test. mul is only 3x slower.
>
> Both K8 and Core2 have:
> add is latency 1, throughput 3.
> 32x32->32 mul is latency 3, throughput 1.
> 64x64->64 mul is latency 4, throughput 1.
> 32x32->64 mul is latency 3, and can't be used pipelined due to its use
> of implicit registers.

Doesn't register renaming mean that operations that use the same registers
can be run at the same time?  Since the registers are in fact allocated from
a much larger pool of virtual registers, the eax and edx in one instruction
are not necessarily the same registers as the eax and edx in another
instruction.
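
To make the question concrete, here is a minimal sketch in C of two
widening multiplies with no data dependency between them.  The function
names are made up for illustration.  It assumes a 32-bit x86 target where
the compiler picks the one-operand mul (result in edx:eax); that depends
on the compiler and flags, so check the generated assembly.  The only
thing the two multiplies share is the register names, which is the kind
of false dependency renaming is meant to remove.

```c
#include <stdint.h>

/* Full 32x32->64 product.  Assumption: on 32-bit x86 this typically
 * compiles to the one-operand MUL, which implicitly reads EAX and
 * writes EDX:EAX.  Other targets or compilers may choose differently. */
static uint64_t mul_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Two products that do not depend on each other's results.  If both
 * are done with MUL, they reuse the names EAX and EDX, but the second
 * multiply never reads a value produced by the first. */
static void two_products(const uint32_t a[2], const uint32_t b[2],
                         uint64_t out[2])
{
    out[0] = mul_wide(a[0], b[0]);
    out[1] = mul_wide(a[1], b[1]);
}
```

This only shows the dependency structure; whether the two multiplies
actually overlap in hardware has to be measured on the CPU in question.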



