[FFmpeg-devel] [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()
lorenm at u.washington.edu
Sun Aug 28 02:37:48 CEST 2011
On Sat, 27 Aug 2011, Vitor Sessak wrote:
> %macro PSHUFD_AVX 3
> shufps %1, %2, %2, %3
This can serve as sse1 too.
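To illustrate why that works, here is a rough Python model of the lane selection done by shufps and pshufd. The function names and the list-of-four-lanes representation are just for illustration and are not part of the patch. It only models which lane goes where. I am also assuming that the three-operand form gets expanded to a register copy followed by a two-operand shufps when AVX is not available.

```python
# Toy lane-level model: index 0 is the lowest 32-bit lane.
# These helpers are hypothetical illustrations, not FFmpeg code.

def shufps(src1, src2, imm):
    """Three-operand shufps: low two lanes come from src1, high two from src2."""
    return [
        src1[imm & 3],
        src1[(imm >> 2) & 3],
        src2[(imm >> 4) & 3],
        src2[(imm >> 6) & 3],
    ]

def pshufd(src, imm):
    """pshufd (SSE2): every output lane is picked from the single source."""
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]

lanes = [10, 20, 30, 40]
# With both sources equal, shufps selects exactly the lanes pshufd would,
# for every possible immediate.
same = all(shufps(lanes, lanes, imm) == pshufd(lanes, imm) for imm in range(256))
print(same)
```

Since shufps is an SSE1 instruction, the same macro body covers CPUs without SSE2.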
>>> %macro SWAP_64BITS 2
>>> %ifdef ARCH_X86_64
>>> SWAP %1, %2
>> What good is this doing? There's no %else, so the code must also work
>> (with no extra instructions) if you don't swap...?
> I was hoping that swapping the temp variable in code like
> mova m5, m0
> addps m5, m1
> mulps m2, m5
> SWAP_64BITS m5, m10
> mova m5, m3
> addps m5, m6
> mulps m7, m5
> would allow an x86_64 CPU to use out-of-order execution to interleave
> the two blocks of instructions in any order.
Unnecessary. Every x86 CPU that supports out-of-order execution also
supports register renaming, so reusing m5 creates no false dependency
between the two blocks.
Equivalently, the x86 pipeline effectively works in static single
assignment form: the output value of every instruction remains available
even if some later instruction overwrites the same register name.
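A toy Python sketch of the idea, applied to the six instructions quoted above (without the SWAP). This is only an illustration under simplifying assumptions: the p0, p1, ... physical register names, the (op, dst, srcs) tuple encoding and both function names are made up, and real hardware has a finite register file and many more details.

```python
from itertools import count

# (op, destination, sources) in two-operand form: "addps m5, m1" reads m5 and m1.
PROGRAM = [
    ("mova",  "m5", ("m0",)),
    ("addps", "m5", ("m5", "m1")),
    ("mulps", "m2", ("m2", "m5")),
    ("mova",  "m5", ("m3",)),        # m5 reused as a temporary
    ("addps", "m5", ("m5", "m6")),
    ("mulps", "m7", ("m7", "m5")),
]

def rename(program):
    """Give every write a fresh physical register, as in SSA form."""
    fresh = count()
    current = {}   # architectural name -> physical register with its latest value
    renamed = []
    for op, dst, srcs in program:
        phys_srcs = []
        for s in srcs:
            if s not in current:               # value live on entry
                current[s] = f"p{next(fresh)}"
            phys_srcs.append(current[s])
        current[dst] = f"p{next(fresh)}"       # a write never reuses a register
        renamed.append((op, current[dst], tuple(phys_srcs)))
    return renamed

def dependencies(renamed):
    """For each instruction, the indices of earlier instructions it must wait for."""
    writer = {}
    deps = []
    for i, (_, dst, srcs) in enumerate(renamed):
        deps.append({writer[s] for s in srcs if s in writer})
        writer[dst] = i
    return deps

for i, d in enumerate(dependencies(rename(PROGRAM))):
    print(i, PROGRAM[i][0], sorted(d))
```

After renaming, the second block (instructions 3 to 5) depends only on itself, so nothing stops it from being scheduled alongside the first block even though both use m5.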