[FFmpeg-devel] [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()
vitor1001 at gmail.com
Sun Aug 28 10:46:59 CEST 2011
On Sun, Aug 28, 2011 at 2:37 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
> On Sat, 27 Aug 2011, Vitor Sessak wrote:
>> %macro PSHUFD_AVX 3
>> shufps %1, %2, %2, %3
> This can serve as sse1 too.
>>>> %macro SWAP_64BITS 2
>>>> %ifdef ARCH_X86_64
>>>> SWAP %1, %2
>>> What good is this doing? There's no %else, so the code must also work
>>> (with no extra instructions) if you don't swap...?
>> I was hoping that swapping the temp variable in code like
>> mova m5, m0
>> addps m5, m1
>> mulps m2, m5
>> SWAP_64BITS m5, m10
>> mova m5, m3
>> addps m5, m6
>> mulps m7, m5
>> would allow an x86_64 CPU to use out-of-order execution to interleave
>> the two blocks of instructions in any order.
> Unnecessary. Every x86 cpu that supports out of order execution also
> supports register renaming.
> Equivalently, the x86 pipeline really uses static-single-assignment, with
> the output value of every instruction remaining available even if some
> later instruction overwrites the same variable name.
Ok, removed it.
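
For reference, the renaming argument can be sketched with a toy model. This is only an illustration under simplifying assumptions, not a description of any real CPU: the rename() and depends_on() helpers and the pN physical register names are hypothetical, and only true (read-after-write) dependencies are tracked. It runs the two blocks quoted above, without SWAP_64BITS, through a renamer that gives every write a fresh physical register, then checks that nothing in the second block reads a value produced by the first.

```python
# Toy model of register renaming (illustrative only; real hardware differs).
# Each instruction is (opcode, dest, sources). Two-operand x86 ops such as
# "addps m5, m1" both read and write their first operand.

def rename(instructions):
    """Map every write to a fresh physical register, SSA style."""
    current = {}   # architectural name -> physical register holding its value
    counter = 0
    renamed = []
    for op, dest, srcs in instructions:
        phys_srcs = []
        for s in srcs:
            if s not in current:
                # Live-in value computed before this snippet.
                current[s] = f"p{counter}"
                counter += 1
            phys_srcs.append(current[s])
        # Every write allocates a new physical register, so reusing the
        # name m5 later does not disturb the earlier value.
        current[dest] = f"p{counter}"
        counter += 1
        renamed.append((op, current[dest], tuple(phys_srcs)))
    return renamed

def depends_on(later, earlier):
    """True if `later` reads the value written by `earlier`."""
    return earlier[1] in later[2]

# The two blocks from the thread, both using m5 as the temporary.
block1 = [("mova",  "m5", ("m0",)),
          ("addps", "m5", ("m5", "m1")),
          ("mulps", "m2", ("m2", "m5"))]
block2 = [("mova",  "m5", ("m3",)),
          ("addps", "m5", ("m5", "m6")),
          ("mulps", "m7", ("m7", "m5"))]

renamed = rename(block1 + block2)
first, second = renamed[:3], renamed[3:]

# No instruction in the second block reads a result of the first block.
independent = not any(depends_on(b, a) for a in first for b in second)
print(independent)
```

In this model the two uses of m5 end up in different physical registers, so the blocks share no dependency and an out-of-order core is free to interleave them, which is why the explicit swap to m10 buys nothing.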