[FFmpeg-devel] [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()
michaelni at gmx.at
Wed Aug 31 04:06:07 CEST 2011
On Sun, Aug 28, 2011 at 10:46:59AM +0200, Vitor Sessak wrote:
> On Sun, Aug 28, 2011 at 2:37 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
> > On Sat, 27 Aug 2011, Vitor Sessak wrote:
> >> %macro PSHUFD_AVX 3
> >>     shufps %1, %2, %2, %3
> >> %endmacro
> > This can serve as sse1 too.
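The point about SSE1 can be illustrated with a small sketch in C intrinsics rather than in the patch's assembly. It assumes an x86 target with SSE enabled; the names PSHUFD_VIA_SHUFPS and reverse_lanes are hypothetical and do not come from the patch. Passing the same register as both shufps sources gives a full four-lane permute of that register, which is what pshufd (an SSE2 instruction) would otherwise be used for:

```c
#include <xmmintrin.h>  /* SSE1 only: _mm_loadu_ps, _mm_shuffle_ps, _mm_storeu_ps */

/* Hypothetical helper (not from the patch): emulate "pshufd dst, src, imm"
 * with SSE1 shufps by using src as both inputs.
 * imm must be a compile-time constant. */
#define PSHUFD_VIA_SHUFPS(src, imm) _mm_shuffle_ps((src), (src), (imm))

/* Reverse the four floats of in[] into out[].
 * imm 0x1B = 00 01 10 11b selects source lanes 3, 2, 1, 0
 * for destination lanes 0, 1, 2, 3. */
static void reverse_lanes(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, PSHUFD_VIA_SHUFPS(v, 0x1B));
}
```

Calling reverse_lanes on {1, 2, 3, 4} should yield {4, 3, 2, 1}. Only xmmintrin.h is needed, so nothing here requires SSE2.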
> >>>> %macro SWAP_64BITS 2
> >>>> %ifdef ARCH_X86_64
> >>>>     SWAP %1, %2
> >>>> %endif
> >>>> %endmacro
> >>> What good is this doing? There's no %else, so the code must also work
> >>> (with no extra instructions) if you don't swap...?
> >> I was hoping that swapping the temp variable in code like
> >> mova m5, m0
> >> addps m5, m1
> >> mulps m2, m5
> >> SWAP_64BITS m5, m10
> >> mova m5, m3
> >> addps m5, m6
> >> mulps m7, m5
> >> would allow an x86_64 CPU to use out-of-order execution to interleave
> >> the two blocks of instructions in any order.
> > Unnecessary. Every x86 cpu that supports out of order execution also
> > supports register renaming.
> > Equivalently, the x86 pipeline really uses static-single-assignment, with
> > the output value of every instruction remaining available even if some
> > later instruction overwrites the same variable name.
> Ok, removed it.
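Loren's argument can be sketched with a toy model in C. This is an illustration under simplifying assumptions, not a description of any real CPU's rename hardware, and every identifier in it (struct insn, rename_program, cross_block_deps) is hypothetical. Each result gets a fresh name (here, the index of the producing instruction), so reusing m5 as the temporary in the second block creates no dependency on the first block:

```c
enum { NUM_ARCH_REGS = 16, NUM_INSNS = 6, BLOCK2_START = 3, NO_PRODUCER = -1 };

/* Two-operand x86 form. reads_dst is 0 for mova (dst is purely overwritten)
 * and 1 for addps/mulps (dst = dst op src). */
struct insn {
    const char *name;
    int dst, src, reads_dst;
};

/* The two blocks from the quoted example, with m5 reused as the temporary. */
static const struct insn prog[NUM_INSNS] = {
    { "mova",  5, 0, 0 },  /* block 1 */
    { "addps", 5, 1, 1 },
    { "mulps", 2, 5, 1 },
    { "mova",  5, 3, 0 },  /* block 2 */
    { "addps", 5, 6, 1 },
    { "mulps", 7, 5, 1 },
};

/* Toy renamer: producer[r] is the instruction whose result the architectural
 * name r currently refers to. Every write installs a fresh value name, which
 * is the static-single-assignment view of the pipeline. */
static void rename_program(int dep_src[NUM_INSNS], int dep_dst[NUM_INSNS])
{
    int producer[NUM_ARCH_REGS];
    for (int r = 0; r < NUM_ARCH_REGS; r++)
        producer[r] = NO_PRODUCER;  /* value was live before the snippet */

    for (int i = 0; i < NUM_INSNS; i++) {
        dep_src[i] = producer[prog[i].src];
        dep_dst[i] = prog[i].reads_dst ? producer[prog[i].dst] : NO_PRODUCER;
        producer[prog[i].dst] = i;  /* old value stays available to earlier readers */
    }
}

/* Count inputs of block 2 that were produced by block 1. */
static int cross_block_deps(void)
{
    int dep_src[NUM_INSNS], dep_dst[NUM_INSNS], count = 0;
    rename_program(dep_src, dep_dst);
    for (int i = BLOCK2_START; i < NUM_INSNS; i++) {
        if (dep_src[i] != NO_PRODUCER && dep_src[i] < BLOCK2_START) count++;
        if (dep_dst[i] != NO_PRODUCER && dep_dst[i] < BLOCK2_START) count++;
    }
    return count;
}
```

In this model cross_block_deps() returns 0: after renaming, only true read-after-write dependencies remain, and none of them cross from the first block into the second. Swapping m5 for m10 would therefore change nothing for the scheduler.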
> libavcodec/x86/Makefile | 1
> libavcodec/x86/imdct36_sse.asm | 363 ++++++++++++++++++++++++++++++++++++++
> libavcodec/x86/mpegaudiodec_mmx.c | 12 +
> libavutil/x86/x86inc.asm | 2
> 4 files changed, 378 insertions(+)
> 969de5b59e5dfba7cfda2b080e41b72c478982d7 0002-mpegaudiodec-add-SSE-optimized-imdct36.patch
> From 0d7fb2081b572e89521e480407c86d6768f23eb8 Mon Sep 17 00:00:00 2001
> From: Vitor Sessak <vitor1001 at gmail.com>
> Date: Mon, 22 Aug 2011 07:59:46 +0200
> Subject: [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()
patch LGTM, feel free to push it to ffmpeg git
further improvements very welcome too!
and thanks a lot for the work
and thanks to loren for the review
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesn't understand is a sign that one is less smart
than the original author, trying to rewrite it will not make it better.