[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Pascal Massimino pascal.massimino
Sun Jul 8 17:58:54 CEST 2007


  Hi everybody,

 may I recall some earlier remarks about the compliance of the code you're
presently optimizing?

http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2006-November/019051.html

skal

On 7/8/07, Zuxy Meng <zuxy.meng at gmail.com> wrote:
>
> Hi,
>
> 2007/7/8, Zuxy Meng <zuxy.meng at gmail.com>:
> > Hi,
> >
> > 2007/7/7, Christophe GISQUET <christophe.gisquet at free.fr>:
> > > Hello,
> > >
> > > here are the MMX functions now licensed under the MIT license.
> > >
> > > Zuxy Meng has been working on SSE2 versions of those; I'm not sure if he
> > > would agree to contribute to this file under the MIT license. In that case,
> > > I don't mind the license being changed, but I would prefer having the
> > > MIT-licensed version available in the svn history.
> >
> > I care less about license issues than raw performance :-)
> >
> > I did a quick test on 64-bit K8 tonight thanks to Stephan's testbed.
> > The result wasn't promising. In short, from fastest to slowest:
> > MMX > SSE2 w/o sw pipelining > SSE2 w/ sw pipelining
> >
> > The reason may be that on K8 SSE2 is throughput bound (K8 can decode 3
> > MMX instructions per cycle, but only 1.5 SSE2 ones), and sw pipelining
> > increases the # of instructions per loop. If AMD does what they've
> > promised on their upcoming K10, I guess the result will be:
> > SSE2 w/o sw pipelining > SSE2 w/ sw pipelining > MMX
> >
> > And IIRC on your 32-bit Conroe, where SSE2 is latency bound (punpcklbw
> > and unaligned movq are slow), the list is somewhat different:
> > SSE2 w/ sw pipelining > MMX > SSE2 w/o sw pipelining
> >
> > On my Dothan:
> > MMX > SSE2 w/ sw pipelining > SSE2 w/o sw pipelining
> >
> > So the conclusion is that I can't draw a conclusion. Any suggestions?
>
> I just tried unrolling the loop so that the # of instructions per loop
> stays the same after sw pipelining, and the speed improves a little
> bit:
>
> Now SSE2 is about the same speed as MMX (+- 0.5%) both on my Dothan
> and Stephan's 64-bit K8.
>
> The attached patch isn't against Christophe's newest version and may look
> ugly, but it serves as a base for further improvement.
> --
> Zuxy
> Beauty is truth,
> While truth is beauty.
> PGP KeyID: E8555ED6
>
>
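A scalar C sketch of the loop transformation discussed in the quoted thread, under the assumption that the extra instructions introduced by sw pipelining are register copies, and that unrolling removes them by rotating register roles. The kernel (a rounded two-tap average, loosely in the spirit of half-pel interpolation) and the names avg2, avg_naive, avg_pipelined and avg_pipelined_unrolled are hypothetical; this is not the VC-1 MMX/SSE2 code from the patch. A C compiler will reschedule scalar code on its own, so this only illustrates the loop structure, not the measured speed differences.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded average of two bytes (hypothetical example kernel). */
static inline uint8_t avg2(unsigned x, unsigned y)
{
    return (uint8_t)((x + y + 1) >> 1);
}

/* Reference loop. src must hold n + 1 bytes. */
void avg_naive(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = avg2(src[i], src[i + 1]);
}

/* Software-pipelined: the load for iteration i + 1 is issued before the
 * work for iteration i, at the cost of two register copies per output.
 * src must hold n + 1 bytes. */
void avg_pipelined(uint8_t *dst, const uint8_t *src, size_t n)
{
    if (n == 0)
        return;
    unsigned a = src[0], b = src[1];      /* prologue: loads for i = 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        unsigned next = src[i + 2];       /* early load for i + 1 */
        dst[i] = avg2(a, b);              /* work for i */
        a = b;                            /* extra copies added by pipelining */
        b = next;
    }
    dst[n - 1] = avg2(a, b);              /* epilogue */
}

/* Pipelined and unrolled by three: a, b and c rotate roles, so the
 * copies disappear. Loop invariant: a == src[i], b == src[i + 1].
 * src must hold n + 1 bytes. */
void avg_pipelined_unrolled(uint8_t *dst, const uint8_t *src, size_t n)
{
    if (n == 0)
        return;
    size_t i = 0;
    unsigned a = src[0], b = src[1];
    for (; i + 3 < n; i += 3) {           /* i + 4 <= n keeps loads in bounds */
        unsigned c = src[i + 2];
        dst[i]     = avg2(a, b);
        a = src[i + 3];
        dst[i + 1] = avg2(b, c);
        b = src[i + 4];
        dst[i + 2] = avg2(c, a);
    }
    for (; i < n; i++)                    /* scalar tail */
        dst[i] = avg2(src[i], src[i + 1]);
}
```

Per output, the unrolled variant issues the same number of loads as the pipelined one but no copies, which is one way to keep the instruction count per loop unchanged after pipelining.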



