[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Zuxy Meng zuxy.meng
Mon Jul 9 05:09:04 CEST 2007


Hi,

2007/7/9, Guillaume POIRIER <poirierg at gmail.com>:
> Hi,
>
> On 7/8/07, Zuxy Meng <zuxy.meng at gmail.com> wrote:
> > 2007/7/8, Christophe GISQUET <christophe.gisquet at free.fr>:
> > > Zuxy Meng wrote:
> > > > I did a quick test on 64-bit K8 tonight thanks to Stephan's testbed.
> > >
> > > And myself on a x86-64 core2 system.
> > >
> > > > The result wasn't promising. In short, from fastest to slowest:
> > > > MMX > SSE2 w/o sw pipelining > SSE2 w/ sw pipelining
> > >
> > > I haven't tested the mid-performer, but I can confirm this. Using
> > > START/STOP_TIMER, the figures are (on a 1080p sequence): ~2800
> > > dezicycles for MMX, ~3800 for SSE2.
> >
> > I doubt if there's anything wrong. IIRC 32-bit SSE2 (w/ sw pipelining)
> > is faster than MMX on your Conroe. How can it be more than 25% slower
> > under 64-bit?
>
> That's indeed surprising. The only difference that I know of on Conroe
> in 64-bit mode is that less micro-op fusion takes place, if at all.

%s/micro/macro IIRC.

> That shouldn't be the cause for that huge slowdown however IMHO.

Indeed. This piece of code doesn't benefit from macro-fusion in the
first place: fusion requires the first instruction to be CMP or TEST
and the second to be a conditional jump like JA(E)/JB(E).
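
To make the rule concrete, here is a rough sketch in C of the pairing
check as I stated it above. The helper names (in_set, is_macro_fusible)
are made up for illustration and are not part of FFmpeg, and the check
encodes only the simplified rule from this mail; the real conditions
depend on the microarchitecture and should be taken from the vendor's
optimization manual.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `mnemonic` matches any entry in `set`. */
static bool in_set(const char *mnemonic, const char *const *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(mnemonic, set[i]) == 0)
            return true;
    return false;
}

/*
 * Hypothetical check encoding only the simplified rule from this mail:
 * the first instruction must be CMP or TEST, the second JA(E)/JB(E).
 * Mnemonics are expected in lower case. Real hardware rules are more
 * detailed and are an assumption left out of this sketch.
 */
static bool is_macro_fusible(const char *first, const char *second)
{
    static const char *const firsts[]  = { "cmp", "test" };
    static const char *const seconds[] = { "ja", "jae", "jb", "jbe" };

    return in_set(first, firsts, sizeof(firsts) / sizeof(firsts[0])) &&
           in_set(second, seconds, sizeof(seconds) / sizeof(seconds[0]));
}
```

Under this rule, is_macro_fusible("cmp", "jb") is true, while a loop
tail that ends in something like ADD followed by a signed jump, e.g.
is_macro_fusible("add", "jl"), is false, so a loop of that shape would
have nothing to lose from missing macro-fusion.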
-- 
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6



