[Ffmpeg-devel] [REQUEST] MMX/MMX2 and SSE optimizations for H.264 decoding

Michael Niedermayer michaelni
Fri Sep 16 17:42:27 CEST 2005


On Fri, Sep 16, 2005 at 02:20:42PM +0200, Martin Boehme wrote:
> Loren Merritt wrote:
> >On Thu, 15 Sep 2005, Martin Boehme wrote:
> >
> >>Gamester17 wrote:
> >>
> >>>Yes, there already are some MMX integer optimizations for H264, but what 
> >>>about SSE (Streaming SIMD Extensions) optimizations? Isn't SSE 
> >>>supposed to be much more powerful than MMX (and in fact be the thing 
> >>>that replaces MMX)?
> >>
> >>
> >>Well, for a start, SSE has registers that are 128 bits wide, while 
> >>MMX's registers are 64 bits. As long as you're operating only on the 
> >>registers (i.e. you're CPU-bound, not memory bandwidth limited) that's 
> >>an instant factor of 2 speedup.
> >
> >On AMD, most SSE2 instructions take exactly twice as long as the 
> >equivalent MMX instruction. Any speedups are due only to scheduling.
> >In x264, we have a bunch of SSE2 functions, but most of them are 
> >_slower_ than the MMX versions on AMD.
> Interesting -- I wasn't aware of that. I would assume that the AMD 
> processors only have enough execution units for 64 bits' worth of data 
> and have to do SSE operations in two passes?

Dunno, but AFAIK the P4 (at least the older ones) has 2 MMX units running at
half the CPU clock speed, so it can execute either 1 MMX instruction per clock
or 1 SSE(2) instruction every 2 clocks, with a very small number of exceptions.
Note further that execution itself isn't the only thing that can be a
bottleneck ...
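
A back-of-the-envelope sketch of that arithmetic, in C. The function name
bytes_per_clock() and its parameters are made up for illustration, and the
throughput figures are just the ones quoted in this thread, not measurements:

```c
/* Rough throughput model: bytes of packed data processed per clock cycle.
 * reg_bits - register width in bits (64 for MMX, 128 for SSE/SSE2)
 * insns    - instructions completed ...
 * clocks   - ... in this many clock cycles
 * Hypothetical helper for illustration only; it ignores latency, decoding,
 * scheduling and memory bandwidth, any of which can be the real bottleneck. */
static double bytes_per_clock(unsigned reg_bits, unsigned insns, unsigned clocks)
{
    return (reg_bits / 8.0) * insns / clocks;
}
```

With the numbers above, bytes_per_clock(64, 1, 1) for MMX and
bytes_per_clock(128, 1, 2) for SSE2 both come out at 8 bytes per clock, so the
wider registers alone buy nothing on such a core. The naive "factor of 2"
would need bytes_per_clock(128, 1, 1), i.e. 16 bytes per clock.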

