[Ffmpeg-devel] [REQUEST] MMX/MMX2 and SSE optimizations for H.264 decoding
Fri Sep 16 17:42:27 CEST 2005
On Fri, Sep 16, 2005 at 02:20:42PM +0200, Martin Boehme wrote:
> Loren Merritt wrote:
> >On Thu, 15 Sep 2005, Martin Boehme wrote:
> >>Gamester17 wrote:
> >>>Yes there already are some MMX integer optimization for H264 but what
> >>>about SSE (Streaming SIMD Extensions) optimizations? Isn't SSE
> >>>supposed to be much more powerful than MMX (and in fact be the thing
> >>>that replaces MMX)?
> >>Well, for a start, SSE has registers that are 128 bits wide, while
> >>MMX's registers are 64 bits. As long as you're operating only on the
> >>registers (i.e. you're CPU-bound, not memory bandwidth limited) that's
> >>an instant factor of 2 speedup.
> >On AMD, most SSE2 instructions take exactly twice as long as the
> >equivalent MMX instruction. Any speedups are due only to scheduling.
> >In x264, we have a bunch of SSE2 functions, but most of them are
> >_slower_ than the MMX versions on AMD.
> Interesting -- wasn't aware of that. I would assume that the AMD
> processors only have enough execution units for 64 bits worth of data
> and have to do SSE operations in two gos?
AFAIK the P4 (at least the older ones) have 2 MMX units running at half the
cpu clock speed so they can execute either 1 MMX instruction per clock or
1 SSE(2) every 2 clocks, with a very small number of exceptions.
Further note that execution itself isn't the only thing which can be a