[Ffmpeg-devel] MMX/MMX2 and SSE optimizations for H.264 decoding
Mon Sep 19 17:29:10 CEST 2005
On Mon, 19 Sep 2005, Martin Boehme wrote:
> Michael Niedermayer wrote:
>> On Fri, Sep 16, 2005 at 02:20:42PM +0200, Martin Boehme wrote:
>>> Loren Merritt wrote:
>>>> On AMD, most SSE2 instructions take exactly twice as long as the
>>>> equivalent MMX instruction. Any speedups are due only to scheduling.
>>>> In x264, we have a bunch of SSE2 functions, but most of them are _slower_
>>>> than the MMX versions on AMD.
>>> Interesting -- wasn't aware of that. I would assume that the AMD
>>> processors only have enough execution units for 64 bits worth of data and
>>> have to do SSE operations in two gos?
>> dunno but
>> AFAIK the P4 (at least the older ones) have 2 MMX units running at half the
>> cpu clock speed so they can execute either 1 MMX instruction per clock or
>> 1 SSE(2) every 2 clocks, with a very small number of exceptions
>> further note that execution itself isn't the only thing which can be a
>> bottleneck ...
> Interesting, wasn't aware of that... it's probably chip space considerations
> that play into that, given that there aren't any dependencies
> between the individual "elements" of the vector units?
Isn't that the obvious way to do it? If you have the hardware to execute
1 SSE2 instruction at a time, why shouldn't it be able to do 2 MMX?
(assuming no other bottlenecks, of course)
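
To make the arithmetic in this thread concrete, here is a minimal sketch in portable C (no intrinsics). It only models the idea and does not describe how any real CPU is built. The names `add16x4`, `vec128`, `add16x8_two_passes` and `bytes_per_clock` are made up for illustration. It shows two things: a packed 16-bit add over 128 bits can be done as two independent 64-bit passes, because no lane depends on another, and "1 MMX instruction per clock" moves the same number of bytes per clock as "1 SSE2 instruction every 2 clocks".

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit wide unit: four independent,
 * wrapping 16-bit adds packed into one 64-bit word. */
static uint64_t add16x4(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        uint16_t s = (uint16_t)(x + y);      /* wraps modulo 2^16 */
        r |= (uint64_t)s << (16 * i);
    }
    return r;
}

/* A 128-bit vector modelled as two 64-bit halves (illustrative type). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} vec128;

/* Eight 16-bit adds done as two passes through the 64-bit unit.
 * The halves never exchange data, so the order of the passes is
 * irrelevant and they could just as well run side by side. */
static vec128 add16x8_two_passes(vec128 a, vec128 b)
{
    vec128 r;
    r.lo = add16x4(a.lo, b.lo);   /* pass 1: lanes 0..3 */
    r.hi = add16x4(a.hi, b.hi);   /* pass 2: lanes 4..7 */
    return r;
}

/* Data processed per clock for a given vector width and issue rate.
 * The rates passed in are assumptions taken from the thread, not
 * measured values. */
static double bytes_per_clock(int vector_bytes, double instructions_per_clock)
{
    return vector_bytes * instructions_per_clock;
}
```

With the figures quoted above, `bytes_per_clock(8, 1.0)` and `bytes_per_clock(16, 0.5)` both come out at 8, so under those assumptions the wider instruction gains nothing in raw execution throughput, and any difference would have to come from decoding, scheduling or other bottlenecks.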