[FFmpeg-devel] [augustus at linuxhardware.org: SSE4 and FFMPEG]

Loren Merritt lorenm
Tue Oct 30 22:25:57 CET 2007

On Tue, 30 Oct 2007, Zuxy Meng wrote:
> To be serious, only one instruction MPSADBW looks like something
> gorgeous that may boost motion compensation, most others like what
> Loren has said are simple combinations of two or three existing
> instructions to help feed the execution engine faster.

Actually, while I can't be sure until I have a CPU to try it on, I would 
guess that MPSADBW is one of the less useful instructions.
It computes 32 pixel differences in total, as opposed to the 16 in an SSE2 
PSADBW. But those 32 are split into 8 SADs, one for each of 8 consecutive 
mvs of a 4 pixel block, and most motion estimation functions don't really 
want all 8. Exhaustive search 
does, but exhaustive search can be better optimized in other ways that 
result in sparse SADs and thus again little or no benefit from MPSADBW. 
So the only remaining use is to try a new SSE4-only motion estimation 
algorithm that uses an 8x3 window around the previous mv, and see how 
that compares to diamond or hexagon.
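
For reference, here is a rough scalar sketch in C of what a single MPSADBW 
produces, assuming both block offsets in the immediate are 0. The function 
name mpsadbw_model and its argument layout are made up for illustration and 
are not FFmpeg code; the real instruction works on XMM registers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of one MPSADBW with both block offsets = 0:
 *   sad[i] = sum over k = 0..3 of |cur[k] - ref[i + k]|,  i = 0..7
 * It reads 4 bytes from cur (one row of a 4 pixel block) and 11 bytes
 * from ref, i.e. 8 consecutive horizontal positions of that row. */
static void mpsadbw_model(const uint8_t *cur, const uint8_t *ref,
                          uint16_t sad[8])
{
    assert(cur && ref && sad);
    for (int i = 0; i < 8; i++) {
        unsigned sum = 0;
        for (int k = 0; k < 4; k++)
            sum += abs(cur[k] - ref[i + k]);
        sad[i] = (uint16_t)sum;  /* at most 4 * 255, fits in 16 bits */
    }
}
```

That is 8 * 4 = 32 absolute differences per instruction, but all 8 results 
belong to adjacent horizontal positions of the same 4 pixel row, which is 
why a search pattern that only probes a few scattered mvs can't use most 
of them.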

>> Furthermore, FFmpeg contains many functions that don't even have SSE2
>> versions. MMX2->SSE2 should make more difference than SSE2->SSE4.
> What about the 8 additional XMM registers under AMD64?

Yes, we could also gain a little speed by writing separate amd64 versions 
of some functions, rather than sharing code with x86.

--Loren Merritt
