[FFmpeg-devel] [augustus at linuxhardware.org: SSE4 and FFMPEG]
Tue Oct 30 22:25:57 CET 2007
On Tue, 30 Oct 2007, Zuxy Meng wrote:
> To be serious, only one instruction MPSADBW looks like something
> gorgeous that may boost motion compensation, most others like what
> Loren has said are simple combinations of two or three existing
> instructions to help feed the execution engine faster.
Actually, while I can't be sure until I have a CPU to try it on, I would
guess that MPSADBW is one of the less useful instructions.
It compares 32 pixels total, as opposed to the 16 in an SSE2 PSADBW. But
those 32 are split into 8 SADs, one for each of 8 horizontally consecutive
mvs of a single 4-pixel block, and most motion estimation functions don't
really want all 8. Exhaustive search does, but exhaustive search can be
better optimized in other ways that result in sparse SADs and thus again
little or no benefit from MPSADBW.
So the only remaining use is to try a new SSE4-only motion estimation
algorithm that uses an 8x3 window around the previous mv, and see how
that compares to diamond or hexagon.
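
To make the 16 vs. 32 comparison concrete, here is a rough scalar model of
what the two instructions compute. This is only a sketch: the function names
psadbw_model and mpsadbw_model are made up for illustration (they are not
FFmpeg functions), and the MPSADBW model assumes both offsets selected by the
immediate are 0.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of a 128-bit PSADBW: two independent SADs, each over
 * 8 bytes, so 16 absolute differences in total. */
static void psadbw_model(const uint8_t a[16], const uint8_t b[16],
                         uint16_t out[2])
{
    for (int half = 0; half < 2; half++) {
        unsigned sum = 0;
        for (int i = 0; i < 8; i++)
            sum += abs(a[8 * half + i] - b[8 * half + i]);
        out[half] = (uint16_t)sum;
    }
}

/* Scalar model of MPSADBW, assuming both immediate offsets are 0:
 * one 4-byte block is compared against 8 consecutive positions in an
 * 11-byte window, giving 8 SADs of 4 bytes each, 32 absolute
 * differences in total. */
static void mpsadbw_model(const uint8_t window[11], const uint8_t block[4],
                          uint16_t out[8])
{
    for (int k = 0; k < 8; k++) {        /* k = horizontal offset (mv) */
        unsigned sum = 0;
        for (int i = 0; i < 4; i++)
            sum += abs(window[k + i] - block[i]);
        out[k] = (uint16_t)sum;
    }
}
```

A diamond or hexagon search only looks at a few of those 8 positions per
step, so the rest of the results would simply be thrown away.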
>> Furthermore, FFmpeg contains many functions that don't even have SSE2
>> versions. MMX2->SSE2 should make more difference than SSE2->SSE4.
> What about the 8 additional XMM registers under AMD64?
Yes, we could also gain a little speed by writing separate amd64 versions
of some functions, rather than sharing code with x86.