[Ffmpeg-devel] MMX optimization for get_amv() in libavcodec/h263.c

Andrew Savchenko Bircoph
Thu Apr 19 15:24:01 CEST 2007


I optimized one FIXME in get_amv() in h263.c.
Unfortunately, I could not find or create any video material where
this function is used during decoding, so synthetic tests were used.
If someone can give me a link to such a video, or tell me how to
create one, that would be great.

Changes that were made for the synthetic test benchmarks are in 

The first patch (h263_mmx_16bit.diff) uses 16 bits for the sum
"variables", so operations such as shifts and additions can be
performed on 4 values by a single instruction. But I'm afraid that
in real decoding the sum may overflow, so I made a second patch
(h263_mmx_32bit.diff) to eliminate this problem. Obviously it is
slower, because MMX instructions can process only two 32-bit values
at a time.
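
To illustrate the overflow concern, here is a minimal sketch in
portable C. It is not the code from either patch: wrap16(),
sum_lane16() and sum_lane32() are hypothetical helpers that only
imitate what one lane of a non-saturating packed 16-bit add does
compared with a 32-bit accumulator.

```c
#include <stdint.h>

/* Reduce a value modulo 2^16 and reinterpret it as signed, which is
 * what a non-saturating packed 16-bit add does to each lane. */
static int16_t wrap16(int32_t v)
{
    uint32_t u = (uint32_t)v & 0xFFFFu;
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Hypothetical helper: accumulate n terms in a single 16-bit lane. */
static int16_t sum_lane16(const int16_t *v, int n)
{
    int16_t sum = 0;
    for (int i = 0; i < n; i++)
        sum = wrap16((int32_t)sum + v[i]);
    return sum;
}

/* Hypothetical helper: accumulate the same terms in 32 bits. */
static int32_t sum_lane32(const int16_t *v, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

For example, two terms of 20000 add up to 40000 in the 32-bit
accumulator, but wrap around to -25536 in the 16-bit lane.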

Testing was done on an Athlon XP. The inner loop in the 1st patch is
fully unrolled, because this gives the best performance compared to
the untouched and partially unrolled loops (probably due to better
pipeline utilization).
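
As a rough illustration of what unrolling means here, the following
sketch shows a rolled summation loop and the same loop unrolled by
four with independent accumulators. This is plain C, not the MMX code
from the patches; sum_rolled() and sum_unrolled4() are hypothetical
names, and whether unrolling helps on a given CPU has to be measured.

```c
#include <stdint.h>

/* Rolled version: one addition per loop iteration. */
static int32_t sum_rolled(const int16_t *v, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Unrolled by four: fewer branches, and the four accumulators do not
 * depend on each other, so the additions can overlap in the pipeline. */
static int32_t sum_unrolled4(const int16_t *v, int n)
{
    int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    /* Handle the remaining 0..3 elements. */
    for (; i < n; i++)
        s0 += v[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result for any input; only the loop
structure differs.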

In the 2nd patch the inner loop is unrolled only partially; further
unrolling brings no additional performance beyond measurement
error. Also, %%eax was used for multiplication, because MMX can
multiply only 16-bit values and can't unpack a *signed* value from
word to double word.

Here is a summary of the benchmark results (oprofile was used as the
profiler):
========= mean value ===== standard deviation =====
C:             38591                      322
mmx_16:         5790                       38
mmx_32:        10836                       66
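
The relative gain can be worked out directly from the mean values
above. The small helper below is only an illustration (speedup() is a
hypothetical name), and it assumes that a lower mean value means less
time spent in the function.

```c
/* Ratio of the baseline mean to the optimized mean; a result above 1
 * means the optimized version is faster, assuming lower is better. */
static double speedup(double baseline, double optimized)
{
    return baseline / optimized;
}
```

With the means from the table, speedup(38591, 5790) is roughly 6.7
for mmx_16 and speedup(38591, 10836) is roughly 3.6 for mmx_32.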

So, if the sum is known to fit in 16 bits (in fact it can be slightly
larger, up to 17 bits, but it is hard to set an exact threshold), the
1st patch is strongly preferred.

P.S. While not related to the patch, I'd like to ask some
development-related questions.

Can someone point me to an SSE instruction set guide from AMD? Does
one even exist? I'm not sure that Intel's descriptions and
performance recommendations for SSE are applicable to AMD
processors. Right now I only have AMD's guides for the mmx, 3dnow!,
and mmext/3dnowext instruction sets and the optimization guide for
Athlon (pub. 20726, 21928, 22466 and 22007 respectively).

Is there any convenient way to debug inline asm using gdb or
something similar? Is it possible to step through asm instructions,
examine registers, and so on?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h263_syntetic.diff
Type: text/x-diff
Size: 433 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070419/2fa02248/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h263_mmx_32bit.diff
Type: text/x-diff
Size: 2960 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070419/2fa02248/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h263_mmx_16bit.diff
Type: text/x-diff
Size: 3889 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070419/2fa02248/attachment-0002.diff>
