[FFmpeg-devel] [PATCH] H264 MC8 SSSE3 minor speedups

Ronald S. Bultje rsbultje
Sat Dec 18 02:28:55 CET 2010


Hi,

On Sat, Aug 21, 2010 at 1:18 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> On Sat, 21 Aug 2010, Ronald S. Bultje wrote:
>
>> 604 dezicycles in w=8, 65535 runs, 1 skips
>> 603 dezicycles in w=8, 131067 runs, 5 skips
>> 606 dezicycles in w=8, 262137 runs, 7 skips
>> 606 dezicycles in w=8, 524275 runs, 13 skips
>> 605 dezicycles in w=8, 1048552 runs, 24 skips
>
> Bad benchmark technique. You should report only the last dezicycle line
> (i.e. the one with the highest # of runs, which includes all the previous
> data). But run the whole program multiple times, and report the last line
> from each.

Late...

First change (movq+mohlhps -> movdqa), before:
532 dezicycles in mc8, 524271 runs, 17 skips
532 dezicycles in mc8, 524273 runs, 15 skips
539 dezicycles in mc8, 524267 runs, 21 skips
537 dezicycles in mc8, 524272 runs, 16 skips
532 dezicycles in mc8, 524274 runs, 14 skips
538 dezicycles in mc8, 524274 runs, 14 skips
after
533 dezicycles in mc8, 524278 runs, 10 skips
528 dezicycles in mc8, 524267 runs, 21 skips
527 dezicycles in mc8, 524272 runs, 16 skips
525 dezicycles in mc8, 524269 runs, 19 skips
525 dezicycles in mc8, 524274 runs, 14 skips
530 dezicycles in mc8, 524276 runs, 12 skips

So a little faster (about 7 dezicycles on average, i.e. a bit under 1 cycle).
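
For anyone who wants to check the arithmetic, here is a minimal sketch that averages the final dezicycle lines from the two sets of runs above. The readings are copied from this mail; the helper name dezicycles_to_cycles is only illustrative and is not part of FFmpeg.

```python
from statistics import mean

# Final dezicycle reading of each program run, copied from the lists above.
before = [532, 532, 539, 537, 532, 538]
after = [533, 528, 527, 525, 525, 530]


def dezicycles_to_cycles(dezicycles):
    """Convert dezicycles to cycles (a dezicycle is one tenth of a cycle)."""
    return dezicycles / 10.0


# Average each set, then take the difference as the estimated saving.
saving = mean(before) - mean(after)

print(f"before: {mean(before):.1f} dezicycles")
print(f"after:  {mean(after):.1f} dezicycles")
print(f"saving: {saving:.1f} dezicycles = {dezicycles_to_cycles(saving):.2f} cycles")
```

With only six runs per side this is a rough estimate, not a statistically rigorous one.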

Then the other change (remove movdqa), before (with above change included):
1004 dezicycles in mc8, 131070 runs, 2 skips
1008 dezicycles in mc8, 131066 runs, 6 skips
996 dezicycles in mc8, 131068 runs, 4 skips
1000 dezicycles in mc8, 131068 runs, 4 skips
1055 dezicycles in mc8, 131065 runs, 7 skips
1006 dezicycles in mc8, 131069 runs, 3 skips
after:
1007 dezicycles in mc8, 131070 runs, 2 skips
1005 dezicycles in mc8, 131067 runs, 5 skips
1017 dezicycles in mc8, 131068 runs, 4 skips
1008 dezicycles in mc8, 131064 runs, 8 skips
990 dezicycles in mc8, 131070 runs, 2 skips
1014 dezicycles in mc8, 131067 runs, 5 skips

Confusingly, the second change does not appear to be faster. The binary
size is also unchanged (probably because of alignment further down).
What to do? (Patch against yasm attached.)
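
One way to look at the second set of numbers is to compare medians and ranges rather than single readings, since one "before" run (1055) is an obvious outlier. The sketch below does that with the readings from this mail; summarize and ranges_overlap are hypothetical helper names, and the overlap check is only a crude sanity test, not a proper significance test.

```python
from statistics import mean, median

# Final dezicycle reading of each program run for the second change.
before = [1004, 1008, 996, 1000, 1055, 1006]
after = [1007, 1005, 1017, 1008, 990, 1014]


def summarize(samples):
    """Return basic summary statistics for a list of dezicycle readings."""
    return {
        "mean": mean(samples),
        "median": median(samples),
        "min": min(samples),
        "max": max(samples),
    }


def ranges_overlap(a, b):
    """True if the [min, max] intervals of the two sample lists intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


for name, samples in (("before", before), ("after", after)):
    print(name, summarize(samples))

print("ranges overlap:", ranges_overlap(before, after))
```

The medians differ by only a few dezicycles and the ranges overlap heavily, which is consistent with the second change being within the noise on this machine.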

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mc8_speedups.patch
Type: application/octet-stream
Size: 2548 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20101217/a6ab293c/attachment.obj>


