[FFmpeg-devel] [PATCH] H264 MC8 SSSE3 minor speedups

Ronald S. Bultje rsbultje
Sat Dec 18 02:35:53 CET 2010


Hi again,

On Fri, Dec 17, 2010 at 8:28 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Then the other change (remove movdqa), before (with above change included):
> 1004 dezicycles in mc8, 131070 runs, 2 skips
> 1008 dezicycles in mc8, 131066 runs, 6 skips
> 996 dezicycles in mc8, 131068 runs, 4 skips
> 1000 dezicycles in mc8, 131068 runs, 4 skips
> 1055 dezicycles in mc8, 131065 runs, 7 skips
> 1006 dezicycles in mc8, 131069 runs, 3 skips
> after:
> 1007 dezicycles in mc8, 131070 runs, 2 skips
> 1005 dezicycles in mc8, 131067 runs, 5 skips
> 1017 dezicycles in mc8, 131068 runs, 4 skips
> 1008 dezicycles in mc8, 131064 runs, 8 skips
> 990 dezicycles in mc8, 131070 runs, 2 skips
> 1014 dezicycles in mc8, 131067 runs, 5 skips
>
> So confusingly, the 2nd change appears to not be faster. Also binary
> size is the same (probably b/c of alignment further down). What to do?

I suck at profiling. Here's the correct data, but the second change
(removing the movdqa) is still not faster.
Before:
578 dezicycles in mc8, 262135 runs, 9 skips
578 dezicycles in mc8, 262135 runs, 9 skips
588 dezicycles in mc8, 262133 runs, 11 skips
581 dezicycles in mc8, 262140 runs, 4 skips
570 dezicycles in mc8, 262136 runs, 8 skips
577 dezicycles in mc8, 262138 runs, 6 skips
After:
576 dezicycles in mc8, 262133 runs, 11 skips
579 dezicycles in mc8, 262131 runs, 13 skips
577 dezicycles in mc8, 262138 runs, 6 skips
577 dezicycles in mc8, 262134 runs, 10 skips
583 dezicycles in mc8, 262135 runs, 9 skips
581 dezicycles in mc8, 262135 runs, 9 skips
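
To put a number on "not faster", here is a rough sketch that averages the
six corrected runs on each side and compares the difference against the
run-to-run spread. The figures are copied from the lists above; the
summarize() helper and the variable names are only illustrative, and six
samples per side is too few for anything beyond a sanity check.

```python
from statistics import mean, stdev

# Per-run averages (dezicycles) copied from the corrected measurements above.
before = [578, 578, 588, 581, 570, 577]
after = [576, 579, 577, 577, 583, 581]


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(samples), stdev(samples)


before_mean, before_sd = summarize(before)
after_mean, after_sd = summarize(after)

# Positive delta means "after" is slower on average.
delta = after_mean - before_mean

print(f"before: {before_mean:.1f} +/- {before_sd:.1f} dezicycles")
print(f"after:  {after_mean:.1f} +/- {after_sd:.1f} dezicycles")
print(f"delta:  {delta:+.2f} dezicycles")
```

The two means differ by well under one dezicycle, while the spread between
runs is several dezicycles on either side, so the difference is
indistinguishable from noise.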

Ronald


