[FFmpeg-devel] [HACK] 50% faster H.264 decoding

Jason Garrett-Glaser darkshikari
Fri Aug 20 01:33:53 CEST 2010


On Thu, Aug 19, 2010 at 4:30 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Thu, Aug 19, 2010 at 7:00 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> Can you show benchmarks of w=2 without limiting to mx/my=0?
>> We know the 00 case will be faster if it's optimized by adding a
>> special case, but we don't know whether the additional branch
>> mispredictions and code cache pressure will cost less than that gain,
>> so I think this should be tested.
>
> After (my local tree + all patches):
> 757 dezicycles in w=2, 8191 runs, 1 skips
> 731 dezicycles in w=2, 16383 runs, 1 skips
> 735 dezicycles in w=2, 32767 runs, 1 skips
> 723 dezicycles in w=2, 65535 runs, 1 skips
> 722 dezicycles in w=2, 131068 runs, 4 skips
> 718 dezicycles in w=2, 262136 runs, 8 skips
> 717 dezicycles in w=2, 524272 runs, 16 skips
>
> Before (i.e. current SVN):
> 537 dezicycles in w=2, 8192 runs, 0 skips
> 521 dezicycles in w=2, 16384 runs, 0 skips
> 518 dezicycles in w=2, 32767 runs, 1 skips
> 509 dezicycles in w=2, 65535 runs, 1 skips
> 506 dezicycles in w=2, 131068 runs, 4 skips
> 507 dezicycles in w=2, 262140 runs, 4 skips
> 505 dezicycles in w=2, 524279 runs, 9 skips
>
> Hm... That's weird, how's that possible? Would this be solved by
> adding more specialized paths for 1D, or is this just "too
> insignificant gain" compared to the added complexity (= misprediction
> or so)?

Yes, the latter: mc2 takes so few clocks to begin with that adding a
branch to save a tiny bit of time isn't worth it.
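
To make the trade-off concrete, here is a minimal sketch of the two
variants being compared. This is not the actual FFmpeg code: the function
names are hypothetical, and the interpolation is just the standard H.264
bilinear chroma formula with 1/8-pel weights. The "special" version adds
the mx/my=0 fast path; both produce identical output, so the only
difference is the extra branch.

```c
#include <stdint.h>
#include <string.h>

/* Generic 2-pixel-wide bilinear chroma MC (hypothetical name).
 * Weights follow the H.264 chroma formula; they always sum to 64.
 * Note that it reads one extra column and one extra row of src,
 * even when the corresponding weight is zero. */
void put_chroma_mc2_generic(uint8_t *dst, const uint8_t *src,
                            int stride, int h, int mx, int my)
{
    const int A = (8 - mx) * (8 - my);
    const int B = mx * (8 - my);
    const int C = (8 - mx) * my;
    const int D = mx * my;

    for (int i = 0; i < h; i++) {
        dst[0] = (A * src[0] + B * src[1] +
                  C * src[stride] + D * src[stride + 1] + 32) >> 6;
        dst[1] = (A * src[1] + B * src[2] +
                  C * src[stride + 1] + D * src[stride + 2] + 32) >> 6;
        dst += stride;
        src += stride;
    }
}

/* Same function with a special case for mx == 0 && my == 0
 * (hypothetical name). With A = 64 and B = C = D = 0 the formula
 * reduces to a plain copy, so the fast path skips the multiplies. */
void put_chroma_mc2_special(uint8_t *dst, const uint8_t *src,
                            int stride, int h, int mx, int my)
{
    if (!(mx | my)) {
        for (int i = 0; i < h; i++) {
            memcpy(dst, src, 2);
            dst += stride;
            src += stride;
        }
        return;
    }
    put_chroma_mc2_generic(dst, src, stride, h, mx, my);
}
```

The generic loop body is only a handful of multiplies and adds per row,
so the test on mx/my is paid on every call while the saving applies only
to the 00 calls. Whether that nets out positive depends on how often the
00 case occurs and how well the branch predicts, which is why it has to
be measured rather than assumed.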

Dark Shikari


