[FFmpeg-devel] [PATCH] VP8 coeff decoding optimizations

Måns Rullgård mans
Mon Aug 2 22:17:16 CEST 2010


Jason Garrett-Glaser <darkshikari at gmail.com> writes:

> On Mon, Aug 2, 2010 at 5:38 AM, Pascal Massimino
> <pascal.massimino at gmail.com> wrote:
>> Jason,
>>
>> On Mon, Aug 2, 2010 at 1:32 AM, Jason Garrett-Glaser
>> <darkshikari at gmail.com> wrote:
>>
>>> Attached are two mutually exclusive VP8 optimization patches.
>>>
>>> Approach in #1 (test.diff): simplify addressing by eliminating
>>> vp8_coeff_band.
>>> Advantage: one less dereference; seems to be slightly faster, but
>>> might depend on the mood of gcc.
>>>
>>
>> +1 here. Seems to be a tad faster than test3.diff (gcc 4.2.4 x86-64):
>>
>> current (timing decode_mb_coeffs()):
>> 47533 dezicycles in dec, 131005 runs, 67 skips
>> 47594 dezicycles in dec, 130977 runs, 95 skips
>> 47681 dezicycles in dec, 131003 runs, 69 skips
>> 47503 dezicycles in dec, 130997 runs, 75 skips
>>
>> test.diff
>> 46065 dezicycles in dec, 131004 runs, 68 skips
>> 46009 dezicycles in dec, 130996 runs, 76 skips
>> 46119 dezicycles in dec, 131035 runs, 37 skips
>> 46226 dezicycles in dec, 131000 runs, 72 skips
>>
>> test3.diff:
>> 46255 dezicycles in dec, 131003 runs, 69 skips
>> 46156 dezicycles in dec, 131009 runs, 63 skips
>> 46263 dezicycles in dec, 131017 runs, 55 skips
>
> Anyone want to bench on another arch (ARM)?

Cortex-A8, gcc 4.3.3-cs2009q1:

no patch
1789 dezicycles in dec, 131059 runs, 13 skips
1786 dezicycles in dec, 131069 runs, 3 skips
1786 dezicycles in dec, 131069 runs, 3 skips

test.diff
1728 dezicycles in dec, 131065 runs, 7 skips
1726 dezicycles in dec, 131064 runs, 8 skips
1728 dezicycles in dec, 131067 runs, 5 skips

test3.diff
1780 dezicycles in dec, 131061 runs, 11 skips
1780 dezicycles in dec, 131065 runs, 7 skips
1784 dezicycles in dec, 131069 runs, 3 skips
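
To compare the three Cortex-A8 builds at a glance, the runs above can be averaged and expressed relative to the unpatched build. This is only a minimal sketch in Python; the timings are copied from the output above, and the names arm_runs and percent_saved are made up for the illustration.

```python
from statistics import mean

# Cortex-A8 timings copied from the runs above (dezicycles per call).
arm_runs = {
    "no patch": [1789, 1786, 1786],
    "test.diff": [1728, 1726, 1728],
    "test3.diff": [1780, 1780, 1784],
}

def percent_saved(baseline, patched):
    """Percentage reduction of the mean timing relative to the baseline mean."""
    base = mean(baseline)
    return 100.0 * (base - mean(patched)) / base

for name in ("test.diff", "test3.diff"):
    saved = percent_saved(arm_runs["no patch"], arm_runs[name])
    print(f"{name}: {saved:.1f}% fewer dezicycles than no patch")
```

With only three runs per build this is a rough indication, not a rigorous measurement, but on these numbers test.diff saves roughly 3% and test3.diff well under 1%.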

-- 
Måns Rullgård
mans at mansr.com


