[FFmpeg-devel] Pipeline: H.264 speed improvements

Vitor Sessak vitor1001
Sun Dec 28 13:47:01 CET 2008


Måns Rullgård wrote:
> Vitor Sessak <vitor1001 at gmail.com> writes:
> 
>> Michael Niedermayer wrote:
>>> On Tue, Dec 23, 2008 at 08:41:00PM +0000, Måns Rullgård wrote:
>>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>>
>>>>> On Tue, Dec 23, 2008 at 04:08:26AM -0500, Jason Garrett-Glaser wrote:
>>>>>> I've put together a list of all the possible speed improvements I can
>>>>>> see, including both some obvious ones and non-obvious ones.  If you're
>>>>>> interested in implementing anything here, say so to make sure your
>>>>>> work isn't duplicated by Michael or me.  Also feel free to discuss some
>>>>>> of the more nutty ideas, like the VLC table, or tell me that I'm wrong
>>>>>> about something.
>>>>>>
>>>>>> Non-assembly stuff:
>>>>> [...]
>>>>>> av_log2 is unnecessarily powerful for use in h264.c.  All signed
>>>>>> golomb values in H.264 fit in 16-bit, and all unsigned golomb values
>>>>>> other than headers fit in 8-bit.  Thus all ordinary unsigned golomb
>>>>>> code reads can literally be put in a 256-byte VLC table and replaced
>>>>>> with a single array lookup.
>>>>> it may be that all ue golomb coded values are <256 outside the headers,
>>>>> though even this seems wrong for mb_skip_run the way i understand the spec.
>>>>> But a value of 255 corresponds to a 15bit long vlc code.
>>>>> a 256 (or 128) entry LUT limits one to values 0-15; 512 (or 1024) to 0-31
>>>>>
>>>>> Now there are surely a few left that are that small but that's far from
>>>>> all non header values.
>>>> av_log2() can be trivially implemented on most CPUs using a count
>>>> leading zeros instruction.  That should be even faster than a table.
>>>> On ARM this instruction takes one cycle.
>>> patch & benchmark are welcome, but note, i don't think av_log2() is
>>> used much in h264.c
> 
> It is used in golomb.h, which is used by h264.c.  I can't say what uses
> it, I only know it showed up in a profile run I did with inlining
> disabled.
> 
> I did a quick benchmark with a clz implementation, which quite
> remarkably came out slower, apparently because gcc's inlining changed.
> Randomly forcing inline/noinline on functions in h264.c easily causes
> overall performance to fluctuate 3% or so.
> 
>> Please also benchmark ALAC decoding, it seems av_log2() is much more
>> speed critical there.
> 
> Can you suggest a good sample to benchmark?

Sure, you can try
ftp://ffmpeg.org/MPlayer/samples/A-codecs/lossless/luckynight.m4a , but
you can also use FFmpeg's encoder to generate a longer sample if you prefer.
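
For anyone who wants to experiment, here is a rough sketch of the two
ways of computing an integer log2 discussed above: a count-leading-zeros
version and a plain portable loop. The names log2_clz() and
log2_portable() are made up for illustration and are not FFmpeg's code;
__builtin_clz is a GCC/Clang builtin, and the sketch assumes that
unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical portable fallback: floor(log2(v)) by shifting.
 * Returns 0 for v == 0. */
static int log2_portable(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Hypothetical clz-based variant (GCC/Clang builtin, 32-bit unsigned int
 * assumed). __builtin_clz is undefined for 0, so bit 0 is forced on;
 * this does not change the result for any nonzero input. */
static int log2_clz(uint32_t v)
{
    return 31 - __builtin_clz(v | 1);
}
```

Both functions should return the same value for every input, so either
can be dropped into a benchmark; as noted above, the measured difference
may depend more on what the compiler decides to inline than on the
instruction itself.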

-Vitor
