[FFmpeg-devel] [PATCH] Extra build options for ALS (and others)

Måns Rullgård mans
Fri Nov 27 17:39:30 CET 2009


Thilo Borgmann <thilo.borgmann at googlemail.com> writes:

> Måns Rullgård wrote:
>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>> 
>>> Måns Rullgård wrote:
>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>
>>>>> Måns Rullgård wrote:
>>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> recently the need for an extra build option for the ALS decoder arose.
>>>>>> Is it impossible to achieve the desired outcome with some combination
>>>>>> of always_inline, noinline, and flatten attributes?
>>>>> No. See [PATCH] Split reading and decoding of blocks in ALS.
>>>>>
>>>>> Although I've managed to have the functions from alsdec.c inlined
>>>>> manually, according to the grep'ed output of the assembler code, it
>>>>> seems that manually inlining functions from within that .c file alone
>>>>> is not enough when using this technique.
>>>> I'm confused.  Can it be done in the C code only or not?  This kind of
>>>> issue should really not be solved in the makefile.
>>> The issue is the big slowdown. The patch that causes this splits a big
>>> function into two, which are then called successively.
>>>
>>> To overcome the slowdown issue, I inspected which functions are inlined
>>> with and without the -finline-limit option. I can use av_always_inline
>>> on many functions within alsdec.c to get the same functions inlined
>>> as -finline-limit does.
>>>
>>> Unfortunately, using -finline-limit removes the slowdown introduced by
>>> the patch while using av_always_inline does not.
>> 
>> So it's not doing the same thing.  What is it doing differently?
>> Where did you get the limit number from?
>> 
>
> All function calls within alsdec.s when using -finline-limit=4096:
>    1 	call	L1102
>    1 	call	L138
>    1 	call	L456
>    2 	call	L___udivdi3$stub
>   10 	call	L_av_freep$stub
>    1 	call	L_av_get_bits_per_sample_format$stub
>   12 	call	L_av_log$stub
>    5 	call	L_av_log_missing_feature$stub
>    8 	call	L_av_malloc$stub
>    2 	call	L_av_mallocz$stub
>    1 	call	L_ff_mpeg4audio_get_config$stub
>    6 	call	L_memcpy$stub
>    2 	call	L_memmove$stub
>    1 	call	L_memset$stub
>    2 	call	_decode_blocks_ind
>    4 	call	_decode_end
>   36 	call	_decode_rice
>   10 	call	_get_bits_long
>   11 	call	_parse_bs_info
>    2 	call	_zero_remaining
>
> All function calls within alsdec.s when using many av_always_inline
> annotations. This is designed to inline the same functions from alsdec.c
> that the unpatched alsdec.c inlines without any extra build option:
>    1 	call	L1561
>    1 	call	L176
>    1 	call	L21
>    2 	call	L___udivdi3$stub
>   10 	call	L_av_freep$stub
>    1 	call	L_av_get_bits_per_sample_format$stub
>   13 	call	L_av_log$stub
>    5 	call	L_av_log_missing_feature$stub
>    8 	call	L_av_malloc$stub
>    2 	call	L_av_mallocz$stub
>    1 	call	L_ff_mpeg4audio_get_config$stub
>    1 	call	L_memcpy$stub
>    1 	call	L_memmove$stub
>    2 	call	L_memset$stub
>    8 	call	___inline_memcpy_chk
>    2 	call	___inline_memmove_chk
>    6 	call	_align_get_bits
>    5 	call	_av_ceil_log2
>    4 	call	_av_clip
>    4 	call	_decode_end
>   47 	call	_get_bits
>   90 	call	_get_bits1
>    3 	call	_get_bits_count
>   61 	call	_get_bits_left
>   39 	call	_get_bits_long
>    4 	call	_get_sbits_long
>   60 	call	_get_unary
>    2 	call	_init_get_bits
>    3 	call	_parse_bs_info
>    3 	call	_read_time
>    7 	call	_skip_bits
>    2 	call	_skip_bits1
>    5 	call	_skip_bits_long

Not inlining get_bits and the related bit-reading helpers will
certainly slow things down.
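
To illustrate the difference between an inline hint and a forced inline,
here is a minimal sketch. It assumes GCC or a compatible compiler. The
macro my_always_inline and the BitReader type are hypothetical stand-ins
for illustration only; they are not FFmpeg's av_always_inline or
GetBitContext. A helper declared with plain "inline" may still be emitted
as a call once the compiler's inlining budget runs out, whereas the
attribute forces the helper's body into every caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a project-wide "always inline" macro. */
#if defined(__GNUC__)
#    define my_always_inline __attribute__((always_inline)) inline
#else
#    define my_always_inline inline
#endif

/* Hypothetical minimal bit reader (MSB first), for illustration only. */
typedef struct BitReader {
    const uint8_t *buf;
    unsigned       pos;   /* current position, in bits */
} BitReader;

/* Forced inline: no call to read_bit should remain in the object file. */
static my_always_inline unsigned read_bit(BitReader *br)
{
    unsigned byte = br->buf[br->pos >> 3];
    unsigned bit  = (byte >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Forced inline as well; read_bit is expanded inside the loop. */
static my_always_inline unsigned read_bits(BitReader *br, int n)
{
    unsigned v = 0;
    assert(n >= 0 && n <= 32);
    while (n-- > 0)
        v = (v << 1) | read_bit(br);
    return v;
}

/* Ordinary function: counts zero bits until a one bit or until max. */
static unsigned read_unary(BitReader *br, unsigned max)
{
    unsigned n = 0;
    while (n < max && read_bit(br) == 0)
        n++;
    return n;
}
```

Grepping the generated assembler for "call" on such a file is one way to
check whether the helpers were really expanded in place. Marking only the
functions in the .c file does not affect helpers that come from headers
and carry just the plain inline hint.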

> So -finline-limit can inline many functions in the object file that are
> not part of alsdec.c, which might be the reason for the performance
> difference.
>
> But using -finline-limit does not yield a speed gain for the unpatched
> file! So there might be something else that I don't see.
>
> The value of 4096 was chosen randomly. As long as I don't know
> exactly why -finline-limit removes the slowdown, or whether it can be
> replaced by another approach, there is no need to figure out a more
> optimal value...

We should do some benchmarks using that flag globally and see what
happens.  Maybe we'd gain from using it everywhere.

-- 
Måns Rullgård
mans at mansr.com


