[FFmpeg-devel] [PATCH] Extra build options for ALS (and others)

Michael Niedermayer michaelni
Mon Nov 30 20:22:42 CET 2009


On Mon, Nov 30, 2009 at 04:09:23PM +0100, Thilo Borgmann wrote:
> Thilo Borgmann wrote:
> > Måns Rullgård wrote:
> >> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
> >>
> >>> Måns Rullgård wrote:
> >>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
> >>>>
> >>>>> Måns Rullgård wrote:
> >>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
> >>>>>>
> >>>>>>> Måns Rullgård wrote:
> >>>>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> recently the need for an extra build option for the ALS decoder arose.
> >>>>>>>> Is it impossible to achieve the desired outcome with some combination
> >>>>>>>> of always_inline, noinline, and flatten attributes?
> >>>>>>> No. See [PATCH] Split reading and decoding of blocks in ALS.
> >>>>>>>
> >>>>>>> Although I've managed to have the functions from alsdec.c inlined
> >>>>>>> manually, according to the grep'ed output of the assembler code, it
> >>>>>>> seems that manually inlining only the functions from within that .c
> >>>>>>> file using this technique is not enough.
> >>>>>> I'm confused.  Can it be done in the C code only or not?  This kind of
> >>>>>> issue should really not be solved in the makefile.
> >>>>> The issue is the big slowdown. The patch that causes this splits a big
> >>>>> function into two, which are then called successively.
> >>>>>
> >>>>> To overcome the slowdown issue, I inspected which functions are inlined
> >>>>> with and without the -finline-limit option. I can use av_always_inline
> >>>>> on many functions within alsdec.c to get the same functions inlined
> >>>>> as -finline-limit does.
> >>>>>
> >>>>> Unfortunately, using -finline-limit removes the slowdown introduced by
> >>>>> the patch while using av_always_inline does not.
> >>>> So it's not doing the same thing.  What is it doing differently?
> >>>> Where did you get the limit number from?
> >>>>
> >>> All function calls within alsdec.s when using -finline-limit=4096:
> >>>    1 	call	L1102
> >>>    1 	call	L138
> >>>    1 	call	L456
> >>>    2 	call	L___udivdi3$stub
> >>>   10 	call	L_av_freep$stub
> >>>    1 	call	L_av_get_bits_per_sample_format$stub
> >>>   12 	call	L_av_log$stub
> >>>    5 	call	L_av_log_missing_feature$stub
> >>>    8 	call	L_av_malloc$stub
> >>>    2 	call	L_av_mallocz$stub
> >>>    1 	call	L_ff_mpeg4audio_get_config$stub
> >>>    6 	call	L_memcpy$stub
> >>>    2 	call	L_memmove$stub
> >>>    1 	call	L_memset$stub
> >>>    2 	call	_decode_blocks_ind
> >>>    4 	call	_decode_end
> >>>   36 	call	_decode_rice
> >>>   10 	call	_get_bits_long
> >>>   11 	call	_parse_bs_info
> >>>    2 	call	_zero_remaining
> >>>
> >>> All function calls within alsdec.s when using many av_always_inline's.
> >>> This is designed to inline the same functions from alsdec.c as the
> >>> unpatched alsdec.c would yield without any extra build option:
> >>>    1 	call	L1561
> >>>    1 	call	L176
> >>>    1 	call	L21
> >>>    2 	call	L___udivdi3$stub
> >>>   10 	call	L_av_freep$stub
> >>>    1 	call	L_av_get_bits_per_sample_format$stub
> >>>   13 	call	L_av_log$stub
> >>>    5 	call	L_av_log_missing_feature$stub
> >>>    8 	call	L_av_malloc$stub
> >>>    2 	call	L_av_mallocz$stub
> >>>    1 	call	L_ff_mpeg4audio_get_config$stub
> >>>    1 	call	L_memcpy$stub
> >>>    1 	call	L_memmove$stub
> >>>    2 	call	L_memset$stub
> >>>    8 	call	___inline_memcpy_chk
> >>>    2 	call	___inline_memmove_chk
> >>>    6 	call	_align_get_bits
> >>>    5 	call	_av_ceil_log2
> >>>    4 	call	_av_clip
> >>>    4 	call	_decode_end
> >>>   47 	call	_get_bits
> >>>   90 	call	_get_bits1
> >>>    3 	call	_get_bits_count
> >>>   61 	call	_get_bits_left
> >>>   39 	call	_get_bits_long
> >>>    4 	call	_get_sbits_long
> >>>   60 	call	_get_unary
> >>>    2 	call	_init_get_bits
> >>>    3 	call	_parse_bs_info
> >>>    3 	call	_read_time
> >>>    7 	call	_skip_bits
> >>>    2 	call	_skip_bits1
> >>>    5 	call	_skip_bits_long
> >> Not inlining those get_bits etc will certainly slow things down,
> >> that's for sure.
> >>
> >>> So -finline-limit can inline many functions in the object file which are
> >>> not part of alsdec.c. Which might be the reason for the performance
> >>> difference.
> >>>
> >>> But using -finline-limit does not yield a speed gain for the unpatched
> >>> file! So there might be something else that I don't see.
> >>>
> >>> The value of 4096 has been chosen arbitrarily. As long as I don't know
> >>> exactly why -finline-limit removes the slowdown and that it cannot be
> >>> replaced by another approach, there is no need to figure out a more
> >>> optimal value...
> >> We should do some benchmarks using that flag globally and see what
> >> happens.  Maybe we'd gain from using it everywhere.
> > 
> > Like Michael said, this would be a big test for different platforms and
> > compilers which I cannot offer alone so several people would have to do
> > this - if a benchmark would indicate that it might be worth testing.
> > 
> > Also, testing means measuring the execution time of ffmpeg, and I lack
> > a good idea of how to do that efficiently without other factors such as
> > hard drives playing a predominant role.
> 
> I played around a little with the regression tests and audio decoders.
> For most of my tests -finline-limit=4096 makes it a little faster, e.g.
> 
> g726: 47001535 dezicycles -> 41628457 dezicycles (12%)
> alac: 12855244 dezicycles -> 12849127 dezicycles ( 0%)
> flac:   842020 dezicycles ->   786226 dezicycles ( 7%)
> wma:   3663166 dezicycles ->  3197273 dezicycles (14%)
> 
> which is not surprising. Inlining comes at a price: the ffmpeg executable
> grew from 5.4 MB to 6.1 MB.
> The value used for -finline-limit is 4096; the default is 600 for gcc-4.0.

what about video codecs? h264, mpeg4, mpeg2, h263?
and which cpu?
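
For readers following the quoted discussion, here is a minimal sketch of the
"do it in the C code" approach: small hot helpers are forced inline with an
attribute, while the entry point stays out of line. Nothing here is taken
from alsdec.c. The macro names (my_always_inline, my_noinline), the
ToyBitReader type and all toy_* functions are hypothetical stand-ins, and the
macros only assume the GCC/Clang attribute spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for av_always_inline / av_noinline. */
#if defined(__GNUC__)
#    define my_always_inline __attribute__((always_inline)) inline
#    define my_noinline      __attribute__((noinline))
#else
#    define my_always_inline inline
#    define my_noinline
#endif

/* Toy MSB-first bit reader; NOT FFmpeg's GetBitContext. */
typedef struct ToyBitReader {
    const uint8_t *buf;
    size_t size_in_bits;
    size_t index;          /* current position in bits */
} ToyBitReader;

/* Small hot helper: the kind of function one wants inlined everywhere. */
static my_always_inline unsigned toy_get_bit(ToyBitReader *br)
{
    unsigned bit;
    assert(br->index < br->size_in_bits);
    bit = (br->buf[br->index >> 3] >> (7 - (br->index & 7))) & 1;
    br->index++;
    return bit;
}

/* Stage 1 of the split: read n raw bits into out[]. */
static my_always_inline void toy_read_block(ToyBitReader *br, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int)toy_get_bit(br);
}

/* Stage 2 of the split: toy "decode", a running sum computed in place. */
static my_always_inline void toy_decode_block(int *samples, size_t n)
{
    for (size_t i = 1; i < n; i++)
        samples[i] += samples[i - 1];
}

/* Entry point, kept out of line; calls both stages successively and
 * returns the last decoded sample (0 for an empty block). */
static my_noinline int toy_read_and_decode(const uint8_t *buf, size_t n_bits,
                                           int *out)
{
    ToyBitReader br = { buf, n_bits, 0 };
    toy_read_block(&br, out, n_bits);
    toy_decode_block(out, n_bits);
    return n_bits ? out[n_bits - 1] : 0;
}
```

Compiling such a file to assembler and grepping for call instructions, as
done in the quoted mail, should show no calls to the toy_get_bit,
toy_read_block or toy_decode_block helpers. Note that the attributes only
reach functions they are written on, whereas -finline-limit changes the
inliner's size threshold for every inline candidate in the translation unit,
including helpers that come from headers.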

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The bravest are surely those who have the clearest vision
of what is before them, glory and danger alike, and yet
notwithstanding go out to meet it. -- Thucydides