[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll abs_pow34_v loop

Tue Mar 22 19:14:49 CET 2016

On Sat, Mar 19, 2016 at 5:35 AM, Rostislav Pehlivanov
<atomnuker at gmail.com> wrote:
> On 19 March 2016 at 05:12, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
>
>> It seems like in all usages, size is a multiple of 4. This is documented
>> as an assert.
>>
>> This speeds up the function itself, and gives a small speedup for AAC
>> encoding overall.
>>
>> Sample benchmark (Haswell, -march=native + GCC):
>> old:
>>    [...]
>>    1390 decicycles in abs_pow34_v,  127138 runs,   3934 skips63.1x
>>    1385 decicycles in abs_pow34_v,  254191 runs,   7953 skips64.4x
>>    1383 decicycles in abs_pow34_v,  508305 runs,  15983 skips65.3x
>>
>> new:
>>    [...]
>>    1109 decicycles in abs_pow34_v,  127122 runs,   3950 skips61.2x
>>    1107 decicycles in abs_pow34_v,  254177 runs,   7967 skips63.5x
>>    1106 decicycles in abs_pow34_v,  508292 runs,  15996 skips65.3x
>>
>> old:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.55s user 0.03s system 99% cpu 4.581 total
>> new:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.50s user 0.04s system 99% cpu 4.537 total
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
>> ---
>>  libavcodec/aacenc_utils.h | 24 +++++++++++++++---------
>>  1 file changed, 15 insertions(+), 9 deletions(-)
>>
>>
> Are you sure that this speedup (and the one in the other patch you posted)
> is real and above the measurement noise? Did you do multiple runs to rule
> out chance? A 0.04-0.05 second improvement on ~5 seconds doesn't seem
> significant at all,

I am really sorry about these measurements; they were skewed by a very
recent regression on my laptop caused by some package upgrade.
Essentially, after a suspend/resume cycle, the CPU frequency governor
would downshift the base clock slightly, from 2.4 to 2.2 GHz; I have no
idea what changed for the turbo frequency.

So please ignore these.

However, here is a heuristic estimate of the impact: the function runs
between 500,000 and 1,000,000 times, and saving ~30 cycles per run
(1390 to 1109 decicycles is ~28 cycles) gives ~15-30 million cycles
saved, out of ~5 s * 3 GHz = 15 billion cycles overall, i.e. roughly
0.1-0.2%. So it is near the 0.1% threshold; see below.
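The same back-of-the-envelope estimate, written out as code. The helper names are hypothetical; the inputs (1390/1109 decicycles per call, 500,000 to 1,000,000 calls, ~5 s at ~3 GHz) are the figures used above, and a decicycle is a tenth of a cycle.

```c
#include <assert.h>

/* Cycles saved per call, from the per-call "decicycles" figures
 * (one decicycle is a tenth of a cycle). */
static double cycles_saved_per_call(double old_decicycles,
                                    double new_decicycles)
{
    return (old_decicycles - new_decicycles) / 10.0;
}

/* Fraction of the whole encode saved:
 * calls * cycles saved per call / total cycles spent. */
static double fraction_saved(double calls, double saved_per_call,
                             double total_cycles)
{
    assert(total_cycles > 0.0);
    return calls * saved_per_call / total_cycles;
}
```

With those inputs, cycles_saved_per_call(1390, 1109) is 28.1, and fraction_saved(calls, 28.1, 5 * 3e9) comes to about 0.00094 for 500,000 calls and about 0.0019 for 1,000,000 calls, i.e. roughly 0.09% to 0.19%.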

> and we have to draw the line at placebo speedups, or enjoy watching the
> whole project fill up with spaghetti code.
> Although the decrease in decicycles for the function was nice, what matters
> in the end is whether the speedup is enough to justify the extra code,

Per doc/optimization.txt, AAC is a widely used codec, so even a 0.1%
improvement to AAC encoding is fair game for optimization, provided it
is a small code change. Of course, one can debate whether this change
is small; I view it as simple and clean, but others may disagree.

> and I suspect that the compiler inlines and unrolls that function
> anyway. Try adding __attribute__ ((noinline)) to the function to see if
> that makes a difference. I'll have time to test later today.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel