[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll abs_pow34_v loop
atomnuker at gmail.com
Sat Mar 19 13:35:21 CET 2016
On 19 March 2016 at 05:12, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
> It seems like in all usages, size is a multiple of 4. This is documented
> as an assert.
> Yields speedup in this function, and small speedup for aac encoding
> Sample benchmark (Haswell, -march=native + GCC):
> 1390 decicycles in abs_pow34_v, 127138 runs, 3934 skips 63.1x
> 1385 decicycles in abs_pow34_v, 254191 runs, 7953 skips 64.4x
> 1383 decicycles in abs_pow34_v, 508305 runs, 15983 skips 65.3x
> 1109 decicycles in abs_pow34_v, 127122 runs, 3950 skips 61.2x
> 1107 decicycles in abs_pow34_v, 254177 runs, 7967 skips 63.5x
> 1106 decicycles in abs_pow34_v, 508292 runs, 15996 skips 65.3x
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.55s user 0.03s
> system 99% cpu 4.581 total
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.50s user 0.04s
> system 99% cpu 4.537 total
> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
> libavcodec/aacenc_utils.h | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
Are you sure that this speedup (and the one in the other patch you
posted) is real and above the measurement error? Did you do multiple
runs to rule out chance? A 0.04/0.05 second improvement on roughly 5
seconds doesn't seem significant at all, and we have to draw the line at
placebo speedups or watch the whole project fill up with spaghetti code.
The decrease in decicycles for the function is nice, but what matters in
the end is whether the overall speedup is enough to justify the extra
code, and I suspect that the compiler inlines and unrolls that function
anyway. Try marking it with __attribute__((noinline)) to see if that
makes a difference. I'll have time to test later today.