[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll abs_pow34_v loop

Clément Bœsch u at pkh.me
Sat Mar 19 12:42:09 CET 2016


On Fri, Mar 18, 2016 at 10:12:14PM -0700, Ganesh Ajjanagadde wrote:
> It seems like in all usages, size is a multiple of 4. This is documented
> as an assert.
> 
> Yields speedup in this function, and small speedup for aac encoding overall.
> 
> Sample benchmark (Haswell, -march=native + GCC):
> old:
>    [...]
>    1390 decicycles in abs_pow34_v,  127138 runs,   3934 skips
>    1385 decicycles in abs_pow34_v,  254191 runs,   7953 skips
>    1383 decicycles in abs_pow34_v,  508305 runs,  15983 skips
> 
> new:
>    [...]
>    1109 decicycles in abs_pow34_v,  127122 runs,   3950 skips
>    1107 decicycles in abs_pow34_v,  254177 runs,   7967 skips
>    1106 decicycles in abs_pow34_v,  508292 runs,  15996 skips
> 
> old:
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.55s user 0.03s system 99% cpu 4.581 total
> new:
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.50s user 0.04s system 99% cpu 4.537 total
> 
> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
> ---
>  libavcodec/aacenc_utils.h | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
> index 0203b6e..800b78f 100644
> --- a/libavcodec/aacenc_utils.h
> +++ b/libavcodec/aacenc_utils.h
> @@ -37,20 +37,26 @@
>  #define ROUND_TO_ZERO 0.1054f
>  #define C_QUANT 0.4054f
>  
> -static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
> -{
> -    int i;
> -    for (i = 0; i < size; i++) {
> -        float a = fabsf(in[i]);
> -        out[i] = sqrtf(a * sqrtf(a));
> -    }
> -}
> -
>  static inline float pos_pow34(float a)
>  {
>      return sqrtf(a * sqrtf(a));
>  }
>  
> +static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
> +{
> +    av_assert2(!(size % 4));
> +    for (int i = 0; i < size; i+=4) {
> +        float a0 = fabsf(in[i]);
> +        float a1 = fabsf(in[i+1]);
> +        float a2 = fabsf(in[i+2]);
> +        float a3 = fabsf(in[i+3]);
> +        out[i  ] = pos_pow34(a0);
> +        out[i+1] = pos_pow34(a1);
> +        out[i+2] = pos_pow34(a2);
> +        out[i+3] = pos_pow34(a3);
> +    }
> +}
> +

I'm curious (and lazy): is GCC able to unroll the loop by itself if you
hint that the bound is a multiple of 4, with a loop such as:

    int i;
    for (i = 0; i < (size & ~3); i++) {
        float a = fabsf(in[i]);
        out[i] = sqrtf(a * sqrtf(a));
    }

-- 
Clément B.