[FFmpeg-devel] [PATCH] aacenc_utils: unroll loops to allow compiler to use SIMD.

Reimar Döffinger Reimar.Doeffinger at gmx.de
Sun Mar 6 20:14:01 CET 2016


On Sun, Mar 06, 2016 at 03:49:00PM -0300, James Almer wrote:
> On 3/6/2016 3:35 PM, Reimar Döffinger wrote:
> > Approximately 10% faster transcode from mp3 to aac
> > with default settings.
> > 
> > Signed-off-by: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
> > ---
> >  libavcodec/aacenc_utils.h | 47 ++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 38 insertions(+), 9 deletions(-)
> > 
> > diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
> > index b9bd6bf..1639021 100644
> > --- a/libavcodec/aacenc_utils.h
> > +++ b/libavcodec/aacenc_utils.h
> > @@ -36,15 +36,29 @@
> >  #define ROUND_TO_ZERO 0.1054f
> >  #define C_QUANT 0.4054f
> >  
> > +#define ABSPOW(inv, outv) \
> > +do { \
> > +    float a = (inv); \
> > +    a = fabsf(a); \
> > +    (outv) = sqrtf(a * sqrtf(a)); \
> > +} while(0)
> > +
> >  static inline void abs_pow34_v(float *out, const float *in, const int size)
> >  {
> >      int i;
> > -    for (i = 0; i < size; i++) {
> > -        float a = fabsf(in[i]);
> > -        out[i] = sqrtf(a * sqrtf(a));
> > +    for (i = 0; i < size - 3; i += 4) {
> > +        ABSPOW(in[i], out[i]);
> > +        ABSPOW(in[i+1], out[i+1]);
> > +        ABSPOW(in[i+2], out[i+2]);
> > +        ABSPOW(in[i+3], out[i+3]);
> > +    }
> 
> Are you sure this wasn't vectorized already? I remember I checked and it mostly
> was, at least on gcc 5.3 mingw-w64 with default settings.

Then it would hardly get 10% faster, would it? (Though
I admit I didn't test the two parts separately.)
But I am fairly sure that before the patch it only
used scalar sqrtss instructions and not packed sqrtps.

