[FFmpeg-devel] [RFC] AAC Encoder

Michael Niedermayer michaelni
Sat Aug 16 20:35:44 CEST 2008


On Sat, Aug 16, 2008 at 05:31:09PM +0300, Kostya wrote:
> On Fri, Aug 15, 2008 at 09:05:27PM +0200, Michael Niedermayer wrote:
> > On Fri, Aug 15, 2008 at 07:59:52PM +0300, Kostya wrote:
[...]
>  
> [...]
> > > +/**
> > > + * Encode MS data.
> > > + * @see 4.6.8.1 "Joint Coding - M/S Stereo"
> > > + */
> > > +static void encode_ms_info(PutBitContext *pb, ChannelElement *cpe)
> > > +{
> > > +    int i, w, wg;
> > > +
> > > +    put_bits(pb, 2, cpe->ms.present);
> > > +    if(cpe->ms.present == 1){
> > > +        w = 0;
> > > +        for(wg = 0; wg < cpe->ch[0].ics.num_window_groups; wg++){
> > > +            for(i = 0; i < cpe->ch[0].ics.max_sfb; i++)
> > > +                put_bits(pb, 1, cpe->ms.mask[w][i]);
> > > +            w += cpe->ch[0].ics.group_len[wg];
> > > +        }
> > >      }
> > >  }
> > 
> > this will not work with the data structs of the decoder
> > ms_mask is 120 elements
> > also the new group_len is still leaving holes in the arrays, its
> > surely better now as it doesnt loop over the 0 elements anymore but
> > they are still there.
> > I do not see why they should be there, it does not appear that there
> > is any advantage in them being there ... but if iam wrong iam sure you
> > will explain what they are good for?
>  
> Now it will work with flat data arrays having size 128 (which is comparable).
> I think this should be acceptable and working with fixed offset
> (window_num*16 + scalefactor_band_index) is easier.
> 
> Also I must note that decoder is presented with grouping data first and
> decodes the rest of data basing on it.
> Encoder, on the other hand, has transformed coefficients first, and applies
> grouping to them later. So it's easier and convenient to use first window of
> group to hold needed scalefactors, band types, etc. than move that stuff
> to another windows.

i do not think i understand.
Please correct me if iam misunderstanding something but
the coefficients are grouped into hmm groups (what was the correct term...)
the encoder can choose these grouping withing some limits ...
each group then gets a single scale factor. Thus one cannot select
scalefactors prior to grouping or independant of it, it would impose
a unpredictable limitation

anyway iam not arguing for the encoder to do exactly what the decoder does ATM
but it must be clean, compact, optimal and decoder and encoder should match
each other whenever possible.

Could you maybe point to exactly what code would become uglier with the
style used by the decoder? ideally with a diff showing the uglification?

[...]
> [...]
> > > +    s->path[0].bits = 0;
> > > +    for(i = 1; i <= max_sfb; i++)
> > > +        s->path[i].bits = INT_MAX;
> > > +    for(i = 0; i < max_sfb; i++){
> > > +        for(j = 1; j <= max_sfb - i; j++){
> > > +            bits = INT_MAX;
> > > +            ccb = 0;
> > > +            for(cb = 0; cb < 12; cb++){
> > > +                int sum = 0;
> > > +                for(k = 0; k < j; k++){
> > > +                    if(s->band_bits[i + k][cb] == INT_MAX){
> > > +                        sum = INT_MAX;
> > > +                        break;
> > > +                    }
> > > +                    sum += s->band_bits[i + k][cb];
> > > +                }
> > > +                if(sum < bits){
> > > +                    bits = sum;
> > > +                    ccb  = cb;
> > > +                }
> > > +            }
> > > +            assert(bits != INT_MAX);
> > > +            bits += s->path[i].bits + calculate_run_bits(j, run_bits);
> > > +            if(bits < s->path[i+j].bits){
> > > +                s->path[i+j].bits     = bits;
> > > +                s->path[i+j].codebook = ccb;
> > > +                s->path[i+j].prev_idx = i;
> > > +            }
> > > +        }
> > > +    }
> > 
> > hmm this is doing a loop more than it should ...
> > (note code below ignores [-1] and INT_MAX+a issues)
> > 
> > s->path[-1].bits= 0;
> > for(i = 0; i < max_sfb; i++){
> >     s->path[i].bits= INT_MAX;
> >     for(cb = 0; cb < 12; cb++){
> >         int sum=0;
> >         for(k = 0; k <= i; k++){
> >             sum += s->band_bits[i - k][cb];
> >             sum2= sum + calculate_run_bits(k, run_bits) + s->path[i-k-1].bits;
> >             if(sum2 < s->path[i].bits){
> >                 s->path[i].bits= sum2;
> >                 s->path[i].codebook= cb;
> >                 s->path[i].prev_idx= i - k - 1;
> >             }else if(sum2 - s->path[i].bits > THRESHOLD) // early termination to skip impossible cases
> >                 break;
> >         }
> >     }
> > }
>  
> I can't see a significant difference between them, except your code
> searches paths backward instead of forward. And calculates runs per
> codebook, so sum is updated instead of full recalculation (which I
> should adopt).
> 
> Leaved as is for now.

your commit message said "(Almost) optimal band codebook selection"
viterbi is optimal not almost optimal
(in this case at least, for others it may be that simplifiations are needed
 to achive a useable speed)

also my suggestion besides being faster by O(N) and simpler has a early
termination check


>  
> > > +
> > > +    //convert resulting path from backward-linked list
> > > +    stack_len = 0;
> > > +    idx = max_sfb;
> > > +    while(idx > 0){
> > > +        stack[stack_len++] = idx;
> > > +        idx = s->path[idx].prev_idx;
> > > +    }
> > > +
> > > +    //perform actual band info encoding
> > > +    start = 0;
> > > +    for(i = stack_len - 1; i >= 0; i--){
> > > +        put_bits(&s->pb, 4, s->path[stack[i]].codebook);
> > > +        count = stack[i] - s->path[stack[i]].prev_idx;
> > 
> > > +        for(j = 0; j < count; j++){
> > > +            cpe->ch[channel].band_type[win][start] =  s->path[stack[i]].codebook;
> > > +            cpe->ch[channel].zeroes[win][start]    = !s->path[stack[i]].codebook;
> > > +            start++;
> > > +        }
> > 
> > memset
> 
> umm, band_type[] type is int 

why is it an int?


[...]
> [...] 
> > > +    init_put_bits(&s->pb, frame, buf_size*8);
> > > +    if(avctx->frame_number==1 && !(avctx->flags & CODEC_FLAG_BITEXACT)){
> > > +        put_bitstream_info(avctx, s, LIBAVCODEC_IDENT);
> > > +    }
> > 
> > this still does not look like it is stored in extradata and neither is it
> > repeated.
> 
> now it's repeated (but I still prefer more shy marking of the file)

write a better encoder so you can be proud of it! :)


[...]

> --- /home/kst/cvs-get/ffmpeg/libavcodec/aacenc.c	2008-08-16 14:53:38.000000000 +0300
> +++ aacenc.c	2008-08-16 13:48:45.000000000 +0300
> @@ -118,6 +118,50 @@
>      swb_size_128_16, swb_size_128_16, swb_size_128_8
>  };
>  

> +#define CB_UNSIGNED 0x01    ///< coefficients are coded as absolute values
> +#define CB_PAIRS    0x02    ///< coefficients are grouped into pairs before coding (quads by default)
> +#define CB_ESCAPE   0x04    ///< codebook allows escapes

unused


> +
> +/** spectral coefficients codebook information */
> +static const struct {

> +    int16_t maxval;         ///< maximum possible value

unused


[...]
> +/** bits needed to code codebook run value for long windows */
> +static const uint8_t run_value_bits_long[64] = {
> +     5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
> +     5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5, 10,
> +    10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
> +    10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 15
> +};
> +
> +/** bits needed to code codebook run value for short windows */
> +static const uint8_t run_value_bits_short[16] = {
> +    3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 9
> +};
> +
> +static const uint8_t* run_value_bits[2] = {
> +    run_value_bits_long, run_value_bits_short
> +};
> +
>  /** default channel configurations */
>  static const uint8_t aac_chan_configs[6][5] = {
>   {1, TYPE_SCE},                               // 1 channel  - single channel element

ok



> @@ -129,6 +173,15 @@
>  };
>  
>  /**
> + * structure used in optimal codebook search
> + */
> +typedef struct BandCodingPath {
> +    int prev_idx; ///< pointer to the previous path point
> +    int codebook; ///< codebook for coding band run
> +    int bits;     ///< number of bit needed to code given number of bands
> +} BandCodingPath;
> +
> +/**
>   * AAC encoder context
>   */
>  typedef struct {

ok



> @@ -136,6 +189,20 @@
>      MDCTContext mdct1024;                        ///< long (1024 samples) frame transform context
>      MDCTContext mdct128;                         ///< short (128 samples) frame transform context
>      DSPContext  dsp;
> +    DECLARE_ALIGNED_16(FFTSample, output[2048]); ///< temporary buffer for MDCT input coefficients
> +    int16_t* samples;                            ///< saved preprocessed input
> +
> +    int samplerate_index;                        ///< MPEG-4 samplerate index
> +    const uint8_t *swb_sizes1024;                ///< scalefactor band sizes for long frame
> +    int swb_num1024;                             ///< number of scalefactor bands for long frame
> +    const uint8_t *swb_sizes128;                 ///< scalefactor band sizes for short frame
> +    int swb_num128;                              ///< number of scalefactor bands for short frame
> +
> +    ChannelElement *cpe;                         ///< channel elements





> +    AACPsyContext psy;                           ///< psychoacoustic model context
> +    int last_frame;

ok


> +    BandCodingPath path[64];                     ///< auxiliary data needed for optimal band info coding
> +    int band_bits[64][12];                       ///< bits needed to encode each band with each codebook
>  } AACEncContext;
>  
>  /**

I think they could be local variables


[...]
> @@ -210,7 +326,7 @@
>  static void put_ics_info(AVCodecContext *avctx, IndividualChannelStream *info)
>  {
>      AACEncContext *s = avctx->priv_data;
> -    int i;
> +    int wg;
>  
>      put_bits(&s->pb, 1, 0);                // ics_reserved bit
>      put_bits(&s->pb, 2, info->window_sequence[0]);
> @@ -220,8 +336,295 @@
>          put_bits(&s->pb, 1, 0);            // no prediction
>      }else{
>          put_bits(&s->pb, 4, info->max_sfb);
> -        for(i = 1; i < info->num_windows; i++)
> -            put_bits(&s->pb, 1, info->group_len[i]);

> +        for(wg = 0; wg < info->num_window_groups; wg++){
> +            if(wg)
> +                put_bits(&s->pb, 1, 0);
> +            if(info->group_len[wg] > 1)
> +                put_sbits(&s->pb, info->group_len[wg] - 1, 0xFF);
> +        }

is this correct? isnt it if(info->group_len[wg] > 1) else instead of if(wg)


[...]
> +/**
> + * Calculate the number of bits needed to code given band with given codebook.
> + *
> + * @param s       encoder context
> + * @param cpe     channel element
> + * @param channel channel number inside channel pair
> + * @param win     window group start number
> + * @param start   scalefactor band position in spectral coefficients
> + * @param size    scalefactor band size
> + * @param cb      codebook number
> + */
> +static int calculate_band_bits(AACEncContext *s, ChannelElement *cpe, int channel, int win, int group_len, int start, int size, int cb)
> +{
> +    int i, j, w;
> +    int score = 0, dim, idx, start2;
> +    int range = aac_cb_info[cb].range;
> +
> +    if(!range) return 0;
> +    cb--;
> +    dim = cb < FIRST_PAIR_BT ? 4 : 2;
> +
> +    start2 = start;
> +    if(cb == ESC_BT){
> +        int coef_abs[2];
> +        for(w = win; w < win + group_len; w++){
> +            for(i = start2; i < start2 + size; i += dim){

> +                idx = 0;
> +                for(j = 0; j < dim; j++){
> +                    coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> +                    idx = idx*17 + FFMIN(coef_abs[j], 16);
> +                }
> +                score += ff_aac_spectral_bits[cb][idx];
> +                for(j = 0; j < dim; j++)
> +                    if(cpe->ch[channel].icoefs[i+j])
> +                        score++;
> +                for(j = 0; j < dim; j++)
> +                    if(coef_abs[j] > 15)
> +                        score += av_log2(coef_abs[j]) * 2 - 4 + 1;

the loops still can be merged


> +            }
> +            start2 += 128;
> +       }
> +    }else if(IS_CODEBOOK_UNSIGNED(cb)){
> +        for(w = win; w < win + group_len; w++){
> +            for(i = start2; i < start2 + size; i += dim){
> +                idx = 0;
> +                for(j = 0; j < dim; j++)
> +                    idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
> +                score += ff_aac_spectral_bits[cb][idx];
> +                for(j = 0; j < dim; j++)
> +                     if(cpe->ch[channel].icoefs[i+j])
> +                         score++;
> +            }

the sign bits have the same effect on all unsigned codebooks
thus the are also redundantly calculated here


[...]
> +/**
> + * Encode one scalefactor band with selected codebook.
> + */

encode the coefficients of one ...


> +static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel, int start, int size, int cb)
> +{
> +    const uint8_t  *bits  = ff_aac_spectral_bits [cb - 1];
> +    const uint16_t *codes = ff_aac_spectral_codes[cb - 1];
> +    const int range = aac_cb_info[cb].range;
> +    const int dim = (cb < FIRST_PAIR_BT) ? 4 : 2;
> +    int i, j, idx;
> +
> +    //do not encode zero or special codebooks
> +    if(range == -1) return;
> +
> +    if(cb == ESC_BT){
> +        int coef_abs[2];
> +        for(i = start; i < start + size; i += dim){
> +            idx = 0;
> +            for(j = 0; j < dim; j++){
> +                coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> +                idx = idx*17 + FFMIN(coef_abs[j], 16);
> +            }
> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +            //output signs
> +            for(j = 0; j < dim; j++)
> +                if(cpe->ch[channel].icoefs[i+j])
> +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> +            //output escape values
> +            for(j = 0; j < dim; j++)
> +                if(coef_abs[j] > 15){
> +                    int len = av_log2(coef_abs[j]);
> +
> +                    put_bits(&s->pb, len - 4 + 1, (1 << (len - 4 + 1)) - 2);
> +                    put_bits(&s->pb, len, coef_abs[j] & ((1 << len) - 1));
> +                }
> +        }
> +    }else if(IS_CODEBOOK_UNSIGNED(cb)){
> +        for(i = start; i < start + size; i += dim){
> +            idx = 0;
> +            for(j = 0; j < dim; j++)
> +                idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +            //output signs
> +            for(j = 0; j < dim; j++)
> +                if(cpe->ch[channel].icoefs[i+j])
> +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> +        }
> +    }else{
> +        for(i = start; i < start + size; i += dim){

> +            idx = 0;
> +            for(j = 0; j < dim; j++)
> +                idx = idx * range + cpe->ch[channel].icoefs[i+j];

idx=cpe->ch[channel].icoefs[i];
for(j=1; j<dim; j++)
    idx = idx * range + cpe->ch[channel].icoefs[i+j];


> +            //it turned out that all signed codebooks use the same offset for index coding
> +            idx += 40;
> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +        }
> +    }
> +}

[...]
> +/**
> + * Encode scalefactors.
> + */
> +static void encode_scale_factors(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel, int global_gain)
> +{
> +    int off = global_gain, diff;
> +    int i, w, wg;
> +
> +    w = 0;
> +    for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
> +        for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> +            if(!cpe->ch[channel].zeroes[w*16 + i]){
> +                /* if we have encountered scale=256 it means empty band
> +                 * which was decided to be coded by encoder, so assign it
> +                 * last scalefactor value for compression efficiency
> +                 */
> +                if(cpe->ch[channel].sf_idx[w*16 + i] == 256)
> +                    cpe->ch[channel].sf_idx[w*16 + i] = off;

why is th code that selects scalefactors not simply setting it to the last
scale factor?


[...]

> @@ -254,12 +697,12 @@
>      for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
>          start = 0;
>          for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> -            if(cpe->ch[channel].zeroes[w][i]){
> +            if(cpe->ch[channel].zeroes[w*16 + i]){
>                  start += cpe->ch[channel].ics.swb_sizes[i];
>                  continue;
>              }
>              for(w2 = w; w2 < w + cpe->ch[channel].ics.group_len[wg]; w2++){
> -                encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w][i]);
> +                encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w*16 + i]);
>              }
>              start += cpe->ch[channel].ics.swb_sizes[i];
>          }

ok


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If a bugfix only changes things apparently unrelated to the bug with no
further explanation, that is a good sign that the bugfix is wrong.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080816/801d480c/attachment.pgp>



More information about the ffmpeg-devel mailing list