[FFmpeg-devel] [RFC] AAC Encoder

Kostya kostya.shishkov
Sun Aug 17 13:40:19 CEST 2008


On Sat, Aug 16, 2008 at 08:35:44PM +0200, Michael Niedermayer wrote:
> On Sat, Aug 16, 2008 at 05:31:09PM +0300, Kostya wrote:
> > On Fri, Aug 15, 2008 at 09:05:27PM +0200, Michael Niedermayer wrote:
> > > On Fri, Aug 15, 2008 at 07:59:52PM +0300, Kostya wrote:
[...]
> > > 
> > > This will not work with the data structs of the decoder:
> > > ms_mask has 120 elements.
> > > Also, the new group_len still leaves holes in the arrays. It's
> > > surely better now, as it doesn't loop over the 0 elements anymore, but
> > > they are still there.
> > > I do not see why they should be there; there does not appear to be
> > > any advantage in them being there ... but if I am wrong, I am sure you
> > > will explain what they are good for?
> >  
> > Now it will work with flat data arrays of size 128 (which is comparable).
> > I think this should be acceptable, and working with a fixed offset
> > (window_num*16 + scalefactor_band_index) is easier.
> > 
> > Also, I must note that the decoder is presented with the grouping data
> > first and decodes the rest of the data based on it.
> > The encoder, on the other hand, has the transformed coefficients first and
> > applies grouping to them later. So it's easier and more convenient to use
> > the first window of a group to hold the needed scalefactors, band types,
> > etc. than to move that data to other windows.
> 
> I do not think I understand.
> Please correct me if I am misunderstanding something, but
> the coefficients are grouped into, hmm, groups (what was the correct term...).
> The encoder can choose this grouping within some limits ...
> Each group then gets a single scale factor. Thus one cannot select
> scalefactors prior to grouping or independently of it; that would impose
> an unpredictable limitation.
> 
> Anyway, I am not arguing for the encoder to do exactly what the decoder does
> ATM, but it must be clean, compact, and optimal, and decoder and encoder
> should match each other whenever possible.
> 
> Could you maybe point to exactly what code would become uglier with the
> style used by the decoder? Ideally with a diff showing the uglification?

They are almost the same. The difference is that the decoder knows all the
limits in advance and can exploit them, while my encoder holds data for all
windows and just marks where window groups start.
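
To illustrate the layout I mean, here is a minimal standalone sketch (not
the patch code; SFB_STRIDE, flat_idx and count_coded_bands are hypothetical
names). Per-band data lives in a flat array of 8*16 = 128 entries, and only
the entries of the first window of each group are consulted:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WINDOWS 8
#define SFB_STRIDE  16   /* assumed per-window slot count, as in w*16 + sfb */

/* Fixed-offset index of scalefactor band sfb in window w. */
static int flat_idx(int w, int sfb)
{
    assert(w >= 0 && w < MAX_WINDOWS && sfb >= 0 && sfb < SFB_STRIDE);
    return w * SFB_STRIDE + sfb;
}

/* Count non-zero bands, visiting only the first window of each group. */
static int count_coded_bands(const uint8_t zeroes[MAX_WINDOWS * SFB_STRIDE],
                             const int *group_len, int num_groups, int max_sfb)
{
    int w = 0, wg, sfb, n = 0;

    for (wg = 0; wg < num_groups; wg++) {
        for (sfb = 0; sfb < max_sfb; sfb++)
            n += !zeroes[flat_idx(w, sfb)];
        w += group_len[wg];   /* jump to the first window of the next group */
    }
    return n;
}
```

Entries belonging to the other windows of a group are simply never read,
which is where the "holes" come from.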

> [...]
> > [...]
> > > > +    s->path[0].bits = 0;
> > > > +    for(i = 1; i <= max_sfb; i++)
> > > > +        s->path[i].bits = INT_MAX;
> > > > +    for(i = 0; i < max_sfb; i++){
> > > > +        for(j = 1; j <= max_sfb - i; j++){
> > > > +            bits = INT_MAX;
> > > > +            ccb = 0;
> > > > +            for(cb = 0; cb < 12; cb++){
> > > > +                int sum = 0;
> > > > +                for(k = 0; k < j; k++){
> > > > +                    if(s->band_bits[i + k][cb] == INT_MAX){
> > > > +                        sum = INT_MAX;
> > > > +                        break;
> > > > +                    }
> > > > +                    sum += s->band_bits[i + k][cb];
> > > > +                }
> > > > +                if(sum < bits){
> > > > +                    bits = sum;
> > > > +                    ccb  = cb;
> > > > +                }
> > > > +            }
> > > > +            assert(bits != INT_MAX);
> > > > +            bits += s->path[i].bits + calculate_run_bits(j, run_bits);
> > > > +            if(bits < s->path[i+j].bits){
> > > > +                s->path[i+j].bits     = bits;
> > > > +                s->path[i+j].codebook = ccb;
> > > > +                s->path[i+j].prev_idx = i;
> > > > +            }
> > > > +        }
> > > > +    }
> > > 
> > > hmm this is doing a loop more than it should ...
> > > (note code below ignores [-1] and INT_MAX+a issues)
> > > 
> > > s->path[-1].bits= 0;
> > > for(i = 0; i < max_sfb; i++){
> > >     s->path[i].bits= INT_MAX;
> > >     for(cb = 0; cb < 12; cb++){
> > >         int sum=0;
> > >         for(k = 0; k <= i; k++){
> > >             sum += s->band_bits[i - k][cb];
> > >             sum2= sum + calculate_run_bits(k, run_bits) + s->path[i-k-1].bits;
> > >             if(sum2 < s->path[i].bits){
> > >                 s->path[i].bits= sum2;
> > >                 s->path[i].codebook= cb;
> > >                 s->path[i].prev_idx= i - k - 1;
> > >             }else if(sum2 - s->path[i].bits > THRESHOLD) // early termination to skip impossible cases
> > >                 break;
> > >         }
> > >     }
> > > }
> >  
> > I can't see a significant difference between them, except that your code
> > searches paths backward instead of forward, and calculates runs per
> > codebook, so the sum is updated instead of fully recalculated (which I
> > should adopt).
> > 
> > Left as is for now.
> 
> Your commit message said "(Almost) optimal band codebook selection".
> Viterbi is optimal, not almost optimal
> (in this case at least; for others, simplifications may be needed
>  to achieve a usable speed).
> 
> Also, my suggestion, besides being faster by O(N) and simpler, has an early
> termination check.

OK, implemented (I left it calculating the path forward, though).
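
A standalone sketch of the forward search with the running per-codebook
sum, for reference. It is not the patch code: PathNode, run_cost and
best_sections are hypothetical names, and the section cost (a 4-bit codebook
id plus an escape-coded run length) is my assumption here, not the
run_value_bits[] table from the patch.

```c
#include <assert.h>
#include <limits.h>

#define NUM_CB   12
#define MAX_SFB  64

typedef struct { int bits, codebook, prev_idx; } PathNode;  /* hypothetical */

/* Assumed section cost: 4-bit codebook id plus escape-coded run length. */
static int run_cost(int len, int run_bits)
{
    int esc = (1 << run_bits) - 1;
    return 4 + run_bits * (len / esc + 1);
}

/* Forward dynamic programming over section boundaries.
 * path[i] describes the cheapest way found to code bands 0..i-1;
 * band_bits[b][cb] == INT_MAX marks a codebook that cannot code band b. */
static int best_sections(int band_bits[][NUM_CB], int max_sfb, int run_bits,
                         PathNode *path)
{
    int i, j, cb;

    assert(max_sfb > 0 && max_sfb <= MAX_SFB);
    path[0].bits = 0;
    path[0].codebook = 0;
    path[0].prev_idx = 0;
    for (i = 1; i <= max_sfb; i++)
        path[i].bits = INT_MAX;

    for (i = 0; i < max_sfb; i++) {
        if (path[i].bits == INT_MAX)
            continue;                       /* unreachable boundary */
        for (cb = 0; cb < NUM_CB; cb++) {
            int sum = 0;                    /* updated, not recalculated */
            for (j = 1; j <= max_sfb - i; j++) {
                int bits;
                if (band_bits[i + j - 1][cb] == INT_MAX)
                    break;                  /* run cannot be extended */
                sum += band_bits[i + j - 1][cb];
                bits = sum + path[i].bits + run_cost(j, run_bits);
                if (bits < path[i + j].bits) {
                    path[i + j].bits     = bits;
                    path[i + j].codebook = cb;
                    path[i + j].prev_idx = i;
                }
            }
        }
    }
    return path[max_sfb].bits;
}
```

Following prev_idx back from path[max_sfb] gives the sections in reverse
order, which is why the patch pushes them on a stack before writing.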
 
> >  
> > > > +
> > > > +    //convert resulting path from backward-linked list
> > > > +    stack_len = 0;
> > > > +    idx = max_sfb;
> > > > +    while(idx > 0){
> > > > +        stack[stack_len++] = idx;
> > > > +        idx = s->path[idx].prev_idx;
> > > > +    }
> > > > +
> > > > +    //perform actual band info encoding
> > > > +    start = 0;
> > > > +    for(i = stack_len - 1; i >= 0; i--){
> > > > +        put_bits(&s->pb, 4, s->path[stack[i]].codebook);
> > > > +        count = stack[i] - s->path[stack[i]].prev_idx;
> > > 
> > > > +        for(j = 0; j < count; j++){
> > > > +            cpe->ch[channel].band_type[win][start] =  s->path[stack[i]].codebook;
> > > > +            cpe->ch[channel].zeroes[win][start]    = !s->path[stack[i]].codebook;
> > > > +            start++;
> > > > +        }
> > > 
> > > memset
> > 
> > umm, band_type[] type is int 
> 
> why is it an int?

Because it's an enum. Changed to memset for zeroes[], though.

> [...]
> > [...] 
> > > > +    init_put_bits(&s->pb, frame, buf_size*8);
> > > > +    if(avctx->frame_number==1 && !(avctx->flags & CODEC_FLAG_BITEXACT)){
> > > > +        put_bitstream_info(avctx, s, LIBAVCODEC_IDENT);
> > > > +    }
> > > 
> > > this still does not look like it is stored in extradata and neither is it
> > > repeated.
> > 
> > Now it's repeated (but I still prefer a more modest marking of the file).
> 
> write a better encoder so you can be proud of it! :)

eventually I will

> [...]
> 
> > --- /home/kst/cvs-get/ffmpeg/libavcodec/aacenc.c	2008-08-16 14:53:38.000000000 +0300
> > +++ aacenc.c	2008-08-16 13:48:45.000000000 +0300
> > @@ -118,6 +118,50 @@
> >      swb_size_128_16, swb_size_128_16, swb_size_128_8
> >  };
> >  
> 
> > +#define CB_UNSIGNED 0x01    ///< coefficients are coded as absolute values
> > +#define CB_PAIRS    0x02    ///< coefficients are grouped into pairs before coding (quads by default)
> > +#define CB_ESCAPE   0x04    ///< codebook allows escapes
> 
> unused

dropped

> > +
> > +/** spectral coefficients codebook information */
> > +static const struct {
> 
> > +    int16_t maxval;         ///< maximum possible value
> 
> unused

That's wrong. It's used when calculating the bits needed for each codebook.
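
A minimal sketch of that use, assuming only the maxval column of
aac_cb_info[] from the attached patch (codebook_fits and
first_fitting_codebook are hypothetical helper names): a codebook is a
candidate for a band only if it can represent the largest absolute
coefficient in that band.

```c
#include <assert.h>

/* Largest absolute coefficient each spectral codebook can represent
 * (values copied from aac_cb_info[] in the attached patch). */
static const int cb_maxval[12] = { 0, 1, 1, 2, 2, 4, 4, 7, 7, 12, 12, 8191 };

/* Nonzero if codebook cb can code a band whose peak magnitude is maxval. */
static int codebook_fits(int cb, int maxval)
{
    assert(cb >= 0 && cb < 12);
    return cb_maxval[cb] >= maxval;
}

/* Lowest-numbered codebook that fits, or -1 if none does. */
static int first_fitting_codebook(int maxval)
{
    int cb;

    for (cb = 0; cb < 12; cb++)
        if (codebook_fits(cb, maxval))
            return cb;
    return -1;
}
```

Codebooks that do not fit get INT_MAX bits in the search, so they are never
selected.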
 
[...]
> 
> > +    BandCodingPath path[64];                     ///< auxiliary data needed for optimal band info coding
> > +    int band_bits[64][12];                       ///< bits needed to encode each band with each codebook
> >  } AACEncContext;
> >  
> >  /**
> 
> I think they could be local variables
 
no problem
 
> [...]
> > @@ -210,7 +326,7 @@
> >  static void put_ics_info(AVCodecContext *avctx, IndividualChannelStream *info)
> >  {
> >      AACEncContext *s = avctx->priv_data;
> > -    int i;
> > +    int wg;
> >  
> >      put_bits(&s->pb, 1, 0);                // ics_reserved bit
> >      put_bits(&s->pb, 2, info->window_sequence[0]);
> > @@ -220,8 +336,295 @@
> >          put_bits(&s->pb, 1, 0);            // no prediction
> >      }else{
> >          put_bits(&s->pb, 4, info->max_sfb);
> > -        for(i = 1; i < info->num_windows; i++)
> > -            put_bits(&s->pb, 1, info->group_len[i]);
> 
> > +        for(wg = 0; wg < info->num_window_groups; wg++){
> > +            if(wg)
> > +                put_bits(&s->pb, 1, 0);
> > +            if(info->group_len[wg] > 1)
> > +                put_sbits(&s->pb, info->group_len[wg] - 1, 0xFF);
> > +        }
> 
> Is this correct? Isn't it if(info->group_len[wg] > 1) else instead of if(wg)?

It is. The first grouping bit is always 0 and thus is not stored.
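
A small standalone sketch of the resulting 7-bit field for eight short
windows (grouping_bits is a hypothetical helper, not part of the patch).
Every window after the first contributes one bit: 0 if it starts a new
group, 1 if it joins the previous window's group.

```c
#include <assert.h>

/* Build the 7-bit grouping field from the window group lengths.
 * The first window always starts a group, so it contributes no bit. */
static unsigned grouping_bits(const int *group_len, int num_groups)
{
    unsigned bits = 0;
    int wg, k, windows = 0;

    for (wg = 0; wg < num_groups; wg++) {
        assert(group_len[wg] >= 1);
        if (wg)
            bits <<= 1;                 /* 0: window starts a new group */
        for (k = 1; k < group_len[wg]; k++)
            bits = (bits << 1) | 1;     /* 1: window joins the previous one */
        windows += group_len[wg];
    }
    assert(windows == 8);               /* short-window frames only */
    return bits;
}
```

For group lengths {3, 1, 4} this yields the bit string 1100111.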
 
> [...]
> > +/**
> > + * Calculate the number of bits needed to code given band with given codebook.
> > + *
> > + * @param s       encoder context
> > + * @param cpe     channel element
> > + * @param channel channel number inside channel pair
> > + * @param win     window group start number
> > + * @param start   scalefactor band position in spectral coefficients
> > + * @param size    scalefactor band size
> > + * @param cb      codebook number
> > + */
> > +static int calculate_band_bits(AACEncContext *s, ChannelElement *cpe, int channel, int win, int group_len, int start, int size, int cb)
> > +{
> > +    int i, j, w;
> > +    int score = 0, dim, idx, start2;
> > +    int range = aac_cb_info[cb].range;
> > +
> > +    if(!range) return 0;
> > +    cb--;
> > +    dim = cb < FIRST_PAIR_BT ? 4 : 2;
> > +
> > +    start2 = start;
> > +    if(cb == ESC_BT){
> > +        int coef_abs[2];
> > +        for(w = win; w < win + group_len; w++){
> > +            for(i = start2; i < start2 + size; i += dim){
> 
> > +                idx = 0;
> > +                for(j = 0; j < dim; j++){
> > +                    coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> > +                    idx = idx*17 + FFMIN(coef_abs[j], 16);
> > +                }
> > +                score += ff_aac_spectral_bits[cb][idx];
> > +                for(j = 0; j < dim; j++)
> > +                    if(cpe->ch[channel].icoefs[i+j])
> > +                        score++;
> > +                for(j = 0; j < dim; j++)
> > +                    if(coef_abs[j] > 15)
> > +                        score += av_log2(coef_abs[j]) * 2 - 4 + 1;
> 
> the loops still can be merged

merged 
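
For reference, the escape cost added in that loop
(av_log2(coef_abs[j]) * 2 - 4 + 1) follows from how an escape is written: a
prefix of N-4 one bits and a terminating zero, then the low N bits of the
value, where N = floor(log2(value)). A standalone sketch (ilog2 and
escape_bits are hypothetical names; ilog2 stands in for av_log2):

```c
#include <assert.h>

/* floor(log2(v)) for v > 0, standing in for av_log2(). */
static int ilog2(unsigned v)
{
    int n = 0;

    while (v >>= 1)
        n++;
    return n;
}

/* Extra bits spent on the escape sequence of one coefficient magnitude. */
static int escape_bits(int coef_abs)
{
    int n;

    assert(coef_abs >= 16 && coef_abs <= 8191);
    n = ilog2((unsigned)coef_abs);
    return (n - 4 + 1) + n;   /* unary prefix + mantissa */
}
```

So magnitudes 16..31 cost 5 extra bits, 32..63 cost 7, and so on.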
 
> > +            }
> > +            start2 += 128;
> > +       }
> > +    }else if(IS_CODEBOOK_UNSIGNED(cb)){
> > +        for(w = win; w < win + group_len; w++){
> > +            for(i = start2; i < start2 + size; i += dim){
> > +                idx = 0;
> > +                for(j = 0; j < dim; j++)
> > +                    idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
> > +                score += ff_aac_spectral_bits[cb][idx];
> > +                for(j = 0; j < dim; j++)
> > +                     if(cpe->ch[channel].icoefs[i+j])
> > +                         score++;
> > +            }
> 
> The sign bits have the same effect on all unsigned codebooks,
> thus they are also redundantly calculated here.

done
 
> [...]
> > +/**
> > + * Encode one scalefactor band with selected codebook.
> > + */
> 
> encode the coefficients of one ...

corrected 
 
> > +static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel, int start, int size, int cb)
> > +{
> > +    const uint8_t  *bits  = ff_aac_spectral_bits [cb - 1];
> > +    const uint16_t *codes = ff_aac_spectral_codes[cb - 1];
> > +    const int range = aac_cb_info[cb].range;
> > +    const int dim = (cb < FIRST_PAIR_BT) ? 4 : 2;
> > +    int i, j, idx;
> > +
> > +    //do not encode zero or special codebooks
> > +    if(range == -1) return;
> > +
> > +    if(cb == ESC_BT){
> > +        int coef_abs[2];
> > +        for(i = start; i < start + size; i += dim){
> > +            idx = 0;
> > +            for(j = 0; j < dim; j++){
> > +                coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> > +                idx = idx*17 + FFMIN(coef_abs[j], 16);
> > +            }
> > +            put_bits(&s->pb, bits[idx], codes[idx]);
> > +            //output signs
> > +            for(j = 0; j < dim; j++)
> > +                if(cpe->ch[channel].icoefs[i+j])
> > +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> > +            //output escape values
> > +            for(j = 0; j < dim; j++)
> > +                if(coef_abs[j] > 15){
> > +                    int len = av_log2(coef_abs[j]);
> > +
> > +                    put_bits(&s->pb, len - 4 + 1, (1 << (len - 4 + 1)) - 2);
> > +                    put_bits(&s->pb, len, coef_abs[j] & ((1 << len) - 1));
> > +                }
> > +        }
> > +    }else if(IS_CODEBOOK_UNSIGNED(cb)){
> > +        for(i = start; i < start + size; i += dim){
> > +            idx = 0;
> > +            for(j = 0; j < dim; j++)
> > +                idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
> > +            put_bits(&s->pb, bits[idx], codes[idx]);
> > +            //output signs
> > +            for(j = 0; j < dim; j++)
> > +                if(cpe->ch[channel].icoefs[i+j])
> > +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> > +        }
> > +    }else{
> > +        for(i = start; i < start + size; i += dim){
> 
> > +            idx = 0;
> > +            for(j = 0; j < dim; j++)
> > +                idx = idx * range + cpe->ch[channel].icoefs[i+j];
> 
> idx=cpe->ch[channel].icoefs[i];
> for(j=1; j<dim; j++)
>     idx = idx * range + cpe->ch[channel].icoefs[i+j];

applied where possible 
 
> > +            //it turned out that all signed codebooks use the same offset for index coding
> > +            idx += 40;
> > +            put_bits(&s->pb, bits[idx], codes[idx]);
> > +        }
> > +    }
> > +}
> 
> [...]
> > +/**
> > + * Encode scalefactors.
> > + */
> > +static void encode_scale_factors(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel, int global_gain)
> > +{
> > +    int off = global_gain, diff;
> > +    int i, w, wg;
> > +
> > +    w = 0;
> > +    for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
> > +        for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> > +            if(!cpe->ch[channel].zeroes[w*16 + i]){
> > +                /* if we have encountered scale=256 it means empty band
> > +                 * which was decided to be coded by encoder, so assign it
> > +                 * last scalefactor value for compression efficiency
> > +                 */
> > +                if(cpe->ch[channel].sf_idx[w*16 + i] == 256)
> > +                    cpe->ch[channel].sf_idx[w*16 + i] = off;
> 
> Why is the code that selects scalefactors not simply setting it to the last
> scale factor?

It may also occur before the first band that has a scale.
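
A standalone sketch of why the substitution is done at coding time
(scalefactor_diffs is a hypothetical helper; SCALE_DIFF_ZERO = 60 is my
assumption for the index of a zero difference). A placeholder that appears
before any real scalefactor can only fall back to global_gain, which is
known here but not necessarily where scalefactors are selected:

```c
#include <assert.h>

#define SF_UNSET        256  /* placeholder from the patch: no scale chosen */
#define SCALE_DIFF_ZERO 60   /* assumed index of a zero difference */

/* Replace placeholders with the previous scalefactor (or global_gain for
 * leading ones) and compute the difference index coded for each band.
 * Returns 0 on success, -1 if a difference cannot be coded. */
static int scalefactor_diffs(int *sf_idx, int n, int global_gain, int *diff)
{
    int i, off = global_gain;

    for (i = 0; i < n; i++) {
        if (sf_idx[i] == SF_UNSET)
            sf_idx[i] = off;            /* costs the shortest code */
        diff[i] = sf_idx[i] - off + SCALE_DIFF_ZERO;
        if (diff[i] < 0 || diff[i] > 120)
            return -1;
        off = sf_idx[i];
    }
    return 0;
}
```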
 
> [...]
> 
> > @@ -254,12 +697,12 @@
> >      for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
> >          start = 0;
> >          for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> > -            if(cpe->ch[channel].zeroes[w][i]){
> > +            if(cpe->ch[channel].zeroes[w*16 + i]){
> >                  start += cpe->ch[channel].ics.swb_sizes[i];
> >                  continue;
> >              }
> >              for(w2 = w; w2 < w + cpe->ch[channel].ics.group_len[wg]; w2++){
> > -                encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w][i]);
> > +                encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w*16 + i]);
> >              }
> >              start += cpe->ch[channel].ics.swb_sizes[i];
> >          }
> 
> ok
> 
> 
> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> If a bugfix only changes things apparently unrelated to the bug with no
> further explanation, that is a good sign that the bugfix is wrong.
-------------- next part --------------
--- /Users/kshishkov/devel/ffmpeg/libavcodec/aacenc.c	2008-08-17 14:44:07.000000000 +0300
+++ aacenc.c	2008-08-17 14:42:17.000000000 +0300
@@ -118,6 +118,29 @@
     swb_size_128_16, swb_size_128_16, swb_size_128_8
 };
 
+/** spectral coefficients codebook information */
+static const struct {
+    int16_t maxval;         ///< maximum possible value
+     int8_t range;          ///< value used in vector calculation
+} aac_cb_info[] = {
+    {    0, -1 }, // zero codebook
+    {    1,  3 },
+    {    1,  3 },
+    {    2,  3 },
+    {    2,  3 },
+    {    4,  9 },
+    {    4,  9 },
+    {    7,  8 },
+    {    7,  8 },
+    {   12, 13 },
+    {   12, 13 },
+    { 8191, 17 },
+    {   -1, -1 }, // reserved
+    {   -1, -1 }, // perceptual noise substitution
+    {   -1, -1 }, // intensity out-of-phase
+    {   -1, -1 }, // intensity in-phase
+};
+
 /** bits needed to code codebook run value for long windows */
 static const uint8_t run_value_bits_long[64] = {
      5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
@@ -162,6 +185,16 @@
     MDCTContext mdct1024;                        ///< long (1024 samples) frame transform context
     MDCTContext mdct128;                         ///< short (128 samples) frame transform context
     DSPContext  dsp;
+    DECLARE_ALIGNED_16(FFTSample, output[2048]); ///< temporary buffer for MDCT input coefficients
+    int16_t* samples;                            ///< saved preprocessed input
+
+    int samplerate_index;                        ///< MPEG-4 samplerate index
+    const uint8_t *swb_sizes1024;                ///< scalefactor band sizes for long frame
+    int swb_num1024;                             ///< number of scalefactor bands for long frame
+    const uint8_t *swb_sizes128;                 ///< scalefactor band sizes for short frame
+    int swb_num128;                              ///< number of scalefactor bands for short frame
+
+    ChannelElement *cpe;                         ///< channel elements
     AACPsyContext psy;                           ///< psychoacoustic model context
     int last_frame;
 } AACEncContext;
@@ -221,7 +254,9 @@
 
     s->samples = av_malloc(2 * 1024 * avctx->channels * sizeof(s->samples[0]));
     s->cpe = av_mallocz(sizeof(ChannelElement) * aac_chan_configs[avctx->channels-1][0]);
-    if(ff_aac_psy_init(&s->psy, avctx, AAC_PSY_3GPP, aac_chan_configs[avctx->channels-1][0], 0, s->swb_sizes1024, s->swb_num1024, s->swb_sizes128, s->swb_num128) < 0){
+    if(ff_aac_psy_init(&s->psy, avctx, AAC_PSY_3GPP,
+                       aac_chan_configs[avctx->channels-1][0], 0,
+                       s->swb_sizes1024, s->swb_num1024, s->swb_sizes128, s->swb_num128) < 0){
         av_log(avctx, AV_LOG_ERROR, "Cannot initialize selected model.\n");
         return -1;
     }
@@ -231,14 +266,65 @@
     return 0;
 }
 
+static void apply_window_and_mdct(AVCodecContext *avctx, AACEncContext *s,
+                                  ChannelElement *cpe, short *audio, int channel)
+{
+    int i, j, k;
+    const float * lwindow = cpe->ch[channel].ics.use_kb_window[0] ? ff_aac_kbd_long_1024 : ff_sine_1024;
+    const float * swindow = cpe->ch[channel].ics.use_kb_window[0] ? ff_aac_kbd_short_128 : ff_sine_128;
+    const float * pwindow = cpe->ch[channel].ics.use_kb_window[1] ? ff_aac_kbd_short_128 : ff_sine_128;
+
+    if (cpe->ch[channel].ics.window_sequence[0] != EIGHT_SHORT_SEQUENCE) {
+        memcpy(s->output, cpe->ch[channel].saved, sizeof(float)*1024);
+        if(cpe->ch[channel].ics.window_sequence[0] == LONG_STOP_SEQUENCE){
+            memset(s->output, 0, sizeof(s->output[0]) * 448);
+            for(i = 448; i < 576; i++)
+                s->output[i] = cpe->ch[channel].saved[i] * pwindow[i - 448];
+            for(i = 576; i < 704; i++)
+                s->output[i] = cpe->ch[channel].saved[i];
+        }
+        if(cpe->ch[channel].ics.window_sequence[0] != LONG_START_SEQUENCE){
+            j = channel;
+            for (i = 0; i < 1024; i++, j += avctx->channels){
+                s->output[i+1024]         = audio[j] * lwindow[1024 - i - 1];
+                cpe->ch[channel].saved[i] = audio[j] * lwindow[i];
+            }
+        }else{
+            j = channel;
+            for(i = 0; i < 448; i++, j += avctx->channels)
+                s->output[i+1024]         = audio[j];
+            for(i = 448; i < 576; i++, j += avctx->channels)
+                s->output[i+1024]         = audio[j] * swindow[576 - i - 1];
+            memset(s->output+1024+576, 0, sizeof(s->output[0]) * 448);
+            j = channel;
+            for(i = 0; i < 1024; i++, j += avctx->channels)
+                cpe->ch[channel].saved[i] = audio[j];
+        }
+        ff_mdct_calc(&s->mdct1024, cpe->ch[channel].coeffs, s->output);
+    }else{
+        j = channel;
+        for (k = 0; k < 1024; k += 128) {
+            for(i = 448 + k; i < 448 + k + 256; i++)
+                s->output[i - 448 - k] = (i < 1024)
+                                         ? cpe->ch[channel].saved[i]
+                                         : audio[channel + (i-1024)*avctx->channels];
+            s->dsp.vector_fmul        (s->output,     k ?  swindow : pwindow, 128);
+            s->dsp.vector_fmul_reverse(s->output+128, s->output+128, swindow, 128);
+            ff_mdct_calc(&s->mdct128, cpe->ch[channel].coeffs + k, s->output);
+        }
+        j = channel;
+        for(i = 0; i < 1024; i++, j += avctx->channels)
+            cpe->ch[channel].saved[i] = audio[j];
+    }
+}
+
 /**
  * Encode ics_info element.
  * @see Table 4.6 (syntax of ics_info)
  */
-static void put_ics_info(AVCodecContext *avctx, IndividualChannelStream *info)
+static void put_ics_info(AACEncContext *s, IndividualChannelStream *info)
 {
-    AACEncContext *s = avctx->priv_data;
-    int i;
+    int wg;
 
     put_bits(&s->pb, 1, 0);                // ics_reserved bit
     put_bits(&s->pb, 2, info->window_sequence[0]);
@@ -248,15 +334,307 @@
         put_bits(&s->pb, 1, 0);            // no prediction
     }else{
         put_bits(&s->pb, 4, info->max_sfb);
-        for(i = 1; i < info->num_windows; i++)
-            put_bits(&s->pb, 1, info->group_len[i]);
+        for(wg = 0; wg < info->num_window_groups; wg++){
+            if(wg)
+                put_bits(&s->pb, 1, 0);
+            if(info->group_len[wg] > 1)
+                put_sbits(&s->pb, info->group_len[wg] - 1, 0xFF);
+        }
+    }
+}
+
+/**
+ * Encode MS data.
+ * @see 4.6.8.1 "Joint Coding - M/S Stereo"
+ */
+static void encode_ms_info(PutBitContext *pb, ChannelElement *cpe)
+{
+    int i, w, wg;
+
+    put_bits(pb, 2, cpe->ms_mode);
+    if(cpe->ms_mode == 1){
+        w = 0;
+        for(wg = 0; wg < cpe->ch[0].ics.num_window_groups; wg++){
+            for(i = 0; i < cpe->ch[0].ics.max_sfb; i++)
+                put_bits(pb, 1, cpe->ms_mask[w + i]);
+            w += cpe->ch[0].ics.group_len[wg]*16;
+        }
+    }
+}
+
+/**
+ * Calculate the number of bits needed to code all coefficient signs in current band.
+ */
+static int calculate_band_sign_bits(AACEncContext *s, ChannelElement *cpe, int channel,
+                                    int win, int group_len, int start, int size)
+{
+    int score = 0, start2 = start;
+    int i, w;
+    for(w = win; w < win + group_len; w++){
+        for(i = start2; i < start2 + size; i++){
+            if(cpe->ch[channel].icoefs[i])
+                score++;
+        }
+        start2 += 128;
+    }
+    return score;
+}
+
+/**
+ * Calculate the number of bits needed to code given band with given codebook.
+ *
+ * @param s       encoder context
+ * @param cpe     channel element
+ * @param channel channel number inside channel pair
+ * @param win     window group start number
+ * @param start   scalefactor band position in spectral coefficients
+ * @param size    scalefactor band size
+ * @param cb      codebook number
+ */
+static int calculate_band_bits(AACEncContext *s, ChannelElement *cpe, int channel,
+                               int win, int group_len, int start, int size, int cb)
+{
+    int i, j, w;
+    int score = 0, dim, idx, start2;
+    int range = aac_cb_info[cb].range;
+
+    if(range == -1) return 0;
+    cb--;
+    dim = cb < FIRST_PAIR_BT ? 4 : 2;
+
+    start2 = start;
+    if(IS_CODEBOOK_UNSIGNED(cb)){
+        int coef_abs[4];    // unsigned quad codebooks read 4 coefficients
+        for(w = win; w < win + group_len; w++){
+            for(i = start2; i < start2 + size; i += dim){
+                idx = 0;
+                for(j = 0; j < dim; j++){
+                    coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
+                    idx = idx * range + FFMIN(coef_abs[j], 16);
+                }
+                score += ff_aac_spectral_bits[cb][idx];
+                for(j = 0; j < dim; j++){
+                    if(cb == ESC_BT && coef_abs[j] > 15)
+                        score += av_log2(coef_abs[j]) * 2 - 4 + 1;
+                }
+            }
+            start2 += 128;
+        }
+    }else{
+        for(w = win; w < win + group_len; w++){
+            for(i = start2; i < start2 + size; i += dim){
+                idx = cpe->ch[channel].icoefs[i];
+                for(j = 1; j < dim; j++)
+                    idx = idx * range + cpe->ch[channel].icoefs[i+j];
+                //it turned out that all signed codebooks use the same offset for index coding
+                idx += 40;
+                score += ff_aac_spectral_bits[cb][idx];
+            }
+            start2 += 128;
+        }
+    }
+    return score;
+}
+
+/**
+ * Encode band info for single window group bands.
+ */
+static void encode_window_bands_info(AACEncContext *s, ChannelElement *cpe,
+                                     int channel, int win, int group_len)
+{
+    BandCodingPath path[64];
+    int band_bits[64][12];
+    int maxval;
+    int w, swb, cb, start, start2, size;
+    int i, j;
+    const int max_sfb = cpe->ch[channel].ics.max_sfb;
+    const int run_bits = cpe->ch[channel].ics.num_windows == 1 ? 5 : 3;
+    const int run_esc = (1 << run_bits) - 1;
+    int bits, sbits, idx, count;
+    int stack[64], stack_len;
+
+    start = win*128;
+    for(swb = 0; swb < max_sfb; swb++){
+        maxval = 0;
+        start2 = start;
+        size = cpe->ch[channel].ics.swb_sizes[swb];
+        if(cpe->ch[channel].zeroes[win*16 + swb])
+            maxval = 0;
+        else{
+            for(w = win; w < win + group_len; w++){
+                for(i = start2; i < start2 + size; i++){
+                    maxval = FFMAX(maxval, FFABS(cpe->ch[channel].icoefs[i]));
+                }
+                start2 += 128;
+            }
+        }
+        sbits = calculate_band_sign_bits(s, cpe, channel, win, group_len, start, size);
+        for(cb = 0; cb < 12; cb++){
+            if(aac_cb_info[cb].maxval < maxval)
+                band_bits[swb][cb] = INT_MAX;
+            else{
+                band_bits[swb][cb] = calculate_band_bits(s, cpe, channel, win, group_len, start, size, cb);
+                if(IS_CODEBOOK_UNSIGNED(cb-1)){
+                    band_bits[swb][cb] += sbits;
+                }
+            }
+        }
+        start += cpe->ch[channel].ics.swb_sizes[swb];
+    }
+    path[0].bits = 0;
+    for(i = 1; i <= max_sfb; i++)
+        path[i].bits = INT_MAX;
+    for(i = 0; i < max_sfb; i++){
+        for(cb = 0; cb < 12; cb++){
+            int sum = 0;
+            for(j = 1; j <= max_sfb - i; j++){
+                if(band_bits[i+j-1][cb] == INT_MAX)
+                    break;
+                sum += band_bits[i+j-1][cb];
+                bits = sum + path[i].bits + run_value_bits[cpe->ch[channel].ics.num_windows == 8][j];
+                if(bits < path[i+j].bits){
+                    path[i+j].bits     = bits;
+                    path[i+j].codebook = cb;
+                    path[i+j].prev_idx = i;
+                }
+            }
+        }
+    }
+    assert(path[max_sfb].bits != INT_MAX);
+
+    //convert resulting path from backward-linked list
+    stack_len = 0;
+    idx = max_sfb;
+    while(idx > 0){
+        stack[stack_len++] = idx;
+        idx = path[idx].prev_idx;
+    }
+
+    //perform actual band info encoding
+    start = 0;
+    for(i = stack_len - 1; i >= 0; i--){
+        put_bits(&s->pb, 4, path[stack[i]].codebook);
+        count = stack[i] - path[stack[i]].prev_idx;
+        memset(cpe->ch[channel].zeroes + win*16 + start, !path[stack[i]].codebook, count);
+        //XXX: memset when band_type is also uint8_t
+        for(j = 0; j < count; j++){
+            cpe->ch[channel].band_type[win*16 + start] =  path[stack[i]].codebook;
+            start++;
+        }
+        while(count >= run_esc){
+            put_bits(&s->pb, run_bits, run_esc);
+            count -= run_esc;
+        }
+        put_bits(&s->pb, run_bits, count);
+    }
+}
+
+/**
+ * Encode the coefficients of one scalefactor band with selected codebook.
+ */
+static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel,
+                               int start, int size, int cb)
+{
+    const uint8_t  *bits  = ff_aac_spectral_bits [cb - 1];
+    const uint16_t *codes = ff_aac_spectral_codes[cb - 1];
+    const int range = aac_cb_info[cb].range;
+    const int dim = (cb < FIRST_PAIR_BT) ? 4 : 2;
+    int i, j, idx;
+
+    //do not encode zero or special codebooks
+    if(range == -1) return;
+
+    if(cb == ESC_BT){
+        int coef_abs[2];
+        for(i = start; i < start + size; i += dim){
+            idx = 0;
+            for(j = 0; j < dim; j++){
+                coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
+                idx = idx*17 + FFMIN(coef_abs[j], 16);
+            }
+            put_bits(&s->pb, bits[idx], codes[idx]);
+            //output signs
+            for(j = 0; j < dim; j++)
+                if(cpe->ch[channel].icoefs[i+j])
+                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
+            //output escape values
+            for(j = 0; j < dim; j++)
+                if(coef_abs[j] > 15){
+                    int len = av_log2(coef_abs[j]);
+
+                    put_bits(&s->pb, len - 4 + 1, (1 << (len - 4 + 1)) - 2);
+                    put_bits(&s->pb, len, coef_abs[j] & ((1 << len) - 1));
+                }
+        }
+    }else if(IS_CODEBOOK_UNSIGNED(cb)){
+        for(i = start; i < start + size; i += dim){
+            idx = FFABS(cpe->ch[channel].icoefs[i]);
+            for(j = 1; j < dim; j++)
+                idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
+            put_bits(&s->pb, bits[idx], codes[idx]);
+            //output signs
+            for(j = 0; j < dim; j++)
+                if(cpe->ch[channel].icoefs[i+j])
+                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
+        }
+    }else{
+        for(i = start; i < start + size; i += dim){
+            idx = cpe->ch[channel].icoefs[i];
+            for(j = 1; j < dim; j++)
+                idx = idx * range + cpe->ch[channel].icoefs[i+j];
+            //it turned out that all signed codebooks use the same offset for index coding
+            idx += 40;
+            put_bits(&s->pb, bits[idx], codes[idx]);
+        }
+    }
+}
+
+/**
+ * Encode scalefactor band coding type.
+ */
+static void encode_band_info(AACEncContext *s, ChannelElement *cpe, int channel)
+{
+    int w, wg;
+
+    w = 0;
+    for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
+        encode_window_bands_info(s, cpe, channel, w, cpe->ch[channel].ics.group_len[wg]);
+        w += cpe->ch[channel].ics.group_len[wg];
+    }
+}
+
+/**
+ * Encode scalefactors.
+ */
+static void encode_scale_factors(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel, int global_gain)
+{
+    int off = global_gain, diff;
+    int i, w, wg;
+
+    w = 0;
+    for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
+        for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
+            if(!cpe->ch[channel].zeroes[w*16 + i]){
+                /* sf_idx == 256 marks an empty band that the encoder
+                 * decided to code anyway; reuse the previous scalefactor
+                 * so the coded difference is zero (the cheapest case)
+                 */
+                if(cpe->ch[channel].sf_idx[w*16 + i] == 256)
+                    cpe->ch[channel].sf_idx[w*16 + i] = off;
+                diff = cpe->ch[channel].sf_idx[w*16 + i] - off + SCALE_DIFF_ZERO;
+                if(diff < 0 || diff > 120) av_log(avctx, AV_LOG_ERROR, "Scalefactor difference is too big to be coded\n");
+                off = cpe->ch[channel].sf_idx[w*16 + i];
+                put_bits(&s->pb, ff_aac_scalefactor_bits[diff], ff_aac_scalefactor_code[diff]);
+            }
+        }
+        w += cpe->ch[channel].ics.group_len[wg];
     }
 }
 
 /**
  * Encode pulse data.
  */
-static void encode_pulses(AVCodecContext *avctx, AACEncContext *s, Pulse *pulse, int channel)
+static void encode_pulses(AACEncContext *s, Pulse *pulse, int channel)
 {
     int i;
 
@@ -272,9 +650,50 @@
 }
 
 /**
+ * Encode temporal noise shaping data.
+ */
+static void encode_tns_data(AACEncContext *s, ChannelElement *cpe, int channel)
+{
+    int i, w;
+    TemporalNoiseShaping *tns = &cpe->ch[channel].tns;
+
+    put_bits(&s->pb, 1, tns->present);
+    if(!tns->present) return;
+    if(cpe->ch[channel].ics.window_sequence[0] == EIGHT_SHORT_SEQUENCE){
+        for(w = 0; w < cpe->ch[channel].ics.num_windows; w++){
+            put_bits(&s->pb, 1, tns->n_filt[w]);
+            if(!tns->n_filt[w]) continue;
+            put_bits(&s->pb, 1, tns->coef_res[w] - 3);
+            put_bits(&s->pb, 4, tns->length[w][0]);
+            put_bits(&s->pb, 3, tns->order[w][0]);
+            if(tns->order[w][0]){
+                put_bits(&s->pb, 1, tns->direction[w][0]);
+                put_bits(&s->pb, 1, tns->coef_compress[w][0]);
+                for(i = 0; i < tns->order[w][0]; i++)
+                     put_bits(&s->pb, tns->coef_len[w][0], tns->coef[w][0][i]);
+            }
+        }
+    }else{
+        put_bits(&s->pb, 1, tns->n_filt[0]);
+        if(!tns->n_filt[0]) return;
+        put_bits(&s->pb, 1, tns->coef_res[0] - 3);
+        for(w = 0; w < tns->n_filt[0]; w++){
+            put_bits(&s->pb, 6, tns->length[0][w]);
+            put_bits(&s->pb, 5, tns->order[0][w]);
+            if(tns->order[0][w]){
+                put_bits(&s->pb, 1, tns->direction[0][w]);
+                put_bits(&s->pb, 1, tns->coef_compress[0][w]);
+                for(i = 0; i < tns->order[0][w]; i++)
+                     put_bits(&s->pb, tns->coef_len[0][w], tns->coef[0][w][i]);
+            }
+        }
+    }
+}
+
+/**
  * Encode spectral coefficients processed by psychoacoustic model.
  */
-static void encode_spectral_coeffs(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
+static void encode_spectral_coeffs(AACEncContext *s, ChannelElement *cpe, int channel)
 {
     int start, i, w, w2, wg;
 
@@ -287,7 +706,9 @@
                 continue;
             }
             for(w2 = w; w2 < w + cpe->ch[channel].ics.group_len[wg]; w2++){
-                encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w*16 + i]);
+                encode_band_coeffs(s, cpe, channel, start + w2*128,
+                                   cpe->ch[channel].ics.swb_sizes[i],
+                                   cpe->ch[channel].band_type[w*16 + i]);
             }
             start += cpe->ch[channel].ics.swb_sizes[i];
         }
@@ -296,6 +717,38 @@
 }
 
 /**
+ * Encode one channel of audio data.
+ */
+static int encode_individual_channel(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
+{
+    int g, w, wg;
+    int global_gain = 0;
+
+    //use the first coded scalefactor as global gain, as the standard recommends
+    w = 0;
+    for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
+        for(g = 0; g < cpe->ch[channel].ics.max_sfb; g++){
+            if(!cpe->ch[channel].zeroes[w + g]){
+                global_gain = cpe->ch[channel].sf_idx[w + g];
+                break;
+            }
+        }
+        if(global_gain) break;
+        w += cpe->ch[channel].ics.group_len[wg]*16;
+    }
+
+    put_bits(&s->pb, 8, global_gain);
+    if(!cpe->common_window) put_ics_info(s, &cpe->ch[channel].ics);
+    encode_band_info(s, cpe, channel);
+    encode_scale_factors(avctx, s, cpe, channel, global_gain);
+    encode_pulses(s, &cpe->ch[channel].pulse, channel);
+    encode_tns_data(s, cpe, channel);
+    put_bits(&s->pb, 1, 0); //gain_control_data_present (SSR), never used here
+    encode_spectral_coeffs(s, cpe, channel);
+    return 0;
+}
+
+/**
  * Write some auxiliary information about the created AAC file.
  */
 static void put_bitstream_info(AVCodecContext *avctx, AACEncContext *s, const char *name)
@@ -315,6 +768,80 @@
     put_bits(&s->pb, 12 - padbits, 0);
 }
 
+static int aac_encode_frame(AVCodecContext *avctx,
+                            uint8_t *frame, int buf_size, void *data)
+{
+    AACEncContext *s = avctx->priv_data;
+    int16_t *samples = s->samples, *samples2, *la;
+    ChannelElement *cpe;
+    int i, j, chans, tag, start_ch;
+    const uint8_t *chan_map = aac_chan_configs[avctx->channels-1];
+    int chan_el_counter[4];
+
+    if(s->last_frame)
+        return 0;
+    if(data){
+        if((s->psy.flags & PSY_MODEL_NO_PREPROC) == PSY_MODEL_NO_PREPROC){
+            memcpy(s->samples + 1024 * avctx->channels, data, 1024 * avctx->channels * sizeof(s->samples[0]));
+        }else{
+            start_ch = 0;
+            samples2 = s->samples + 1024 * avctx->channels;
+            for(i = 0; i < chan_map[0]; i++){
+                tag = chan_map[i+1];
+                chans = tag == TYPE_CPE ? 2 : 1;
+                ff_aac_psy_preprocess(&s->psy, (uint16_t*)data + start_ch, samples2 + start_ch, i, tag);
+                start_ch += chans;
+            }
+        }
+    }
+    if(!avctx->frame_number){
+        memcpy(s->samples, s->samples + 1024 * avctx->channels, 1024 * avctx->channels * sizeof(s->samples[0]));
+        return 0;
+    }
+
+    init_put_bits(&s->pb, frame, buf_size*8);
+    if((avctx->frame_number & 0xFF)==1 && !(avctx->flags & CODEC_FLAG_BITEXACT)){
+        put_bitstream_info(avctx, s, LIBAVCODEC_IDENT);
+    }
+    start_ch = 0;
+    memset(chan_el_counter, 0, sizeof(chan_el_counter));
+    for(i = 0; i < chan_map[0]; i++){
+        tag = chan_map[i+1];
+        chans = tag == TYPE_CPE ? 2 : 1;
+        cpe = &s->cpe[i];
+        samples2 = samples + start_ch;
+        la = samples2 + 1024 * avctx->channels + start_ch;
+        if(!data) la = NULL;
+        ff_aac_psy_suggest_window(&s->psy, samples2, la, i, tag, cpe);
+        for(j = 0; j < chans; j++){
+            apply_window_and_mdct(avctx, s, cpe, samples2, j);
+        }
+        ff_aac_psy_analyze(&s->psy, i, tag, cpe);
+        put_bits(&s->pb, 3, tag);
+        put_bits(&s->pb, 4, chan_el_counter[tag]++);
+        if(chans == 2){
+            put_bits(&s->pb, 1, cpe->common_window);
+            if(cpe->common_window){
+                put_ics_info(s, &cpe->ch[0].ics);
+                encode_ms_info(&s->pb, cpe);
+            }
+        }
+        for(j = 0; j < chans; j++){
+            encode_individual_channel(avctx, s, cpe, j);
+        }
+        start_ch += chans;
+    }
+
+    put_bits(&s->pb, 3, TYPE_END);
+    flush_put_bits(&s->pb);
+    avctx->frame_bits = put_bits_count(&s->pb);
+
+    if(!data)
+        s->last_frame = 1;
+    memcpy(s->samples, s->samples + 1024 * avctx->channels, 1024 * avctx->channels * sizeof(s->samples[0]));
+    return put_bits_count(&s->pb)>>3;
+}
+
 static av_cold int aac_encode_end(AVCodecContext *avctx)
 {
     AACEncContext *s = avctx->priv_data;


