[FFmpeg-devel] [RFC] AAC Encoder

Michael Niedermayer michaelni
Fri Aug 15 21:05:27 CEST 2008


On Fri, Aug 15, 2008 at 07:59:52PM +0300, Kostya wrote:
> On Thu, Aug 14, 2008 at 03:38:17PM +0200, Michael Niedermayer wrote:
> > viterbi for determining band_types ...
> > look this isnt hard, its not even slow in this paricular case,let me explain
> [explanation skipped]
> 
> Hmm, I have not understood it at the beginning but then I found out it's
> strikingly similar to the one-pass almost-optimal LZ matching scheme.
> So here it is with other comments taken care of too.
> 
> P.S. I was surprised to find out that I won't be near computer next week
> so I'll try to make encoder fit for committing ASAP.

[...]

> --- /home/kst/cvs-get/ffmpeg/libavcodec/aacenc.c	2008-08-14 19:01:30.000000000 +0300
> +++ aacenc.c	2008-08-15 19:50:22.000000000 +0300
> @@ -27,8 +27,7 @@
>  /***********************************
>   *              TODOs:
>   * psy model selection with some option
> - * change greedy codebook search into something more optimal, like Viterbi algorithm
> - * determine run lengths along with codebook
> + * add sane pulse detection
>   ***********************************/
>  
>  #include "avcodec.h"

ok


> @@ -119,6 +118,34 @@
>      swb_size_128_16, swb_size_128_16, swb_size_128_8
>  };
>  
> +#define CB_UNSIGNED 0x01    ///< coefficients are coded as absolute values
> +#define CB_PAIRS    0x02    ///< coefficients are grouped into pairs before coding (quads by default)
> +#define CB_ESCAPE   0x04    ///< codebook allows escapes
> +
> +/** spectral coefficients codebook information */
> +static const struct {
> +    int16_t maxval;         ///< maximum possible value
> +     int8_t cb_num;         ///< codebook number
> +    uint8_t flags;          ///< codebook features
> +} aac_cb_info[] = {
> +    {    0, -1, CB_UNSIGNED }, // zero codebook
> +    {    1,  0, 0 },
> +    {    1,  1, 0 },
> +    {    2,  2, CB_UNSIGNED },
> +    {    2,  3, CB_UNSIGNED },
> +    {    4,  4, CB_PAIRS },
> +    {    4,  5, CB_PAIRS },
> +    {    7,  6, CB_PAIRS | CB_UNSIGNED },
> +    {    7,  7, CB_PAIRS | CB_UNSIGNED },
> +    {   12,  8, CB_PAIRS | CB_UNSIGNED },
> +    {   12,  9, CB_PAIRS | CB_UNSIGNED },
> +    { 8191, 10, CB_PAIRS | CB_UNSIGNED | CB_ESCAPE },
> +    {   -1, -1, 0 }, // reserved
> +    {   -1, -1, 0 }, // perceptual noise substitution
> +    {   -1, -1, 0 }, // intensity out-of-phase
> +    {   -1, -1, 0 }, // intensity in-phase
> +};

cb_num is useless index-1 will work as well.


[...]

> +/**
> + * AAC encoder context
> + */
> +typedef struct {
> +    PutBitContext pb;
> +    MDCTContext mdct1024;                        ///< long (1024 samples) frame transform context
> +    MDCTContext mdct128;                         ///< short (128 samples) frame transform context
> +    DSPContext  dsp;

ok (and ok always means up to the previous non ">" line)


[...]


> +} AACEncContext;
> +
> +/**
>   * Make AAC audio config object.
>   * @see 1.6.2.1 "Syntax - AudioSpecificConfig"
>   */
> @@ -176,6 +236,11 @@
>      dsputil_init(&s->dsp, avctx);
>      ff_mdct_init(&s->mdct1024, 11, 0);
>      ff_mdct_init(&s->mdct128,   8, 0);
> +    // window init
> +    ff_kbd_window_init(ff_aac_kbd_long_1024, 4.0, 1024);
> +    ff_kbd_window_init(ff_aac_kbd_short_128, 6.0, 128);
> +    ff_sine_window_init(ff_sine_1024, 1024);
> +    ff_sine_window_init(ff_sine_128, 128);
>  
>      s->samples = av_malloc(2 * 1024 * avctx->channels * sizeof(s->samples[0]));
>      s->cpe = av_mallocz(sizeof(ChannelElement) * aac_chan_configs[avctx->channels-1][0]);

ok


[...]
> +/**
> + * Encode MS data.
> + * @see 4.6.8.1 "Joint Coding - M/S Stereo"
> + */
> +static void encode_ms_info(PutBitContext *pb, ChannelElement *cpe)
> +{
> +    int i, w, wg;
> +
> +    put_bits(pb, 2, cpe->ms.present);
> +    if(cpe->ms.present == 1){
> +        w = 0;
> +        for(wg = 0; wg < cpe->ch[0].ics.num_window_groups; wg++){
> +            for(i = 0; i < cpe->ch[0].ics.max_sfb; i++)
> +                put_bits(pb, 1, cpe->ms.mask[w][i]);
> +            w += cpe->ch[0].ics.group_len[wg];
> +        }
>      }
>  }

this will not work with the data structs of the decoder
ms_mask is 120 elements
also the new group_len is still leaving holes in the arrays, its
surely better now as it doesnt loop over the 0 elements anymore but
they are still there.
I do not see why they should be there, it does not appear that there
is any advantage in them being there ... but if iam wrong iam sure you
will explain what they are good for?


>  
>  /**
> + * Return number of bits needed to write codebook run length value.
> + *
> + * @param run     run length
> + * @param bits    number of bits used to code value (5 for long frames, 3 for short frames)
> + */
> +static av_always_inline int calculate_run_bits(int run, const int bits)
> +{
> +    int esc = (1 << bits) - 1;
> +    return (1 + (run >= esc)) * bits;
> +}

I think a table would be simpler and faster.


> +
> +/**
> + * Calculate the number of bits needed to code given band with given codebook.
> + *
> + * @param s       encoder context
> + * @param cpe     channel element
> + * @param channel channel number inside channel pair
> + * @param win     window group start number
> + * @param start   scalefactor band position in spectral coefficients
> + * @param size    scalefactor band size
> + * @param cb      codebook number
> + */
> +static int calculate_band_bits(AACEncContext *s, ChannelElement *cpe, int channel, int win, int group_len, int start, int size, int cb)
> +{
> +    int i, j, w;
> +    int score = 0, dim, idx, start2;
> +    int range;
> +
> +    if(!cb) return 0;
> +    cb--;

> +    dim = (aac_cb_info[cb].flags & CB_PAIRS) ? 2 : 4;

as CB_PAIRS is never used for anything but selecting betweem 2 and 4, it 
would be simpler to store that or maybe drop CB_PAIRS completely and use
the > 123 check like in the decoder


> +    if(aac_cb_info[cb].flags & CB_UNSIGNED)
> +        range = aac_cb_info[cb].maxval + 1;
> +    else
> +        range = aac_cb_info[cb].maxval*2 + 1;

it would be simpler to store range in the table i think.


> +
> +    start2 = start;
> +    if(aac_cb_info[cb].flags & CB_ESCAPE){
> +        int coef_abs[2];
> +        for(w = win; w < win + group_len; w++){
> +            for(i = start2; i < start2 + size; i += dim){
> +                idx = 0;
> +                for(j = 0; j < dim; j++)
> +                    coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> +                for(j = 0; j < dim; j++)
> +                    idx = idx*17 + FFMIN(coef_abs[j], 16);
> +                score += ff_aac_spectral_bits[cb][idx];
> +                for(j = 0; j < dim; j++)
> +                    if(cpe->ch[channel].icoefs[i+j])
> +                        score++;
> +                for(j = 0; j < dim; j++)
> +                    if(coef_abs[j] > 15)
> +                        score += av_log2(coef_abs[j]) * 2 - 4 + 1;

please merge the
for(j = 0; j < dim; j++)
loops
and this applies to more than just the part above


> +            }
> +            start2 += 128;
> +       }
> +    }else if(aac_cb_info[cb].flags & CB_UNSIGNED){
> +        for(w = win; w < win + group_len; w++){
> +            for(i = start2; i < start2 + size; i += dim){
> +                idx = 0;
> +                for(j = 0; j < dim; j++)
> +                    idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
> +                score += ff_aac_spectral_bits[cb][idx];
> +                for(j = 0; j < dim; j++)
> +                     if(cpe->ch[channel].icoefs[i+j])
> +                         score++;
> +            }
> +            start2 += 128;
> +        }
> +    }else{
> +        for(w = win; w < win + group_len; w++){
> +            for(i = start2; i < start2 + size; i += dim){
> +                idx = 0;

> +                for(j = 0; j < dim; j++)
> +                    idx = idx * range + cpe->ch[channel].icoefs[i+j] + aac_cb_info[cb].maxval;
> +                score += ff_aac_spectral_bits[cb][idx];
> +            }

the addition of maxval can be factored out of the dim loop, its effect is just
the addition of a constant to idx no matter what icoefs contains.


> +            start2 += 128;
> +        }
> +    }
> +    return score;
> +}
> +

> +/**
> + * Encode band info for single window group bands.
> + */
> +static void encode_window_bands_info(AACEncContext *s, ChannelElement *cpe, int channel, int win, int group_len){
> +    int maxval;
> +    int w, swb, cb, ccb, start, start2, size;
> +    int i, j, k;
> +    const int max_sfb = cpe->ch[channel].ics.max_sfb;
> +    const int run_bits = cpe->ch[channel].ics.num_windows == 1 ? 5 : 3;
> +    const int run_esc = (1 << run_bits) - 1;
> +    int bits, idx, count;
> +    int stack[64], stack_len;
> +
> +    start = win*128;
> +    for(swb = 0; swb < max_sfb; swb++){
> +        maxval = 0;
> +        start2 = start;
> +        size = cpe->ch[channel].ics.swb_sizes[swb];
> +        if(cpe->ch[channel].zeroes[win][swb])
> +            maxval = 0;
> +        else{
> +            for(w = win; w < win + group_len; w++){
> +                for(i = start2; i < start2 + size; i++){
> +                    maxval = FFMAX(maxval, FFABS(cpe->ch[channel].icoefs[i]));
> +                }
> +                start2 += 128;
> +            }
> +        }
> +        for(cb = 0; cb < 12; cb++){
> +            if(aac_cb_info[cb].maxval < maxval)
> +                s->band_bits[swb][cb] = INT_MAX;
> +            else
> +                s->band_bits[swb][cb] = calculate_band_bits(s, cpe, channel, win, group_len, start, size, cb);
> +        }
> +        start += cpe->ch[channel].ics.swb_sizes[swb];
> +    }

> +    s->path[0].bits = 0;
> +    for(i = 1; i <= max_sfb; i++)
> +        s->path[i].bits = INT_MAX;
> +    for(i = 0; i < max_sfb; i++){
> +        for(j = 1; j <= max_sfb - i; j++){
> +            bits = INT_MAX;
> +            ccb = 0;
> +            for(cb = 0; cb < 12; cb++){
> +                int sum = 0;
> +                for(k = 0; k < j; k++){
> +                    if(s->band_bits[i + k][cb] == INT_MAX){
> +                        sum = INT_MAX;
> +                        break;
> +                    }
> +                    sum += s->band_bits[i + k][cb];
> +                }
> +                if(sum < bits){
> +                    bits = sum;
> +                    ccb  = cb;
> +                }
> +            }
> +            assert(bits != INT_MAX);
> +            bits += s->path[i].bits + calculate_run_bits(j, run_bits);
> +            if(bits < s->path[i+j].bits){
> +                s->path[i+j].bits     = bits;
> +                s->path[i+j].codebook = ccb;
> +                s->path[i+j].prev_idx = i;
> +            }
> +        }
> +    }

hmm this is doing a loop more than it should ...
(note code below ignores [-1] and INT_MAX+a issues)

s->path[-1].bits= 0;
for(i = 0; i < max_sfb; i++){
    s->path[i].bits= INT_MAX;
    for(cb = 0; cb < 12; cb++){
        int sum=0;
        for(k = 0; k <= i; k++){
            sum += s->band_bits[i - k][cb];
            sum2= sum + calculate_run_bits(k, run_bits) + s->path[i-k-1].bits;
            if(sum2 < s->path[i].bits){
                s->path[i].bits= sum2;
                s->path[i].codebook= cb;
                s->path[i].prev_idx= i - k - 1;
            }else if(sum2 - s->path[i].bits > THRESHOLD) // early termination to skip impossible cases
                break;
        }
    }
}


> +
> +    //convert resulting path from backward-linked list
> +    stack_len = 0;
> +    idx = max_sfb;
> +    while(idx > 0){
> +        stack[stack_len++] = idx;
> +        idx = s->path[idx].prev_idx;
> +    }
> +
> +    //perform actual band info encoding
> +    start = 0;
> +    for(i = stack_len - 1; i >= 0; i--){
> +        put_bits(&s->pb, 4, s->path[stack[i]].codebook);
> +        count = stack[i] - s->path[stack[i]].prev_idx;

> +        for(j = 0; j < count; j++){
> +            cpe->ch[channel].band_type[win][start] =  s->path[stack[i]].codebook;
> +            cpe->ch[channel].zeroes[win][start]    = !s->path[stack[i]].codebook;
> +            start++;
> +        }

memset


> +        while(count >= run_esc){
> +            put_bits(&s->pb, run_bits, run_esc);
> +            count -= run_esc;
> +        }
> +        put_bits(&s->pb, run_bits, count);
> +    }
> +}
> +

> +/**
> + * Encode one scalefactor band with selected codebook.
> + */
> +static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel, int start, int size, int cb)
> +{
> +    const uint8_t  *bits  = ff_aac_spectral_bits [aac_cb_info[cb].cb_num];
> +    const uint16_t *codes = ff_aac_spectral_codes[aac_cb_info[cb].cb_num];
> +    const int dim = (aac_cb_info[cb].flags & CB_PAIRS) ? 2 : 4;
> +    int i, j, idx, range;
> +
> +    if(!bits) return;
> +
> +    if(aac_cb_info[cb].flags & CB_UNSIGNED)
> +        range = aac_cb_info[cb].maxval + 1;
> +    else
> +        range = aac_cb_info[cb].maxval*2 + 1;
> +
> +    if(aac_cb_info[cb].flags & CB_ESCAPE){
> +        int coef_abs[2];
> +        for(i = start; i < start + size; i += dim){
> +            idx = 0;

> +            for(j = 0; j < dim; j++)
> +                coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> +            for(j = 0; j < dim; j++)
> +                idx = idx*17 + FFMIN(coef_abs[j], 16);

the loops can be merged


> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +            //output signs
> +            for(j = 0; j < dim; j++)
> +                if(cpe->ch[channel].icoefs[i+j])
> +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> +            //output escape values
> +            for(j = 0; j < dim; j++)
> +                if(coef_abs[j] > 15){
> +                    int len = av_log2(coef_abs[j]);
> +
> +                    put_bits(&s->pb, len - 4 + 1, (1 << (len - 4 + 1)) - 2);
> +                    put_bits(&s->pb, len, coef_abs[j] & ((1 << len) - 1));
> +                }
> +        }
> +    }else if(aac_cb_info[cb].flags & CB_UNSIGNED){
> +        for(i = start; i < start + size; i += dim){
> +            idx = 0;
> +            for(j = 0; j < dim; j++)
> +                idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +            //output signs
> +            for(j = 0; j < dim; j++)
> +                if(cpe->ch[channel].icoefs[i+j])
> +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> +        }
> +    }else{
> +        for(i = start; i < start + size; i += dim){

> +            idx = 0;
> +            for(j = 0; j < dim; j++)
> +                idx = idx * range + cpe->ch[channel].icoefs[i+j] + aac_cb_info[cb].maxval;

the add maxval can be factored out
SUM(0,i,dim) maxval*range^i is a constant


[...]

> +/**
> + * Encode scalefactors.
> + */
> +static void encode_scale_factors(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel, int global_gain)
> +{
> +    int off = global_gain, diff;
> +    int i, w, wg;
> +
> +    w = 0;
> +    for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
> +        for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> +            if(!cpe->ch[channel].zeroes[w][i]){

> +                if(cpe->ch[channel].sf_idx[w][i] == 256) cpe->ch[channel].sf_idx[w][i] = off;

what is 256 ?
and please write
if(condition)
    statement;

its more readable than
if(condition) statement;
when condition and statement are complex

[...]


> +/**
> + * Encode pulse data.
> + */
> +static void encode_pulses(AVCodecContext *avctx, AACEncContext *s, Pulse *pulse, int channel)
> +{
> +    int i;
> +
> +    put_bits(&s->pb, 1, !!pulse->num_pulse);
> +    if(!pulse->num_pulse) return;
> +
> +    put_bits(&s->pb, 2, pulse->num_pulse - 1);
> +    put_bits(&s->pb, 6, pulse->start);
> +    for(i = 0; i < pulse->num_pulse; i++){
> +        put_bits(&s->pb, 5, pulse->offset[i]);
> +        put_bits(&s->pb, 4, pulse->amp[i]);
> +    }
> +}

ok


[...]
> +
> +/**
> + * Encode spectral coefficients processed by psychoacoustic model.
> + */
> +static void encode_spectral_coeffs(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
> +{
> +    int start, i, w, w2, wg;
> +
> +    w = 0;
> +    for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
> +        start = 0;
> +        for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> +            if(cpe->ch[channel].zeroes[w][i]){
> +                start += cpe->ch[channel].ics.swb_sizes[i];
> +                continue;
> +            }
> +            for(w2 = w; w2 < w + cpe->ch[channel].ics.group_len[wg]; w2++){
> +                encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w][i]);
> +            }
> +            start += cpe->ch[channel].ics.swb_sizes[i];
> +        }
> +        w += cpe->ch[channel].ics.group_len[wg];
> +    }
> +}

ok


> +
> +/**
> + * Encode one channel of audio data.
> + */
> +static int encode_individual_channel(AVCodecContext *avctx, ChannelElement *cpe, int channel)
> +{
> +    AACEncContext *s = avctx->priv_data;
> +    int g, w, wg;

> +    int global_gain;
> +
> +    //determine global gain as standard recommends - the first scalefactor value
> +    global_gain = 0;

declaration and init can be merged


[...]

> +    init_put_bits(&s->pb, frame, buf_size*8);
> +    if(avctx->frame_number==1 && !(avctx->flags & CODEC_FLAG_BITEXACT)){
> +        put_bitstream_info(avctx, s, LIBAVCODEC_IDENT);
> +    }

this still does not look like it is stored in extradata and neither is it
repeated.

[...]

ill review psychoacoustics ASAP, yes i need little pauses :)


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080815/72099d18/attachment.pgp>



More information about the ffmpeg-devel mailing list