[FFmpeg-devel] [RFC] AAC Encoder

Kostya kostya.shishkov
Sun Aug 17 13:46:47 CEST 2008


On Sat, Aug 16, 2008 at 11:06:33PM +0200, Michael Niedermayer wrote:
> On Sat, Aug 16, 2008 at 06:00:39PM +0300, Kostya wrote:
[...]
> > > [...]
> > > > /**
> > > >  * Quantize one coefficient.
> > > >  * @return absolute value of the quantized coefficient
> > > >  * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
> > > >  */
> > > > static av_always_inline int quant(float coef, const float Q)
> > > > {
> > > >     return av_clip((int)(pow(fabsf(coef) * Q, 0.75) + 0.4054), 0, 8191);
> > > > }
> > > 
> > > converting float to int by casting is rather slow on x86
> > > also i do not see why the cliping against 0 is done
> > > 
> > > and where does the 0.4054 come from? How has this value been selected?
> > 
> > ask 3GPP folks, in their spec (there's a reference in the comment above)
> > it's also called MAGIC_NUMBER.
> 
> ideg
> morons
> anyway, its
> 1.0 - 0.5^0.75
> 
> and i seriously doubt this is optimal in the psychoacoustic sense or any
> rate distortion sense.
> It IS optimal in the "least squares distortion but i dont care about the bits"
> sense
> please add a note that this constant needs to be finetuned with listening
> tests or some more solid math!
> 
> 
> > 
> > as for clipping, it seemed more logical than applying FFMIN()
> 
> speaking of cliping, can this even overflow 8191? and if so is it even
> correct to clip?
> most signals do not like being cliped randomly
 
values > 8191 can't be coded with AAC codebook

[...] 
> > Per my understanding, 8 short windows sequence is used for better preserving
> > of sudden change of signal. And the change decision is implemented after
> > the same 3GPP spec.
> 
> thats also my understanding, but i have my doubts about the actual way
> that is used to detect such changes.
> I suspect that  doing the windowd MDCT for both cases and
> picking the one which has a lower sum of absolute coeffs would work better.
> of course this would absolutely NEED a listening test as i may be dead wrong
> iam just applying what i know form video coding to audio ...
 
well, I know a bit too and looks like nothing helps from other fields,
well, almost.

[...]
> > > >         for(ch = 0; ch < chans; ch++){
> > > >             prev_scale = -1;
> > > >             for(w = 0; w < cpe->ch[ch].ics.num_windows; w++){
> > > >                 for(g = 0; g < cpe->ch[ch].ics.num_swb; g++){
> > > >                     g2 = w*16 + g;
> > > 
> > > >                     cpe->ch[ch].zeroes[w][g] = pch->band[ch][g2].thr >= pch->band[ch][g2].energy;
> > > 
> > > how much quality is lost compared to full RD decission ? its just a matter of
> > > checking how many bits this would need which is likely negligible speed wise.
> > > (assuming you can unentangle the threshold check into a distortion
> > >  computation)
> >  
> > well, energy < threshold means resulting band will be zero anyway,
> > and without that check weird values for perceptual entropy start
> > to appear
> 
> iam perfectly fine with making bands that would quantize to all zero, 
> "zero bands" but this again isnt optimal because a band with just a single
> +1 coeff almost certainly would do better as "zero band" as well.

that's a question too. But for that threshold-manipulating tricks are used

> Also what about these noise bands? i think they are never used ...
 
not yet.
 
> >  
> > > >                     if(cpe->ch[ch].zeroes[w][g]) continue;
> > > >                     //spec gives constant for lg() but we scaled it for log2()
> > > >                     cpe->ch[ch].sf_idx[w][g] = (int)(2.66667 * (log2(6.75*pch->band[ch][g2].thr) - log2(pch->band[ch][g2].ffac)));
> > > 
> > > 2.666... * log2(6.75*pch->band[ch][g2].thr / pch->band[ch][g2].ffac)
> > > 
> > > 
> > > 
> > > >                     if(prev_scale != -1)
> > > >                         cpe->ch[ch].sf_idx[w][g] = av_clip(cpe->ch[ch].sf_idx[w][g], prev_scale - SCALE_MAX_DIFF, prev_scale + SCALE_MAX_DIFF);
> > > >                     prev_scale = cpe->ch[ch].sf_idx[w][g];
> > > >                 }
> > > >             }
> > > >         }
> > > >         break;
> > > >     case PSY_MODE_QUALITY:
> > > >         for(ch = 0; ch < chans; ch++){
> > > >             start = 0;
> > > >             for(w = 0; w < cpe->ch[ch].ics.num_windows; w++){
> > > >                 for(g = 0; g < cpe->ch[ch].ics.num_swb; g++){
> > > >                     g2 = w*16 + g;
> > > >                     if(pch->band[ch][g2].thr >= pch->band[ch][g2].energy){
> > > >                         cpe->ch[ch].sf_idx[w][g] = 0;
> > > >                         cpe->ch[ch].zeroes[w][g] = 1;
> > > >                     }else{
> > > >                         cpe->ch[ch].zeroes[w][g] = 0;
> > > >                         cpe->ch[ch].sf_idx[w][g] = (int)(2.66667 * (log2(6.75*pch->band[ch][g2].thr) - log2(pch->band[ch][g2].ffac)));
> > > >                         while(cpe->ch[ch].sf_idx[ch][g] > 3){
> > > >                             float dist = calc_distortion(cpe->ch[ch].coeffs + start, cpe->ch[ch].ics.swb_sizes[g], SCALE_ONE_POS + cpe->ch[ch].sf_idx[ch][g]);
> > > >                             if(dist < pch->band[ch][g2].thr) break;
> > > >                             cpe->ch[ch].sf_idx[ch][g] -= 3;
> > > >                         }
> > > >                     }
> > > 
> > > looks rather similar to the previous cases
> > 
> > partially - it determines primary scalefactor index from threshold
> > in the same way, what it does to thresholds before that and index
> > after that differs. 
> 
> if possible please factorize the code
> 
> 
> [...]
> > > > 
> > > 
> > > > #ifndef CONFIG_HARDCODED_TABLES
> > > >    for (i = 0; i < 316; i++)
> > > >         ff_aac_pow2sf_tab[i] = pow(2, (i - 200)/4.);
> > > > #endif /* CONFIG_HARDCODED_TABLES */
> > > 
> > > this is likely duplicated
> > 
> > it is not. When table is not hardcoded, it should be initialized. 
> 
> so ff_aac_pow2sf_tab is used just by the psychoacoustic model?

in encoder code, yes 
 
> >  
> > > [...]
> > > > void ff_aac_psy_preprocess(AACPsyContext *ctx, int16_t *audio, int16_t *dest, int tag, int type)
> > > > {
> > > >     int chans = type == ID_CPE ? 2 : 1;
> > > >     const int chstride = ctx->avctx->channels;
> > > >     int i, ch;
> > > >     float t[2];
> > > > 
> > > >     if(chans == 1){
> > > 
> > > >         for(ch = 0; ch < chans; ch++){
> > > >             for(i = 0; i < 1024; i++){
> > > >                 dest[i * chstride + ch] = audio[i * chstride + ch];
> > > >             }
> > > >         }
> > > 
> > > memcpy
> > 
> > no, it copies with gaps 
> 
> and chans=1 which makes the loop rather bloated
 
ignore preprocess code for now
 
> [...]
> > > >             }
> > > >             for(ch = 0; ch < 2; ch++)
> > > >                 dest[i * chstride + ch] = av_clip_int16(t[ch]);
> > > 
> > > we have optimized code for converting float to int16, please use it
> >  
> > 
> > I may have overlooked some stuff for now but that's because it's too
> > hot here to think properly.
> 
> come to austra we have 13 ?C here ATM in 2 days its probably 30 again though

will this mail be enough to get a visa? :)
 
> ill review the psy model ASAP, just wanted to reply with the comments above
> first

actual patch will be in reply to the next mail
 
> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> In a rich man's house there is no place to spit but his face.
> -- Diogenes of Sinope




More information about the ffmpeg-devel mailing list