[FFmpeg-devel] [PATCH] AAC Encoder, Round 2

Sun Aug 24 17:44:07 CEST 2008

On Sun, Aug 24, 2008 at 04:10:12PM +0200, Michael Niedermayer wrote:
> On Sun, Aug 24, 2008 at 09:21:26AM +0300, Kostya wrote:
> > On Sat, Aug 23, 2008 at 10:28:03PM +0200, Michael Niedermayer wrote:
> > > On Sat, Aug 23, 2008 at 06:31:30PM +0300, Kostya wrote:
> > > > I'm back (feeling even worse than before but nm).
> > > > 
> > > > Here is $subj in the form of a diff against FFmpeg SVN.
> > > 
> > > now the psy model:
> > > 
> > > [...]
> > > 
> > > > +/**
> > > >   * Calculate Bark value for given line.
> > > >   */
> > > >  static inline float calc_bark(float f)
> > > >  {
> > > >      return 13.3f * atanf(0.00076f * f) + 3.5f * atanf((f / 7500.0f) * (f / 7500.0f));
> > > >  }
> > > 
> > > why does vorbis_dec.c use a slightly different one?
> > 
> > I use generic formula available everywhere.
> > There's a comment in http://svn.xiph.org/trunk/vorbis/lib/scales.h:
> > 
> > /* The bark scale equations are approximations, since the original
> >    table was somewhat hand rolled.  The below are chosen to have the
> >    best possible fit to the rolled tables, thus their somewhat odd
> >    appearance (these are more accurate and over a longer range than
> >    the oft-quoted bark equations found in the texts I have).  The
> >    approximations are valid from 0 - 30kHz (nyquist) or so.
> > 
> >    all f in Hz, z in Bark */
> 
> vorbis_dec uses
> #define BARK(x) \
>     (13.1f*atan(0.00074f*(x))+2.24f*atan(1.85e-8f*(x)*(x))+1e-4f*(x))
> 
> does anyone happen to know why there is a difference?
> One would think a text from xiph would match a codec from xiph ...

That macro comes from an official Vorbis source (it sits right below the
comment I quoted). Why its author (Monty?) chose those constants is partially
explained by that comment.

> >  
> > > except that, i think the previous reviews have not been dealt with yet.
> > > That is, the various suggestions for quality improvement should be tried,
> > > and whatever is better should be adopted.
> > > Also everything that Gabriel Bouvign suggested should be tried.
> > 
> > Err, when I find a way to download them. $20 for a three-page paper is a bit
> > high for me.
> 
> forget the papers, implement what does not depend on pay-per-view papers
> IIRC he said something about scalefactors and 3gpp as well.

He did, but that also influences the psy model interface (see below).

> >  
> > > I do not mind if we leave some of the harder things like viterbi based window
> > > decision to after svn ci, but the majority of the things suggested should
> > > be tried before the code is committed.
> > 
> > Comment on interface then or propose your own.
> > It will be needed to plug any psychoacoustic model.
> > It would also allow finishing the encoder faster and then concentrating on
> > the model(s).
> 
> The split between psy and encoder is odd, to say the least.
> 
> things psy can provide IMHO
> * find perceptual weights per band or per coefficient used for RD
> * find the perceptual distortion between 2 time domain signals
> * find the perceptual distortion between 2 freq domain signals, possibly
>   just a single band or coeff

Since Gabriel recommended exactly that model, I've tried to implement it in the
least intrusive way. Since you demand the highest possible quality, let's discuss
how it should be done.

My proposal (everybody uses slightly different terms, so I may get something wrong):
0. Initialize everything.
1. Perform some input filtering (lowpass, highpass, stereo attenuation, whatever).
2. The model decides the window type (in the distant future it could return
   'undecided' and the encoder would try both).
3. The encoder performs windowing and MDCT (and grouping?).
4. The model calculates perceptual entropy and thresholds.
5. The rate control module in the encoder uses them to produce the final thresholds.
5.1 It may call the psy model to calculate the perceptual distortion for a band.
6. The encoder quantizes the input with scalefactors.
7. The encoder determines and encodes band info and coefficients.
8. Fetch the next frame and go to step 1, unless it was the last frame.
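
To make the split concrete, the steps above could be sketched in C roughly as
follows. This is only an illustration of the call order, not code from the
patch: every name in it (PsyModel, EncoderOps, encode_stream, the stub_*
functions, run_demo) is hypothetical, the thresholds are per coefficient
instead of per band, and window overlap is ignored. The stubs do no real
signal processing; they only record which step ran.

```c
#include <stddef.h>
#include <string.h>

#define FRAME_LEN 1024  /* samples per frame; simplified, no window overlap */

typedef enum { WIN_LONG, WIN_SHORT } WindowType;

/* Hypothetical psy model side: steps 2, 4 and 5.1. */
typedef struct PsyModel {
    WindowType (*decide_window)(const float *samples, size_t n);
    void  (*analyze)(const float *coeffs, size_t n, float *thr, float *pe);
    float (*band_distortion)(const float *a, const float *b, size_t n);
} PsyModel;

/* Hypothetical encoder side: steps 1, 3, 5, 6 and 7. */
typedef struct EncoderOps {
    void (*prefilter)(float *samples, size_t n);
    void (*window_and_mdct)(const float *in, float *coeffs, size_t n, WindowType w);
    void (*ratecontrol)(const PsyModel *psy, float *thr, float pe, size_t n);
    void (*quantize)(const float *coeffs, const float *thr, int *q, size_t n);
    void (*write_frame)(const int *q, size_t n);
} EncoderOps;

/* Step 8: loop over frames, running steps 1..7 in the proposed order. */
static void encode_stream(const PsyModel *psy, const EncoderOps *enc,
                          float *pcm, size_t nb_frames)
{
    static float coeffs[FRAME_LEN], thr[FRAME_LEN];
    static int   quant[FRAME_LEN];

    for (size_t i = 0; i < nb_frames; i++) {
        float *frame = pcm + i * FRAME_LEN;
        float  pe    = 0.0f;

        enc->prefilter(frame, FRAME_LEN);                      /* 1 */
        WindowType w = psy->decide_window(frame, FRAME_LEN);   /* 2 */
        enc->window_and_mdct(frame, coeffs, FRAME_LEN, w);     /* 3 */
        psy->analyze(coeffs, FRAME_LEN, thr, &pe);             /* 4 */
        enc->ratecontrol(psy, thr, pe, FRAME_LEN);             /* 5 (may do 5.1) */
        enc->quantize(coeffs, thr, quant, FRAME_LEN);          /* 6 */
        enc->write_frame(quant, FRAME_LEN);                    /* 7 */
    }
}

/* Stubs below only append their step number to a trace string. */
static char trace[128];

static void mark(char c)
{
    size_t len = strlen(trace);
    if (len + 1 < sizeof(trace)) {
        trace[len]     = c;
        trace[len + 1] = '\0';
    }
}

static void stub_prefilter(float *s, size_t n) { (void)s; (void)n; mark('1'); }

static WindowType stub_window(const float *s, size_t n)
{
    (void)s; (void)n; mark('2');
    return WIN_LONG;
}

static void stub_mdct(const float *in, float *c, size_t n, WindowType w)
{
    (void)w;
    memcpy(c, in, n * sizeof(*c));  /* placeholder, not a real MDCT */
    mark('3');
}

static void stub_analyze(const float *c, size_t n, float *thr, float *pe)
{
    (void)c;
    for (size_t i = 0; i < n; i++)
        thr[i] = 1.0f;              /* placeholder thresholds */
    *pe = 0.0f;
    mark('4');
}

static void stub_ratecontrol(const PsyModel *psy, float *thr, float pe, size_t n)
{
    (void)psy; (void)thr; (void)pe; (void)n; mark('5');
}

static void stub_quantize(const float *c, const float *thr, int *q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        q[i] = (int)(c[i] / thr[i]);
    mark('6');
}

static void stub_write(const int *q, size_t n) { (void)q; (void)n; mark('7'); }

/* Runs at most two silent frames through the stubs and returns the trace. */
const char *run_demo(size_t nb_frames)
{
    static float pcm[2 * FRAME_LEN];
    static const PsyModel   psy = { stub_window, stub_analyze, NULL };
    static const EncoderOps enc = { stub_prefilter, stub_mdct, stub_ratecontrol,
                                    stub_quantize, stub_write };
    if (nb_frames > 2)
        nb_frames = 2;
    trace[0] = '\0';
    encode_stream(&psy, &enc, pcm, nb_frames);
    return trace;
}
```

With this layout a different psy model only has to fill in a PsyModel table,
and the encoder loop stays untouched. Whether ratecontrol should receive the
model pointer (for step 5.1) or a narrower callback is open for discussion.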

Any ideas/suggestions/patches?

> [...]
> 
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> I hate to see young programmers poisoned by the kind of thinking
> Ulrich Drepper puts forward since it is simply too narrow -- Roman Shaposhnik