[FFmpeg-devel] [PATCH] AAC Encoder, Round 2
Sun Aug 24 18:45:58 CEST 2008
On Sun, Aug 24, 2008 at 06:44:07PM +0300, Kostya wrote:
> On Sun, Aug 24, 2008 at 04:10:12PM +0200, Michael Niedermayer wrote:
> > >
> > > > except that, I think the previous reviews have not been dealt with yet.
> > > > That is, the various suggestions for quality improvement should be tried
> > > > and whatever is better should be adopted.
> > > > Also everything that Gabriel Bouvign suggested should be tried.
> > >
> > > Err, when I find a way to download them. $20 for a three-page paper is a bit
> > > high for me.
> > Forget the papers; implement what does not depend on pay-per-view papers.
> > IIRC he said something about scalefactors and 3gpp as well.
> He did, but that also influences psy model interface (see below).
Anyway, I suggest that you read some of the RD (rate-distortion) papers about
video coding (even if you have read the audio-related ones).
> > >
> > > > I do not mind if we leave some of the harder things like Viterbi-based window
> > > > decision until after svn ci, but the majority of the things suggested should
> > > > be tried before the code is committed.
> > >
> > > Comment on interface then or propose your own.
> > > It will be needed to plug any psychoacoustic model.
> > > Also it would allow to finish encoder faster and then concentrate on
> > > model(s).
> > The split between psy and encoder is odd, to say the least.
> > Things psy can provide, IMHO:
> > * find perceptual weights per band or per coefficient used for RD
> > * find the perceptual distortion between 2 time domain signals
> > * find the perceptual distortion between 2 freq domain signals, possibly
> > just a single band or coeff
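The frequency-domain items in the list above could be expressed as a small interface. The following is only a sketch under stated assumptions: `PsyModel`, `flat_band_weights` and `weighted_sq_error` are hypothetical names, not FFmpeg API, and the "model" is a placeholder that gives every band the same weight and measures distortion as a weighted squared error. The time-domain distortion item is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface; none of these names come from FFmpeg. */
typedef struct PsyModel {
    /* Fill weights[0..nb_bands-1] with one perceptual weight per band. */
    void (*band_weights)(const float *coefs, const int *band_sizes,
                         int nb_bands, float *weights);
    /* Weighted distortion between two frequency-domain spans
     * (a whole band, or a single coefficient when n == 1). */
    float (*band_distortion)(const float *ref, const float *rec,
                             size_t n, float weight);
} PsyModel;

/* Placeholder model (assumption): every band is equally important. */
static void flat_band_weights(const float *coefs, const int *band_sizes,
                              int nb_bands, float *weights)
{
    (void)coefs;
    (void)band_sizes;
    for (int i = 0; i < nb_bands; i++)
        weights[i] = 1.0f;
}

/* Placeholder distortion (assumption): weight * sum of squared errors. */
static float weighted_sq_error(const float *ref, const float *rec,
                               size_t n, float weight)
{
    float d = 0.0f;
    assert(ref && rec);
    for (size_t i = 0; i < n; i++) {
        float e = ref[i] - rec[i];
        d += e * e;
    }
    return weight * d;
}

static const PsyModel flat_psy_model = {
    flat_band_weights,
    weighted_sq_error,
};
```

A real model would replace both callbacks; the encoder would only ever see weights and distortion values, which is what an RD loop needs.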
> Since Gabriel recommended exactly that model, I've tried to implement it in the least
> intrusive way. As you demand the highest possible quality, let's discuss how it should
> be done.
> My proposition (everybody uses slightly different terms, so I may get something wrong):
> 0. Initialize everything
of course ...
> 1. Perform some input filtering (lowpass, highpass, stereo attenuation, whatever)
It's debatable how far this should be here or separate and outside of the
> 2. Model decides window type (well, in distant future it can be 'undecided' and encoder
> will try both)
> 3. Encoder performs windowing and MDCT (and grouping?)
I don't think grouping can be done at this point, at least not optimally.
> 4. Model calculates perceptual entropy and thresholds
> 5. Ratecontrol module in encoder uses them to produce final thresholds
> 5.1 maybe it will call psy model to calculate perceptual distortion for the band
> 6. Encoder quantizes input with scalefactors
> 7. Encoder determines and encodes band info and coefficients
> 8. Fetch next frame and goto step 1 unless it was the last frame
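The per-frame flow proposed in the numbered steps can be sketched as a loop. This is a minimal illustration, not the encoder under review: `encode_stream`, `WindowDecisionFn` and `decide_by_energy_jump` are hypothetical names, the window heuristic is a toy energy comparison with an arbitrary threshold, and steps 1 and 3 to 7 are left as comments. The only fixed fact used is the 1024-sample AAC frame length.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_LEN 1024  /* samples per channel in one AAC frame */

typedef enum { WINDOW_LONG, WINDOW_SHORT } WindowType;

/* Step 2 as a swappable function (hypothetical signature). */
typedef WindowType (*WindowDecisionFn)(const float *frame, size_t n);

static float energy(const float *x, size_t n)
{
    float e = 0.0f;
    for (size_t i = 0; i < n; i++)
        e += x[i] * x[i];
    return e;
}

/* Toy heuristic (assumption): choose short windows when the second half
 * of the frame carries much more energy than the first half. */
static WindowType decide_by_energy_jump(const float *frame, size_t n)
{
    float e0 = energy(frame, n / 2);
    float e1 = energy(frame + n / 2, n - n / 2);
    return (e1 > 8.0f * e0 + 1e-9f) ? WINDOW_SHORT : WINDOW_LONG;
}

typedef struct FrameStats {
    size_t frames;        /* frames processed */
    size_t short_frames;  /* frames that selected short windows */
} FrameStats;

/* Walks the input frame by frame; a trailing partial frame is ignored. */
static FrameStats encode_stream(const float *samples, size_t nb_samples,
                                WindowDecisionFn decide)
{
    FrameStats st = {0, 0};
    assert(samples && decide);
    for (size_t pos = 0; pos + FRAME_LEN <= nb_samples; pos += FRAME_LEN) {
        const float *frame = samples + pos;
        /* 1. input filtering would go here */
        WindowType w = decide(frame, FRAME_LEN);   /* 2. window decision */
        /* 3. windowing and MDCT
         * 4. perceptual entropy and thresholds
         * 5. rate control produces final thresholds
         * 6. quantization with scalefactors
         * 7. band info and coefficient coding */
        st.frames++;
        st.short_frames += (w == WINDOW_SHORT);
    }                                              /* 8. next frame */
    return st;
}
```

Passing the window decision in as a function pointer keeps that choice independent of whatever model supplies thresholds in step 4.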
> Any ideas/suggestions/patches?
I am not sure, this is quite vague.
A few points that are IMO important:
* Decisions must NOT be bundled into psy models. That is, when we implement
  3 different heuristics to choose the MDCT/window size, they must be selectable
  independently of the remaining unrelated psy model. This also applies to
  things like stereo attenuation coeffs, the way the low/highpass cutoff is
  chosen, and so on ...
* The primary goal is highest quality encoding; anything that would make
  achieving this goal harder will be rejected.
* Coeff quantization and scalefactors must be decided based on RD.
  It's perfectly fine to support faster alternatives in addition ...
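As an illustration of deciding quantization "based on RD", the sketch below picks, for one band, the quantizer step that minimizes the Lagrangian cost J = D + lambda * R. Everything here is an assumption made for the example: a uniform quantizer stands in for the AAC non-uniform quantizer, `rate_bits` is a crude exp-Golomb-like bit estimate rather than the AAC Huffman tables, and the names `rd_cost` and `choose_step` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Crude rate proxy (assumption): 1 bit for zero, plus 2 bits per
 * binary digit of |q|. Not the AAC Huffman codebooks. */
static int rate_bits(long q)
{
    unsigned long a = (unsigned long)(q < 0 ? -q : q);
    int bits = 1;
    while (a) {
        bits += 2;
        a >>= 1;
    }
    return bits;
}

/* Round to nearest without needing libm. */
static long round_nearest(float x)
{
    return (long)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* J = weight * squared error + lambda * estimated bits, for one band
 * quantized uniformly with the given step. */
static float rd_cost(const float *coefs, size_t n, float step,
                     float weight, float lambda)
{
    float dist = 0.0f;
    int bits = 0;
    assert(coefs && step > 0.0f);
    for (size_t i = 0; i < n; i++) {
        long q = round_nearest(coefs[i] / step);
        float e = coefs[i] - (float)q * step;
        dist += weight * e * e;
        bits += rate_bits(q);
    }
    return dist + lambda * (float)bits;
}

/* Return the index of the candidate step with the lowest cost;
 * ties go to the earliest candidate. */
static size_t choose_step(const float *coefs, size_t n,
                          const float *steps, size_t nb_steps,
                          float weight, float lambda)
{
    size_t best = 0;
    float best_cost;
    assert(nb_steps > 0);
    best_cost = rd_cost(coefs, n, steps[0], weight, lambda);
    for (size_t i = 1; i < nb_steps; i++) {
        float c = rd_cost(coefs, n, steps[i], weight, lambda);
        if (c < best_cost) {
            best_cost = c;
            best = i;
        }
    }
    return best;
}
```

With coefficients {4, 0} and candidate steps {1, 2, 16}, a small lambda favors a step that still reproduces the coefficient exactly at fewer bits, while a large lambda favors the coarsest step because bits dominate the cost. A faster, non-RD alternative could sit behind the same `choose_step` signature.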
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Observe your enemies, for they first find out your faults. -- Antisthenes