[FFmpeg-devel] [RFC] AAC Encoder, now more optimal
Michael Niedermayer
michaelni
Sat Sep 6 21:56:16 CEST 2008
On Sat, Sep 06, 2008 at 08:28:48PM +0300, Kostya wrote:
> On Sat, Sep 06, 2008 at 06:53:59PM +0200, Michael Niedermayer wrote:
> > On Sat, Sep 06, 2008 at 04:27:21PM +0300, Kostya wrote:
> [...]
> > > The issues mentioned above prevent proper testing. But to my ear it was better,
> > > especially on transitions.
> > > There are 2-3 things I have to deal with before it is suitable for SVN:
> > > * M/S detection - but how to incorporate it? Should it be performed during
> > > the quantizer search or after it, and how?
> > > * Speed optimization
> > > * Other tricks (pulse tool, TNS) - less important though
> >
> > IMO inclusion in SVN requires producing equal or better quality / bitrate
> > than the encoder from that paper, and better than at least one common encoder
> > like faac. (Reaching the paper's encoder should be trivial by just implementing
> > what the paper describes; deviations from it have to be better, not worse,
> > quality-wise.)
> > The paper contains some graphs that compare it against the reference encoder
> > and it should be possible to similarly generate such graphs for your encoder.
> > This is a good check to ensure that things are correctly implemented.
> >
> > I also think we should apply much stricter tests in the future for SoC project
> > decoders, that is a PSNR/RMS difference from the binary decoder, but ideally
> > bit-identical output, to ensure that no bugs sneak in that are very hard to
> > debug later.
> >
> >
> > >
> > > And about the quantizer search method (I document it here in the hope that
> > > it will be easier to understand and discuss):
> > >
> > > * the code iterates over band groups (bands with the same number in different
> > > windows of a window group) for all window groups, since they are quantized
> > > with the same quantizer
> > > * for each band group all quantizers are tried (actually I determine the range
> > > of quantizers for which quantizing makes sense - i.e. outside it the distortion
> > > and the number of bits needed to code the band are the same as on the boundary -
> > > and search only in that range) to find the distortion and number of bits
> >
> > > ** quantizing and bit-count estimation are approximate, since exact versions
> > > would slow down encoding even more
> >
> > > ** distortion = sum of squared quantization errors
> >
> > yes, but as the quantization is approximate, so is that
> >
> >
> > > * then the cost function is calculated:
> > > C_{q1,q2} = SUM_{w} (quanterror_w / threshold_w * lambda + bits_w) + TC(q1,q2)
> > > where quanterror_w - sum of squared quantization errors for the band in window w,
> > > threshold_w - band threshold (provided by the psychoacoustic model),
> > > lambda - rate control parameter,
> > > bits_w - number of bits needed to encode that quantized band,
> > > TC(a,b) - number of bits needed to encode the scalefactor difference (q2-q1)
> > >
> > > and the path with the minimal total cost is chosen.
> > >
> > > I use several tricks to reduce computations for zero bands and to ensure
> > > final quantizers will not differ by more than 60.
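The path search described above amounts to a Viterbi pass over (band, quantizer) states. A minimal sketch, with illustrative names and a stand-in |delta| scalefactor-delta cost where the real encoder would use the AAC Huffman table:

```c
#include <float.h>

#define NQ 8  /* candidate quantizers per band (tiny, for the sketch) */

/* example scalefactor-delta cost: |delta| bits (real AAC uses a Huffman table) */
static double tc_abs(int delta)
{
    return delta < 0 ? -delta : delta;
}

/* Pick one quantizer per band so that the sum of per-band RD costs
 * (cost[b][q] standing for quanterror/threshold*lambda + bits) plus the
 * scalefactor-delta costs is minimal. Requires 1 <= nbands <= 64. */
static double best_path(int nbands, double cost[][NQ],
                        double (*tc)(int delta), int *path)
{
    double total[2][NQ];
    int    from[64][NQ];         /* back-pointers */
    int    cur = 0;

    for (int q = 0; q < NQ; q++)
        total[0][q] = cost[0][q];

    for (int b = 1; b < nbands; b++) {
        cur ^= 1;
        for (int q = 0; q < NQ; q++) {
            total[cur][q] = DBL_MAX;
            for (int p = 0; p < NQ; p++) {
                double c = total[cur ^ 1][p] + cost[b][q] + tc(q - p);
                if (c < total[cur][q]) {
                    total[cur][q] = c;
                    from[b][q]    = p;
                }
            }
        }
    }

    /* cheapest endpoint, then walk the back-pointers */
    int best = 0;
    for (int q = 1; q < NQ; q++)
        if (total[cur][q] < total[cur][best])
            best = q;
    double ret = total[cur][best];
    for (int b = nbands - 1; b >= 0; b--) {
        path[b] = best;
        if (b)
            best = from[b][best];
    }
    return ret;
}
```

Restricting each band to its useful quantizer range, as described above, simply shrinks NQ per band and keeps this O(bands * NQ^2) loop cheap.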
> > >
> > > The most problematic steps are quantization and (to a lesser extent) inverse
> > > quantization. By replacing the inverse quantization process (x*cbrt(x)*IQ)
> > > with a table lookup (of size 8192*256, so not for the final encoder), I've
> > > managed to reduce coding time from 72 seconds to a mere 59 seconds.
> > > Unfortunately, it's not easy to speed up quantization.
> > >
> > > But there's an idea: represent the coefficients in the 'AAC domain', i.e.
> > > raise them to the power 3/4 and represent them as A * 2^(B/4) with integers,
> > > so they will be easier to quantize. Do you think it's worth trying?
> >
> > You have a table of vector quantizers; quantization is finding the one
> > with the lowest RD cost. As the table contains the unquantized vectors
> > as well, I have difficulty mapping your problems onto it.
>
> well, %s/quantization/scaling/g
> indeed, the problem is to represent the coefficients with vectors scaled by
> some scalefactors and get minimum distortion.
minimum rate distortion
> My main problem is that optimal search takes too much time. And here
> the tradeoffs begin.
How much time does it take?
How much quality is lost by not doing it?
Can you please post the code so we can try to improve it?
I don't want to flame, but the way you seem to be working on this is extremely
problematic. It shouldn't be "ooh, that's too slow, let's try something else";
it should be
"A takes X seconds and has Y rate-distortion curve, B takes U seconds and has
V rate-distortion curve, so we take A because it's faster while the quality
difference is negligible" or
"we take A because it's better and the speed difference is negligible"
or "we support both and let the user decide via compression_level".
Basically either you do not test the code at all or you do not post the
results of your tests.
Either way that makes it impossible for me to accept anything short of the
mathematical optimum.
Or to say it differently: I NEED some evidence that your "tradeoffs" are
reasonable. (And not only evidence; they actually have to be good tradeoffs,
or we will be stuck when the encoder is finished and falls short of the
expected quality.)
I see two ways in which you can write the encoder.
The first is what I suggested: you write an RD-optimal encoder (that would be
very slow) and we then add "tradeoffs" carefully, step by step, while watching
what effect each has on speed and quality/bitrate.
The second is you write an encoder any way you like and I just check whether
it's within 5% of one of the RD-based encoders mentioned in the paper;
if not, you can figure out why not.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is not what we do, but why we do it that matters.