[FFmpeg-devel] [PATCH] adpcm: Store trellis nodes in a heap structure

Fri Nov 12 19:54:33 CET 2010

On Fri, Nov 12, 2010 at 4:33 AM, Martin Storsjö <martin at martin.st> wrote:
> On Fri, 12 Nov 2010, Michael Niedermayer wrote:
>
>> On Fri, Nov 12, 2010 at 12:41:24PM +0200, Martin Storsjö wrote:
>> > On Thu, 11 Nov 2010, Reimar Döffinger wrote:
>> >
>> > > On Thu, Nov 11, 2010 at 01:08:41AM +0200, Martin Storsjö wrote:
>> > > > > In that case, do you feel like finding some setting that with all
>> > > > > patches is about the same speed as without patches and compare the
>> > > > > quality? IMO that would possibly be the most interesting comparison.
>> > > >
>> > > > Reading the graphs at
>> > > > http://albin.abo.fi/~mstorsjo/adpcm-graphs/music1/, I find the following
>> > > > test runs quite similar:
>> > > > Original code, -trellis 6: 26.7 seconds, stddev 87.67, PSNR 57.47
>> > > > Fully patched, -trellis 8: 22.8 seconds, stddev 85.08, PSNR 57.73
>> > > >
>> > > > Thus, with all the patches, you get better quality at comparable run
>> > > > times, or roughly similar quality at a much shorter run time. :-)
>> > >
>> > > My question was rather: how does the "maximum" quality change, i.e.
>> > > at the highest reasonable setting.
>> > > I'd expect it to improve, but I think you have so far only tested really
>> > > fast settings (22 seconds is not a long encoding time in any way;
>> > > even the original one-minute-something is still "acceptable". I remember
>> > > when MP3 encoding was done at, I think, 1/8th of real time...)
>> >
>> > Well, then I guess it's all up to how much patience you have when defining
>> > the "maximum" quality. If we consider 1/8th to 1/10th of realtime as
>> > "maximum", we get these numbers:
>> > Original code, -trellis 8: 245.8 seconds, stddev 83.65, PSNR 57.88
>> > Fully patched, -trellis 11: 189.7 seconds, stddev 83.26, PSNR 57.92
>> >
>> > However, checking the runtime_psnr graphs at
>> > http://albin.abo.fi/~mstorsjo/adpcm-graphs/, one notices that the original
>> > (and patch #1 and patch #3) get a better PSNR/runtime tradeoff when the
>> > benchmark is extended to even larger trellis sizes. For the music1 sample,
>> > this happens somewhere after ~1/15 of realtime; for the other samples it
>> > happens even later than that.
>>
>> So I'd say #1-#3 are OK to commit, while #4 needs more work.
>
> Applied #1-3. I'll think more about #4 later to see if I can make a better
> compromise between quality and runtime.
>
> I don't think I had any code in the G.722 trellis encoder similar to the
> one I'm removing in #4, so I guess I can try to update the G.722 trellis
> patch with these improvements.
>
> Thanks for the persistence on making graphs - the log(time) vs PSNR graph
> really is a valuable tool for making these comparisons.
>
> // Martin

FYI, here is why PSNR sucks

http://x264.nl/developers/Dark_Shikari/2.wav (from a game that used ADPCM audio)
http://x264.nl/developers/Dark_Shikari/1.wav (ffmpeg's ADPCM with
-trellis 16; the source was the lossless game soundtrack CD)

ffmpeg sounds vastly worse.  I blame PSNR.
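
For context, the PSNR figures quoted earlier in the thread appear to follow
directly from the reported stddev, if stddev is read as the RMS sample error
and the peak is taken as 65535 (16-bit). That formula is an assumption inferred
from the numbers, not taken from the benchmark tool itself, and the function
name below is made up for illustration. A minimal Python sketch:

```python
import math

def psnr_from_stddev(stddev, peak=65535.0):
    """PSNR in dB from an RMS error value.

    Assumption: peak = 65535 (16-bit range). This is inferred from the
    numbers quoted in the thread, not from the benchmark tool's source.
    """
    return 20.0 * math.log10(peak / stddev)

# (label, stddev, PSNR) exactly as quoted in the thread
quoted = [
    ("Original code, -trellis 6", 87.67, 57.47),
    ("Fully patched, -trellis 8", 85.08, 57.73),
    ("Original code, -trellis 8", 83.65, 57.88),
    ("Fully patched, -trellis 11", 83.26, 57.92),
]

for label, stddev, quoted_psnr in quoted:
    computed = psnr_from_stddev(stddev)
    print(f"{label}: quoted {quoted_psnr:.2f} dB, computed {computed:.2f} dB")
```

Under that reading, PSNR depends only on how large the error is, not on where
in the spectrum it falls, so two encodes with the same RMS error score the same
no matter how different their noise sounds.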

05:52 < gmaxwell> To do trellis on this stuff you need a noise shaping error
                  metric.
05:52 < gmaxwell> Not even so much a psy model. You need to feed the noise
                  through a filter and add it back to shape the quantization
                  noise.
05:53 < gmaxwell> But this is trickier to get right for higher sampling rates...
05:54 < gmaxwell> It's a lot of code... and for ADPCM.. so who gives a @#$@ ?
                  :)   If anyone wants to do it, I can point you to some papers
                  on it.
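
To make the "feed the noise through a filter and add it back" idea concrete,
here is a rough Python sketch of first-order noise feedback around a plain
uniform quantizer. It is not ADPCM and not a trellis search. The quantizer, the
function names and the single feedback coefficient are all assumptions picked
for illustration, and a real noise-shaping filter would be designed per
sampling rate.

```python
def quantize(value, step=1.0):
    """Uniform quantizer; a stand-in for a real ADPCM quantizer (assumption)."""
    return step * round(value / step)

def noise_feedback_quantize(samples, step=1.0, feedback=1.0):
    """Quantize with first-order error feedback.

    The previous sample's quantization error is scaled by `feedback`
    (a hypothetical one-tap filter) and subtracted from the next input,
    so the output noise becomes e[n] - feedback * e[n-1], which is
    high-pass shaped when feedback is close to 1.
    """
    out = []
    prev_err = 0.0
    for x in samples:
        u = x - feedback * prev_err   # add filtered past noise back in
        y = quantize(u, step)
        prev_err = y - u              # quantization error at this step
        out.append(y)
    return out

# A constant input between two quantizer levels
samples = [0.3] * 100
plain = [quantize(x) for x in samples]
shaped = noise_feedback_quantize(samples)

plain_dc = sum(p - x for p, x in zip(plain, samples))
shaped_dc = sum(s - x for s, x in zip(shaped, samples))
print("accumulated error, plain: ", plain_dc)
print("accumulated error, shaped:", shaped_dc)
```

On this toy input the plain quantizer makes the same error on every sample, so
it piles up at DC, while the feedback version keeps the accumulated error within
half a quantizer step by pushing the noise to higher frequencies. A trellis
search would then need to score candidate paths by such shaped error rather
than by raw squared error.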

Dark Shikari