[FFmpeg-devel] [Patch] AAC encoder improvements

Claudio Freire klaussfreire at gmail.com
Mon May 6 06:27:16 CEST 2013

On Sun, May 5, 2013 at 2:31 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> see http://ffmpeg.org/~michael/psnr_loosing_audio/
> i tried encoding at 64,128 and 256 kbps:
> If someone has a link to a more standard and diverse set of
> reference audio samples ...
> also note above are just decoded mp3s not raw cd quality audio as
> it should be, it was just intended as a quick check for the patch ...
> -stddev: 4201.26 PSNR: 23.86 MAXDIFF:55201 bytes: 10580156/ 10584064
> +stddev: 4252.98 PSNR: 23.76 MAXDIFF:55201 bytes: 10580156/ 10584064
> -stddev: 1815.74 PSNR: 31.15 MAXDIFF:34136 bytes: 10584576/ 10584064
> +stddev: 1874.21 PSNR: 30.87 MAXDIFF:37634 bytes: 10584576/ 10584064
> -stddev: 2776.54 PSNR: 27.46 MAXDIFF:57966 bytes: 10580156/ 10584064
> +stddev: 2787.54 PSNR: 27.42 MAXDIFF:59070 bytes: 10580156/ 10584064

The psnr test is nice for spotting major screwups, but I've written a
small shell script that runs it across lame's samples and a few of
mine, and I can now say with confidence that it's misleading. Here:

-- 10.flac - 64k --
A: stddev:  477.95 PSNR: 42.74 MAXDIFF:39643 bytes: 95432704/ 95432704
B: stddev:  481.98 PSNR: 42.67 MAXDIFF:40041 bytes: 95432704/ 95432704

It shows almost no difference in psnr. B is the patches I posted; A is
an improvement over them that I'm working on, trying to get twoloop to
really respect psy. Listening, I can spot more than a few passages
where B is clearly inferior to A: I can hear artifacts in B that
aren't in A, and I've ABX'd them and can tell them apart. Yet the psnr
doesn't show a hint of it.
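
For reference, an ABX result is usually judged by how unlikely it would
be to reach the same score by guessing. A minimal sketch of that
calculation follows; the function name and the trial counts (14 correct
out of 16) are hypothetical, chosen only for illustration, since no
counts are given here.

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of scoring at least `correct` out of `trials` by guessing.

    Each ABX trial is treated as a fair coin flip, so this is the upper
    tail of a binomial distribution with p = 0.5.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


if __name__ == "__main__":
    # Hypothetical session: 14 correct answers in 16 trials.
    print(abx_p_value(14, 16))
```

The smaller the value, the less plausible it is that the listener was
just guessing.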

Why? I guess it's because psnr doesn't take psychoacoustic effects
into account: it measures how large the error is, not whether it is
audible. So, while I confirm the joint stereo part about having to
recompute psy might have been unneeded, and while tiny_psnr is useful,
I wouldn't trust it too much.
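
To make the point concrete, here is a rough sketch of the metric. The
stddev and PSNR columns quoted above are consistent with
PSNR = 20*log10(65535/stddev), where stddev is the RMS sample error.
That relation is inferred from the numbers, not taken from the
tiny_psnr source, and the function names and test signals below are
made up for illustration.

```python
import math

PEAK_16BIT = 65535  # assumed full-scale value; it matches the quoted rows


def psnr_from_stddev(stddev, peak=PEAK_16BIT):
    """PSNR in dB for a given RMS error (the 'stddev' column)."""
    if stddev == 0:
        return math.inf
    return 20 * math.log10(peak / stddev)


def stddev_and_psnr(ref, test, peak=PEAK_16BIT):
    """RMS error and PSNR between two equal-length lists of samples."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    sse = sum((a - b) ** 2 for a, b in zip(ref, test))
    stddev = math.sqrt(sse / len(ref))
    return stddev, psnr_from_stddev(stddev, peak)


if __name__ == "__main__":
    # Rows A and B for 10.flac at 64k list PSNR 42.74 and 42.67.
    print(round(psnr_from_stddev(477.95), 2))
    print(round(psnr_from_stddev(481.98), 2))

    # Hypothetical signals: two errors with the same energy but very
    # different character (a constant offset vs. a sign-alternating one).
    ref = [0] * 1000
    err_dc = [s + 100 for s in ref]
    err_hf = [s + (100 if i % 2 == 0 else -100) for i, s in enumerate(ref)]
    print(stddev_and_psnr(ref, err_dc))
    print(stddev_and_psnr(ref, err_hf))
```

Both hypothetical errors get the same score because only the error
energy enters the formula. Where the error sits in time or frequency,
and whether it would be masked, never shows up in the number.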

Back to the patches: I'll rebase on current git and try to get them as
clean and atomic as possible, testing each step with psnr at least.
I'll replace the quantization one with the A version, which seems to
perform better.

After that, I'm undecided between implementing intensity stereo,
codebook 13, or a new bit allocation strategy based on grid search.
I've noticed twoloop is rather unstable (a tweak here and there, and
it goes bonkers, probably a symptom of it not being really good at
searching for the optimal allocation, although I've got to admit it
does a pretty good job already).

BTW... got any tool to clean up sources after editing to remove
trailing whitespace and whatever other pickiness git has? I feel doing
this by hand would be the wrong way.
