[FFmpeg-user] Preserving perceived loudness when downmixing audio from 5.1 AC3 to stereo AAC

Wed Aug 7 11:03:15 CEST 2013

> -----Original Message-----
> From: Andy Furniss [mailto:adf.lists at gmail.com]
> Sent: 07 August 2013 01:46
> To: FFmpeg user questions
> Cc: Francois Visagie
> Subject: Re: [FFmpeg-user] Preserving perceived loudness when
> downmixing audio from 5.1 AC3 to stereo AAC
> 
> Francois Visagie wrote:
> 
> > Therein lies part of the problem: not all input files are AC3. Up to
> > at least 30 June, -filter:a aformat=channel_layouts=stereo could be
> > used in a standard command line to produce stereo from multi-channel
> > inputs with input and output volumes perceptually equal. Now each
> > encode needs to be inspected individually for input/output
> > differences, and the remedy will differ in each case according to
> > input type and/or volume differences. This is really sub-optimal in my
> > view, and I expect that view to be more widely shared once these
> > implications are more widely understood.
> 
> I had a look at the old behavior and it clipped, which is not good.
> 
> It was also inconsistent - wav and 7ch thd behaved like -ac 2.
> 
> I don't know exactly what it did - maybe there is a way to recreate it
> explicitly, or perhaps just blindly boost the levels by x dB as part of
> the processing if you don't care about clipping.
> 
> I don't know about your use case, but if I were mixing for myself I would
> take care to process each file individually, because that's what's needed
> to get correct results.
> 
> > I sincerely appreciate the trouble you took in outlining the various
> > principles involved, but, on a more practical level: rather than
> > making -filter:a aformat=channel_layouts=stereo now share the
> > mechanism of -ac 2 and -filter:a aresample=ocl=3 (incorrectly so wrt.
> > volume levels in my view), how feasible would it be to make the other
> > two behave like -filter:a aformat=channel_layouts=stereo instead?
> 
> I am not a developer, but IMHO the old behavior was wrong, although I
> haven't tested enough to work out what it did or why.
> 
> It's possible that someone intended it - it does seem to downmix in the
> sense that it's not just blindly putting 100% in, but it's not normalised
> enough to prevent clipping.
> 
> I must admit I saw a little clipping on some of the 6ch masters I looked
> at - but there was even more after the "old" downmix.
> 
> FWIW I also consider the new behavior wrong, in that the description of
> aformat says:
> 
> "Set output format constraints for the input audio. The framework will
> negotiate the most appropriate format to minimize conversions"
> 
> I think it should use -request_channels (where possible), and it doesn't,
> so anyone using
> 
> aformat=channel_layouts=stereo
> 
> on, say, a 7.1 thd stream will not get the best result (a proper studio
> stereo mix), but instead a 7 -> 2 conversion and very low levels.

I'm not sure even -request_channels produces the expected result. It merely
seems to influence the number of input channels guessed:

C:\Users\fvisagie\Videos\Home Videos\Testing\x264\Downmixing>ffmpeg -y -i in.ac3 out.wav
ffmpeg version N-55159-gf118b41 Copyright (c) 2000-2013 the FFmpeg developers
  built on Aug  1 2013 18:01:57 with gcc 4.7.3 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
  libavutil      52. 40.100 / 52. 40.100
  libavcodec     55. 19.100 / 55. 19.100
  libavformat    55. 12.102 / 55. 12.102
  libavdevice    55.  3.100 / 55.  3.100
  libavfilter     3. 82.100 /  3. 82.100
  libswscale      2.  4.100 /  2.  4.100
  libswresample   0. 17.103 /  0. 17.103
  libpostproc    52.  3.100 / 52.  3.100
[ac3 @ 0275bc80] Estimating duration from bitrate, this may be inaccurate
Input #0, ac3, from 'in.ac3':
  Duration: 00:00:09.02, start: 0.000000, bitrate: 448 kb/s
    Stream #0:0: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s
Output #0, wav, to 'out.wav':
  Metadata:
    ISFT            : Lavf55.12.102
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, 5.1(side), s16, 4608 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (ac3 -> pcm_s16le)
Press [q] to stop, [?] for help
size=    5076kB time=00:00:09.02 bitrate=4608.1kbits/s
video:0kB audio:5076kB subtitle:0 global headers:0kB muxing overhead 0.001962%

C:\Users\fvisagie\Videos\Home Videos\Testing\x264\Downmixing>ffmpeg -y -request_channels 2 -i in.ac3 out.wav
[ac3 @ 0365ea60] Estimating duration from bitrate, this may be inaccurate
Guessed Channel Layout for  Input Stream #0.0 : stereo
Input #0, ac3, from 'in.ac3':
  Duration: 00:00:09.02, start: 0.000000, bitrate: 448 kb/s
    Stream #0:0: Audio: ac3, 48000 Hz, stereo, fltp, 448 kb/s
Output #0, wav, to 'out.wav':
  Metadata:
    ISFT            : Lavf55.12.102
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (ac3 -> pcm_s16le)
Press [q] to stop, [?] for help
size=    1692kB time=00:00:09.02 bitrate=1536.1kbits/s
video:0kB audio:1692kB subtitle:0 global headers:0kB muxing overhead 0.004617%
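
The reported bitrates at least confirm how many channels came out of each run. For uncompressed PCM, bitrate = sample rate x bits per sample x channels, so 4608 kb/s and 1536 kb/s at 48000 Hz s16 correspond to 6 and 2 channels (and the file sizes, 5076 kB and 1692 kB, differ by the same factor of 3). A small Python sketch of that arithmetic follows; the function names are just for illustration. Note that this only counts channels and cannot tell whether the two channels are a downmix or merely the front pair.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bitrate in kb/s (1 kb = 1000 bits, as ffmpeg reports it)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def channels_from_bitrate(bitrate_kbps, sample_rate_hz=48000, bits_per_sample=16):
    """Invert the formula to recover the channel count from a reported bitrate."""
    return bitrate_kbps * 1000 / (sample_rate_hz * bits_per_sample)


# Figures taken from the two logs above (pcm_s16le at 48000 Hz).
print(channels_from_bitrate(4608))  # first run, 5.1(side) output
print(channels_from_bitrate(1536))  # second run, with -request_channels 2
```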

Would it therefore be correct to assume that -request_channels leads to only
that number of channels being extracted, hence no down-mix?

I'm now thoroughly confused by the various "down-mixing" possibilities and
their potentially differing behaviour, but let me try to consolidate:

* you suggest processing individually, which of course is the best approach
  in principle;
* once the intended down-mixing and perhaps level adjustment have been
  decided upon, which ffmpeg mechanism:
	* produces technically correct down-mixing?
	* works for most common audio input formats (e.g. according to Carl
	  Eugen, aac does not support -request_channels)?
* or, can these two only be satisfied by down-mixing externally?
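
On the last point: down-mixing externally essentially means decoding all channels (as the first command above does) and applying a fixed matrix to the samples yourself. Below is a minimal Python sketch of just that step. It assumes the 5.1(side) layout reported in the logs is interleaved as FL FR FC LFE SL SR (ffmpeg's usual order for that layout) and uses ITU-style -3 dB coefficients that are my assumption, not ffmpeg's; reading and writing the PCM is left out, and `downmix` is a hypothetical helper name.

```python
import math

# ASSUMED ITU-style coefficients (illustrative, not ffmpeg's): -3 dB centre and
# surrounds, LFE dropped. Assumed input order: FL FR FC LFE SL SR.
K = 1 / math.sqrt(2)
MATRIX = [
    # FL   FR   FC   LFE  SL   SR
    [1.0, 0.0, K,   0.0, K,   0.0],  # left output
    [0.0, 1.0, K,   0.0, 0.0, K],    # right output
]


def downmix(samples, normalise=True):
    """Downmix interleaved 6-channel int16 samples to interleaved stereo.

    Returns (stereo_samples, clipped_count). With normalise=True the matrix is
    scaled so that in-phase full-scale inputs cannot exceed the int16 range.
    """
    if len(samples) % 6:
        raise ValueError("sample count must be a multiple of 6")
    scale = 1.0 / max(sum(row) for row in MATRIX) if normalise else 1.0
    out, clipped = [], 0
    for i in range(0, len(samples), 6):
        frame = samples[i:i + 6]
        for row in MATRIX:
            value = round(scale * sum(c * s for c, s in zip(row, frame)))
            if not -32768 <= value <= 32767:
                clipped += 1
                value = max(-32768, min(32767, value))  # hard clip to int16
            out.append(value)
    return out, clipped


# One loud frame: front left, centre and left surround all at 20000.
loud = [20000, 0, 20000, 0, 20000, 0]
print(downmix(loud, normalise=False))  # left output clips
print(downmix(loud, normalise=True))   # no clipping, but a lower level
```

With normalise=False the test frame clips the left output; with normalise=True nothing clips, but the result is quieter than the unnormalised mix, which mirrors the clipping versus low-level trade-off discussed in this thread.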