[FFmpeg-user] 5.1 downmix to 2.0 (again) and buried dialogs

pehache pehache.7 at gmail.com
Fri Aug 26 20:32:44 EEST 2022


Not strictly speaking an ffmpeg question (and a recurring one at that), 
but ffmpeg is often used for this...

Since I have only a stereo setup (albeit a decent one) attached to my 
TV, a while ago I started generating downmixed 2.0 tracks with ffmpeg 
for my video files that have 5.1 (or 7.1) tracks.

My original motivation was that the dialogue was perceived as too quiet 
compared to the music/ambient sound in *some* movies (not all of them!). 
My hypothesis at the time was that the built-in downmixing of my 
equipment was overweighting the left and right channels (both front and 
side) compared to the center channel, where most of the dialogue is 
supposed to be placed.

So I started with the "-ac 2" option in ffmpeg... which basically 
changed nothing (as far as I could tell, at least). Investigating 
further, I then found the -af "pan=stereo| FL< ... | FR< ..." syntax to 
choose the weighting coefficient of each 5.1 channel when building the 
stereo channels.
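
For reference, a minimal Python sketch that assembles the full two-channel 
pan string from three gains (front, center, surround). It assumes the 
surround channels are named SL/SR, as in ffmpeg's "5.1(side)" layout; in 
ffmpeg's naming the plain "5.1" layout uses BL/BR instead, hence the 
parameter. LFE is left out, as in the formulas in this post. The helper 
and file names are hypothetical, not part of ffmpeg.

```python
import shlex


def pan_stereo(front, center, surround, surround_names=("SL", "SR"), normalize=True):
    """Build an ffmpeg `pan` filter string for a 5.1 -> 2.0 downmix.

    Each output side gets front*F{L,R} + center*FC + surround*S{L,R};
    LFE is simply dropped. Hypothetical helper, not part of ffmpeg.
    """
    op = "<" if normalize else "="  # '<' asks pan to renormalize the gains
    sl, sr = surround_names
    left = f"FL{op}{front:g}*FL+{center:g}*FC+{surround:g}*{sl}"
    right = f"FR{op}{front:g}*FR+{center:g}*FC+{surround:g}*{sr}"
    return f"pan=stereo|{left}|{right}"


def ffmpeg_args(src, dst, pan):
    """Argument list that re-encodes only the first audio stream with `pan`."""
    return ["ffmpeg", "-i", src, "-map", "0:a:0", "-af", pan, dst]


if __name__ == "__main__":
    pan = pan_stereo(1.0, 0.707, 0.707)
    # shlex.join quotes the '|' and '<' characters for a POSIX shell
    print(shlex.join(ffmpeg_args("movie.mkv", "stereo.m4a", pan)))
```

The list form can also be passed directly to subprocess.run(), which 
avoids shell quoting altogether.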

The commonly recommended coefficients are:
FL < 1.0*FL + 0.707*FC + 0.707*SL (and similarly for FR, using FR and SR)
To my ears, these gave the same result as -ac 2.
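
With "<" instead of "=", the pan filter renormalizes the gains of each 
output channel so that their total is 1 (as the ffmpeg documentation 
describes it), which avoids clipping. A small Python sketch of that 
arithmetic, with a hypothetical helper name: dividing every gain by the 
same total lowers the overall level but leaves the balance between FL, 
FC and SL unchanged.

```python
def renormalize(gains):
    """Scale a row of positive pan gains so they sum to 1, as pan's '<' does."""
    total = sum(gains.values())
    return {ch: g / total for ch, g in gains.items()}


recommended = {"FL": 1.0, "FC": 0.707, "SL": 0.707}
for ch, g in renormalize(recommended).items():
    print(f"{ch}: {g:.3f}")
# FL: 0.414, FC: 0.293, SL: 0.293
```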

There are also tons of alternative formulas described on various web 
sites... I ended up with
FL < 0.707*FL + 1.0*FC + 0.707*SL
It did what it was supposed to do: louder dialogue compared to music 
and ambient sounds.
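
To put numbers on the difference: what matters for the dialogue is the 
level of FC relative to the other inputs of the same output channel. A 
quick Python check (the helper name is just for illustration) shows that 
the recommended formula puts FC about 3 dB below FL, while the 
alternative puts it about 3 dB above, i.e. a relative boost of roughly 
6 dB.

```python
import math


def relative_db(gain, reference):
    """Level of one input channel relative to another in the same mix, in dB."""
    return 20 * math.log10(gain / reference)


# (front, center) gains of the two formulas above
formulas = {"recommended": (1.0, 0.707), "alternative": (0.707, 1.0)}

for name, (front, center) in formulas.items():
    print(f"{name}: FC vs FL = {relative_db(center, front):+.1f} dB")
```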

However, I eventually noticed that it was also narrowing the stereo 
image. Indeed, FC does not contain only voices but also a large part of 
the music and ambient sounds. Overweighting FC would not narrow the 
stereo image if it contained only the voices, but that is not the case.
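
One way to quantify the narrowing, under a deliberately simplified 
model: assume FL, FR, FC, SL and SR are independent signals of equal 
power (real film mixes are not). Then FC is the only content common to 
both outputs, and the correlation between the left and right outputs is 
c^2 / (f^2 + c^2 + s^2), with f, c, s the front, center and surround 
gains. A correlation of 0 means a fully decorrelated (wide) pair, 1 
means mono. A Python sketch of this rough proxy, with a hypothetical 
helper name:

```python
def lr_correlation(front, center, surround):
    """Correlation of the L/R outputs, assuming independent, equal-power inputs.

    Only FC feeds both outputs, so cov(L, R) = center**2, while each output's
    variance is front**2 + center**2 + surround**2. Renormalization ('<')
    scales both outputs equally and does not change this value.
    """
    return center**2 / (front**2 + center**2 + surround**2)


print(f"recommended: {lr_correlation(1.0, 0.707, 0.707):.2f}")  # FL + 0.707*FC + 0.707*SL
print(f"alternative: {lr_correlation(0.707, 1.0, 0.707):.2f}")  # 0.707*FL + FC + 0.707*SL
```

In this model, the alternative formula doubles the L/R correlation 
(about 0.25 to 0.50), consistent with the narrower image.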

I kept wondering why the dialogue is sometimes perceived as too quiet 
after downmixing, and I have a possible explanation: the brain is very 
good at isolating a voice buried in ambient noise because it can locate 
where the voice comes from. That's why people with hearing aids still 
have difficulty following a conversation when several people speak at 
the same time: the hearing aids can restore the volume, but the 
directivity is (mostly) lost... So, with a real 5.1 or 7.1 setup, the 
brain is not bothered by the side/rear channels when it focuses on the 
central dialogue, because they come from completely different 
directions. But after downmixing, what was coming from the side/rear 
channels now comes from the front speakers, making the separation task 
more difficult for the brain. The solution is hence to downweight the 
side/rear channels... Therefore I am now using:

FL < 1.0*FL + 0.707*FC + 0.4*SL

And it seems better to me: the dialogue is clearer, without narrowing 
the stereo image. But maybe this is just what I desperately want to hear...
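
Since this formula uses "<", lowering SL from 0.707 to 0.4 does not only 
attenuate the surrounds: after renormalization, FL and FC come up a 
little as well. A small Python sketch (hypothetical helper name, 
assuming pan's "<" simply divides the gains by their total) comparing 
the effective gains with those of the recommended formula:

```python
import math


def renormalize(gains):
    """Scale positive gains so they sum to 1, as the pan filter's '<' does."""
    total = sum(gains.values())
    return {ch: g / total for ch, g in gains.items()}


recommended = renormalize({"FL": 1.0, "FC": 0.707, "SL": 0.707})
current = renormalize({"FL": 1.0, "FC": 0.707, "SL": 0.4})

# Level change of each input channel in the left output, in dB
change_db = {ch: 20 * math.log10(current[ch] / recommended[ch]) for ch in current}
for ch, db in change_db.items():
    print(f"{ch}: {db:+.1f} dB")
```

So, compared with the recommended mix, FL and FC gain about 1.2 dB and 
SL loses about 3.8 dB, i.e. the side channels end up roughly 5 dB lower 
relative to the front, while the FL/FC ratio is unchanged.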

Any thoughts on all of this?
