[FFmpeg-trac] #8033(avfilter:new): libavfilter/af_amix.c: mixing without volume renormalization
FFmpeg
trac at avcodec.org
Wed Jul 24 15:44:33 EEST 2019
#8033: libavfilter/af_amix.c: mixing without volume renormalization
-------------------------------------+------------------------------------
Reporter: CoRoNe | Owner:
Type: enhancement | Status: new
Priority: wish | Component: avfilter
Version: git-master | Resolution:
Keywords: amix | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-------------------------------------+------------------------------------
Comment (by CoRoNe):
I have found no way to edit my first post, so...
About 7 years ago I used Audacity to assemble the soundtrack of the
videogame No One Lives Forever 2. See
[https://www.youtube.com/watch?v=4Y3aKcQ0HK4] for example.
The soundtrack consists of many small segments that the videogame can
load dynamically.
A year later I assembled the soundtrack once more, but this time with
Avisynth. See the attached `SIBERIA.avs` for example.
Now I wanted to see if ffmpeg is up for the task. Just for fun.
FFmpeg used:
{{{
ffmpeg version N-94137-g89b9690-Reino Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 8.3.0 (GCC)
configuration: --arch=x86 --target-os=mingw32
--cross-prefix=/cygdrive/m/ffmpeg-windows-build-helpers-master/ffmpeg_local_builds/sandbox/cross_compilers/mingw-w64-i686/bin/i686-w64-mingw32-
--pkg-config=pkg-config --pkg-config-flags=--static --extra-version=Reino
--enable-gray --enable-version3 --disable-debug --disable-doc
--disable-htmlpages --disable-manpages --disable-podpages --disable-txtpages
--disable-w32threads --enable-avisynth --enable-frei0r
--enable-filter=frei0r --enable-gmp --enable-gpl --enable-libaom --enable-libass
--enable-libbluray --enable-libbs2b --enable-libcaca
--extra-cflags=-DCACA_STATIC --enable-libfdk-aac --enable-libflite
--enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme
--enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libmysofa
--enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264
--enable-libopenmpt --enable-libopus --enable-librubberband
--enable-libsnappy --enable-libsoxr
--enable-libspeex --enable-libtheora --enable-libtwolame
--extra-cflags=-DLIBTWOLAME_STATIC --enable-libvidstab --enable-libvmaf
--enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwebp
--enable-libx264 --enable-libx265 --enable-libxavs --enable-libxml2
--enable-libxvid --enable-libzimg --enable-libzvbi --enable-mbedtls
--extra-cflags='-march=pentium3' --extra-cflags='-mtune=athlon-xp'
--extra-cflags=-O2 --extra-cflags='-mfpmath=sse' --extra-cflags=-msse
--enable-static --disable-shared
--prefix=/cygdrive/m/ffmpeg-windows-build-helpers-master/ffmpeg_local_builds/sandbox/cross_compilers/mingw-w64-i686/i686-w64-mingw32
libavutil 56. 30.100 / 56. 30.100
libavcodec 58. 53.100 / 58. 53.100
libavformat 58. 28.101 / 58. 28.101
libavdevice 58. 7.100 / 58. 7.100
libavfilter 7. 55.100 / 7. 55.100
libswscale 5. 4.101 / 5. 4.101
libswresample 3. 4.100 / 3. 4.100
libpostproc 55. 4.100 / 55. 4.100
}}}
The soundtrack has an "explore", "warning" and "combat" section.
{{{
"Explore": 'SIBERIAE{1-29}.WAV'
Transition: 'TRANSEWSTR100.WAV'
"Warning": 'SIBERIAW{1-22}.WAV'
Transition: 'TRANSWE19.WAV'
"Combat": 'SIBERIAC{1-25}.WAV'
Transition: 'TRANSCS21.WAV'
}}}
These segments are actually mp3 disguised as wav (blame the videogame).
Sadly it's not a matter of simply concatenating all segments to create the
entire score, because each segment starts at a very specific moment in
time and creates overlap with the rest of them. Then all of them need to
be mixed (at full volume).
I'm actually looking for a way to do a `MixAudio(clip1,clip2,1.0,1.0)`
[http://avisynth.nl/index.php/MixAudio (Avisynth)] with FFmpeg.
First let's have a look at amix's options:
{{{
ffmpeg -h filter=amix
[...]
amix AVOptions:
  inputs             <int>    ..F.A.... Number of inputs. (from 1 to 1024) (default 2)
  duration           <int>    ..F.A.... How to determine the end-of-stream. (from 0 to 2) (default longest)
     longest                  ..F.A.... Duration of longest input.
     shortest                 ..F.A.... Duration of shortest input.
     first                    ..F.A.... Duration of first input.
  dropout_transition <float>  ..F.A.... Transition time, in seconds, for volume renormalization when an input stream ends. (from 0 to INT_MAX) (default 2)
  weights            <string> ..F.A.... Set weight for each input. (default "1 1")
}}}
I figure `weights=1 1` does the same as the `1.0,1.0` in
`MixAudio(clip1,clip2,1.0,1.0)`, so that's good.
But then it appears amix will ''always'' do "volume renormalization". You
can tweak the `dropout_transition`-option somewhat, but there's no way
to turn it off.
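To put a number on how much level the renormalization costs, here is a minimal sketch. It assumes amix scales each of N equally weighted inputs by 1/N while all of them are active; that is an assumption about the filter's behaviour, not something the help text above states. The helper name `makeup_gain_db` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: gain in dB needed to undo a 1/N scaling of N inputs.
# Assumes equal weights and that all N inputs are active at the same time.
makeup_gain_db() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 20 * log(n) / log(10) }'
}

makeup_gain_db 79
```

Under that assumption, 79 inputs lose 20*log10(79) dB, which the call above prints as 37.95.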
This is the command I'm using / had to use so far:
{{{
ffmpeg \
-i SIBERIAE1.WAV -i SIBERIAE2.WAV -i SIBERIAE3.WAV -i SIBERIAE4.WAV -i SIBERIAE5.WAV \
-i SIBERIAE6.WAV -i SIBERIAE7.WAV -i SIBERIAE8.WAV -i SIBERIAE9.WAV -i SIBERIAE10.WAV \
-i SIBERIAE11.WAV -i SIBERIAE12.WAV -i SIBERIAE13.WAV -i SIBERIAE14.WAV -i SIBERIAE15.WAV \
-i SIBERIAE16.WAV -i SIBERIAE17.WAV -i SIBERIAE18.WAV -i SIBERIAE19.WAV -i SIBERIAE20.WAV \
-i SIBERIAE21.WAV -i SIBERIAE22.WAV -i SIBERIAE23.WAV -i SIBERIAE24.WAV -i SIBERIAE25.WAV \
-i SIBERIAE26.WAV -i SIBERIAE27.WAV -i SIBERIAE28.WAV -i SIBERIAE29.WAV \
-i TRANSEWSTR100.WAV \
-i SIBERIAW1.WAV -i SIBERIAW2.WAV -i SIBERIAW3.WAV -i SIBERIAW4.WAV -i SIBERIAW5.WAV \
-i SIBERIAW6.WAV -i SIBERIAW7.WAV -i SIBERIAW8.WAV -i SIBERIAW9.WAV -i SIBERIAW10.WAV \
-i SIBERIAW11.WAV -i SIBERIAW12.WAV -i SIBERIAW13.WAV -i SIBERIAW14.WAV -i SIBERIAW15.WAV \
-i SIBERIAW16.WAV -i SIBERIAW17.WAV -i SIBERIAW18.WAV -i SIBERIAW19.WAV -i SIBERIAW20.WAV \
-i SIBERIAW21.WAV -i SIBERIAW22.WAV \
-i TRANSWE19.WAV \
-i SIBERIAC1.WAV -i SIBERIAC2.WAV -i SIBERIAC3.WAV -i SIBERIAC4.WAV -i SIBERIAC5.WAV \
-i SIBERIAC6.WAV -i SIBERIAC7.WAV -i SIBERIAC8.WAV -i SIBERIAC9.WAV -i SIBERIAC10.WAV \
-i SIBERIAC11.WAV -i SIBERIAC12.WAV -i SIBERIAC13.WAV -i SIBERIAC14.WAV -i SIBERIAC15.WAV \
-i SIBERIAC16.WAV -i SIBERIAC17.WAV -i SIBERIAC18.WAV -i SIBERIAC19.WAV -i SIBERIAC20.WAV \
-i SIBERIAC21.WAV -i SIBERIAC22.WAV -i SIBERIAC23.WAV -i SIBERIAC24.WAV -i SIBERIAC25.WAV \
-i TRANSCS21.WAV \
-filter_complex "
[1]adelay=158792S|158792S[E2];[2]adelay=291139S|291139S[E3];[3]adelay=423476S|423476S[E4];
[4]adelay=555820S|555820S[E5];[5]adelay=714633S|714633S[E6];[6]adelay=873439S|873439S[E7];
[7]adelay=1058736S|1058736S[E8];[8]adelay=1244019S|1244019S[E9];[9]adelay=1376339S|1376339S[E10];
[10]adelay=1508682S|1508682S[E11];[11]adelay=1667496S|1667496S[E12];[12]adelay=1826306S|1826306S[E13];
[13]adelay=1932181S|1932181S[E14];[14]adelay=2118076S|2118076S[E15];[15]adelay=2255399S|2255399S[E16];
[16]adelay=2364348S|2364348S[E17];[17]adelay=2499655S|2499655S[E18];[18]adelay=2587883S|2587883S[E19];
[19]adelay=2720226S|2720226S[E20];[20]adelay=2852568S|2852568S[E21];[21]adelay=2984913S|2984913S[E22];
[22]adelay=3073138S|3073138S[E23];[23]adelay=3161367S|3161367S[E24];[24]adelay=3249594S|3249594S[E25];
[25]adelay=3336742S|3336742S[E26];[26]adelay=3421112S|3421112S[E27];[27]adelay=3576621S|3576621S[E28];
[28]adelay=3788369S|3788369S[E29];[29]adelay=3787479S|3787479S[EW100];[30]adelay=3841308S|3841308S[W1];
[31]adelay=4089467S|4089467S[W2];[32]adelay=4337597S|4337597S[W3];[33]adelay=4585740S|4585740S[W4];
[34]adelay=4751180S|4751180S[W5];[35]adelay=4916595S|4916595S[W6];[36]adelay=5040665S|5040665S[W7];
[37]adelay=5206096S|5206096S[W8];[38]adelay=5309485S|5309485S[W9];[39]adelay=5412881S|5412881S[W10];
[40]adelay=5516287S|5516287S[W11];[41]adelay=5619668S|5619668S[W12];[42]adelay=5785089S|5785089S[W13];
[43]adelay=5950523S|5950523S[W14];[44]adelay=6033235S|6033235S[W15];[45]adelay=6198660S|6198660S[W16];
[46]adelay=6364091S|6364091S[W17];[47]adelay=6446814S|6446814S[W18];[48]adelay=6598676S|6598676S[W19];
[49]adelay=6722118S|6722118S[W20];[50]adelay=6872561S|6872561S[W21];[51]adelay=7038663S|7038663S[W22];
[52]adelay=7204260S|7204260S[WE19];[53]adelay=7286129S|7286129S[C1];[54]adelay=7534279S|7534279S[C2];
[55]adelay=7783363S|7783363S[C3];[56]adelay=8030562S|8030562S[C4];[57]adelay=8195988S|8195988S[C5];
[58]adelay=8361416S|8361416S[C6];[59]adelay=8485486S|8485486S[C7];[60]adelay=8650917S|8650917S[C8];
[61]adelay=8754307S|8754307S[C9];[62]adelay=8857700S|8857700S[C10];[63]adelay=8961094S|8961094S[C11];
[64]adelay=9064486S|9064486S[C12];[65]adelay=9271272S|9271272S[C13];[66]adelay=9436697S|9436697S[C14];
[67]adelay=9519413S|9519413S[C15];[68]adelay=9726192S|9726192S[C16];[69]adelay=9850266S|9850266S[C17];
[70]adelay=9932983S|9932983S[C18];[71]adelay=10181124S|10181124S[C19];[72]adelay=10429267S|10429267S[C20];
[73]adelay=10677408S|10677408S[C21];[74]adelay=10842834S|10842834S[C22];[75]adelay=10946226S|10946226S[C23];
[76]adelay=11049615S|11049615S[C24];[77]adelay=11153012S|11153012S[C25];[78]adelay=11401044S|11401044S[CS21];
[0][E2][E3][E4][E5][E6][E7][E8][E9][E10][E11][E12][E13][E14][E15][E16][E17][E18][E19][E20][E21][E22][E23]
[E24][E25][E26][E27][E28][E29]
[EW100]
[W1][W2][W3][W4][W5][W6][W7][W8][W9][W10][W11][W12][W13][W14][W15][W16][W17][W18][W19][W20][W21][W22]
[WE19]
[C1][C2][C3][C4][C5][C6][C7][C8][C9][C10][C11][C12][C13][C14][C15][C16][C17][C18][C19][C20][C21][C22][C23]
[C24][C25]
[CS21]
amix=inputs=79:dropout_transition=270.28,volume=37.3dB
" -f wav <some output>
}}}
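The long adelay chain in the command above does not have to be typed by hand. As a sketch, a small shell function can turn a list of "input index, label, delay in samples" into the filter strings, repeating each delay for both channels. The function name `adelay_chain` and the input layout are made up for this example; the three sample rows are taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper: read "input-index label delay-in-samples" triples from
# stdin and print one adelay filter per line, with the same delay repeated
# for the left and the right channel.
adelay_chain() {
    while read -r idx label samples; do
        printf '[%s]adelay=%sS|%sS[%s];\n' "$idx" "$samples" "$samples" "$label"
    done
}

adelay_chain <<'EOF'
1 E2 158792
2 E3 291139
3 E4 423476
EOF
```

The output can be pasted into, or substituted into, the `-filter_complex` string ahead of the amix input labels.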
- With the adelay filter all segments start at a very specific moment in
time. (having to specify the delay for '''both''' channels is rather
cumbersome, hence my [https://trac.ffmpeg.org/ticket/8032 "one delay-
string for all channels"] request)
- I do `-filter_complex "[...]amix=inputs=79" -f null -` to get the
duration of the entire score: `time=00:04:30.28` to enter as parameter for
the `dropout_transition`-option. I want the volume to be left untouched,
but since there's no way to turn "volume renormalization" off at the
moment, entering the duration for `dropout_transition` appears to do the
least damage.
- This obviously still results in a very low volume, so I do
`-filter_complex
"[...]amix=inputs=79:dropout_transition=270.28,volumedetect" -f null -`:
`mean_volume: -59.3 dB, max_volume: -37.3 dB`...
- ...to crank up the volume to (I think) where it was before:
`-filter_complex
"[...]amix=inputs=79:dropout_transition=270.28,volume=37.3dB"`.
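The measure-then-boost steps above can be scripted. The sketch below reads volumedetect log lines and prints a matching `volume` filter argument. The function name `boost_from_volumedetect` is made up, and the sample log line is only an approximation of what ffmpeg prints (the address in it is a placeholder). Note that this restores the peak to 0 dB, which is not necessarily the level the segments had before mixing.

```shell
#!/bin/sh
# Hypothetical helper: read volumedetect log lines on stdin and print the
# gain that brings max_volume back to 0 dB, formatted for the volume filter.
boost_from_volumedetect() {
    awk '/max_volume:/ {
        for (i = 1; i <= NF; i++)
            if ($i == "max_volume:") { printf "volume=%sdB\n", -$(i + 1); exit }
    }'
}

# Placeholder log line; with a real run, pipe ffmpeg's stderr in instead:
#   ffmpeg [...] -f null - 2>&1 | boost_from_volumedetect
echo '[Parsed_volumedetect_1 @ 0x0] max_volume: -37.3 dB' | boost_from_volumedetect
```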
Mixing audio-segments at full volume without any kind of normalization is
a rather basic feature, or at least it should be in my opinion.
Therefore I request "volume renormalization" to be optional and that you
have to enable it specifically through `amix=inputs=79:normalize=1` for
instance.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/8033#comment:4>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker