[FFmpeg-trac] #8033(avfilter:new): libavfilter/af_amix.c: mixing without volume renormalization

FFmpeg trac at avcodec.org
Wed Jul 24 15:44:33 EEST 2019


#8033: libavfilter/af_amix.c: mixing without volume renormalization
-------------------------------------+------------------------------------
             Reporter:  CoRoNe       |                    Owner:
                 Type:  enhancement  |                   Status:  new
             Priority:  wish         |                Component:  avfilter
              Version:  git-master   |               Resolution:
             Keywords:  amix         |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+------------------------------------

Comment (by CoRoNe):

 I have found no way to edit my first post, so...

 About 7 years ago I used Audacity to assemble the soundtrack of the
 videogame No One Lives Forever 2. See
 [https://www.youtube.com/watch?v=4Y3aKcQ0HK4] for an example.
 The soundtrack consists of lots of small segments, all of which can be
 loaded dynamically by the videogame.

 A year later I assembled the soundtrack once more, but this time with
 Avisynth. See the attached `SIBERIA.avs`, for example.

 Now, just for fun, I wanted to see whether FFmpeg is up to the task.

 FFmpeg used:
 {{{
 ffmpeg version N-94137-g89b9690-Reino Copyright (c) 2000-2019 the FFmpeg developers
   built with gcc 8.3.0 (GCC)
   configuration: --arch=x86 --target-os=mingw32
 --cross-prefix=/cygdrive/m/ffmpeg-windows-build-helpers-master/ffmpeg_local_builds/sandbox/cross_compilers/mingw-w64-i686/bin/i686-w64-mingw32-
 --pkg-config=pkg-config --pkg-config-flags=--static --extra-version=Reino
 --enable-gray --enable-version3 --disable-debug --disable-doc
 --disable-htmlpages --disable-manpages --disable-podpages --disable-txtpages
 --disable-w32threads --enable-avisynth --enable-frei0r --enable-filter=frei0r
 --enable-gmp --enable-gpl --enable-libaom --enable-libass --enable-libbluray
 --enable-libbs2b --enable-libcaca --extra-cflags=-DCACA_STATIC
 --enable-libfdk-aac --enable-libflite --enable-libfontconfig
 --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm
 --enable-libilbc --enable-libmp3lame --enable-libmysofa
 --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264
 --enable-libopenmpt --enable-libopus --enable-librubberband
 --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora
 --enable-libtwolame --extra-cflags=-DLIBTWOLAME_STATIC --enable-libvidstab
 --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx
 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs
 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzvbi
 --enable-mbedtls --extra-cflags='-march=pentium3'
 --extra-cflags='-mtune=athlon-xp' --extra-cflags=-O2
 --extra-cflags='-mfpmath=sse' --extra-cflags=-msse --enable-static
 --disable-shared
 --prefix=/cygdrive/m/ffmpeg-windows-build-helpers-master/ffmpeg_local_builds/sandbox/cross_compilers/mingw-w64-i686/i686-w64-mingw32
   libavutil      56. 30.100 / 56. 30.100
   libavcodec     58. 53.100 / 58. 53.100
   libavformat    58. 28.101 / 58. 28.101
   libavdevice    58.  7.100 / 58.  7.100
   libavfilter     7. 55.100 /  7. 55.100
   libswscale      5.  4.101 /  5.  4.101
   libswresample   3.  4.100 /  3.  4.100
   libpostproc    55.  4.100 / 55.  4.100
 }}}
 The soundtrack has "explore", "warning" and "combat" sections:
 {{{
 "Explore":  'SIBERIAE{1-29}.WAV'
 Transition: 'TRANSEWSTR100.WAV'
 "Warning":  'SIBERIAW{1-22}.WAV'
 Transition: 'TRANSWE19.WAV'
 "Combat":   'SIBERIAC{1-25}.WAV'
 Transition: 'TRANSCS21.WAV'
 }}}
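 Because the filtergraph refers to the inputs by index (`[0]`, `[29]`,
 `[52]`, ...), the order of the `-i` options matters. A minimal bash sketch,
 assuming the files sit in the current directory under exactly these names,
 that uses brace expansion to build the 79 `-i` arguments in the same order
 as the command further down (`files` and `inputs` are illustrative variable
 names):

```shell
#!/usr/bin/env bash
# Input files in the order the filtergraph expects:
# explore segments, transition, warning segments, transition,
# combat segments, transition.
files=(
  SIBERIAE{1..29}.WAV
  TRANSEWSTR100.WAV
  SIBERIAW{1..22}.WAV
  TRANSWE19.WAV
  SIBERIAC{1..25}.WAV
  TRANSCS21.WAV
)

# Turn the file list into "-i FILE" pairs for the ffmpeg command line.
inputs=()
for f in "${files[@]}"; do
  inputs+=(-i "$f")
done

# Show which input index each file ends up with ([0] ... [78]).
for i in "${!files[@]}"; do
  printf '[%d] %s\n' "$i" "${files[$i]}"
done
```

 The array could then be passed on as
 `ffmpeg "${inputs[@]}" -filter_complex "..." -f wav <some output>`.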
 These segments are actually MP3 files disguised as WAV (blame the videogame).
 Sadly, the entire score can't be created by simply concatenating all
 segments, because each segment starts at a very specific moment in time and
 overlaps with others. All of them then need to be mixed (at full volume).
 What I'm actually looking for is the FFmpeg equivalent of a
 `MixAudio(clip1,clip2,1.0,1.0)`
 [http://avisynth.nl/index.php/MixAudio (Avisynth)].
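 For reference, a simplified per-sample sketch (bash plus awk; `mix_full`
 and `mix_normalized` are hypothetical helper names) of the difference
 between mixing at full volume, as `MixAudio(clip1,clip2,1.0,1.0)` does,
 and a mix that divides by the sum of the weights. The second function is
 an assumption about how a renormalizing mixer behaves while all inputs are
 active, not a transcription of af_amix.c:

```shell
#!/usr/bin/env bash
# Full-volume mix of one sample from each of two clips:
# out = WA*A + WB*B, with no division afterwards.
mix_full() {
  awk -v a="$1" -v b="$2" -v wa="$3" -v wb="$4" \
    'BEGIN { printf "%.4f\n", wa * a + wb * b }'
}

# Normalized mix (assumed behaviour of a renormalizing mixer):
# out = (WA*A + WB*B) / (WA + WB).
mix_normalized() {
  awk -v a="$1" -v b="$2" -v wa="$3" -v wb="$4" \
    'BEGIN { printf "%.4f\n", (wa * a + wb * b) / (wa + wb) }'
}

mix_full 0.25 0.5 1.0 1.0        # prints 0.7500
mix_normalized 0.25 0.5 1.0 1.0  # prints 0.3750
```

 Note that a full-volume sum can leave the [-1.0, 1.0] range of float
 samples, so clipping has to be dealt with separately.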

 First let's have a look at amix's options:
 {{{
 ffmpeg -h filter=amix
 [...]
 amix AVOptions:
   inputs            <int>        ..F.A.... Number of inputs. (from 1 to 1024) (default 2)
   duration          <int>        ..F.A.... How to determine the end-of-stream. (from 0 to 2) (default longest)
      longest                      ..F.A.... Duration of longest input.
      shortest                     ..F.A.... Duration of shortest input.
      first                        ..F.A.... Duration of first input.
   dropout_transition <float>      ..F.A.... Transition time, in seconds, for volume renormalization when an input stream ends. (from 0 to INT_MAX) (default 2)
   weights           <string>     ..F.A.... Set weight for each input. (default "1 1")
 }}}
 I figure `weights=1 1` does the same as the `1.0,1.0` in
 `MixAudio(clip1,clip2,1.0,1.0)`, so that's good.
 But it appears that amix will ''always'' do "volume renormalization". You
 can tweak it somewhat with the `dropout_transition` option, but there's no
 way to turn it off.
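 Assuming amix divides each of N equally weighted inputs by N while all of
 them are active (renormalization then ramps over `dropout_transition`
 seconds as inputs end), the gain needed to undo that scaling is N as a
 linear factor, or 20*log10(N) dB. A small sketch (`compensation_db` is a
 hypothetical helper):

```shell
#!/usr/bin/env bash
# Gain (in dB) that undoes an assumed 1/N scaling: 20*log10(N).
# awk has no log10(), so compute log(N)/log(10) instead.
compensation_db() {
  awk -v n="$1" 'BEGIN { printf "%.2f\n", 20 * log(n) / log(10) }'
}

compensation_db 2    # prints 6.02
compensation_db 79   # prints 37.95
```

 Under that assumption, 79 inputs would call for about 37.95 dB (or
 `volume=79` as a linear factor). The 37.3 dB used below comes from
 volumedetect's peak measurement instead, so the two need not match exactly.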

 This is the command I'm using / had to use so far:
 {{{
 ffmpeg \
 -i SIBERIAE1.WAV -i SIBERIAE2.WAV -i SIBERIAE3.WAV -i SIBERIAE4.WAV -i SIBERIAE5.WAV \
 -i SIBERIAE6.WAV -i SIBERIAE7.WAV -i SIBERIAE8.WAV -i SIBERIAE9.WAV -i SIBERIAE10.WAV \
 -i SIBERIAE11.WAV -i SIBERIAE12.WAV -i SIBERIAE13.WAV -i SIBERIAE14.WAV -i SIBERIAE15.WAV \
 -i SIBERIAE16.WAV -i SIBERIAE17.WAV -i SIBERIAE18.WAV -i SIBERIAE19.WAV -i SIBERIAE20.WAV \
 -i SIBERIAE21.WAV -i SIBERIAE22.WAV -i SIBERIAE23.WAV -i SIBERIAE24.WAV -i SIBERIAE25.WAV \
 -i SIBERIAE26.WAV -i SIBERIAE27.WAV -i SIBERIAE28.WAV -i SIBERIAE29.WAV \
 -i TRANSEWSTR100.WAV \
 -i SIBERIAW1.WAV -i SIBERIAW2.WAV -i SIBERIAW3.WAV -i SIBERIAW4.WAV -i SIBERIAW5.WAV \
 -i SIBERIAW6.WAV -i SIBERIAW7.WAV -i SIBERIAW8.WAV -i SIBERIAW9.WAV -i SIBERIAW10.WAV \
 -i SIBERIAW11.WAV -i SIBERIAW12.WAV -i SIBERIAW13.WAV -i SIBERIAW14.WAV -i SIBERIAW15.WAV \
 -i SIBERIAW16.WAV -i SIBERIAW17.WAV -i SIBERIAW18.WAV -i SIBERIAW19.WAV -i SIBERIAW20.WAV \
 -i SIBERIAW21.WAV -i SIBERIAW22.WAV \
 -i TRANSWE19.WAV \
 -i SIBERIAC1.WAV -i SIBERIAC2.WAV -i SIBERIAC3.WAV -i SIBERIAC4.WAV -i SIBERIAC5.WAV \
 -i SIBERIAC6.WAV -i SIBERIAC7.WAV -i SIBERIAC8.WAV -i SIBERIAC9.WAV -i SIBERIAC10.WAV \
 -i SIBERIAC11.WAV -i SIBERIAC12.WAV -i SIBERIAC13.WAV -i SIBERIAC14.WAV -i SIBERIAC15.WAV \
 -i SIBERIAC16.WAV -i SIBERIAC17.WAV -i SIBERIAC18.WAV -i SIBERIAC19.WAV -i SIBERIAC20.WAV \
 -i SIBERIAC21.WAV -i SIBERIAC22.WAV -i SIBERIAC23.WAV -i SIBERIAC24.WAV -i SIBERIAC25.WAV \
 -i TRANSCS21.WAV \
 -filter_complex "
 [1]adelay=158792S|158792S[E2];[2]adelay=291139S|291139S[E3];[3]adelay=423476S|423476S[E4];
 [4]adelay=555820S|555820S[E5];[5]adelay=714633S|714633S[E6];[6]adelay=873439S|873439S[E7];
 [7]adelay=1058736S|1058736S[E8];[8]adelay=1244019S|1244019S[E9];[9]adelay=1376339S|1376339S[E10];
 [10]adelay=1508682S|1508682S[E11];[11]adelay=1667496S|1667496S[E12];[12]adelay=1826306S|1826306S[E13];
 [13]adelay=1932181S|1932181S[E14];[14]adelay=2118076S|2118076S[E15];[15]adelay=2255399S|2255399S[E16];
 [16]adelay=2364348S|2364348S[E17];[17]adelay=2499655S|2499655S[E18];[18]adelay=2587883S|2587883S[E19];
 [19]adelay=2720226S|2720226S[E20];[20]adelay=2852568S|2852568S[E21];[21]adelay=2984913S|2984913S[E22];
 [22]adelay=3073138S|3073138S[E23];[23]adelay=3161367S|3161367S[E24];[24]adelay=3249594S|3249594S[E25];
 [25]adelay=3336742S|3336742S[E26];[26]adelay=3421112S|3421112S[E27];[27]adelay=3576621S|3576621S[E28];
 [28]adelay=3788369S|3788369S[E29];[29]adelay=3787479S|3787479S[EW100];[30]adelay=3841308S|3841308S[W1];
 [31]adelay=4089467S|4089467S[W2];[32]adelay=4337597S|4337597S[W3];[33]adelay=4585740S|4585740S[W4];
 [34]adelay=4751180S|4751180S[W5];[35]adelay=4916595S|4916595S[W6];[36]adelay=5040665S|5040665S[W7];
 [37]adelay=5206096S|5206096S[W8];[38]adelay=5309485S|5309485S[W9];[39]adelay=5412881S|5412881S[W10];
 [40]adelay=5516287S|5516287S[W11];[41]adelay=5619668S|5619668S[W12];[42]adelay=5785089S|5785089S[W13];
 [43]adelay=5950523S|5950523S[W14];[44]adelay=6033235S|6033235S[W15];[45]adelay=6198660S|6198660S[W16];
 [46]adelay=6364091S|6364091S[W17];[47]adelay=6446814S|6446814S[W18];[48]adelay=6598676S|6598676S[W19];
 [49]adelay=6722118S|6722118S[W20];[50]adelay=6872561S|6872561S[W21];[51]adelay=7038663S|7038663S[W22];
 [52]adelay=7204260S|7204260S[WE19];[53]adelay=7286129S|7286129S[C1];[54]adelay=7534279S|7534279S[C2];
 [55]adelay=7783363S|7783363S[C3];[56]adelay=8030562S|8030562S[C4];[57]adelay=8195988S|8195988S[C5];
 [58]adelay=8361416S|8361416S[C6];[59]adelay=8485486S|8485486S[C7];[60]adelay=8650917S|8650917S[C8];
 [61]adelay=8754307S|8754307S[C9];[62]adelay=8857700S|8857700S[C10];[63]adelay=8961094S|8961094S[C11];
 [64]adelay=9064486S|9064486S[C12];[65]adelay=9271272S|9271272S[C13];[66]adelay=9436697S|9436697S[C14];
 [67]adelay=9519413S|9519413S[C15];[68]adelay=9726192S|9726192S[C16];[69]adelay=9850266S|9850266S[C17];
 [70]adelay=9932983S|9932983S[C18];[71]adelay=10181124S|10181124S[C19];[72]adelay=10429267S|10429267S[C20];
 [73]adelay=10677408S|10677408S[C21];[74]adelay=10842834S|10842834S[C22];[75]adelay=10946226S|10946226S[C23];
 [76]adelay=11049615S|11049615S[C24];[77]adelay=11153012S|11153012S[C25];[78]adelay=11401044S|11401044S[CS21];
 [0][E2][E3][E4][E5][E6][E7][E8][E9][E10][E11][E12][E13][E14][E15][E16][E17][E18][E19][E20][E21][E22][E23]
   [E24][E25][E26][E27][E28][E29]
   [EW100]
 [W1][W2][W3][W4][W5][W6][W7][W8][W9][W10][W11][W12][W13][W14][W15][W16][W17][W18][W19][W20][W21][W22]
   [WE19]
 [C1][C2][C3][C4][C5][C6][C7][C8][C9][C10][C11][C12][C13][C14][C15][C16][C17][C18][C19][C20][C21][C22][C23]
   [C24][C25]
   [CS21]
   amix=inputs=79:dropout_transition=270.28,volume=37.3dB
 " -f wav <some output>
 }}}
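 Writing every delay once per channel is what makes the filtergraph above so
 long. A minimal bash sketch (`adelay_chain` is a hypothetical helper, not
 part of FFmpeg) that takes a channel count followed by
 `index delay label` triples and emits the same
 `[idx]adelay=NS|NS[label];` fragments, with delays in samples as in the
 command:

```shell
#!/usr/bin/env bash
# adelay_chain CHANNELS IDX DELAY LABEL [IDX DELAY LABEL ...]
# Prints "[IDX]adelay=DELAYS|DELAYS...[LABEL];" for every triple, repeating
# the sample delay once per channel.
adelay_chain() {
  local channels=$1; shift
  local idx delay label spec c
  while [ "$#" -ge 3 ]; do
    idx=$1 delay=$2 label=$3
    shift 3
    spec="${delay}S"
    for ((c = 1; c < channels; c++)); do
      spec+="|${delay}S"
    done
    printf '[%s]adelay=%s[%s];' "$idx" "$spec" "$label"
  done
  printf '\n'
}

# Stereo example using the first two delays of the command above.
adelay_chain 2 1 158792 E2 2 291139 E3
```

 The generated string can be substituted into the `-filter_complex`
 argument in place of the hand-written adelay lines.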
 - With the adelay filter, each segment starts at its exact moment in
 time. (Having to specify the delay for '''both''' channels is rather
 cumbersome, hence my [https://trac.ffmpeg.org/ticket/8032 "one delay-string
 for all channels"] request.)
 - I run `-filter_complex "[...]amix=inputs=79" -f null -` to get the
 duration of the entire score, `time=00:04:30.28` (270.28 seconds), which I
 then use as the value of the `dropout_transition` option. I want the volume
 to be left untouched, but since "volume renormalization" can't be turned
 off at the moment, setting `dropout_transition` to the full duration
 appears to do the least damage.
 - This still results in a very low volume, so I run
 `-filter_complex "[...]amix=inputs=79:dropout_transition=270.28,volumedetect" -f null -`,
 which reports `mean_volume: -59.3 dB, max_volume: -37.3 dB`...
 - ...and use that to crank the volume back up to (I think) where it was
 before:
 `-filter_complex "[...]amix=inputs=79:dropout_transition=270.28,volume=37.3dB"`.

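 The measurement steps in the list above boil down to reading two values
 from ffmpeg's stderr: the final `time=` of the progress output and the
 `max_volume` reported by volumedetect. A rough bash sketch of just the
 parsing (the helper names are hypothetical, and the input strings are the
 values quoted above rather than captured logs); in practice the text would
 come from something like `ffmpeg ... -f null - 2>&1`:

```shell
#!/usr/bin/env bash
# Convert HH:MM:SS.xx (ffmpeg's time= format) to seconds.
to_seconds() {
  awk -F: '{ printf "%.2f\n", $1 * 3600 + $2 * 60 + $3 }' <<< "$1"
}

# Read ffmpeg output on stdin and print the last time= value.
last_time() {
  grep -o 'time=[0-9:.]*' | tail -n 1 | cut -d= -f2
}

# Read volumedetect output on stdin and print the max_volume value in dB.
max_volume_db() {
  sed -n 's/.*max_volume: \(-\{0,1\}[0-9.]*\) dB.*/\1/p' | tail -n 1
}

t=$(last_time <<< 'time=00:04:30.28')
to_seconds "$t"                            # prints 270.28
max=$(max_volume_db <<< 'max_volume: -37.3 dB')
# Negate by stripping the minus sign; assumes max_volume is negative.
echo "volume=${max#-}dB"                   # prints volume=37.3dB
```
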
 Mixing audio segments at full volume without any kind of normalization is
 a rather basic feature, or at least it should be in my opinion.
 Therefore I request that "volume renormalization" be made optional, so
 that it has to be enabled explicitly, for instance through
 `amix=inputs=79:normalize=1`.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/8033#comment:4>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

