[FFmpeg-devel] [PATCH 2/3] lavfi/ebur128: add metadata injection. - volnorm.patch (0/1)

Wed May 1 22:10:54 CEST 2013

Hi,

I am picking up this 'old' discussion because, as far as I can see,
none of the one-pass normalization proposals has made its way into
FFmpeg yet. Unless I am very much mistaken, all current methods are
two-pass: first analyze the volume level, then normalize the sound
level. Correct me if I am wrong.

Nicolas George in gmane.comp.video.ffmpeg.devel (Sat, 16 Mar 2013
12:36:40 +0100):
>On quintidi 25 Ventôse, year CCXXI, Jan Ehrhardt wrote:
>> Actually it is useful for volume normalization, in those cases where the
>> overall sound level is either too low or too high. I have used Clément's
>> first patch set for over 2 weeks now on about 200 videofiles with an
>> average duration of 1 hour. It worked exactly as what we expected:
>> lowering the sound level of a few recordings and increasing the volume
>> of most recordings.
>
>It will work if the volume of the whole movie is approximately constant, but
>not at all if you have, for example, a very loud opening and then a quieter
>program.

http://permalink.gmane.org/gmane.comp.video.ffmpeg.devel/159978 was
Clément's first attempt. It is working quite well for us and, with only
a few flaws, has replaced the (one-pass) -af volnorm filter in MEncoder.

>Of course, no normalization system can deal with quick changes of volume,
>but with this example, it will take the integrated loudness more than one
>minute to digest the 25 seconds of loud beginning. That is too much.

We experienced this flaw only with a test video and (as far as I know)
with none of the 750 one-hour videos our users transcoded. But I did not
look at all of them...

In the test video the first three normalization frames had a loudness
(I) of -70:

t: 0.0999792  M:-120.7 S:-120.7     I: -70.0 LUFS     LRA:   0.0 LU
t: 0.199979   M:-120.7 S:-120.7     I: -70.0 LUFS     LRA:   0.0 LU
t: 0.299979   M:-120.7 S:-120.7     I: -70.0 LUFS     LRA:   0.0 LU
t: 0.399979   M: -20.7 S:-120.7     I: -20.7 LUFS     LRA:   0.0 LU

Our FFmpeg tried to adjust the volume three times by +47 dB (the -23
target minus the measured -70), apparently enough to lead to all kinds
of buffer overflows. The result: a one-hour video with a measured I of
10.0 (the maximum).

So I went looking for ways to limit the adjustment. My first working
example looked like this:

     loudness = av_strtod(e->value, NULL);
     new_volume = fmax(-20,fmin(20,(-23 - loudness)));
     set_fixed_volume(vol, pow(10, new_volume / 20));

The idea: clamp the adjustment to the range -20 to +20 dB (measured
from the -23 target). This solved the buffer overflow problems, but had
the issue Nicolas George predicted. After the three initial frames the
volume went up to -2 and it took about 30 seconds to return to -17.
That is an unwanted sound spike at the beginning of the video.

The question arose: how to limit the adjustments at the beginning of a
video? I went back to f_ebur128.c and added another entry to the
metadata: the pts. I can then use the pts in af_volume.c to limit the
change in loudness during the initial seconds. My arbitrary choice:
allow -1/+1 dB after the first second, -2/+2 dB after two seconds, and
so on up to -20/+20 dB after 20 seconds or more. Of course, it is
possible to lengthen the initial duration to, say, a minute and lower
the maximum adjustment to -10/+10 dB. But the idea is clear. The
essential part of the patch:

    if (vol->metadata) {
        double loudness, new_volume, pts, timestamp, mx;
        AVDictionaryEntry *t, *e;
        /* pts entry added to the metadata in f_ebur128.c */
        t = av_dict_get(buf->metadata, "lavfi.r128.pts", NULL, 0);
        mx = 20; /* absolute limit of the adjustment, in dB */
        if (t) {
            pts = av_strtod(t->value, NULL);
            timestamp = pts / 48000; /* seconds; assume 48kHz */
            mx = fmin(mx, timestamp); /* 1 dB more per elapsed second */
        }
        e = av_dict_get(buf->metadata, vol->metadata, NULL, 0);
        if (e) {
            loudness = av_strtod(e->value, NULL);
            /* distance to the -23 target, clamped to [-mx, +mx] */
            new_volume = fmax(-mx,fmin(mx,(-23 - loudness)));
            /* convert dB to a linear gain factor */
            set_fixed_volume(vol, pow(10, new_volume / 20));
        }
    }

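The mx variable defines the limit of the adjustment in either
direction. Dividing the pts by 48k gives the elapsed time in seconds,
so the limit grows by 1 dB per second until it reaches the absolute
maximum of 20. That gives the described setup of -1/+1 per second.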

Complete patch attached (if my NNTP client handles it correctly).
Applied to yesterday's FFmpeg.

Jan

PS. I also made some changes to the av_log messages: they are hidden by
default, but shown with -loglevel verbose.