[FFmpeg-devel] On libavfilter: A Summary of Issues

Mon Feb 18 04:36:53 CET 2013

Clément Bœsch in gmane.comp.video.ffmpeg.devel (Mon, 18 Feb 2013
03:37:04 +0100):
>On Mon, Feb 18, 2013 at 03:24:58AM +0100, Jan Ehrhardt wrote:
>> 
>> I still like the idea of MEncoder's volnorm filter very much. Even in
>> one pass encoding MEncoder adjusts the volume to an acceptable level (in
>> my case it mostly gains volume). Would such a thing be possible with
>> ebur128 and metadata injection?
>
>AFAICT, yes. We already have metadata injection in lavfi (see
>silencedetect or the scene detection in select filter), and so if ebur128
>filter was injecting metadata in frames, then a communication would be
>possible between ebur128 and volume filter (for example).
>
>The main problem is that you need to adjust ebur128 so it is resizing
>frames to windows to something like 100ms (IIRC), which might require some
>thinking in the filter. The reason is that a loudness result is associated
>for this time frame, so you need to transmit metadata to volume at this
>rate.

The EBU R128 specs define momentary loudness over a 400ms window:
http://tech.ebu.ch/loudness
It looks like FFmpeg's ebur128 filter already uses this window by
default: https://ffmpeg.org/ffmpeg-filters.html#ebur128

And the EBU specs also define a short-term loudness of 3 seconds, which
is implemented in the ebur128 filter as well. AFAIK, 3 seconds comes
close to MEncoder's volnorm filter.

I do not know the inner workings of the ebur128 filter, but it seems
logical to me that it is constantly (i.e. at every audio frame) updating
the values for the Momentary loudness and the Short-term loudness. If
so, then a current loudness value is available at every audio frame. It
should then be possible to inject that value as metadata into the audio
frame and use it as input for the volume filter.
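
To make the idea concrete, here is a rough sketch in Python (not FFmpeg
code) of what a volume stage could do with a loudness value it finds on
a frame. The function names and the 20 dB clamp are my own assumptions
for illustration; only the -23 LUFS target comes from R128.

```python
# Hypothetical sketch: turn a per-frame loudness value into a gain.
# Nothing here is FFmpeg API; all names are made up for illustration.

TARGET_LUFS = -23.0  # EBU R128 target level


def gain_db(measured_lufs, target_lufs=TARGET_LUFS, max_gain_db=20.0):
    """Gain in dB that moves the measured loudness to the target.

    The result is clamped to +/- max_gain_db (an assumed safety limit)
    so that near-silent passages are not boosted without bound.
    """
    gain = target_lufs - measured_lufs
    return max(-max_gain_db, min(max_gain_db, gain))


def db_to_linear(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def apply_gain(samples, measured_lufs):
    """Scale a list of float samples using the loudness of their frame."""
    factor = db_to_linear(gain_db(measured_lufs))
    return [s * factor for s in samples]
```

A real filter would also have to smooth the gain between frames to
avoid audible pumping; that part is left out here.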

Maybe kierank knows more about the way r128 has to be implemented. Or he
might be able to retrieve that information. I have sent him a message
with a reference to this discussion.

>Another - maybe smarter - solution would be to auto inject a asetnsamples
>before a ebur128 filter given various properties of the filtergraph. Or
>maybe the ebur128 filter could alter the filtergraph itself (if that's
>possible without much hack) to insert the filter before if a "metadata"
>mode is requested.

This goes over my head. I have just grasped the concept of metadata
injection, but do not yet know how it really works. I took a glance at
your patches for metadata injection and the silencedetect filter...

On #ffmpeg-devel kierank pointed out that r128 may become mandatory in
some countries for realtime encoding. It would be a big bonus for FFmpeg
if it supported this.

Jan