[FFmpeg-user] Questions about the concat video filter

Tue Jan 23 18:20:08 EET 2018

Hi Katherine,

> On Jan 23, 2018, at 5:09 AM, Katherine Frances <knfrances at GMAIL.COM> wrote:
> 
> Hi all,
> 
> I've been investigating the concat filter recently, and after the
> documentation at https://ffmpeg.org/ffmpeg-filters.html#concat and the wiki
> page at https://trac.ffmpeg.org/wiki/Concatenate#differentcodec, I have a
> couple of further questions.
> 
> The documentation says,
> 
>> Related streams do not always have exactly the same duration, for various
>> reasons including codec frame size or sloppy authoring. For that reason,
>> related synchronized streams (e.g. a video and its audio track) should be
>> concatenated at once.
>> 
> 
> It's the wording "at once" that is tripping me up. Does it mean 'together'?
> i.e., all streams from a given input should be specified in order, as shown
> in the examples, rather than grouping e.g. the video streams from each
> input, then the audio streams from each input?

Some audio encodings don’t support arbitrary sample counts, so it’s not uncommon for the audio and video streams of a file to differ slightly in duration. Video and audio also often have different timebases, so it’s usually not feasible for their durations to match exactly. For instance, you may have exactly one second of 48000 Hz audio, but it is not possible to have exactly one second of an NTSC (30000/1001 fps) recording.
So, for instance, concatenating a segment of video + audio with another segment of video + audio is more accurate than concatenating 2 audio encodings, separately concatenating 2 video encodings, and then hoping that they will still align.
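
To put rough numbers on the NTSC point, here is a small arithmetic sketch. The helper name duration_of is made up for this illustration, and the 1024-sample audio frame size is an assumption (it is a common size for some codecs, not something taken from your files):

```shell
# duration_of FRAMES UNITS_PER_FRAME UNITS_PER_SECOND
# Hypothetical helper: prints the duration, in seconds, of a whole number of frames.
duration_of() {
    awk -v n="$1" -v per="$2" -v rate="$3" 'BEGIN { printf "%.6f\n", n * per / rate }'
}

# NTSC video: each frame lasts 1001/30000 s, so whole frames never add up to exactly 1 s.
duration_of 29 1001 30000   # 0.967633
duration_of 30 1001 30000   # 1.001000

# 48000 Hz audio in 1024-sample frames (assumed frame size) misses 1 s as well.
duration_of 46 1024 48000   # 0.981333
duration_of 47 1024 48000   # 1.002667
```

Neither stream can land exactly on one second, and they miss it by different amounts, which is why the small mismatch is best absorbed segment by segment.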

> Further,
> 
>> The concat filter will use the duration of the longest stream in each
>> segment (except the last one), and if necessary pad shorter audio streams
>> with silence.
>> 
> 
> Can I then assume that in the case in which the video stream is shorter, it
> will be padded with black slug?

No, not unless you provide that black slug yourself. Instead, the last frame of the shorter video is displayed for longer to cover the missing video. For instance:

ffmpeg -f lavfi -i smptebars=s=320x240:d=2.9 -f lavfi -i aevalsrc="0.1*sin(1000*2*PI*t):d=3" -f lavfi -i testsrc2=s=320x240:d=2 -f lavfi -i "aevalsrc=0:d=3" -f lavfi -i testsrc=s=320x240:d=1 -f lavfi -i aevalsrc="-2+random(0):d=3" -filter_complex "[0:v][1:a][2:v][3:a][4:v][5:a]concat=n=3:v=1:a=1[v][a]" -map "[v]" -map "[a]" -y concat.mkv

This concatenates three segments: 2.9 sec video + 3 sec audio, then 2 sec video + 3 sec audio, then 1 sec video + 3 sec audio. You’ll see that the final video frame of a segment is simply held for longer, to keep the overall duration of the segment as determined by its longest stream.
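
If you do want black rather than a held frame, one possible approach (a sketch only, using generated test sources rather than real files) is to append a black clip from the color source to the short video yourself before it meets the audio. The 1 second of black and the output name padded.mkv are just values chosen for this illustration:

```shell
# Inputs: 2 s of test video, 1 s of black at the same size, 3 s of silent audio.
# The video-only concat (v=1:a=0) joins the test video and the black clip,
# so the video is about as long as the 3 s audio it is muxed with.
ffmpeg -f lavfi -i testsrc2=s=320x240:d=2 \
       -f lavfi -i color=c=black:s=320x240:d=1 \
       -f lavfi -i "aevalsrc=0:d=3" \
       -filter_complex "[0:v][1:v]concat=n=2:v=1:a=0[v]" \
       -map "[v]" -map 2:a -y padded.mkv
```

With real footage you would need to work out how much black is needed per segment, and the black clip has to match the video's resolution and frame rate.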

> Further, if the specs for the output streams are not explicitly set by the
> user, how does FFmpeg 'decide' what codec & options etc to use? The docs
> say:
> 
>> the filtering system will automatically select a common pixel format for
>> video streams, and a common sample format, sample rate and channel layout
>> for audio streams, but other settings, such as resolution, must be
>> converted explicitly by the user.
>> 
> 
> but I don't find the wording very clear. From my tests, it seems to
> default to something similar to the specs of the lower-quality input? I
> wonder if someone can confirm, or give some more information about the
> default output codec settings.

If you know what you want the output to be, then you should specify it explicitly; otherwise it leaves the guesswork to ffmpeg, which may or may not be what you want.
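
As a rough sketch of what specifying it could look like: the command below scales both video inputs to one resolution before the concat and names the output codecs explicitly. The test sources, the 640x480 target, the libx264/aac choice and the file name explicit.mp4 are all assumptions for illustration (libx264 in particular has to be enabled in your build):

```shell
# Two segments with different resolutions (640x480 and 320x240), each with a tone.
# Each video is scaled to a common size and given a matching sample aspect ratio
# before concat, and the encoders are named explicitly instead of left to defaults.
ffmpeg -f lavfi -i testsrc=s=640x480:d=1 \
       -f lavfi -i sine=frequency=440:duration=1 \
       -f lavfi -i testsrc2=s=320x240:d=1 \
       -f lavfi -i sine=frequency=880:duration=1 \
       -filter_complex "[0:v]scale=640:480,setsar=1[v0];[2:v]scale=640:480,setsar=1[v1];[v0][1:a][v1][3:a]concat=n=2:v=1:a=1[v][a]" \
       -map "[v]" -map "[a]" \
       -c:v libx264 -pix_fmt yuv420p -c:a aac -y explicit.mp4
```

Swap in whatever codec and options suit your delivery target; the point is only that nothing important is left for ffmpeg to pick.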

> Disclaimer: these might be noob questions to some of you, but for me it's
> not obvious, so I would appreciate clarification. Thank you.

np, welcome.

[…]

Kind Regards,
Dave Rice