[FFmpeg-user] Why does specifying audio input before webcam video cause them to go out of sync?

Justyn ffmpeg at justyn.co
Thu Aug 6 19:13:51 EEST 2020


Hi all

I am using FFmpeg to stream from a webcam and a pulseaudio source to an
RTMP server.

I know that argument order matters in FFmpeg, but I have found that if I
specify the audio input before the video input, the audio is delayed by
about half a second relative to the video.

Since these are just two input streams combined together for the output,
why does the order have an effect?

I have stripped down the commands below to simplify this post, and tested
them in that form. In practice I use hardware acceleration, AAC and various
other codec options, but the effect of the input ordering is always the same.

## FFmpeg command specifying video input first (no delay):

ffmpeg -f v4l2 -input_format mjpeg -framerate 30 -video_size 1280x720 \
  -i /dev/video1 -f pulse -i default -c:v libx264 -preset veryfast \
  -f flv rtmp://a.rtmp.youtube.com/live2/${STREAM_KEY}

## FFmpeg command specifying audio input first (audio 0.5 seconds behind video):

ffmpeg -f pulse -i default -f v4l2 -input_format mjpeg -framerate 30 \
  -video_size 1280x720 -i /dev/video1 -c:v libx264 -preset veryfast \
  -f flv rtmp://a.rtmp.youtube.com/live2/${STREAM_KEY}

The console messages from FFmpeg look the same in both cases, apart from the
input order.

## Output when video input is first (no delay):

Input #0, video4linux2,v4l2, from '/dev/video1':
  Duration: N/A, start: 331644.817465, bitrate: N/A
    Stream #0:0: Video: mjpeg (Baseline), yuvj422p(pc, bt470bg/unknown/unknown), 1280x720, 30 fps, 30 tbr, 1000k tbn, 1000k tbc
Guessed Channel Layout for Input Stream #1.0 : stereo
Input #1, pulse, from 'default':
  Duration: N/A, start: 1596371796.728130, bitrate: 1536 kb/s
    Stream #1:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (pcm_s16le (native) -> mp3 (libmp3lame))

## Output when audio input is first (audio 0.5 seconds behind video):

Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, pulse, from 'default':
  Duration: N/A, start: 1596371788.496242, bitrate: 1536 kb/s
    Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Input #1, video4linux2,v4l2, from '/dev/video1':
  Duration: N/A, start: 331637.326454, bitrate: N/A
    Stream #1:0: Video: mjpeg (Baseline), yuvj422p(pc, bt470bg/unknown/unknown), 1280x720, 30 fps, 30 tbr, 1000k tbn, 1000k tbc
Stream mapping:
  Stream #1:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
  Stream #0:0 -> #0:1 (pcm_s16le (native) -> mp3 (libmp3lame))

As you can see, the stream mapping is correct in each case.
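
One thing that can be checked directly from the two logs is how far apart
the reported start times of the two inputs are in each run. The sketch below
is plain arithmetic on the four "start:" values quoted above. It assumes
(the logs do not confirm this) that the v4l2 and pulse inputs each keep using
the same clock from one run to the next, so that the offset between those two
clocks is constant. The variable and function names are made up for
illustration.

```python
from decimal import Decimal

# "start:" values copied from the two logs above, in seconds.
video_first = {"video": Decimal("331644.817465"),
               "audio": Decimal("1596371796.728130")}
audio_first = {"video": Decimal("331637.326454"),
               "audio": Decimal("1596371788.496242")}


def audio_minus_video(run):
    """Return the audio start time minus the video start time for one run."""
    return run["audio"] - run["video"]


gap_video_first = audio_minus_video(video_first)
gap_audio_first = audio_minus_video(audio_first)

# Assumption: the offset between the two clocks is the same in both runs,
# so it cancels out and only the change in relative start time remains.
shift = gap_video_first - gap_audio_first
print(shift)  # 0.740877
```

On these two logs the relative start times differ by roughly 0.74 seconds
between the two orderings, which is in the same range as the delay I hear.
The logs come from separate runs, so this is suggestive rather than
conclusive.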

FFmpeg is version n4.3.1 compiled from git, on Ubuntu 20.04.

What's going on? Any insights appreciated.

Thanks
Justyn

PS: I originally asked this on StackExchange, where it was suggested that I
ask on this list:
https://unix.stackexchange.com/questions/602499/why-does-specifying-audio-input-before-webcam-video-input-in-ffmpeg-cause-them-t

