[FFmpeg-user] Synchronizing A/V streams from independent sources?

Nicolas George george at nsup.org
Sun Jun 1 20:38:10 CEST 2014

On tridi 13 Prairial, year CCXXII, Jeff wrote:
> For some reason, the source file which has my preferred video stream
> runs slightly faster (1:20:32) than the source of my audio stream
> (1:23:58.000431). Obviously, these streams cannot be synchronized by
> shifting the start time.

Looking at the numbers, it is pretty obvious that the first video has 25
frames per second, which is the standard for PAL, while the second one has
24000/1001 (approximately 23.976), which is one of the standards for NTSC.

I urge you to use these exact numbers in your computations. Remember that
even a 0.005% difference will give a noticeable A-V desync over the run of
your program.
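
As a quick sanity check on these numbers, here is a minimal Python sketch. It
compares the exact 25 : 24000/1001 ratio with the ratio of the two running
times quoted above, and estimates the drift that a 0.005% speed error would
cause. The variable names are only illustrative, and the video running time
is known to whole seconds only, so the observed ratio can only be approximate.

```python
from fractions import Fraction

PAL_FPS = Fraction(25)
FILM_FPS = Fraction(24000, 1001)

# Exact speed ratio between the two frame rates.
ratio = PAL_FPS / FILM_FPS                 # reduces to 1001/960

# Running times from the quoted message, in seconds.
video_s = 1 * 3600 + 20 * 60 + 32          # 1:20:32
audio_s = 1 * 3600 + 23 * 60 + 58          # 1:23:58 (fractional part dropped)
observed = Fraction(audio_s, video_s)

# Drift accumulated by a 0.005% speed error over the audio running time.
drift_s = audio_s * 0.005 / 100

print(ratio, float(ratio))
print(float(observed))
print(round(drift_s, 3), "seconds of drift")
```

The observed ratio agrees with 1001/960 to within about one part in ten
thousand, and the drift comes out at roughly a quarter of a second.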

> I have, so far, tried 'setpts=PTS*1.xxxxxxxx' to slow and extend the
> video, but this method introduces jerkiness regardless of whether I
> choose '-vsync vfr' or 'cfr'. I get better (smooth, natural

That happens because setpts is too versatile: the framework cannot guess
what you are doing with it.

> sounding) results by speeding up the audio with 'atempo=1.xxxxxx' or
> with 'asetpts=PTS*0.9xxxxxxx,aresample=async=5000'.

Non-technical note: if you want the best results, you should inquire about
the original format of your content, in order to produce an output that
matches it as closely as possible. For example, if your content comes from a
TV recording in the NTSC world, you should probably convert towards the NTSC
speed. Or the other way around if what you have is an American DVD (NTSC) of
a British show (originally PAL) for example.

You also probably need to compare the pitch of the audio: the PAL<->NTSC
conversion is frequently done by just changing the speed, without correcting
the pitch. The difference is 0.724 semitone; someone with absolute pitch
will be able to notice it.
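
The semitone figure can be reproduced from the speed ratio, assuming equal
temperament (12 semitones per octave, one octave per doubling of speed). A
small Python sketch, with illustrative names:

```python
import math
from fractions import Fraction

# Speed ratio between 25 FPS and 24000/1001 FPS.
speed_ratio = Fraction(25) / Fraction(24000, 1001)

# Pitch shift in equal-tempered semitones when audio is sped up by that ratio
# without pitch correction.
semitones = 12 * math.log2(float(speed_ratio))

print(f"{semitones:.3f} semitones")
```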

> The main problem with stretching the video or shrinking the audio is
> that granularity limitations are preventing a perfect match in
> running times that would synchronize the two streams from beginning
> to end. Is there something I am missing, some other approach to
> take?

There is something you are missing: the notion of time base.

All timestamps are handled as integers, as a multiple of a base interval
called time base. To optimize things, the time base is selected separately
for each stream. For streams at constant frame rate, it is usually set to
the normal interval between frames.

In other words, with your 25 FPS that you are trying to convert to 24 FPS,
the timestamps are 0, 1, 2, ..., 22, 23, 24, 25, all multiples of 1/25
seconds, which means:

0.00, 0.04, 0.08, ..., 0.88, 0.92, 0.96, 1.00

The setpts filter then converts to 24 FPS:

0.0000, 0.0417, 0.0833, ..., 0.9167, 0.9583, 1.0000, 1.0417

And then it is converted back to integers as multiples of 1/25 seconds:

0, 1, 2, ..., 22, 23, 25, 26

And voilà, the jerkiness.
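
The rounding above can be reproduced with exact fractions. This is only a
sketch of the arithmetic, not of what ffmpeg does internally: it assumes the
rescaled timestamps are truncated to integers, and a different rounding mode
would move the gap elsewhere in the sequence without removing it.

```python
from fractions import Fraction

SRC_TB = Fraction(1, 25)    # time base of the 25 FPS stream: one tick per frame
SPEED = Fraction(25, 24)    # PTS multiplier that slows 25 FPS down to 24 FPS


def retimed_tick(n):
    """Return the integer timestamp (in 1/25 s ticks) of source frame n."""
    exact_seconds = n * SRC_TB * SPEED   # exact retimed instant
    ticks = exact_seconds / SRC_TB       # same instant in 1/25 s units
    return int(ticks)                    # ASSUMPTION: truncate to an integer


stamps = [retimed_tick(n) for n in range(26)]
gaps = [b - a for a, b in zip(stamps, stamps[1:])]

print(stamps[-4:])                    # last four timestamps
print(gaps.count(1), gaps.count(2))   # how many gaps of 1 tick and of 2 ticks
```

All gaps but one are a single tick; the one double gap is the visible hiccup.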

What you need is to decompose all steps so that ffmpeg gets the computation
right.

First, use the settb filter to choose a time base capable of expressing all
timestamps with enough precision. In this particular case, this is easy:
1/24000 can express all timestamps exactly, since 25 FPS means 960 ticks
between frames while 24000/1001 FPS (~23.976) means 1001 ticks between frames.
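
To check that arithmetic, the following Python sketch computes the frame
interval of each rate in units of 1/24000 second, and verifies that the
retimed timestamps stay whole numbers. The names are illustrative only.

```python
from fractions import Fraction

TIME_BASE = Fraction(1, 24000)             # seconds per tick
PAL_FPS = Fraction(25)
FILM_FPS = Fraction(24000, 1001)

pal_ticks = (1 / PAL_FPS) / TIME_BASE      # ticks between frames at 25 FPS
film_ticks = (1 / FILM_FPS) / TIME_BASE    # ticks between frames at 24000/1001

# Slowing the 25 FPS stream down by the exact ratio keeps every timestamp
# on an integer number of ticks.
speed = PAL_FPS / FILM_FPS
retimed = [n * pal_ticks * speed for n in range(5)]
exact = all(t.denominator == 1 for t in retimed)

print(pal_ticks, film_ticks)
print([int(t) for t in retimed], exact)
```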

Then add your setpts to do the speed change.

And then add the fps filter to explain to ffmpeg that your formula is just a
speed change and let it set the target properties correctly.

And test each step carefully using the showinfo filter.
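
Put together, the three steps could be assembled as below. This is a hedged
sketch: the argument spellings (settb=1/24000, setpts=PTS*1001/960,
fps=24000/1001) are my reading of the filter syntax, not something quoted
from this message, so check them against the filter documentation for your
ffmpeg version. The helper name build_chain is hypothetical.

```python
from fractions import Fraction

SRC_FPS = Fraction(25)
DST_FPS = Fraction(24000, 1001)
TIME_BASE = Fraction(1, 24000)

speed = SRC_FPS / DST_FPS   # 1001/960, the exact PTS multiplier


def build_chain(debug=False):
    """Join the three steps; with debug=True, add showinfo after each one."""
    steps = [
        f"settb={TIME_BASE.numerator}/{TIME_BASE.denominator}",
        f"setpts=PTS*{speed.numerator}/{speed.denominator}",
        f"fps={DST_FPS.numerator}/{DST_FPS.denominator}",
    ]
    if debug:
        # Interleave showinfo so the timestamps can be inspected at each stage.
        steps = [part for step in steps for part in (step, "showinfo")]
    return ",".join(steps)


chain = build_chain()
print(chain)
print(build_chain(debug=True))
```

The resulting string would normally be given to the -vf option, quoted so
that the shell does not interpret the `*`.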


  Nicolas George