[FFmpeg-user] AAC in MOV 1 frame off

Peter Wright peter at peterphi.com
Wed Apr 2 01:45:44 CEST 2014

On 1 April 2014 at 21:57:20, Carl Eugen Hoyos (cehoyos at ag.or.at) wrote:
> Peter Wright <peter <at> peterphi.com> writes:
>
> > I am having a problem with AV sync being off by 1
> > frame when using AAC and MP3
>
> Iirc, this was reported several times, but I
> believe that neither myself nor another FFmpeg
> developer understood the issue:
> How can you know that AV-sync is off by 0.04 seconds?
>
> Carl Eugen

Hi Carl,

Firstly, I’m assuming that QuickTime and Final Cut Pro do the correct thing. I’m trying to write files that end users will open in FCP and QuickTime, so from my perspective the files need to work there. I’m checking sync in those applications by playing back one frame of video and its associated audio at a time. However, there does seem to be something odd going on anyway: the first audio packet has a PTS value that puts it before the first video frame (see below).

The sample I’m generating with lavfi (see my previous e-mail for the command) is emitted at 48,000 samples/second: a sine wave with a beep once a second. This should mean that the first 1920 audio samples in the stream are a “beep”, with the next 46,080 being a constant tone. When I play it back with ffplay the first thing I can hear is the beep, then the tone. When I play it back in QuickTime the initial beep is not audible. This problem also manifests for an audio-only MOV file.

On playback the audio isn’t off by precisely 0.04 seconds, but it’s roughly that: when I play only frame 00:00:00:23 I can hear a small click, which I think is part of the 1920-sample beep (the beep is clearly audible when 00:00:00:24 is played). I imported the waveform from the .MOV into Audacity: the beep starts at sample 45,888 rather than the expected 48,000, which is 2112 samples too early (that is also 1920+192; I’m not sure whether the 192 is a coincidence).
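
For reference, the arithmetic behind those figures can be sketched in Python as below. The 25 fps frame rate is an assumption inferred from the 0.04 second frame duration quoted earlier in the thread, and the 1024-sample packet size is taken from the ffprobe observation in this message; the variable names are illustrative only.

```python
from fractions import Fraction

SAMPLE_RATE = 48000       # audio samples per second, as stated in the post
FRAME_RATE = 25           # assumption: 0.04 s per video frame implies 25 fps
PACKET_SAMPLES = 1024     # samples in the first audio packet, per the post

# One video frame's worth of audio: this is also the length of the beep.
samples_per_video_frame = SAMPLE_RATE // FRAME_RATE

# The second beep should start exactly one second in.
expected_beep_start = SAMPLE_RATE
observed_beep_start = 45888   # position measured in Audacity, per the post

early_by = expected_beep_start - observed_beep_start
early_seconds = Fraction(early_by, SAMPLE_RATE)
early_video_frames = Fraction(early_by, samples_per_video_frame)

# What is left over after subtracting one video frame or one audio packet.
left_after_video_frame = early_by - samples_per_video_frame
left_after_audio_packet = early_by - PACKET_SAMPLES

print("samples per video frame:", samples_per_video_frame)
print("beep is early by:", early_by, "samples =", float(early_seconds), "s")
print("that is", float(early_video_frames), "video frames")
print("left over after one video frame:", left_after_video_frame)
print("left over after one audio packet:", left_after_audio_packet)
```

The point of the sketch is only that the measured offset is slightly more than one video frame and roughly two audio packets, so neither explains it exactly on its own.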

When I looked at the file with ffprobe -show_packets I saw that the first audio packet has a pts_time of -0.021333 (and a duration_time of 0.021333), whereas the first video packet has a pts_time of 0.000000. So if I understand things correctly, that entire audio packet should be played back before the first frame of video is displayed. While 1024 audio samples doesn’t completely explain why the beep is ~1920+192 samples early, it does make me think that something odd is happening.
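
A minimal sketch of how this check could be scripted, assuming ffprobe is on the PATH and using its JSON writer. The helper names (`probe_packets`, `first_packet_pts`) are hypothetical, and the sample JSON is hand-written to mirror the values reported above rather than captured from a real run.

```python
import json
import subprocess


def probe_packets(path: str) -> str:
    """Run ffprobe on `path` and return its per-packet report as JSON text."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "packet=codec_type,pts_time,duration_time",
        "-of", "json",
        path,
    ]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def first_packet_pts(ffprobe_json: str) -> dict:
    """Map each codec_type to the pts_time of its first packet in file order.

    Streams of the same type are not distinguished; this is enough for a
    file with one audio and one video stream.
    """
    first = {}
    for pkt in json.loads(ffprobe_json).get("packets", []):
        kind = pkt.get("codec_type")
        if kind not in first and "pts_time" in pkt:
            first[kind] = float(pkt["pts_time"])
    return first


# Hand-written sample mirroring the values reported in this message.
sample = json.dumps({"packets": [
    {"codec_type": "audio", "pts_time": "-0.021333", "duration_time": "0.021333"},
    {"codec_type": "video", "pts_time": "0.000000", "duration_time": "0.040000"},
    {"codec_type": "audio", "pts_time": "0.000000", "duration_time": "0.021333"},
]})

first = first_packet_pts(sample)
lead_seconds = first["video"] - first["audio"]
lead_samples = round(lead_seconds * 48000)   # 48 kHz, per the post
print(first, "audio leads video by", lead_samples, "samples")
```

On a real file, `first_packet_pts(probe_packets("out.mov"))` would do the same thing, where `out.mov` is a placeholder filename.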

So, to recap: the problem goes away when emitting PCM, and both codecs that exhibit the problem (AAC and MP3) have a negative PTS on the first packet.

I’d like to help figure out what’s going on here if I can. Do you have any pointers for where I should look to get you more detail? I have some familiarity with the code, but not a great deal with the specifics of the encoders and the mov muxer. One experiment I’d like to try is to find out what is assigning that negative PTS value and see what happens if it’s changed to zero instead.
