[FFmpeg-user] AAC in MOV 1 frame off
peter at peterphi.com
Wed Apr 2 18:00:01 CEST 2014
On 2 April 2014 at 00:45:45, Peter Wright (peter at peterphi.com) wrote:
On 1 April 2014 at 21:57:20, Carl Eugen Hoyos (cehoyos at ag.or.at) wrote:
Peter Wright <peter <at> peterphi.com> writes:
> I am having a problem with AV sync being off by 1
> frame when using AAC and MP3
Iirc, this was reported several times, but I
believe that neither myself nor another FFmpeg
developer understood the issue:
How can you know that AV-sync is off by 0.04 seconds?
Firstly, I’m assuming that QuickTime and Final Cut Pro do the correct thing (I’m trying to write files that end users will open in FCP and QuickTime, so from my perspective the files have to work there). I’m using them to play back one frame of video and its associated audio at a time. However, something odd does seem to be going on anyway: the first audio packet has a PTS value that puts it before the first video frame (see below).
The sample I’m generating using lavfilter (see previous e-mail for the command) emits 48k samples/second (a sine wave with a beep once a second). This should mean that the first 1920 audio samples in the stream are a “beep”, with the next 46,080 being a constant tone. When I play it back with ffplay, the first thing I hear is the beep, then the tone. When I play it back in QuickTime, the initial beep is not audible. The problem also shows up for an audio-only MOV file.
On playback the tone isn’t off by precisely 0.04 seconds, but it’s roughly that: when I play only frame 00:00:00:23 I can hear a small click, which I think is part of the 1920-sample beep (which is clearly audible when 00:00:00:24 is played). I imported the waveform from the .MOV into Audacity: the beep starts at sample 45,888, which is 2112 samples too early (that is also 1920+192; I’m not sure whether the 192 appearing is a coincidence).
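To make the numbers above easier to check, here is a small arithmetic sketch. It assumes a 25 fps video track (0.04 s per frame) and that the beep measured in Audacity is the one expected at the one-second mark, i.e. at sample 48,000; neither is stated outright above, so treat both as assumptions.

```python
SAMPLE_RATE = 48_000          # samples per second, as stated in the post
FRAME_RATE = 25               # assumption: 0.04 s per video frame

# Audio samples that accompany one video frame.
samples_per_video_frame = SAMPLE_RATE // FRAME_RATE

# Assumption: the measured beep is the one expected at t = 1 s.
expected_beep_start = SAMPLE_RATE
observed_beep_start = 45_888  # position read off in Audacity

offset = expected_beep_start - observed_beep_start
remainder = offset - samples_per_video_frame

print(samples_per_video_frame)           # samples per frame
print(offset)                            # how early the beep is, in samples
print(remainder)                         # what is left after one full frame
print(offset / samples_per_video_frame)  # the same offset in video frames
```

So the beep is early by a little more than one video frame, which fits hearing a click on frame 23 and the full beep on frame 24.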
When I looked at the file with ffprobe -show_packets, I saw that the first audio packet has a PTS time of -0.021333 (and a duration time of 0.021333), whereas the first video packet has a PTS time of 0.0000. So if I understand things correctly, that entire audio packet should be played back before the first frame of video is displayed. 1024 audio samples doesn’t completely explain why the beep is ~1920+192 samples early, but it does make me think that something odd is happening.
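For anyone wanting to repeat this check, here is a rough sketch that pulls the first PTS per stream type out of ffprobe's JSON output (something like `ffprobe -v error -show_packets -of json input.mov`, where `input.mov` is a placeholder). The embedded sample is hand-written from the values quoted above, not real ffprobe output, and the helper name is made up for illustration.

```python
import json

def first_pts_by_type(ffprobe_json: str) -> dict:
    """Map codec_type -> pts_time (seconds) of the first packet of that type.

    Simplification: streams are keyed by codec_type, so a file with several
    audio tracks would only report the first one encountered.
    """
    first = {}
    for pkt in json.loads(ffprobe_json)["packets"]:
        kind = pkt["codec_type"]
        if kind not in first and "pts_time" in pkt:
            first[kind] = float(pkt["pts_time"])
    return first

# Illustrative data only, built from the numbers in this e-mail.
SAMPLE = """{"packets": [
  {"codec_type": "audio", "stream_index": 1,
   "pts_time": "-0.021333", "duration_time": "0.021333"},
  {"codec_type": "video", "stream_index": 0,
   "pts_time": "0.000000", "duration_time": "0.040000"}
]}"""

first = first_pts_by_type(SAMPLE)
print(first)

# One 1024-sample packet at 48 kHz lasts 1024/48000 s, which matches the
# magnitude of the negative PTS on the first audio packet.
print(round(-1024 / 48_000, 6))
```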
So, to recap: the problem goes away when emitting PCM, and both codecs which exhibit this problem (AAC and MP3) have a negative PTS on the first packet.
I’d like to help figure out what’s going on here if I can. Do you have any pointers for where I should look to get you more detail? I have some familiarity with the code, but not a great deal with the specifics of the encoders and the mov muxer. One experiment I’d like to try is to find out what is assigning that negative PTS value and see what happens if it’s changed to zero instead.
I think I know what’s happening here and, specifically, why the second beep starts 2112 samples early (I had originally thought of 2112 as 1920+192, but it seems to be a magic number from historical AAC encoders). It also explains why the problem occurs for MP3.
Appendix G of the QuickTime Specification ( https://developer.apple.com/library/mac/documentation/quicktime/qtff/QTFFAppenG/QTFFAppenG.html ) describes how AAC is positioned relative to video, by using the Edit List atom and the Sample Group Structures atoms. Crucially, “In the absence of the sample group structures, the classic solution of expecting an implicit encoding delay of 2112 samples and the edit list to start at the beginning of encoder delay will be assumed as described in the previous section”. FFmpeg doesn’t write an sgpd atom, so it triggers this behaviour in Apple’s code.
That indicates that it’s necessary to write sgpd and sbgp atoms for the audio tracks (to signal that the MOV carries explicit delay information). I tried writing some initial code for this (patch attached for movenc.c) but don’t see a difference, so I assume the fallback behaviour of Apple’s code is still being triggered.
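For reference, this is roughly what the two atoms look like on disk. It is a standalone sketch and not the attached patch: the field layout follows my reading of the ISO base media file format sample group boxes, and the 'roll' grouping type with a roll distance of -1 (one packet of pre-roll) is an assumption taken from my reading of Appendix G. The function names are made up for illustration.

```python
import struct

def full_box(fourcc: bytes, version: int, payload: bytes) -> bytes:
    """Serialise a 'full' atom: size, type, 1-byte version, 3 zero flag bytes."""
    body = struct.pack(">B3x", version) + payload
    return struct.pack(">I4s", 8 + len(body), fourcc) + body

def sgpd_roll(roll_distance: int = -1) -> bytes:
    """Sample group description with a single 'roll' entry.

    Assumed layout (version 1): grouping_type, default_length (2 bytes per
    entry), entry_count (1), then a signed 16-bit roll_distance.
    """
    payload = b"roll" + struct.pack(">IIh", 2, 1, roll_distance)
    return full_box(b"sgpd", 1, payload)

def sbgp_roll(sample_count: int) -> bytes:
    """Sample-to-group atom mapping every audio packet to description #1.

    Assumed layout (version 0): grouping_type, entry_count (1), then one
    (sample_count, group_description_index) pair.
    """
    payload = b"roll" + struct.pack(">III", 1, sample_count, 1)
    return full_box(b"sbgp", 0, payload)

sgpd = sgpd_roll()
sbgp = sbgp_roll(sample_count=100)  # 100 packets is an arbitrary example
print(sgpd.hex())
print(sbgp.hex())
```

Both atoms would sit in the audio track's sample table (stbl). Whether Apple's code also needs a matching edit list before it honours them is something I have not confirmed.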
Interestingly, the files generated by Final Cut Pro X export to MOV don’t contain the sgpd/sbgp atoms.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 2159 bytes
Desc: not available