[FFmpeg-trac] #5910(undetermined:new): AAC to PCM conversion inserts extra silence in the beginning
trac at avcodec.org
Wed Oct 26 17:33:42 EEST 2016
#5910: AAC to PCM conversion inserts extra silence in the beginning
Reporter: jwilhelmsson | Type: defect
Status: new | Priority: normal
Component: undetermined | Version: unspecified
Keywords: aac pcm | Blocked By:
Analyzed by developer: 0 | Reproduced by developer: 0
Summary of the bug:
When converting AAC audio files/streams to PCM, extra silence is inserted
at the beginning of the output file.
This may very well be the same issue as ticket #2325, but since I believe
I have more information I elected to create a new one.
The long version:
My company dubs cartoons, so we receive many kinds of video formats from
our various clients. Recently one of them complained that our final
delivery was out of sync compared to the original material, and that's how
this issue was discovered. The reference files from the client were MP4s
with AAC audio, and when I converted that audio into WAV files (for use in
our recording software, Steinberg Nuendo), extra silence was inserted at
the beginning, which made us record everything out of sync.
I concluded that ffmpeg was at fault by comparing its output with files
converted by ProTools, Nuendo, and QuickTime, which all match each other
and differ from the ffmpeg output.
After lots of testing I concluded that the AAC to PCM conversion is the
culprit (i.e. the video container format is mostly irrelevant), and also
that the length of the inserted silence varies between files. I haven't
been able to pinpoint exactly what causes the variation.
Attached are five aac files, plus wav files converted by ffmpeg (3.1.4)
and QuickTime Pro (7.7.9) clearly showing the difference. Since the files
come from commercial productions I've only included 7 to 10 seconds from
each, but it's enough to see the error.
For two of the files approximately 44 milliseconds (about 2100 samples) of
silence is inserted, for two others 108 milliseconds (about 5200 samples),
and one oddly enough gets only 32 milliseconds of silence even though the
audio is shifted 44 ms (this is easy to see since it starts with a test
tone).
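As a quick sanity check of the figures above, the millisecond values can be converted to sample counts. This is only a sketch: it assumes the 48 kHz output rate from the conversion command in this report, and the helper name `ms_to_samples` is made up for illustration.

```python
# Assumption: 48 kHz, the output sample rate used for the WAV files in this report.
SAMPLE_RATE_HZ = 48_000


def ms_to_samples(ms, rate=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * rate / 1000)


# The three offsets reported above (all measured by hand, so approximate).
for ms in (32, 44, 108):
    print(f"{ms:>4} ms = {ms_to_samples(ms):>5} samples at {SAMPLE_RATE_HZ} Hz")
```

At 48 kHz this gives 2112 samples for 44 ms and 5184 samples for 108 ms, which is in line with the "about 2100" and "about 5200" estimates.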
How to reproduce:
The aac files were converted by ffmpeg with the command (I'll attach
outputs in separate messages below):
ffmpeg -i input -c:a pcm_s24le -ar 48k output
They were also converted with QuickTime Pro with the same settings (24
bits, 48kHz). I then compared the waveforms in both Nuendo and Audacity.
The offset values were measured by manually marking an area in Audacity,
so they are very approximate.
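A less approximate way to get the length of the leading silence would be to count it programmatically. The following is a minimal sketch, not what was used for the measurements in this report. It assumes signed little-endian PCM (as in the 24-bit WAV files here); the function names and the `threshold` parameter are made up for illustration, and Python's `wave` module may reject some WAV header variants, in which case the samples would have to be read some other way.

```python
import wave


def leading_silence_frames(pcm, sample_width, channels, threshold=0):
    """Count leading frames in which every channel has |sample| <= threshold.

    pcm is interleaved, signed, little-endian PCM data (e.g. 16- or 24-bit),
    as found in the data chunk of a WAV file.
    """
    frame_size = sample_width * channels
    total = len(pcm) // frame_size
    for i in range(total):
        frame = pcm[i * frame_size:(i + 1) * frame_size]
        for c in range(channels):
            raw = frame[c * sample_width:(c + 1) * sample_width]
            if abs(int.from_bytes(raw, "little", signed=True)) > threshold:
                return i  # first frame that is not silent
    return total  # the whole buffer is silent


def leading_silence_ms(path, threshold=0):
    """Leading silence of a WAV file in milliseconds (hypothetical helper)."""
    with wave.open(path, "rb") as w:
        pcm = w.readframes(w.getnframes())
        n = leading_silence_frames(pcm, w.getsampwidth(), w.getnchannels(), threshold)
        return 1000.0 * n / w.getframerate()
```

Running `leading_silence_ms` on the ffmpeg output and on the QuickTime output of the same source and subtracting the two results would give the extra silence. Note that this only measures silence, not the shift of the audio itself, so it would not by itself explain the file where the two differ.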
The attached files come from one movie and two tv series (two episodes
each). The movie files are called "g", and the series "tj" and "td". The
aac files were extracted from the original mp4 files by stream copying:
ffmpeg -i input -c:a copy -t 10 output
The movie file starts with a test tone. It is also the file where the
offset is 32 ms at the beginning of the test tone but 44 ms at the end of it.
The error is the same when converting directly from the mp4 file and when
converting from an extracted aac.
Extracting PCM wav from a mov container produces no errors.
If I convert the aac stream to a new aac file there's still an error, but
only half as long. I've only tested this on one file, but it produced a 22
ms gap instead of 44 ms.
Repeated conversion does not compound the error, i.e. converting from aac
to aac and then converting that output file to aac again does not
increase the error.
Converting to a different bitrate/sample rate does not affect the result.
I've done a lot of testing, but it's very possible that I've forgotten
some vital information in this report, so please ask if you need more
information.
Ticket URL: <https://trac.ffmpeg.org/ticket/5910>