[FFmpeg-trac] #7182(avformat:new): Asynchronity when muxing Opus in Matroska
FFmpeg
trac at avcodec.org
Wed May 2 18:34:59 EEST 2018
#7182: Asynchronity when muxing Opus in Matroska
-----------------------------------+--------------------------------------
Reporter: mkver | Type: defect
Status: new | Priority: normal
Component: avformat | Version: git-master
Keywords: mkv, opus | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-----------------------------------+--------------------------------------
Muxing Opus in Matroska currently leads to a loss of synchronization
because the muxer accounts for the encoder delay twice: it shifts the
block timestamps and also writes a CodecDelay element, which already
implies a delay of its own.
Before turning to the more detailed explanation, let me say that I used
the following ffmpeg build (the latest of Zeranoe's builds, from today).
I am reporting the version as git-master even though git-master is ahead
by one completely unrelated commit:
{{{
ffmpeg version N-90920-ge07b1913fc Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 7.3.0 (GCC)
configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth
libavutil 56. 18.100 / 56. 18.100
libavcodec 58. 19.100 / 58. 19.100
libavformat 58. 13.100 / 58. 13.100
libavdevice 58. 4.100 / 58. 4.100
libavfilter 7. 21.100 / 7. 21.100
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
libpostproc 55. 2.100 / 55. 2.100
}}}
The nullsrc and anullsrc filters create tracks whose timestamps (both pts
and dts) start at zero:
{{{
ffmpeg.exe -f lavfi -i nullsrc -f lavfi -i anullsrc -t 0.2 -f framehash -hash crc32 -
#format: frame checksums
#version: 2
#hash: CRC32
#software: Lavf58.13.100
#tb 0: 1/25
#media_type 0: video
#codec_id 0: rawvideo
#dimensions 0: 320x240
#sar 0: 1/1
#tb 1: 1/44100
#media_type 1: audio
#codec_id 1: pcm_s16le
#sample_rate 1: 44100
#channel_layout 1: 3
#channel_layout_name 1: stereo
#stream#, dts, pts, duration, size, hash
0, 0, 0, 1, 115200, 2a01c517
1, 0, 0, 1024, 4096, c71c0011
1, 1024, 1024, 1024, 4096, c71c0011
0, 1, 1, 1, 115200, 2a01c517
1, 2048, 2048, 1024, 4096, c71c0011
1, 3072, 3072, 1024, 4096, c71c0011
0, 2, 2, 1, 115200, 2a01c517
1, 4096, 4096, 1024, 4096, c71c0011
1, 5120, 5120, 1024, 4096, c71c0011
0, 3, 3, 1, 115200, 2a01c517
1, 6144, 6144, 1024, 4096, c71c0011
0, 4, 4, 1, 115200, 2a01c517
1, 7168, 7168, 1024, 4096, c71c0011
1, 8192, 8192, 628, 2512, 3f99da8d
}}}
If one encodes the audio, the pts and dts of the audio packets are shifted
back by the encoder delay (counted in samples), so that the encoded audio
that actually corresponds to input samples has the same timestamps as
those input samples:
{{{
ffmpeg.exe -f lavfi -i nullsrc -f lavfi -i anullsrc -c:a libopus -t 0.5 -f framehash -hash crc32 -
#format: frame checksums
#version: 2
#hash: CRC32
#extradata 1, 19, ea5d642a
#software: Lavf58.13.100
#tb 0: 1/25
#media_type 0: video
#codec_id 0: rawvideo
#dimensions 0: 320x240
#sar 0: 1/1
#tb 1: 1/48000
#media_type 1: audio
#codec_id 1: opus
#sample_rate 1: 48000
#channel_layout 1: 3
#channel_layout_name 1: stereo
#stream#, dts, pts, duration, size, hash
1, -312, -312, 960, 3, 8abe71cf
0, 0, 0, 1, 115200, 2a01c517
1, 648, 648, 960, 3, 8abe71cf
1, 1608, 1608, 960, 3, 8abe71cf
0, 1, 1, 1, 115200, 2a01c517
1, 2568, 2568, 960, 3, 8abe71cf
1, 3528, 3528, 960, 3, 8abe71cf
0, 2, 2, 1, 115200, 2a01c517
1, 4488, 4488, 960, 3, 8abe71cf
1, 5448, 5448, 960, 3, 8abe71cf
0, 3, 3, 1, 115200, 2a01c517
1, 6408, 6408, 960, 3, 8abe71cf
1, 7368, 7368, 960, 3, 8abe71cf
0, 4, 4, 1, 115200, 2a01c517
1, 8328, 8328, 960, 3, 8abe71cf
1, 9288, 9288, 960, 3, 8abe71cf
0, 5, 5, 1, 115200, 2a01c517
1, 10248, 10248, 960, 3, 8abe71cf
1, 11208, 11208, 960, 3, 8abe71cf
0, 6, 6, 1, 115200, 2a01c517
1, 12168, 12168, 960, 3, 8abe71cf
1, 13128, 13128, 960, 3, 8abe71cf
0, 7, 7, 1, 115200, 2a01c517
1, 14088, 14088, 960, 3, 8abe71cf
1, 15048, 15048, 960, 3, 8abe71cf
0, 8, 8, 1, 115200, 2a01c517
1, 16008, 16008, 960, 3, 8abe71cf
1, 16968, 16968, 960, 3, 8abe71cf
0, 9, 9, 1, 115200, 2a01c517
1, 17928, 17928, 960, 3, 8abe71cf
1, 18888, 18888, 960, 3, 8abe71cf
0, 10, 10, 1, 115200, 2a01c517
1, 19848, 19848, 960, 3, 8abe71cf
1, 20808, 20808, 960, 3, 8abe71cf
0, 11, 11, 1, 115200, 2a01c517
1, 21768, 21768, 960, 3, 8abe71cf
1, 22728, 22728, 960, 3, 8abe71cf
0, 12, 12, 1, 115200, 2a01c517
1, 23688, 23688, 312, 3, 8abe71cf, S=1, 10, 6ba9ada3
}}}
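The relation between the sample counts above and the millisecond values used below can be checked with a small sketch. It only restates the arithmetic; the helper name `samples_to_ms` is made up for illustration and is not an FFmpeg function.

```python
from fractions import Fraction

SAMPLE_RATE = 48000      # from "#sample_rate 1: 48000" in the output above
FIRST_AUDIO_PTS = -312   # pts of the first Opus packet, in 1/48000 units

def samples_to_ms(samples, rate=SAMPLE_RATE):
    """Exact conversion from a sample count to milliseconds (hypothetical helper)."""
    return Fraction(samples * 1000, rate)

# The encoder delay is the amount by which the first packet precedes zero.
encoder_delay_ms = samples_to_ms(-FIRST_AUDIO_PTS)
second_packet_ms = samples_to_ms(648)

print(float(encoder_delay_ms))   # 6.5
print(float(second_packet_ms))   # 13.5
```

So the first packet starts 6.5ms before zero and the second one at 13.5ms.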
If one now muxes this into Matroska (in order to use a video codec that is
valid in Matroska, I encoded the video with libx264 and -tune zerolatency
so as not to run into #4536), the encoder delay of 312 samples (6.5ms)
from above leads to a shift of all timestamps by that amount (rounded to
7ms in the output below) to make all timestamps non-negative; this happens
with every audio codec and is not Opus-specific:
{{{
ffmpeg.exe -f lavfi -i nullsrc -f lavfi -i anullsrc -c:v libx264 -tune zerolatency -c:a libopus -t 0.5 -f matroska test.mkv
mkvinfo -s test.mkv
Track 1: video, codec ID: V_MPEG4/ISO/AVC (h.264 profile: High @L1.3), mkvmerge/mkvextract track ID: 0, language: und, default duration: 40.000ms (25.000 frames/fields per second for a video track), pixel width: 320, pixel height: 240
Track 2: audio, codec ID: A_OPUS, mkvmerge/mkvextract track ID: 1, language: und, channels: 2, sampling freq: 48000, bits per sample: 16
I frame, track 2, timestamp 00:00:00.000000000, size 3, adler 0x05f302fa
I frame, track 1, timestamp 00:00:00.007000000, size 812, adler 0x080a17e4
I frame, track 2, timestamp 00:00:00.021000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.041000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.047000000, size 51, adler 0xa07a11ec
I frame, track 2, timestamp 00:00:00.061000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.081000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.087000000, size 61, adler 0x76721649
I frame, track 2, timestamp 00:00:00.101000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.121000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.127000000, size 65, adler 0x23a11875
I frame, track 2, timestamp 00:00:00.141000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.161000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.167000000, size 65, adler 0x249f181b
I frame, track 2, timestamp 00:00:00.181000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.201000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.207000000, size 65, adler 0x334918bd
I frame, track 2, timestamp 00:00:00.221000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.241000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.247000000, size 65, adler 0x34021860
I frame, track 2, timestamp 00:00:00.261000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.281000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.287000000, size 65, adler 0x42ac1902
I frame, track 2, timestamp 00:00:00.301000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.321000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.327000000, size 65, adler 0x085c17a3
I frame, track 2, timestamp 00:00:00.341000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.361000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.367000000, size 65, adler 0x17061845
I frame, track 2, timestamp 00:00:00.381000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.401000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.407000000, size 65, adler 0x180417eb
I frame, track 2, timestamp 00:00:00.421000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.441000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.447000000, size 65, adler 0x2669188a
I frame, track 2, timestamp 00:00:00.461000000, size 3, adler 0x05f302fa
I frame, track 2, timestamp 00:00:00.481000000, size 3, adler 0x05f302fa
P frame, track 1, timestamp 00:00:00.487000000, size 65, adler 0x2722182d
I frame, track 2, timestamp 00:00:00.501000000, size 3, adler 0x05f302fa
}}}
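The timestamps in the mkvinfo output can be reproduced with a short model. This is only a sketch of the observed behaviour, not the actual libavformat code: it assumes that timestamps are first rescaled to milliseconds with halves rounded away from zero and that everything is then shifted by the most negative value. The function names are invented for the example.

```python
def rescale_to_ms(ticks, rate_hz):
    """Rescale ticks at rate_hz to ms, rounding halves away from zero (assumed rounding)."""
    magnitude = (2 * abs(ticks) * 1000 + rate_hz) // (2 * rate_hz)
    return magnitude if ticks >= 0 else -magnitude

def shift_non_negative(*tracks):
    """Shift all tracks by the same amount so that the smallest timestamp becomes 0."""
    shift = max(0, -min(t for track in tracks for t in track))
    return [[t + shift for t in track] for track in tracks]

# First three packets of each track from the encoder output above.
audio_ms = [rescale_to_ms(p, 48000) for p in (-312, 648, 1608)]
video_ms = [rescale_to_ms(p, 25) for p in (0, 1, 2)]

muxed_audio, muxed_video = shift_non_negative(audio_ms, video_ms)
print(muxed_audio)   # [0, 21, 41]
print(muxed_video)   # [7, 47, 87]
```

Under these assumptions the model yields exactly the block timestamps listed by mkvinfo for the first packets of both tracks.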
So the encoder delay gets baked into the regular timestamps. But for Opus
the encoder delay is also signalled via the CodecDelay element in the
Opus track header. The semantics of this element imply that the first 6.5ms
of audio should be discarded and that the audio for time t has Matroska
timestamp t+6.5ms (i.e. the second Opus block, stored at about 20ms,
actually belongs at 13.5ms). This means that the Opus track is shifted
relative to the other tracks by the encoder delay, as can be seen e.g. in
the output of the Matroska demuxer:
{{{
ffmpeg.exe -copyts -i test.mkv -c copy -f framehash -hash crc32 -
#format: frame checksums
#version: 2
#hash: CRC32
#extradata 0, 40, 8237cd92
#extradata 1, 19, ea5d642a
#software: Lavf58.13.100
#tb 0: 1/1000
#media_type 0: video
#codec_id 0: h264
#dimensions 0: 320x240
#sar 0: 1/1
#tb 1: 1/1000
#media_type 1: audio
#codec_id 1: opus
#sample_rate 1: 48000
#channel_layout 1: 3
#channel_layout_name 1: stereo
#stream#, dts, pts, duration, size, hash
1, -7, -7, 20, 3, 8abe71cf
0, 7, 7, 40, 812, dbac8e3e
1, 14, 14, 20, 3, 8abe71cf
1, 34, 34, 20, 3, 8abe71cf
0, 47, 47, 40, 51, 4885e758
1, 54, 54, 20, 3, 8abe71cf
1, 74, 74, 20, 3, 8abe71cf
0, 87, 87, 40, 61, 5c29c696
1, 94, 94, 20, 3, 8abe71cf
1, 114, 114, 20, 3, 8abe71cf
0, 127, 127, 40, 65, 2832137b
1, 134, 134, 20, 3, 8abe71cf
1, 154, 154, 20, 3, 8abe71cf
0, 167, 167, 40, 65, 985e3247
1, 174, 174, 20, 3, 8abe71cf
1, 194, 194, 20, 3, 8abe71cf
0, 207, 207, 40, 65, 85567570
1, 214, 214, 20, 3, 8abe71cf
1, 234, 234, 20, 3, 8abe71cf
0, 247, 247, 40, 65, c623be44
1, 254, 254, 20, 3, 8abe71cf
1, 274, 274, 20, 3, 8abe71cf
0, 287, 287, 40, 65, db2bf973
1, 294, 294, 20, 3, 8abe71cf
1, 314, 314, 20, 3, 8abe71cf
0, 327, 327, 40, 65, 49d46f1e
1, 334, 334, 20, 3, 8abe71cf
1, 354, 354, 20, 3, 8abe71cf
0, 367, 367, 40, 65, 54dc2829
1, 374, 374, 20, 3, 8abe71cf
1, 394, 394, 20, 3, 8abe71cf
0, 407, 407, 40, 65, 584b1113
1, 414, 414, 20, 3, 8abe71cf
1, 434, 434, 20, 3, 8abe71cf
0, 447, 447, 40, 65, 0aa1a42a
1, 454, 454, 20, 3, 8abe71cf
1, 474, 474, 20, 3, 8abe71cf
0, 487, 487, 40, 65, f52f7718
1, 494, 494, 20, 3, 8abe71cf, S=1, 10, 6ba9ada3
}}}
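(Without -copyts the timestamps would be shifted to make them
non-negative.)
As one sees, the audio timestamps are back at their pre-mux values while
the video, which originally started at 0, now starts at 7ms; this is
essentially a shift by the encoder delay. With every demuxer->muxer
roundtrip, the tracks drift further out of sync.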
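The accumulating drift can be illustrated with a toy model of repeated roundtrips. It is a sketch under two assumptions that are inferred from the outputs above rather than taken from the source code: the demuxer subtracts the CodecDelay (7ms at millisecond precision) from the Opus timestamps only, and the muxer shifts all tracks to be non-negative and writes the same CodecDelay again. `demux` and `mux` are hypothetical stand-ins, not FFmpeg functions.

```python
CODEC_DELAY_MS = 7   # 6.5ms CodecDelay as it shows up at 1/1000 precision (assumed)

def demux(audio, video):
    """Model of the demuxer: only the Opus track is moved back by CodecDelay."""
    return [t - CODEC_DELAY_MS for t in audio], list(video)

def mux(audio, video):
    """Model of the muxer: shift every track so that no timestamp is negative."""
    shift = max(0, -min(audio + video))
    return [t + shift for t in audio], [t + shift for t in video]

# Block timestamps of test.mkv as listed by mkvinfo (first three per track).
audio, video = [0, 21, 41], [7, 47, 87]

offsets = []   # distance from first audio packet to first video packet, per pass
for _ in range(3):
    audio, video = demux(audio, video)
    offsets.append(video[0] - audio[0])
    audio, video = mux(audio, video)

print(offsets)   # [14, 21, 28]
```

In the encoder output the first video frame lies about 7ms after the first audio packet, so in this model every roundtrip adds roughly another 7ms of offset.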
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7182>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker