[FFmpeg-trac] #4178(avformat:open): Opus audio in MKV container

FFmpeg trac at avcodec.org
Sun Apr 22 09:35:05 EEST 2018


#4178: Opus audio in MKV container
-------------------------------------+-------------------------------------
             Reporter:  agressiv     |                    Owner:  vigneshvg
                 Type:  defect       |                   Status:  open
             Priority:  important    |                Component:  avformat
              Version:  git-master   |               Resolution:
             Keywords:  mkv opus     |               Blocked By:
  regression                         |  Reproduced by developer:  1
             Blocking:               |
Analyzed by developer:  1            |
-------------------------------------+-------------------------------------

Comment (by mkver):

 I can reliably create such files with ffmpeg and have a theory on why this
 is happening. The ultrashort answer is: Bad things can happen if the
 timestamps that the libopus encoder receives aren't perfect.

 Before I come to the long answer, let me add that I used the current
 git-master to produce the framehash logs that you will see. In more detail:
 {{{
 ffmpeg version N-90800-g8592ae1a1e Copyright (c) 2000-2018 the FFmpeg
 developers
 built with gcc 7.3.0 (Rev1, Built by MSYS2 project)
 configuration: --disable-static --enable-shared --disable-amf --disable-
 cuda --disable-cuvid --disable-d3d11va --disable-nvenc --disable-ffnvcodec
 --disable-debug --enable-libopus --enable-libbluray --enable-libmfx
 --enable-libsoxr --enable-libwavpack --enable-gpl --enable-openssl
 --enable-avisynth --enable-libfdk-aac --enable-libzvbi --disable-
 encoder=dca --disable-encoder=nellymoser --disable-encoder=real_144
 --disable-encoder=truehd --disable-encoder=vorbis --disable-encoder=sonic
 --disable-encoder=sonicls --disable-encoder=amv --disable-encoder=asv1
 --disable-encoder=asv2 --disable-encoder=flashsv --disable-
 encoder=flashsv2 --disable-encoder=roqvideo --disable-encoder=svq1
 --disable-encoder=zmbv --disable-encoder=zlib --disable-encoder=snow
 --disable-encoder=cinepak --disable-encoder=a64multi --disable-
 encoder=a64multi5 --disable-encoder=h261 --disable-encoder=h263 --disable-
 encoder=h263p --disable-encoder=wmv7 --disable-encoder=wmav1 --disable-
 encoder=wmav2 --disable-encoder=wmv8 --enable-nonfree --shlibdir=/local64
 /bin-video
 libavutil      56. 15.100 / 56. 15.100
 libavcodec     58. 19.100 / 58. 19.100
 libavformat    58. 13.100 / 58. 13.100
 libavdevice    58.  4.100 / 58.  4.100
 libavfilter     7. 19.100 /  7. 19.100
 libswscale      5.  2.100 /  5.  2.100
 libswresample   3.  2.100 /  3.  2.100
 libpostproc    55.  2.100 / 55.  2.100
 }}}
 So the other logs won't have the version field.

 a) First some information about granule positions: The granule positions
 in ogg pages indicate the position in the stream after decoding all
 packets which are completely within that page. They are restricted as
 follows:
 i) All pages with completed packets except the first and the last MUST
 have a granule position equal to the number of samples contained in
 packets that complete on that page plus the granule position of the most
 recent page with completed packets. (From section 4 of
 [https://tools.ietf.org/html/rfc7845 RFC 7845].)
 ii) If a page has the 'end of stream' flag set, the above does not apply.
 Instead, the number of samples to trim away at the end is the number of
 samples contained in the packets that complete on that page minus the
 granule position delta between that page and the most recent earlier page
 with completed packets. If there is no earlier page with completed
 packets, one works as if its granule position were zero (in this case
 the pre-skip also has to be applied). (This is 4.4 of RFC 7845.)
 iii) If the first page with completed packets isn't also the last page
 (otherwise ii) applies), it must have a granule position that is >= the
 number of samples contained in packets that complete on that page (the
 pre-skip is ignored in calculating this sum), so that there are no
 negative granule positions when working backwards. The granule position
 may be larger than the sum (useful for synchronization with other streams
 in the same multiplex); if the sum is larger, the stream is completely
 invalid (yes, the whole stream, not only the first page or the samples
 which would have negative granule positions). (This is 4.5 of RFC 7845.)
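
 As a rough illustration of rules i) to iii), here is a minimal Python
 sketch. It is not code from FFmpeg or opus-tools; the Page class and the
 check_pages function are hypothetical names, only pages with completed
 packets are modelled, and the pre-skip is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    granule: int        # granule position written in the page header
    samples: int        # samples in packets that complete on this page
    eos: bool = False   # 'end of stream' flag


def check_pages(pages: List[Page]) -> List[str]:
    """Report findings for rules i)-iii); input is pages with completed packets."""
    findings = []
    prev = 0  # rule ii): with no earlier page, work as if its granule were zero
    for idx, page in enumerate(pages):
        delta = page.granule - prev
        if page.eos:
            # rule ii): a shortfall on the last page means end trimming
            if page.samples > delta:
                findings.append(
                    f"page {idx}: end trimming of {page.samples - delta} samples")
        elif idx == 0:
            # rule iii): working backwards must not give a negative position
            if page.granule < page.samples:
                findings.append(
                    f"page {idx}: first sample at "
                    f"{page.granule - page.samples}, stream invalid")
        elif delta != page.samples:
            # rule i): the granule delta must equal the sample count
            findings.append(
                f"page {idx}: granule delta {delta} != {page.samples} samples")
        prev = page.granule
    return findings


# A first page with 48000 samples but granule position 24000 violates iii).
print(check_pages([Page(24000, 48000), Page(72000, 48000)]))
```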

 b) The easiest way to produce malformed files is by using a negative
 -itsoffset:
 {{{
 ffmpeg -itsoffset -0.5 -i test.dts -c:a libopus offset.-0.5.opus
 }}}
 opusinfo (a part of the opus-tools package from the creators of the opus
 codec) complains about this file:
 {{{
 Processing file "offset.-0.5.opus"...

 New logical stream (#1, serial: 7d2420f4): type opus
 Encoded with Lavf58.13.100
 User comments section follows...
         encoder=Lavc58.19.100 libopus
 WARNING: Samples with negative granpos in stream 1
 Opus stream 1:
         Pre-skip: 312
         Playback gain: 0 dB
         Channels: 2
         Original sample rate: 48000Hz
         Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
         Page duration:   1000.0ms (max),  968.4ms (avg),   20.0ms (min)
         Total data length: 386932 bytes (overhead: 0.811%)
         Playback length: 0m:30.005s
         Average bitrate: 103.2 kb/s, w/o overhead: 102.3 kb/s
 Logical stream 1 ended
 }}}
 Let's use the framehash muxer to see what the timestamps are when they
 leave the encoder:
 {{{
 ffmpeg -itsoffset -0.5 -i fl.dts -c:a libopus -f framehash -hash crc32 -
 ...
 #format: frame checksums
 #version: 2
 #hash: CRC32
 #extradata 0,                              19, ea5d642a
 #software: Lavf58.13.100
 #tb 0: 1/48000
 #media_type 0: audio
 #codec_id 0: opus
 #sample_rate 0: 48000
 #channel_layout 0: 3
 #channel_layout_name 0: stereo
 #stream#, dts,        pts, duration,     size, hash
 0,     -24312,     -24312,      960,      425, 996864ad
 0,     -23352,     -23352,      960,      241, 1fcc1d4d
 0,     -22392,     -22392,      960,      228, 05f1dd79
 0,     -21432,     -21432,      960,      225, e56a7998
 0,     -20472,     -20472,      960,      224, a12a261d
 0,     -19512,     -19512,      960,      226, 27020d0e
 0,     -18552,     -18552,      960,      249, ab31aeb9
 0,     -17592,     -17592,      960,      241, 44e9b2e4
 0,     -16632,     -16632,      960,      241, 8d5dbc65
 0,     -15672,     -15672,      960,      253, 54c603d7
 0,     -14712,     -14712,      960,      256, f9acaea3
 0,     -13752,     -13752,      960,      254, 308a7027
 0,     -12792,     -12792,      960,      262, 297c12b8
 0,     -11832,     -11832,      960,      271, 86b889ca
 0,     -10872,     -10872,      960,      266, 07e95927
 0,      -9912,      -9912,      960,      271, eedd9414
 0,      -8952,      -8952,      960,      275, 856a747f
 0,      -7992,      -7992,      960,      275, 48c4343e
 0,      -7032,      -7032,      960,      281, a54c56ad
 0,      -6072,      -6072,      960,      275, a5ede609
 0,      -5112,      -5112,      960,      271, f5795567
 0,      -4152,      -4152,      960,      270, cb1f8e24
 0,      -3192,      -3192,      960,      282, 9c81d325
 0,      -2232,      -2232,      960,      287, c4bec144
 0,      -1272,      -1272,      960,      276, 6978978a
 0,       -312,       -312,      960,      280, 928ce969
 0,        648,        648,      960,      288, a8b67809
 0,       1608,       1608,      960,      289, d08817ee
 0,       2568,       2568,      960,      281, 785f7424
 0,       3528,       3528,      960,      271, 01a150fd
 0,       4488,       4488,      960,      279, 1d1a6926
 0,       5448,       5448,      960,      299, 4ad6192a
 0,       6408,       6408,      960,      401, 1ba1ba43
 0,       7368,       7368,      960,      297, 722e745b
 0,       8328,       8328,      960,      399, bb637945
 0,       9288,       9288,      960,      296, b746197e
 0,      10248,      10248,      960,      276, 44dde335
 0,      11208,      11208,      960,      279, 3ffcb2f5
 0,      12168,      12168,      960,      293, 481af07f
 0,      13128,      13128,      960,      286, fbc2d89c
 0,      14088,      14088,      960,      278, 2983e9a8
 0,      15048,      15048,      960,      283, 6a8c6b1b
 0,      16008,      16008,      960,      285, b7b3a531
 0,      16968,      16968,      960,      285, 5ee67d70
 0,      17928,      17928,      960,      266, f3ad421b
 0,      18888,      18888,      960,      261, bea0961e
 0,      19848,      19848,      960,      272, d463ae16
 0,      20808,      20808,      960,      383, 3b03279b
 0,      21768,      21768,      960,      278, 9401e990
 0,      22728,      22728,      960,      270, abfad09a
 0,      23688,      23688,      960,      295, 1da5ee48
 0,      24648,      24648,      960,      266, 21c45f34
 0,      25608,      25608,      960,      256, 5003d43c
 0,      26568,      26568,      960,      274, dae2fa79
 0,      27528,      27528,      960,      268, 80b438cb
 }}}
 -0.5s corresponds to 24000 samples, and the remaining difference of 312
 samples is due to libopus' pre-skip of 312 samples. If one analyzes the
 generated file directly, one sees that the first page contains 50 packets
 with 960 samples each, i.e. 48000 samples (of which the first 312 are
 invalid), but the granule position of the first page is 24000; calculating
 backwards, the first packet would have started at -24000, which violates
 a) iii) above. Needless to say, the first page doesn't have the 'end of
 stream' flag set.
 Given the fact that according to the spec the whole stream has to be
 treated as invalid it is actually strange that opusinfo emits only a
 warning and not an error. The reference decoder, too, doesn't treat the
 stream as invalid: Instead it treats the first page as if it has end
 trimming although it doesn't have the 'end of stream' flag. In our sample
 this means that from the first page with 48000 samples the last 24000
 samples are stripped away because of end trimming and the first 312
 because of pre-skip. And indeed comparing the input file with the output
 of the reference decoder shows that they are essentially the same for the
 first 23688 samples and then totally different.
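
 The numbers in b) can be retraced with a few lines of Python. This is
 only a sketch of the arithmetic: the variable names are made up, and the
 line that derives the page's granule position (end timestamp plus
 pre-skip) is my assumption about how the value 24000 comes about, chosen
 because it reproduces what is in the file.

```python
SAMPLE_RATE = 48000
PRE_SKIP = 312          # libopus pre-skip reported by opusinfo
PACKET_SAMPLES = 960    # 20 ms packets

offset_samples = int(-0.5 * SAMPLE_RATE)        # -24000
first_pts = offset_samples - PRE_SKIP           # -24312, as in the framehash

page_samples = 50 * PACKET_SAMPLES              # 48000 samples on the first page
# Assumed relation: granule = timestamp after the last packet + pre-skip
page_granule = first_pts + page_samples + PRE_SKIP   # 24000

implied_start = page_granule - page_samples     # -24000, violates a) iii)

# Reference decoder behaviour as described above: treat the shortfall as
# end trimming, then drop the pre-skip from the front.
end_trim = page_samples - page_granule          # 24000
kept = page_samples - end_trim - PRE_SKIP       # 23688

print(first_pts, page_granule, implied_start, kept)
```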

 This shows that the ogg muxer should automatically shift the granule
 positions to make them (both those implied and those explicitly written)
 non-negative. (But there is a problem here: In ogg, the relationship
 between a timestamp and the granule position is codec-dependent and it
 needn't be a linear relationship like for opus, so shifting other tracks
 might be complicated.)
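
 For a single opus track, where the relationship is linear, such a shift
 could look roughly like the following Python sketch. The function names
 are hypothetical and this is not how the ogg muxer is implemented; it
 only shows the smallest constant offset that makes the implied start of
 the first page non-negative.

```python
from typing import List


def granule_shift(first_page_granule: int, first_page_samples: int) -> int:
    """Smallest shift that moves the implied start of the first page to >= 0."""
    return max(0, first_page_samples - first_page_granule)


def shift_granules(granules: List[int], first_page_samples: int) -> List[int]:
    """Apply the same shift to the granule position of every page."""
    shift = granule_shift(granules[0], first_page_samples)
    return [g + shift for g in granules]


# The file from b): first page holds 48000 samples, granule position 24000.
print(shift_granules([24000, 72000, 120000], 48000))
```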

 c) If one uses an itsoffset larger than the page_duration of the ogg
 muxer, opusinfo complains even more: "Negative or zero granulepos (-14400)
 on Opus stream outside of headers. This file was created by a buggy
 encoder"

 d) Here is another way to get into a situation like b), but without using
 itsoffset. It has to do with odd behaviour (I'd call it a bug) of the
 native opus decoder. Instead of stripping the pre-skip samples away like
 the reference decoder does, it simply gives them negative timestamps.
 Here is a part of framehash's output showing what this looks like for a
 non-defective file:
 {{{
 #format: frame checksums
 #version: 2
 #hash: CRC32
 #software: Lavf58.13.100
 #tb 0: 1/48000
 #media_type 0: audio
 #codec_id 0: pcm_s16le
 #sample_rate 0: 48000
 #channel_layout 0: 3
 #channel_layout_name 0: stereo
 #stream#, dts,        pts, duration,     size, hash
 0,       -312,       -312,      960,     3840, 1908c39f
 0,        648,        648,      960,     3840, 239ecff4
 0,       1608,       1608,      960,     3840, c5dd9714
 0,       2568,       2568,      960,     3840, 1173d416
 0,       3528,       3528,      960,     3840, e4e9ca53
 0,       4488,       4488,      960,     3840, dbc3e9f0
 0,       5448,       5448,      960,     3840, 2187b445
 0,       6408,       6408,      960,     3840, 25180cb2
 0,       7368,       7368,      960,     3840, 788bf31b
 0,       8328,       8328,      960,     3840, 1c3b1f55
 0,       9288,       9288,      960,     3840, a67eae2f
 0,      10248,      10248,      960,     3840, 17cc83a0
 }}}
 So if one uses an ordinary opus file as input, decodes it with the native
 decoder (the default decoder) and encodes this with libopus, one is in the
 very same situation as b). If one uses the libopus decoder, the result is
 fine.
 This behaviour of the native decoder is actually at the heart of #4692.
 This was exactly the situation which made me realize what's going on. See
 [https://forum.doom9.org/showthread.php?p=1839926#post1839926 here].

 e) I can also explain anthontex's observation with the exception of the
 part where he claims that streams <=5.1 seem to work. This time the root
 cause is lacing (that is used by default by mkvmerge for e.g. dts/dca
 tracks) probably coupled with strange timestamp rounding. Notice that
 test.dts is actually a stereo dts track.
 The timestamps from the dts file are good:
 {{{
 ffmpeg -i test.dts -f framehash -hash crc32 -
 ...
 #format: frame checksums
 #version: 2
 #hash: CRC32
 #software: Lavf58.13.100
 #tb 0: 1/48000
 #media_type 0: audio
 #codec_id 0: pcm_s16le
 #sample_rate 0: 48000
 #channel_layout 0: 3
 #channel_layout_name 0: stereo
 #stream#, dts,        pts, duration,     size, hash
 0,          0,          0,      512,     2048, 52bbda48
 0,        512,        512,      512,     2048, 2b037e9f
 0,       1024,       1024,      512,     2048, f69e3985
 0,       1536,       1536,      512,     2048, 04f27523
 0,       2048,       2048,      512,     2048, 0c9b0963
 0,       2560,       2560,      512,     2048, de6e37eb
 0,       3072,       3072,      512,     2048, 2230f372
 0,       3584,       3584,      512,     2048, b4275a94
 0,       4096,       4096,      512,     2048, e2efc7d5
 0,       4608,       4608,      512,     2048, e6ff0c6f
 0,       5120,       5120,      512,     2048, 43d5c355
 0,       5632,       5632,      512,     2048, f689afdb
 0,       6144,       6144,      512,     2048, 7ce06f4f
 0,       6656,       6656,      512,     2048, d639e9c7
 0,       7168,       7168,      512,     2048, 87aee60f
 0,       7680,       7680,      512,     2048, 6e32d1e1
 0,       8192,       8192,      512,     2048, 99b53229
 0,       8704,       8704,      512,     2048, 46803053
 0,       9216,       9216,      512,     2048, 4e4143b5
 0,       9728,       9728,      512,     2048, 2116fa38
 ...
 }}}
 If one remuxes test.dts with mkvmerge and specifies a
 timecode/timestamp-scale factor of 1000000, the timestamps aren't good
 any more. (For files that don't have a video track, mkvmerge by default
 uses a timecode/timestamp-scale factor small enough that 1 tick of the
 timebase is shorter than one sample, so the timecodes/timestamps in the
 file are actually sample accurate; if there is a video track, it defaults
 to 1000000, i.e. 1ms precision.) The merged file is called
 "Test.Laced.Big.TS.mka" ("TS" means TimestampScale):
 {{{
 ffmpeg -i Test.Laced.Big.TS.mka -f framehash -hash crc32 -
 ...
 #format: frame checksums
 #version: 2
 #hash: CRC32
 #software: Lavf58.13.100
 #tb 0: 1/48000
 #media_type 0: audio
 #codec_id 0: pcm_s16le
 #sample_rate 0: 48000
 #channel_layout 0: 3
 #channel_layout_name 0: stereo
 #stream#, dts,        pts, duration,     size, hash
 0,          0,          0,      512,     2048, 52bbda48
 0,        504,        504,      512,     2048, 2b037e9f
 0,       1016,       1016,      512,     2048, f69e3985
 0,       1512,       1512,      512,     2048, 04f27523
 0,       2024,       2024,      512,     2048, 0c9b0963
 0,       2536,       2536,      512,     2048, de6e37eb
 0,       3048,       3048,      512,     2048, 2230f372
 0,       3560,       3560,      512,     2048, b4275a94
 0,       4072,       4072,      512,     2048, e2efc7d5
 0,       4584,       4584,      512,     2048, e6ff0c6f
 0,       5096,       5096,      512,     2048, 43d5c355
 0,       5592,       5592,      512,     2048, f689afdb
 0,       6104,       6104,      512,     2048, 7ce06f4f
 0,       6616,       6616,      512,     2048, d639e9c7
 0,       7128,       7128,      512,     2048, 87aee60f
 0,       7640,       7640,      512,     2048, 6e32d1e1
 0,       8184,       8184,      512,     2048, 99b53229
 0,       8696,       8696,      512,     2048, 46803053
 0,       9208,       9208,      512,     2048, 4e4143b5
 0,       9720,       9720,      512,     2048, 2116fa38
 0,      10232,      10232,      512,     2048, ffd0d2d3
 0,      10744,      10744,      512,     2048, ab0e8d25
 0,      11256,      11256,      512,     2048, d75d5dbf
 0,      11768,      11768,      512,     2048, 495f20b4
 0,      12280,      12280,      512,     2048, c73a83e5
 0,      12792,      12792,      512,     2048, 1a8bd665
 0,      13304,      13304,      512,     2048, 37baf488
 0,      13800,      13800,      512,     2048, 75a43386
 0,      14312,      14312,      512,     2048, bac86852
 0,      14824,      14824,      512,     2048, cfa03cf6
 0,      15336,      15336,      512,     2048, ec85b2cf
 0,      15848,      15848,      512,     2048, 568417f0
 0,      16360,      16360,      512,     2048, de55f656
 0,      16872,      16872,      512,     2048, b4471f41
 0,      17384,      17384,      512,     2048, a8b615d7
 0,      17880,      17880,      512,     2048, 634e69bd
 0,      18392,      18392,      512,     2048, 28bb1df8
 0,      18904,      18904,      512,     2048, 7a2b2546
 0,      19416,      19416,      512,     2048, dd67f369
 0,      19928,      19928,      512,     2048, 72468c87
 0,      20472,      20472,      512,     2048, 31358846
 0,      20984,      20984,      512,     2048, 1b25d341
 0,      21496,      21496,      512,     2048, 0f188f8e
 0,      22008,      22008,      512,     2048, d4c28420
 0,      22520,      22520,      512,     2048, c2a2cc15
 0,      23032,      23032,      512,     2048, 97348c24
 0,      23544,      23544,      512,     2048, 8266b6bd
 0,      24056,      24056,      512,     2048, 9492736f
 0,      24568,      24568,      512,     2048, a0eb4084
 0,      25080,      25080,      512,     2048, 84f6ec09
 0,      25592,      25592,      512,     2048, 050f991a
 0,      26088,      26088,      512,     2048, deee9a7e
 0,      26600,      26600,      512,     2048, 12b66ef5
 0,      27112,      27112,      512,     2048, 38780750
 0,      27624,      27624,      512,     2048, e309fbb0
 0,      28136,      28136,      512,     2048, ee05c406
 0,      28648,      28648,      512,     2048, fe965280
 0,      29160,      29160,      512,     2048, 0e456d8f
 0,      29672,      29672,      512,     2048, 8868c0a4
 0,      30168,      30168,      512,     2048, b67200db
 0,      30680,      30680,      512,     2048, 98452104
 0,      31192,      31192,      512,     2048, 1c1d5dfa
 0,      31704,      31704,      512,     2048, 3bb376e9
 0,      32216,      32216,      512,     2048, 5cb27573
 0,      32760,      32760,      512,     2048, ce8afcf9
 0,      33272,      33272,      512,     2048, 671aafd3
 0,      33784,      33784,      512,     2048, 6b0ef4ae
 0,      34296,      34296,      512,     2048, 6190ea4e
 0,      34808,      34808,      512,     2048, 9cc28eec
 0,      35320,      35320,      512,     2048, 2fcb40a1
 0,      35832,      35832,      512,     2048, c97c7941
 0,      36344,      36344,      512,     2048, a8ddb89e
 0,      36856,      36856,      512,     2048, 1f03cd39
 0,      37368,      37368,      512,     2048, 0ae93b83
 0,      37880,      37880,      512,     2048, 2f2c98d4
 0,      38376,      38376,      512,     2048, 77460589
 0,      38888,      38888,      512,     2048, a4d05c57
 0,      39400,      39400,      512,     2048, df5b2b8d
 0,      39912,      39912,      512,     2048, 19602dd2
 0,      40424,      40424,      512,     2048, 53e32a7f
 0,      40936,      40936,      512,     2048, 2f4acb24
 0,      41448,      41448,      512,     2048, 29b3fd40
 0,      41960,      41960,      512,     2048, 5cd68804
 0,      42456,      42456,      512,     2048, 1a8765dd
 0,      42968,      42968,      512,     2048, 5fbc5ae7
 0,      43480,      43480,      512,     2048, d41ba16a
 0,      43992,      43992,      512,     2048, 36614005
 0,      44504,      44504,      512,     2048, b959f96a
 0,      45048,      45048,      512,     2048, 764ffe4b
 0,      45560,      45560,      512,     2048, 45e7b0d8
 0,      46072,      46072,      512,     2048, e190ddd4
 0,      46584,      46584,      512,     2048, 89c9b204
 0,      47096,      47096,      512,     2048, 6d5e6559
 0,      47608,      47608,      512,     2048, faa96307
 0,      48120,      48120,      512,     2048, cf00e88c
 0,      48632,      48632,      512,     2048, 9a7a02d3
 ...
 }}}
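
 The gaps and overlaps in a listing like the one above can be found
 mechanically. The following Python sketch (the function name is made up)
 parses framehash output and reports every packet whose pts differs from
 the previous pts plus duration; it assumes the six-column layout shown in
 the logs here.

```python
from typing import List, Tuple


def find_discontinuities(framehash_text: str) -> List[Tuple[int, int]]:
    """Return (pts, error) for packets that don't start where the previous ended."""
    rows = []
    for line in framehash_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # data rows look like: stream, dts, pts, duration, size, hash
        if len(fields) != 6 or not fields[0].isdigit():
            continue  # skip '#' header lines, '...' and blank lines
        rows.append((int(fields[2]), int(fields[3])))
    issues = []
    for (pts, duration), (next_pts, _) in zip(rows, rows[1:]):
        expected = pts + duration
        if next_pts != expected:
            issues.append((next_pts, next_pts - expected))
    return issues


sample = """
#stream#, dts,        pts, duration,     size, hash
0,          0,          0,      512,     2048, 52bbda48
0,        504,        504,      512,     2048, 2b037e9f
0,       1016,       1016,      512,     2048, f69e3985
0,       1512,       1512,      512,     2048, 04f27523
"""
# negative error = overlap, positive error = gap (in samples)
print(find_discontinuities(sample))
```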
 If I encode Test.Laced.Big.TS.mka with libopus to Test.Laced.Big.TS.opus,
 the resulting file is again defective:
 {{{
 Processing file "Test.Laced.Big.TS.opus"...

 New logical stream (#1, serial: fb757aa2): type opus
 Encoded with Lavf58.13.100
 User comments section follows...
         BPS-eng=1508966
         DURATION-eng=00:00:30.006000000
         NUMBER_OF_FRAMES-eng=2813
         NUMBER_OF_BYTES-eng=5659756
         _STATISTICS_WRITING_APP-eng=mkvmerge v22.0.0 ('At The End Of The
 World') 64-bit
         _STATISTICS_WRITING_DATE_UTC-eng=2018-04-22 04:35:34
         _STATISTICS_TAGS-eng=BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
         encoder=Lavc58.19.100 libopus
 WARNING: Samples with negative granpos in stream 1
 WARNING: Sample count ahead of granule (633600>633568) in stream 1
 WARNING: Sample count ahead of granule (681600>681568) in stream 1
 WARNING: Sample count ahead of granule (729600>729568) in stream 1
 WARNING: Sample count ahead of granule (777600>777568) in stream 1
 WARNING: Sample count ahead of granule (825600>825584) in stream 1
 WARNING: Sample count ahead of granule (873600>873584) in stream 1
 WARNING: Sample count ahead of granule (921600>921584) in stream 1
 WARNING: Sample count ahead of granule (969600>969584) in stream 1
 WARNING: Sample count ahead of granule (1017600>1017584) in stream 1
 Opus stream 1:
         Pre-skip: 312
         Playback gain: 0 dB
         Channels: 2
         Original sample rate: 48000Hz
         Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
         Page duration:   1020.0ms (max), 1000.7ms (avg),  780.0ms (min)
         Total data length: 387229 bytes (overhead: 0.887%)
         Playback length: 0m:30.005s
         Average bitrate: 103.2 kb/s, w/o overhead: 102.3 kb/s
 Logical stream 1 ended
 }}}
 The reason for the "Samples with negative granpos" warning despite the
 first sample not having a negative timestamp is that the ogg format
 doesn't explicitly signal the granule position for every packet, but only
 for every page (in which a packet is completed), and the ogg muxer uses a
 page duration of 1s by default. In order to fill this 1s, one needs
 48000-312 = 47688 samples from the input file, and for this one needs the
 first 94 packets. The first sample of the last of these 94 packets should
 have been sample number 47616 (zero-based), but according to the above
 framehash it has the timestamp 47608. Consequently sample number 47687 has
 the timestamp 47679, and therefore the granule position of the first page
 is 47679+1+312 = 47992 (the +1 comes from the fact that the granule
 position indicates the position after decoding the whole content of the
 page), and that is exactly what is in the output file. That of course
 means that the output file is invalid.
 Because not every output packet has a granule position, not every
 gap/overlap in the samples that are fed to the libopus encoder ends up
 having an influence on the output file. If the sum of the durations/number
 of samples just happens to coincide with the granule position delta, then
 everything's fine. This explains why
 If one decodes the just created opus file with the reference decoder it
 again ignores that a) ii) has the prerequisite of the 'end of stream' flag
 being set and discards several samples. This leads to audible distortions
 at around sample 633248 (= 633568 (from above opusinfo message) - 312
 (pre-skip) - 8 (the number of samples from the end of the first page that
 got skipped)).
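
 For reference, the arithmetic in e) written out as a small Python sketch.
 The variable names are made up; the values 47608 and 633568 are taken
 from the framehash and opusinfo output above.

```python
import math

SAMPLE_RATE = 48000
PRE_SKIP = 312
INPUT_FRAME = 512       # samples per decoded dts packet

needed = SAMPLE_RATE - PRE_SKIP                      # 47688 input samples for 1 s
packets = math.ceil(needed / INPUT_FRAME)            # 94 packets
ideal_first_sample = (packets - 1) * INPUT_FRAME     # 47616 (zero-based)

observed_pts = 47608                                 # from the framehash above
drift = observed_pts - ideal_first_sample            # -8 samples

last_needed_sample = needed - 1                      # 47687
its_timestamp = last_needed_sample + drift           # 47679
first_page_granule = its_timestamp + 1 + PRE_SKIP    # 47992, 8 short of 48000

# Position of the audible distortion after decoding with the reference decoder
distortion_at = 633568 - PRE_SKIP - 8                # 633248

print(packets, first_page_granule, distortion_at)
```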

 f) Here is a bit more about the Matroska timestamps:
 i) If one uses a timestamp-scale of 1000000 and no lacing the timestamps
 are fine despite the second dts packet having a timestamp of 11ms whereas
 the first dts packet has only a duration of 10 2/3 ms:
 {{{
 #format: frame checksums
 #version: 2
 #hash: CRC32
 #software: Lavf58.13.100
 #tb 0: 1/48000
 #media_type 0: audio
 #codec_id 0: pcm_s16le
 #sample_rate 0: 48000
 #channel_layout 0: 3
 #channel_layout_name 0: stereo
 #stream#, dts,        pts, duration,     size, hash
 0,          0,          0,      512,     2048, 52bbda48
 0,        512,        512,      512,     2048, 2b037e9f
 0,       1024,       1024,      512,     2048, f69e3985
 0,       1536,       1536,      512,     2048, 04f27523
 0,       2048,       2048,      512,     2048, 0c9b0963
 0,       2560,       2560,      512,     2048, de6e37eb
 0,       3072,       3072,      512,     2048, 2230f372
 0,       3584,       3584,      512,     2048, b4275a94
 0,       4096,       4096,      512,     2048, e2efc7d5
 0,       4608,       4608,      512,     2048, e6ff0c6f
 0,       5120,       5120,      512,     2048, 43d5c355
 0,       5632,       5632,      512,     2048, f689afdb
 0,       6144,       6144,      512,     2048, 7ce06f4f
 0,       6656,       6656,      512,     2048, d639e9c7
 0,       7168,       7168,      512,     2048, 87aee60f
 0,       7680,       7680,      512,     2048, 6e32d1e1
 0,       8192,       8192,      512,     2048, 99b53229
 0,       8704,       8704,      512,     2048, 46803053
 0,       9216,       9216,      512,     2048, 4e4143b5
 0,       9728,       9728,      512,     2048, 2116fa38
 ...
 }}}
 The result is the same whether the unlaced Matroska file has a default
 duration or not.
 ii) As has already been said, for files with audio but no video track
 mkvmerge uses a smaller TimestampScale (namely 20832 for 48kHz) by
 default. With lacing the timestamps are as follows:
 {{{
 #format: frame checksums
 #version: 2
 #hash: CRC32
 #software: Lavf58.13.100
 #tb 0: 1/48000
 #media_type 0: audio
 #codec_id 0: pcm_s16le
 #sample_rate 0: 48000
 #channel_layout 0: 3
 #channel_layout_name 0: stereo
 #stream#, dts,        pts, duration,     size, hash
 0,          0,          0,      512,     2048, 52bbda48
 0,        512,        512,      512,     2048, 2b037e9f
 0,       1024,       1024,      512,     2048, f69e3985
 0,       1536,       1536,      512,     2048, 04f27523
 0,       2048,       2048,      512,     2048, 0c9b0963
 0,       2560,       2560,      512,     2048, de6e37eb
 0,       3072,       3072,      512,     2048, 2230f372
 0,       3584,       3584,      512,     2048, b4275a94
 0,       4096,       4096,      512,     2048, e2efc7d5
 0,       4608,       4608,      512,     2048, e6ff0c6f
 0,       5120,       5120,      512,     2048, 43d5c355
 0,       5632,       5632,      512,     2048, f689afdb
 0,       6144,       6144,      512,     2048, 7ce06f4f
 0,       6656,       6656,      512,     2048, d639e9c7
 0,       7168,       7168,      512,     2048, 87aee60f
 ...

 0,      22528,      22528,      512,     2048, c2a2cc15
 0,      23040,      23040,      512,     2048, 97348c24
 0,      23551,      23551,      512,     2048, 8266b6bd
 0,      24063,      24063,      512,     2048, 9492736f
 0,      24576,      24576,      512,     2048, a0eb4084
 0,      25088,      25088,      512,     2048, 84f6ec09
 ...
 }}}
 So they are not perfect (there mustn't be any odd timestamps like 23551),
 but way better.
 Notice also that the first eight packets get timestamps different from
 those they had in e) with the bigger timestamp-scale. This is despite
 them being in the same lace and the lace starting at precisely the same
 time in both files (namely at absolute zero, which coincides for every
 TimestampScale). This happens even when one trims the files to contain
 only eight packets (which are all in the same lace). So a lower
 TimecodeScale in this case leads to better results despite the file with
 the lower TimestampScale not containing any more information about the
 timestamps than the file with the default 1000000 TimecodeScale.
 iii) Using a small TimestampScale and no lacing leads to good timestamps
 (as expected).
 iv) It seems that also the DefaultDuration is involved: If one uses the
 file from e) (laced, TimestampScale 1000000) and deletes the
 DefaultDuration header element (MKVToolNix has a tool named mkvpropedit
 for that) one gets even worse timestamps (e.g. the 3384 should actually be
 3584):
 {{{
 #format: frame checksums
 #version: 2
 #hash: CRC32
 #software: Lavf58.13.100
 #tb 0: 1/48000
 #media_type 0: audio
 #codec_id 0: pcm_s16le
 #sample_rate 0: 48000
 #channel_layout 0: 3
 #channel_layout_name 0: stereo
 #stream#, dts,        pts, duration,     size, hash
 0,          0,          0,      512,     2048, 52bbda48
 0,        504,        504,      512,     2048, 2b037e9f
 0,        984,        984,      512,     2048, f69e3985
 0,       1464,       1464,      512,     2048, 04f27523
 0,       1944,       1944,      512,     2048, 0c9b0963
 0,       2424,       2424,      512,     2048, de6e37eb
 0,       2904,       2904,      512,     2048, 2230f372
 0,       3384,       3384,      512,     2048, b4275a94
 0,       4080,       4080,      512,     2048, e2efc7d5
 0,       4584,       4584,      512,     2048, e6ff0c6f
 0,       5064,       5064,      512,     2048, 43d5c355
 0,       5544,       5544,      512,     2048, f689afdb
 0,       6024,       6024,      512,     2048, 7ce06f4f
 0,       6504,       6504,      512,     2048, d639e9c7
 0,       6984,       6984,      512,     2048, 87aee60f
 0,       7464,       7464,      512,     2048, 6e32d1e1
 0,       8208,       8208,      512,     2048, 99b53229
 ...
 }}}
 And consequently one gets way more errors from opusinfo if one encodes the
 above:
 {{{
 Processing file "I:\Neuer Ordner (2)\test.laced.big.ts.no.defdur.opus"...

 New logical stream (#1, serial: 0bcf313d): type opus
 Encoded with Lavf58.13.100
 User comments section follows...
         BPS-eng=1508966
         DURATION-eng=00:00:30.006000000
         NUMBER_OF_FRAMES-eng=2813
         NUMBER_OF_BYTES-eng=5659756
         _STATISTICS_WRITING_APP-eng=mkvmerge v22.0.0 ('At The End Of The
 World') 64-bit
         _STATISTICS_WRITING_DATE_UTC-eng=2018-04-22 04:35:34
         _STATISTICS_TAGS-eng=BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
         encoder=Lavc58.19.100 libopus
 WARNING: Samples with negative granpos in stream 1
 WARNING: Sample count behind granule (96960<96992) in stream 1
 WARNING: Sample count behind granule (145920<145952) in stream 1
 WARNING: Sample count behind granule (194880<194920) in stream 1
 WARNING: Sample count ahead of granule (243840>243808) in stream 1
 WARNING: Sample count behind granule (291840<291872) in stream 1
 WARNING: Sample count behind granule (340800<340840) in stream 1
 WARNING: Sample count behind granule (389760<389800) in stream 1
 WARNING: Sample count behind granule (438720<438760) in stream 1
 WARNING: Sample count behind granule (486720<486760) in stream 1
 WARNING: Sample count behind granule (535680<535720) in stream 1
 WARNING: Sample count behind granule (584640<584680) in stream 1
 WARNING: Sample count ahead of granule (633600>633408) in stream 1
 WARNING: Sample count ahead of granule (681600>681472) in stream 1
 WARNING: Sample count behind granule (730560<730600) in stream 1
 WARNING: Sample count ahead of granule (779520>779328) in stream 1
 WARNING: Sample count ahead of granule (827520>827392) in stream 1
 WARNING: Sample count ahead of granule (875520>875456) in stream 1
 WARNING: Sample count behind granule (1118400<1118408) in stream 1
 WARNING: Sample count behind granule (1167360<1167368) in stream 1
 WARNING: Sample count ahead of granule (1216320>1216288) in stream 1
 WARNING: Sample count behind granule (1264320<1264328) in stream 1
 WARNING: Sample count behind granule (1313280<1313288) in stream 1
 WARNING: Sample count ahead of granule (1362240>1362064) in stream 1
 WARNING: Sample count ahead of granule (1410240>1410128) in stream 1
 Opus stream 1:
         Pre-skip: 312
         Playback gain: 0 dB
         Channels: 2
         Original sample rate: 48000Hz
         Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
         Page duration:   1020.0ms (max), 1000.7ms (avg),  640.0ms (min)
         Total data length: 387229 bytes (overhead: 0.887%)
         Playback length: 0m:30.006s
         Average bitrate: 103.2 kb/s, w/o overhead: 102.3 kb/s
 Logical stream 1 ended
 }}}
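
 The tick sizes discussed in f) i) and ii) can be checked with a little
 arithmetic. The Python sketch below only models the rounding of a single
 timestamp to the nearest tick (the rounding mode and the function names
 are my assumptions); it does not model how timestamps inside a lace are
 interpolated, which is where the bad values above come from.

```python
from fractions import Fraction

SAMPLE_RATE = 48000
FRAME = 512             # samples per dts packet
NS = 10**9


def to_ticks(sample_index: int, timestamp_scale_ns: int) -> int:
    """Round a sample position to the nearest tick (assumed rounding)."""
    ns = Fraction(sample_index * NS, SAMPLE_RATE)
    return round(ns / timestamp_scale_ns)


def from_ticks(ticks: int, timestamp_scale_ns: int) -> int:
    """Convert a tick count back to the nearest sample position."""
    return round(Fraction(ticks * timestamp_scale_ns * SAMPLE_RATE, NS))


# 1 ms ticks: the second packet starts at 10 2/3 ms, stored as 11 ms
print(to_ticks(FRAME, 1_000_000))                    # 11
print(from_ticks(11, 1_000_000))                     # 528, not 512

# 20832 ns ticks are shorter than one sample (1/48000 s = 20833 1/3 ns)
one_tick_shorter = Fraction(20832) < Fraction(NS, SAMPLE_RATE)
print(one_tick_shorter)                              # True
print(from_ticks(to_ticks(FRAME, 20832), 20832))     # 512
```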

 PS: Yes, mkvmerge was also buggy. In fact, I think it still is and I will
 soon open a bug report for it. For
 [https://gitlab.com/mbunkus/mkvtoolnix/issues/2100 example] up until
 version 15.0 it used lacing in BlockGroups with DiscardPadding (the result
 was that the information to which audio packet the DiscardPadding actually
 applies is lost; upon remuxing, mkvmerge treated every packet in the block
 as if the DiscardPadding element applied to it). But I have never ever
 observed it creating bad output if the input file didn't have any issues.
 PPS: And I also have some good news: The actual packets (without the
 container stuff) of the files created by e) and f) iv) both completely
 coincide with what one gets when one directly encodes test.dts. The only
 thing that is truly lost is the end trimming.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/4178#comment:18>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

