[FFmpeg-devel] [PATCH 00/37 v3] Matroska demuxer patches

Andreas Rheinhardt andreas.rheinhardt at gmail.com
Fri May 17 01:29:44 EEST 2019


Hello,

here is the patchset for the Matroska demuxer that I already announced
last week [1]. It is an improvement upon an earlier patchset from March [2]
which has been kindly reviewed by Steve Lhomme. Just as the earlier 
attempt, it is intended to get rid of a bogus error message introduced 
in 9326117b.

1. Here is a recap of why this bogus error message exists:
a) The ebml parser contained in the Matroska muxer basically has two
modes of operation: It can parse simple elements and it can parse whole
master elements all at once. The latter is of course implemented via the
former. Also one can designate certain element IDs as "EBML_STOP" type
so that when such an element is encountered, parsing immediately stops
without parsing the length and performing the checks (regarding whether
an element is truly contained in the master element it is supposed to be
contained in --these checks were introduced in 9326117b).
b) Normal parsing (via matroska_parse_cluster_incremental) uses a syntax
list for parsing that contains elements at different levels in the
hierarchy: The elements that can be found in a cluster and (certain)
level 1 elements (i.e. the level of the cluster); in particular,
it contains a cluster as EBML_STOP type element. When a cluster is
encountered and ebml_parse returns, the current level is ended (if there
already was an open cluster) and the new cluster is entered (via a
designated syntax list containing a cluster as master element) and
parsed until a SimpleBlock or a BlockGroup is encountered (these two are
of course EBML_STOP elements in the used syntax). Then the first syntax
list (the one containing both the cluster as well as the elements
permitted in a cluster as entries) is used again for parsing of said
blocks and all the blocks that follow until one encounters another
cluster or an error occurs.
c) The problem with this approach is that clusters are not closed
automatically when they are finished, but only when another cluster is
encountered. This works (i.e. yields no error message) when a cluster
follows another cluster (because no checks are performed for EBML_STOP
type elements), but if e.g. the cues (the index) is located after a
cluster (as happens with most Matroska files in existence), then parsing
the cues will yield an error: After all, the preceding cluster ended
right before the cues began, but the level hasn't been ended, so the
length check will fail if the preceding cluster wasn't an
unknown-length cluster. (In the latter case, the cues are considered
part of the preceding cluster by the current code.) That's the reason
behind this (harmless) error.

2. a) When ebml_parse parses a Master element by itself, it also takes
care of ending the level (i.e. the level automatically entered during
parsing of the master element) itself; but there is no check at the end
of ebml_parse whether a level has just ended right after the last
element; if there were, then the above problem wouldn't exist.
b) So my approach is to add such a check and make ebml_parse end the
levels all by itself; the Matroska functions shouldn't have to end
levels manually (as happens currently when a new cluster is encountered),
but the EBML functions would do so given that the levels are an EBML
concept after all.
c) But all is slightly complicated by the fact that there also can be
unknown-length master elements. These end when an element not allowed in
them but on a higher level is encountered. So in this situation it is
impossible to perform the check whether a level has ended after
consuming an element; instead, said check needs to be performed directly
after the id of the next element is read, without reading any more data
if it turns out that said element belongs to a higher level. And that's
the way it has been implemented.

3. I have also added improvements to various checks and other things.
You can find them explained in the various commit messages. Just two things:
a) I am unsure how to check whether a read error or attempted reading
beyond EOF should be checked: Via avio_feof(pb) or via pb->eof_reached?
The former tries to read again (refill the buffer) when pb->eof_reached
was already set; does this mean that if one wants to know whether a read
was not successfull one should check pb->eof_reached, but when one is
more interested into whether one can perform future reads, one should
use avio_feof (after all, in case of livestreaming, new data might have
arrived after the earlier failed read)? That's at least the interpretation
that I had when I wrote the patchset.
b) ffio_limit is used to check for whether there is enough data left to
skip if an element intended to be skipped is encountered. There are
three issues with this function:
i) It takes an int, although it should be an int64_t for our needs.
ii) It emits its own error message (in case it fails) which is not
appropriate for our needs (it claims to be truncating a packet, but in
our case, no packet is truncated; instead it is treated as an indication
that an error happened and a resync is performed).
iii) For some reason it allows the remaining size to be one less than
the size given as argument.
Not understanding iii) is what made me hesitate to factor the needed
part of ffio_limit out of it and using the new function in the Matroska
demuxer. Would be nice if someone could explain the rationale behind
this. This originated (without any explanation) in commit 559ae20d and
got carried over from there since then.

- Andreas

[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243769.html
[2]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-March/241694.html


Andreas Rheinhardt (37):
  avformat/matroskadec: Remove unused variables
  avformat/matroskadec: Correct outdated error message
  avformat/matroskadec: Compactify structure
  avformat/matroskadec: Don't zero unnecessarily
  avformat/matroskadec: Get rid of cluster size field assumption
  avformat/matroskadec: Use generic size check for signed integers
  avformat/matroskadec: Set offset of first cluster
  avformat/matroskadec: Don't copy attached pictures
  avformat/matroskadec: Remove redundant initialization
  avformat/matroskadec: Properly check return values
  avformat/matroskadec: Improve read error/EOF checks I
  avformat/matroskadec: Improve read error/EOF checks II
  avformat/matroskadec: Improve error/EOF checks III
  avformat/matroskadec: Remove non-incremental parsing of clusters
  avformat/matroskadec: Don't keep old blocks
  avformat/matroskadec: Treat SimpleBlock as EBML_BIN
  avformat/matroskadec: Don't abort resyncing upon seek failure
  avformat/matroskadec: Add function to reset status
  avformat/matroskadec: Use proper levels after discontínuity
  avformat/matroskadec: Refactor some functions
  avformat/matroskadec: Introduce a "last known good" position
  avformat/matroskadec: Link to parents in syntax tables
  avformat/matroskadec: Redo level handling
  avformat/matroskadec: Make cluster parsing level compatible
  avformat/matroskadec: Don't reset cluster position
  avformat/matroskadec: Combine arrays
  avformat/matroskadec: Redo EOF handling
  avformat/matroskadec: Reuse positions
  avformat/matroskadec: Typos, nits and cosmetics
  avformat/matroskadec: Don't skip too much when unseekable
  avformat/matroskadec: Improve invalid length error handling
  avformat/matroskadec: Accept more unknown-length elements
  avformat/matroskadec: Fix probing of unknown-length headers
  avformat/matroskadec: Accept more unknown-length elements II
  avformat/matroskadec: Reindent after previous commit
  avformat/matroskadec: Use file offsets for level 1 elements
  avformat/matroskadec: Improve check for level 1 duplicates

 libavformat/matroskadec.c | 1006 +++++++++++++++++++++----------------
 1 file changed, 582 insertions(+), 424 deletions(-)

-- 
2.21.0



More information about the ffmpeg-devel mailing list