[FFmpeg-devel] [PATCH 05/10] avformat/matroskadec: Remove non-incremental parsing of clusters

Michael Niedermayer michael at niedermayer.cc
Tue Mar 12 19:56:28 EET 2019


On Tue, Mar 12, 2019 at 05:05:00AM +0000, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > On Sun, Mar 10, 2019 at 11:03:00PM +0000, Andreas Rheinhardt wrote:
> >> Michael Niedermayer:
> >>> On Fri, Mar 08, 2019 at 10:25:59AM +0100, Andreas Rheinhardt wrote:
> >>>> When the new incremental parser was introduced, the old parser was
> >>>> kept, because the new parser was unable to handle the way SSA packets
> >>>> are put into Matroska. But since 2014 (since
> >>>> c7d8dbad14ed5fa3c217a4fc1790021d6c0b6416) this is no longer needed, so
> >>>> that the old parser can be completely removed.
> >>>>
> >>>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt at googlemail.com>
> >>>> ---
> >>>>  libavformat/matroskadec.c | 72 ++++++---------------------------------
> >>>>  1 file changed, 10 insertions(+), 62 deletions(-)
> >>>
> >>> This affects seeking:
> >>>
> >>> libavformat/tests/seek issues/388/Matrix.Reloaded.Trailer-640x346-XviD-1.0beta2-HE_AAC_subtitled.mkv -duration 400 >oldseek
> >>>
> >>> The file appears to be available here:
> >>> https://samples.ffmpeg.org/Matroska/matrix/Matrix.Reloaded.Trailer-640x346-XviD-1.0beta2-HE_AAC_subtitled.mkv
> >>>
> >>> --- oldseek	2019-03-08 23:08:21.380042329 +0100
> >>> +++ newseek	2019-03-08 23:08:02.048041745 +0100
> >>> @@ -8,7 +8,7 @@
> >>>  ret: 0         st:13 flags:1 dts: 86.750000 pts: 86.750000 pos:     -1 size: 50436
> >>>  ret:-1         st: 1 flags:0  ts: 250.577000
> >>>  ret: 0         st: 1 flags:1  ts: 13.471000
> >>> -ret: 0         st:13 flags:1 dts: 1.776000 pts: 1.776000 pos:     -1 size: 50436
> >>> +ret: 0         st:13 flags:1 dts: 0.000000 pts: 0.000000 pos:     -1 size: 50436
> >>>  ret:-1         st: 2 flags:0  ts: 176.365000
> >>>  ret: 0         st: 2 flags:1  ts: 339.259000
> >>>  ret: 0         st:13 flags:1 dts: 145.080000 pts: 145.080000 pos:     -1 size: 50436
> >>>
> >> This is not unexpected. The reason is that the old parser always
> >> parses a whole cluster at once while the new parser parses clusters
> >> incrementally, i.e. one block (I use block as a general term for
> >> SimpleBlock or BlockGroup) at a time. The goal of incremental parsing
> >> was reduced latency (and with my patch #6 one also achieves a
> >> reduction of memory used and an increase in performance as well).
> >>
> >> With the old parser, the very first cluster gets parsed during
> >> avformat_find_stream_info() and index entries were created for all the
> >> keyframes contained therein (the cues only contain entries for the
> >> main video track, so this only affects the other tracks). The 1.776
> >> pts corresponds to the block with the highest timestamp for track #1
> >> (or track #2 in Matroska's counting) in the first cluster (with a
> >> timestamp of 1776ms).
> >>
> >> With the incremental parser, only one block of the audio track gets
> >> parsed during avformat_find_stream_info() (it has a timestamp of 0)
> >> and so only one index entry gets created for it.
> >>
> >> So when the seek based upon the audio track is performed, the used
> >> index point has a timestamp of either 0ms or 1776ms. Then the
> >> current_dts is updated based upon this timestamp.
> >>
> >> Now this file has an attached picture which gets translated into its
> >> own track and upon every seek this picture will be put into the
> >> raw_packet_buffer from which it will be read by ff_read_packet as the
> >> first packet after each seek. Given that this packet doesn't have
> >> timestamps, the timestamp will be set equal to current_dts (in
> >> compute_pkt_fields()). Hence the discrepancy.
> >>
> >> Btw: Because of things like this, a fate test had to be changed during
> >> the initial introduction of incremental parsing (in
> >> 8336eb6f85e4b94b9c198b16bd0ac4178f4dba86).
> > 
> > Sounds like undesirable behavior if not a bug
> > 
> > The seek request is to 13.471000 and the result should not depend on
> > what has been parsed or not prior to the seek request.
> > also if a packet can be produced for 1.776000 which adequatly fullfills
> > the seek reuest that is better than a more distant one as that would
> > increase latency experienced by the user
> > 
> If all one cares about is that the returned packet is near to the
> desired timestamp, then matroska_read_seek succeeds in two cases:
> a) The currently available index entries for the desired track are so
> fine-grained that one can use this index to seek accurately. This
> includes the standard case that one seeks according to a video track
> for which the file provides cues (for the keyframes).
> b) The desired timestamp is so big that it is beyond the last index
> entry or the returned index entry is the last index entry. In this
> case the file is read from the last index entry onward, so that a very
> precise timestamp can be found.
> 
> If one only seeks wrt the same track and if said track either has a
> good index or no index, then this works well: It's clear that it works
> in case there is a good index and if there is no index at all (because
> in this case the index that is being built up in the background is
> good for a whole initial portion of the file and said portion won't
> have "holes").
> 
> But if one changes the track wrt to one seeks, problems can arise.
> Think of the case where one first seeks via the cues according to the
> video track to positions t_0 and t_1 and reads a few blocks at both
> times. This includes the creation of index entries for other tracks
> (at least for continuous tracks like audio tracks). If one now seeks
> wrt to such an audio track to time t' with t_0<t'<t_1, one might end
> up in a place pretty far away from t'. This happens for both parsing
> methods.
> 
> There is a case where the incremental parser fares better: If the
> clusters referenced in the cues all begin with video keyframes
> (standard mkvmerge behaviour, so pretty normal) and one only wants to
> read a single frame from t_0 and t_1, then these frames will be video
> keyframes and therefore no index entries for the other tracks will
> have been created, so that if one seeks with respect to a different
> track than the video track, the whole file will be read until one
> reaches the desired timestamp. I therefore view the fact that
> matroska_read_seek's result depends upon what has already been parsed
> as a general bug in matroska_read_seek, not as a drawback of
> incremental parsing compared to classical parsing.

thanks for the detailed analysis, and yes i agree this is a bug in
existing code



> 
> Here are the results for what I have just described. Old parsing:
> ret: 0         st: 1 flags:1 dts: 0.000000 pts: 0.000000 pos:   6653
> size:  1792
> ret: 0         st:-1 flags:0  ts:-1.000000
> ret: 0         st: 0 flags:1 dts: NOPTS    pts: 0.016000 pos:  20998
> size:134602
> ret: 0         st:-1 flags:1  ts: 161.894167
> ret: 0         st: 0 flags:1 dts: NOPTS    pts: 161.696000
> pos:212409707 size:128647
> ret: 0         st: 0 flags:0  ts: 324.788000
> ret: 0         st: 0 flags:1 dts: NOPTS    pts: 324.936000
> pos:356642724 size:195297
> ret: 0         st: 0 flags:1  ts: 87.683000
> ret: 0         st: 0 flags:1 dts: NOPTS    pts: 87.456000
> pos:122474581 size:251000
> ret: 0         st: 1 flags:0  ts: 250.577000
> ret: 0         st: 1 flags:1 dts: 324.960000 pts: 324.960000
> pos:356873874 size:  1792
> ret: 0         st: 1 flags:1  ts: 13.471000
> ret: 0         st: 1 flags:1 dts: 0.512000 pts: 0.512000 pos: 450228
> size:  1792
> 
> Incremental parsing:
> ret: 0         st: 1 flags:1 dts: 0.000000 pts: 0.000000 pos:   6653
> size:  1792
> ret: 0         st:-1 flags:0  ts:-1.000000
> ret: 0         st: 0 flags:1 dts: NOPTS    pts: 0.016000 pos:  20998
> size:134602
> ret: 0         st:-1 flags:1  ts: 161.894167
> ret: 0         st: 0 flags:1 dts: NOPTS    pts: 161.696000
> pos:212409707 size:128647
> ret: 0         st: 0 flags:0  ts: 324.788000
> ret: 0         st: 0 flags:1 dts: NOPTS    pts: 324.936000
> pos:356642724 size:195297
> ret: 0         st: 0 flags:1  ts: 87.683000
> ret: 0         st: 0 flags:1 dts: NOPTS    pts: 87.456000
> pos:122474581 size:251000
> ret: 0         st: 1 flags:0  ts: 250.577000
> ret: 0         st: 1 flags:1 dts: 250.592000 pts: 250.592000
> pos:296449118 size:  1792
> ret: 0         st: 1 flags:1  ts: 13.471000
> ret: 0         st: 2 flags:1 dts: 13.216000 pts: 13.216000
> pos:18014477 size:    20
> 
> 
> 
> (I see two main avenues to improve matroska_read_seek:
> 1. Use the index of other tracks if the best index entry for the
> desired track is too far away. (Problems: What about files with wrong
> interleavement? What is too far away? If the targeted track has gaps
> in it, then one might end up parsing a lot unnecessarily.)
> 2. If the found index is too far away, one could simply parse the file
> a bit more.
> 
> 2. would have the disadvantage that even video tracks with proper cues
> (i.e. an entry for every keyframe), but long GOPs would be parsed.

> This could be mitigated by restricting it to tracks for which no cue
> entries were present (i.e. when cue entries are present for a track,
> then it is presumed that there is a cue entry for every keyframe of
> said track) and which can be presumed to have no holes in it (i.e. no
> subtitle tracks).)

yes, this sounds reasonable to me but others may have other oppinions
and there may of course be unexpected problems with this or any other
approuch
 
Thnaks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Whats the most studid thing your enemy could do ? Blow himself up
Whats the most studid thing you could do ? Give up your rights and
freedom because your enemy blew himself up.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20190312/f397c737/attachment.sig>


More information about the ffmpeg-devel mailing list