[FFmpeg-devel] [PATCH] Matroska demuxer adds WebVTT support

Thu Aug 8 12:28:17 CEST 2013

On Thu, Aug 01, 2013 at 01:51:43PM +0200, Nicolas George wrote:
> On quartidi 14 Thermidor, year CCXXI, Clement Boesch wrote:
> > Actually, I'm not so sure about having SSA deprecated (aside from mkv
> > format), but that can be discussed (in another thread please).
> 
> Please share the reasons indeed.
> 
> > Anyway, about the whole current thread, I'm sorry I didn't have time to
> > read every single post, but I'd like to have a few words before random
> > development is done, like changing the current (de)muxing of WebVTT.
> 
> Of course. If you had not replied today, I would have insisted on exactly
> that.
> 
> > Summary of the current situation: WebVTT "codec" is defined by a payload
> > (the text with its markup) and two extra pieces of information. WebVTT
> > and WebM formats mux them differently:
> > 
> >   - in WebVTT a cue looks like this: [<chapter>\n]<timestamp>[<settings>]\n<text>\n\n
> >   - in WebM   a cue looks like this: <chapter>\n<settings>\n<text>
> > 
> > Please correct me if that's incorrect.
> 
> I believe this is technically correct, but by using the "cue" vocable, you
> forget a very important difference:
> 
> - in WebVTT, a cue is part of a text file that is being parsed in full;
> 
> - in WebM, a cue is packetized in a container format with generic
>   structures and no WebVTT-specific code.
> 
> > Now the problem in my opinion is that WebM uses a fully textual way of
> > muxing the 3 pieces of information: requiring some strchr or similar in
> > a binary parser is a bit insane (and dangerous? what about a non
> > null-terminated payload?). IMO it would have been much wiser to mux it
> > with \0 separators, but whatever.
> 
> I believe you did not think this argument through: memchr(data, '\n', size)
> works just as well as memchr(data, 0, size), and you can not use any str*
> function on untrusted data anyway (unless you rely on padding for
> 0-termination, but I consider that bad practice).
> 
> Note that I disagree with the argument you give, but I fully agree with the
> conclusion itself: \n is a very bad choice for a delimiter in Matroska
> packets. Because \n can also appear in the payload of one of the fields.
> That will make extensibility much harder.
> 
> > The point is, if someone decides to mux WebVTT in another format, they
> > might come up with a different and more relevant way of muxing it.
> 
> We will deal with that if that ever happens.
> 
> > TL;DR: the <chapter>, <settings> and <text> are so weirdly (badly?) muxed
> > in *both* WebVTT format and WebM that it, in my opinion, makes sense to
> > separate them at AVPacket level like it is now (payload for <text> and
> > side data for the two other extra info), and let every muxer sanely mux it
> > using the API interface.
> 
> That means adding specific code for the codec in all muxers: you need to
> have a _very_ good reason to do that, having a "weird" packet format is not
> enough.
> 
> Also, note that ASS packets have exactly the same issues, and you agreed
> that it was better to eliminate the special cases from the Matroska code.
> 

If you're suggesting to use the WebM representation as the standard, we
have to deal with:

 - retro compat: changing the packet format for the WebVTT format
   creates version clashes between lavc & lavf (+ deprecation of the side
   data types)
 - moving the complexity into lavf for webvtt muxing and demuxing
 - creating more hacks if webvtt is muxed differently in another format
   (will you oppose a sane muxing of webvtt in NUT?)

OTOH, adding a special case only in the WebM demuxer, where a bad muxing
decision was made, doesn't look broken. That way of muxing into WebM is
merely the result of two successive bad decisions, so it doesn't look
correct to rely on its layout (and relying on the original WebVTT layout
with timestamps will lead to the ASS insanity you know: writing timestamps
in the demuxer etc).

Also note that the hack I wanted to remove with ASS/SSA was the insane
string *constructions* of SSA packets within the matroska demuxer, which
is not the case here: the text is extracted, and extra data put into side
data.
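To make the extraction concrete, here is a minimal sketch of splitting a
WebM block laid out as <chapter>\n<settings>\n<text> with size-bounded
memchr(), so nothing depends on the payload being null terminated. The
CueFields struct and split_webm_cue() are hypothetical names for
illustration, not existing lavf code; in a real demuxer the first two
fields would presumably end up in packet side data and the third in the
packet payload.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one WebM WebVTT block: pointers into the packet
 * buffer plus explicit lengths. Nothing here is NUL-terminated. */
typedef struct CueFields {
    const char *id;       size_t id_len;        /* <chapter> / cue id */
    const char *settings; size_t settings_len;  /* <settings>         */
    const char *text;     size_t text_len;      /* <text> payload     */
} CueFields;

/* Split "<id>\n<settings>\n<text>" using memchr() bounded by size.
 * Only the first two '\n' are delimiters; any later '\n' belongs to the
 * text. Returns 0 on success, -1 if the input is NULL or a delimiter is
 * missing. */
static int split_webm_cue(const char *data, size_t size, CueFields *out)
{
    const char *end, *p, *q;

    if (!data || !out)
        return -1;
    end = data + size;

    p = memchr(data, '\n', size);                      /* end of id */
    if (!p)
        return -1;
    q = memchr(p + 1, '\n', (size_t)(end - (p + 1)));  /* end of settings */
    if (!q)
        return -1;

    out->id           = data;
    out->id_len       = (size_t)(p - data);
    out->settings     = p + 1;
    out->settings_len = (size_t)(q - (p + 1));
    out->text         = q + 1;
    out->text_len     = (size_t)(end - (q + 1));
    return 0;
}
```

Note that this only works because <text> comes last: a '\n' inside the
first two fields would still be taken for a delimiter, which is the
extensibility problem with this layout.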

Now if you insist on comparing with ASS/SSA, it might be a better solution
for that case to abstract the differences between ASS and SSA into side
data (like the ReadOrder), but the problem is that ASS/SSA is really on a
completely different level from a sanity PoV, so it's really hard to
compare with that. And the main issue with ASS/SSA is now at the decoding
level; the different codecs might not be that much of a problem. Feel
free to propose some changes, but I'm a bit tired of jazzing with
subtitle issues recently...

-- 
Clément B.