[FFmpeg-devel] [RFC/PATCH] Pass PRIVATE_STREAM_2 MPEG-PS packets to caller

Mon Feb 25 17:23:18 CET 2013

On 25/02/13 04:36, Michael Niedermayer wrote:
> On Mon, Feb 25, 2013 at 12:47:47AM +0100, Richard wrote:
>>
>> Specifically, I started with the problem of playing audio DVDs (in
>> the sense of a normal video DVD with primarily only audio).  These
>> have a single video frame followed by an arbitrary number of audio
>> frames.  At the moment, the demultiplexing code in Myth ends up
>> reading and buffering too much audio data because it is normally
>> slowed down by the availability of video buffers.  Now it could be
>> argued that it should be making more use of the timecodes in the
>> audio and video packets but that is also difficult when
>> discontinuities are to be expected and there's no way to ensure the
>> demultiplexer/decoder is informed about the discontinuity ahead of
>> time (ideally 'now' but certainly not afterwards).
>>
>> For my in-depth ideas with respect to single images with audio, I
>> can point you here:
>
>> http://irc.mythtv.org/ircLog/channel/4/2013-01-24:15:41
>
> IRC> peper03: Playing with VobEdit and looking at the PCI and DSI
> IRC> structures here http://dvdnav.mplayerhq.hu/dvdinfo/ shows that we
> IRC> *can* work out how long to show a still frame for.
>
> so AVPacket.duration could be set according to this information?
> also i remember something about sequence end codes and still pictures
> they could be used too if they are there

The duration would be the duration of the video frame, not of the data
packet.  The contents of the data packet would allow you to calculate
the duration of the video frame, but even then you couldn't calculate
the total duration, only the duration until the next data packet.
The sequence end code allows you to determine *which* frame is the last
frame.  That can also be done using the 'vobu_se_e_ptm' field in the PCI
packet; whichever works best in a given architecture can be used.

Does it make sense to give a pure data packet a duration?  I suppose you
could use the start and end fields of the PCI packet (vobu_s_ptm and
vobu_e_ptm) to indicate the time frame the data applies to.  In that
case, it would be necessary to 'merge' the two packets together, as the
DSI packet has no time information at all (apart from the SCR).
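
As a rough illustration of what the PCI timing fields provide, here is a C
sketch that reads them from a private stream 2 payload.  It assumes the
payload starts with the substream ID byte (0x00 for PCI) and that the
PCI_GI fields follow in the big-endian layout used by libdvdread's
pci_gi_t.  The helper names (read_be32, pci_read_timing,
pci_vobu_duration) are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets into a private stream 2 payload (bytes after the PES length
 * field).  Byte 0 is the substream ID; PCI_GI follows, assumed to be laid
 * out as: nv_pck_lbn(4) vobu_cat(2) reserved(2) vobu_uop_ctl(4)
 * vobu_s_ptm(4) vobu_e_ptm(4) vobu_se_e_ptm(4). */
enum {
    PCI_VOBU_S_PTM_OFF    = 1 + 12,
    PCI_VOBU_E_PTM_OFF    = 1 + 16,
    PCI_VOBU_SE_E_PTM_OFF = 1 + 20
};

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

typedef struct PciTiming {
    uint32_t vobu_s_ptm;    /* start PTS of the VOBU, 90 kHz ticks */
    uint32_t vobu_e_ptm;    /* end PTS of the VOBU, 90 kHz ticks */
    uint32_t vobu_se_e_ptm; /* end PTS at a sequence end code (assumed 0 if none) */
} PciTiming;

/* Returns 0 on success, -1 if the payload is too short or not PCI. */
static int pci_read_timing(const uint8_t *buf, size_t len, PciTiming *t)
{
    if (len < (size_t)PCI_VOBU_SE_E_PTM_OFF + 4 || buf[0] != 0x00)
        return -1;
    t->vobu_s_ptm    = read_be32(buf + PCI_VOBU_S_PTM_OFF);
    t->vobu_e_ptm    = read_be32(buf + PCI_VOBU_E_PTM_OFF);
    t->vobu_se_e_ptm = read_be32(buf + PCI_VOBU_SE_E_PTM_OFF);
    return 0;
}

/* Span of the following VOBU only, not the total time a still is shown. */
static int64_t pci_vobu_duration(const PciTiming *t)
{
    return (int64_t)t->vobu_e_ptm - (int64_t)t->vobu_s_ptm;
}
```

The result is in 90 kHz ticks and covers only the VOBU that the PCI packet
describes, which is why it can't give the total duration of a still frame.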

>>> for example random buffering + no timestamps just feels wrong also
>>> this feels a bit like a hack, to output a private stream raw like
>>> that. we dont do that for audio or video either, there you get clean
>>> packets one for each frame with timestamps, a codec id and various
>>> other things
>>
>> The problem is that the contents of packets with startcode '0x1bf'
>> is not uniquely defined.  In the context of DVDs, it is, but given
>> any MPEG program stream, it isn't.  That makes it difficult to
>> implement a parser. The PCI structure indicates the start and end
>> PTS values for the following VOBU, so the start PTS could be used,
>> but the DSI structure only references the system clock reference.
>> It might be possible to write a parser to decode the contents based
>> on 'if this byte is X and that byte is Y and those bytes are all
>> zero, then we've almost certainly got a PCI structure' but I'm not
>> convinced that is that much cleaner.  The contents are simply
>> undefined so there'll always be a chance of false-positives and
>> false-negatives.
>
> all the probing code that detects formats & codecs checks if a byte
> is X and another is Y and so on. Whats your point here?

Sorry, my point was that there's no 'magic number' to definitely 
identify the contents.  All you could do is say 'if the length of the 
packet is 980 and the first byte is zero then it's *probably* a PCI 
packet'.  There are probably other 'sanity checks' that could be 
performed to increase the reliability but it's not as clean as having 
some sort of defined identifier.  If that's good enough for you, I don't 
have a problem with it.
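
A minimal sketch of the kind of "*probably* a PCI packet" test described
above.  The 980-byte length and zero first byte come from the text; the
1018-byte DSI length with a first byte of 0x01, and the check that
vobu_e_ptm does not precede vobu_s_ptm, are additional assumptions.  All
names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* PES_packet_length of the two DVD NAV packets: 980 for PCI (as above),
 * 1018 (0x3FA) assumed for DSI. */
#define DVD_PCI_PES_LEN 980
#define DVD_DSI_PES_LEN 1018

typedef enum {
    PS2_UNKNOWN = 0,
    PS2_DVD_PCI,
    PS2_DVD_DSI
} Ps2Kind;

static uint32_t rb32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Heuristic only: a payload that passes is probably DVD navigation data. */
static Ps2Kind ps2_guess_kind(const uint8_t *buf, size_t len)
{
    if (len == DVD_PCI_PES_LEN && buf[0] == 0x00) {
        /* Extra plausibility check (assumption): vobu_e_ptm at payload
         * offset 17 should not be earlier than vobu_s_ptm at offset 13. */
        if (rb32(buf + 17) >= rb32(buf + 13))
            return PS2_DVD_PCI;
        return PS2_UNKNOWN;
    }
    if (len == DVD_DSI_PES_LEN && buf[0] == 0x01)
        return PS2_DVD_DSI;
    return PS2_UNKNOWN;
}
```

Each extra check narrows the chance of a false positive, but none of them
turns the guess into a guarantee.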

> we also return ac3 frames instead of private stream data ...

Yes, that thought came to me as well after I'd written my last reply.
The contents of private stream 1 packets are also undefined.  The only
difference is that private stream 1 packets also carry the usual PES
header information (including PTS/DTS), which private stream 2 packets
lack.  So presumably they are determined to be AC3 packets on the basis
of sanity checks, or is there some form of magic number?
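
For reference, on DVDs the first payload byte of each private stream 1
packet is a substream ID whose range identifies the content (0x20-0x3F
sub-pictures, 0x80-0x87 AC3, 0x88-0x8F DTS, 0xA0-0xA7 LPCM).  This is a
DVD convention rather than something MPEG-PS defines, so relying on it is
still context-dependent.  The sketch below maps those ranges; the names
are hypothetical and this is not FFmpeg's code.

```c
#include <stdint.h>

typedef enum {
    PS1_UNKNOWN = 0,
    PS1_SUBPICTURE,
    PS1_AC3,
    PS1_DTS,
    PS1_LPCM
} Ps1Kind;

/* Classify a private stream 1 payload by its leading substream ID byte,
 * using the DVD-Video ranges. */
static Ps1Kind ps1_dvd_substream_kind(uint8_t id)
{
    if (id >= 0x20 && id <= 0x3F) return PS1_SUBPICTURE;
    if (id >= 0x80 && id <= 0x87) return PS1_AC3;
    if (id >= 0x88 && id <= 0x8F) return PS1_DTS;
    if (id >= 0xA0 && id <= 0xA7) return PS1_LPCM;
    return PS1_UNKNOWN;
}
```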

So, working on the assumption that the contents can be determined to a
sufficiently reliable degree, would you prefer the following:

1) Define codec ID 'AV_CODEC_DVD_NAV' instead of 'AV_CODEC_PRIVATE_STREAM_2'
2) Implement a parser to combine the two packets (PCI and DSI) found on DVDs
3) Set 'pts' to the 'vobu_s_ptm' field in the PCI packet
4) Set 'dts' to AV_NOPTS_VALUE
5) Potentially set 'duration' to (vobu_e_ptm - vobu_s_ptm)

?

As I said, I don't know whether it makes sense to set the duration field
for a pure data packet, but I don't have a problem doing it if that's
what you'd prefer.
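
A rough sketch of steps 2 to 5 of the list above, assuming the two
payloads have already been read from the stream.  NavPacket and
nav_merge() are hypothetical stand-ins, not FFmpeg's AVPacket or parser
API, and SKETCH_NOPTS_VALUE merely mirrors AV_NOPTS_VALUE so the code
compiles on its own.  The codec ID from step 1 is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for FFmpeg's AV_NOPTS_VALUE (INT64_MIN). */
#define SKETCH_NOPTS_VALUE INT64_MIN

#define PCI_LEN 980
#define DSI_LEN 1018

/* Hypothetical output of a parser that merges one PCI and one DSI packet. */
typedef struct NavPacket {
    uint8_t data[PCI_LEN + DSI_LEN]; /* PCI payload followed by DSI payload */
    int64_t pts;                     /* 90 kHz ticks */
    int64_t dts;
    int64_t duration;                /* 90 kHz ticks */
} NavPacket;

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the inputs don't look like a PCI/DSI pair. */
static int nav_merge(NavPacket *out,
                     const uint8_t *pci, size_t pci_len,
                     const uint8_t *dsi, size_t dsi_len)
{
    if (pci_len != PCI_LEN || pci[0] != 0x00 ||
        dsi_len != DSI_LEN || dsi[0] != 0x01)
        return -1;

    /* Step 2: combine the two packets into one. */
    memcpy(out->data, pci, PCI_LEN);
    memcpy(out->data + PCI_LEN, dsi, DSI_LEN);

    uint32_t s_ptm = read_be32(pci + 13); /* vobu_s_ptm */
    uint32_t e_ptm = read_be32(pci + 17); /* vobu_e_ptm */

    out->pts      = s_ptm;                           /* step 3 */
    out->dts      = SKETCH_NOPTS_VALUE;              /* step 4 */
    out->duration = (int64_t)e_ptm - (int64_t)s_ptm; /* step 5, optional */
    return 0;
}
```

Dropping the duration line gives the variant without step 5; nothing else
depends on it.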

> IMO
> all things for which proper fields exist like pts/dts/duration/codec_id
> and so on should be set correctly before some raw packets of any
> stream are exported just to get to their correct values.

That's fine by me.  From my point of view, it's just a question of what
to do with packets that are defined as 'user-defined'.  If you want to
handle them 100% correctly, you can't make any assumptions about their
contents, as anyone could use them for any purpose.  Almost any attempt
to determine their contents without context *could* fail.  All you can
do is pass them on unprocessed so that the calling application (which
has the required context) can decode them.

On the other hand, you can take the approach that 99% of these packets
encountered in practice will contain data in a certain format (in this
case, DVD navigation data).  By performing a few sanity checks, you can
increase your confidence further, so that the likelihood of
misinterpreting the data is almost zero.

The first option is purer, as you are following the specifications.  The
second is potentially more useful, albeit with the proviso that you can't
guarantee 100% that no packets will be misinterpreted.

A third option is to allow the calling application to provide the 
context required.  I don't know whether this is already possible and I'm 
not putting it forward as a suggested change, but it would be an 
alternative.
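
To compare the three options side by side, here is a hypothetical
dispatch in C.  Nothing here is an existing FFmpeg option: Ps2Policy
stands in for whatever mechanism a caller might use to supply context
(option 3), and the DVD check is the same length/first-byte heuristic
discussed earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy knob, not an existing FFmpeg option. */
typedef enum {
    PS2_POLICY_RAW,        /* option 1: pass the payload through untouched */
    PS2_POLICY_GUESS_DVD,  /* option 2: apply DVD sanity checks */
    PS2_POLICY_CALLER_DVD  /* option 3: caller states the source is a DVD */
} Ps2Policy;

typedef enum { PS2_OUT_RAW, PS2_OUT_DVD_NAV } Ps2Output;

static int looks_like_dvd_nav(const uint8_t *buf, size_t len)
{
    return (len == 980 && buf[0] == 0x00) || (len == 1018 && buf[0] == 0x01);
}

/* Decide how a private stream 2 payload would be exported. */
static Ps2Output ps2_route(Ps2Policy policy, const uint8_t *buf, size_t len)
{
    switch (policy) {
    case PS2_POLICY_GUESS_DVD:
        return looks_like_dvd_nav(buf, len) ? PS2_OUT_DVD_NAV : PS2_OUT_RAW;
    case PS2_POLICY_CALLER_DVD:
        return PS2_OUT_DVD_NAV; /* context comes from the caller, no guessing */
    case PS2_POLICY_RAW:
    default:
        return PS2_OUT_RAW;
    }
}
```

The only difference between the options is who supplies the knowledge:
option 1 never interprets the payload, option 2 infers it from the bytes,
and option 3 trusts the caller.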

Personally, if you prefer the second option, I don't have a problem
implementing it that way.  I assume there aren't many MPEG streams out
there that use these private streams, so a couple of checks should
give sufficient confidence to interpret the contents correctly.

If you are ok with my suggestions above, I'll create a new patch to 
parse and merge the packets, setting the fields as required.

Richard.