[FFmpeg-devel] [PATCH] mov.c: read fragment start dts from fragmented mp4

Mika Raento mikie at iki.fi
Sat Oct 11 06:25:23 CEST 2014


On 10 October 2014 23:08, Mika Raento <mikie at iki.fi> wrote:
> Firstly, thank you for the detailed explanation.
>
> Secondly, how should we proceed?
>
> I am not confident I'm able to implement that correctly, especially
> with no test coverage.
>
> My current implementation improves timestamps for discontinuous fragmented
> mp4s significantly (from unusable to close-to-perfect) while slightly
> worsening them for non-discontinuous fragmented mp4s. I
> definitely need it for our streams, and I think it would help other
> people in the same situation. I am quite willing to spend time on
> this, but I fear that I just don't have enough known inputs and
> outputs to verify my implementation.
>
> Normally fragments are supposed to start on key frames, which should
> have pts close to dts, but there are no guarantees.
>
> Some alternatives:
>
> 1. I can leave my implementation behind a flag. That's not very
> friendly to others, but breaks no existing usage.
>
> 2. We can merge my code as-is, and hope somebody more knowledgeable
> can fix it up later.
>
> 3. I can try to implement the algorithm described.

This is the one I picked.

I'm submitting a version that produces timestamps identical to master's for
Michael's test case and fixes the timestamps of my discontinuous ismvs.

    Mika

>
> 4. Somebody helps me with either implementation or by providing test cases.
>
> Opinions?
>
>     Mika
>
>
> On 10 October 2014 20:11, Yusuke Nakamura <muken.the.vfrmaniac at gmail.com> wrote:
>> 2014-10-10 13:38 GMT+09:00 Mika Raento <mikie at iki.fi>:
>>
>>> On 9 October 2014 23:37, Yusuke Nakamura <muken.the.vfrmaniac at gmail.com>
>>> wrote:
>>> > 2014-10-10 4:49 GMT+09:00 Michael Niedermayer <michaelni at gmx.at>:
>>> >
>>> >> On Thu, Oct 09, 2014 at 09:44:43PM +0200, Michael Niedermayer wrote:
>>> >> > On Thu, Oct 09, 2014 at 06:57:59PM +0300, Mika Raento wrote:
>>> >> > > If present, an MFRA box and its TFRAs are read for fragment start times.
>>> >> > >
>>> >> > > Without this change, timestamps for discontinuous fragmented mp4 are
>>> >> > > wrong, causing audio/video desync and making the output unusable for
>>> >> > > generating HLS.
>>> >> > > ---
>>> >> > >  libavformat/isom.h |  15 ++++++
>>> >> > >  libavformat/mov.c  | 140 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >> > >  2 files changed, 155 insertions(+)
>>> >> >
>>> >> > this seems to break some files
>>> >> >
>>> >> > for example a file generated with the following 2 commands:
>>> >> > ffmpeg -i matrixbench_mpeg2.mpg -t 10 in.mp4
>>> >> > l-smash/cli/remuxer -i in.mp4 --fragment 1 -o test.mp4
>>> >> >
>>> >> > I've not investigated why this doesn't work
>>> >>
>>> >> maybe the above was unclear, so to clarify before someone is confused:
>>> >> test.mp4 from above plays with ffplay before the patch but not really
>>> >> afterwards. The 2 commands are just to create such a file.
>>> >>
>>> >> [...]
>>> >>
>>> > The 'time' field in the tfra box is defined on the presentation timeline, not
>>> > the composition or decode timeline.
>>> > Therefore the value of 'time' generally can't be used as a DTS directly, as
>>> > long as one follows 14496-12.
>>> > Maybe some derivatives of the ISO Base Media file format define it
>>> > differently, but the spec of the ISO Base Media file format defines 'time' as
>>> > the presentation time of the sync sample.
>>> > Presentation times are composition times after the application of any edit
>>> > list for the track.
>>> >
>>> > I also have some samples which use 'time' as the DTS of the sync sample.
>>> > Historically, the term 'presentation time' was not clearly defined before
>>> > 14496-12:2012, which may have brought about this inconsistency.
>>>
>>> Hm. So my changes aren't correct if there is an edit list? Because
>>> AFAICT without edit lists mov.c sets pkt->pts = pkt->dts.
>>>
>>
>> Wrong. PTS == DTS has nothing to do with the edit list. Generally, CTS != DTS
>> occurs only when frame reordering exists.
>> Even if there is no edit list for a track, there is an implicit edit for that
>> track, and in this case PTS == (CTS + alpha)*mvhd.timescale/mdhd.timescale,
>> where the constant alpha depends on the implementation.
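
To make the units explicit, here is a minimal C sketch of the relationship
described above; the function name and the 'alpha' parameter are illustrative
only, nothing with these names exists in mov.c:

    #include <stdint.h>

    /* PTS in mvhd.timescale units from a CTS in mdhd.timescale units, where
     * 'alpha' stands for the implementation-dependent shift introduced by
     * the implicit edit for the track. */
    static int64_t pts_from_cts(int64_t cts, int64_t alpha,
                                unsigned mvhd_timescale,
                                unsigned mdhd_timescale)
    {
        return (cts + alpha) * mvhd_timescale / mdhd_timescale;
    }
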
>>
>>
>>>
>>> Would you mind explaining how edit lists and fragment times are
>>> supposed to work together?
>>>
>>>
>> The tfra box is designed so that the player can seek to and find a sync sample
>> on the presentation timeline, i.e. by PTS in units of mdhd.timescale.
>> PTS comes from CTS via the edit list, and CTS comes from DTS. So, basically you
>> can't get DTS directly from the 'time' field in the tfra box.
>>
>> Let's say mvhd.timescale=600, mdhd.timescale=24000 and the edit list
>> contains two edits (edit[0] and edit[1]),
>>   edit[0] = {segment_duration=600, media_time=-1, media_rate=1}; // empty edit
>>   edit[1] = {segment_duration=1200, media_time=2002, media_rate=1};
>> and the track fragment run, in a track fragment which you get from an entry in
>> the tfra box whose 'time' is equal to 48000, is as follows.
>>   trun.sample[0].sample_is_non_sync_sample = 1
>>   trun.sample[0].sample_duration=1001
>>   trun.sample[0].sample_composition_time_offset=1001
>>   trun.sample[1].sample_is_non_sync_sample = 0
>>   trun.sample[1].sample_duration=1001
>>   trun.sample[1].sample_composition_time_offset=1001
>> Then, time/mdhd.timescale*mvhd.timescale = 1200, that is, the PTS of the sync
>> sample is equal to 1200 in mvhd.timescale.
>> And the first edit is an empty edit, so the presentation of actual media only
>> starts at 600 in mvhd.timescale; the sync sample therefore sits 1200 - 600 =
>> 600 in mvhd.timescale, i.e. 24000 in mdhd.timescale, into the second edit.
>> The CTS of the sync sample, trun.sample[1], is equal to 1001 + X, where X is
>> the sum of the durations of all preceding samples, i.e. its DTS.
>> The mapped media starts at CTS=2002 because of the media_time of the second
>> edit, so the sync sample's offset into that edit, 24000, corresponds to
>> CTS - media_time = (X + 1001) - 2002 = X - 1001.
>> From this, X is equal to 25001, and the DTS of trun.sample[0] is equal to
>> X - trun.sample[0].sample_duration = 25001 - 1001 = 24000, not the 48000
>> stored in the tfra entry.
>>
>> |<--edit[0]-->|<---------edit[1]--------->|
>> |-------------|-------------|-------------|---->presentation timeline
>> 0             D             T'
>>               |-------------|------------------>composition timeline
>>           media_time        T
>>         |-----|-------|-----|------------------>decode timeline
>>         0 media_time  X     T
>>                       |<--->|
>>                      ct_offset
>>
>> D = edit[0].segment_duration = 600
>> T' = time/mdhd.timescale*mvhd.timescale = 1200
>> media_time = edit[1].media_time = 2002
>> ct_offset = trun.sample[1].sample_composition_time_offset = 1001
>> T = ct_offset + X
>> T-media_time = (T'-D)*mdhd.timescale/mvhd.timescale
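
To tie the example together, here is a minimal C sketch of the same
derivation; the FragSample struct and fragment_start_dts() are made-up names
for illustration, not mov.c's actual API, and a single non-empty edit preceded
by empty edits is assumed:

    #include <stdint.h>

    typedef struct {
        uint32_t duration;   /* trun sample_duration, in mdhd.timescale */
        int32_t  ct_offset;  /* trun sample_composition_time_offset */
        int      is_sync;    /* !sample_is_non_sync_sample */
    } FragSample;

    /* tfra_time:      'time' from the tfra entry, in mdhd.timescale
     * empty_duration: total duration of leading empty edits, in mvhd.timescale
     * media_time:     media_time of the non-empty edit, in mdhd.timescale
     * Returns the DTS of samples[0], or -1 if the run has no sync sample. */
    static int64_t fragment_start_dts(int64_t tfra_time, int64_t empty_duration,
                                      int64_t media_time,
                                      unsigned mvhd_timescale,
                                      unsigned mdhd_timescale,
                                      const FragSample *samples, int nb_samples)
    {
        int64_t dts_in_run = 0;  /* decode time relative to the fragment start */

        for (int i = 0; i < nb_samples; i++) {
            if (samples[i].is_sync) {
                /* Map the presentation time back onto the composition
                 * timeline: undo the empty edit, add the edit's media_time. */
                int64_t cts = tfra_time
                            - empty_duration * mdhd_timescale / mvhd_timescale
                            + media_time;
                /* DTS of the sync sample, then walk back to samples[0]. */
                return cts - samples[i].ct_offset - dts_in_run;
            }
            dts_in_run += samples[i].duration;
        }
        return -1;
    }

With the numbers above (time=48000, an empty edit of 600 in mvhd.timescale,
media_time=2002) this gives 48000 - 24000 + 2002 - 1001 - 1001 = 24000,
matching the hand derivation rather than the raw tfra 'time'.
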
>>
>>
>>>     Mika
>>>