[FFmpeg-trac] #6915(avformat:open): DASH audio segments duration doesn't match exactly with video segments duration.

FFmpeg trac at avcodec.org
Thu Dec 21 18:17:54 EET 2017


#6915: DASH audio segments duration doesn't match exactly with video segments
duration.
------------------------------------+-------------------------------------
             Reporter:  beloko      |                    Owner:  stevenliu
                 Type:  defect      |                   Status:  open
             Priority:  normal      |                Component:  avformat
              Version:  git-master  |               Resolution:
             Keywords:              |               Blocked By:
             Blocking:              |  Reproduced by developer:  0
Analyzed by developer:  0           |
------------------------------------+-------------------------------------

Comment (by stevenliu):

 Replying to [comment:24 j_karthic]:
 > Replying to [comment:22 stevenliu]:
 > > Replying to [comment:21 j_karthic]:
 > > > Replying to [comment:19 stevenliu]:
 > > > > Replying to [comment:18 stevenliu]:
 > > > > > Replying to [comment:17 j_karthic]:
 > > > > > > Replying to [comment:16 stevenliu]:
 > > > > > > > Replying to [comment:15 j_karthic]:
 > > > > > > > > Replying to [comment:14 stevenliu]:
 > > > > > > > > > Replying to [comment:13 j_karthic]:
 > > > > > > > > > > Replying to [comment:12 stevenliu]:
 > > > > > > > > > > > Replying to [comment:11 j_karthic]:
 > > > > > > > > > > > > Replying to [comment:10 stevenliu]:
 > > > > > > > > > > > > > Replying to [comment:9 j_karthic]:
 > > > > > > > > > > > > > > Replying to [comment:8 stevenliu]:
 > > > > > > > > > > > > > > > Replying to [comment:7 j_karthic]:
 > > > > > > > > > > > > > > > > Well, if we round the target duration to
 '''nearest integer''' as per the spec, part of the problem is resolved.
 Right now we are '''ceiling to the upper integer''' which is not according
 to HLS specifications.
 > > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > > {{{
 > > > > > > > > > > > > > > > > 4.3.3.1.  EXT-X-TARGETDURATION
 > > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > >    The EXT-X-TARGETDURATION tag specifies
 the maximum Media Segment
 > > > > > > > > > > > > > > > >    duration.  The EXTINF duration of each
 Media Segment in the Playlist
 > > > > > > > > > > > > > > > >    file, when rounded to the nearest
 integer, MUST be less than or equal
 > > > > > > > > > > > > > > > >    to the target duration; longer segments
 can trigger playback stalls
 > > > > > > > > > > > > > > > >    or other errors.  It applies to the
 entire Playlist file.  Its format
 > > > > > > > > > > > > > > > >    is:
 > > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > >    #EXT-X-TARGETDURATION:<s>
 > > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > >    where s is a decimal-integer indicating
 the target duration in
 > > > > > > > > > > > > > > > >    seconds.  The EXT-X-TARGETDURATION tag
 is REQUIRED.
 > > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > > }}}
 > > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > > So as per the spec, the
 EXT-X-TARGETDURATION should be rounded to 4 instead of 5.
 > > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > > But when I submitted a patch fixing this
 issue in hlsenc, it was rejected by Steven. See the thread
 http://ffmpeg.org/pipermail/ffmpeg-devel/2017-September/215630.html for
 more details. At that time, I was not able to provide a concrete example
 with mediastreamvalidator like @beloko has done now. But I thought the
 spec was very clear about it, without any room for confusion.
 > > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > > @stevenliu
 > > > > > > > > > > > > > > > > If you have a change of mind after seeing
 these results, please let me know. Maybe I will send a new patch which
 fixes the target duration in dashenc, and we can take forward our
 discussions there.
 > > > > > > > > > > > > > > >
 > > > > > > > > > > > > > > > The spec says "EXT-X-TARGETDURATION must >
 EXTINF". We have discussed this before, and you have sent a patch for it
 twice. You can see that the spec says "MUST be less than or equal to the
 target duration." Note the MUST, MUST, MUST. We should not care about the
 tool; we should care about the spec. If the tool differs from the spec,
 that may be Apple's mistake, and you should push Apple.
 > > > > > > > > > > > > > > I think you are missing the important line
 "'''when rounded to the nearest integer'''" in the spec. I am also only
 talking about the spec. This is purely an argument about interpreting
 that English line. This is where I feel some other people in the ffmpeg
 community should also pitch in to moderate the discussion, rather than
 leaving it as "'''Maintainer is always right'''".
 > > > > > > > > > > > > > The '''Maintainer''' is not always right, but can
 you tell me the meaning of '''MUST be less than or equal to the target
 duration'''? How should I understand it?
 > > > > > > > > > > > > "'''when rounded to the nearest integer, MUST be
 less than or equal to the target duration'''" is in one sentence. That
 should be read in conjunction with the phrase "when rounded to the nearest
 integer". It means after rounding duration to the nearest integer, it MUST
 be less than or equal to the target duration.
 > > > > > > > > > > > How should I understand the '''target duration'''?
 > > > > > > > > > > > Do you mean that when the '''EXTINF''' is
 '''1.080000''', the '''EXT-X-TARGETDURATION''' should be equal to '''1'''?
 > > > > > > > > > > As per the spec, Yes.
 > > > > > > > > > I have been coding for HLS since VERSION 1. In the old
 version, https://tools.ietf.org/html/draft-pantos-http-live-streaming-03
 > > > > > > > > > the text about EXT-X-TARGETDURATION said:
 > > > > > > > > >
 > > > > > > > > >
 > > > > > > > > > {{{
 > > > > > > > > > 3.2.1.  EXT-X-TARGETDURATION
 > > > > > > > > >
 > > > > > > > > >    The EXT-X-TARGETDURATION tag specifies the maximum
 media file
 > > > > > > > > >    duration.  The EXTINF duration of each media file in
 the Playlist
 > > > > > > > > >    file MUST be less than or equal to the target
 duration.  This tag
 > > > > > > > > >    MUST appear once in the Playlist file.  Its format
 is:
 > > > > > > > > >
 > > > > > > > > >    #EXT-X-TARGETDURATION:<s>
 > > > > > > > > >
 > > > > > > > > >    where s is an integer indicating the target duration
 in seconds.
 > > > > > > > > > }}}
 > > > > > > > > >
 > > > > > > > > > So I have always understood it as: EXT-X-TARGETDURATION
 must be larger than EXTINF.
 > > > > > > > > > I have sent an email to the HLS team at Apple to check
 this description; maybe the wording here is being misunderstood.
 > > > > > > > > In that version of HLS, floating point segment duration
 was not supported. The segment durations were already integers. Hence
 there is no question of rounding, when target duration was being defined.
 > > > > > > > > But later when the floating point segment duration was
 supported in HLS, the definition for target duration was modified to
 specify "when rounded to nearest integer" to handle floating point segment
 durations.
 > > > > > > >
 > > > > > > > Sorry for my poor English, but I think '''when rounded
 to the nearest integer''' describes '''The EXTINF duration of each Media
 Segment in the Playlist file,''', because they are in one sentence, and
 the next clause says '''longer segments can trigger playback stalls or
 other errors'''. So, to avoid the '''longer segments can trigger playback
 stalls or other errors''' case, we should not use lrint.
 > > > > > > > When EXTINF is 1.080000, set the EXT-X-TARGETDURATION to 2.
 > > > > > > Well, let me explain that entire sentence as a pseudo code.
 > > > > > >
 > > > > > > {{{
 > > > > > > The EXTINF duration of each Media Segment in the Playlist
 > > > > > > file, when rounded to the nearest integer, MUST be less than
 or equal
 > > > > > > to the target duration; longer segments can trigger playback
 stalls
 > > > > > > or other errors.
 > > > > > > }}}
 > > > > > >
 > > > > > > {{{
 > > > > > > for each media segment {
 > > > > > >  // when rounded to the nearest integer, MUST be less than or
 > > > > > >  // equal to the target duration
 > > > > > >  if (round(EXTINF_duration) <= target_duration) {
 > > > > > >   Everything is fine here
 > > > > > >  } else { // longer segments ...
 > > > > > >   // Basically when (round(EXTINF_duration) > target_duration)
 > > > > > >   playback stalls or other errors can be triggered;
 > > > > > >  }
 > > > > > > }
 > > > > > > }}}
 > > > > > >
 > > > > > > '''longer segments can trigger playback stalls or other
 errors''' is still part of the same sentence. Here "longer" means longer
 after rounding to the nearest integer.
 > > > > >
 > > > > > What will happen when we just round the target duration? I think
 the EXTINF will be larger than the target duration.
 > > > > Rounding up is no problem, but rounding down will give a result
 that users understand differently. This is why an API was implemented to
 round up.
 > > > Well as long as it follows the spec, we need not worry. If the
 players were implemented as per the HLS spec then they should handle this
 case correctly.
 > >
 > > I will try to fix this problem
 > Oh! I am glad that we are on the same page. Thanks for your
 understanding.
 > Regarding the fix, I think fixing the target duration is good enough.
 For that I have already sent a patch:
 http://ffmpeg.org/pipermail/ffmpeg-devel/2017-December/222745.html
 > It is theoretically not possible to get the audio duration to exactly
 4.00000 seconds, because an AAC frame is 1024 samples, which is not a
 divisor of 4*48000 or 4*44100. So the audio segment can't be cut at
 exactly 4.00000 seconds. Even the HLS spec is fine with that. Section
 6.2.4, Providing Variant Streams, requires the target duration of all
 variants to be the same. There is no explicit mention of the EXTINF
 duration, which indirectly means that those durations need not be exactly
 the same (as that is not theoretically possible).
 >
 > {{{
 >       Each Media Playlist in each Variant Stream MUST have the same
 >       target duration.  The only exceptions are SUBTITLES Renditions and
 >       Media Playlists containing an EXT-X-I-FRAMES-ONLY tag, which MAY
 >       have different target durations if they have an EXT-X-PLAYLIST-
 >       TYPE of VOD.
 > }}}
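
The frame-size argument quoted above can be checked with a little arithmetic. The following is only a sketch: the helper names `is_frame_aligned` and `covering_audio_duration` are hypothetical and are not part of dashenc.c. It assumes 1024 samples per AAC frame, as stated in the quoted comment.

```c
/* AAC frame length in samples, as stated in the quoted comment. */
#define AAC_FRAME_SAMPLES 1024

/* Returns 1 if target_sec seconds of audio at sample_rate is a whole
 * number of AAC frames, 0 otherwise. (Hypothetical helper.) */
static int is_frame_aligned(long sample_rate, long target_sec)
{
    return (sample_rate * target_sec) % AAC_FRAME_SAMPLES == 0;
}

/* Duration in seconds of the smallest whole number of AAC frames that
 * covers at least target_sec seconds. (Hypothetical helper.) */
static double covering_audio_duration(long sample_rate, long target_sec)
{
    long target_samples = sample_rate * target_sec;
    /* Integer ceiling division: frames needed to cover the target. */
    long frames = (target_samples + AAC_FRAME_SAMPLES - 1) / AAC_FRAME_SAMPLES;
    return (double)(frames * AAC_FRAME_SAMPLES) / (double)sample_rate;
}
```

At 48000 Hz, 4 seconds is 192000 samples, which is 187.5 frames, so a segment built from whole frames is either slightly shorter or slightly longer than 4 seconds (188 frames give about 4.0107 seconds).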



 {{{
 MacBook:xxx StevenLiu$ ./ffmpeg -hide_banner -i
 ~/bbb_sunflower_1080p_30fps_normal.mp4 -g 150 -r 100 -x264opts
 "scenecut=-1" -f dash -min_seg_duration 1000000 -window_size 99999 -t 5
 -hls_playlist 1 output_Steven.mpd
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
 '/Users/StevenLiu/bbb_sunflower_1080p_30fps_normal.mp4':
   Metadata:
     major_brand     : isom
     minor_version   : 1
     compatible_brands: isomavc1
     creation_time   : 2013-12-16T17:44:39.000000Z
     title           : Big Buck Bunny, Sunflower version
     artist          : Blender Foundation 2008, Janus Bager Kristensen 2013
     comment         : Creative Commons Attribution 3.0 -
 http://bbb3d.renderfarming.net
     genre           : Animation
     composer        : Sacha Goedegebure
   Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
     Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
 1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
 (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:39.000000Z
       handler_name    : GPAC ISO Video Handler
     Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
 s16p, 160 kb/s (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
     Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
 fltp, 320 kb/s (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
     Side data:
       audio service type: main
 Stream mapping:
   Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
   Stream #0:2 -> #0:1 (ac3 (native) -> aac (native))
 Press [q] to stop, [?] for help
 [libx264 @ 0x7f7fa5085200] using SAR=1/1
 [libx264 @ 0x7f7fa5085200] using cpu capabilities: MMX2 SSE2Fast SSSE3
 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
 [libx264 @ 0x7f7fa5085200] profile High, level 5.1
 [libx264 @ 0x7f7fa5085200] 264 - core 133 r2334M a3ac64b - H.264/MPEG-4
 AVC codec - Copyleft 2003-2013 - http://www.videolan.org/x264.html -
 options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7
 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1
 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6
 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0
 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1
 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=150 keyint_min=15
 scenecut=0 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0
 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
 [aac @ 0x7f7fa5086a00] Using a PCE to encode channel layout
 [dash @ 0x7f7fa6087400] No bit rate set for stream 0
 [dash @ 0x7f7fa6087400] Opening 'init-stream0.m4s' for writing
 [dash @ 0x7f7fa6087400] Opening 'init-stream1.m4s' for writing
 [dash @ 0x7f7fa6087400] Opening 'output_Steven.mpd.tmp' for writing
 Bandwidth info not available, set audio and video bitrates
 Output #0, dash, to 'output_Steven.mpd':
   Metadata:
     major_brand     : isom
     minor_version   : 1
     compatible_brands: isomavc1
     composer        : Sacha Goedegebure
     title           : Big Buck Bunny, Sunflower version
     artist          : Blender Foundation 2008, Janus Bager Kristensen 2013
     comment         : Creative Commons Attribution 3.0 -
 http://bbb3d.renderfarming.net
     genre           : Animation
     encoder         : Lavf58.3.100
     Stream #0:0(und): Video: h264 (libx264), yuv420p(progressive),
 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 100 fps, 12800 tbn, 100 tbc
 (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:39.000000Z
       handler_name    : GPAC ISO Video Handler
       encoder         : Lavc58.8.100 libx264
     Side data:
       cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
     Stream #0:1(und): Audio: aac (LC), 48000 Hz, 5.1(side), fltp, 394 kb/s
 (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
       encoder         : Lavc58.8.100 aac
     Side data:
       audio service type: main
 [dash @ 0x7f7fa6087400] Opening 'chunk-stream0-00001.m4s.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'chunk-stream1-00001.m4s.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'output_Steven.mpd.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'chunk-stream0-00002.m4s.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'chunk-stream1-00002.m4s.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'output_Steven.mpd.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'chunk-stream0-00003.m4s.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'chunk-stream1-00003.m4s.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'output_Steven.mpd.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'chunk-stream0-00004.m4s.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'chunk-stream1-00004.m4s.tmp' for writing
 [dash @ 0x7f7fa6087400] Opening 'output_Steven.mpd.tmp' for writing
 frame=  500 fps= 41 q=-1.0 Lsize=N/A time=00:00:05.01 bitrate=N/A dup=354
 drop=0 speed=0.41x
 video:365kB audio:241kB subtitle:0kB other streams:0kB global headers:0kB
 muxing overhead: unknown
 [libx264 @ 0x7f7fa5085200] frame I:4     Avg QP:16.78  size: 20153
 [libx264 @ 0x7f7fa5085200] frame P:150   Avg QP:20.52  size:  1625
 [libx264 @ 0x7f7fa5085200] frame B:346   Avg QP:18.64  size:   141
 [libx264 @ 0x7f7fa5085200] consecutive B-frames:  5.4%  5.2%  5.4% 84.0%
 [libx264 @ 0x7f7fa5085200] mb I  I16..4: 92.8%  3.1%  4.1%
 [libx264 @ 0x7f7fa5085200] mb P  I16..4:  6.0%  1.9%  0.1%  P16..4:  2.5%
 0.2%  0.2%  0.0%  0.0%    skip:89.2%
 [libx264 @ 0x7f7fa5085200] mb B  I16..4:  0.2%  0.0%  0.0%  B16..8:  0.6%
 0.0%  0.0%  direct: 0.2%  skip:99.0%  L0:33.2% L1:65.7% BI: 1.1%
 [libx264 @ 0x7f7fa5085200] 8x8 transform intra:17.9% inter:71.0%
 [libx264 @ 0x7f7fa5085200] coded y,uvDC,uvAC intra: 3.2% 11.7% 2.2% inter:
 0.2% 0.7% 0.1%
 [libx264 @ 0x7f7fa5085200] i16 v,h,dc,p: 83% 12%  3%  1%
 [libx264 @ 0x7f7fa5085200] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 11% 62%  1%
 2%  1%  2%  1%  1%
 [libx264 @ 0x7f7fa5085200] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 22% 28%  4%
 4%  4%  4%  3%  4%
 [libx264 @ 0x7f7fa5085200] i8c dc,h,v,p: 77% 14%  9%  0%
 [libx264 @ 0x7f7fa5085200] Weighted P-Frames: Y:22.7% UV:22.0%
 [libx264 @ 0x7f7fa5085200] ref P L0: 83.9%  4.5%  9.7%  1.5%  0.4%
 [libx264 @ 0x7f7fa5085200] ref B L0: 73.8% 24.9%  1.3%
 [libx264 @ 0x7f7fa5085200] ref B L1: 97.2%  2.8%
 [libx264 @ 0x7f7fa5085200] kb/s:597.11
 [aac @ 0x7f7fa5086a00] Qavg: 190.813
 MacBook:xxx StevenLiu$ mediastreamvalidator master.m3u8
 mediastreamvalidator: Version 1.2(170822)

 [master.m3u8] Started loading root playlist
 [media_1.m3u8] Started loading media playlist
 Can't deal with multiple sample timings per sample buffer
 [media_1.m3u8] All media files delivered and have end tag, stopping

 --------------------------------------------------------------------------------
 media_1.m3u8
 --------------------------------------------------------------------------------
 Processed 4 out of 4 segments
 Average segment duration: 1.250000
 Total segment bitrates (all discontinuities): average: 397.05 kb/s, max:
 405.60 kb/s
 Playlist max bitrate: 394.000000 kb/s
 Audio Group ID: AUDIO


 Discontinuity: sequence: 0, parsed segment count: 4 of 4, duration: 5.000
 sec, average: 397.05 kb/s, max: 405.60 kb/s
 Track ID: 1
 Audio Codec: AAC-LC
 Audio sample rate: 48000 Hz
 Audio channels: 0
 Audio channel layout: (null)

 --------------------------------------------------------------------------------
 MUST fix issues
 --------------------------------------------------------------------------------

 Error: Playlist vs segment duration mismatch
 --> Detail:  Segment duration 2.0053, Playlist duration: 1.4933
 --> Source:  media_1.m3u8 - chunk-stream1-00003.m4s:73981 at 0

 MacBook:xxx StevenLiu$ git diff
 diff --git a/libavformat/dashenc.c b/libavformat/dashenc.c
 index 5687530f2d..5368a2334c 100644
 --- a/libavformat/dashenc.c
 +++ b/libavformat/dashenc.c
 @@ -358,7 +358,7 @@ static void output_segment_list(OutputStream *os,
 AVIOContext *out, DASHContext
              Segment *seg = os->segments[i];
              double duration = (double) seg->duration / timescale;
              if (target_duration <= duration)
 -                target_duration = hls_get_int_from_double(duration);
 +                target_duration = lrint(duration);
          }

          ff_hls_write_playlist_header(out_hls, 6, -1, target_duration,
 MacBook:xxx StevenLiu$
 }}}
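
To make the effect of the one-line change in the diff above concrete, here is a small sketch comparing the two rounding policies discussed in this ticket. The function names are hypothetical. `target_round_up` is only a stand-in for the round-up behaviour described earlier in the thread, not the actual `hls_get_int_from_double`. `target_round_nearest` approximates `lrint` for non-negative durations without needing libm, and differs from `lrint` only at exact .5 ties.

```c
/* Round a non-negative duration to the nearest integer.
 * Assumption: stands in for lrint(); ties go up here, whereas lrint()
 * follows the current floating-point rounding mode. */
static long target_round_nearest(double extinf)
{
    return (long)(extinf + 0.5);
}

/* Round a non-negative duration up to the next integer.
 * Assumption: stand-in for the old round-up behaviour. */
static long target_round_up(double extinf)
{
    long whole = (long)extinf;
    return (extinf > (double)whole) ? whole + 1 : whole;
}

/* Check from section 4.3.3.1 as quoted in this ticket: the EXTINF
 * duration, rounded to the nearest integer, must not exceed the
 * target duration. Returns 1 if the segment passes. */
static int extinf_within_target(double extinf, long target_duration)
{
    return target_round_nearest(extinf) <= target_duration;
}
```

With these definitions, an EXTINF of 1.080000 gives a target duration of 1 under nearest rounding and 2 under round-up. Under the rounded comparison, a 1.4933 second segment fits a target duration of 1, while a 2.0053 second segment does not.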


 I think the problem is not about EXTINF versus EXT-X-TARGETDURATION; the
 problem is that the playlists' target durations are not the same.


--
Ticket URL: <https://trac.ffmpeg.org/ticket/6915#comment:26>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

