[FFmpeg-trac] #7637(avcodec:new): movenc.c does not properly handle subtitle durations (including pauses) exceeding INT_MAX - 1 microseconds
FFmpeg
trac at avcodec.org
Wed Dec 26 06:20:25 EET 2018
#7637: movenc.c does not properly handle subtitle durations (including pauses)
exceeding INT_MAX - 1 microseconds
---------------------------------+--------------------------------------
Reporter: erikbs | Type: defect
Status: new | Priority: normal
Component: avcodec | Version: git-master
Keywords: | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
---------------------------------+--------------------------------------
Summary of the bug:
Take this sample VTT/SRT file:
{{{
WEBVTT
1
00:35:47.484 --> 00:35:50.000
Durations that exceed the signed
int max value break the program
}}}
The first timecode translates to 2 147 484 000 microseconds, which is
slightly greater than the greatest value a signed 32-bit integer can hold (i.e.
INT_MAX = 2 147 483 647). From what I understand, empty subtitle frames
are written when there are no subtitles to display, which in this case
means that a 35 min ~48 sec long empty frame is supposed to be written
first. This exceeds the max int value and in ways unknown to me breaks the
output file. The problem will occur in all of the following cases:
1. The first text block starts after more than 35 min 47 sec
2. The duration of any text block exceeds 35 min 47 sec
3. The time between any two consecutive text blocks exceeds 35 min 47 sec
However, it does ''not'' occur if there is more than 35 min 47 sec left of
the video after the last text block has been shown (I believe this is
because the subtitles stream stops right after the last block, but I’m not
sure – in the past ffmpeg would extend the last text block so that it
would not end until the video/audio did, at least that happened when
extracting the subtitles as an SRT file).
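The arithmetic and the three cases above can be sketched in C as follows. This is only an illustration: the `Cue` type and both function names are hypothetical and not part of FFmpeg, times are assumed to be in microseconds, `int` is assumed to be 32 bits wide, and the limit is taken from the ticket title (anything above INT_MAX - 1 is treated as too long).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cue type for illustration; not an FFmpeg structure. */
typedef struct Cue {
    int64_t start_us; /* start time in microseconds */
    int64_t end_us;   /* end time in microseconds */
} Cue;

/* Convert an HH:MM:SS.mmm timecode to microseconds. */
int64_t timecode_to_us(int h, int m, int s, int ms)
{
    return (((int64_t)h * 60 + m) * 60 + s) * 1000000 + (int64_t)ms * 1000;
}

/* Return 1 if any of the three listed cases applies, 0 otherwise.
 * Assumes the cues are sorted by start time, do not overlap,
 * and that the stream starts at time 0. */
int cues_exceed_int_max(const Cue *cues, size_t n)
{
    int64_t prev_end = 0;
    for (size_t i = 0; i < n; i++) {
        /* Cases 1 and 3: pause before the first cue or between two cues. */
        if (cues[i].start_us - prev_end >= INT_MAX)
            return 1;
        /* Case 2: the cue itself is too long. */
        if (cues[i].end_us - cues[i].start_us >= INT_MAX)
            return 1;
        prev_end = cues[i].end_us;
    }
    return 0;
}
```

For the sample file, `timecode_to_us(0, 35, 47, 484)` gives 2 147 484 000, which is 353 more than INT_MAX, so the initial pause alone triggers case 1.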
How to reproduce:
With input.mp4 being any mp4 video file that is at least 35 minutes and 50
seconds long and input.vtt being a text file containing the lines above,
consider the following command line and output:
{{{
% ffmpeg -i input.mp4 -i 'input.vtt' -c copy -c:s mov_text test.mp4 -y
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
built with Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM
3.5svn)
configuration: --prefix=/opt/local --enable-swscale --enable-avfilter
--enable-avresample --enable-libmp3lame --enable-libvorbis --enable-
libopus --enable-librsvg --enable-libtheora --enable-libopenjpeg --enable-
libmodplug --enable-libvpx --enable-libsoxr --enable-libspeex --enable-
libass --enable-libbluray --enable-lzma --enable-gnutls --enable-
fontconfig --enable-libfreetype --enable-libfribidi --disable-libjack
--disable-libopencore-amrnb --disable-libopencore-amrwb --disable-libxcb
--disable-libxcb-shm --disable-libxcb-xfixes --disable-indev=jack
--enable-opencl --disable-outdev=xv --enable-audiotoolbox --enable-
videotoolbox --enable-sdl2 --disable-securetransport
--mandir=/opt/local/share/man --enable-shared --enable-pthreads
--cc=/usr/bin/clang --arch=x86_64 --enable-x86asm --enable-libx265
--enable-gpl --enable-postproc --enable-libx264 --enable-libxvid
libavutil 56. 22.100 / 56. 22.100
libavcodec 58. 35.100 / 58. 35.100
libavformat 58. 20.100 / 58. 20.100
libavdevice 58. 5.100 / 58. 5.100
libavfilter 7. 40.101 / 7. 40.101
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 3.100 / 5. 3.100
libswresample 3. 3.100 / 3. 3.100
libpostproc 55. 3.100 / 55. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
creation_time : 2013-12-12T07:49:32.000000Z
Duration: 01:35:52.42, start: 0.000000, bitrate: 497 kb/s
Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 /
0x31637661), yuv420p, 640x262, 398 kb/s, 25 fps, 25 tbr, 50 tbn, 50 tbc
(default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz,
stereo, fltp, 95 kb/s (default)
Metadata:
creation_time : 2013-12-12T07:50:59.000000Z
handler_name : IsoMedia File Produced by Google, 5-11-2011
Input #1, webvtt, from 'input.vtt':
Duration: N/A, bitrate: N/A
Stream #1:0: Subtitle: webvtt
Output #0, mp4, to 'test.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
encoder : Lavf58.20.100
Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 /
0x31637661), yuv420p, 640x262, q=2-31, 398 kb/s, 25 fps, 25 tbr, 12800
tbn, 50 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz,
stereo, fltp, 95 kb/s (default)
Metadata:
creation_time : 2013-12-12T07:50:59.000000Z
handler_name : IsoMedia File Produced by Google, 5-11-2011
Stream #0:2: Subtitle: mov_text (tx3g / 0x67337874)
Metadata:
encoder : Lavc58.35.100 mov_text
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)
Stream #1:0 -> #0:2 (webvtt (native) -> mov_text (native))
Press [q] to stop, [?] for help
frame=41005 fps=0.0 q=-1.0 size= 102912kB time=00:35:47.48 bitrate=
392.6kbits/
[mp4 @ 0x7fa8b2807600] Application provided duration: 2147484000 /
timestamp: 2147484000 is out of range for mov/mp4 format
[mp4 @ 0x7fa8b2807600] pts has no value
frame=78181 fps=78176 q=-1.0 size= 188160kB time=00:52:07.28 bitrate=
492.9kbit
frame=114025 fps=76013 q=-1.0 size= 275200kB time=01:16:01.11 bitrate=
494.3kbi
frame=139645 fps=69820 q=-1.0 size= 339712kB time=01:33:06.27 bitrate=
498.2kbi
frame=143809 fps=69301 q=-1.0 Lsize= 350392kB time=01:35:52.39 bitrate=
499.0kbits/s speed=2.77e+03x
video:280077kB audio:67411kB subtitle:0kB other streams:0kB global
headers:0kB muxing overhead: 0.835837%
}}}
The produced file is valid, but the subtitle track is messed up.
Investigation:
The result is that the subtitle line is printed out as soon as the movie
starts, instead of being shown at the correct time offset. If the file
contains more than one text block, they are shown right after each other,
but each with the correct duration.
In the Terminal printout, there will be an “Application provided duration:
####” warning for the text block that breaks the encoding and then one for
each text block following it. There will also be a “pts has no value”
warning for all of them.
Note that in the encoding status lines (“frame=... fps=... […] time=...”),
the timecode will ''always'' start with the first “bad” value, and not
increase until the encoder has passed that timecode. This provides a clue;
it means that something goes wrong even before the actual encoding starts.
I have not checked how (or if) MP4Box tackles this.
In this case I used the mov_text codec for the subtitles, but movtextenc.c
does not seem to care about durations at all, so the problem lies
somewhere else (i.e. it’s not mov_text specific).
I noticed that in movenc.c, AVPacket structs actually are integrity
checked, and a warning is printed out if the duration exceeds INT_MAX.
However, this event is not handled. Instead, the packet is thrown away
before any frame is written, which I guess is why the text blocks stack up
at the beginning of the video with my test files.
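To make the check concrete, here is a simplified model of the kind of range test that would produce the warning seen in the log. It is not copied from movenc.c: `duration_out_of_range` is a hypothetical name, the message is abbreviated, and the assumption is that the duration is computed as the difference between the current timestamp and the previous one.

```c
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the range check; not FFmpeg code.
 * Returns 1 and prints a warning if the gap since the previous
 * timestamp does not fit below INT_MAX, otherwise returns 0. */
int duration_out_of_range(int64_t prev_dts, int64_t dts)
{
    int64_t duration = dts - prev_dts;

    if (duration >= INT_MAX) {
        fprintf(stderr,
                "duration: %" PRId64 " / timestamp: %" PRId64
                " is out of range\n",
                duration, dts);
        return 1;
    }
    return 0;
}
```

Under that assumption, a first packet at 2147484000 with nothing before it has a duration equal to its timestamp, which would explain why the warning in the log prints the same number twice.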
Possible solution:
If a subtitle frame (empty or not) will have a duration of more than
INT_MAX microseconds, ffmpeg should instead split it into identical frames
with durations of up to INT_MAX - 1. The mov_write_packet function in
libavformat/movenc.c is probably where it must be done, cf. how it already
calls mov_write_subtitle_end_packet() after the last subtitle packet/frame
has been written.
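The splitting proposed above could look roughly like the following sketch. `split_duration` and `MAX_CHUNK_US` are hypothetical names, not FFmpeg functions, and the code only computes the chunk durations; an actual fix would still have to emit one packet per chunk inside movenc.c.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Longest duration allowed for a single frame, as proposed in the ticket. */
#define MAX_CHUNK_US ((int64_t)INT_MAX - 1)

/* Split total_us into consecutive chunks of at most MAX_CHUNK_US each.
 * Stores up to cap chunk durations in out and returns the number of
 * chunks required, which may be larger than cap. */
size_t split_duration(int64_t total_us, int64_t *out, size_t cap)
{
    size_t n = 0;

    while (total_us > 0) {
        int64_t chunk = total_us > MAX_CHUNK_US ? MAX_CHUNK_US : total_us;

        if (n < cap)
            out[n] = chunk;
        n++;
        total_us -= chunk;
    }
    return n;
}
```

For the 2 147 484 000 microsecond pause in the sample file this yields two chunks, one of 2 147 483 646 and one of 354 microseconds, whose sum is the original duration.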
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7637>
FFmpeg <https://ffmpeg.org>