[FFmpeg-trac] #7661(avformat:new): SubViewer .sub files with UTF8 encoding are decoded incorrectly

FFmpeg trac at avcodec.org
Wed Jan 9 10:13:22 EET 2019


#7661: SubViewer .sub files with UTF8 encoding are decoded incorrectly
-----------------------------------+--------------------------------------
             Reporter:  lukasf     |                     Type:  defect
               Status:  new        |                 Priority:  normal
            Component:  avformat   |                  Version:  git-master
             Keywords:  SubViewer  |               Blocked By:
             Blocking:             |  Reproduced by developer:  0
Analyzed by developer:  0          |
-----------------------------------+--------------------------------------
 Summary of the bug:

 When SubViewer subtitles are loaded from an external UTF-8 encoded .sub
 file (not embedded into a movie), the first subtitle entry is decoded
 incorrectly.

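 Looking at the code, the cause seems to be that subviewer_read_header()
 does not skip the UTF-8 byte order mark (BOM). The three BOM bytes then sit
 in front of the first line of the file, so that line is apparently not
 recognized as a header tag or timing line. Skipping the BOM as
 microdvddec.c does would probably fix this.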
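
 To illustrate the suspected cause, here is a minimal standalone sketch in C.
 It is not the FFmpeg code: skip_utf8_bom() and parse_timing_line() are
 hypothetical helpers, and the sscanf-based parser is only a simplified
 stand-in for the demuxer's timing parser. It shows that a timing line
 prefixed with the bytes EF BB BF fails to parse, and parses correctly once
 those bytes are skipped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not FFmpeg API): return a pointer past a leading
 * UTF-8 BOM (bytes EF BB BF) if present, otherwise return the input. */
static const char *skip_utf8_bom(const char *line)
{
    static const char bom[] = "\xEF\xBB\xBF";

    /* strncmp stops at the terminating NUL, so short strings are safe. */
    if (strncmp(line, bom, 3) == 0)
        return line + 3;
    return line;
}

/* Simplified stand-in for a SubViewer timing-line parser (assumption: the
 * real demuxer differs in detail). Expects "HH:MM:SS.cc,HH:MM:SS.cc" and
 * returns 1 on success, 0 if the line does not match. */
static int parse_timing_line(const char *line, int *start_ms, int *end_ms)
{
    int h1, m1, s1, c1, h2, m2, s2, c2;

    if (sscanf(line, "%d:%d:%d.%d,%d:%d:%d.%d",
               &h1, &m1, &s1, &c1, &h2, &m2, &s2, &c2) != 8)
        return 0;

    /* The fractional part is in centiseconds, hence the factor of 10. */
    *start_ms = ((h1 * 60 + m1) * 60 + s1) * 1000 + c1 * 10;
    *end_ms   = ((h2 * 60 + m2) * 60 + s2) * 1000 + c2 * 10;
    return 1;
}
```

 With the first timing line of SubViewer_NoHeader_UTF8.sub, calling
 parse_timing_line() on the raw line (BOM included) returns 0, while
 calling it on skip_utf8_bom(line) succeeds. That matches the symptom
 below, where the timing line ends up as subtitle text.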

 We are using ffmpeg as a library for video playback (with embedded and
 external subtitles), but the same bug can easily be reproduced from the
 command line.

 I attached two UTF-8 SubViewer files, one with a header and one without.
 Both produce a broken first entry.

 How to reproduce:
 {{{
 % ffmpeg -i SubViewer_Header_UTF8.sub -map 0:s:0 output1.srt
 ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
   built with gcc 8.2.1 (GCC) 20181017...

 Second file:

 % ffmpeg -i SubViewer_NoHeader_UTF8.sub -map 0:s:0 output2.srt

 }}}

 Expected output from both files:
 {{{
 1
 00:04:35,030 --> 00:04:38,820
 Hello guys... please sit down...

 2
 00:05:00,190 --> 00:05:03,470
 M. Franklin,
 are you crazy?
 }}}

 Output from SubViewer_Header_UTF8.sub:
 {{{
 1
 00:00:00,000 --> 00:00:00,000
 [INFORMATION]

 2
 00:04:35,030 --> 00:04:38,820
 Hello guys... please sit down...

 3
 00:05:00,190 --> 00:05:03,470
 M. Franklin,
 are you crazy?

 }}}

 Output from SubViewer_NoHeader_UTF8.sub:
 {{{
 1
 00:00:00,000 --> 00:00:00,000
 00:04:35.03,00:04:38.82
 Hello guys... please sit down...

 2
 00:05:00,190 --> 00:05:03,470
 M. Franklin,
 are you crazy?
 }}}

--
Ticket URL: <https://trac.ffmpeg.org/ticket/7661>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

