[FFmpeg-trac] #2431(undetermined:new): ffmpeg subtitle encoding of special characters does not working correctly
FFmpeg
trac at avcodec.org
Thu Apr 4 22:07:35 CEST 2013
#2431: ffmpeg subtitle encoding of special characters does not working correctly
-------------------------------------+-------------------------------------
Reporter: Nick | Owner:
Type: defect | Status: new
Priority: normal | Component: undetermined
Version: git-master |
Keywords: sub srt | Resolution:
Blocking: | Blocked By:
Analyzed by developer: 0 | Reproduced by developer: 0
-------------------------------------+-------------------------------------
Comment (by Nick):
You are right, the UTF-8 BOM is optional, but there are various software
tools that can detect the encoding type (meaning ANSI text, UTF-8 with BOM
or UTF-8 without BOM, though not the specific code page).
I tested MP4Box with *.srt files saved as ANSI, UTF-8 (with BOM) and UTF-8
without BOM. MP4Box seems to detect the encoding type and produces the same
result in all three cases. So it is possible!
Another example is the open source editor Notepad++, which can also detect
the encoding type. Maybe the source code of such tools contains methods for
detecting the right encoding type.
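A minimal sketch of such a detection step in Python, for illustration only. It is not taken from MP4Box or Notepad++, whose actual methods are not described here. The function name `classify_subtitle_bytes` and its return labels are hypothetical. The idea: check for the UTF-8 BOM first, then try a strict UTF-8 decode, and treat anything that fails as a legacy single-byte ("ANSI") encoding.

```python
import codecs


def classify_subtitle_bytes(data: bytes) -> str:
    """Classify raw subtitle bytes as 'utf-8-bom', 'utf-8' or 'legacy'.

    Hypothetical helper: the labels are assumptions made for this sketch.
    """
    # A leading EF BB BF marks UTF-8 with BOM.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume a single-byte code page such as CP-1252.
        return "legacy"
    # Valid UTF-8 without BOM (pure ASCII text also lands here).
    return "utf-8"


if __name__ == "__main__":
    print(classify_subtitle_bytes("Grüße".encode("utf-8")))
    print(classify_subtitle_bytes("Grüße".encode("cp1252")))
```

This is a heuristic: a short legacy file could by chance be valid UTF-8, and the check says nothing about which code page a legacy file uses.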
ISO-8859-1 and CP-1252 are not exactly the same, but the special
characters used in my "subtitle_test.srt" are the same in both! Hence the
little comment in my srt file ;-) ...
''"These are printable characters of ISO-8859-1:
(*str >= 32 && *str < 128) || (*str >= 160 && *str <= 255)"''
... for this range it is exactly the same.
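The claim about this range can be checked with a short Python sketch. It relies on Python's built-in `iso-8859-1` and `cp1252` codecs. The helper name `same_in_both` is hypothetical.

```python
def same_in_both(byte_value: int) -> bool:
    """Return True if the byte decodes to the same character in both encodings."""
    raw = bytes([byte_value])
    try:
        return raw.decode("iso-8859-1") == raw.decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes in 128..159 undefined.
        return False


# The range from the quoted comment: 32..127 and 160..255.
shared = list(range(32, 128)) + list(range(160, 256))
# Bytes in 128..159 where the two encodings disagree.
differing = [b for b in range(128, 160) if not same_in_both(b)]

if __name__ == "__main__":
    print(all(same_in_both(b) for b in shared))
    print([hex(b) for b in differing])
```

The two encodings differ only in the range 128..159, where CP-1252 assigns printable characters such as the euro sign and typographic quotes, while ISO-8859-1 assigns control codes.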
For most Western European languages such as French, German, Italian,
Spanish and others, CP-1252 or ISO-8859-1 is sufficient as the default.
'''More important for the imported subtitle file is the question:
"Is it plain text or is it already UTF-8?"'''
[[BR]]
My proposal for selecting a default code page for every subtitle stream:
- If no language is defined for the subtitle stream, or the language is
unknown: [[BR]]
--> use CP-1252 (or ISO-8859-1) as the default
- If a language is defined (e.g. with '''-metadata:s:s:0 language=ger'''):
[[BR]] --> use a lookup table to set the code page automatically
- If a specific code page is selected with an option like
"''-sub_charenc''": [[BR]] --> use that setting instead of the others
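The three rules above can be sketched in Python as follows. This is only an illustration of the proposed priority order, not FFmpeg code. The function `choose_code_page`, the constant names and the contents of the lookup table are assumptions made for this example, and the table is deliberately incomplete.

```python
from typing import Optional

# Fallback when no language is known (first rule of the proposal).
DEFAULT_CODE_PAGE = "cp1252"

# Illustrative, incomplete table: ISO 639-2 language code -> Windows code page.
LANGUAGE_CODE_PAGES = {
    "ger": "cp1252", "fre": "cp1252", "ita": "cp1252", "spa": "cp1252",
    "pol": "cp1250", "cze": "cp1250", "hun": "cp1250",
    "rus": "cp1251",
    "gre": "cp1253",
    "tur": "cp1254",
}


def choose_code_page(language: Optional[str] = None,
                     sub_charenc: Optional[str] = None) -> str:
    """Pick a code page: explicit option first, then language, then default."""
    if sub_charenc:
        # An explicitly selected code page overrides everything else.
        return sub_charenc
    if language:
        # Known language: consult the table, fall back to the default.
        return LANGUAGE_CODE_PAGES.get(language.lower(), DEFAULT_CODE_PAGE)
    # No language defined for the stream.
    return DEFAULT_CODE_PAGE


if __name__ == "__main__":
    print(choose_code_page())
    print(choose_code_page(language="rus"))
    print(choose_code_page(language="rus", sub_charenc="cp1253"))
```

An unknown language code falls back to the default here, which matches the first rule of the proposal.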
--
Ticket URL: <https://ffmpeg.org/trac/ffmpeg/ticket/2431#comment:8>
FFmpeg <http://ffmpeg.org>
FFmpeg issue tracker