[FFmpeg-trac] #6021(avcodec:new): tx3g / mov_text subtitles are not encoded correctly in some specific cases

FFmpeg trac at avcodec.org
Sat Dec 17 08:35:09 EET 2016


#6021: tx3g / mov_text subtitles are not encoded correctly in some specific cases
---------------------------------------+-----------------------------------
             Reporter:  erikbs         |                    Owner:
                 Type:  defect         |                   Status:  new
             Priority:  normal         |                Component:  avcodec
              Version:  git-master     |               Resolution:
             Keywords:  utf8 mov_text  |               Blocked By:
             Blocking:                 |  Reproduced by developer:  0
Analyzed by developer:  0              |
---------------------------------------+-----------------------------------

Comment (by erikbs):

 New patch submitted for correctly decoding styles when multibyte UTF-8
 characters are involved.
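
 For context, a minimal sketch of the underlying offset problem, under the
 assumption that style records count characters while the decoded text is
 indexed by bytes. The helper name utf8_char_to_byte_offset is hypothetical
 and is not taken from the patch; it expects valid UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: map a character offset to the
 * byte offset of that character in valid UTF-8 text of byte length len.
 * Returns len if the text holds fewer than char_pos characters. */
static size_t utf8_char_to_byte_offset(const char *text, size_t len,
                                       size_t char_pos)
{
    size_t i = 0;

    while (i < len && char_pos > 0) {
        i++; /* step over the lead byte */
        /* step over continuation bytes, which have the form 10xxxxxx */
        while (i < len && ((uint8_t)text[i] & 0xC0) == 0x80)
            i++;
        char_pos--;
    }
    return i;
}
```

 For the four-byte string "aæb" (bytes 61 C3 A6 62), character offset 2
 maps to byte offset 3.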

 About UTF-16:
 Given that the byte length, {{{uint64_t L}}}, of the string,
 {{{char *text}}}, is known, the number of characters (code points, with a
 surrogate pair counted once) in a UTF-16 string can be calculated as
 follows:
 {{{
 #include <stdint.h>

 uint64_t utf16_char_len(const char *text, uint64_t L) {
     const uint8_t *p = (const uint8_t *)text; // Avoid sign extension
     uint64_t l = 0, start = 0;
     uint16_t c = 0;
     uint16_t m[2] = {0xFC00, 0xDC00}; // Mask and value of a low surrogate

     if (L >= 2) c = (uint16_t)((p[0] << 8) | p[1]);
     switch (c) {
         case 0xFFFE:    // Little Endian, swap mask byte order
             m[0] = 0x00FC; m[1] = 0x00DC;
             // fall through
         case 0xFEFF:
             start = 2;  // Skip the BOM
             // fall through
         default:
             // Count every code unit that is not a low surrogate;
             // i + 1 < L avoids reading past the end when L is odd
             for (uint64_t i = start; i + 1 < L; i += 2)
                 if ((((p[i] << 8) | p[i + 1]) & m[0]) != m[1]) l++;
     }
     return l;
 }
 }}}
 This code expects to be fed valid UTF-16 data and assumes Big Endian when
 no BOM is present.

 The format specification only requires Big Endian support, but it demands
 that the BOM be present for UTF-16. Exactly how this is supposed to be
 encoded I don’t know.
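
 A minimal sketch of one plausible reading, assuming the BOM is simply
 written as the bytes FE FF in front of the big-endian code units. The
 helper name write_utf16be_with_bom is hypothetical, and whether this
 matches what the specification intends is not confirmed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialize n UTF-16 code units as big-endian bytes,
 * preceded by the BOM (0xFE 0xFF). Returns the number of bytes written,
 * or 0 if the output buffer of cap bytes is too small. */
static size_t write_utf16be_with_bom(const uint16_t *units, size_t n,
                                     uint8_t *out, size_t cap)
{
    size_t i;

    /* Need 2 bytes for the BOM plus 2 bytes per code unit */
    if (cap < 2 || n > (cap - 2) / 2)
        return 0;

    out[0] = 0xFE;
    out[1] = 0xFF;
    for (i = 0; i < n; i++) {
        out[2 + 2 * i]     = (uint8_t)(units[i] >> 8);   /* high byte */
        out[2 + 2 * i + 1] = (uint8_t)(units[i] & 0xFF); /* low byte  */
    }
    return 2 + 2 * n;
}
```

 For the code units 0x0041 0xD83D 0xDE00 this writes the 8 bytes
 FE FF 00 41 D8 3D DE 00.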

--
Ticket URL: <https://trac.ffmpeg.org/ticket/6021#comment:6>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

