[FFmpeg-devel] [PATCH] libavcodec: Do not return encoding errors when -sub_charenc_mode is do_nothing

Nicolas George nicolas.george at normalesup.org
Fri Aug 30 23:04:43 CEST 2013


On tridi 13 Fructidor, year CCXXI, Reimar Döffinger wrote:
> I think that is problematic. The raw packet data will usually consist of
> mixed data, for example English text from the "container" and some other
> language (maybe even a non-ASCII-compatible encoding?).
> This makes detecting the encoding via a dictionary much more difficult.
> The more extreme example would be SRT with comment lines containing the
> original text in one encoding and the actual subtitles in a different
> encoding.
> Yes, I have not seen this in practice, but I hope you at least
> agree that you cannot do much with the packet-level data here unless
> you re-implement half the subtitle decoder.

It is easy to imagine ever more difficult situations, up to the point where
the problem has no solution at all. But are they really relevant? Right now,
for all supported codecs, the packet payload and the decoded text are very
similar; the only changes are a bit of markup ("<i>" becomes "{\i1}") and
special characters ('|' becomes '\n').

I have no reason to suspect this will change in the future. Legacy encodings
are a thing of the past; all new formats are Unicode-based.

> All this is ignoring practical considerations like applications that
> simply have well working encoding detection that is possibly fine-tuned
> to their specific region/users and that don't want to spend time
> and effort to make FFmpeg work similarly well.

Of course. But these fine-tuned detection functions can be applied to the
packet payload just as well as to the decoded text.

Furthermore, I have no objection to letting applications tell FFmpeg to use
their own fine-tuned encoding detection function, and I may have more
distant plans in that direction myself.

> Also, for what you describe with "taking sub_charenc_mode into
> consideration and updating its value", how does that allow
> making FFmpeg _not_ do the charset conversion?

"taking sub_charenc_mode into consideration" means that applications must
not try to perform encoding conversion on binary codecs. Fortunately, the
only binary codec right now handles encodings in a sane way (only UTF-8 and
UTF-16 are permitted; the UTF-16 part is not yet implemented in lavc).
Future codecs are likely to be in the same situation.

"updating its value" answers your question: if lavc is not supposed to do
the conversion, set it to DO_NOTHING, and lavc will not try to recode
anything.

It will still check that the recoding has been done properly, though: the
recoding has to happen somewhere, because the rest of the chain expects
UTF-8 text.

Regards,

-- 
  Nicolas George