[FFmpeg-devel] [PATCH] lavc: make invalid UTF-8 in subtitle output a non-fatal error

Nicolas George nicolas.george at normalesup.org
Fri Jun 28 22:08:38 CEST 2013

On decadi 10 Messidor, year CCXXI, Reimar Döffinger wrote:
> Well, there are reasons why applications can't easily/always add the
> correct options.
> One of them which this is probably about is when the application wants to
> do its own detection and conversion, after some additional buffering for
> example.

This point seems completely acceptable, of course; it is obviously sound
design. But nothing needs to change in the API to achieve that: the
application just has to ensure that its conversion is done properly, i.e.
that it produces packets with syntactically valid UTF-8. That is trivial
to do (and I would have no objection to some kind of
av_replace_invalid_utf8 helper function, too).
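
To illustrate what such a helper might look like, here is a minimal sketch
in plain C. It is only an assumption about the shape of the function: the
name replace_invalid_utf8_sketch, its signature, and the policy of emitting
one U+FFFD per offending byte are all hypothetical and are not part of
libavcodec or libavutil.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence at p, or 0 if the bytes
 * at p do not start one. Rejects overlong forms, surrogates and code
 * points above U+10FFFF. */
static size_t utf8_seq_len(const uint8_t *p, size_t avail)
{
    uint32_t cp, min;
    size_t len, i;

    if (avail == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;
    if ((p[0] & 0xE0) == 0xC0)      { len = 2; cp = p[0] & 0x1F; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; min = 0x10000; }
    else
        return 0; /* stray continuation byte or invalid lead byte */
    if (len > avail)
        return 0; /* sequence truncated at end of input */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

/* Hypothetical helper, NOT an FFmpeg API. Copies src to dst, replacing each
 * byte that cannot start a well-formed sequence with U+FFFD (EF BF BD).
 * dst is always NUL-terminated when dst_size > 0 and never receives a
 * partial sequence. Returns the number of bytes the full result needs,
 * excluding the NUL; at most 3 * src_len. */
size_t replace_invalid_utf8_sketch(const uint8_t *src, size_t src_len,
                                   uint8_t *dst, size_t dst_size)
{
    static const uint8_t repl[3] = { 0xEF, 0xBF, 0xBD };
    size_t in = 0, needed = 0, written = 0;
    int full = 0;

    while (in < src_len) {
        size_t n = utf8_seq_len(src + in, src_len - in);
        const uint8_t *chunk = n ? src + in : repl;
        size_t chunk_len = n ? n : sizeof(repl);

        /* Write only whole chunks, and stop writing at the first one that
         * does not fit, so the output stays valid and contiguous. */
        if (dst && !full && written + chunk_len < dst_size) {
            memcpy(dst + written, chunk, chunk_len);
            written += chunk_len;
        } else {
            full = 1;
        }
        needed += chunk_len;
        in += n ? n : 1;
    }
    if (dst && dst_size > 0)
        dst[written] = 0;
    return needed;
}
```

An application doing its own charset conversion could run its converted
text through something like this before building the packet, with a
destination buffer of 3 * src_len + 1 bytes to cover the worst case.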

> Maybe a charenc that says "just dump through whatever raw data you have"
> would be acceptable to you, too?

Right now, the API guarantees that the output of avcodec_decode_subtitle2 is
always syntactically valid UTF-8. I believe that anything that loosens that
guarantee is wrong.

> Note the reason why I was considering it (and why falling back to some
> kind of "error concealment" might make sense) is that it is a simple way
> for the user to manually figure out the encoding.
> I believe many native speakers have some idea of how their encoding looks
> if incorrectly interpreted as ASCII or similar.

You are right, but this will mostly come automatically once encoding
auto-detection is implemented, and the proposals in this thread were


  Nicolas George