[FFmpeg-devel] [PATCH] lavc: make invalid UTF-8 in subtitle output a non-fatal error

Nicolas George nicolas.george at normalesup.org
Fri Jun 28 11:00:12 CEST 2013

On décadi 10 Messidor, year CCXXI, wm4 wrote:
> Such as?

Depends on your input and what you want to do with it.

> I get them from libavformat demuxers, but also elsewhere. I actually
> can perform codepage auto-detection on subs read by libavformat
> demuxers (it's really awkward: read a number of subtitle packets,
> concatenate their contents, then run the charset detector on it). But
> it's disabled by default

Then enable it.

>			   and doesn't guarantee success anyway.

Success is not guaranteed in the sense that you cannot be sure of getting
the right encoding, but you will always find at least one encoding that
works, since some common encodings, including plain ISO-8859-1, accept any
byte sequence.
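
To illustrate the point, here is a minimal Python sketch. It is not
libavcodec code: the function name decode_subtitle_bytes and the candidate
list are hypothetical, chosen only for this example. It tries strict
decoders first and falls back to ISO-8859-1, which maps every byte value to
a character and therefore never fails, although the result may be the wrong
text.

```python
def decode_subtitle_bytes(raw, candidates=("utf-8",)):
    """Return (text, encoding_name) for a raw subtitle payload.

    Hypothetical helper for illustration: each candidate is tried with
    strict error handling, so invalid input raises instead of being
    silently replaced.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # ISO-8859-1 assigns a character to each of the 256 byte values, so
    # this decode cannot fail. It may still produce the wrong characters.
    return raw.decode("iso-8859-1"), "iso-8859-1"
```

A caller that cares about correctness still has to pick the candidates
sensibly; the fallback only guarantees that some text comes out, not that
it is the text the author wrote.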

>								 In some
> cases, subtitles might be demuxed from interleaved files, in which
> auto-detection can't be reasonably performed.

Do you have any such file where conversion fails? If so, share it.

Also, you have only answered half the question: what do you intend to do
with the decoded subtitles? If garbage output suits you, do not bother
decoding the subtitles; read them directly from /dev/urandom.

> I have the impression that you still believe the charset problem can
> be solved perfectly. This is not the case. Such problems are very common
> even today, and just showing an error message (or even dropping broken
> text) won't help.

Please provide a realistic scenario where you believe the encoding problem
cannot be "solved perfectly" and where your proposal would have helped.


  Nicolas George