[FFmpeg-devel] [PATCH] lavc: make invalid UTF-8 in subtitle output a non-fatal error

Reimar Döffinger Reimar.Doeffinger at gmx.de
Fri Aug 9 20:28:01 CEST 2013

On Fri, Aug 09, 2013 at 12:14:38PM +0200, wm4 wrote:
> On Fri, 9 Aug 2013 12:00:45 +0200
> Nicolas George <nicolas.george at normalesup.org> wrote:
> > On duodi 22 Thermidor, year CCXXI, wm4 wrote:
> > > How does that work for things like movtext?
> > 
> > # Text strings for display and font names are uniformly coded in UTF-8, or
> > # start with a UTF-16 BYTE ORDER MARK (\uFEFF) and by that indicate that the
> > # string which starts with the byte order mark is in UTF-16.
> Fair enough, but note how the current API can't even handle UTF-16 in
> this case.
> (And this still assumes that all files actually follow the timed text
> specification and don't use legacy charsets, as well as that movtext
> will be the only binary text subtitle format - forever.)
> You didn't answer the other (more important) questions.
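
The timed-text rule quoted above can be sketched in C as a small classifier: a sample that starts with the UTF-16 byte order mark is treated as UTF-16, and anything else has to be well-formed UTF-8. This is only an illustration under stated assumptions, not FFmpeg's movtext decoder. The names `text_kind`, `is_valid_utf8` and `classify_text` are hypothetical, and accepting the BOM in either byte order (FE FF or FF FE) is an assumption. A sample that fails both checks is the "legacy charset" case wm4 raises.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
enum text_kind { TEXT_UTF8, TEXT_UTF16, TEXT_INVALID };

/* Returns 1 if buf[0..len) is well-formed UTF-8 (no overlong forms,
 * no surrogates, nothing above U+10FFFF), 0 otherwise. */
static int is_valid_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t c = buf[i];
        size_t n;          /* number of continuation bytes */
        uint32_t cp, min;  /* decoded code point, smallest legal value */

        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;     /* stray continuation byte or invalid lead byte */

        if (len - i <= n)
            return 0;      /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;  /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;      /* overlong, out of range, or surrogate */
        i += n + 1;
    }
    return 1;
}

/* Classify one text sample following the quoted rule. */
enum text_kind classify_text(const uint8_t *buf, size_t len)
{
    /* Assumption: accept the BOM in either byte order. */
    if (len >= 2 && ((buf[0] == 0xFE && buf[1] == 0xFF) ||
                     (buf[0] == 0xFF && buf[1] == 0xFE)))
        return TEXT_UTF16;
    return is_valid_utf8(buf, len) ? TEXT_UTF8 : TEXT_INVALID;
}
```

For example, a lone Latin-1 byte 0xE9 ("é") is rejected as a truncated UTF-8 sequence, which is exactly the kind of input the patch in the subject line is about.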

While I kind of agree with your position, I think your approach
isn't helpful.
People will often argue or disagree with suggestions forever
if they are only discussed.
It would help far more if you could try to figure out what
the main objections are and come up with a compromise
(preferably with a patch).
I know it can be frustrating, but some good points have come
up. For example, we should ideally do a better job of providing
useful output for those applications that do _not_ want to
reimplement charset detection/conversion.
And I think this will help a lot of users, though I also
think there is a good argument for giving our users options
beyond that (preferably something nicer than "have FFmpeg wrap
it to UTF-8 and then unwrap it back", though I realize that is
a bit of an issue if the input is UTF-16: since it contains
zero bytes, I think there is no way to just pass it through
unchanged with the current API).
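
The zero-byte problem can be seen in a few lines of C. Assuming the text travels as a NUL-terminated string (as with the `text` and `ass` fields of `AVSubtitleRect`), `strlen()` stops at the first 0x00 byte, which for UTF-16 encoded ASCII text comes right after the BOM. Passing UTF-16 through unchanged would need a buffer that carries its own size; `struct sized_text`, `c_string_length` and `sized_hi` below are hypothetical illustrations, not FFmpeg API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container with an explicit byte count, so embedded
 * zero bytes (as in UTF-16) survive. Not an FFmpeg type. */
struct sized_text {
    const unsigned char *data;
    size_t size;
};

/* "Hi" in UTF-16BE with a byte order mark: FE FF 00 48 00 69 */
static const char utf16_hi[] = "\xFE\xFF\x00H\x00i";

/* Length as seen by an API that expects a NUL-terminated string. */
size_t c_string_length(void)
{
    return strlen(utf16_hi);   /* stops at the first 0x00 byte */
}

/* The same text with an explicit size keeps all of its bytes. */
struct sized_text sized_hi(void)
{
    struct sized_text t = { (const unsigned char *)utf16_hi,
                            sizeof(utf16_hi) - 1 };  /* 6 bytes, final NUL excluded */
    return t;
}
```

This is also why converting to UTF-8 works with the current API: UTF-8 never produces a 0x00 byte except when encoding U+0000 itself.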
