[FFmpeg-devel] [PATCH] lavc: make invalid UTF-8 in subtitle output a non-fatal error

Michael Niedermayer michaelni at gmx.at
Sat Aug 10 17:01:11 CEST 2013


On Thu, Aug 08, 2013 at 07:19:27PM +0200, Reimar Döffinger wrote:
> On Fri, Jun 28, 2013 at 10:08:39PM +0200, wm4 wrote:
> > On Fri, 28 Jun 2013 22:00:14 +0200
> > Nicolas George <nicolas.george at normalesup.org> wrote:
> > 
> > > I agree. I already told that I had work-in-progress encoding
> > > autodetection somewhere. But wm4's proposal is a step in the wrong
> > > direction for that.
> > 
> > Except auto-detection is not the solution. Not for every application,
> > and not for this specific problem. It's also fragile and complicated
> > and will most likely require (optional) external dependencies.
> 
> I think we are just going round in circles.
> Why not just implement a relatively simple solution like
> supporting a special charenc "noconv" that just passes
> things through (possibly broken) as they are, plus
> some documentation update that says in this case we
> make no promise the output is UTF-8?

I think a passthrough or noconv option that treats the encoding as
opaque would be a good idea.

Most encodings are supersets of ASCII, so alphanumeric characters,
punctuation and the various brackets have the same representation as
in 8-bit ASCII, and no other code points use these byte values.
Thus most things that a demuxer, parser or decoder does should not
need to know the encoding: splitting into tokens, finding alphanumeric
keys or tags, concatenation, even when the encoding is multibyte.
Thus such a passthrough mode appears practical to me.
The main exception to such ASCII "superset" charsets that I am aware
of is UTF-16. But that should be very easy to detect and convert.
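
To illustrate the tokenizing point, here is a minimal sketch in C. It is
not FFmpeg code: split_opaque() is a hypothetical helper and the
comma-separated layout is only an example. It assumes the charset never
reuses byte values below 0x80 inside a multibyte sequence, which holds
for UTF-8, ISO-8859-x and the EUC family but not for every byte in
Shift_JIS.

```c
#include <string.h>

/*
 * Hypothetical helper: split buf in place on an ASCII delimiter byte,
 * treating every other byte as opaque payload. At most max_fields
 * pointers are stored in out; the last field keeps any remaining
 * delimiters untouched. Returns the number of fields stored.
 */
static int split_opaque(char *buf, char delim, char **out, int max_fields)
{
    int n = 0;
    char *p = buf;

    while (n < max_fields) {
        out[n++] = p;
        if (n == max_fields)
            break;              /* rest of the line stays in the last field */
        p = strchr(p, delim);
        if (!p)
            break;              /* no more delimiters */
        *p++ = '\0';            /* terminate this field, move to the next */
    }
    return n;
}
```

Called on the bytes "Dialogue,0:00:01,D\xF6ffinger" (ISO-8859-1) with
max_fields = 3, this yields the three fields without ever interpreting
the 0xF6 byte; the same call works unchanged if the name is stored as
UTF-8.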

Thus a mode in which ASCII-compatible encodings are left as is and
UTF-16 is detected and converted to UTF-8 seems quite doable and would
leave charset detection and conversion to the user application.
It would also work without an external iconv dependency.
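
A rough sketch of the UTF-16 part, in plain C with no iconv. The
function names are hypothetical and this is not existing lavc code. It
assumes detection by byte order mark only; input without a BOM would
need some other heuristic, and a UTF-32LE BOM (FF FE 00 00) would be
misread as UTF-16LE.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 for a UTF-16LE BOM, 2 for a UTF-16BE BOM, 0 otherwise. */
static int detect_utf16_bom(const uint8_t *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) return 1;
        if (buf[0] == 0xFE && buf[1] == 0xFF) return 2;
    }
    return 0;
}

/* Encode one code point as UTF-8; returns the number of bytes written. */
static size_t put_utf8(uint32_t cp, uint8_t *dst)
{
    if (cp < 0x80) {
        dst[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        dst[0] = (uint8_t)(0xC0 | (cp >> 6));
        dst[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        dst[0] = (uint8_t)(0xE0 | (cp >> 12));
        dst[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    dst[0] = (uint8_t)(0xF0 | (cp >> 18));
    dst[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Convert a UTF-16 payload (BOM already skipped) to NUL-terminated UTF-8.
 * Returns the number of bytes written, not counting the NUL, or -1 on an
 * unpaired surrogate, an odd trailing byte, or insufficient output space.
 */
static long utf16_to_utf8(const uint8_t *src, size_t len, int big_endian,
                          uint8_t *dst, size_t dst_size)
{
    size_t i = 0, o = 0;

    if (dst_size == 0)
        return -1;
    while (i + 1 < len) {
        uint32_t cp = big_endian ? (uint32_t)(src[i] << 8 | src[i + 1])
                                 : (uint32_t)(src[i + 1] << 8 | src[i]);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {      /* high surrogate */
            uint32_t lo;
            if (i + 1 >= len)
                return -1;
            lo = big_endian ? (uint32_t)(src[i] << 8 | src[i + 1])
                            : (uint32_t)(src[i + 1] << 8 | src[i]);
            i += 2;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                           /* stray low surrogate */
        }
        if (o + 4 >= dst_size)                   /* worst case: 4 bytes + NUL */
            return -1;
        o += put_utf8(cp, dst + o);
    }
    if (i != len)
        return -1;                               /* odd number of bytes */
    dst[o] = 0;
    return (long)o;
}
```

A caller would run detect_utf16_bom() on the start of the buffer, skip
the two BOM bytes if it matches, and hand the rest to utf16_to_utf8();
anything else would be passed through untouched.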

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Dictatorship naturally arises out of democracy, and the most aggravated
form of tyranny and slavery out of the most extreme liberty. -- Plato