[FFmpeg-devel] [PATCH] libavcodec: Do not return encoding errors when -sub_charenc_mode is do_nothing

Fri Aug 30 12:17:43 CEST 2013

On Fri, 30 Aug 2013 11:56:22 +0200
Nicolas George <nicolas.george at normalesup.org> wrote:

> On Tridi 13 Fructidor, year CCXXI, Paul B Mahol wrote:
> > Clearly you are not interested in optional solution, and instead
> > asks me to do your homework.
> 
> If you have something interesting to say on the subject, please say it,
> hopefully in more than two lines. If not, I would appreciate if you would
> stop muddying discussions with that kind of remarks.
> 
> There is no such thing as an optimal solution to this question, you should
> know it. The solution I suggested has worked well enough for thousands of
> Vim users for years, is very simple, can be controlled by the user if
> necessary, and is actually implemented, not vaporware.

Your simplistic patch (the text file one you sent to this list some
time ago) would be a disimprovement. It takes control over how charsets
are selected away from the application entirely, and replaces it
with a very naive algorithm that not only won't produce good results in
most cases, but makes it impossible for the application to handle this
situation gracefully (let's not even talk about user friendliness).

Ideally, we'd have perfect charset detection, and we'd have it in
ffmpeg. However, that is not possible. What's left is either of these:
automatic charset detection via statistical methods (Mozilla's code,
ENCA, libicu, libguess, maybe more), or user intervention. User
intervention would mean that the user can select from a number of
possible candidates if the encoding couldn't be reliably detected.
Think GUIs. With your new patch, this would have to happen after ffmpeg
already turned the subtitles into mojibake, which surely is against the
goal.
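
To illustrate the user intervention case, here is a rough Python sketch
of what an application could do as long as it still has the raw bytes.
The function name, the candidate list and the preview length are all
made up for this example; none of it is ffmpeg API.

```python
def candidate_decodings(raw, candidates, preview_chars=60):
    """Return (charset, preview) for every candidate that decodes raw.

    Hypothetical helper: the raw bytes are never modified, so the
    application can show the previews and let the user pick one.
    """
    results = []
    for charset in candidates:
        try:
            text = raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Not decodable as this charset (or unknown charset name).
            continue
        results.append((charset, text[:preview_chars]))
    return results


# Example: subtitle text that is really CP1251.
raw = "Привет, мир".encode("cp1251")
for charset, preview in candidate_decodings(raw, ["utf-8", "cp1251", "latin-1"]):
    print(charset, repr(preview))
```

This only works because the decision is made before any conversion. Once
the bytes have been run through the wrong charset, there is nothing
sensible left to present to the user.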

The current design - charset detection on packet level - is not great,
but it works. The only thing that doesn't work is UTF16/UTF32 files.
You could handle these specially by using the same trick as mplayer
does (convert and look if you can parse a subtitle, see
sub/subreader.c), or alternatively by requiring the user to do
something similar (if no input format is found, convert the probe data
to UTF16, and see if a format can be detected). The latter method would
suck as well, but it would work.
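
A minimal sketch of the "convert and see if it parses" idea, in Python
for brevity. This is not mplayer's code: the SRT-only check is a
hypothetical stand-in for a real format probe, and the list of encodings
is an assumption.

```python
import re

# Hypothetical stand-in for a real subtitle probe: it only looks for an
# SRT-style timing line such as "00:00:01,000 --> 00:00:02,500".
SRT_TIMING = re.compile(
    r"[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}"
)

# Assumed candidate list; byte order is spelled out so the result does
# not depend on the platform.
WIDE_ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")


def looks_like_srt(text):
    return SRT_TIMING.search(text) is not None


def probe_wide_encodings(probe_data, encodings=WIDE_ENCODINGS):
    """Return the first encoding under which the probe data parses, else None."""
    for enc in encodings:
        try:
            text = probe_data.decode(enc)
        except UnicodeDecodeError:
            continue
        # Decoding alone proves nothing; the format probe has to succeed.
        if looks_like_srt(text):
            return enc
    return None


sample = "1\n00:00:01,000 --> 00:00:02,500\nHello\n"
print(probe_wide_encodings(sample.encode("utf-16-le")))
```

The point is that the result is validated by the subtitle parser, not by
whether the conversion happened to succeed. A real implementation would
also have to cope with probe data that is cut off in the middle of a
code unit, which this sketch simply treats as a failed decode.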

By the way, what makes you even think that vim is good at handling
charsets? We have no reason to believe that. In fact, I bet vim really
sucks at international text - how is it going to render complex scripts
like Arabic or Indic on the terminal?

Let me repeat again that probing a few hardcoded (or even user
provided) charsets by trying to run iconv on them makes the worst
charset detector possible.
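
To make the problem concrete, here is a small Python sketch of such a
detector. It uses Python's codecs instead of iconv, and the function
name and candidate list are invented for the example, but the failure
mode is the same: most single-byte charsets accept any byte sequence, so
"the conversion did not fail" says almost nothing.

```python
def naive_charset_probe(raw, candidates):
    """Return the first candidate charset that decodes raw without error.

    Hypothetical illustration of "try conversions until one succeeds".
    """
    for charset in candidates:
        try:
            raw.decode(charset)
        except UnicodeDecodeError:
            continue
        return charset
    return None


original = "Привет, мир"
raw = original.encode("cp1251")

# latin-1 maps every possible byte value, so it can never fail and
# always wins once UTF-8 has been rejected.
guess = naive_charset_probe(raw, ["utf-8", "latin-1", "cp1251"])
print(guess, repr(raw.decode(guess)))
```

With this candidate order the probe settles on latin-1 and the text
comes out as mojibake. Putting cp1251 first would fix this particular
input, but then every actual Latin-1 file would be "detected" as CP1251.
The outcome is determined by the order of the list, not by the data.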

> If you have better to propose, or even just suggest, please go ahead. If
> not, I do not see the point you are trying to make.
> 
> Regards,
>