[FFmpeg-devel] [RFC][PATCH] ffmpeg: add option to transform metadata using iconv

Nicolas George george at nsup.org
Thu Sep 24 20:37:54 CEST 2015


On tridi 3 Vendémiaire, year CCXXIV, James Darnley wrote:
> As far as I understand the iconv API, it doesn't appear to do this for
> you.  So adding this feature would require writing code to handle more
> errors returned from the iconv() function.  That means a more
> complicated argument handling structure is needed.
> 
> I don't mind trying to write this but it would be better to do it behind
> the API you propose.

Of course. Actually, it is already there in the API, although I am not quite
satisfied with it because it cannot be set as an option.

> I will help you with it as best I can because I
> seem to have involuntarily volunteered myself.

I need feedback on whether this kind of API is useful in FFmpeg at all (other
people are welcome to give advice too!), and on whether the specific API I
propose fits the various needs. As for writing the code, I expect it to be
quite straightforward.

The question on which I most need feedback is this: should I make an API that
can convert from any encoding to any other encoding, or one that can only
convert from any encoding to UTF-8 and from UTF-8 to any encoding?

There are pros and cons for each case. UTF-8 to/from anything is enough for
the needs of any sane program, and makes the handling of the replacement
character easier (because it can be specified in UTF-8 directly). OTOH,
any-to-any is more generic.
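To illustrate the replacement-character point, here is a minimal sketch of the any-to-UTF-8 direction on top of the POSIX iconv() interface. It handles the errors iconv() can report: E2BIG by growing the output buffer, EILSEQ and EINVAL by skipping one input byte and emitting U+FFFD, written directly as its UTF-8 bytes. The function names are hypothetical (nothing like this exists in FFmpeg), and skipping a single byte per error is a simplifying assumption that suits stateless encodings.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Make room for at least `need` more bytes after `used`; 0 on success. */
static int reserve(char **buf, size_t *cap, size_t used, size_t need)
{
    assert(used <= *cap);
    if (*cap - used >= need)
        return 0;
    size_t ncap = *cap * 2 > used + need ? *cap * 2 : used + need;
    char *tmp = realloc(*buf, ncap);
    if (!tmp)
        return -1;
    *buf = tmp;
    *cap = ncap;
    return 0;
}

/* Hypothetical helper, not an FFmpeg API: convert in_len bytes encoded in
 * `from` to a newly allocated, NUL-terminated UTF-8 string. Sequences that
 * iconv() rejects are skipped one byte at a time and replaced by U+FFFD,
 * which can be written directly because the target is UTF-8.
 * Returns NULL for an unknown encoding or on allocation failure. */
char *to_utf8_with_replacement(const char *from, const char *in, size_t in_len)
{
    static const char fffd[] = "\xEF\xBF\xBD";   /* U+FFFD in UTF-8 */
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = in_len + 16, used = 0;
    char *out = malloc(cap);
    char *src = (char *)in;     /* iconv() takes char **, but does not write */
    size_t src_left = in_len;

    while (out && src_left > 0) {
        char *dst = out + used;
        size_t dst_left = cap - used - 1;        /* keep room for the NUL */
        size_t r = iconv(cd, &src, &src_left, &dst, &dst_left);
        used = (size_t)(dst - out);
        if (r != (size_t)-1)
            break;                               /* all input consumed */
        int err = errno;
        if (err == E2BIG) {
            /* output buffer full: double it and retry */
            if (reserve(&out, &cap, used, cap) < 0) {
                free(out);
                out = NULL;
            }
        } else if ((err == EILSEQ || err == EINVAL) &&
                   reserve(&out, &cap, used, sizeof(fffd)) == 0) {
            /* invalid or truncated input: emit U+FFFD, skip one byte */
            memcpy(out + used, fffd, 3);
            used += 3;
            src++;
            src_left--;
        } else {
            free(out);                           /* unexpected error or OOM */
            out = NULL;
        }
    }
    if (out)
        out[used] = '\0';
    iconv_close(cd);
    return out;
}
```

With this shape, ISO-8859-1 "caf\xE9" comes out as "café", and a stray 0xFF in supposedly UTF-8 input becomes U+FFFD instead of aborting the whole string. The reverse direction (UTF-8 to anything) cannot use the same trick as easily, because the replacement would have to be expressed in the target encoding.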

> I don't know what to say here.  I know the encodings needed for iconv
> because I arrived at them by brute force.  I wrote a short Lua script to
> iterate over a list of encodings supported by my iconv and arrived at
> this answer.  The command line tool called iconv is too clever for this
> because it returns an error when it can't convert.  As for ending in
> GBK, it is what the script told me.

Could you share the script and enough input to run it and reproduce the
results?

> This feature would not work if there was a misinterpretation in the
> middle.  As you say that would need A->B and C->D where B != C.  Perhaps
> this is why my solution isn't perfect, because there should be an
> assumption in the middle.
> 
> I could rework my code to allow for assumptions in the middle.  My case
> would then use "CP1252,UTF-8,UTF-8,GBK" as an argument.

I must say, I do not like your approach very much because it manipulates
text encoding in the middle of the program. All strings inside the program
should be in UTF-8.

I propose this: add a "metadata_text_encoding" option to AVFormatContext. If
it is set on a demuxer, the demuxing framework converts metadata from that
encoding to UTF-8; similarly, if it is set on a muxer, the muxing framework
converts metadata from UTF-8 to that encoding.
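A minimal sketch of what the two directions would mean for a single metadata value, assuming the option holds an encoding name that iconv_open() accepts and that raw values are NUL-terminated (which rules out encodings such as UTF-16). convert_metadata_value() and enum meta_dir are hypothetical names, not FFmpeg code; this version is deliberately strict and returns NULL rather than producing lossy output.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Which side of the (hypothetical) metadata_text_encoding option we are on. */
enum meta_dir { META_DEMUX, META_MUX };

/* Hypothetical helper, not an FFmpeg API. A demuxer converts from `enc` to
 * UTF-8; a muxer converts from UTF-8 to `enc`; everything in between stays
 * UTF-8. Strict: returns NULL on invalid, unrepresentable or lossy input, and
 * (to keep the sketch short) when the output does not fit the buffer. */
char *convert_metadata_value(const char *enc, enum meta_dir dir, const char *val)
{
    assert(enc && val);
    /* iconv_open() takes (tocode, fromcode) */
    iconv_t cd = dir == META_DEMUX ? iconv_open("UTF-8", enc)
                                   : iconv_open(enc, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(val);
    size_t cap = 4 * in_left + 8;      /* ample for common encodings */
    char *out = malloc(cap);
    char *src = (char *)val, *dst = out;
    size_t out_left = cap - 1;         /* keep room for the NUL */
    size_t r = (size_t)-1;
    if (out) {
        r = iconv(cd, &src, &in_left, &dst, &out_left);
        if (r == 0)                    /* flush shift state (stateful encodings) */
            r = iconv(cd, NULL, NULL, &dst, &out_left);
    }
    iconv_close(cd);
    if (r != 0) {                      /* (size_t)-1: error; > 0: lossy */
        free(out);
        return NULL;
    }
    *dst = '\0';
    return out;
}
```

For instance, a Latin-1 title read with the option set to ISO-8859-1 reaches the rest of the program as UTF-8, while writing "€" to a muxer configured for ISO-8859-1 fails instead of silently degrading.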

We could then add a special syntax to specify bogus conversions, possibly
-metadata_text_encoding "[CP1252>UTF-8]GBK", meaning that the text must first
be converted from CP1252 to UTF-8, then considered to be GBK (and converted to
UTF-8). (Well, I consider the feature evil, so I will probably not volunteer
to implement it, but I will not oppose it as long as it cannot be triggered
too easily by an unsuspecting user.)
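A sketch of how that bracket syntax could be parsed and applied, following the semantics described above: each "[A>B]" is an explicit A-to-B conversion, and the trailing name is the encoding the result is finally assumed to be in, converted to UTF-8. All names here (struct step, parse_encoding_spec(), apply_encoding_spec(), MAX_STEPS, NAME_LEN) are hypothetical, the size limits are arbitrary, and each step assumes NUL-terminated intermediate text.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STEPS 8   /* arbitrary limits for this sketch */
#define NAME_LEN 32

/* One step: treat the bytes as encoding `from` and convert them to `to`. */
struct step { char from[NAME_LEN], to[NAME_LEN]; };

/* Copy [s, end) into dst as a NUL-terminated name; -1 if empty or too long. */
static int copy_name(char *dst, const char *s, const char *end)
{
    assert(end >= s);
    size_t n = (size_t)(end - s);
    if (n == 0 || n >= NAME_LEN)
        return -1;
    memcpy(dst, s, n);
    dst[n] = '\0';
    return 0;
}

/* Parse e.g. "[CP1252>UTF-8]GBK": each "[A>B]" is an A->B step and the
 * trailing name E adds a final E->UTF-8 step. `steps` must hold MAX_STEPS
 * entries. Returns the number of steps, or -1 on a syntax error. */
int parse_encoding_spec(const char *spec, struct step *steps)
{
    int n = 0;
    while (*spec == '[') {
        const char *gt = strchr(spec, '>');
        const char *close = strchr(spec, ']');
        if (!gt || !close || gt > close || n == MAX_STEPS - 1)
            return -1;
        if (copy_name(steps[n].from, spec + 1, gt) < 0 ||
            copy_name(steps[n].to, gt + 1, close) < 0)
            return -1;
        n++;
        spec = close + 1;
    }
    if (copy_name(steps[n].from, spec, spec + strlen(spec)) < 0)
        return -1;
    strcpy(steps[n].to, "UTF-8");
    return n + 1;
}

/* Strictly convert one NUL-terminated string; NULL on any error or loss. */
static char *convert_step(const struct step *st, const char *in)
{
    iconv_t cd = iconv_open(st->to, st->from);
    if (cd == (iconv_t)-1)
        return NULL;
    size_t in_left = strlen(in), cap = 4 * in_left + 8, out_left = cap - 1;
    char *out = malloc(cap), *src = (char *)in, *dst = out;
    size_t r = out ? iconv(cd, &src, &in_left, &dst, &out_left) : (size_t)-1;
    iconv_close(cd);
    if (r != 0) {
        free(out);
        return NULL;
    }
    *dst = '\0';
    return out;
}

/* Apply all steps in order, feeding each step's output into the next. */
char *apply_encoding_spec(const char *spec, const char *in)
{
    struct step steps[MAX_STEPS];
    int n = parse_encoding_spec(spec, steps);
    if (n < 0)
        return NULL;
    size_t len = strlen(in) + 1;
    char *cur = malloc(len);
    if (cur)
        memcpy(cur, in, len);
    for (int i = 0; cur && i < n; i++) {
        char *next = convert_step(&steps[i], cur);
        free(cur);
        cur = next;
    }
    return cur;
}
```

With the example from this message, "[CP1252>UTF-8]GBK" parses into the steps CP1252->UTF-8 and GBK->UTF-8. Using encodings every iconv ships, "[UTF-8>ISO-8859-1]UTF-8" repairs text that was UTF-8, misread as ISO-8859-1 and re-encoded, turning "Ã©" back into "é". Requiring explicit brackets keeps the feature hard to trigger by accident.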

What do you think of it?

Regards,

-- 
  Nicolas George