[FFmpeg-devel] [PATCH] "Mojibake" in Japanese

Tetsuya Yoshida tetu at eth0.jp
Mon Feb 6 18:12:15 CET 2012


Hi Carl!

> Please provide a sample.

For example, take the character 'あ' (a Japanese multibyte character).
Its byte sequence in Shift JIS is '0x82 0xa0'.
Its byte sequence in UTF-8 is '0xe3 0x81 0x82'.
When '0x82 0xa0' is stored in a tag marked as ISO-8859-1, libavformat reads it as UTF-8.
But '0x82 0xa0' is not a valid UTF-8 sequence, so no character corresponds to it.
So Mojibake occurs.
Also, since PUT_UTF8 changes the byte length,
the output bytes match neither the Shift JIS nor the UTF-8 encoding.
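
The change in byte length can be illustrated with a minimal sketch. It assumes
that each tag byte is treated as one ISO-8859-1 code point and re-encoded as
UTF-8. The function name latin1_to_utf8 is hypothetical and this is not
FFmpeg's PUT_UTF8 macro, only the same idea written out by hand.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not FFmpeg's PUT_UTF8 macro): treat every input byte
 * as an ISO-8859-1 code point and write it as UTF-8.
 * out must have room for 2 * in_len bytes. Returns the number of bytes written.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                 /* ASCII stays one byte */
        } else {
            out[n++] = 0xC0 | (in[i] >> 6);   /* U+0080..U+00FF need two bytes */
            out[n++] = 0x80 | (in[i] & 0x3F);
        }
    }
    return n;
}
```

For the input '0x82 0xa0' this produces '0xc2 0x82 0xc2 0xa0': four bytes that
are neither the Shift JIS nor the UTF-8 encoding of 'あ'.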

libiconv converts between different character encodings.
In this case, it converts '0x82 0xa0' to '0xe3 0x81 0x82'.
So libavformat will be able to read the ID3 tags correctly.
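
A minimal sketch of that conversion with the iconv API follows. It is not the
attached mojibake.c and not the proposed patch. The helper name sjis_to_utf8 is
hypothetical, and the encoding name "SHIFT_JIS" is assumed to be accepted by
the installed iconv implementation.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert in_len bytes of Shift JIS text to UTF-8.
 * Returns the number of bytes written to out, or -1 on any failure.
 */
long sjis_to_utf8(const char *in, size_t in_len, char *out, size_t out_cap)
{
    /* iconv_open takes the target encoding first, then the source encoding. */
    iconv_t cd = iconv_open("UTF-8", "SHIFT_JIS");
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;     /* iconv wants a non-const pointer */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;              /* invalid input or output buffer too small */
    return (long)(out_cap - dst_left);
}
```

Calling sjis_to_utf8("\x82\xa0", 2, buf, sizeof buf) should fill buf with
'0xe3 0x81 0x82'. On systems where iconv is a separate library, link with
-liconv as in the build command in this mail.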

I have prepared an mp3 file.
Mojibake occurs in its title.
The title contains 'あ' written in Shift JIS but marked as ISO-8859-1.

I have also written a libiconv sample program.
You will get the idea easily if you run it.
If your computer does not have Japanese fonts, the output will not be
displayed correctly.
Please run the program on your computer.

Line 1 of the output file is written in Shift JIS.
If you open the file as Shift JIS, line 1 is displayed correctly.

Line 2 of the output file is written in UTF-8.
If you open the file as UTF-8, line 2 is displayed correctly.

==============================
$ gcc mojibake.c -liconv
$ ./a.out > mojibake.txt

$ vim -c 'e ++enc=sjis' mojibake.txt
ShiftJIS: あ ( 0x82 0xa0 )
UTF8: 縺? ( 0xe3 0x81 0x82 )

$ vim -c 'e ++enc=utf8' mojibake.txt
ShiftJIS: ?? ( 0x82 0xa0 )
UTF8: あ ( 0xe3 0x81 0x82 )
==============================
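
Why the UTF-8 bytes show up as one wrong character plus '?' when opened as
Shift JIS can be sketched as follows. The function names are hypothetical, and
the byte ranges are the commonly used Shift JIS lead and trail ranges
(including vendor extensions up to 0xFC), which is an assumption about the
decoder and not taken from vim.

```c
#include <stddef.h>

/* Lead byte of a Shift JIS double-byte character (assumed ranges). */
int is_sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Trail byte of a Shift JIS double-byte character (assumed ranges). */
int is_sjis_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/*
 * Hypothetical scanner: returns the number of complete lead+trail pairs and
 * stores in *dangling the number of lead bytes without a valid trail byte.
 */
size_t count_sjis_pairs(const unsigned char *s, size_t len, size_t *dangling)
{
    size_t pairs = 0;
    *dangling = 0;
    for (size_t i = 0; i < len; i++) {
        if (!is_sjis_lead(s[i]))
            continue;                   /* single-byte character */
        if (i + 1 < len && is_sjis_trail(s[i + 1])) {
            pairs++;
            i++;                        /* skip the trail byte */
        } else {
            (*dangling)++;              /* lead byte with nothing valid after it */
        }
    }
    return pairs;
}
```

For '0xe3 0x81 0x82' the scanner finds one pair ('0xe3 0x81') and one dangling
lead byte ('0x82'), which matches the one wrong character followed by '?' in
the Shift JIS view above.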

Did you understand Mojibake?

Yoshida Tetsuya

2012/2/6 Carl Eugen Hoyos <cehoyos at ag.or.at>

> tetu <tetu <at> eth0.jp> writes:
>
> > In Japan, ID3 tags in a lot of mp3 files are written in Shift JIS but
> > marked as ISO-8859-1.
> > So when libavformat reads ISO-8859-1 ID3 tags as UTF-8, it is unable to
> > read the correct string.
>
> Please provide a sample.
>
> Carl Eugen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sjis.mp3
Type: audio/mpeg
Size: 2433 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120207/450dcf53/attachment.mp3>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mojibake.c
Type: text/x-csrc
Size: 547 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120207/450dcf53/attachment.bin>

