#3363(ffprobe:new): ffprobe silently drops non-ASCII metadata in VQF files
#3363: ffprobe silently drops non-ASCII metadata in VQF files
---------------------------------+---------------------------------------
             Reporter:  trejkaz  |                     Type:  defect
               Status:  new      |                 Priority:  normal
            Component:  ffprobe  |                  Version:  unspecified
             Keywords:           |               Blocked By:
             Blocking:           |  Reproduced by developer:  0
Analyzed by developer:  0        |
---------------------------------+---------------------------------------
 Summary of the bug:

 How to reproduce:
 {{{
 % ffprobe -show_format -show_streams -print_format json test.vqf
 % ffprobe -version
 ffprobe version N-60503-g28975cb-tessus
 built on Jan 28 2014 18:43:59 with llvm-gcc 4.2.1 (LLVM build 2336.1.00)
 configuration: --prefix=/Users/tessus/data/ext/ffmpeg/sw --as=yasm
 --extra-version=tessus --disable-shared --enable-static --disable-ffplay
 --enable-gpl --enable-pthreads --enable-postproc --enable-libmp3lame
 --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid
 --enable-libspeex --enable-bzlib --enable-zlib --enable-libopencore-amrnb
 --enable-libopencore-amrwb --enable-libxavs --enable-version3
 --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvpx
 --enable-libgsm --enable-libopus --enable-libmodplug --enable-fontconfig
 --enable-libfreetype --enable-libass --enable-libbluray --enable-filters
 --enable-runtime-cpudetect
 libavutil      52. 63.100 / 52. 63.100
 libavcodec     55. 49.101 / 55. 49.101
 libavformat    55. 28.100 / 55. 28.100
 libavdevice    55.  7.100 / 55.  7.100
 libavfilter     4.  1.101 /  4.  1.101
 libswscale      2.  5.101 /  2.  5.101
 libswresample   0. 17.104 /  0. 17.104
 libpostproc    52.  3.100 / 52.  3.100
 }}}

 [json @ 0x103000000] 1 invalid UTF-8 sequence(s) found in string
 'Bl?mchen', replaced with ''

 The value ffprobe emits is "Blchen". The value it emitted before fixing
 #2502 was "Bl�mchen" (invalid character intentional), which, although it
 contained an invalid character, at least retained all the valid
 characters. The current builds drop the "m" as well as the invalid
 character.
 The value I would like to see, however, is "Blümchen".

 If the issue is that the VQF module is doing something wrong to convert
 to Unicode, it would be good to get that fixed.

 If the issue is that VQF is one of those legacy formats where the
 encoding isn't known, would it be possible to have some way to specify
 the system encoding? I can't just change the encoding of the entire
 system, because doing that in a cross-platform way is not really
 practical.

 There is a sample exhibiting the issue in the mplayer samples:
 http://samples.mplayerhq.hu/vqf/handinha.vqf

--
Ticket URL: <https://trac.ffmpeg.org/ticket/3363>
FFmpeg <http://ffmpeg.org>
FFmpeg issue tracker
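
To illustrate the request for a configurable encoding, here is a minimal Python sketch of decoding a tag with a caller-supplied fallback. It assumes the author tag in the sample is stored as the ISO-8859-1 bytes `Bl\xfcmchen`, which the ticket does not confirm. The function `decode_metadata` and its `fallback_encoding` parameter are hypothetical names for illustration, not an existing ffprobe or libavformat option.

```python
def decode_metadata(raw: bytes, fallback_encoding: str = "iso-8859-1") -> str:
    """Decode a metadata value, trying UTF-8 first.

    If the bytes are not valid UTF-8, fall back to the encoding the
    caller chose (hypothetical parameter; ISO-8859-1 is only a guess).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback_encoding)


# Assumed raw bytes of the author tag: 0xFC is "ü" in ISO-8859-1.
author = decode_metadata(b"Bl\xfcmchen")
print(author)
```

Trying UTF-8 first leaves correctly encoded files untouched, so a fallback of this kind would only affect values that currently trigger the "invalid UTF-8 sequence" warning.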
Comment (by cehoyos):

 Is this not reproducible with {{{ffmpeg}}}?

--
Ticket URL: <https://trac.ffmpeg.org/ticket/3363#comment:1>
Comment (by trejkaz):

 {{{ffmpeg}}} outputs:

 {{{
 Input #0, vqf, from '/Users/trejkaz/Downloads/handinha.vqf':
   Metadata:
     title           : Hand in Hand (Gewalt ist doof!)
     comment         : http://bluemchen.koti.com.pl
     copyright       : Edel Records GmbH
     filename        : handinha.vqf
     author          : Bl?mchen
     size            : 300441
 }}}

 So it hasn't lost the character, but it has still mangled it.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/3363#comment:2>
Comment (by saste):

 Replying to [comment:2 trejkaz]:
 > {{{ffmpeg}}} outputs:
 >
 > {{{
 > Input #0, vqf, from '/Users/trejkaz/Downloads/handinha.vqf':
 >   Metadata:
 >     title           : Hand in Hand (Gewalt ist doof!)
 >     comment         : http://bluemchen.koti.com.pl
 >     copyright       : Edel Records GmbH
 >     filename        : handinha.vqf
 >     author          : Bl?mchen
 >     size            : 300441
 > }}}
 >
 > So it hasn't lost the character, but it has still mangled it.

 This depends on our UTF-8 decoding mechanism. The invalid byte (shown as
 '?') and the character that follows it are read together as one
 multi-byte UTF-8 sequence. Since that sequence is invalid, both are
 consumed and replaced as a single unit.

 We could add a new flag for lazy decoding (when a whole sequence is
 invalid, resume decoding from its second character, which seems to be
 what the terminal does), or allow the text encoding to be set.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/3363#comment:3>
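
The difference between the current behaviour and the proposed lazy decoding can be sketched in Python. This is an illustrative model only, not FFmpeg's decoder: the function names and the `lazy` parameter are made up for the example, the input bytes `Bl\xfcmchen` are an assumption about the sample file, and the sketch does not reject overlong forms or surrogates. It accepts the historical lead bytes up to 0xFD so that 0xFC announces a multi-byte sequence, as the comment above describes.

```python
def expected_length(lead: int) -> int:
    """Sequence length announced by a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if lead & 0xC0 == 0x80 or lead >= 0xFE:
        return 0
    n, mask = 0, 0x80
    while lead & mask:  # count the leading 1 bits
        n += 1
        mask >>= 1
    return n


def decode(data: bytes, lazy: bool, replacement: str = "") -> str:
    """Decode UTF-8, substituting `replacement` for each invalid sequence.

    lazy=False: every byte read while attempting the sequence is discarded.
    lazy=True:  only the lead byte is discarded; decoding resumes right after it.
    """
    out, i = [], 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 1:
            out.append(chr(data[i]))
            i += 1
            continue
        if n == 0:  # stray continuation byte or 0xFE/0xFF
            out.append(replacement)
            i += 1
            continue
        code = data[i] & (0x7F >> n)  # payload bits of the lead byte
        j, ok = i + 1, True
        while j < i + n:
            if j >= len(data):  # truncated sequence
                ok = False
                break
            byte = data[j]
            j += 1  # the byte is consumed before it is validated
            if byte & 0xC0 != 0x80:
                ok = False
                break
            code = (code << 6) | (byte & 0x3F)
        if ok and code <= 0x10FFFF:
            out.append(chr(code))
            i = j
        else:
            out.append(replacement)
            i = i + 1 if lazy else j
    return "".join(out)


raw = b"Bl\xfcmchen"  # assumed bytes of the author tag
print(decode(raw, lazy=False))                       # Blchen
print(decode(raw, lazy=True, replacement="\ufffd"))  # Bl\ufffdmchen
```

In the non-lazy mode the "m" is read as a would-be continuation byte and discarded together with 0xFC, which matches the "Blchen" output in the report. In lazy mode only 0xFC is replaced, so the remaining valid characters survive.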