[FFmpeg-trac] #2502(FFprobe:open): ffprobe Produces Invalid JSON

FFmpeg trac at avcodec.org
Fri Nov 8 17:54:16 CET 2013

#2502: ffprobe Produces Invalid JSON
             Reporter:  dnicolson    |                    Owner:
                 Type:  defect       |                   Status:  open
             Priority:  normal       |                Component:  FFprobe
              Version:  unspecified  |               Resolution:
             Keywords:  utf8         |               Blocked By:
             Blocking:               |  Reproduced by developer:  1
Analyzed by developer:  1            |
Changes (by saste):

 * analyzed:  0 => 1
 * keywords:   => utf8
 * status:  new => open
 * reproduced:  0 => 1


 Replying to [comment:14 dnicolson]:
 > I have made a reduced case and attached a file (test-pattern.avi), as
 > I created an AVI file with ffmpeg using the following command:
 > ffmpeg -i test-pattern-orig.avi -metadata title="æ" -metadata
 artist="`echo -e \"\xe6\"`" -vcodec copy -acodec copy test-pattern.avi
 > (backticks need to be added around the monospaced text).

 > This creates the file test-pattern.avi with the title as a UTF-8 encoded
 lowercase AE and the artist as a ISO-8859-1 encoded lowercase AE. VLC
 displays metadata in ISO-8859-1 so the artist is correctly displayed as
 "æ" but displays the title as "æ".

 AE in ISO8859-1 = 0xE6
 AE in UTF-8     = 0xC386

 As a consequence, AE encoded in UTF-8 will render in IS08859-1 as two
 distinct characters, and ISO8859-1 AE will not correspond to a valid UTF-8

 Now the problem is to understand what's the reference encoding. FFmpeg
 always assumes UTF-8, so you should provide metadata encoded in UTF-8
 format. Note that your command is broken since you're explicitly passing
 an invalid UTF-8 sequence to the metadata option (which expects UTF-8

 Currently there is no way to specify (nor autodetect) the assumed

 > Because ffprobe assumes all valid UTF-8 in the metadata, the following
 command produces invalid JSON:
 > ffprobe -v quiet -print_format json -show_format -show_streams test-
 pattern.avi | python -c 'import json,sys; json.load(sys.stdin)'

 > A possible solution would be to strip invalid UTF-8 characters, or maybe
 provide an alternate switch to replace invalid characters?

 Implemented in an experimental patchset, see ticket #1163.

Ticket URL: <https://ffmpeg.org/trac/ffmpeg/ticket/2502#comment:16>
FFmpeg <http://ffmpeg.org>
FFmpeg issue tracker

More information about the FFmpeg-trac mailing list