[FFmpeg-trac] #2502(FFprobe:open): ffprobe Produces Invalid JSON

FFmpeg trac at avcodec.org
Fri Nov 8 17:54:16 CET 2013


#2502: ffprobe Produces Invalid JSON
-------------------------------------+-----------------------------------
             Reporter:  dnicolson    |                    Owner:
                 Type:  defect       |                   Status:  open
             Priority:  normal       |                Component:  FFprobe
              Version:  unspecified  |               Resolution:
             Keywords:  utf8         |               Blocked By:
             Blocking:               |  Reproduced by developer:  1
Analyzed by developer:  1            |
-------------------------------------+-----------------------------------
Changes (by saste):

 * analyzed:  0 => 1
 * keywords:   => utf8
 * status:  new => open
 * reproduced:  0 => 1


Comment:

 Replying to [comment:14 dnicolson]:
 > I have made a reduced case and attached a file (test-pattern.avi), as
 requested.
 >
 > I created an AVI file with ffmpeg using the following command:
 >
 > ffmpeg -i test-pattern-orig.avi -metadata title="æ" -metadata artist="`echo -e \"\xe6\"`" -vcodec copy -acodec copy test-pattern.avi
 > (backticks need to be added around the monospaced text).
 >

 > This creates the file test-pattern.avi with the title as a UTF-8 encoded
 lowercase AE and the artist as an ISO-8859-1 encoded lowercase AE. VLC
 displays metadata in ISO-8859-1, so the artist is correctly displayed as
 "æ" but the title is displayed as "Ã¦".

 AE in ISO-8859-1 = 0xE6
 AE in UTF-8      = 0xC3 0xA6

 As a consequence, AE encoded in UTF-8 will render in ISO-8859-1 as two
 distinct characters, and the ISO-8859-1 AE will not correspond to a valid
 UTF-8 sequence.
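
 The byte-level difference can be checked with a few lines of Python. This
 is only an illustrative sketch; the helper names hex_bytes and
 is_valid_utf8 are made up for this example and are not part of FFmpeg.

```python
# Lowercase "ae" ligature, U+00E6.
AE = "\u00e6"

latin1_bytes = AE.encode("iso-8859-1")  # one byte
utf8_bytes = AE.encode("utf-8")         # two bytes


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated hex values (hypothetical helper)."""
    return " ".join(f"0x{b:02X}" for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(hex_bytes(latin1_bytes))  # 0xE6
print(hex_bytes(utf8_bytes))    # 0xC3 0xA6

# Reading the UTF-8 bytes as ISO-8859-1 yields two separate characters.
mojibake = utf8_bytes.decode("iso-8859-1")
print(len(mojibake), ascii(mojibake))

# The lone ISO-8859-1 byte is not a valid UTF-8 sequence.
print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(utf8_bytes))
```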

 The question now is which encoding should be treated as the reference.
 FFmpeg always assumes UTF-8, so metadata should be provided encoded in
 UTF-8. Note that your command is broken, since you're explicitly passing
 an invalid UTF-8 sequence to the metadata option (which expects UTF-8
 data).

 Currently there is no way to specify (or autodetect) the assumed
 encoding.
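
 If the original encoding of a tag is known, one way to avoid passing an
 invalid sequence is to convert the value to text before building the
 command. The following is a rough sketch, not something FFmpeg provides:
 build_ffmpeg_command is a hypothetical helper, and it assumes the caller
 knows the raw bytes are ISO-8859-1. The ffmpeg options are the ones from
 the command quoted above. On a system with a UTF-8 locale, Python would
 then hand the resulting strings to the process as UTF-8.

```python
def build_ffmpeg_command(src, dst, raw_tags, source_encoding="iso-8859-1"):
    """Build an ffmpeg argument list with metadata values decoded to text.

    raw_tags maps tag names to raw bytes. source_encoding is an assumption
    the caller must supply; nothing here detects it.
    """
    cmd = ["ffmpeg", "-i", src]
    for key, raw in raw_tags.items():
        value = raw.decode(source_encoding)  # bytes -> str
        cmd += ["-metadata", f"{key}={value}"]
    cmd += ["-vcodec", "copy", "-acodec", "copy", dst]
    return cmd


cmd = build_ffmpeg_command(
    "test-pattern-orig.avi",
    "test-pattern.avi",
    {"artist": b"\xe6"},  # ISO-8859-1 lowercase AE
)

# Encoded as UTF-8, the artist value is now the valid two-byte sequence.
print(cmd[4].encode("utf-8"))
```

 The list can be passed to subprocess.run(cmd) as is; it is not executed
 here so that the example does not depend on ffmpeg being installed.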

 > Because ffprobe assumes all metadata is valid UTF-8, the following
 command produces invalid JSON:
 >
 > ffprobe -v quiet -print_format json -show_format -show_streams test-pattern.avi | python -c 'import json,sys; json.load(sys.stdin)'
 >

 > A possible solution would be to strip invalid UTF-8 characters, or maybe
 provide an alternate switch to replace invalid characters?

 Implemented in an experimental patchset, see ticket #1163.
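
 Until that is available, a consumer can apply the same strip-or-replace
 idea on its own side. The sketch below is a client-side workaround only
 and does not show what the patchset does. load_lenient_json is a
 hypothetical helper, and the sample bytes are a hand-written stand-in for
 ffprobe output, not real output.

```python
import json


def load_lenient_json(raw: bytes, errors: str = "replace"):
    """Decode raw bytes as UTF-8 using the given error handler, then parse.

    errors="replace" substitutes U+FFFD for each invalid byte,
    errors="ignore" drops invalid bytes, errors="strict" raises.
    """
    text = raw.decode("utf-8", errors=errors)
    return json.loads(text)


# Stand-in payload: title is valid UTF-8, artist is a lone ISO-8859-1 byte.
sample = b'{"format": {"tags": {"title": "\xc3\xa6", "artist": "\xe6"}}}'

replaced = load_lenient_json(sample, errors="replace")
stripped = load_lenient_json(sample, errors="ignore")

print(ascii(replaced["format"]["tags"]["artist"]))  # '\ufffd'
print(ascii(stripped["format"]["tags"]["artist"]))  # ''

# Strict decoding fails on the invalid byte, as in the report above.
try:
    load_lenient_json(sample, errors="strict")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)
```

 In practice the bytes would come from running ffprobe with its output
 captured in binary mode, for example through subprocess.run with
 capture_output=True.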

-- 
Ticket URL: <https://ffmpeg.org/trac/ffmpeg/ticket/2502#comment:16>
FFmpeg <http://ffmpeg.org>
FFmpeg issue tracker