[FFmpeg-trac] #10291(ffmpeg:new): FFmpeg removes IETF BCP-47 language tags from MKV files during remuxing or encoding

FFmpeg trac at avcodec.org
Thu Mar 30 05:26:00 EEST 2023


#10291: FFmpeg removes IETF BCP-47 language tags from MKV files during remuxing or
encoding
--------------------------------+--------------------------------------
             Reporter:  ptr727  |                     Type:  defect
               Status:  new     |                 Priority:  normal
            Component:  ffmpeg  |                  Version:  git-master
             Keywords:  mkv     |               Blocked By:
             Blocking:          |  Reproduced by developer:  0
Analyzed by developer:  0       |
--------------------------------+--------------------------------------
 **Summary:**
 When FFmpeg creates MKV files from MKV files, the LanguageIETF tags from
 the original file are not written, and the language granularity is lost.

 For reference see:
 - https://datatracker.ietf.org/doc/draft-ietf-cellar-matroska/
 - https://gitlab.com/mbunkus/mkvtoolnix/-/wikis/Languages-in-Matroska-and-MKVToolNix
 - https://github.com/ietf-wg-cellar/matroska-specification/blob/master/ebml_matroska.xml#L434
 - https://en.wikipedia.org/wiki/IETF_language_tag
 - https://r12a.github.io/app-subtags/


 Create a one-minute snippet from an MKV file that contains IETF BCP-47
 tags:

 {{{
 mkvmerge --split parts:00:00:00-00:01:00 --output MKV-IETF-Snippet.mkv MKV-IETF.mkv
 }}}


 Use mkvmerge to print a JSON description of the MKV contents:

 {{{
 mkvmerge --identify MKV-IETF-Snippet.mkv --identification-format json
 }}}

 Note the presence of language and language_ietf tags in the file:

 {{{
 "language": "srp"
 "language_ietf": "sr-Latn-RS"
 }}}
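
 As a minimal sketch, the two properties can be pulled out of the mkvmerge
 JSON per track. It assumes the identification output has a top-level
 "tracks" list whose entries carry a "properties" object; the helper name
 track_languages and the sample data are hypothetical and only mirror the
 excerpt above.

```python
import json


def track_languages(identification):
    """Return (track id, language, language_ietf) for each track.

    `identification` is the parsed JSON object from mkvmerge.
    Properties that are absent are reported as None.
    """
    rows = []
    for track in identification.get("tracks", []):
        props = track.get("properties", {})
        rows.append(
            (track.get("id"), props.get("language"), props.get("language_ietf"))
        )
    return rows


# Illustrative sample shaped like the excerpt above, not real tool output.
sample = json.loads(
    '{"tracks": [{"id": 0, "properties":'
    ' {"language": "srp", "language_ietf": "sr-Latn-RS"}}]}'
)

for track_id, language, language_ietf in track_languages(sample):
    print(track_id, language, language_ietf)
```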

 Similar output can be produced using MediaInfo and ffprobe:

 {{{
 mediainfo --Output=XML MKV-IETF-Snippet.mkv
 <Language>sr-Latn-RS</Language>
 }}}

 {{{
 ffprobe -loglevel quiet -show_streams -show_format -print_format json MKV-IETF-Snippet.mkv
 "language": "srp"
 }}}

 Note that ffprobe reports only the three-letter ISO 639-2 language code and
 ignores the IETF BCP-47 tag.
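
 A rough sketch of reading the per-stream language from the ffprobe JSON
 follows. It assumes each entry in "streams" may carry a "tags" object with
 a "language" key, as in the excerpt above; the helper name stream_languages
 and the sample data are hypothetical.

```python
import json


def stream_languages(probe):
    """Map each stream index to its "language" tag, or None if absent."""
    return {
        stream.get("index"): stream.get("tags", {}).get("language")
        for stream in probe.get("streams", [])
    }


# Illustrative sample shaped like the ffprobe excerpt, not real tool output.
sample = json.loads('{"streams": [{"index": 0, "tags": {"language": "srp"}}]}')

print(stream_languages(sample))
```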

 Remux the file using FFmpeg:

 {{{
 ffmpeg -i MKV-IETF-Snippet.mkv -map 0 -codec copy -f matroska MKV-IETF-Snippet-FfMpeg.mkv
 }}}

 Repeat the steps above to get the MKV tag information, and note that the
 IETF language tags have been stripped from the output file.

 {{{
 "language": "srp"
 }}}

 The detailed language "sr-Latn-RS" has been reduced to "srp", losing the
 script and region information.
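
 To make the loss concrete, the sketch below splits a tag into its subtags.
 It is a deliberately simplified pattern (language, optional script,
 optional region) and not a full RFC 5646 parser; the name split_tag is
 hypothetical.

```python
import re

# Simplified shape: language[-script][-region].
# Variants, extensions and private-use subtags are NOT handled.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Split a simple language tag into language, script and region."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported tag: {tag!r}")
    return match.groupdict()


print(split_tag("sr-Latn-RS"))
# {'language': 'sr', 'script': 'Latn', 'region': 'RS'}
```

 A bare three-letter code such as "srp" yields only a language; the script
 ("Latn") and region ("RS") have no counterpart in it.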

 Observed behavior: ffmpeg strips IETF language tags from files.
 Expected behavior: ffmpeg retains IETF tags (or all Matroska tags even if
 not interpreted) from the source file.
 Nice-to-have behavior: ffprobe emits IETF language tags.
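
 The observed and expected behavior could be checked with something like
 the sketch below, which compares the mkvmerge identification JSON of the
 source and the remuxed file. It assumes track IDs are the same in both
 files and that "language_ietf" sits under each track's "properties"; the
 name lost_ietf_tags and the sample data are hypothetical.

```python
def lost_ietf_tags(source, output):
    """Return (track id, source tag) pairs whose language_ietf is
    missing or different in the output identification."""

    def ietf_by_id(identification):
        return {
            track.get("id"): track.get("properties", {}).get("language_ietf")
            for track in identification.get("tracks", [])
        }

    before = ietf_by_id(source)
    after = ietf_by_id(output)
    return [
        (track_id, tag)
        for track_id, tag in before.items()
        if tag is not None and after.get(track_id) != tag
    ]


# Illustrative data mirroring the excerpts in this report, not real output.
source = {
    "tracks": [
        {"id": 0, "properties": {"language": "srp", "language_ietf": "sr-Latn-RS"}}
    ]
}
remuxed = {"tracks": [{"id": 0, "properties": {"language": "srp"}}]}

print(lost_ietf_tags(source, remuxed))
```

 An empty list would correspond to the expected behavior.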
-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/10291>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

