[FFmpeg-trac] #10291(ffmpeg:new): FFmpeg removes IETF BCP-47 language tags from MKV files during remuxing or encoding
FFmpeg
trac at avcodec.org
Thu Mar 30 05:26:00 EEST 2023
#10291: FFmpeg removes IETF BCP-47 language tags from MKV files during remuxing or
encoding
--------------------------------+--------------------------------------
Reporter: ptr727                | Type: defect
Status: new                     | Priority: normal
Component: ffmpeg               | Version: git-master
Keywords: mkv                   | Blocked By:
Blocking:                       | Reproduced by developer: 0
Analyzed by developer: 0        |
--------------------------------+--------------------------------------
**Summary:**
When FFmpeg creates MKV files from MKV files, the LanguageIETF tags from
the original file are not written, and the language granularity is lost.
For reference see:
- https://datatracker.ietf.org/doc/draft-ietf-cellar-matroska/
- https://gitlab.com/mbunkus/mkvtoolnix/-/wikis/Languages-in-Matroska-and-MKVToolNix
- https://github.com/ietf-wg-cellar/matroska-specification/blob/master/ebml_matroska.xml#L434
- https://en.wikipedia.org/wiki/IETF_language_tag
- https://r12a.github.io/app-subtags/
Create a media file snippet from an MKV that contains IETF BCP-47 tags:
{{{
mkvmerge --split parts:00:00:00-00:01:00 --output MKV-IETF-Snippet.mkv MKV-IETF.mkv
}}}
Use MkvMerge to create a JSON file describing the MKV contents:
{{{
mkvmerge --identify MKV-IETF-Snippet.mkv --identification-format json
}}}
Note the presence of language and language_ietf tags in the file:
{{{
"language": "srp"
"language_ietf": "sr-Latn-RS"
}}}
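To list only the language fields from that JSON, something like the following rough sketch may help. The `lang_tags` helper is hypothetical (it is not part of MKVToolNix), and it assumes the pretty-printed, one-key-per-line layout that mkvmerge emits; a real JSON parser would be more robust.

```shell
# Hypothetical helper: print every "language" / "language_ietf" entry
# found in mkvmerge identification JSON read from stdin, as key=value.
# Assumes one key per line, as in the pretty-printed output quoted above.
lang_tags() {
    grep -E '"language(_ietf)?"[[:space:]]*:' |
        sed -E 's/^[[:space:]]*"([a-z_]+)"[[:space:]]*:[[:space:]]*"([^"]*)".*$/\1=\2/'
}

# Usage (file name taken from this report):
# mkvmerge --identify MKV-IETF-Snippet.mkv --identification-format json | lang_tags
```

For the snippet file this should print one `language=` line and one `language_ietf=` line per track that carries both fields.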
Similar output can be produced using MediaInfo and FfProbe:
{{{
mediainfo --Output=XML MKV-IETF-Snippet.mkv
<Language>sr-Latn-RS</Language>
}}}
{{{
ffprobe -loglevel quiet -show_streams -show_format -print_format json MKV-IETF-Snippet.mkv
"language": "srp"
}}}
Note that FfProbe reports only the ISO 639-2 language code, and ignores the
IETF BCP-47 tag.
Remux the file using FfMpeg:
{{{
ffmpeg -i MKV-IETF-Snippet.mkv -map 0 -codec copy -f matroska MKV-IETF-Snippet-FfMpeg.mkv
}}}
Repeat the steps above to get the MKV tag information, and note that the
IETF language tags have been stripped from the output file.
{{{
"language": "srp"
}}}
The "sr-Latn-RS" detailed language has been reduced to "srp", losing the
script and regional specifics.
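The before/after comparison can be automated with a small check along these lines. This is only a sketch: `count_ietf` and `ietf_tags_lost` are hypothetical helper names, and `src.json` / `out.json` are assumed to hold the mkvmerge identification output of the source and the remuxed file.

```shell
# Hypothetical helper: count "language_ietf" entries in mkvmerge JSON on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_ietf() {
    grep -c '"language_ietf"[[:space:]]*:' || true
}

# Hypothetical helper: succeeds (exit 0) when the source JSON carries IETF
# language tags that the output JSON lacks.
# $1 = identification JSON of the source, $2 = identification JSON of the output.
ietf_tags_lost() {
    src_count=$(count_ietf < "$1")
    out_count=$(count_ietf < "$2")
    [ "$src_count" -gt 0 ] && [ "$out_count" -lt "$src_count" ]
}

# Usage (file names taken from this report):
# mkvmerge --identify MKV-IETF-Snippet.mkv --identification-format json > src.json
# mkvmerge --identify MKV-IETF-Snippet-FfMpeg.mkv --identification-format json > out.json
# ietf_tags_lost src.json out.json && echo "IETF language tags were dropped"
```

The check only counts entries, so it would not notice a tag whose value changed; it is meant as a quick regression test for the stripping described here.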
Observed behavior: ffmpeg strips IETF language tags from files.
Expected behavior: ffmpeg retains IETF tags (or all Matroska tags even if
not interpreted) from the source file.
Nice to have behavior: FfProbe emits IETF language tags.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/10291>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker