[FFmpeg-devel] [PATCH] asf - read/write metadata as UTF-16

Michael Niedermayer michaelni
Mon Feb 22 20:18:53 CET 2010


On Mon, Feb 22, 2010 at 07:42:18PM +0100, Anton Khirnov wrote:
> On Mon, Feb 22, 2010 at 05:24:52PM +0100, Michael Niedermayer wrote:
> > 
> > id say validating the stuff should happen somewhere else, how did
> > invalid values reach that point to begin with?
> > also your error checking in utf16 is highly incomple, missing checks
> > for surrogate pairs at least
> > 
> ok, removed

> > 
> > are you sure its unicode characters and not bytes/2
> > (needs surrogate pairs to seperate ...)
> > 
> specs say "unicode characters", tests with wmp show it's actually
> bytes/2. updated comment to reflect that.

well done M$ !!!


> > 
> > what is it now? characters? if so 2*len is not enough not all characters
> > can be stored in 2 bytes.
> > 
> 2*(length in bytes of UTF-16 input) >= length in bytes of UTF-8 output
> > [...]
> > > @@ -157,12 +160,12 @@ static void get_tag(AVFormatContext *s, const char *key, int type, int len)
> > >      if ((unsigned)len >= UINT_MAX)
> > >          return;
> > >  
> > > -    value = av_malloc(len+1);
> > > +    value = av_malloc(2*len+1);
> > 
> > integer overflow
> > 
> fixed
> 
> Anton Khirnov

>  common.h |   30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 4178cd774dd8fbf4a19e0a0fd656f52cb5df1fa5  0001-Add-PUT_UTF16-macro.patch
> From c96f2a7dbfbf92c87d2fd71246faf3a8135389de Mon Sep 17 00:00:00 2001
> From: Anton Khirnov <wyskas at gmail.com>
> Date: Sun, 21 Feb 2010 15:20:23 +0100
> Subject: [PATCH 1/6] Add PUT_UTF16 macro.
> 
> ---
>  libavutil/common.h |   30 ++++++++++++++++++++++++++++++
>  1 files changed, 30 insertions(+), 0 deletions(-)
> 
> diff --git a/libavutil/common.h b/libavutil/common.h
> index c91e658..c8c4345 100644
> --- a/libavutil/common.h
> +++ b/libavutil/common.h
> @@ -335,6 +335,36 @@ static inline av_const int av_ceil_log2(int x)
>          }\
>      }
>  
> +/*!
> + * \def PUT_UTF16(val, tmp, PUT_16BIT)
> + * Converts a 32-bit Unicode character to its UTF-16 encoded form (2 or 4 bytes).
> + * \param val is an input-only argument and should be of type uint32_t. It holds
> + * a UCS-4 encoded Unicode character that is to be converted to UTF-16. If
> + * val is given as a function it is executed only once.
> + * \param tmp is a temporary variable and should be of type uint16_t. It
> + * represents an intermediate value during conversion that is to be
> + * output by PUT_16BIT.
> + * \param PUT_16BIT writes the converted UTF-16 data to any proper destination
> + * in desired endianness. It could be a function or a statement, and uses tmp
> + * as the input byte.  For example, PUT_BYTE could be "*output++ = tmp;"
> + * PUT_BYTE will be executed 1 or 2 times depending on input character.
> + */
> +#define PUT_UTF16(val, tmp, PUT_16BIT, ERROR)\
                                          ^^^^^
unused


> +    {\
> +        uint32_t in = val;\
> +        if (in < 0x10000) {\
> +            tmp = in;\
> +            PUT_16BIT\
> +        } else if ( in <= 0x10FFFFU ) {\

else


> +            tmp = 0xD800 | ((in - 0x10000) >> 10);\
> +            PUT_16BIT\
> +            tmp = 0xDC00 | ((in - 0x10000) & 0x3FF);\
> +            PUT_16BIT\
> +        }\
> +    }\
> +
> +
> +
>  #include "mem.h"
>  
>  #ifdef HAVE_AV_CONFIG_H
> -- 
> 1.6.6.1
> 

>  asfenc.c |   14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 618bc87585441e7d67c62d1b850655db6003db6a  0002-asfenc-eliminate-put_str16.patch
> From 72d8bdcc8e8aa861115b6450cc9fd909d7dcb534 Mon Sep 17 00:00:00 2001
> From: Anton Khirnov <wyskas at gmail.com>
> Date: Mon, 22 Feb 2010 14:35:23 +0100
> Subject: [PATCH 2/6] asfenc: eliminate put_str16, it's confusing and only used in one place.
> 
> ---
>  libavformat/asfenc.c |   14 +++++---------
>  1 files changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/libavformat/asfenc.c b/libavformat/asfenc.c
> index cea7180..b24baaf 100644
> --- a/libavformat/asfenc.c
> +++ b/libavformat/asfenc.c
> @@ -203,13 +203,6 @@ static void put_guid(ByteIOContext *s, const ff_asf_guid *g)
>      put_buffer(s, *g, sizeof(*g));
>  }
>  
> -static void put_str16_nolen(ByteIOContext *s, const char *tag);
> -static void put_str16(ByteIOContext *s, const char *tag)
> -{
> -    put_le16(s,strlen(tag) + 1);
> -    put_str16_nolen(s, tag);
> -}
> -
>  static void put_str16_nolen(ByteIOContext *s, const char *tag)
>  {
>      int c;
> @@ -449,6 +442,7 @@ static int asf_write_header1(AVFormatContext *s, int64_t file_size, int64_t data
>      put_le32(pb, s->nb_streams);
>      for(n=0;n<s->nb_streams;n++) {
>          AVCodec *p;
> +        const char *desc;
>  
>          enc = s->streams[n]->codec;
>          p = avcodec_find_encoder(enc->codec_id);

> @@ -461,9 +455,11 @@ static int asf_write_header1(AVFormatContext *s, int64_t file_size, int64_t data
>              put_le16(pb, -1);
>  
>          if(enc->codec_id == CODEC_ID_WMAV2)
> -            put_str16(pb, "Windows Media Audio V8");
> +            desc = "Windows Media Audio V8";
>          else
> -            put_str16(pb, p ? p->name : enc->codec_name);
> +            desc = p ? p->name : enc->codec_name;
> +        put_le16(pb, strlen(desc) + 1); // "number of characters" = length in bytes / 2

the comment and the code dont match


> +        put_str16_nolen(pb, desc);
>          put_le16(pb, 0); /* no parameters */
>  
>  
> -- 
> 1.6.6.1
> 

>  asf.c    |    6 +++---
>  asfenc.c |    5 +----
>  2 files changed, 4 insertions(+), 7 deletions(-)
> f03f2787835c161e424c5fce8b0d87c0aa8a9bd5  0003-asf-don-t-add-WM-prefix-to-all-tags.patch
> From 416d415f0c4c3550f154619908a642c787285f90 Mon Sep 17 00:00:00 2001
> From: Anton Khirnov <wyskas at gmail.com>
> Date: Sun, 21 Feb 2010 17:17:06 +0100
> Subject: [PATCH 3/6] asf: don't add WM/ prefix to all tags, it's only required for some.
> 
> ---
>  libavformat/asf.c    |    6 +++---
>  libavformat/asfenc.c |    5 +----
>  2 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/libavformat/asf.c b/libavformat/asf.c
> index e8b37d4..484a58d 100644
> --- a/libavformat/asf.c
> +++ b/libavformat/asf.c
> @@ -129,10 +129,10 @@ const ff_asf_guid ff_asf_digital_signature = {
>  };
>  
>  const AVMetadataConv ff_asf_metadata_conv[] = {
> -    { "AlbumArtist", "album_artist"},
> -    { "AlbumTitle" , "album"     },
> +    { "WM/AlbumArtist", "album_artist"},
> +    { "WM/AlbumTitle" , "album"     },
>      { "Author"     , "artist"    },
> -    { "TrackNumber", "track"     },
> +    { "WM/TrackNumber", "track"     },
>  //  { "Year"       , "date"      }, TODO: conversion year<->date
>      { 0 }
>  };
> diff --git a/libavformat/asfenc.c b/libavformat/asfenc.c
> index b24baaf..5f584a2 100644
> --- a/libavformat/asfenc.c
> +++ b/libavformat/asfenc.c
> @@ -345,10 +345,7 @@ static int asf_write_header1(AVFormatContext *s, int64_t file_size, int64_t data
>          hpos = put_header(pb, &ff_asf_extended_content_header);
>          put_le16(pb, metadata_count);
>          while ((tag = av_metadata_get(s->metadata, "", tag, AV_METADATA_IGNORE_SUFFIX))) {
> -            put_le16(pb, 2*(strlen(tag->key) + 4));
> -            put_le16(pb, 'W');
> -            put_le16(pb, 'M');
> -            put_le16(pb, '/');
> +            put_le16(pb, 2*(strlen(tag->key) + 1));

thats storing a different length than it did before
no doubt the one before was wrong but this doesnt belong in this patch
and the new looks at least suspicous as thats strlen of utf8


>              put_str16_nolen(pb, tag->key);
>              put_le16(pb, 0);
>              put_le16(pb, 2*(strlen(tag->value) + 1));
> -- 
> 1.6.6.1
> 

>  asfenc.c |   27 ++++++++++++---------------
>  1 file changed, 12 insertions(+), 15 deletions(-)
> 3902328de5c996e93fa2751d8b7fb3b473dbe18b  0004-asfenc-simplify-writing-of-comment-header.patch
> From 9916f5a2ca9d102f91a4831d603c6c90da5a2c11 Mon Sep 17 00:00:00 2001
> From: Anton Khirnov <wyskas at gmail.com>
> Date: Mon, 22 Feb 2010 14:01:00 +0100
> Subject: [PATCH 4/6] asfenc: simplify writing of comment header.

ok if written data does not change

[..]


>  asfenc.c |   76 ++++++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 59 insertions(+), 17 deletions(-)
> 96bb0b5bd069d66852d4ebfa2dbbbc3944effca9  0005-asfenc-write-tags-in-proper-UTF-16.patch
> From bc77adab739ff42ac92544bd59c67baeed7ea33f Mon Sep 17 00:00:00 2001
> From: Anton Khirnov <wyskas at gmail.com>
> Date: Sun, 21 Feb 2010 17:57:17 +0100
> Subject: [PATCH 5/6] asfenc: write tags in proper UTF-16.

i wouldnt normally be asking but as this is M$ design
did you check they use bytes/2 for all fields in wmp? :)
i wouldnt be much surpised if they used that just for some ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100222/47d5d99f/attachment.pgp>



More information about the ffmpeg-devel mailing list