[FFmpeg-devel] [BUG] UTF-8 decoder vulnerable to character spoofing attacks

Michael Niedermayer michaelni
Mon Oct 22 19:15:22 CEST 2007

On Mon, Oct 22, 2007 at 01:00:18PM -0400, Rich Felker wrote:
> On Mon, Oct 22, 2007 at 05:31:18PM +0200, Michael Niedermayer wrote:
> > Hi
> > 
> > On Mon, Oct 22, 2007 at 10:24:41AM -0400, Rich Felker wrote:
> > > The UTF-8 decoder in libavutil has a classic bug where it incorrectly
> > > decodes illegal sequences as aliases for valid characters. For
> > > instance, the sequence "C0 80" will decode to NUL, and the sequence
> > > "C0 AF" will decode to '/'. Aside from possible direct vulnerabilities
> > > (of which there are probably none at present, but I have not checked),
> > 
> > > this can lead to indirect problems by allowing illegal sequences to
> > > get into files generated by ffmpeg, causing problems for other
> > > processes interpreting the files.
> > 
> > well, i dont see how any change to GET_UTF8 could affect that.
> > UTF8 generated by ffmpeg should be valid, UTF8 input into ffmpeg gets
> > stored in the output files without ever being touched by GET_UTF8
> *nod* okay. Actually it would be a good idea to verify it though since
> lots of users have their locales set wrong, etc., but this is a
> separate issue from the bug I reported of course.

yes and a patch is welcome ...

> We still have a bug with GET_UTF8 though because it's used for writing
> data into containers that use UTF-16 or UCS-2/4. I suspect writing a 0
> into the container when "C0 80" is encountered is fairly bad...

and a quick grep seems to indicate that only -f psp is affected
asf assumes ASCII input, and actually i think there was a patch from
someone which added UTF8 or some japanese encoding support to the asf
muxer, dunno i dont remember
also patch welcome

> BTW, does ffmpeg even do anything with strings input from the command
> line to convert them to UTF-8? AFAIK it's generating bogus files
> unless the user's locale is UTF-8-based.

add a check which exit() in ffmpeg.c if the locale is not UTF8 :)

> > > In addition, the code fails to detect illegal sequences beginning with
> > > more than 4 bits equal to 1. I have attached a naive, inefficient
> > > patch for fixing these issues, but someone should really write a
> > > better fix.
> > 
> > i would first like to understand under what circumstances the current code
> > is causing a real problem (security or normal bug)
> It could actually cause crash, e.g. if a string is right at the top of
> the heap and contains nothing but 0xff, then the UTF-8 decoder will
> read 8 bytes and crash.

i think the case of just 0xff (without the terminating 0) at the
top of the heap is quite obscure
if such a invalid sequence can happen why not a the first byte of a
valid one? the later would be undetectable and could crash even the finest
UTF8 decoder

> As I said, it will also incorrectly decode illegal aliases for
> characters rather than signalling error. I don't think this will lead
> to vulns in the current code (since UTF-8 decoder is hardly used
> anyway), but it's bad to have buggy code that could cause problems
> when someone uses it in the future. Even if not, it teaches people who
> read it extremely bad practices.
> > > Also, a few other 'bugs' in the code:
> > [...]
> > > - it's not wrapped in the proper do { ... } while(0) construct to make
> > >   it behave as a single statement.
> > 
> > i wouldnt call that "bug" but rather feature, id like to keep the code
> > readable ...
> If you look at the code, it's already inside { } except for one
> statement at the beginning that could be moved inside. Then it's just

but that would break gcc 2.95 ...


Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Breaking DRM is a little like attempting to break through a door even
though the window is wide open and the only thing in the house is a bunch
of things you dont want and which you would get tomorrow for free anyway
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071022/d0a6af2d/attachment.pgp>

More information about the ffmpeg-devel mailing list