[FFmpeg-devel] [BUG] UTF-8 decoder vulnerable to character spoofing attacks

Rich Felker dalias
Mon Oct 22 16:24:41 CEST 2007

The UTF-8 decoder in libavutil has a classic bug where it incorrectly
decodes illegal sequences as aliases for valid characters. For
instance, the sequence "C0 80" will decode to NUL, and the sequence
"C0 AF" will decode to '/'. Aside from possible direct vulnerabilities
(of which there are probably none at present, but I have not checked),
this can lead to indirect problems by allowing illegal sequences to
get into files generated by ffmpeg, causing problems for other
processes interpreting the files.
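To make the aliasing concrete, here is a minimal sketch of the arithmetic a lax two-byte decode performs. The function name lax_decode2 is made up for illustration and is not libavutil code; it only mirrors the mask-and-shift steps, with no minimum-value check.

```c
#include <stdint.h>

/* Hypothetical helper, not FFmpeg code: decode a two-byte sequence by
 * masking off the payload bits and combining them, without checking
 * whether the result could have been encoded in a single byte. */
static uint32_t lax_decode2(uint8_t lead, uint8_t cont)
{
    return ((uint32_t)(lead & 0x1F) << 6) | (uint32_t)(cont & 0x3F);
}
```

With this, "C0 AF" yields 0x2F ('/') and "C0 80" yields 0, exactly like the one-byte forms, while a legitimate pair such as "C3 A9" yields 0xE9.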

In addition, the code fails to detect illegal sequences whose lead
byte begins with more than 4 bits equal to 1 (F8 through FF). I have
attached a naive, inefficient patch for fixing these issues, but
someone should really write a better fix.
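The two checks can also be written as standalone helpers. This is only a sketch with hypothetical names (utf8_seq_len, utf8_is_overlong); the table of minimum values is the same one the attached patch uses, everything else is an assumption about how one might structure it.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes announced by a lead byte,
 * or -1 if the byte cannot start a sequence. */
static int utf8_seq_len(uint8_t lead)
{
    int ones = 0;

    while (ones < 8 && (lead & (0x80u >> ones)))
        ones++;                 /* count leading 1 bits */
    if (ones == 0)
        return 1;               /* plain ASCII byte */
    if (ones == 1 || ones > 4)
        return -1;              /* stray continuation byte, or F8..FF */
    return ones;
}

/* Hypothetical helper: 1 if val would have fit in fewer than len bytes
 * (len must be in 1..4), i.e. the sequence is an overlong alias. */
static int utf8_is_overlong(uint32_t val, int len)
{
    static const uint32_t min_val[5] = { 0, 0, 0x80, 0x800, 0x10000 };

    return val < min_val[len];
}
```

A decoder would call utf8_seq_len() on the first byte and utf8_is_overlong() once all continuation bytes have been folded in; "C0 AF" decodes to 0x2F with a length of 2 and is rejected because 0x2F is below 0x80.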

Also, a few other 'bugs' in the code:
- it does not reject 'surrogate' code point positions (U+D800 through
  U+DFFF) or noncharacters
- the macro is not wrapped in the proper do { ... } while(0) construct
  to make it behave as a single statement.
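Both points can be sketched together. The names below (is_acceptable_code_point, CHECK_CODE_POINT, classify) are hypothetical and only illustrate the idea; the noncharacter ranges are the ones Unicode defines.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if cp is in range, not a surrogate position
 * and not a noncharacter; 0 otherwise. */
static int is_acceptable_code_point(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return 0;                   /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                   /* UTF-16 surrogate positions */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return 0;                   /* noncharacters U+FDD0..U+FDEF */
    if ((cp & 0xFFFE) == 0xFFFE)
        return 0;                   /* U+xxFFFE and U+xxFFFF */
    return 1;
}

/* do { ... } while (0) makes the expansion exactly one statement, so the
 * macro stays attached to an unbraced if/else at the call site. */
#define CHECK_CODE_POINT(cp, ERROR) \
    do { \
        if (!is_acceptable_code_point(cp)) { \
            ERROR \
        } \
    } while (0)

/* Usage: without the do/while wrapper, the "else" below would not parse. */
static int classify(uint32_t cp)
{
    if (cp != 0)
        CHECK_CODE_POINT(cp, return -1;);
    else
        return 0;
    return 1;
}
```

One caveat with this construct: if a caller passes break or continue as the ERROR action, the statement now applies to the macro's own do/while loop rather than the caller's loop, so such callers would need a goto or return instead.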


P.S. I was going to use the bug tracker but it seems to be
misconfigured and unable to send me a confirmation mail... Also the
certificates are invalid. So for now, it's going to the list.

P.P.S. The code is also fundamentally incorrect if any error recovery
from illegal sequences is desired, since there's no "unget byte" after
an illegal sequence. For instance, "C2 41" will give an illegal
sequence error, but eat the 'A' which is then unrecoverable. The
calling code has no way of knowing whether the last byte that got
eaten was invalid itself (e.g. "C0") or the start of a new
potentially-valid sequence after an incomplete sequence. A string of
all non-ASCII characters will thus be _entirely_ lost if there's a
single incomplete sequence at the beginning.
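For comparison, here is a sketch of a decoder that reads from a buffer and never consumes the byte that broke a sequence, which is one way to get the "unget byte" behaviour. The name decode_resumable and the buffer-plus-position interface are assumptions for illustration, not how GET_UTF8 is used today; overlong and range checks are left out to keep the focus on recovery.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer-based decoder. Returns the decoded value, or -1 on
 * an illegal or incomplete sequence. On an incomplete sequence, *pos is
 * left pointing at the offending byte so the caller can retry from it.
 * The caller is expected to check *pos < len before each call. */
static int32_t decode_resumable(const uint8_t *s, size_t len, size_t *pos)
{
    uint32_t val;
    int ones = 0;

    if (*pos >= len)
        return -1;
    val = s[(*pos)++];
    while (ones < 8 && (val & (0x80u >> ones)))
        ones++;                     /* count leading 1 bits */
    if (ones == 1 || ones > 4)
        return -1;                  /* invalid lead byte: consumed on purpose */
    val &= 0x7Fu >> ones;
    for (int i = 1; i < ones; i++) {
        if (*pos >= len || (s[*pos] & 0xC0) != 0x80)
            return -1;              /* leave s[*pos] for the next call */
        val = (val << 6) | (uint32_t)(s[(*pos)++] & 0x3F);
    }
    return (int32_t)val;
}
```

Fed "C2 41", the first call reports an error with the position at 1, and the second call returns 'A' instead of losing it.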
-------------- next part --------------
--- common.h	Sat Oct  6 12:23:17 2007
+++ common.h.fixed	Mon Oct 22 10:20:57 2007
@@ -236,16 +236,19 @@
 #define GET_UTF8(val, GET_BYTE, ERROR)\
     val= GET_BYTE;\
     {\
-        int ones= 7 - av_log2(val ^ 255);\
-        if(ones==1)\
+        static const int min[5] = { 0, 0, 0x80, 0x800, 0x10000 };\
+        int ones= 7 - av_log2(val ^ 255), i=ones;\
+        if(ones==1 || ones>4)\
             ERROR\
         val&= 127>>ones;\
-        while(--ones > 0){\
+        while(--i > 0){\
             int tmp= GET_BYTE - 128;\
             if(tmp>>6)\
                 ERROR\
             val= (val<<6) + tmp;\
         }\
+        if (val < min[ones])\
+            ERROR\
     }
