[FFmpeg-devel] [PATCH] Support for UTF8 filenames on Windows

Karl Blomster thefluff
Fri Jun 26 16:07:11 CEST 2009


Måns Rullgård wrote:
> Karl Blomster <thefluff at uppcon.com> writes:
> 
>> Ramiro Polla wrote:
>>> Hi,
>>> On Thu, Jun 25, 2009 at 8:59 AM, Michael
>>> Niedermayer<michaelni at gmx.at> wrote:
>>>> On Sat, Jun 20, 2009 at 11:56:37PM +0200, Kalle Blomster wrote:
>>>>> Currently, ffmpeg on Windows does not support opening files whose names
>>>>> contain characters that cannot be expressed in the current locale, because
>>>>> on Windows you can't pass UTF8 in a char* to _open() and have it work. You
>>>>> have to convert the filename to UTF16 and use _wopen(), which takes a
>>>>> wchar_t* instead.
>>>>>
>>>>> I have attached a patch that attempts to solve the problem with a rather
>>>>> ugly hack. It Works For Me(tm) under mingw at least. Comments are
>>>>> appreciated.
>>>>>
>>>>> Regards,
>>>>> Karl Blomster
>>>>>  os_support.c |   17 +++++++++++++++++
>>>>>  os_support.h |    5 +++++
>>>>>  2 files changed, 22 insertions(+)
>>>>> 9afa6887f1f6998c37d75efaae5d589918dc752b  ffmpeg_win_utf8_paths.patch
>>>>> Index: libavformat/os_support.c
>>>>> ===================================================================
>>>>> --- libavformat/os_support.c  (revision 19242)
>>>>> +++ libavformat/os_support.c  (working copy)
>>>>> @@ -30,6 +30,23 @@
>>>>>  #include <sys/time.h>
>>>>>  #include "os_support.h"
>>>>>
>>>>> +#ifdef HAVE_WIN_UTF8_PATHS
>>>>> +#define WIN32_LEAN_AND_MEAN
>>>>> +#include <windows.h>
>>>>> +#endif
>>>>> +
>>>>> +#ifdef HAVE_WIN_UTF8_PATHS
>>> Where is HAVE_WIN_UTF8_PATHS defined?
>> Nowhere, right now. My thought is to let configure set it with some
>> --enable parameter, or you could just pass -DHAVE_WIN_UTF8_PATHS in
>> your CFLAGS. The point was that it might be a good idea to let the
>> user compile with it disabled if he wanted to, for example to build
>> for Win9x (heh) or some other platform where Unicode support might
>> not be available.
> 
> Can we simply test for the existence of _wopen()?  Is there any reason
> to disable this if the function exists?

That may be dangerous. It will always exist in the MinGW includes/libraries, but 
that doesn't mean it's implemented and works in the runtime libraries you end up 
using. See also below.

>>>>> +int winutf8_open(const char *filename, int oflag, int pmode)
>>>>> +{
>>>>> +     wchar_t wfilename[MAX_PATH * 2];
>>>>> +
>>>>> +     if (MultiByteToWideChar(CP_UTF8,MB_ERR_INVALID_CHARS,filename,-1,wfilename,MAX_PATH) > 0)
>>>>> +             return _wopen(wfilename, oflag, pmode);
>>>>> +     else
>>>>> +             return open(filename, oflag, pmode);
>>>>> +}
>>>>> +#endif
> 
> What might cause MultiByteToWideChar() to fail?  What will plain
> open() do with such input?  Also, what is the value of MAX_PATH?
> It is probably a bad idea to silently truncate the filename at
> MAX_PATH characters.  This could turn an invalid name into the name of
> an existing file.

MultiByteToWideChar() will fail in this case if the input string contains byte 
sequences that are not valid UTF-8 (since MB_ERR_INVALID_CHARS is specified). 
This might happen with a multi-byte string that isn't UTF-8, for example one in 
the system's local code page (if that code page is multi-byte). It can also 
fail if the output buffer is too small, or if CP_UTF8 is not supported, but 
neither should be a concern here.
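The two failure modes above can be modelled portably. The sketch below does not call MultiByteToWideChar(); it is a hypothetical helper, utf8_to_utf16(), that assumes the documented behaviour with CP_UTF8, MB_ERR_INVALID_CHARS and a source length of -1: invalid UTF-8 is rejected outright, and a too-small output buffer is reported as an error instead of being truncated. Per the Win32 documentation, MultiByteToWideChar() also fails (with ERROR_INSUFFICIENT_BUFFER) rather than truncating, so in the patch an over-long name makes the conversion return 0 and the code falls back to open().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes for the two failure modes discussed above. */
enum { U8_ERR_INVALID = -1, U8_ERR_BUFFER = -2 };

/*
 * Strictly convert a NUL-terminated UTF-8 string to UTF-16.
 * Returns the number of UTF-16 code units written, including the
 * terminating NUL, or a negative error code. Never truncates: if dst
 * is too small, the whole conversion fails with U8_ERR_BUFFER.
 */
static int utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    for (;;) {
        unsigned char c = *s++;
        uint32_t cp;
        int extra;

        if (c < 0x80)                    { cp = c;        extra = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; extra = 3; }
        else return U8_ERR_INVALID;      /* stray continuation or 0xC0/0xC1/0xF5..0xFF */

        for (int i = 0; i < extra; i++) {
            if ((*s & 0xC0) != 0x80)
                return U8_ERR_INVALID;   /* truncated sequence (also stops at NUL) */
            cp = (cp << 6) | (*s++ & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return U8_ERR_INVALID;

        size_t units = cp >= 0x10000 ? 2 : 1;
        if (n + units > dst_len)
            return U8_ERR_BUFFER;        /* report, don't truncate */
        if (units == 2) {                /* encode as a surrogate pair */
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            dst[n++] = (uint16_t)cp;
        }
        if (c == 0)                      /* terminating NUL has been written */
            return (int)n;
    }
}
```

For example, "\xC3\xA9" yields one code unit (U+00E9) plus the terminator, a lone "\xC3" is rejected as invalid, and "abc" into a two-element buffer fails instead of being cut down to "ab".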

As far as I am aware, open() should deal gracefully with multi-byte strings in 
the system locale. However, it is conceivable that a multi-byte string in the 
local code page happens to be valid UTF-8 even though it was not meant as such, 
and MSVCRT sometimes behaves really weirdly with character translations. The 
only truly safe option is therefore to pass only UTF-8 or Latin-1; other 
character sets are not guaranteed to work. Hence my preference for leaving the 
feature optional, so that people who want UTF-8 filenames on Windows can get 
them and everyone else can go about their business as usual.
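To make the point concrete: apart from the buffer-size check, winutf8_open() in the patch chooses between _wopen() and open() purely on whether the bytes happen to decode as valid UTF-8, not on which encoding the caller meant. The sketch below reproduces that decision portably with a hypothetical is_valid_utf8() helper; it does not call the Win32 API and ignores the MAX_PATH limit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 1 if the len bytes at s are valid UTF-8, else 0.
 * Rejects stray or truncated sequences, overlong forms, UTF-16
 * surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        uint32_t cp;
        int extra;

        if (c < 0x80) {                  /* ASCII is valid on either path */
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF)      { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; extra = 3; }
        else return 0;

        if (len - i - 1 < (size_t)extra)
            return 0;                    /* sequence runs past the end */
        for (int k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
        i += 1 + (size_t)extra;
    }
    return 1;
}

/* Which branch of the patch's winutf8_open() a given name would take. */
static const char *chosen_path(const char *name)
{
    return is_valid_utf8((const unsigned char *)name, strlen(name))
           ? "_wopen" : "open";
}
```

A Windows-1252 "café" ("caf\xE9") is not valid UTF-8 and falls back to open(), while the UTF-8 spelling ("caf\xC3\xA9") is converted and handed to _wopen(). Plain ASCII names take the _wopen() path but mean the same thing on both.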

MAX_PATH is defined as 260 in WinDef.h, and that is the maximum path length 
the Win32 API allows unless you want to jump through some hoops: paths of up to 
approximately 32,767 characters are allowed, but only if they are absolute and 
start with the magic \\?\ prefix. I guess I could detect relative paths, make 
them absolute and add the prefix manually if so desired, but the static 
allocation seems safe enough, and the 260-character limit is what the vast 
majority of Windows programs use.
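The prefixing idea could look roughly like the following. This is a hypothetical helper, make_extended_path(), written with plain char strings so it compiles anywhere; on Windows the real thing would work on the wchar_t buffer passed to _wopen(). It assumes the documented Win32 rules: the prefix is only valid on absolute paths, UNC paths use the \\?\UNC\ form, and prefixed paths are not normalised (so / separators and . or .. components are not translated), which is why anything other than a plain drive or UNC path is rejected here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write an extended-length ("\\?\"-prefixed) form of
 * path into out. Relative and drive-relative paths are rejected; a real
 * implementation would first make them absolute (e.g. GetFullPathNameW()).
 * Returns 0 on success, -1 if the path is unsupported or out is too small.
 */
static int make_extended_path(const char *path, char *out, size_t out_len)
{
    int n;

    if (strncmp(path, "\\\\?\\", 4) == 0)                 /* already prefixed */
        n = snprintf(out, out_len, "%s", path);
    else if (path[0] == '\\' && path[1] == '\\')           /* \\server\share\... */
        n = snprintf(out, out_len, "\\\\?\\UNC\\%s", path + 2);
    else if (((path[0] >= 'A' && path[0] <= 'Z') ||
              (path[0] >= 'a' && path[0] <= 'z')) &&
             path[1] == ':' && path[2] == '\\')            /* C:\dir\file */
        n = snprintf(out, out_len, "\\\\?\\%s", path);
    else
        return -1;               /* relative, drive-relative or '/'-separated */

    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;  /* refuse to truncate */
}
```

With this, C:\video\clip.mkv becomes \\?\C:\video\clip.mkv and \\srv\share\a.avi becomes \\?\UNC\srv\share\a.avi, while a bare clip.mkv is refused until it has been made absolute.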

Updated patch with fewer tabs (and a rather embarrassing typo fix) attached.

Regards,
Karl Blomster
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ffmpeg_win_utf8_paths_v2.patch
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090626/d3f20237/attachment.txt>


