[Ffmpeg-devel] retrieving asf textual info in other languages

Hauke Duden H.NS.Duden
Tue May 17 11:06:04 CEST 2005


Måns Rullgård wrote:

>"??" <c_liao at openfind.com.tw> writes:
>
>  
>
>>>note, the thing must be converted to utf8
>>>      
>>>
>>>Is it acceptable to use iconv() for the conversion?
>>>      
>>>
>>isn't libiconv GPL'ed ? so.. the simplest solution is just to treat
>>it as a buffer and let user handle it, however I haven't dived into
>>other parts of the code so not sure whether will cause any
>>problems. or use iconv as default routine for conversion and have a
>>callback interface for user to define their own conversion routine
>>if they don't wish to use GPL'ed iconv.
>>    
>>
>
>iconv is part of glibc, and specified by SUSv3, so there should be no
>legal implications from using it, even if some particular
>implementation is under the GPL.
>
>My main concern was portability to non-SUS platforms.
>  
>

Sorry to intrude here, but UTF-8 is very simple. Why not simply convert
it yourself? Below is a simple, straightforward routine that encodes a
Unicode character as UTF-8, if you need one. Since UCS-2 is a subset of
Unicode, this should work for it as well. Use it as you like.


// Returns new offset. Return value=destOffset means either the buffer
// is too small or the character is not unicode.
int utf8EncodeChar(int chr,unsigned char* pDest,int destBytes,int destOffset)
{
    int destFree=destBytes-destOffset;       
   
    if(chr<0)
    {
        //negative values are not unicode characters
        return destOffset;
    }
    else if(chr<=0x7f)
    {
        //one byte
        if(destFree<1)
            return destOffset;
        pDest[destOffset]=chr;
        return destOffset+1;
    }
    else if(chr<=0x7ff)
    {
        //two bytes
        if(destFree<2)
            return destOffset;
        pDest[destOffset+1]=(chr & 0x3f) | 0x80;
        pDest[destOffset]=(chr>>6) | 0xc0;
        return destOffset+2;
    }
    else if(chr<=0xffff)
    {
        //three bytes
        if(destFree<3)
            return destOffset;
        pDest[destOffset+2]=(chr & 0x3f) | 0x80;
        pDest[destOffset+1]=((chr>>6) & 0x3f) | 0x80;
        pDest[destOffset]=(chr>>12) | 0xe0;
        return destOffset+3;
    }
    else if(chr<=0x10ffff)    //the biggest UTF-8 value for 4 bytes is
                              //actually 0x1fffff but that is not unicode anymore
    {
        //four bytes
        if(destFree<4)
            return destOffset;
        pDest[destOffset+3]=(chr & 0x3f) | 0x80;
        pDest[destOffset+2]=((chr>>6) & 0x3f) | 0x80;
        pDest[destOffset+1]=((chr>>12) & 0x3f) | 0x80;
        pDest[destOffset]=(chr>>18) | 0xf0;
        return destOffset+4;
    }
    else
    {
        //not a unicode character.
        return destOffset;
    }
}


int utf8GetEncodedCharBytes(int chr)
{
    if(chr<0)
        return 0;           //negative values are not unicode characters
    else if(chr<=0x7f)
        return 1;           
    else if(chr<=0x7ff)
        return 2;
    else if(chr<=0xffff)
        return 3;
    else if(chr<=0x10ffff)    //the biggest UTF-8 value for 4 bytes is
                              //actually 0x1fffff but that is not unicode anymore
        return 4;
    else
        return 0;
}
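
For what it's worth, here is a rough sketch of how a whole string could be
converted with the same approach. It assumes the text arrives as
little-endian 16-bit UCS-2 units, which is my reading of how ASF stores its
strings. The names put_utf8 and ucs2le_to_utf8 are made up for this
illustration and are not part of ffmpeg. To keep the sketch self-contained it
carries its own compact encoder for the 1 to 3 byte cases instead of calling
utf8EncodeChar. Surrogate values are rejected rather than combined, so real
UTF-16 input with characters above 0xffff would need extra handling.

```c
#include <stddef.h>

/* Hypothetical helper: write one code point (0..0xffff) as UTF-8.
   Returns the number of bytes written, or 0 if 'room' is too small. */
size_t put_utf8(unsigned chr, unsigned char *dst, size_t room)
{
    if (chr <= 0x7f) {
        if (room < 1) return 0;
        dst[0] = (unsigned char)chr;
        return 1;
    }
    if (chr <= 0x7ff) {
        if (room < 2) return 0;
        dst[0] = (unsigned char)((chr >> 6) | 0xc0);
        dst[1] = (unsigned char)((chr & 0x3f) | 0x80);
        return 2;
    }
    if (room < 3) return 0;
    dst[0] = (unsigned char)((chr >> 12) | 0xe0);
    dst[1] = (unsigned char)(((chr >> 6) & 0x3f) | 0x80);
    dst[2] = (unsigned char)((chr & 0x3f) | 0x80);
    return 3;
}

/* Hypothetical helper: convert a little-endian UCS-2 buffer to UTF-8.
   Returns the number of bytes written to dst, or -1 if dst is too small
   or the input contains a surrogate (0xd800..0xdfff). A trailing odd
   byte in src is ignored. No terminating zero is appended. */
int ucs2le_to_utf8(const unsigned char *src, size_t src_bytes,
                   unsigned char *dst, size_t dst_bytes)
{
    size_t in, out = 0;

    for (in = 0; in + 1 < src_bytes; in += 2) {
        /* assemble one 16-bit unit, low byte first */
        unsigned chr = src[in] | ((unsigned)src[in + 1] << 8);
        size_t n;

        if (chr >= 0xd800 && chr <= 0xdfff)
            return -1;              /* not a valid UCS-2 character */

        n = put_utf8(chr, dst + out, dst_bytes - out);
        if (n == 0)
            return -1;              /* output buffer too small */
        out += n;
    }
    return (int)out;
}
```

A caller would pass the raw bytes read from the file plus an output buffer of
up to 3 bytes per input character, since no UCS-2 character needs more than
3 bytes in UTF-8.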
