[FFmpeg-devel] GSoC with FFMpeg what a combination!

Michael Niedermayer michaelni
Sat Mar 22 22:22:28 CET 2008


On Sat, Mar 22, 2008 at 01:29:40PM +0100, Michael Niedermayer wrote:
[...]
> The requirement of no duplicate strings leads to integers as keys.
> The requirement of O(1) lookup and uncoordinated addition of new keys leads
> to the use of a hash table and 64-bit hash values of the
> default (English/in-source) strings as keys.
> 
> Thus binary files with translated strings would be a hash table of some sort
> with pointers to the strings stored after it. This file could be a custom
> format or a normal ELF object created from a C file.
> 
> Now how to handle this conveniently without reinventing the whole wheel.
> The trick is to use gettext as much as possible :)
> The code in the program should look like
> av_log(..., _("english string %d blah\n"), ...);
> That way gettext tools can be used to extract these strings and build .po
> files out of them, which translators can translate (it's totally identical to
> gettext ...)
> 

> The difference happens afterwards, or more specifically:
> * A script would go over the source and replace all _("blah") by
>   _(0x1275384ULL), where 0x1275384ULL is a strong 64-bit hash of "blah"
> * Another little program would parse the .po files and convert them to
>   the hashtab of translated strings with hash(english string) as keys
>   and store these in the translation files, one for each language.
> 
> At runtime we just need to load the correct file, look in the hashtab and
> return the string.

After some testing I found out that point 1 is unneeded.
The following test program:
#include <string.h>
#include <inttypes.h>

static uint64_t hash(const char *c) {
    uint64_t r=0;

#define X\
    r= r*1664525 + 1013904223 + *c++;\
    if(!*c)\
        return r;
#define X4 X X X X
#define X16 X4 X4 X4 X4
X16
X16
X16
X16

    return r;
}

uint64_t test(void){
    return hash("This is a test foobar\n");
}
----------gets compiled to-------
test:
        movl    $1908509146, %eax
        movl    $-994996004, %edx
        ret
---------------------------------

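So no preprocessing step is needed: the compiler already folds the hash of
the string literal into a constant, as the two immediates above show.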
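
To read the output above: two movl instructions into %eax and %edx look like
a 64-bit return value on 32-bit x86 (that is my assumption, the compiler and
flags are not stated), so the two immediates would be the low and high halves
of one precomputed constant. A minimal sketch for checking this by hand;
hash_loop() and combine() are made-up helper names, and hash_loop() is only
the rolled-up form of the X16 macros with the same constants:

```c
#include <inttypes.h>

/* Rolled-up form of the unrolled hash: same constants, at most 64 chars.
 * It differs from the macro version only for the empty string, where the
 * macro version would read past the terminating NUL. */
static uint64_t hash_loop(const char *c) {
    uint64_t r = 0;
    int i;

    for (i = 0; i < 64 && *c; i++)
        r = r*1664525 + 1013904223 + *c++;
    return r;
}

/* Join the %edx (high) and %eax (low) immediates into one 64-bit value. */
static uint64_t combine(int32_t edx, int32_t eax) {
    return ((uint64_t)(uint32_t)edx << 32) | (uint32_t)eax;
}
```

Printing hash_loop("This is a test foobar\n") next to
combine(-994996004, 1908509146) with PRIx64 would show whether the folded
constant matches the runtime result on a given compiler.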

The only things we need are
* a good hash function (the one above might do for the moment)
* some program that turns .po files into hashtable+string files;
  this should be easy
* an implementation of "_" which hashes the given string and looks
  the hash up in the loaded hashtab to find the translated string;
  this should be easy as well
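
The last two points could be sketched roughly as below. This is only an
illustration under assumptions, not a proposed implementation: the Slot
layout, TAB_SIZE, tab_insert() and tr_lookup() are all made-up names, the
table lives in memory instead of being loaded from a file, and collisions
are handled by simple linear probing. The hash is the rolled-up form of the
one above.

```c
#include <inttypes.h>
#include <stddef.h>

#define TAB_SIZE 256            /* hypothetical size, must be a power of 2 */

typedef struct {
    uint64_t    key;            /* hash of the english source string */
    const char *text;           /* translated string, NULL = empty slot */
} Slot;

static Slot tab[TAB_SIZE];

static uint64_t hash(const char *c) {
    uint64_t r = 0;
    int i;

    for (i = 0; i < 64 && *c; i++)
        r = r*1664525 + 1013904223 + *c++;
    return r;
}

/* What a .po converter would do for each msgid/msgstr pair.
 * Returns 0 on success, -1 if the table is full. */
static int tab_insert(const char *english, const char *translated) {
    uint64_t h = hash(english);
    size_t n;

    for (n = 0; n < TAB_SIZE; n++) {
        Slot *s = &tab[(h + n) & (TAB_SIZE - 1)];
        if (!s->text || s->key == h) {
            s->key  = h;
            s->text = translated;
            return 0;
        }
    }
    return -1;
}

/* Look the hash up; fall back to the english string if nothing is found. */
static const char *tr_lookup(const char *english) {
    uint64_t h = hash(english);
    size_t n;

    for (n = 0; n < TAB_SIZE; n++) {
        const Slot *s = &tab[(h + n) & (TAB_SIZE - 1)];
        if (!s->text)
            break;
        if (s->key == h)
            return s->text;
    }
    return english;
}

#define _(s) tr_lookup(s)
```

Since only the 64-bit hash is compared, two source strings with the same hash
would share a translation; that is the price of not storing the english
strings, and the reason the hash needs to be a good one.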

The rest can be done with gettext tools and some build system changes to
ensure everything gets rebuilt as needed.

Comments welcome!

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable