[FFmpeg-devel] GSoC with FFmpeg, what a combination!

Michael Niedermayer michaelni
Sun Mar 23 21:44:27 CET 2008


On Sun, Mar 23, 2008 at 04:48:44AM +0100, Aurelien Jacobs wrote:
> On Sun, 23 Mar 2008 00:19:39 +0100
> Michael Niedermayer <michaelni at gmx.at> wrote:
> 
> > On Sun, Mar 23, 2008 at 12:33:04AM +0200, Uoti Urpala wrote:
> > > 
> > > Also your claim that using
> > > strings as keys would necessarily require O(log n) lookups is not true.
> > > Hash tables require O(1) on average, and your own suggested method needs
> > > an equal lookup.
> > 
> > Yes, but if you do use a hash table why calculate the hash values at runtime?
> 
> I doubt this would make any measurable difference, especially for ffmpeg
> which doesn't really make intensive use of text output.
> 

> > Why store the English strings twice instead of the corresponding hash values?
> 
> Because you have no guarantee that the hash results won't clash.

With 64 bits you will have approximately one such clash by the time you
reach 4 billion strings. And such a clash is immediately detected when the
translation file is built.
If you have "just" a million strings, your chance of a clash is
about 0.000003%.
Clashes in this system are of no practical significance. In the
once-in-a-lifetime case where one happens, you just add a space to the string.
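
The figures above follow from the usual birthday estimate. A minimal
sketch, assuming the hash spreads strings uniformly over all 2^bits
values (the function name expected_clashes is hypothetical, not part of
any proposal in this thread):

```c
#include <stdint.h>

/* Expected number of clashing pairs among n distinct strings hashed
 * uniformly into 2^bits values: n*(n-1)/2 / 2^bits. While the result
 * is much smaller than 1 it also approximates the probability of at
 * least one clash. Uniform hashing is an assumption here. */
static double expected_clashes(uint64_t n, unsigned bits)
{
    double space = 1.0;
    for (unsigned i = 0; i < bits; i++)
        space *= 2.0;                      /* 2^bits, exact in a double */
    double pairs = (double)n * (double)(n - 1) / 2.0;
    return pairs / space;
}
```

For one million strings and 64 bits this evaluates to about 2.7e-8, and
for 2^32 strings to about 0.5 expected clashing pairs, which is the same
order of magnitude as the numbers quoted above.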

This of course assumes a good hash algorithm. (That's also where hashing the
string at runtime gets more costly.)
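
To make the runtime cost concrete: hashing a string means one pass over
every byte of it. The message does not name a hash function, so the
sketch below uses 64-bit FNV-1a purely as a simple, well-known stand-in;
whether it counts as "good" enough for this purpose is a separate
question, and the name hash64 is hypothetical.

```c
#include <stdint.h>

/* 64-bit FNV-1a, used here only as an illustrative stand-in hash.
 * The cost is one XOR and one multiply per byte of the string. */
static uint64_t hash64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;    /* FNV-1a 64-bit offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;             /* FNV 64-bit prime */
    }
    return h;
}
```

If the hash is computed when the program is built, the running program
only has to look the stored value up and never executes this loop.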


> 
> > I don't believe you consider this good design. It is a plain waste of space.
> 
> If you don't care about potential hash clashes, that's indeed a bad
> design. It would be nice if English strings could be optionally
> omitted from the gmo file.
> 
> > > The only calculation your suggestion can save is
> > > calculating the hash at runtime, which is O(length of string) and thus
> > > cannot affect O() behavior (assuming the result is of similar length and
> > > has to be output).
> > 
> > The .gmo files do not contain hashes, they contain 2 lists of pointers
> > to arrays of sorted strings.
> 
> The doc disagrees with you. They contain hashes:
> http://www.gnu.org/software/gettext/manual/html_node/MO-Files.html

I (obviously) didn't find the doc, so I used a hex editor, and I can assure
you the actual files I looked at did not contain hash tables. I am happy
that it's in the spec, but it would be nicer if it were actually used.
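
Whether a given file carries a hash table can also be checked without a
hex editor. A rough sketch, assuming the header layout described on the
manual page linked above (magic number 0x950412de at offset 0, hash
table size S at offset 20, hash table offset H at offset 24); the
function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value at byte offset off in the given byte order. */
static uint32_t rd32(const unsigned char *p, size_t off, int big_endian)
{
    p += off;
    if (big_endian)
        return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
               (uint32_t)p[2] << 8  | (uint32_t)p[3];
    return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
           (uint32_t)p[1] << 8  | (uint32_t)p[0];
}

/* Returns the hash table size S from an MO header, or -1 if the buffer
 * is shorter than the 28-byte header or lacks the MO magic number. */
static int64_t mo_hash_table_size(const unsigned char *buf, size_t len)
{
    int big_endian;
    if (len < 28)
        return -1;
    if (rd32(buf, 0, 0) == 0x950412deU)
        big_endian = 0;
    else if (rd32(buf, 0, 1) == 0x950412deU)
        big_endian = 1;
    else
        return -1;
    return (int64_t)rd32(buf, 20, big_endian);
}
```

A result of 0 means the file has no hash table, which would match files
that contain only the two sorted string tables.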


> So gettext is not so different from your proposal.
> 
> If I try to summarize the significant differences between gettext and
> your proposal:
> 1) gettext uses more disk space/memory by storing English strings twice,
>    but your system can't guarantee that there is no hash clash.

Wrong, my system can guarantee this. More exactly, by compile-time failure
and correction of the clash (which is an exceedingly rare event).
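
The build-time check could look roughly like the sketch below: sort all
(hash, string) pairs by hash and fail the build if two different strings
end up with the same value. The struct and function names are
hypothetical, and the hashes are assumed to have been computed already
by whatever hash function is chosen.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct entry {
    uint64_t    hash;   /* hash of the English string */
    const char *text;   /* the English string itself */
};

/* qsort comparator ordering entries by hash value. */
static int cmp_entry(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return (x->hash > y->hash) - (x->hash < y->hash);
}

/* Sorts the table by hash. Returns the index (after sorting) of the
 * first entry that shares its hash with the previous entry while the
 * text differs, or -1 if there is no clash. The same string occurring
 * twice is not a clash. */
static long find_clash(struct entry *tab, size_t n)
{
    qsort(tab, n, sizeof(*tab), cmp_entry);
    for (size_t i = 1; i < n; i++)
        if (tab[i].hash == tab[i - 1].hash &&
            strcmp(tab[i].text, tab[i - 1].text) != 0)
            return (long)i;
    return -1;
}
```

A build tool would stop with an error when find_clash returns anything
other than -1, at which point one of the two strings gets changed.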


> 2) gettext allows your program to run without any additional file,
>    while your system requires a "translation" file even for the default
>    language.

True, this could of course be corrected, at the expense of some space.


> 
> Also, how is your proposal supposed to work with such a string:
>   printf (_("The amount is %0" PRId64 "\n"), number);
> This is something quite common in ffmpeg, and gettext knows how to
> handle this.

What problem does this have? Are you suggesting that translation files
would be built in a build environment very different from the main
program? This is quite obscure IMHO. It won't happen in distros and
won't happen for anyone building their own ffmpeg.
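
For illustration, the compiler joins adjacent string literals, so the
format string in the quoted example is a single literal by the time any
hash could be computed from it. Its exact bytes depend on how the
platform defines PRId64, which is why only a differing build environment
could produce a different hash. A small sketch (amount_fmt and
format_amount are hypothetical names):

```c
#include <inttypes.h>
#include <stdio.h>

/* One literal after concatenation; the text contributed by PRId64 is
 * platform-specific (for example "ld" or "lld"). */
static const char amount_fmt[] = "The amount is %0" PRId64 "\n";

/* Formats number into buf using the platform-specific format string
 * and returns what snprintf returns. */
static int format_amount(char *buf, size_t size, int64_t number)
{
    return snprintf(buf, size, amount_fmt, number);
}
```

Hashing amount_fmt on the machine that builds both the program and the
translation file gives the same value in both places.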


> 
> And here is another example which couldn't be translated with
> your proposed way of calculating hashes at compile time:
> 
> static const char *messages[] = {
>     "some very meaningful message",
>     "and another one"
> };
> printf (index > 1 ? "a default message" : messages[index]);

How does gettext handle this?


> Well, reinventing gettext is not so trivial. I think the main
> disadvantage of gettext is that it forces you to have a copy
> of the English strings in every translation file. But I'm pretty
> sure this can be fixed in gettext (as an optional feature).

Well, after someone has fixed gettext I won't mind considering it ...


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin