[FFmpeg-devel] GSoC with FFMpeg what a combination!

Michael Niedermayer michaelni
Sun Mar 23 00:19:39 CET 2008


On Sun, Mar 23, 2008 at 12:33:04AM +0200, Uoti Urpala wrote:
> On Sat, 2008-03-22 at 13:29 +0100, Michael Niedermayer wrote:
> > On Sat, Mar 22, 2008 at 12:37:39AM -0400, Ronald S. Bultje wrote:
> > > Will you accept patches that add internationalization-support to
> > > ffmpeg/lav*?
> > > 
> > > It's high on my fairy-list.
> > 
> > Sure just keep in mind that you will be flamed if this is done with gettext :)
> > The reason being
> > * gettext duplicates english strings all over the place
> 
> I'm not sure exactly what gettext duplicates, but how much space would
> this waste? It'd need to have many copies of every string to make a
> difference for FFmpeg.

I would estimate that gettext needs roughly twice as much disk space as an
integer-based system and about 3 times as much in memory
(the English string in _(), the same string again as key, and the translated
string). The disk space matters only for embedded systems, of course. There it
can matter a lot though, especially with few codecs and many languages.


> 
> > * gettext uses strings as keys (very inefficient, requiring O(log n) lookups)
> 
> O(log n) is NOT inefficient for text lookups!

If the area where the strings are stored has been paged out to disk, then
O(log n) can be a lot slower than O(1). And I would expect it to be paged out
in practice, as ffmpeg doesn't print the overwhelming majority of these strings
regularly. If it were in memory, I'd immediately agree with you that O(log n)
is irrelevant.


> Also your claim that using
> strings as keys would necessarily require O(log n) lookups is not true.
> Hash tables require O(1) on average, and your own suggested method needs
> an equal lookup.

Yes, but if you do use a hash table, why calculate the hash values at runtime?
Why store the English strings twice instead of the corresponding hash values?
I don't believe you consider this good design. It is a plain waste of space.
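
A minimal sketch of an integer-keyed lookup along these lines. The FNV-1a
hash, the tiny open-addressing table, and all names (TrEntry, tr_hash, tr_add,
tr_get) are assumptions made for illustration; nothing in this thread
specifies them. In a real build, a tool would replace each _() string with its
precomputed 64-bit value; tr_hash() is called at runtime here only to keep the
example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define TR_SLOTS 8 /* table capacity, must be a power of two (illustrative) */

typedef struct {
    uint64_t    key;  /* hash of the English source string, 0 marks an empty slot */
    const char *text; /* translated string */
} TrEntry;

/* FNV-1a, 64 bit. An assumed hash choice; a build step could precompute it. */
static uint64_t tr_hash(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 0x100000001b3ULL;          /* FNV prime */
    }
    return h;
}

/* Insert with linear probing. Returns 0 on success, -1 if the table is full.
 * A key that hashes to exactly 0 is not handled in this sketch. */
static int tr_add(TrEntry *t, uint64_t key, const char *text)
{
    size_t i = key & (TR_SLOTS - 1);
    for (size_t n = 0; n < TR_SLOTS; n++) {
        if (t[i].key == 0 || t[i].key == key) {
            t[i].key  = key;
            t[i].text = text;
            return 0;
        }
        i = (i + 1) & (TR_SLOTS - 1);
    }
    return -1;
}

/* Lookup by integer key only: no English string is stored or compared.
 * Returns the fallback when the key is not in the table. */
static const char *tr_get(const TrEntry *t, uint64_t key, const char *fallback)
{
    size_t i = key & (TR_SLOTS - 1);
    for (size_t n = 0; n < TR_SLOTS && t[i].key != 0; n++) {
        if (t[i].key == key)
            return t[i].text;
        i = (i + 1) & (TR_SLOTS - 1);
    }
    return fallback;
}
```

With a sparsely filled table the expected number of probes is constant, and
the table holds 8 bytes per key instead of a second copy of each English
string.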


> The only calculation your suggestion can save is
> calculating the hash at runtime, which is O(length of string) and thus
> cannot affect O() behavior (assuming the result is of similar length and
> has to be output).

The .gmo files do not contain hashes; they contain 2 lists of pointers
to arrays of sorted strings. This structure is designed for an O(log N)
binary search. One could of course convert that to a sane hash table on
load, but then you lose the file backing of this hash table, which means
it needs more time if it's paged out.
Also, the .gmo files are 50% larger than they have to be, and you still have
your key strings from _() in memory wasting space.
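
For comparison, a rough sketch of the string-keyed lookup that a layout of two
parallel, sorted string tables implies. This is not gettext's actual parser or
file format; the arrays, the sample strings, and the function name
lookup_sorted are hypothetical. It only illustrates that each lookup costs
O(log N) string comparisons and that the English key exists both at the call
site and in the table.

```c
#include <stddef.h>
#include <string.h>

/* Originals sorted by strcmp() order, translations at the same index. */
static const char *const originals[] = {
    "End of file",
    "File not found",
    "Out of memory",
};
static const char *const translations[] = {
    "Dateiende",
    "Datei nicht gefunden",
    "Kein Speicher mehr",
};

/* Binary search over the sorted originals. Each step does a full string
 * comparison and may touch a different page of the table. Returns msgid
 * itself when no translation exists. */
static const char *lookup_sorted(const char *const *orig,
                                 const char *const *trans,
                                 size_t n, const char *msgid)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(msgid, orig[mid]);
        if (c == 0)
            return trans[mid];
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return msgid;
}
```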


> 
> > Requirements:
> > * getting a string must be O(1)
> 
> Nonsensical requirement. Being so fast that further speedups make no
> difference is more than fast enough, and for translations that does not
> require O(1).

see above


> 
> > * no duplicate strings in the final binary files
> 
> I assume you mean that the English strings are not used at all when
> using a translation. This is also a nonsensical requirement, as the
> total amount of translated text is not large enough to justify it. There
> are easier ways to reduce FFmpeg binary size (and by a larger amount)
> than the complexity required for this.

If you can reduce ffmpeg's binary size by a large amount and with little
complexity, please do it, but judging from your past claims this is just
hot air; at best you sketch something very complex that no one, and especially
not you, will ever implement.
Also, even if you can reduce ffmpeg's size significantly, this is hardly
an argument not to attempt to implement other, unrelated improvements.


> 
> > * uncoordinated addition of new keys
> > * uncoordinated addition of new translated strings
> > * easy to use for both devels and translators
> 
> These are fulfilled by doing lookups based on the English string.
> 
> > The requirement of no duplicate strings leads to integers as keys.
> > The requirement of O(1) and uncoordinated addition of new keys leads
> > to the use of a hash table and 64-bit hash values of
> > default (English/in source) strings as keys.
> 
> But you chose those requirements incorrectly, so these consequences are
> wrong too.

Wrong according to your definition of correct.


> 
> 
> You really need to learn that concentrating obsessively on performance
> alone is not the right way to solve all development questions. Other
> factors are much more important for a translation system.

Which would those be?

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many things microsoft did are stupid, but not doing something just because
microsoft did it is even more stupid. If everything ms did were stupid they
would be bankrupt already.