[FFmpeg-devel] GSoC with FFMpeg waht a combination!

Michael Niedermayer michaelni
Mon Mar 24 21:16:55 CET 2008

On Mon, Mar 24, 2008 at 08:08:43PM +0200, Uoti Urpala wrote:
> On Mon, 2008-03-24 at 00:41 +0100, Michael Niedermayer wrote:
> > On Sun, Mar 23, 2008 at 03:31:15AM +0200, Uoti Urpala wrote:
> > > > > O(log n) is NOT inefficient for text lookups!
> > > > 
> > > > If the area where the strings are has been paged out to disk then O(log n)
> > > > can be alot slower than O(1). And i would expect it to be paged out in
> > > > practice as ffmpeg doesnt print the overhelming amount of these strings
> > > > regularly. If it were in memory id immedeatly agree with you that O(log N)
> > > > is irrelevant.
> > > 
> > > If it's used rarely enough to need paging does the speed really matter?
> > 
> > Well, having 1 second extra delay before a error message comes up is giving
> > a bad impression. Things like that make a program look bloated and slow to
> 1 second is a huge exaggeration in practical uses. On a setup where
> printing a message would take that long FFmpeg startup would take ages.
> > > Also it's not that obvious that there would be much of a speed
> > > difference. Even a directly hardcoded string without any lookup can be
> > > swapped out, and a lookup structure is unlikely to need a separate seek
> > > and read for every accessed page.
> > 
> > binary search needs O(log N) seeks.
> That's the asymptotic behavior as you get beyond billions of translated
> messages, but in practice the number of disk seeks needed during the
> binary search is significantly less than log2(N) because several last
> steps are read with a single seek.

a 64k string table with an average of 10bytes per string with a binary search
behaves as follows:

read 16 -20k (ptr to string)
read 48 -52k
read  8 -12k (ptr to string)
read 40 -44k
read  4 - 8k (ptr to string)
read 36 -40k
read  0 - 4k (ptr to string)
read 32 -36k

these are 8 seeks

A hashtable which is 50% full would need <2 seeks at average with billions
of strings.

Also for each of these read pages another page must be paged out needing
another seek thus its 16 vs 4.
And your idea of reading much more than 4kb, well you can but other
equally large blocks would be paged out.
And just to see how idiotic this argument is,
consider a embeded system with 1mb ram, this thing is not going to page
half of its memory out on each page fault.

> Anyway this isn't too important if even gettext doesn't use binary
> search in practice.

Which shows that even the gettext developers consider binary searches
problematic. I assume they have more practical experience with them in
respect to translation than you.
Of course that will not stop you from explaining us why you are still
correct and the world is wrong.

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080324/cbab0e4d/attachment.pgp>

More information about the ffmpeg-devel mailing list