[FFmpeg-devel] GSoC with FFMpeg waht a combination!

Michael Niedermayer michaelni
Mon Mar 24 22:47:07 CET 2008


On Mon, Mar 24, 2008 at 11:18:00PM +0200, Uoti Urpala wrote:
> On Mon, 2008-03-24 at 21:16 +0100, Michael Niedermayer wrote:
> > On Mon, Mar 24, 2008 at 08:08:43PM +0200, Uoti Urpala wrote:
> > > On Mon, 2008-03-24 at 00:41 +0100, Michael Niedermayer wrote:
> > > > On Sun, Mar 23, 2008 at 03:31:15AM +0200, Uoti Urpala wrote:
> > > > > Also it's not that obvious that there would be much of a speed
> > > > > difference. Even a directly hardcoded string without any lookup can be
> > > > > swapped out, and a lookup structure is unlikely to need a separate seek
> > > > > and read for every accessed page.
> > > > 
> > > > binary search needs O(log N) seeks.
> > > 
> > > That's the asymptotic behavior as you get beyond billions of translated
> > > messages, but in practice the number of disk seeks needed during the
> > > binary search is significantly less than log2(N) because several last
> > > steps are read with a single seek.
> > 
> > a 64k string table with an average of 10bytes per string with a binary search
> > behaves as follows:
> > 
> > read 16 -20k (ptr to string)
> > read 48 -52k
> > read  8 -12k (ptr to string)
> > read 40 -44k
> > read  4 - 8k (ptr to string)
> > read 36 -40k
> > read  0 - 4k (ptr to string)
> > read 32 -36k
> > 
> > these are 8 seeks
> 
> That's a fucked up table... there are 32 KiB of pointers to strings and
> 32 KiB of actual strings (I assume you meant Ki, not k)? Assuming 4-byte

I didnt mean kiddy byte no. And gettext needs more than 7 byte to store a
2 byte index IIRC. Complain to them for the fucked up format ...


[...]
> > A hashtable which is 50% full would need <2 seeks at average with billions
> > of strings.
> 
> Getting to 2 seeks on average would cost extra space though, as you'd
> need to store enough information in each hash table cell to be able to
> verify whether it's the correct one without extra seeks (and this would
> be wasted in the unused cells).

This discussion is very tireing, it does waste maybe 4 bit on average and
maybe it ends uo needing 2.1 seeks on average.
Werent you just arguing a moment ago that 200 bits for an average string
didint matter.
Or is it that you just count bits in my design diferently? Could you spare
me of this trolling please!

Anyway the hash table could be designed like:
2-4 bytes (depends on string array size) in each hash table entry, that being
an index into the strings array and 4-8 bits of that to quickly catch most
wrong ones. The string than has the remaining bits of the hash which arent
yet known from hash table position and not part of the 4-8 bits prepended.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

There will always be a question for which you do not know the correct awnser.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080324/390db0e1/attachment.pgp>



More information about the ffmpeg-devel mailing list