[FFmpeg-devel] [PATCH] Faster ff_sqrt()

Michael Niedermayer michaelni
Sun Jan 20 00:58:42 CET 2008


On Sat, Jan 19, 2008 at 11:15:43PM +0100, Vitor Sessak wrote:
> Hi
> 
> Michael Niedermayer wrote:
> > On Sun, Jan 13, 2008 at 09:14:10PM +0100, Michael Niedermayer wrote:
> >> On Sun, Jan 13, 2008 at 08:49:17PM +0100, Michael Niedermayer wrote:
> > [...]
> >> next one, just reordering the if() to make smaller values faster and some
> >> cosmetics
> >>
> 
> [...]
> 
> > 
> > another minor revision (just av_log2 -> av_log2_16bit)
> > 
> > static inline unsigned int sqrt3(unsigned int a)
> > {
> >     unsigned int b;
> > 
> >     if     (a<(1<<10)- 3) b= sqrt_tab[(a+ 3)>>2 ]>>3;
> >     else if(a<(1<<14)-28) b= sqrt_tab[(a+28)>>6 ]>>1;
> >     else if(a<(1<<16)   ) b= sqrt_tab[ a    >>8 ]   ;
> >     else{
> >         int s= av_log2_16bit(a>>16)>>1;
> >         b= sqrt_tab[a>>(2*s+10)];
> >         b= (FASTDIV(a,b)>>(s+2)) + (b<<s);
> >     }
> > 
> >     return b - (a<b*b);
> > }
> 
> Just a question: you didn't commit it because you didn't have time, or 
> do you think it's not fit for lavu? Wasn't lavu supposed to have 
> efficient code, no matter if it is used in speed critical codecs or not?

is it efficient?
for numbers <128 it needs 2x as much time as the SVN code
for >65536 it is faster with the table-based FASTDIV, with the generic a/b it
is slower
for the rest it is always faster
its data cache needs are MUCH larger:
128 bytes of LUT for the SVN code
1536 bytes for the newly proposed code

the actual object/memory requirements are mostly the same (~100 bytes
difference IIRC)

so this is not really a clear case

the only use case which has been shown so far was the roq encoder, which would
actually become slower with either the old or the new sqrt ...

so I am undecided ...

for code feeding only small numbers into sqrt() this is a loss, and it is also
a loss for code which only occasionally does a sqrt(), as the larger tables
will be less likely to be in the cache
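
To put the two LUT sizes in terms of cache footprint, here is a small sketch
that counts how many cache lines a table can span. The 64-byte line size used
in the note below is an assumption (it varies by CPU), the helper name
cache_lines() is hypothetical, and the table is assumed to start on a line
boundary.

```c
#include <stddef.h>

/*
 * Number of cache lines spanned by a contiguous table of `bytes` bytes,
 * assuming (for illustration only) that the table starts on a line boundary.
 * cache_lines() is a hypothetical helper, not part of libavutil.
 */
static size_t cache_lines(size_t bytes, size_t line_size)
{
    return (bytes + line_size - 1) / line_size;   /* round up */
}
```

With an assumed 64-byte line, cache_lines(128, 64) is 2 and
cache_lines(1536, 64) is 24. This is only an upper bound on what has to stay
resident: a single call reads just a few table entries, but callers with
varying inputs can end up touching most of those lines over time.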

if you could reduce the cache needs, make the code faster for small inputs,
or demonstrate a real case which benefits from it, that would be very useful
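
One way such a comparison could be set up is sketched below: a checker that
tests a candidate against the definition of the integer square root, and a
coarse timer that runs it over one input range at a time. All names here
(sqrt_fn, isqrt_ref, count_wrong, time_range) are hypothetical and not part of
libavutil. isqrt_ref() is a generic table-free bit-by-bit routine used only as
a stand-in candidate; it is neither the SVN ff_sqrt() nor the proposed
sqrt3(), which would need sqrt_tab, av_log2_16bit and FASTDIV to build.

```c
#include <stdint.h>
#include <time.h>

/* Signature shared by every candidate under test (hypothetical typedef). */
typedef unsigned int (*sqrt_fn)(unsigned int);

/* Stand-in candidate: table-free, bit-by-bit integer square root. */
static unsigned int isqrt_ref(unsigned int a)
{
    unsigned int res = 0;
    unsigned int bit = 1u << 30;        /* largest power of 4 in 32 bits */

    while (bit > a)
        bit >>= 2;
    while (bit) {
        if (a >= res + bit) {
            a   -= res + bit;
            res  = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* Count inputs in [lo, hi) for which fn(a) is not floor(sqrt(a)). */
static unsigned long count_wrong(sqrt_fn fn, unsigned int lo, unsigned int hi)
{
    unsigned long wrong = 0;

    for (unsigned int a = lo; a < hi; a++) {
        uint64_t r = fn(a);             /* 64 bit so (r+1)^2 cannot overflow */
        if (r * r > a || (r + 1) * (r + 1) <= a)
            wrong++;
    }
    return wrong;
}

/* CPU seconds spent calling fn on every input in [lo, hi), reps times. */
static double time_range(sqrt_fn fn, unsigned int lo, unsigned int hi, int reps)
{
    volatile unsigned int sink = 0;     /* keeps the calls from being dropped */
    clock_t t0 = clock();

    for (int r = 0; r < reps; r++)
        for (unsigned int a = lo; a < hi; a++)
            sink += fn(a);
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Timing [0, 128), [128, 1<<16) and a slice above 1<<16 separately for each
candidate would split the results along the three ranges discussed above. A
tight loop like this keeps every table hot in the cache, so it says nothing
about the occasional-caller case; that would still need a real codec run.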

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I hate to see young programmers poisoned by the kind of thinking
Ulrich Drepper puts forward since it is simply too narrow -- Roman Shaposhnik