[FFmpeg-devel] [PATCH] Faster ff_sqrt()

Michael Niedermayer michaelni
Sun Jan 13 20:13:00 CET 2008


On Sun, Jan 13, 2008 at 05:50:52PM +0100, Vitor Sessak wrote:
> Hi,
>
> I saw in the thread "Copy-on-write pages statistics" that roqaudioenc.c
> is a big copy-on-write memory usage offender. I think that
> increasing the memory usage of FFmpeg by 16kb just for a table used in a
> game format encoder is not a good idea. So I'm sending a new, faster
> ff_sqrt(), so I can remove the table completely without too much speed 
> loss.
>
> Unfortunately, even with the new ff_sqrt, the encoder is not as fast as 
> before :
> (for a 3 min wav file input)
>
> Before:
> 0m0.340s
>
> After:
> 0m0.556s
>
> But it's the audio encoder of a VQ codec and it takes forever to encode the 
> video. So I don't think a few ms more to encode the audio can make any 
> difference...
>
> The code for ff_sqrt() was taken from public domain code at 
> http://atoms.alife.co.uk/sqrt/ (Warning: original in java).
[...]
> +/**
> + * Fast square-root. Completely accurate for x < 2147483648 (i.e. 2^31)...
> + * Code from http://atoms.alife.co.uk/sqrt/
> + */
> +static inline int ff_sqrt(int x) {
> +    int xn;
>  
...
> +    if (x >= 0x10000) {
> +        if (x >= 0x1000000) {
> +            if (x >= 0x10000000) {
> +                if (x >= 0x40000000) {
> +                    xn = ff_sqrt_tab[x >> 24] << 8;
> +                } else {
> +                    xn = ff_sqrt_tab[x >> 22] << 7;
> +                }
> +            } else {
> +                if (x >= 0x4000000) {
> +                    xn = ff_sqrt_tab[x >> 20] << 6;
> +                } else {
> +                    xn = ff_sqrt_tab[x >> 18] << 5;
> +                }
> +            }
>  
...
> +            xn = (xn + 1 + (x / xn)) >> 1;
> +            xn = (xn + 1 + (x / xn)) >> 1;
> +            return ((xn * xn) > x) ? --xn : xn;
> +        } else {
> +            if (x >= 0x100000) {
> +                if (x >= 0x400000) {
> +                    xn = ff_sqrt_tab[x >> 16] << 4;
> +                } else {
> +                    xn = ff_sqrt_tab[x >> 14] << 3;
> +                }
> +            } else {
> +                if (x >= 0x40000) {
> +                    xn = ff_sqrt_tab[x >> 12] << 2;
> +                } else {
> +                    xn = ff_sqrt_tab[x >> 10] << 1;
> +                }
> +            }
> +
> +            xn = (xn + 1 + (x / xn)) >> 1;
> +
> +            return ((xn * xn) > x) ? --xn : xn;
>          }
> +    } else {
> +        if (x >= 0x100) {
> +            if (x >= 0x1000) {
> +                if (x >= 0x4000) {
> +                    xn = (ff_sqrt_tab[x >> 8]) + 1;
> +                } else {
> +                    xn = (ff_sqrt_tab[x >> 6] >> 1) + 1;
> +                }
> +            } else {
> +                if (x >= 0x400) {
> +                    xn = (ff_sqrt_tab[x >> 4] >> 2) + 1;
> +                } else {
> +                    xn = (ff_sqrt_tab[x >> 2] >> 3) + 1;
> +                }
> +            }
> +
> +            return ((xn * xn) > x) ? --xn : xn;
> +        } else {
> +            if (x >= 0) {
> +                return ff_sqrt_tab[x] >> 4;
> +            }
> +        }
>      }
> -    return ret;
> +
> +    return -1;

for a less idiotic and simpler variant i wrote a few years ago see
http://guru.multimedia.cx/fast-integer-square-root/

if anyone can simplify this further that would be welcome :)


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The worst form of inequality is to try to make unequal things equal.
-- Aristotle
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080113/356e4af8/attachment.pgp>



More information about the ffmpeg-devel mailing list