[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling

Michael Niedermayer michaelni
Wed Oct 8 04:30:16 CEST 2008


On Wed, Oct 08, 2008 at 04:45:52AM +0300, Uoti Urpala wrote:
> On Wed, 2008-10-08 at 02:55 +0200, Michael Niedermayer wrote:
> > On Tue, Oct 07, 2008 at 05:23:56PM -0400, Alex Converse wrote:
> > > Attached is a version that explicitly uses an int32_t.
> > 
> > I will object to this patch until someone posts a speed comparison
> > between it and ff_sqrt.
> 
> I tested a simple loop doing "sum += 1/sqrtf(i)" on core2. As expected,
> "1./ff_sqrt(i)" is the slowest way to calculate that. Standard
> "1./sqrtf(i)" is equally fast with default flags and somewhat faster
> with -ffast-math (and has better accuracy). The code from Alex is about
> twice as fast.

On a Pentium Dual @ 1.73GHz:

ff_sqrt() is, as expected, much faster than sqrtf() once the timings settle
(about 443M vs. 664M dezicycles after 64 runs). I am rather surprised by
your results; maybe you could post your test code?

942848790 dezicycles in ff_sqrt, 1 runs, 0 skips
820105780 dezicycles in sqrtf, 1 runs, 0 skips
320925930 dezicycles in sqrt alex, 1 runs, 0 skips
689735345 dezicycles in ff_sqrt, 2 runs, 0 skips
740571455 dezicycles in sqrtf, 2 runs, 0 skips
322195770 dezicycles in sqrt alex, 2 runs, 0 skips
562373695 dezicycles in ff_sqrt, 4 runs, 0 skips
700777317 dezicycles in sqrtf, 4 runs, 0 skips
322461587 dezicycles in sqrt alex, 4 runs, 0 skips
498531962 dezicycles in ff_sqrt, 8 runs, 0 skips
681025767 dezicycles in sqrtf, 8 runs, 0 skips
323032856 dezicycles in sqrt alex, 8 runs, 0 skips
467081127 dezicycles in ff_sqrt, 16 runs, 0 skips
671043717 dezicycles in sqrtf, 16 runs, 0 skips
322701851 dezicycles in sqrt alex, 16 runs, 0 skips
450925613 dezicycles in ff_sqrt, 32 runs, 0 skips
666131574 dezicycles in sqrtf, 32 runs, 0 skips
322542049 dezicycles in sqrt alex, 32 runs, 0 skips
443217610 dezicycles in ff_sqrt, 64 runs, 0 skips
664184340 dezicycles in sqrtf, 64 runs, 0 skips
322520999 dezicycles in sqrt alex, 64 runs, 0 skips
3318.429688 3318.004883 3314.035156

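One can also see here that Alex's code is about a factor of 10 less accurate:
in the last line of the output, its sum differs from the sqrtf sum by about
3.97, versus about 0.42 for ff_sqrt. Also, one has to keep in mind that these
are synthetic tests, and we really should be testing with the AAC code.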

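#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>
#include "libavutil/common.h"
#include "libavutil/internal.h"
#include "libavutil/log.h"

/* Alex's approximation of 1/sqrt(x): bit-level initial guess followed by
 * one Newton-Raphson step. */
static float inv_sqrtf(float x) {
    union { float f; int32_t i; } pun;
    float xhalf = 0.5f*x;
    pun.f = x;
    pun.i = 0x5f3759df - (pun.i>>1);
    x = pun.f;
    x = x*(1.5f-xhalf*x*x);
    return x;
}

#define N (1000000000)
#undef printf /* the libavutil headers redefine printf to discourage its use */
int main(void){
        int i, j;
        float sum0=0; /* 1./ff_sqrt(i)  */
        float sum1=0; /* 1./sqrtf(i)    */
        float sum2=0; /* inv_sqrtf(i)   */
        /* START_TIMER/STOP_TIMER come from the libavutil headers above;
         * each block times one full pass over the same inputs. */
        for(j=0; j<100; j++){
                {START_TIMER
                for(i=1; i<N; i+=1000){
                        sum0 += 1./ff_sqrt(i);
                }
                STOP_TIMER("ff_sqrt")}
                {START_TIMER
                for(i=1; i<N; i+=1000){
                        sum1 += 1./sqrtf(i);
                }
                STOP_TIMER("sqrtf")}
                {START_TIMER
                for(i=1; i<N; i+=1000){
                        sum2 += inv_sqrtf(i);
                }
                STOP_TIMER("sqrt alex")}
        }
        /* Printing the sums keeps the loops from being optimized away
         * and gives a rough accuracy comparison. */
        printf("%f %f %f\n", sum0, sum1, sum2);
        return 0;
}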
Compiled with -O3 -ffast-math.
---------

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

> ... defining _GNU_SOURCE...
For the love of all that is holy, and some that is not, don't do that.
-- Luca & Mans