[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling

Uoti Urpala uoti.urpala
Wed Oct 8 07:40:25 CEST 2008


On Wed, 2008-10-08 at 07:02 +0300, Uoti Urpala wrote:
> 312743900 dezicycles in ff_sqrt, 64 runs, 0 skips
> 130205690 dezicycles in sqrtf, 64 runs, 0 skips
> 327773023 dezicycles in sqrt alex, 64 runs, 0 skips
> 6420.692419 6419.931566 6413.858814

For fun I also added the following vectorized version, which uses
rsqrtps through a GCC builtin:

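        START_TIMER
        /* GCC vector extension: four packed floats in one SSE register */
        typedef float v4sf __attribute__((vector_size(16)));
        v4sf sum = {0, 0, 0, 0};
        /* lanes start at 1, 1001, 2001, 3001 and each advances by 4000,
           so together they visit 1, 1001, 2001, ... in steps of 1000 */
        v4sf cnt = {1, 1001, 2001, 3001};
        v4sf add = {4000, 4000, 4000, 4000};
        for (i = 1; i < N; i += 4000) {
            /* rsqrtps: approximate 1/sqrt of all four lanes at once */
            sum += __builtin_ia32_rsqrtps(cnt);
            cnt += add;
        }
        /* horizontal add of the four lanes into the scalar sum3 */
        for (i = 0; i < 4; i++)
            sum3 += ((float *)&sum)[i];
        /* i, N and sum3 come from the surrounding benchmark (not shown);
           the final } closes a block opened there */
        STOP_TIMER("rsqrtps")}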

With this, the results are as follows (the other versions were built
with -ffast-math, which makes no difference for the rsqrtps one):

311880190 dezicycles in ff_sqrt, 64 runs, 0 skips
130455941 dezicycles in sqrtf, 64 runs, 0 skips
326274114 dezicycles in sqrt alex, 64 runs, 0 skips
 10004567 dezicycles in rsqrtps, 64 runs, 0 skips
6420.692419 6419.931566 6413.858814 6421.396637

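rsqrtps returns only an "approximate" value. For the sum starting at 1
its accuracy lies between ff_sqrt and sqrt_alex: relative to the sqrtf
sum it is off by about 1.5, versus about 0.8 for ff_sqrt and 6.1 for
sqrt_alex. But it is of course an order of magnitude faster than
anything else: roughly 13 times faster than sqrtf and 31 times faster
than ff_sqrt in the timings above.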