[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling
Michael Niedermayer
michaelni
Wed Oct 8 14:11:17 CEST 2008
On Wed, Oct 08, 2008 at 07:02:22AM +0300, Uoti Urpala wrote:
> On Wed, 2008-10-08 at 04:30 +0200, Michael Niedermayer wrote:
> > On Wed, Oct 08, 2008 at 04:45:52AM +0300, Uoti Urpala wrote:
> > > I tested a simple loop doing "sum += 1/sqrtf(i)" on core2. As expected,
> > > "1./ff_sqrt(i)" is the slowest way to calculate that. Standard
> > > "1./sqrtf(i)" is equally fast with default flags and somewhat faster
> > > with -ffast-math (and has better accuracy). The code from Alex is about
> > > twice as fast.
> >
> > on a Pentium Dual @ 1.73GHz
> >
> > ff_sqrt() is, as expected, much faster than sqrtf(). I am rather surprised
> > by your results; maybe you could post your test code?
>
> > 443217610 dezicycles in ff_sqrt, 64 runs, 0 skips
> > 664184340 dezicycles in sqrtf, 64 runs, 0 skips
> > 322520999 dezicycles in sqrt alex, 64 runs, 0 skips
> > 3318.429688 3318.004883 3314.035156
> >
> > One can also see here that Alex's code is about a factor of 10 less accurate.
> > Also, one has to keep in mind that these are synthetic tests and we really
> > should be testing with the AAC code.
>
> Note that the final results here are all completely wrong; a float runs
> out of precision with 100 repetitions of the loop and stops accumulating
> the smaller values. The overall accuracy variation with double sums is
> still similar, but starting each loop from 2 rather than 1 would give a
> lot worse result for ff_sqrt (it calculates 1/sqrt(1) exactly, but
> returning 1 for 1/sqrt(2) is a big error).
As I said, we should be using the AAC code, not such synthetic tests ...
In reality the value would be taken from the sum of squares of a PRNG,
so going from N/2 to N is likely more realistic.
Besides, of course, we aren't summing 1/sqrt() values in AAC ...
With N/2..N in 1000 steps I get
1852.454717 1852.420020 1850.734579
which increases the gap in accuracy from a factor of 10 to a factor of 50.
>
> I benchmarked your code with both float and double sum variables. Result
> with floats:
>
> 392916330 dezicycles in ff_sqrt, 1 runs, 0 skips
> 381062230 dezicycles in sqrtf, 1 runs, 0 skips
> 322693660 dezicycles in sqrt alex, 1 runs, 0 skips
> 393098055 dezicycles in ff_sqrt, 2 runs, 0 skips
> 380917730 dezicycles in sqrtf, 2 runs, 0 skips
> 322906540 dezicycles in sqrt alex, 2 runs, 0 skips
> 393145080 dezicycles in ff_sqrt, 4 runs, 0 skips
> 380681745 dezicycles in sqrtf, 4 runs, 0 skips
> 322720327 dezicycles in sqrt alex, 4 runs, 0 skips
> 393035408 dezicycles in ff_sqrt, 8 runs, 0 skips
> 380493788 dezicycles in sqrtf, 8 runs, 0 skips
> 322382240 dezicycles in sqrt alex, 8 runs, 0 skips
> 393197536 dezicycles in ff_sqrt, 16 runs, 0 skips
> 380411030 dezicycles in sqrtf, 16 runs, 0 skips
> 322413626 dezicycles in sqrt alex, 16 runs, 0 skips
> 393182715 dezicycles in ff_sqrt, 32 runs, 0 skips
> 380354178 dezicycles in sqrtf, 32 runs, 0 skips
> 322441543 dezicycles in sqrt alex, 32 runs, 0 skips
> 393194414 dezicycles in ff_sqrt, 64 runs, 0 skips
> 380323271 dezicycles in sqrtf, 64 runs, 0 skips
> 322219995 dezicycles in sqrt alex, 64 runs, 0 skips
> 3318.429688 3318.004883 3314.035156
>
> Result with double sum variables (but still sqrtf, not sqrt):
>
> 312860180 dezicycles in ff_sqrt, 1 runs, 0 skips
> 130667270 dezicycles in sqrtf, 1 runs, 0 skips
> 327645670 dezicycles in sqrt alex, 1 runs, 0 skips
> 313082415 dezicycles in ff_sqrt, 2 runs, 0 skips
> 130357360 dezicycles in sqrtf, 2 runs, 0 skips
> 327654430 dezicycles in sqrt alex, 2 runs, 0 skips
> 312881092 dezicycles in ff_sqrt, 4 runs, 0 skips
> 130205297 dezicycles in sqrtf, 4 runs, 0 skips
> 327785202 dezicycles in sqrt alex, 4 runs, 0 skips
> 312870742 dezicycles in ff_sqrt, 8 runs, 0 skips
> 130198391 dezicycles in sqrtf, 8 runs, 0 skips
> 327706620 dezicycles in sqrt alex, 8 runs, 0 skips
> 312766372 dezicycles in ff_sqrt, 16 runs, 0 skips
> 130203895 dezicycles in sqrtf, 16 runs, 0 skips
> 327761492 dezicycles in sqrt alex, 16 runs, 0 skips
> 312736582 dezicycles in ff_sqrt, 32 runs, 0 skips
> 130194519 dezicycles in sqrtf, 32 runs, 0 skips
> 327788920 dezicycles in sqrt alex, 32 runs, 0 skips
> 312743900 dezicycles in ff_sqrt, 64 runs, 0 skips
> 130205690 dezicycles in sqrtf, 64 runs, 0 skips
> 327773023 dezicycles in sqrt alex, 64 runs, 0 skips
> 6420.692419 6419.931566 6413.858814
Double sums here with gcc-4.3:
342282374 dezicycles in ff_sqrt, 64 runs, 0 skips
229988839 dezicycles in sqrtf, 64 runs, 0 skips
330980962 dezicycles in sqrt alex, 64 runs, 0 skips
6420.692419 6419.931566 6413.858814
Double sums here with gcc-4.2:
419079023 dezicycles in ff_sqrt, 64 runs, 0 skips
731808106 dezicycles in sqrtf, 64 runs, 0 skips
314473386 dezicycles in sqrt alex, 64 runs, 0 skips
6420.692419 6419.931566 6413.858814
Double sums here with gcc-3.4:
374843929 dezicycles in ff_sqrt, 64 runs, 0 skips
596627904 dezicycles in sqrtf, 64 runs, 0 skips
132644535 dezicycles in sqrt alex, 64 runs, 0 skips
6420.692419 6419.931566 6413.858813
Double sums here with gcc-3.3:
354109591 dezicycles in ff_sqrt, 64 runs, 0 skips
606486272 dezicycles in sqrtf, 64 runs, 0 skips
119127703 dezicycles in sqrt alex, 64 runs, 0 skips
6420.692419 6419.931566 6413.858813
So it seems the gcc version makes a very significant difference as well.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable