[Ffmpeg-devel] yet another silly int vs. float benchmark

Måns Rullgård mru
Sat May 21 20:44:56 CEST 2005


Michael Niedermayer <michaelni at gmx.at> writes:

> hmm, try to set fv[] to 0 instead of 1, maybe it overflows

Indeed.  New numbers with GCC:

100 ; needed     3 cycles ->     3 cycles per operation
100 iv[0]+=iv[1];iv[1]+=iv[0]; needed   206 cycles ->   103 cycles per operation
100 iv[0]*=iv[1];iv[1]*=iv[0]; needed  1804 cycles ->   902 cycles per operation
100 fv[0]+=fv[1];fv[1]+=fv[0]; needed   804 cycles ->   402 cycles per operation
100 fv[0]*=fv[1];fv[1]*=fv[0]; needed   804 cycles ->   402 cycles per operation
100 iv[0]+=iv[1];iv[1]+=iv[2];iv[2]+=iv[3];iv[3]+=iv[4];iv[4]+=iv[5]; needed   261 cycles ->    52 cycles per operation
100 iv[0]*=iv[1];iv[1]*=iv[2];iv[2]*=iv[3];iv[3]*=iv[4];iv[4]*=iv[5]; needed  2010 cycles ->   402 cycles per operation
100 fv[0]+=fv[1];fv[1]+=fv[2];fv[2]+=fv[3];fv[3]+=fv[4];fv[4]+=fv[5]; needed   511 cycles ->   102 cycles per operation
100 fv[0]*=fv[1];fv[1]*=fv[2];fv[2]*=fv[3];fv[3]*=fv[4];fv[4]*=fv[5]; needed   511 cycles ->   102 cycles per operation
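Michael's overflow guess is easy to check with a small simulation. Assume fv[] holds single-precision floats; the post does not show the declaration. Under that assumption, the two-element add chain started from 1 walks through successive Fibonacci numbers and passes FLT_MAX in fewer than 100 rounds. Started from 0, it stays at 0. On some processors, arithmetic on infinities or denormals takes a slow path, which is one plausible (unconfirmed) way an overflowing chain would distort the timings. The helper name `add_chain` is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: runs `reps` rounds of the two-element float add
 * chain from the benchmark (fv[0]+=fv[1]; fv[1]+=fv[0];), with both
 * elements starting at `init`, and returns the final fv[0].
 * Assumption: fv[] is float; the original declaration is not shown. */
static float add_chain(float init, int reps)
{
    float fv[2] = { init, init };

    assert(reps >= 0);
    for (int i = 0; i < reps; i++) {
        fv[0] += fv[1];   /* odd-indexed Fibonacci number when init == 1 */
        fv[1] += fv[0];   /* even-indexed Fibonacci number when init == 1 */
    }
    return fv[0];
}
```

For example, `add_chain(1.0f, 2)` returns 5.0f. `add_chain(1.0f, 90)` is still finite, while `add_chain(1.0f, 100)` is infinite. `add_chain(0.0f, 100)` stays 0.0f. A chain of multiplications started from 1 never grows, so only the add chain is affected.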

With CCC:

100 ; needed     3 cycles ->     3 cycles per operation
100 iv[0]+=iv[1];iv[1]+=iv[0]; needed   204 cycles ->   102 cycles per operation
100 iv[0]*=iv[1];iv[1]*=iv[0]; needed  1801 cycles ->   900 cycles per operation
100 fv[0]+=fv[1];fv[1]+=fv[0]; needed   746 cycles ->   373 cycles per operation
100 fv[0]*=fv[1];fv[1]*=fv[0]; needed   685 cycles ->   342 cycles per operation
100 iv[0]+=iv[1];iv[1]+=iv[2];iv[2]+=iv[3];iv[3]+=iv[4];iv[4]+=iv[5]; needed   259 cycles ->    51 cycles per operation
100 iv[0]*=iv[1];iv[1]*=iv[2];iv[2]*=iv[3];iv[3]*=iv[4];iv[4]*=iv[5]; needed  2070 cycles ->   414 cycles per operation
100 fv[0]+=fv[1];fv[1]+=fv[2];fv[2]+=fv[3];fv[3]+=fv[4];fv[4]+=fv[5]; needed    10 cycles ->     2 cycles per operation
100 fv[0]*=fv[1];fv[1]*=fv[2];fv[2]*=fv[3];fv[3]*=fv[4];fv[4]*=fv[5]; needed    10 cycles ->     2 cycles per operation

Those last numbers are bogus: the compiler optimized the entire loop
into a few stores.
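One common way to stop an optimizer from collapsing such a loop is to hide the starting values behind a volatile load and publish the result through a volatile store. The compiler then can neither precompute the chain nor discard it. This is not necessarily what the author did. Below is a minimal sketch of the six-element add chain from the table above. `chain6`, `opaque` and `sink` are hypothetical names, and float is an assumed element type.

```c
#include <assert.h>

/* Hypothetical globals: volatile accesses must really happen, and the
 * compiler may not assume what a volatile load returns. */
static volatile float opaque;   /* hides the start value from the optimizer */
static volatile float sink;     /* forces the result to be computed */

/* Runs `reps` rounds of fv[0]+=fv[1]; ... fv[4]+=fv[5]; with all six
 * elements starting at `init`, and returns the sum of the elements. */
static float chain6(float init, int reps)
{
    float fv[6];

    assert(reps >= 0);
    opaque = init;
    for (int i = 0; i < 6; i++)
        fv[i] = opaque;         /* volatile load: value unknown at compile time */

    for (int r = 0; r < reps; r++) {
        fv[0] += fv[1];
        fv[1] += fv[2];
        fv[2] += fv[3];
        fv[3] += fv[4];
        fv[4] += fv[5];
    }

    float sum = fv[0] + fv[1] + fv[2] + fv[3] + fv[4] + fv[5];
    sink = sum;                 /* volatile store: the result escapes */
    return sum;
}
```

For example, `chain6(1.0f, 2)` returns 20.0f. `chain6(0.0f, 100)` returns 0.0f, but the loop still has to execute, because the compiler cannot know that the loaded start value is zero.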

Changing the last statement of each chain to read element 0 instead of
element 5, I get more realistic figures:

100 iv[0]+=iv[1];iv[1]+=iv[2];iv[2]+=iv[3];iv[3]+=iv[4];iv[4]+=iv[0]; needed   262 cycles ->    52 cycles per operation
100 iv[0]*=iv[1];iv[1]*=iv[2];iv[2]*=iv[3];iv[3]*=iv[4];iv[4]*=iv[0]; needed  2138 cycles ->   427 cycles per operation
100 fv[0]+=fv[1];fv[1]+=fv[2];fv[2]+=fv[3];fv[3]+=fv[4];fv[4]+=fv[0]; needed   462 cycles ->    92 cycles per operation
100 fv[0]*=fv[1];fv[1]*=fv[2];fv[2]*=fv[3];fv[3]*=fv[4];fv[4]*=fv[0]; needed   446 cycles ->    89 cycles per operation
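In all three tables, the per-operation column matches the total divided by the number of statements in the snippet, with truncation. For example, 1801/2 is printed as 900, 685/2 as 342 and 2138/5 as 427. The minimal sketch below reproduces that column. It assumes the statement count is the number of semicolons in the printed snippet, which also counts the empty `;` baseline as one. `count_statements` and `cycles_per_op` are hypothetical names, not taken from the benchmark source.

```c
#include <assert.h>

/* Hypothetical helper: statements in a benchmark snippet, counted as the
 * number of ';' characters (the empty ";" baseline counts as one). */
static int count_statements(const char *snippet)
{
    int n = 0;

    for (const char *p = snippet; *p; p++)
        if (*p == ';')
            n++;
    return n;
}

/* Cycles per operation as printed in the tables: integer division, so
 * any fractional part is truncated. */
static int cycles_per_op(int total_cycles, const char *snippet)
{
    int n = count_statements(snippet);

    assert(n > 0);
    return total_cycles / n;
}
```

For example, `cycles_per_op(2138, "iv[0]*=iv[1];iv[1]*=iv[2];iv[2]*=iv[3];iv[3]*=iv[4];iv[4]*=iv[0];")` returns 427, matching the CCC ring-chain line.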

-- 
Måns Rullgård
mru at inprovide.com
