[FFmpeg-devel] [PATCH] Merge some computations in C code for VC-1 inverse transforms
Thu Jan 17 20:52:53 CET 2008
Michael Niedermayer wrote:
>> - adding the bias constant at the first butterflies hardly ever works
>> (unless there's register pressure from the looks of where it works best)
> reducing the number of operations only helps with compilers not compensating
> by adding nonsense ;)
Redoing the benchmarks, I now notice that adding this constant never
improves speed. To make sure, I switched back and forth several times on
the 8x8 and 8x4 functions.
For some reason, the previous test showed a tiny improvement, but
that doesn't seem to come from the constant adding.
The previous benchmark:
>> 8x4: 4926 4267
went back to about 4500, while with the current patch it's about 4400...
Anyway, the improvement I was measuring yesterday was only about 20 dezicycles...
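
For readers without the patch at hand, the "constant adding" variant can be
sketched as below. This is only an illustration under assumptions: the
4-point transform shape, the coefficients 17/22/10, the bias 4, the shift
by 3 and all function names are hypothetical and not taken from the patch.
It merely shows that folding the rounding bias into the first butterflies
yields the same outputs with two additions instead of four.

```c
#include <assert.h>

/* Hypothetical 4-point inverse transform row; bias added per output (4 adds). */
void inv4_bias_late(const int src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]);
    int t2 = 17 * (src[0] - src[2]);
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];

    dst[0] = (t1 + t3 + 4) >> 3;
    dst[1] = (t2 - t4 + 4) >> 3;
    dst[2] = (t2 + t4 + 4) >> 3;
    dst[3] = (t1 - t3 + 4) >> 3;
}

/* Same transform; bias folded into the first butterflies (2 adds). */
void inv4_bias_early(const int src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]) + 4;
    int t2 = 17 * (src[0] - src[2]) + 4;
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];

    dst[0] = (t1 + t3) >> 3;
    dst[1] = (t2 - t4) >> 3;
    dst[2] = (t2 + t4) >> 3;
    dst[3] = (t1 - t3) >> 3;
}

/* Returns 1 if both variants agree on every input in a small test range. */
int bias_variants_agree(void)
{
    int a, b, c, d, i;

    for (a = -16; a <= 16; a++)
        for (b = -16; b <= 16; b++)
            for (c = -16; c <= 16; c++)
                for (d = -16; d <= 16; d++) {
                    int src[4], late[4], early[4];
                    src[0] = a; src[1] = b; src[2] = c; src[3] = d;
                    inv4_bias_late(src, late);
                    inv4_bias_early(src, early);
                    for (i = 0; i < 4; i++)
                        if (late[i] != early[i])
                            return 0;
                }
    return 1;
}

/* Optional self-check for use in a test harness. */
void bias_self_check(void)
{
    assert(bias_variants_agree());
}
```

The two variants are arithmetically identical, so any speed difference comes
only from how the compiler schedules and allocates registers for them.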
> t3= 10*(src[ 8] + src);
> t4= 32*src - t3;
> t3+= 12*src[ 8];
> is faster?
> its 3 add, 2 mul, 1 shift vs. 2 add, 4 mul
At first glance it should have been, but this seems to cost 10-30
dezicycles more per loop.
Again, maybe it could be explained by checking the generated asm code, but
another CPU might see another result with the same code...
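
As a rough check of the arithmetic behind the quoted suggestion, the sketch
below compares the merged form with the direct form it expands to. The
inputs x and y stand in for the two source coefficients, and the function
names are hypothetical. The direct form (22*x + 10*y and 22*y - 10*x) is
simply what the quoted three lines work out to algebraically; it is not
copied from the patch.

```c
#include <assert.h>

/* Direct form: 4 multiplications, 2 additions/subtractions. */
void rot_direct(int x, int y, int *t3, int *t4)
{
    *t3 = 22 * x + 10 * y;
    *t4 = 22 * y - 10 * x;
}

/* Merged form as in the quoted suggestion:
 * 2 multiplications, 3 additions/subtractions, and 32*y as a shift. */
void rot_merged(int x, int y, int *t3, int *t4)
{
    int a = 10 * (x + y);   /* 10*x + 10*y                     */
    int b = 32 * y - a;     /* 22*y - 10*x                     */
    a += 12 * x;            /* 22*x + 10*y                     */
    *t3 = a;
    *t4 = b;
}

/* Returns 1 if both forms agree on every input in a small test range. */
int rot_forms_agree(void)
{
    int x, y;

    for (x = -1024; x <= 1024; x++)
        for (y = -1024; y <= 1024; y++) {
            int d3, d4, m3, m4;
            rot_direct(x, y, &d3, &d4);
            rot_merged(x, y, &m3, &m4);
            if (d3 != m3 || d4 != m4)
                return 0;
        }
    return 1;
}

/* Optional self-check for use in a test harness. */
void rot_self_check(void)
{
    assert(rot_forms_agree());
}
```

The merged form has a longer dependency chain (t4 and the final t3 both wait
on the first product), which is one plausible reason it can measure slower
despite the lower operation count; only the generated asm would confirm that.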