[Ffmpeg-devel] VP3/Theora Perfection

The Wanderer inverseparadox
Thu May 19 13:04:46 CEST 2005


Michael Niedermayer wrote:

> Hi
> 
> On Thursday 19 May 2005 04:47, Mike Melanson wrote:
> 
>> Hi, I have replaced unpack_token() with a series of lookup tables
>> in vp3.c. Now vp3data.h has more lines than vp3.c. Again, please
>> test as I do not have great testing facilities right now. However,
>> I did run a series of tests that validated a bunch of decoded
>> tokens against the old function.
>> 
>> Numbers for the speed freaks:
>> 
>> [original]
>> 1223 dezicycles in unpack_token, 32757 runs, 11 skips
>> 1202 dezicycles in unpack_token, 65512 runs, 24 skips
>> [new]
>> 845 dezicycles in unpack_token, 32735 runs, 33 skips
>> 841 dezicycles in unpack_token, 65466 runs, 70 skips
> 
> well, not here, after a cvs up unpack_dct_coeffs (which includes the
> unpack_token()) speed droped by 20%, to exclude possible effects of
> local changes i tried on a clean tree
> 
> [original]
> 47208165 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 46909636 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 47450793 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 
> [new]
> 43178650 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 42991589 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 43081780 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips

Am I reading these wrong? It looks to me like the original spends about
9.3% more time in unpack_dct_coeffs than the new version does (that's
(47208165/43178650) - 1 ~= .0933). I'm assuming that the "new" version
is the one which was just committed, i.e. the one which you are saying
is slower; if it takes fewer dezicycles, I'm not sure how that doesn't
mean it's faster instead. (If this assumption is invalid, I'd be
interested to know how it makes sense to label the different versions
that way...)

Similarly, with your different-cflags version:

> [original]
> 41514189 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 41710143 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 41758835 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 
> [new]
> 43992551 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 44276594 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
> 43972657 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips

Here it looks to me like the "original" version spends about 5.6% *less*
time in unpack_dct_coeffs than the "new" one does (that's
1 - (41514189 / 43992551) ~= .056), i.e., the "new" one is slower. Like
the above, this is exactly the reverse of what you're saying; is my
brain just totally screwed up here, or is something else going on?
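
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. It uses only the timings quoted above; the names (relative_change,
default_cflags, other_cflags) are mine and not anything from ffmpeg, and I
am assuming that fewer dezicycles means less time spent in the function.

```python
# Timings copied from the quoted messages: dezicycles reported for
# unpack_dct_coeffs, three measurements per build.
default_cflags = {
    "original": [47208165, 46909636, 47450793],
    "new": [43178650, 42991589, 43081780],
}
other_cflags = {
    "original": [41514189, 41710143, 41758835],
    "new": [43992551, 44276594, 43972657],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def relative_change(old, new):
    """Fractional change going from old to new.

    Negative means the new figure is smaller (assumed here: faster).
    """
    return (new - old) / old


for label, runs in (("default cflags", default_cflags),
                    ("different cflags", other_cflags)):
    old_mean = mean(runs["original"])
    new_mean = mean(runs["new"])
    change = relative_change(old_mean, new_mean)
    verdict = "faster" if change < 0 else "slower"
    print(f"{label}: new is {abs(change):.1%} {verdict} than original")
```

Averaging the three measurements rather than taking only the first line
of each set shifts the percentages a little, but not the direction.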

-- 
       The Wanderer

Warning: Simply because I argue an issue does not mean I agree with any
side of it.

A government exists to serve its citizens, not to control them.




