[FFmpeg-devel] [PATCHv2] lavc/cbrt_tablegen: speed up tablegen

Ganesh Ajjanagadde gajjanag at mit.edu
Tue Jan 5 19:46:52 CET 2016


On Tue, Jan 5, 2016 at 10:10 AM, Daniel Serpell <dserpell at gmail.com> wrote:
> Hi!,
>
> El Tue, Jan 05, 2016 at 08:08:35AM -0800, Ganesh Ajjanagadde escribio:
>> On Tue, Jan 5, 2016 at 7:44 AM, Daniel Serpell <dserpell at gmail.com> wrote:
>> > Hi!,
>> >
>> > El Mon, Jan 04, 2016 at 06:33:59PM -0800, Ganesh Ajjanagadde escribio:
>> >> This exploits an approach based on the sieve of Eratosthenes, a popular
>> >> method for generating prime numbers.
>> >>
>> >> Tables are identical to previous ones.
>> >>
>> >> Tested with FATE with/without --enable-hardcoded-tables.
>> >>
>> >> Sample benchmark (Haswell, GNU/Linux+gcc):
>> >> prev:
>> >> 7860100 decicycles in cbrt_tableinit,       1 runs,      0 skips
>> >> 7777490 decicycles in cbrt_tableinit,       2 runs,      0 skips
>> >> [...]
>> >> 7582339 decicycles in cbrt_tableinit,     256 runs,      0 skips
>> >> 7563556 decicycles in cbrt_tableinit,     512 runs,      0 skips
>> >>
>> >> new:
>> >> 2099480 decicycles in cbrt_tableinit,       1 runs,      0 skips
>> >> 2044470 decicycles in cbrt_tableinit,       2 runs,      0 skips
>> >> [...]
>> >> 1796544 decicycles in cbrt_tableinit,     256 runs,      0 skips
>> >> 1791631 decicycles in cbrt_tableinit,     512 runs,      0 skips
>> >>
>> >
>> > See attached code, function "test1", based on an approximation of:
>> >
>> >   (i+1)^(1/3) ~= i^(1/3) * ( 1 + 1/(3i) - 1/(9i) + 5/(81i) - .... )
>>
>> I assume 1/(3i), 1/(9i^2), etc obtained via a Taylor series at x = 0.
>>
>
> Yes, more specifically (in wxmaxima):
>
> (%i1) at( taylor((i+x)^(1/3)/i^(1/3), x, 0, 6) , x=1 );
>
>                1     1       5       10       22       154
> (%o1)     1 + --- - ---- + ----- - ------ + ------ - -------
>               3 i      2       3        4        5         6
>                     9 i    81 i    243 i    729 i    6561 i
>
>
> I noticed that if you do a change of variable, the series simplifies:
>
> (%i2) at( taylor((i+x)^(1/3)/i^(1/3), x, 0, 6) , [ x=1, i = (1/3)/j ] );
>
>                         6        5       4      3
>                    154 j     22 j    10 j    5 j     2
> (%o2)           (- ------) + ----- - ----- + ---- - j  + j + 1
>                      9         3       3      3
>
>> >
>> > Generated values are the same as original floats (max error in double
>> > is < 4*10^-10), it is faster (and I think, simpler) than your version.
>>
>> Had thought of these ideas, but did not examine as I was a little
>> concerned about accuracy. Thanks, will give it a spin. Or
>> alternatively, you can submit a patch since you put it into action.
>>
>
> Best if you make the patch, as you can test the speed in your same setup.

Ok, I will still make you the author though.

>
>> Alternatively, one could directly expand the series for (i+1)^(4/3).
>
> Yes, but the first two coefficients are not 1 anymore, so it needs one
> more multiplication, canceling the advantage:
>
> (%i3) at( taylor((i+x)^(4/3)/i^(4/3), x, 0, 6) , [ x=1, i = (4/3)/j ] );
>
>                          6    5       4    3    2
>                      11 j    j     5 j    j    j
> (%o3)                ----- - --- + ---- - -- + -- + j + 1
>                      9216    384   768    48   8

Ok, thanks.

>
>
>> And it may be possible to tighten the number of terms needed by
>> expanding not about x = 0, but x = i to get i+1. Or fancier polynomial
>> approximations can be used. Have you tried these?
>
> I think that this is what I'm already doing. As I said, the slowest part
> is the first division ( r = (1.0/3.0) / i ), and I can't think of any way
> to avoid it.

I meant Chebyshev approximation obtained e.g via Remez; it may allow
shaving the degree by one, but I can't think of a simple way to get
rid of the divide. Also, coefficients will likely not be 1 anymore for
leading terms.

>
> I altered the last constants to reduce the error, but stopped trying after
> getting better than 10^-10 absolute error, that is higher than the
> precision of a float.

This is the main thing I slightly did not like: the perturbations are
not documented and they were done in an ad-hoc way. Overall though,
good work! Will do some more testing and post a patch tonight.

>
>     Daniel.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


More information about the ffmpeg-devel mailing list