[FFmpeg-devel] [PATCH] 1D DCT for dsputil

Vitor Sessak vitor1001
Wed Jan 20 01:40:17 CET 2010


Michael Niedermayer wrote:
> On Mon, Jan 18, 2010 at 11:49:03PM -0500, Vitor Sessak wrote:
>> Vitor Sessak wrote:
>>> Loren Merritt wrote:
>>>> On Mon, 18 Jan 2010, Vitor Sessak wrote:
>>>>
>>>>> + data[i    ] =   COS(s,n,i) * val1 + SIN(s,n,i) * val2;
>>>>> + data[i + 1] =   SIN(s,n,i) * val1 - COS(s,n,i) * val2;
>>>> data aliases costab, so the SIN/COS loads will be duplicated.
>>> Done.
>>>>> + float tmp1 = data[i        ] * (1./n);
>>>>> + float tmp2 = data[n - i - 1] * (1./n);
>>>>> + float sin1 = 0.5/SIN(s,n,2*i+1);
>>>> division?
>>> I don't see how it is avoidable, I've tried a LUT and it is slower.
>> I made a stupid mistake that skewed the benchmarks. A LUT is actually
>> faster. New patch attached.
>>
> [...]
>> +static void ff_dct_calc_c(DCTContext *ctx, FFTSample *data)
>> +{
>> +    int n = 1 << ctx->nbits;
>> +    int i;
>> +
>> +    if (ctx->inverse) {
>> +        float next = data[n - 1];
>> +
>> +        for (i = n - 2; i >= 2; i -= 2) {
>> +            float val1 = data[i    ];
>> +            float val2 = data[i - 1] - data[i + 1];
>> +            float c = COS(ctx, n, i);
>> +            float s = SIN(ctx, n, i);
>> +
>> +            data[i    ] = c * val1 + s * val2;
>> +            data[i + 1] = s * val1 - c * val2;
>> +        }
>> +
>> +        data[1] = 2 * next;
>> +
>> +        ff_rdft_calc(&ctx->rdft, data);
>> +
>> +        for (i = 0; i < n / 2; i++) {
>> +            float tmp1 = data[i        ] * (1. / n);
>> +            float tmp2 = data[n - i - 1] * (1. / n);
> 
> float f= (1. / n); prior to the loop would make sure the compiler
> does nothing silly
> 
> 
>> +            float csc = ctx->csc2[i];
>> +
>> +            data[i        ] = tmp1 + tmp2 + csc * (tmp1 - tmp2);
>> +            data[n - i - 1] = tmp1 + tmp2 - csc * (tmp1 - tmp2);
> 
> do we trust the compiler that much?
> 
> float csc = ctx->csc2[i] * (tmp1 - tmp2);
> tmp1+=tmp2;
> data[i        ] = tmp1 + csc;
> data[n - i - 1] = tmp1 - csc;
> 
> (if both are the same speed pick what you prefer)
> also the patch is pretty much ok, commit if there are no other comments

Crazy stuff, both changes made a speed difference! Compilers are indeed 
not to be trusted.
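
For the archives, here is a minimal standalone sketch of the two forms of
the post-processing loop discussed above, so they can be compared or
benchmarked outside the tree. The function names (postprocess_orig,
postprocess_hoisted, max_abs_diff) and the bare csc2/n parameters are
made up for illustration; in the patch the table lives in ctx->csc2 and
n comes from ctx->nbits. This is not the committed code, only the loop
bodies quoted in this thread wrapped into functions.

```c
#include <math.h>

/* Loop as first posted: the 1/n scale and the sum/difference of the
 * two samples are spelled out in every expression. */
static void postprocess_orig(float *data, const float *csc2, int n)
{
    for (int i = 0; i < n / 2; i++) {
        float tmp1 = data[i        ] * (1. / n);
        float tmp2 = data[n - i - 1] * (1. / n);
        float csc  = csc2[i];

        data[i        ] = tmp1 + tmp2 + csc * (tmp1 - tmp2);
        data[n - i - 1] = tmp1 + tmp2 - csc * (tmp1 - tmp2);
    }
}

/* Loop with both review suggestions applied: the scale factor is
 * computed once as a float before the loop, and the shared product
 * and sum are each computed once per iteration. */
static void postprocess_hoisted(float *data, const float *csc2, int n)
{
    float f = 1. / n;

    for (int i = 0; i < n / 2; i++) {
        float tmp1 = data[i        ] * f;
        float tmp2 = data[n - i - 1] * f;
        float csc  = csc2[i] * (tmp1 - tmp2);

        tmp1 += tmp2;
        data[i        ] = tmp1 + csc;
        data[n - i - 1] = tmp1 - csc;
    }
}

/* Helper for comparing the outputs of the two variants. */
static float max_abs_diff(const float *a, const float *b, int n)
{
    float m = 0.0f;

    for (int i = 0; i < n; i++) {
        float d = fabsf(a[i] - b[i]);
        if (d > m)
            m = d;
    }
    return m;
}
```

Both variants should agree to within float rounding. Which one is faster
depends on the compiler and flags, so measure rather than assume.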

Committed.

-Vitor


