[FFmpeg-devel] [PATCH] Chinese AVS encoder

Thu Jul 26 07:34:26 CEST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Niedermayer wrote:
> Hi
> 
> On Wed, Jul 25, 2007 at 07:50:17AM +0200, Stefan Gehrer wrote:
> [...]
>>> [...]
>>>> +/**
>>>> + * eliminate residual blocks that only have insignificant coefficients,
>>>> + * inspired from x264 and JVT-B118
>>>> + */
>>>> +static inline int decimate_block(uint8_t *run, DCTELEM *level, int count) {
>>>> +    static const uint8_t run_score[30] = {
>>>> +        0,3,3,3,3,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0 };
>>>> +    int i;
>>>> +    int score = 0;
>>>> +
>>>> +    if(count>4)
>>>> +        return 9;
>>>> +    for(i=0;i<count;i++) {
>>>> +        int abslevel = FFABS(level[i]);
>>>> +        if(abslevel > 1)
>>>> +            return 9;
>>>> +        score += run_score[FFMIN(run[i],29)];
>>>> +    }
>>>> +    return score;
>>>> +}
>>> how much psnr/bitrate is gained by this? if none then please drop it
>> I tested on the popular foreman (300 CIF frames). When encoding at
>> around 800kbps, roughly 0.1dB are gained by this. When encoding at
>> around 200kbps, this increases to a gain of around 0.15dB.
>> So I would like to keep it.
> 
> ok, but i think there is something wrong elsewhere if there is such a
> large gain from this ... 
> maybe a simple change to the bias would help? or maybe there is something
> else wrong with the quantization code ...
> 

why? It seems clear to me that coding a macroblock as skipped is so much
cheaper than coding type/cbp/mv/coefficients that it might be worth
throwing some real information away. By real I mean coefficients that
come out of a fully correct quantization code, though I am not claiming
my code to be that.

> 
> [...]
>>>> +    for(i=0;i<15;i++)
>>>> +        if((ff_frame_rate_tab[i].den == frame_rate.den) &&
>>>> +           (ff_frame_rate_tab[i].num == frame_rate.num))
>>>> +            frame_rate_code = i;
>>>> +    if(frame_rate_code < 0) {
>>>> +        av_log(h->s.avctx, AV_LOG_ERROR, "unsupported framerate %d/%d\n",
>>>> +               frame_rate.num, frame_rate.den);
>>>> +        return -1;
>>>> +    }
>>>
>>> [...]
>>>> +        put_bits(&s->pb,16,0);
>>>> +        put_bits(&s->pb,16,CAVS_START_CODE);
>>> add a put_bits_long() to bitstrea.c, document the issue with 32bits and
>>> fix vorbis_enc.c :)
>> I looked a bit into this and I think a better way would be to fix
>> put_bits() to support up to 32 bits instead of up to 31 bits.
>> I am not so sure, but after a bit of checking in vorbis_enc.c it
>> seems there is never any writing with more than 32 bits even
>> though the local put_bits() function takes a 64bit argument.
> 
> argh, ive not noticed that vorbis is using a private put_bits
> iam starting to hate vorbis_enc.c ...
> 
> 
>> Attached a new encoder patch 
> 
>> and a proposed fix for put_bits().
> 
> the fix to put_bits is rejected due to a likely huge slowdown
> 

It is in the else branch and only executed every 32 bits of bitstream
writing, so I don't think the slowdown will be "huge". But if it's
not acceptable I will think about adding a put_bits_long again.

> 
> [...]
>>  /*****************************************************************************
>>   *
>> + * quantization
>> + *
>> + ****************************************************************************/
>> +
>> +static void cavs_quant_c(DCTELEM *block, const uint16_t *norm, int mul,
>> +                         int bias) {
>> +    int i;
>> +
>> +    for(i=0;i<64;i++) {
>> +        if(block[i] > 0)
>> +            block[i] =   ((((1<<18) + block[i]*norm[i])>>19)*mul+bias)>>15;
>> +        else
>> +            block[i] = -(((((1<<18) - block[i]*norm[i])>>19)*mul+bias)>>15);
>> +    }
> 
> this is a duplicate of dct_quantize*
> its just much slower and smaller
> 
> 

Looks quite different to me, but I will see if I can reuse it somehow.

> [...]
>> +        int level = abs(level_buf[coeff_num])-1;
>> +        int sign = (level_buf[coeff_num]>>31)&1;
> 
> MASK_ABS()
> 
> 
> [...]
> 
>> +    h->lambda = h->qp/3;
> 
> if i understand it correctly then qp is a logarithmic scale, if so this is
> likely completely wrong
> 
> h->lambda = C*quant_mul[h->qp];
> might be more correct
> 
[...]
> 
> with SSE its likely lambda*lambda with the above definition of lambda
> 

Yes, my lambda has no mathematical background. With many of the
parameters I just hand-tuned them and found that a lambda close
to zero seemed to give the best results. Similarly the quantization
biases and the threshold for the intra/inter decision are just
derived from trial and error.

> 
> [...]
>> +    h->dist[0] = (h->picture.poc - h->DPB[0].poc  + 512) % 512;
>> +    h->dist[1] = (h->picture.poc - h->DPB[1].poc  + 512) % 512;
> 
> &511 ?
> 

It is not speed critical, and then I find the modulo more readable.

> 
> [...]
>> +    memcpy(&h->DPB[1], &h->DPB[0], sizeof(Picture));
> 
> h->DPB[1]= h->DPB[0];
> 

Right, will change.

> 
> [...]
>> +        h->levels[0] = av_mallocz(6*64*sizeof(DCTELEM));
>> +        h->runs[0]   = av_mallocz(6*64);
>> +        for(i=1;i<6;i++) {
>> +            h->levels[i] = h->levels[i-1] + 64;
>> +            h->runs[i]   = h->runs[i-1]   + 64;
>> +        }
> 
> why is this allocated with malloc instead of being part of the context?

It keeps the context smaller when only the decoder is running.

Stefan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGqDJhYpYOlpT3vNMRAhmtAJ4rHB9KbjPBesfl0bws0XBwAAPyrwCeJRcO
onkNRnkGiZctJtsmVi3pZME=
=VVSC
-----END PGP SIGNATURE-----