[FFmpeg-devel] [PATCH][RFC] Lagarith Decoder.

Reimar Döffinger Reimar.Doeffinger
Wed Aug 12 16:34:22 CEST 2009


On Mon, Aug 10, 2009 at 11:42:19PM -0600, Nathan Caldwell wrote:
> +    for (i = j = 0; i < 257; i++) {
> +        unsigned r = i << l->hash_shift;
> +        while (l->prob[j + 1] <= r)
> +            j++;

Lacks a range check on j?
Though probably you can just extend the prob array to have a "stop
value" of UINT_MAX.

> +        l->range_hash[i] = j;

??? Huh? i runs up to 256, but range_hash[256] is already beyond the
array?

> +        l->low |= 0xff & (AV_RB16(l->bytestream) >> 1);

Hm, might be worth checking if
l->low |= (uint8_t)(AV_RB16(l->bytestream) >> 1);
is faster on x86. Though I admit this is really something the
compiler should figure out by itself...

> +    if (low_scaled < l->prob[255]) {
> +        val = l->range_hash[low_scaled >> l->hash_shift];
> +        while (l->prob[val + 1] <= low_scaled)
> +            val++;
> +
> +        l->low -= range_scaled * l->prob[val];
> +        l->range = range_scaled * (l->prob[val + 1] - l->prob[val]);

I am not really convinced that none of these can read outside the
arrays...

> +    } else {
> +        val = 255;
> +        l->low -= range_scaled * l->prob[255];
> +        l->range -= range_scaled * l->prob[255];
> +    }

The l->low update is identical in both cases and could be moved out of
the if...
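
I.e. something like this (names from the quoted hunk, untested; only the
duplicated subtraction moves):

    if (low_scaled < l->prob[255]) {
        val = l->range_hash[low_scaled >> l->hash_shift];
        while (l->prob[val + 1] <= low_scaled)
            val++;
        l->range = range_scaled * (l->prob[val + 1] - l->prob[val]);
    } else {
        val = 255;
        l->range -= range_scaled * l->prob[255];
    }
    l->low -= range_scaled * l->prob[val];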

> +static void lag_memset(uint8_t * s, uint8_t c, size_t n, int step)
> +{
> +    if (n == 0)
> +        return;

Is that case so frequent that an extra check speeds things up?
If so, this should be documented.

> +    if (step == 1) {
> +        memset(s, c, n);
> +        return;
> +    }
> +
> +    for (int i = 0; i < n * step; i += step)

This won't compile with gcc 2.95: declaring the loop variable inside the
for statement is C99.
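
For reference, a C90-friendly shape (loop variable declared outside the
for statement; the loop body is my guess, since the hunk is cut off here,
and the n == 0 early return is not needed by either path):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void lag_memset(uint8_t *s, uint8_t c, size_t n, int step)
{
    size_t i;

    if (step == 1) {
        memset(s, c, n);
        return;
    }
    for (i = 0; i < n * step; i += step)
        s[i] = c;               /* guessed body: fill every step-th byte */
}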

> +static uint8_t *lag_memcpy(uint8_t * dest, const uint8_t * src, size_t n,
> +                    int step)
> +{
> +    if (n == 0)
> +        return dest;
> +    if (step == 1)
> +        return memcpy(dest, src, n);
> +
> +    for (int i = 0; i < n * step; i += step)
> +        dest[i] = src[i];

Same two comments as above. In addition, is adding the restrict keyword
to the pointers correct here, and does it improve speed?
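
To make the question concrete, the restrict variant would look roughly
like this (assuming config.h maps restrict to whatever spelling the
compiler supports; whether no caller ever passes overlapping buffers
still has to be checked):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t *lag_memcpy(uint8_t *restrict dest, const uint8_t *restrict src,
                           size_t n, int step)
{
    size_t i;

    if (step == 1)
        return memcpy(dest, src, n);
    for (i = 0; i < n * step; i += step)
        dest[i] = src[i];
    return dest;
}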

> +    const static uint8_t series[] = { 1, 2, 3, 5, 8, 13, 21, 34 };

FFmpeg uses the order "static const" everywhere else.

> +    int bit = 0;
> +    int prevbit = 0;
> +    int bits = 0;
> +    int i = 0;
> +    int value = 1;
> +
> +    for (i = 0; !(prevbit && bit); i++) {
> +        prevbit = bit;
> +        bit = get_bits1(gb);
> +        if (bit && !prevbit)
> +            bits += series[i];
> +    }
> +    bits--;
> +    if (bits == 0)
> +        return 0;
> +
> +    value = get_bits_long(gb, bits);

You may end up calling get_bits_long with bits > 32 or bits < 0;
unless that is absolutely necessary for performance I'd change the
code to avoid it (preferably in a way that gives reasonable error
resilience).
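
E.g. something along these lines (sketch only; the error handling and
return value are just one possible choice):

    bits--;
    if (bits <= 0)                /* also catches bits going negative */
        return 0;
    if (bits > 32) {              /* get_bits_long() reads at most 32 bits */
        av_log(NULL, AV_LOG_ERROR, "Invalid bit count %d\n", bits);
        return -1;                /* or clamp, depending on the resilience wanted */
    }
    value = get_bits_long(gb, bits);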

> +    unsigned int cumul_prob = 0;

Hm, I think you should decide if you want to use "unsigned" or "unsigned
int" and stick with one.

> +    /* Scale probabilities so cumulative probability is an even power of 2. */
> +    cumulative_target = clp2(cumul_prob);
> +
> +    scale_factor = cumulative_target / (double) cumul_prob;
> +
> +    for (i = 1; i < 257; i++) {
> +        rac->prob[i] = (unsigned int) (rac->prob[i] * scale_factor);

I think a 32x32 -> 64 bit multiply + right shift is a more reliable way
to calculate this. Floating point IMO is usually misplaced in anything
that claims to be lossless (yes, you can do it and even have it work
cross-platform, but it's a real effort).
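
Something like this stays in integers (names from the quoted hunk;
rounding behaviour versus the reference decoder would still need
checking). Since cumulative_target = clp2(cumul_prob) is less than
2 * cumul_prob, the factor below fits in 32 bits:

    uint32_t factor = ((uint64_t)cumulative_target << 31) / cumul_prob;

    for (i = 1; i < 257; i++)
        rac->prob[i] = ((uint64_t)rac->prob[i] * factor) >> 31;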

> +    i = 0;
> +    while (scaled_cumul_prob) {
> +        if (rac->prob[i + 1]) {
> +            rac->prob[i + 1]++;
> +            scaled_cumul_prob--;
> +        }
> +
> +        /* Comment from reference source:
> +         * if ( b&0x80==0 ){        // order of operations is 'wrong'; it has been left this way
> +         *                          // since the compression change is negligable and fixing it
> +         *                          // breaks backwards compatibilty
> +         *      b=-(signed int)b;
> +         *      b&=0xFF;
> +         * } else{
> +         */
> +
> +        i++;
> +        i &= 0x7f;
> +    }

Hm, something like
> for (i = 1; scaled_cumul_prob; i = (i & 0x7f) + 1)
might make more sense.
At least I'd put the i++ at the top of the loop and then
use rac->prob[i] instead of rac->prob[i + 1]
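
Spelled out (equivalent indices, untested):

    for (i = 1; scaled_cumul_prob; i = (i & 0x7f) + 1) {
        if (rac->prob[i]) {
            rac->prob[i]++;
            scaled_cumul_prob--;
        }
    }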

> +    for (i = 0; cumulative_target; i++)
> +        cumulative_target >>= 1;

i = cumulative_target ? av_log2(cumulative_target) + 1 : 0;


