[FFmpeg-devel] [PATCH] RealAudio 14.4K encoder

Sat May 22 19:33:13 CEST 2010

On Sat, 2010-05-22 at 16:00 +0200, Michael Niedermayer wrote:
> On Sat, May 22, 2010 at 03:18:45PM +0200, Francesco Lavra wrote:
> > > > +    }
> > > > +
> > > > +    /**
> > > > +     * Calculate the zero-input response of the LPC filter and subtract it from
> > > > +     * input data.
> > > > +     */
> > > > +    memset(data, 0, sizeof(data));
> > > > +    ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, data, BLOCKSIZE,
> > > > +                                 LPC_ORDER);
> > > > +    for (i = 0; i < BLOCKSIZE; i++) {
> > > > +        zero[i] = work[LPC_ORDER + i];
> > > > +        data[i] = sblock_data[i] - zero[i];
> > > > +    }
> > > > +
> > > > +    /**
> > > > +     * Codebook search is performed without taking into account the contribution
> > > > +     * of the previous subblock, since it has been just subtracted from input
> > > > +     * data.
> > > > +     */
> > > > +    memset(work, 0, LPC_ORDER * sizeof(*work));
> > > > +
> > > > +    cba_idx = adaptive_cb_search(ractx->adapt_cb, work + LPC_ORDER, coefs,
> > > > +                                 data);
> > > > +    if (cba_idx) {
> > > > +        /**
> > > > +         * The filtered vector from the adaptive codebook can be retrieved from
> > > > +         * work, see implementation of adaptive_cb_search().
> > > > +         */
> > > > +        memcpy(cba, work + LPC_ORDER, sizeof(cba));
> > > > +
> > > > +        ff_copy_and_dup(cba_vect, ractx->adapt_cb, cba_idx + BLOCKSIZE / 2 - 1);
> > > > +        m[0] = (ff_irms(cba_vect) * rms) >> 12;
> > > > +    }
> > > > +    fixed_cb_search(work + LPC_ORDER, coefs, data, cba_idx, &cb1_idx, &cb2_idx);
> > > > +    for (i = 0; i < BLOCKSIZE; i++) {
> > > > +        cb1[i] = ff_cb1_vects[cb1_idx][i];
> > > > +        cb2[i] = ff_cb2_vects[cb2_idx][i];
> > > > +    }
> > > > +    ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb1, BLOCKSIZE,
> > > > +                                 LPC_ORDER);
> > > > +    memcpy(cb1, work + LPC_ORDER, sizeof(cb1));
> > > > +    m[1] = (ff_cb1_base[cb1_idx] * rms) >> 8;
> > > > +    ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb2, BLOCKSIZE,
> > > > +                                 LPC_ORDER);
> > > > +    memcpy(cb2, work + LPC_ORDER, sizeof(cb2));
> > > > +    m[2] = (ff_cb2_base[cb2_idx] * rms) >> 8;
> > > > +
> > > > +    /**
> > > > +     * Gain quantization is performed taking the NUM_BEST_GAINS best entries
> > > > +     * obtained from floating point data and calculating for each entry the
> > > > +     * actual encoding error with fixed point data.
> > > > +     */
> > > > +    for (i = 0; i < NUM_BEST_GAINS; i++) {
> > > > +        best_errors[i] = FLT_MAX;
> > > > +        indexes[i] = -1;
> > > > +    }
> > > > +    for (n = 0; n < 256; n++) {
> > > > +        g[1] = ((ff_gain_val_tab[n][1] * m[1]) >> ff_gain_exp_tab[n]) / 4096.0;
> > > > +        g[2] = ((ff_gain_val_tab[n][2] * m[2]) >> ff_gain_exp_tab[n]) / 4096.0;
> > > > +        error = 0;
> > > > +        if (cba_idx) {
> > > > +            g[0] = ((ff_gain_val_tab[n][0] * m[0]) >> ff_gain_exp_tab[n]) /
> > > > +                   4096.0;
> > > > +            for (i = 0; i < BLOCKSIZE; i++) {
> > > > +                data[i] = zero[i] + g[0] * cba[i] + g[1] * cb1[i] +
> > > > +                          g[2] * cb2[i];
> > > > +                error += (data[i] - sblock_data[i]) *
> > > > +                         (data[i] - sblock_data[i]);
> > > > +            }
> > > > +        } else {
> > > > +            for (i = 0; i < BLOCKSIZE; i++) {
> > > > +                data[i] = zero[i] + g[1] * cb1[i] + g[2] * cb2[i];
> > > > +                error += (data[i] - sblock_data[i]) *
> > > > +                         (data[i] - sblock_data[i]);
> > > > +            }
> > > > +        }
> > > 
> > > > +        for (i = 0; i < NUM_BEST_GAINS; i++)
> > > > +            if (error < best_errors[i]) {
> > > > +                best_errors[i] = error;
> > > > +                indexes[i] = n;
> > > > +                break;
> > > > +            }
> > > 
> > > this does not keep the 5 best
> > > > it only guarantees to keep the 1 best
> > 
> > Why? Perhaps you missed the break statement?
> 
> if we feed the values 9,8,7,6,5,4,3,2,1 in then
> the list will just contain 1 afterwards

Ok, now fixed as follows (j is the index of the worst entry currently in
best_errors[], and is initialized to 0 outside the main loop):

if (error >= best_errors[j])
    continue;
best_errors[j] = error;
indexes[j] = n;
for (i = 0; i < NUM_BEST_GAINS; i++)
    if (best_errors[i] > best_errors[j])
        j = i;
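
For illustration only, here is a self-contained sketch of the same
replace-the-worst selection. The BestList struct, the helper names and
NUM_BEST (standing in for NUM_BEST_GAINS) are hypothetical and not part
of the patch. Feeding it the values 9,8,7,6,5,4,3,2,1 leaves 1 to 5 in
the list, in no particular order.

```c
#include <float.h>

#define NUM_BEST 5              /* stands in for NUM_BEST_GAINS */

/* Hypothetical container for the shortlist; not part of the patch. */
typedef struct {
    float errors[NUM_BEST];     /* smallest errors seen so far (unsorted) */
    int   indexes[NUM_BEST];    /* table index n that produced each error */
    int   worst;                /* slot holding the largest error ("j")   */
} BestList;

static void best_list_init(BestList *b)
{
    for (int i = 0; i < NUM_BEST; i++) {
        b->errors[i]  = FLT_MAX;
        b->indexes[i] = -1;
    }
    b->worst = 0;
}

/* Offer one candidate; it is kept only if it beats the current worst. */
static void best_list_offer(BestList *b, float error, int n)
{
    if (error >= b->errors[b->worst])
        return;
    b->errors[b->worst]  = error;
    b->indexes[b->worst] = n;
    /* Rescan to find which slot now holds the largest error. */
    for (int i = 0; i < NUM_BEST; i++)
        if (b->errors[i] > b->errors[b->worst])
            b->worst = i;
}
```

Each offer costs at most NUM_BEST comparisons, so over the 256 gain
table entries the overhead is small compared to the error computation.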

> > > you are testing your changes in terms of PSNR, aren't you?
> > > if not, we need to go back to the last patch and test each change
> > > individually.
> > > I very much prefer naive and slow code compared to optimized but
> > > untested and thus buggy code. we already have a vorbis and aac encoder
> > > </rant>
> > 
> > I did test each individual change by measuring the resulting average
> > encoding error. Now I have re-tested them with tiny_psnr. Here are the
> > results with 7 different samples.
> > 
> > Fixed point, without orthogonalization, with brute force gain
> > quantization
> > stddev:  849.73 PSNR: 37.74 bytes:   200320/   200334
> > stddev:  983.24 PSNR: 36.48 bytes:   144000/   144014
> > stddev:  835.19 PSNR: 37.89 bytes:   745280/   745294
> > stddev: 3737.95 PSNR: 24.88 bytes:  5370880/  5370880
> > stddev: 2605.75 PSNR: 28.01 bytes:   814400/   814400
> > stddev: 3634.44 PSNR: 25.12 bytes:   432640/   432640
> > stddev: 2853.26 PSNR: 27.22 bytes:  1741440/  1741440
> > 
> > Floating point, without orthogonalization, with gain quantization done
> > the fast way
> > stddev:  940.92 PSNR: 36.86 bytes:   200320/   200334
> > stddev: 1010.57 PSNR: 36.24 bytes:   144000/   144014
> > stddev:  904.31 PSNR: 37.20 bytes:   745280/   745294
> > stddev: 3753.33 PSNR: 24.84 bytes:  5370880/  5370880
> > stddev: 2612.23 PSNR: 27.99 bytes:   814400/   814400
> > stddev: 3638.47 PSNR: 25.11 bytes:   432640/   432640
> > stddev: 2855.30 PSNR: 27.22 bytes:  1741440/  1741440
> 
> you change 2 things relative to the previous test, this makes it
> hard to be certain which change causes the quality loss

I tested the intermediate step too; from the results below you can see
that the quality loss is due to the fast gain quantization.

> > Floating point, with orthogonalization, with gain quantization done the
> > fast way
> > stddev:  818.14 PSNR: 38.07 bytes:   200320/   200334
> > stddev:  986.48 PSNR: 36.45 bytes:   144000/   144014
> > stddev:  811.68 PSNR: 38.14 bytes:   745280/   745294
> > stddev: 3762.86 PSNR: 24.82 bytes:  5370880/  5370880
> > stddev: 2635.10 PSNR: 27.91 bytes:   814400/   814400
> > stddev: 3647.02 PSNR: 25.09 bytes:   432640/   432640
> > stddev: 2862.79 PSNR: 27.19 bytes:  1741440/  1741440
> 
> some files lose quality by enabling orthogonalization, that's odd but
> possible.
> assuming there is no bug in the orthogonalization then you could try to
> run the quantization with both codebooks found with and without
> orthogonalization, this should always be better. And/or avoid codebook
> choices that would need quantization factors that are far away from
> available values

The first 3 files are uncompressed recordings, while the last 4 files
are decoded RealAudio samples, so the statistics for the latter are
probably not that meaningful.
If you are wondering why the PSNR values are so low for the last 4 files
(ideally, they should approach infinity), the problem is that I couldn't
come up with an exact method of calculating the frame energy (assuming
one exists; from the current decoder output I'm not sure we can
reconstruct the encoded stream exactly as it was). An energy value
different from what it ought to be negatively influences the codebook
searches.

Below are the latest results (after fixing the algorithm to find the 5
best entries):

Fixed point, without orthogonalization, with brute force gain
quantization
stddev:  849.73 PSNR: 37.74 bytes:   200320/   200334
stddev:  983.24 PSNR: 36.48 bytes:   144000/   144014
stddev:  835.19 PSNR: 37.89 bytes:   745280/   745294
stddev: 3737.95 PSNR: 24.88 bytes:  5370880/  5370880
stddev: 2605.75 PSNR: 28.01 bytes:   814400/   814400
stddev: 3634.44 PSNR: 25.12 bytes:   432640/   432640
stddev: 2853.26 PSNR: 27.22 bytes:  1741440/  1741440

Floating point, without orthogonalization, with brute force gain
quantization
stddev:  821.68 PSNR: 38.04 bytes:   200320/   200334
stddev:  979.00 PSNR: 36.51 bytes:   144000/   144014
stddev:  846.42 PSNR: 37.78 bytes:   745280/   745294
stddev: 3735.23 PSNR: 24.88 bytes:  5370880/  5370880
stddev: 2620.22 PSNR: 27.96 bytes:   814400/   814400
stddev: 3625.96 PSNR: 25.14 bytes:   432640/   432640
stddev: 2850.20 PSNR: 27.23 bytes:  1741440/  1741440

Floating point, without orthogonalization, with gain quantization done
the fast way
stddev:  940.92 PSNR: 36.86 bytes:   200320/   200334
stddev: 1010.57 PSNR: 36.24 bytes:   144000/   144014
stddev:  904.31 PSNR: 37.20 bytes:   745280/   745294
stddev: 3753.33 PSNR: 24.84 bytes:  5370880/  5370880
stddev: 2612.23 PSNR: 27.99 bytes:   814400/   814400
stddev: 3638.47 PSNR: 25.11 bytes:   432640/   432640
stddev: 2855.30 PSNR: 27.22 bytes:  1741440/  1741440

Floating point, without orthogonalization, with gain quantization done
taking into account the rounding error of the 5 best entries
stddev:  869.60 PSNR: 37.54 bytes:   200320/   200334
stddev:  992.83 PSNR: 36.39 bytes:   144000/   144014
stddev:  853.24 PSNR: 37.71 bytes:   745280/   745294
stddev: 3738.97 PSNR: 24.87 bytes:  5370880/  5370880
stddev: 2620.56 PSNR: 27.96 bytes:   814400/   814400
stddev: 3634.24 PSNR: 25.12 bytes:   432640/   432640
stddev: 2851.40 PSNR: 27.23 bytes:  1741440/  1741440

Floating point, with orthogonalization, with brute force gain
quantization
stddev:  768.34 PSNR: 38.62 bytes:   200320/   200334
stddev:  971.39 PSNR: 36.58 bytes:   144000/   144014
stddev:  778.60 PSNR: 38.50 bytes:   745280/   745294
stddev: 3753.48 PSNR: 24.84 bytes:  5370880/  5370880
stddev: 2622.78 PSNR: 27.95 bytes:   814400/   814400
stddev: 3645.04 PSNR: 25.10 bytes:   432640/   432640
stddev: 2861.43 PSNR: 27.20 bytes:  1741440/  1741440

Floating point, with orthogonalization, with gain quantization done the
fast way
stddev:  818.14 PSNR: 38.07 bytes:   200320/   200334
stddev:  986.48 PSNR: 36.45 bytes:   144000/   144014
stddev:  811.68 PSNR: 38.14 bytes:   745280/   745294
stddev: 3762.86 PSNR: 24.82 bytes:  5370880/  5370880
stddev: 2635.10 PSNR: 27.91 bytes:   814400/   814400
stddev: 3647.02 PSNR: 25.09 bytes:   432640/   432640
stddev: 2862.79 PSNR: 27.19 bytes:  1741440/  1741440

Floating point, with orthogonalization, with gain quantization done
taking into account the rounding error of the 5 best entries
stddev:  782.21 PSNR: 38.46 bytes:   200320/   200334
stddev:  975.64 PSNR: 36.54 bytes:   144000/   144014
stddev:  785.38 PSNR: 38.43 bytes:   745280/   745294
stddev: 3753.60 PSNR: 24.84 bytes:  5370880/  5370880
stddev: 2631.43 PSNR: 27.93 bytes:   814400/   814400
stddev: 3652.04 PSNR: 25.08 bytes:   432640/   432640
stddev: 2862.17 PSNR: 27.20 bytes:  1741440/  1741440

Disregarding the last 4 files, you can see that orthogonalization always
leads to better performance.
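
The claim can be checked mechanically against the floating point stddev
figures listed above for the first 3 files. The sketch below only copies
those numbers into two tables (rows: brute force, fast, 5 best entries)
and compares them pairwise; the array and function names are
hypothetical.

```c
#define NUM_MODES 3   /* brute force, fast, 5 best entries */
#define NUM_FILES 3   /* the uncompressed recordings only  */

/* stddev values copied from the floating point results above. */
static const double stddev_no_orth[NUM_MODES][NUM_FILES] = {
    { 821.68,  979.00, 846.42 },   /* brute force    */
    { 940.92, 1010.57, 904.31 },   /* fast           */
    { 869.60,  992.83, 853.24 },   /* 5 best entries */
};
static const double stddev_orth[NUM_MODES][NUM_FILES] = {
    { 768.34,  971.39, 778.60 },   /* brute force    */
    { 818.14,  986.48, 811.68 },   /* fast           */
    { 782.21,  975.64, 785.38 },   /* 5 best entries */
};

/* Return 1 if orthogonalization lowers stddev for every mode and file. */
static int orth_always_better(void)
{
    for (int m = 0; m < NUM_MODES; m++)
        for (int f = 0; f < NUM_FILES; f++)
            if (stddev_orth[m][f] >= stddev_no_orth[m][f])
                return 0;
    return 1;
}
```
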
What do you suggest now?