[FFmpeg-devel] [PATCH] QCELP decoder

Kenan Gillet kenan.gillet
Thu Oct 30 23:18:45 CET 2008


On Oct 30, 2008, at 6:16 AM, Michael Niedermayer wrote:

> On Tue, Oct 28, 2008 at 04:47:02PM -0700, Kenan Gillet wrote:
>> On Tue, Oct 28, 2008 at 4:43 PM, Kenan Gillet  
>> <kenan.gillet at gmail.com> wrote:
>>> thanks Diego and Michael for your reviewing.
>>>
>>> And here is an updated set of patches with a few changes too.
>>>
>>> round 7 summary:
>>> - qcelp-round7-doc-glue.patch.txt has already been oked
>>> - grammar / spelling / cosmetics
>>> - generalize convolve function used in qcelp_lspf2lpc
>>> - remove some multiplication and division in compute_svector
>>> - frame unpacking rework
>>> - most of the previous comments
>>>
>>
>> forgot one patch in previous mail :(
> [..]

[...]
>> +static void interpolate_lspf(float *interpolated_lspf,
>> +                             const float curr_weight,
>> +                             const float *curr_lspf,
>> +                             const float *prev_lspf) {
>> +    int   i;
>> +
>> +    for (i = 0; i < 10; i++)
>> +        interpolated_lspf[i] = (1.0 - curr_weight) * prev_lspf[i]
>> +                             +        curr_weight  * curr_lspf[i];
>> +}
>
> iam ok with the code but the names have to be changed
> both fixed point (ff_acelp_weighted_vector_sum) and this should have
> similar names
> now imho *celp and lsp dont belong in the names as the code is not  
> specific
> for that.

what about makin and using  a more generic
ff_weighted_vector_sumf(
         float* out,
         const float *in_a,
         const float *in_b,
         float weight_coeff_a,
         float weight_coeff_b,
         int length);

or is just renaming it weighted_vector_sum and keeping it static better?


> [...]
>> +/**
>> + * Computes the scaled codebook vector Cdn From INDEX and GAIN
>> + * for all rates.
>> + *
>> + * The specification lacks some information here.
>> + *
>> + * TIA/EIA/IS-733 has an omission on the codebook index  
>> determination
>> + * formula for RATE_FULL and RATE_HALF frames at section  
>> 2.4.8.1.1. It says
>> + * you have to subtract the decoded index parameter from the given  
>> scaled
>> + * codebook vector index 'n' to get the desired circular codebook  
>> index, but
>> + * it does not mention that you have to clamp 'n' to [0-9] in  
>> order to get
>> + * RI-compliant results.
>> + *
>> + * The reason for this mistake seems to be the fact they forgot to  
>> mention you
>> + * have to do these calculations per codebook subframe and adjust  
>> given
>> + * equation values accordingly.
>> + *
>> + * @param q the context
>> + * @param gain array holding the 4 pitch subframe gain values
>> + * other than RATE_FULL or RATE_HALF
>> + * @param cdn_vector array for the generated scaled codebook vector
>> + */
>> +static void compute_svector(const QCELPContext *q,
>> +                            float *gain,
>> +                            float *cdn_vector) {
>> +    int      i, j, k;
>> +    uint16_t cbseed, cindex;
>> +    float    rnd[160];
>> +
>> +    switch (q->framerate) {
>> +    case RATE_FULL:
>> +        for (i = 0; i < 16; i++) {
>> +            gain[i] *= QCELP_RATE_FULL_CODEBOOK_RATIO;
>> +            cindex = -q->cindex[i];
>> +            for (j = 0; j < 10; j++)
>> +                *cdn_vector++ = gain[i] *  
>> qcelp_rate_full_codebook[cindex++ & 127];
>
> the & 127 can be avoided by making the qcelp_rate_full_codebook  
> table 10
> entries larger. (of course only if this matters in terms of overall  
> decoder
> speed)

5 run, lowest and largest discarded on Core Duo 2

with [cindex++ & 127]
215160 dezicycles in compute_vector_fullrate, 1 runs, 0 skips
114840 dezicycles in compute_vector_fullrate, 2 runs, 0 skips
64590 dezicycles in compute_vector_fullrate, 4 runs, 0 skips
39405 dezicycles in compute_vector_fullrate, 8 runs, 0 skips
26805 dezicycles in compute_vector_fullrate, 16 runs, 0 skips
20467 dezicycles in compute_vector_fullrate, 32 runs, 0 skips
17313 dezicycles in compute_vector_fullrate, 64 runs, 0 skips
15821 dezicycles in compute_vector_fullrate, 128 runs, 0 skips
14986 dezicycles in compute_vector_fullrate, 256 runs, 0 skips
14592 dezicycles in compute_vector_fullrate, 511 runs, 1 skips

215400 dezicycles in compute_vector_fullrate, 1 runs, 0 skips
115140 dezicycles in compute_vector_fullrate, 2 runs, 0 skips
64740 dezicycles in compute_vector_fullrate, 4 runs, 0 skips
39510 dezicycles in compute_vector_fullrate, 8 runs, 0 skips
26850 dezicycles in compute_vector_fullrate, 16 runs, 0 skips
20512 dezicycles in compute_vector_fullrate, 32 runs, 0 skips
17332 dezicycles in compute_vector_fullrate, 64 runs, 0 skips
15760 dezicycles in compute_vector_fullrate, 128 runs, 0 skips
14983 dezicycles in compute_vector_fullrate, 256 runs, 0 skips
14590 dezicycles in compute_vector_fullrate, 512 runs, 0 skips

220560 dezicycles in compute_vector_fullrate, 1 runs, 0 skips
117720 dezicycles in compute_vector_fullrate, 2 runs, 0 skips
66060 dezicycles in compute_vector_fullrate, 4 runs, 0 skips
40170 dezicycles in compute_vector_fullrate, 8 runs, 0 skips
27217 dezicycles in compute_vector_fullrate, 16 runs, 0 skips
20688 dezicycles in compute_vector_fullrate, 32 runs, 0 skips
17418 dezicycles in compute_vector_fullrate, 64 runs, 0 skips
15780 dezicycles in compute_vector_fullrate, 128 runs, 0 skips
14982 dezicycles in compute_vector_fullrate, 256 runs, 0 skips
14593 dezicycles in compute_vector_fullrate, 512 runs, 0 skips

with bigger table:
240600 dezicycles in compute_vector_fullrate, 1 runs, 0 skips
127560 dezicycles in compute_vector_fullrate, 2 runs, 0 skips
70890 dezicycles in compute_vector_fullrate, 4 runs, 0 skips
42510 dezicycles in compute_vector_fullrate, 8 runs, 0 skips
28327 dezicycles in compute_vector_fullrate, 16 runs, 0 skips
21221 dezicycles in compute_vector_fullrate, 32 runs, 0 skips
17696 dezicycles in compute_vector_fullrate, 64 runs, 0 skips
15924 dezicycles in compute_vector_fullrate, 128 runs, 0 skips
15043 dezicycles in compute_vector_fullrate, 256 runs, 0 skips
14616 dezicycles in compute_vector_fullrate, 512 runs, 0 skips

248880 dezicycles in compute_vector_fullrate, 1 runs, 0 skips
131820 dezicycles in compute_vector_fullrate, 2 runs, 0 skips
73050 dezicycles in compute_vector_fullrate, 4 runs, 0 skips
43665 dezicycles in compute_vector_fullrate, 8 runs, 0 skips
28942 dezicycles in compute_vector_fullrate, 16 runs, 0 skips
21551 dezicycles in compute_vector_fullrate, 32 runs, 0 skips
17850 dezicycles in compute_vector_fullrate, 64 runs, 0 skips
16075 dezicycles in compute_vector_fullrate, 128 runs, 0 skips
15111 dezicycles in compute_vector_fullrate, 256 runs, 0 skips
14790 dezicycles in compute_vector_fullrate, 512 runs, 0 skips

344040 dezicycles in compute_vector_fullrate, 1 runs, 0 skips
179340 dezicycles in compute_vector_fullrate, 2 runs, 0 skips
96810 dezicycles in compute_vector_fullrate, 4 runs, 0 skips
55485 dezicycles in compute_vector_fullrate, 8 runs, 0 skips
34987 dezicycles in compute_vector_fullrate, 16 runs, 0 skips
24570 dezicycles in compute_vector_fullrate, 32 runs, 0 skips
19355 dezicycles in compute_vector_fullrate, 64 runs, 0 skips
16769 dezicycles in compute_vector_fullrate, 128 runs, 0 skips
15474 dezicycles in compute_vector_fullrate, 256 runs, 0 skips
14831 dezicycles in compute_vector_fullrate, 512 runs, 0 skips

and I am not sure if bigger is better or lower is faster ;)

>
>
>
>> +        }
>> +        break;
>> +    case RATE_HALF:
>> +        for (i = 0; i < 4; i++) {
>> +            gain[i] *= QCELP_RATE_HALF_CODEBOOK_RATIO;
>> +            cindex = -q->cindex[i];
>> +            for (j = 0; j < 40; j++)
>> +                *cdn_vector++ = gain[i] *  
>> qcelp_rate_half_codebook[cindex++ & 127];
>
> same, but 40 entries

will benchmark after your feedback from above.


>
>> +        }
>> +        break;
>> +    case RATE_QUARTER:
>> +        cbseed = (0x0003 & q->lspv[4])<<14 |
>> +                 (0x003F & q->lspv[3])<< 8 |
>> +                 (0x0060 & q->lspv[2])<< 1 |
>> +                 (0x0007 & q->lspv[1])<< 3 |
>> +                 (0x0038 & q->lspv[0])>> 3 ;
>> +        cindex = 0;
>
>> +        for (i = 0; i < 8; i++) {
>> +            gain[i] *= QCELP_SQRT1887 / 32768.0;
>> +            for (k = 0; k < 20; k++) {
>> +                cbseed = 521 * cbseed + 259;
>> +                rnd[cindex] = (int16_t)cbseed;
>> +
>> +                // FIR filter
>> +                *cdn_vector = qcelp_rnd_fir_coefs[1] * rnd[cindex];
>> +                for (j = 1; j < 22 && !(cindex-j+1); j++)
>> +                    *cdn_vector += qcelp_rnd_fir_coefs[j] *  
>> rnd[cindex-j];
>
> the !(cindex-j+1) check can be removed when rnd is made larger,  
> which imho
> is a good idea anyway even if theres no meassureable speed gain.
>
> also isnt this doing the same as convolve() ?

I am double checking with the reference code, and the behavior is  
incorrect right now.
I'll get back to you on this one.


> [...]
>> +/**
>> + * Apply generic gain control.
>> + *
>> + * @param v_out output vector
>> + * @param v_in vector to control gain of
>> + * @param v_gain gain-controlled vector
>> + *
>> + * TIA/EIA/IS-733 2.4.8.3-2/3/4/5, 2.4.8.6
>> + */
>> +static void apply_gain_ctrl(float *v_out,
>> +                            const float *v_in,
>> +                            const float *v_gain) {
>> +    int   i, j, len;
>> +    float scalefactor;
>> +
>> +    for (i = 0, j = 0; i < 4; i++) {
>> +        scalefactor = ff_dot_productf(v_gain + j, v_gain + j, 40);
>> +        if (scalefactor) {
>> +            scalefactor = sqrt(ff_dot_productf(v_in + j, v_in + j,  
>> 40) / scalefactor);
>> +            for (len = j + 40; j < len; j++)
>> +                v_out[j] = scalefactor * v_gain[j];
>> +        }
>> +    }
>> +}
>
> is it intdended that j is not increased when scalefactor == 0 ?

nope :(
fixed

>> +
>> +/**
>> + * Apply pitch synthesis filter and pitch prefilter to the scaled  
>> codebook vector.
>> + * TIA/EIA/IS-733 2.4.5.2
>> + *
>> + * @param q the context
>> + * @param cdn_vector the scaled codebook vector
>> + *
>> + * @return 0 if everything goes well, -1 if frame should be erased
>> + */
>> +static int apply_pitch_filters(QCELPContext *q,
>> +                               float *cdn_vector) {
>> +    int         i;
>> +    float       gain[4];
>> +    const float *v_synthesis_filtered, *v_pre_filtered;
>> +
>> +    if (q->framerate >= RATE_HALF ||
>> +       (q->framerate == I_F_Q && (q->prev_framerate >=  
>> RATE_HALF))) {
>> +
>> +        if (q->framerate >= RATE_HALF) {
>> +
>> +            // Compute gain & lag for the whole frame.
>> +            for (i = 0; i < 4; i++) {
>> +                gain[i] = q->plag[i] ? (q->pgain[i] + 1) / 4.0 :  
>> 0.0;
>> +
>> +                q->plag[i] += 16;
>> +
>> +                if (q->pfrac[i] && q->plag[i] >= 140)
>> +                    return -1;
>> +            }
>> +            memcpy(q->prev_pitch_lag, q->plag, 4 * sizeof(*q- 
>> >plag));
>> +        } else {
>> +            gain[3] = q->erasure_count < 3 ? 0.9 - 0.3 * (q- 
>> >erasure_count - 1)
>> +                                           : 0.0;
>> +            for (i = 0; i < 4; i++)
>> +                gain[i] = FFMIN(q->prev_pitch_gain[i], gain[3]);
>> +
>> +            memset(q->pfrac, 0, 4 *sizeof(*q->pfrac));
>> +            memcpy(q->plag, q->prev_pitch_lag, 4 * sizeof(*q- 
>> >plag));
>> +        }
>> +
>> +        // pitch synthesis filter
>> +        v_synthesis_filtered = do_pitchfilter(q- 
>> >pitch_synthesis_filter_mem, cdn_vector,
>> +                                              gain, q->plag, q- 
>> >pfrac);
>> +
>> +        // pitch prefilter update
>> +        for (i = 0; i < 4; i++)
>> +            gain[i] = 0.5 * FFMIN(gain[i], 1.0);
>> +
>> +        v_pre_filtered = do_pitchfilter(q->pitch_pre_filter_mem,  
>> v_synthesis_filtered,
>> +                                        gain, q->plag, q->pfrac);
>> +
>> +        apply_gain_ctrl(cdn_vector, v_synthesis_filtered,  
>> v_pre_filtered);
>> +
>> +        memcpy(q->prev_pitch_gain, gain, sizeof(q- 
>> >prev_pitch_gain));
>> +
>> +    } else {
>> +        memcpy(q->pitch_synthesis_filter_mem, cdn_vector + 17, 143  
>> * sizeof(float));
>> +        memcpy(q->pitch_pre_filter_mem,       cdn_vector + 17, 143  
>> * sizeof(float));
>> +        memset(q->prev_pitch_gain, 0, sizeof(q->prev_pitch_gain));
>> +        memset(q->prev_pitch_lag,  0, 4 * sizeof(*q->plag));
>> +    }
>
>> +    q->prev_framerate = q->framerate;
>
> Is setting this here correct? i mean its possible that this function  
> is
> called twice ...

moved to the bottom of qcelp_decode_frame


>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * Interpolates LSP frequencies and computes LPC coefficients
>> + * for a given framerate & pitch subframe.
>> + *
>> + * TIA/EIA/IS-733 2.4.3.3.4
>> + *
>> + * @param q the context
>> + * @param curr_lspf LSP frequencies vector of the current frame
>> + * @param lpc float vector for the resulting LPC
>> + * @param subframe_num frame number in decoded stream
>> + */
>> +void interpolate_lpc(QCELPContext *q,
>> +                     const float *curr_lspf,
>> +                     float *lpc,
>> +                     const int subframe_num) {
>> +    float interpolated_lspf[10];
>> +
>> +    if (q->framerate >= RATE_QUARTER) {
>> +        interpolate_lspf(interpolated_lspf, 0.25 * (subframe_num +  
>> 1),
>> +                         curr_lspf, q->prev_lspf);
>> +        lspf2lpc(q, interpolated_lspf, lpc);
>> +    } else if (q->framerate == RATE_OCTAVE) {
>> +        if (!subframe_num) {
>> +            interpolate_lspf(interpolated_lspf, 0.625, curr_lspf,  
>> q->prev_lspf);
>> +            lspf2lpc(q, interpolated_lspf, lpc);
>> +        }
>> +    } else {
>> +        assert(q->framerate == I_F_Q);
>> +
>> +        if (!subframe_num)
>> +            lspf2lpc(q, curr_lspf, lpc);
>> +    }
>
> some code like
> if/else/switch/whatever weight=
>
> if(weight==1.0){
>    lspf2lpc(q, curr_lspf, lpc);
> }else if(weight){
>    interpolate_lspf(interpolated_lspf, weight, curr_lspf, q- 
> >prev_lspf);
>    lspf2lpc(q, interpolated_lspf, lpc);
> }
>
> might be cleaner, it also would skip the interpolate_lspf for >=  
> RATE_QUARTER
> in the last subframe

done



>> +}
>> +
>> +/*
>> + * Determine the framerate from the frame size and/or the first  
>> byte of the frame.
>> + *
>> + * @return 0 on success, negative error number otherwise.
>> + */
>> +static int determine_framerate(AVCodecContext *avctx,
>> +                               QCELPContext *q,
>> +                               const int buf_size,
>> +                               uint8_t **buf) {
>> +    int claimed_rate = -1;
>> +
>> +    switch (buf_size) {
>> +    case 35: claimed_rate = RATE_FULL;
>> +    case 34: q->framerate = RATE_FULL;
>> +             break;
>> +
>> +    case 17: claimed_rate = RATE_HALF;
>> +    case 16: q->framerate = RATE_HALF;
>> +             break;
>> +
>> +    case  8: claimed_rate = RATE_QUARTER;
>> +    case  7: q->framerate = RATE_QUARTER;
>> +             break;
>> +
>> +    case  4: claimed_rate = RATE_OCTAVE;
>> +    case  3: q->framerate = RATE_OCTAVE;
>> +             break;
>> +
>> +    case  1: claimed_rate = SILENCE;
>> +    case  0: q->framerate = SILENCE;
>> +             break;
>> +
>> +    default:
>> +        return -1;
>> +    }
>> +
>> +    if (claimed_rate >= 0) {
>> +        q->framerate = *(*buf)++;
>> +
>> +        if (claimed_rate != q->framerate)
>> +            av_log(avctx, AV_LOG_WARNING, "Claimed framerate and  
>> buffer size mismatch.\n");
>> +    } else
>> +        av_log(avctx, AV_LOG_WARNING,
>> +               "Framerate byte is missing, guessing the framerate  
>> from packet size.\n");
>> +
>
> putting a switch for 34,16,7,3,0 in a seperate function the  
> following could
> be used:
>
> q->framerate= foo(buf_size);
> if(q->framerate < 0){
>    q->framerate = *(*buf)++;
>
>    if (foo(buf_size+1) != q->framerate)
>        av_log(avctx, AV_LOG_WARNING, "Claimed framerate and buffer  
> size mismatch.\n");
> }else{
>
> ...
>

done, use 35, 17,8, 4 and 1 because it is much more probable that the  
buffer contains a framerate byte.
Actually, I have never seen any files where the framerate byte is  
missing; I just kept this check because
it was in the SOC code.
Should I remove it ? or keep it ?


[...]


>> +
>> +/**
>> + * Decodes the scaled codebook vector Cdn.
>> + *
>> + * @param q the context
>> + * @param cdn_vector array for the generated scaled codebook vector
>> + * @return 0  on success, -1 if the frame must be erased
>> + */
>> +static int decode_scaled_codebook_vector(QCELPContext *q,
>> +                                         float cdn_vector[160]) {
>> +    float gain[16];
>> +
>> +    if (decode_gain_and_index(q, gain) < 0)
>> +        return -1;
>> +    compute_svector(q, gain, cdn_vector);
>> +    return 0;
>> +}
>
> i was wondering if this wraper function that is used 2x around 2  
> functions
> is worth it
> no strong oppinion here, just wondering ...

removed


[...]



>> Index: libavcodec/celp_filters.h
>> ===================================================================
>> --- libavcodec/celp_filters.h	(revision 15692)
>> +++ libavcodec/celp_filters.h	(working copy)
>> @@ -69,4 +69,28 @@
>>         int stop_on_overflow,
>>         int rounder);
>>
>> +/**
>> + * LP synthesis filter.
>> + * @param out [out] pointer to output buffer
>> + *        - the array out[-filter_length, -1] must
>> + *        contain the previous result of this filter
>> + * @param filter_coeffs filter coefficients.
>> + * @param in input signal
>> + * @param buffer_length amount of data to process
>> + * @param filter_length filter length (10 for 10th order LP filter)
>> + *
>> + * @return 1 if overflow occurred, 0 - otherwise
>
> hmm void == 1 || 0 ?

removed but it has been already commited,
should i send a patch just to remove this doxy line ?









More information about the ffmpeg-devel mailing list