[FFmpeg-soc] [PATCH] AMR-WB Decoder

Vitor Sessak vitor1001 at gmail.com
Thu Sep 9 21:11:13 CEST 2010


On 09/07/2010 02:05 AM, Marcelo Galvão Póvoa wrote:
> On 6 September 2010 11:31, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>> On 09/06/2010 02:46 AM, Marcelo Galvăo Póvoa wrote:
>>> Ok, fortunately I've found the bug!
>>>
>>
>> [...]
>>
>>> So, here is how the decoder is now (I'm sending my commits to github also).
>>>
>>
>>> +/** Get x bits in the index interval [lsb,lsb+len-1] inclusive */
>>> +#define BIT_STR(x,lsb,len) (((x)>>    (lsb))&    ((1<<    (len)) - 1))
>>> +
>>> +/** Get the bit at specified position */
>>> +#define BIT_POS(x, p) (((x)>>    (p))&    1)
>>>
>>
>> If you move the #defines closer to where it is used, you improve the
>> readability of your code.
>>
>
> Fixed
>
>>> +
>>> +/**
>>> + * Decodes quantized ISF vectors using 36-bit indices (6K60 mode only)
>>> + *
>>> + * @param[in] ind                  Array of 5 indices
>>> + * @param[out] isf_q               Buffer for isf_q[LP_ORDER]
>>> + *
>>> + */
>>> +static void decode_isf_indices_36b(uint16_t *ind, float *isf_q) {
>>> +    int i;
>>> +
>>> +    for (i = 0; i<    9; i++) {
>>> +        isf_q[i] = dico1_isf[ind[0]][i] * (1.0f / (1<<    15));
>>> +    }
>>> +    for (i = 0; i<    7; i++) {
>>> +        isf_q[i + 9] = dico2_isf[ind[1]][i] * (1.0f / (1<<    15));
>>> +    }
>>> +    for (i = 0; i<    5; i++) {
>>> +        isf_q[i] += dico21_isf_36b[ind[2]][i] * (1.0f / (1<<    15));
>>> +    }
>>> +    for (i = 0; i<    4; i++) {
>>> +        isf_q[i + 5] += dico22_isf_36b[ind[3]][i] * (1.0f / (1<<    15));
>>> +    }
>>> +    for (i = 0; i<    7; i++) {
>>> +        isf_q[i + 9] += dico23_isf_36b[ind[4]][i] * (1.0f / (1<<    15));
>>> +    }
>>> +}
>>>
>>
>> I think this would be more readable if the (1.0f / (1<<  15)) rescaling
>> would be done in isf_add_mean_and_past().
>>
>
> Hmm, it improves readability at this part but I think that way the
> output for decode_isf_indices_36b() and the input for
> isf_add_mean_and_past() would have a strange meaning, if you know what
> I mean.

ok, makes sense...

>>> +
>>> +/**
>>> + * Find the pitch vector by interpolating the past excitation at the
>>> + * pitch delay, which is obtained in this function
>>> + *
>>> + * @param[in,out] ctx              The context
>>> + * @param[in] amr_subframe         Current subframe data
>>> + * @param[in] subframe             Current subframe index (0 to 3)
>>> + */
>>> +static void decode_pitch_vector(AMRWBContext *ctx,
>>> +                                const AMRWBSubFrame *amr_subframe,
>>> +                                const int subframe)
>>> +{
>>> +    int pitch_lag_int, pitch_lag_frac;
>>> +    int i;
>>> +    float *exc     = ctx->excitation;
>>> +    enum Mode mode = ctx->fr_cur_mode;
>>> +
>>> +    if (mode<= MODE_8k85) {
>>> +        decode_pitch_lag_low(&pitch_lag_int,&pitch_lag_frac, amr_subframe->adap,
>>> +&ctx->base_pitch_lag, subframe, mode);
>>> +    } else
>>> +        decode_pitch_lag_high(&pitch_lag_int,&pitch_lag_frac, amr_subframe->adap,
>>> +&ctx->base_pitch_lag, subframe);
>>> +
>>> +    ctx->pitch_lag_int = pitch_lag_int;
>>>
>>
>>> +    pitch_lag_int     += (pitch_lag_frac<    0 ? -1 : 0) + (pitch_lag_frac ? 1 : 0);
>>>
>>
>> pitch_lag_int     += pitch_lag_frac ? FFSIGN(pitch_lag_frac) : 0;
>>
>
> I don't see if I understand this correctly, see my "truth table" below:
> pitch_lag_frac | mine_RHS | your_RHS
> <0 | 0 | -1
> =0 | 0 | 0
>> 0 | 1 | 1
>
> So, the simplified form would be this?
>
> pitch_lag_int += pitch_lag_frac>  0 ? 1 : 0;

or, as Michael suggested

pitch_lag_int += pitch_lag_frac > 0;

>>> +/**
>>> + * Reduce fixed vector sparseness by smoothing with one of three IR filters
>>> + * Also known as "adaptive phase dispersion"
>>> + *
>>> + * @param[in] ctx                  The context
>>> + * @param[in,out] fixed_vector     Unfiltered fixed vector
>>> + * @param[out] buf                 Space for modified vector if necessary
>>> + *
>>> + * @return The potentially overwritten filtered fixed vector address
>>> + */
>>> +static float *anti_sparseness(AMRWBContext *ctx,
>>> +                              float *fixed_vector, float *buf)
>>> +{
>>> +    int ir_filter_nr;
>>> +
>>> +    if (ctx->fr_cur_mode>    MODE_8k85) // no filtering in higher modes
>>> +        return fixed_vector;
>>> +
>>> +    if (ctx->pitch_gain[0]<    0.6) {
>>> +        ir_filter_nr = 0;      // strong filtering
>>> +    } else if (ctx->pitch_gain[0]<    0.9) {
>>> +        ir_filter_nr = 1;      // medium filtering
>>> +    } else
>>> +        ir_filter_nr = 2;      // no filtering
>>> +
>>> +    /* detect 'onset' */
>>> +    if (ctx->fixed_gain[0]>    3.0 * ctx->fixed_gain[1]) {
>>> +        if (ir_filter_nr<    2)
>>> +            ir_filter_nr++;
>>> +    } else {
>>> +        int i, count = 0;
>>> +
>>> +        for (i = 0; i<    6; i++)
>>> +            if (ctx->pitch_gain[i]<    0.6)
>>> +                count++;
>>> +
>>> +        if (count>    2)
>>> +            ir_filter_nr = 0;
>>> +
>>> +        if (ir_filter_nr>    ctx->prev_ir_filter_nr + 1)
>>> +            ir_filter_nr--;
>>> +    }
>>> +
>>> +    /* update ir filter strength history */
>>> +    ctx->prev_ir_filter_nr = ir_filter_nr;
>>> +
>>> +    ir_filter_nr += (ctx->fr_cur_mode == MODE_8k85 ? 1 : 0);
>>> +
>>> +    if (ir_filter_nr<    2) {
>>> +        int i, j;
>>> +        const float *coef = ir_filters_lookup[ir_filter_nr];
>>> +
>>> +        /* Circular convolution code in the reference
>>> +         * decoder was modified to avoid using one
>>> +         * extra array. The filtered vector is given by:
>>> +         *
>>> +         * c2(n) = sum(i,0,len-1){ c(i) * coef( (n - i + len) % len ) }
>>> +         */
>>> +
>>> +        memset(buf, 0, sizeof(float) * AMRWB_SFR_SIZE);
>>> +        for (i = 0; i<    AMRWB_SFR_SIZE; i++)
>>
>>> +            if (fixed_vector[i]) {
>>> +                int li = AMRWB_SFR_SIZE - i;
>>> +
>>> +                for (j = 0; j<    li; j++)
>>> +                    buf[i + j] += fixed_vector[i] * coef[j];
>>> +
>>> +                for (j = 0; j<    i; j++)
>>> +                    buf[j] += fixed_vector[i] * coef[j + li];
>>> +            }
>>
>> Can celp_filters.c:ff_celp_circ_addf() simplify this?
>>
>
> I already gave some thought to that but I couldn't figure out how to
> use ff_celp_circ_addf() there. I wrote the algorithm the simplest way
> I could think of.

Try

         for (i = 0; i < AMRWB_SFR_SIZE; i++)
             if (fixed_vector[i])
                 ff_celp_circ_addf(buf, buf, coef, i, fixed_vector[i],
                                   AMRWB_SFR_SIZE);



>>> +/**
>>> + * Extrapolate a ISF vector to the 16kHz range (20th order LP)
>>> + * used at mode 6k60 LP filter for the high frequency band
>>> + *
>>> + * @param[out] out                 Buffer for extrapolated isf
>>> + * @param[in] isf                  Input isf vector
>>> + */
>>> +static void extrapolate_isf(float out[LP_ORDER_16k], float isf[LP_ORDER])
>>> +{
>>> +    float diff_isf[LP_ORDER - 2], diff_mean;
>>> +    float *diff_hi = diff_isf - LP_ORDER + 1; // diff array for extrapolated indices
>>> +    float corr_lag[3];
>>> +    float est, scale;
>>> +    int i, i_max_corr;
>>> +
>>> +    memcpy(out, isf, (LP_ORDER - 1) * sizeof(float));
>>> +    out[LP_ORDER_16k - 1] = isf[LP_ORDER - 1];
>>> +
>>> +    /* Calculate the difference vector */
>>> +    for (i = 0; i<    LP_ORDER - 2; i++)
>>> +        diff_isf[i] = isf[i + 1] - isf[i];
>>> +
>>> +    diff_mean = 0.0;
>>> +    for (i = 2; i<    LP_ORDER - 2; i++)
>>> +        diff_mean += diff_isf[i] / (LP_ORDER - 4);
>>
>> diff_mean += diff_isf[i] * (1.0f / (LP_ORDER - 4));
>>
>
> Fixed
>
>>> +    for (i = LP_ORDER - 1; i<    LP_ORDER_16k - 1; i++)
>>> +        diff_hi[i] = scale * (out[i] - out[i - 1]);
>>> +
>>> +    /* Stability insurance */
>>> +    for (i = LP_ORDER; i<    LP_ORDER_16k - 1; i++)
>>> +        if (diff_hi[i] + diff_hi[i - 1]<    5.0) {
>>> +            if (diff_hi[i]>    diff_hi[i - 1]) {
>>> +                diff_hi[i - 1] = 5.0 - diff_hi[i];
>>> +            } else
>>> +                diff_hi[i] = 5.0 - diff_hi[i - 1];
>>> +        }
>>> +
>>> +    for (i = LP_ORDER - 1; i<    LP_ORDER_16k - 1; i++)
>>> +        out[i] = out[i - 1] + diff_hi[i] * (1.0f / (1<<    15));
>>
>> Hmm, this hunk looks a lot like
>>
>> ff_sort_nearly_sorted_floats(out, LP_ORDER_16k);
>> ff_set_min_dist_lsf(out, 5.0f / (1<<    15));
>> for (i = LP_ORDER - 1; i<    LP_ORDER_16k - 1; i++)
>>      out[i] = out[i - 1] + scale * (1.0f / (1<<    15)) * (out[i] - out[i-1]);
>>
>> or something like that.
>>
>
> Yes, but the min_dist_lsf would have to be applied between out[i] and
> out[i-2] I guess. I can't think of how to simplify this hunk without
> changing the output.
>
>>> +/**
>>> + * Apply to high-band samples a 15th order filter
>>> + * The filter characteristic depends on the given coefficients
>>> + *
>>> + * @param[out] out                 Buffer for filtered output
>>> + * @param[in] fir_coef             Filter coefficients
>>> + * @param[in,out] mem              State from last filtering (updated)
>>> + * @param[in] cp_gain              Compensation gain (usually the filter gain)
>>> + * @param[in] in                   Input speech data (high-band)
>>> + *
>>> + * @remark It is safe to pass the same array in in and out parameters
>>> + */
>>> +static void hb_fir_filter(float *out, const float fir_coef[HB_FIR_SIZE + 1],
>>> +                          float mem[HB_FIR_SIZE], float cp_gain, const float *in)
>>> +{
>>> +    int i, j;
>>> +    float data[AMRWB_SFR_SIZE_16k + HB_FIR_SIZE]; // past and current samples
>>> +
>>> +    memcpy(data, mem, HB_FIR_SIZE * sizeof(float));
>>> +
>>> +    for (i = 0; i<    AMRWB_SFR_SIZE_16k; i++)
>>> +        data[i + HB_FIR_SIZE] = in[i] / cp_gain;
>>> +
>>> +    for (i = 0; i<    AMRWB_SFR_SIZE_16k; i++) {
>>> +        out[i] = 0.0;
>>> +        for (j = 0; j<= HB_FIR_SIZE; j++)
>>> +            out[i] += data[i + j] * fir_coef[j];
>>> +    }
>>> +
>>> +    memcpy(mem, data + AMRWB_SFR_SIZE_16k, HB_FIR_SIZE * sizeof(float));
>>> +}
>>
>> This function does too much memcpy: 30+30+80 elements. I suggest that
>> you use a buffer HB_FIR_SIZE elements larger where is is called, so
>> that the memcpy(data, mem, etc) is not needed.
>>
>
> The problem is that this function input is the output from synthesis
> which is preceded by its memory. I cannot avoid the memcpy for both
> functions. Although I can make the memory buffer 80 elements larger to
> avoid the memcpy of 30 elements. This would also solve the problem
> with the memory update that I mention later. It would use more memory
> at the AMRWBContext though.

No, its ok as is then.

>> Another thing is that the scaling by cp_gain is not needed. You can
>> replace it by multiplying the bpf_6_7_coef table by 4.
>>
>
> Don't you mean dividing by 4? Fixed

Sure.

-Vitor


More information about the FFmpeg-soc mailing list