[FFmpeg-soc] [PATCH] AMR-WB Decoder

Mon Aug 30 14:31:18 CEST 2010

On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>    
>> Marcelo Galvão Póvoa wrote:
>>      
>>> Fixed excitation array incorrect length.
>>>
>>> Should now be applicable cleanly to the latest ffmpeg revision
>>> (fd151a5f8bd152c456a).
>>>        
>> First look:
>>
>>
>>      

[...]

>>> +/**
>>> + * Convert an ISF vector into an ISP vector
>>> + *
>>> + * @param[in] isf                  Isf vector
>>> + * @param[out] isp                 Output isp vector
>>> + * @param[in] size                 Isf/isp size
>>> + */
>>> +static void isf2isp(const float *isf, double *isp, int size)
>>> +{
>>> +    int i;
>>> +
>>> +    for (i = 0; i<  size - 1; i++)
>>> +        isp[i] = cos(2.0 * M_PI * isf[i]);
>>> +
>>> +    isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>> +}
>>>        
>> Almost duplication of amrnbdec.c:lsf2lsp().
>>
>>      
> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
> reference sources.
>    

Can't you just do "isp[size - 1] *=  2" or it is used elsewhere?+       
  f[i] = val * f[i - 1] + 2 * f[i - 2];

[...]

>>> +        for(j = i - 1; j>  1; j--)
>>> +            f[j] += f[j - 1] * val + f[j - 2];
>>> +        f[1] += 0.25 * val;
>>> +    }
>>> +}
>>>        
>> Hmm, can't this function be replaced by ff_lsp2polyf() with some rescaling?
>>
>>      
> Actually I realize now that the difference between these functions is
> just the entire output scaled down by 0.25. That was silly because I
> was scaling by 4.0 after this function. It is weird why the reference
> decoder does this process. Maybe it should worth someone checking it
> out to be sure about this.
>    

I suppose you checked that after this change the output of your decoder 
didn't change, no?

>>> +/**
>>> + * Convert a ISP vector to LP coefficient domain {a_k}
>>> + * Equations from TS 26.190 section 5.2.4
>>> + *
>>> + * @param[in] isp                  ISP vector for a subframe
>>> + * @param[out] lp                  LP coefficients
>>> + * @param[in] lp_half_order        Half the number of LPs to construct
>>> + */
>>> +static void isp2lp(const double *isp, float *lp, int lp_half_order) {
>>> +    double pa[10 + 1], qa[10 + 1];
>>> +    double last_isp = isp[2 * lp_half_order - 1];
>>> +    double qa_old = 0.0;
>>> +    float *lp2 =&lp[2 * lp_half_order];
>>> +    int i;
>>> +
>>> +    if (lp_half_order>  8) { // high-band specific
>>> +        lsp2polyf_16k(isp,     pa, lp_half_order);
>>> +        lsp2polyf_16k(isp + 1, qa, lp_half_order - 1);
>>> +
>>> +        for (i = 0; i<= lp_half_order; i++)
>>> +            pa[i] *= 4.0;
>>> +        for (i = 0; i<  lp_half_order; i++)
>>> +            qa[i] *= 4.0;
>>> +    } else {
>>> +        ff_lsp2polyf(isp,     pa, lp_half_order);
>>> +        ff_lsp2polyf(isp + 1, qa, lp_half_order - 1);
>>> +    }
>>> +
>>> +    for (i = 1; i<  lp_half_order; i++) {
>>> +        double paf = (1 + last_isp) * pa[i];
>>> +        double qaf = (1 - last_isp) * (qa[i] - qa_old);
>>> +
>>> +        qa_old = qa[i - 1];
>>> +
>>> +        lp[i]   = 0.5 * (paf + qaf);
>>> +        lp2[-i] = 0.5 * (paf - qaf);
>>> +    }
>>> +
>>> +    lp[0] = 1.0;
>>> +    lp[lp_half_order] = 0.5 * (1 + last_isp) * pa[lp_half_order];
>>> +    lp2[0] = last_isp;
>>> +}
>>>        
>> Please double-check if this is not a duplication of sipr.c:lsp2lpc_sipr().
>>
>>      
> Interesting, I believe they are the same function except for the fact
> that in AMR-WB the first LP coefficient is always 1.0 but this should
> be easily adapted by passing lp + 1 to the function and setting lp[0]
> = 1.0. Also, the LP order is different between these codecs. To reuse
> this existing function it should be moved to another file (lsp.c I
> guess) and modified. Should I do it and include in the next patch?
>    

You can send directly a patch to ffmpeg-devel to move this code to lsp.c.

>>> +static void decode_pitch_vector(AMRWBContext *ctx,
>>> +                                const AMRWBSubFrame *amr_subframe,
>>> +                                const int subframe)
>>> +{
>>> +    int pitch_lag_int, pitch_lag_frac;
>>> +    int i;
>>> +    float *exc     = ctx->excitation;
>>> +    enum Mode mode = ctx->fr_cur_mode;
>>> +
>>> +    if (mode<= MODE_8k85) {
>>> +        decode_pitch_lag_low(&pitch_lag_int,&pitch_lag_frac,
>>> amr_subframe->ada
>>> p,
>>> +&ctx->base_pitch_lag, subframe, mode);
>>> +    } else
>>> +        decode_pitch_lag_high(&pitch_lag_int,&pitch_lag_frac,
>>> amr_subframe->ad
>>> ap,
>>> +&ctx->base_pitch_lag, subframe);
>>> +
>>> +    ctx->pitch_lag_int = pitch_lag_int;
>>> +    pitch_lag_int     += (pitch_lag_frac<  0 ? -1 : 0) + (pitch_lag_frac
>>> ? 1 : 0);
>>> +
>>> +
>>> +    /* Calculate the pitch vector by interpolating the past excitation at
>>> the
>>> +       pitch lag using a hamming windowed sinc function */
>>> +    ff_acelp_interpolatef(exc, exc + 1 - pitch_lag_int,
>>> +                          ac_inter, 4,
>>> +                          pitch_lag_frac + (pitch_lag_frac>  0 ? 0 : 4),
>>> +                          LP_ORDER, AMRWB_SFR_SIZE + 1);
>>>        
>> ac_inter is yet another hamming function. Can you check if you cannot reuse
>> acelp_vectors.c:ff_b60_sinc or sipr16kdata.h:sinc_win?
>>
>>      
> Well, I cannot tell precisely if it cannot be reused but I guess so.
> The number of coefficients in the table is related to the fractional
> resolution of the interpolation: in AMR-NB and SIPR it is 4 but in
> AMR-WB it is 6. Also, I don't see similarities between the table I
> extracted from the reference and these tables.
>    

Ok, so leave it like this by now.

>>> +/**
>>> + * Reduce fixed vector sparseness by smoothing with one of three IR
>>> filters
>>> + * Also known as "adaptive phase dispersion"
>>> + *
>>> + * @param[in] ctx                  The context
>>> + * @param[in,out] fixed_vector     Unfiltered fixed vector
>>> + * @param[out] buf                 Space for modified vector if necessary
>>> + *
>>> + * @return The potentially overwritten filtered fixed vector address
>>> + */
>>> +static float *anti_sparseness(AMRWBContext *ctx,
>>> +                              float *fixed_vector, float *buf)
>>>        
>> amrnbedec.c has a function with the same name. Any code can be reused?
>>
>>      
> The function skeletons are similar but there are differences
> everywhere. I can think about easily reusing only about 6 lines, which
> wouldn't worth much
>    

Ok, leave it as-is then.

>>> +/**
>>> + * Apply to synthesis a 2nd order high-pass filter
>>> + *
>>> + * @param[out] out                 Buffer for filtered output
>>> + * @param[in] hpf_coef             Filter coefficients as used below
>>> + * @param[in,out] mem              State from last filtering (updated)
>>> + * @param[in] in                   Input speech data
>>> + *
>>> + * @remark It is safe to pass the same array in in and out parameters
>>> + */
>>> +static void high_pass_filter(float *out, const float hpf_coef[2][3],
>>> +                             float mem[4], const float *in)
>>> +{
>>> +    int i;
>>> +    float *x = mem - 1, *y = mem + 2; // previous inputs and outputs
>>> +
>>> +    for (i = 0; i<  AMRWB_SFR_SIZE; i++) {
>>> +        float x0 = in[i];
>>> +
>>> +        out[i] = hpf_coef[0][0] * x0   + hpf_coef[1][0] * y[0] +
>>> +                 hpf_coef[0][1] * x[1] + hpf_coef[1][1] * y[1] +
>>> +                 hpf_coef[0][2] * x[2];
>>> +
>>> +        y[1] = y[0];
>>> +        y[0] = out[i];
>>> +
>>> +        x[2] = x[1];
>>> +        x[1] = x0;
>>> +    }
>>> +}
>>>        
>> acelp_filter.c:ff_acelp_apply_order_2_transfer_function()
>>
>>      
> Are you sure? I can't see them as equivalent, could you explain?
>    

I'll give a better look at it soon...

>>> +/**
>>> + * Upsample a signal by 5/4 ratio (from 12.8kHz to 16kHz) using
>>> + * a FIR interpolation filter. Uses past data from before *in address
>>> + *
>>> + * @param[out] out                 Buffer for interpolated signal
>>> + * @param[in] in                   Current signal data (length
>>> 0.8*o_size)
>>> + * @param[in] o_size               Output signal length
>>> + */
>>> +static void upsample_5_4(float *out, const float *in, int o_size)
>>> +{
>>> +    const float *in0 = in - UPS_FIR_SIZE + 1;
>>> +    int i;
>>> +
>>> +    for (i = 0; i<  o_size; i++) {
>>> +        int int_part  = (4 * i) / 5;
>>> +        int frac_part = (4 * i) - 5 * int_part;
>>>        
>> You can break this loop in two to avoid the division:
>>
>> i = 0;
>> for (j = 0; j<  o_size/5; j++)
>>     for (k = 0; k<  5; k++) {
>>         ....
>>         i++;
>>     }
>>
>>      
> I've done this in not so neat way, adding 4 and calculating a simple
> remainder by 5. Tell me if can be done in a better way.
>    

Does something like the following work?

     int int_part = 0, frac_part = 0;

     i = 0;
     for (j = 0; j<  o_size / 5; j++) {
         out[i] = in[int_part];
         frac_part = 4;
         i++;
         for (k = 1; k<  5; k++) {
             out[i] = ff_dot_productf(in0 + int_part, upsample_fir[4 - frac_part],
                                          UPS_MEM_SIZE);
             int_part++;
             frac_part--;
             i++;
        }
     }

>>> +/**
>>> + * Generate the high-band excitation with the same energy from the lower
>>> + * one and scaled by the given gain
>>> + *
>>> + * @param[in] ctx                  The context
>>> + * @param[out] hb_exc              Buffer for the excitation
>>> + * @param[in] synth_exc            Low-band excitation used for synthesis
>>> + * @param[in] hb_gain              Wanted excitation gain
>>> + */
>>> +static void scaled_hb_excitation(AMRWBContext *ctx, float *hb_exc,
>>> +                                 const float *synth_exc, float hb_gain)
>>> +{
>>> +    int i;
>>> +    float energy = ff_dot_productf(synth_exc, synth_exc, AMRWB_SFR_SIZE);
>>> +
>>> +    /* Generate a white-noise excitation */
>>> +    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++)
>>> +        hb_exc[i] = 32768.0 - (uint16_t) av_lfg_get(&ctx->prng);
>>> +
>>> +    ff_scale_vector_to_given_sum_of_squares(hb_exc, hb_exc, energy,
>>> +                                            AMRWB_SFR_SIZE_16k);
>>> +
>>> +    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++)
>>> +        hb_exc[i] *= hb_gain;
>>> +}
>>>        
>> Why are you scaling it twice?
>>
>>      
> The first scaling is to match the energy with the lower band part. The
> second is the high_band gain itself. This can be done in only one loop
> using ff_scale_vector_to_given_sum_of_squares() ?
>    

I think you will be obliged to duplicate a little of  
ff_scale_vector_to_given_sum_of_squares(), but it will make your code 
faster...

>>> +/**
>>> + * Apply to high-band samples a 15th order filter
>>> + * The filter characteristic depends on the given coefficients
>>> + *
>>> + * @param[out] out                 Buffer for filtered output
>>> + * @param[in] fir_coef             Filter coefficients
>>> + * @param[in,out] mem              State from last filtering (updated)
>>> + * @param[in] cp_gain              Compensation gain (usually the filter
>>> gain)
>>> + * @param[in] in                   Input speech data (high-band)
>>> + *
>>> + * @remark It is safe to pass the same array in in and out parameters
>>> + */
>>> +static void hb_fir_filter(float *out, const float fir_coef[HB_FIR_SIZE +
>>> 1],
>>> +                          float mem[HB_FIR_SIZE], float cp_gain, const
>>> float *in)
>>> +{
>>> +    int i, j;
>>> +    float data[AMRWB_SFR_SIZE_16k + HB_FIR_SIZE]; // past and current
>>> samples
>>> +
>>> +    memcpy(data, mem, HB_FIR_SIZE * sizeof(float));
>>> +
>>> +    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++)
>>> +        data[i + HB_FIR_SIZE] = in[i] / cp_gain;
>>> +
>>> +    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++) {
>>> +        out[i] = 0.0;
>>> +        for (j = 0; j<= HB_FIR_SIZE; j++)
>>> +            out[i] += data[i + j] * fir_coef[j];
>>> +    }
>>> +
>>> +    memcpy(mem, data + AMRWB_SFR_SIZE_16k, HB_FIR_SIZE * sizeof(float));
>>> +}
>>>        
>> I think it is cleaner (and more consistent) to do like the synthesis filter
>> and get one single buffer for output and memory...
>>
>>      
> The problem is that this filter memory consists of past inputs and not
> outputs. If I would do with a single buffer I would have to copy the
> input to it in order to avoid overwriting the last filter output.
>    

ok

Also

> +/**
> + * Parses a speech frame, storing data in the Context
> + *
> + * @param[in,out] ctx              The context
> + * @param[in] buf                  Pointer to the input buffer
> + * @param[in] buf_size             Size of the input buffer
> + *
> + * @return The frame mode
> + */
> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t *buf,
> +                                  int buf_size)
> +{
> +    GetBitContext gb;
> +    uint16_t *data;
> +
> +    init_get_bits(&gb, buf, buf_size * 8);
> +
> +    /* decode frame header (1st octet) */
> +    skip_bits(&gb, 1);  // padding bit
> +    ctx->fr_cur_mode  = get_bits(&gb, 4);
> +    ctx->fr_quality   = get_bits1(&gb);
> +    skip_bits(&gb, 2);  // padding bits
> +
> +    // XXX: We are using only the "MIME/storage" format
> +    // used by libopencore. This format is simpler and
> +    // does not carry the auxiliary information of the frame
> +
> +    data = (uint16_t *)&ctx->frame;
> +    memset(data, 0, sizeof(AMRWBFrame));
> +    buf++;
> +
> +    if (ctx->fr_cur_mode<  MODE_SID) { /* Normal speech frame */
> +        const uint16_t *perm = amr_bit_orderings_by_mode[ctx->fr_cur_mode];
> +        int field_size;
> +
> +        while ((field_size = *perm++)) {
> +            int field = 0;
> +            int field_offset = *perm++;
> +            while (field_size--) {
> +               uint16_t bit_idx = *perm++;
> +               field<<= 1;
> +               /* The bit index inside the byte is reversed (MSB->LSB) */
> +               field |= BIT_POS(buf[bit_idx>>  3], 7 - (bit_idx&  7));
> +            }
> +            data[field_offset] = field;
> +        }
> +    }
> +
> +    return ctx->fr_cur_mode;
> +}
>    

Can't you reorder the bit indexes in a way it is in the LSB->MSB so you 
can use just the usual bitstream functions?

-Vitor