[FFmpeg-soc] [PATCH] AMR-WB Decoder

Marcelo Galvão Póvoa marspeoplester at gmail.com
Mon Aug 30 16:44:41 CEST 2010


On 30 August 2010 09:31, Vitor Sessak <vitor1001 at gmail.com> wrote:
> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>
>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>>
>>>
>>> Marcelo Galvão Póvoa wrote:
>>>
>>>>
>>>> Fixed excitation array incorrect length.
>>>>
>>>> Should now be applicable cleanly to the latest ffmpeg revision
>>>> (fd151a5f8bd152c456a).
>>>>
>>>
>>> First look:
>>>
>>>
>>>
>
> [...]
>
>>>> +/**
>>>> + * Convert an ISF vector into an ISP vector
>>>> + *
>>>> + * @param[in] isf                  Isf vector
>>>> + * @param[out] isp                 Output isp vector
>>>> + * @param[in] size                 Isf/isp size
>>>> + */
>>>> +static void isf2isp(const float *isf, double *isp, int size)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i<  size - 1; i++)
>>>> +        isp[i] = cos(2.0 * M_PI * isf[i]);
>>>> +
>>>> +    isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>>> +}
>>>>
>>>
>>> Almost duplication of amrnbdec.c:lsf2lsp().
>>>
>>>
>>
>> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
>> reference sources.
>>
>
> Can't you just do "isp[size - 1] *=  2" or it is used elsewhere?
>

It's not the last element output that is doubled but the input. Since
it is a cosine, I would have to calculate it again for the last
element I guess.

> [...]
>
>>>> +        for(j = i - 1; j>  1; j--)
>>>> +            f[j] += f[j - 1] * val + f[j - 2];
>>>> +        f[1] += 0.25 * val;
>>>> +    }
>>>> +}
>>>>
>>>
>>> Hmm, can't this function be replaced by ff_lsp2polyf() with some
>>> rescaling?
>>>
>>>
>>
>> Actually I realize now that the difference between these functions is
>> just the entire output scaled down by 0.25. That was silly because I
>> was scaling by 4.0 after this function. It is weird why the reference
>> decoder does this process. Maybe it should worth someone checking it
>> out to be sure about this.
>>
>
> I suppose you checked that after this change the output of your decoder
> didn't change, no?
>

Actually I cannot compare the output bitwise because the high band
involves a random excitation, I am afraid that it is not the same for
each run (or it is?). I did several comparisons manually between the
resulting vectors using gdb and they were exactly the same. Also, I
thought about it and I'm convinced that the code does the same as
before. But it is weird, maybe someone should verify it.

>>>> +/**
>>>> + * Convert a ISP vector to LP coefficient domain {a_k}
>>>> + * Equations from TS 26.190 section 5.2.4
>>>> + *
>>>> + * @param[in] isp                  ISP vector for a subframe
>>>> + * @param[out] lp                  LP coefficients
>>>> + * @param[in] lp_half_order        Half the number of LPs to construct
>>>> + */
>>>> +static void isp2lp(const double *isp, float *lp, int lp_half_order) {
>>>> +    double pa[10 + 1], qa[10 + 1];
>>>> +    double last_isp = isp[2 * lp_half_order - 1];
>>>> +    double qa_old = 0.0;
>>>> +    float *lp2 =&lp[2 * lp_half_order];
>>>> +    int i;
>>>> +
>>>> +    if (lp_half_order>  8) { // high-band specific
>>>> +        lsp2polyf_16k(isp,     pa, lp_half_order);
>>>> +        lsp2polyf_16k(isp + 1, qa, lp_half_order - 1);
>>>> +
>>>> +        for (i = 0; i<= lp_half_order; i++)
>>>> +            pa[i] *= 4.0;
>>>> +        for (i = 0; i<  lp_half_order; i++)
>>>> +            qa[i] *= 4.0;
>>>> +    } else {
>>>> +        ff_lsp2polyf(isp,     pa, lp_half_order);
>>>> +        ff_lsp2polyf(isp + 1, qa, lp_half_order - 1);
>>>> +    }
>>>> +
>>>> +    for (i = 1; i<  lp_half_order; i++) {
>>>> +        double paf = (1 + last_isp) * pa[i];
>>>> +        double qaf = (1 - last_isp) * (qa[i] - qa_old);
>>>> +
>>>> +        qa_old = qa[i - 1];
>>>> +
>>>> +        lp[i]   = 0.5 * (paf + qaf);
>>>> +        lp2[-i] = 0.5 * (paf - qaf);
>>>> +    }
>>>> +
>>>> +    lp[0] = 1.0;
>>>> +    lp[lp_half_order] = 0.5 * (1 + last_isp) * pa[lp_half_order];
>>>> +    lp2[0] = last_isp;
>>>> +}
>>>>
>>>
>>> Please double-check if this is not a duplication of
>>> sipr.c:lsp2lpc_sipr().
>>>
>>>
>>
>> Interesting, I believe they are the same function except for the fact
>> that in AMR-WB the first LP coefficient is always 1.0 but this should
>> be easily adapted by passing lp + 1 to the function and setting lp[0]
>> = 1.0. Also, the LP order is different between these codecs. To reuse
>> this existing function it should be moved to another file (lsp.c I
>> guess) and modified. Should I do it and include in the next patch?
>>
>
> You can send directly a patch to ffmpeg-devel to move this code to lsp.c.
>

Ok, I will try it.

[...]

>
>>>> +/**
>>>> + * Upsample a signal by 5/4 ratio (from 12.8kHz to 16kHz) using
>>>> + * a FIR interpolation filter. Uses past data from before *in address
>>>> + *
>>>> + * @param[out] out                 Buffer for interpolated signal
>>>> + * @param[in] in                   Current signal data (length
>>>> 0.8*o_size)
>>>> + * @param[in] o_size               Output signal length
>>>> + */
>>>> +static void upsample_5_4(float *out, const float *in, int o_size)
>>>> +{
>>>> +    const float *in0 = in - UPS_FIR_SIZE + 1;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i<  o_size; i++) {
>>>> +        int int_part  = (4 * i) / 5;
>>>> +        int frac_part = (4 * i) - 5 * int_part;
>>>>
>>>
>>> You can break this loop in two to avoid the division:
>>>
>>> i = 0;
>>> for (j = 0; j<  o_size/5; j++)
>>>    for (k = 0; k<  5; k++) {
>>>        ....
>>>        i++;
>>>    }
>>>
>>>
>>
>> I've done this in not so neat way, adding 4 and calculating a simple
>> remainder by 5. Tell me if can be done in a better way.
>>
>
> Does something like the following work?
>
>    int int_part = 0, frac_part = 0;
>
>    i = 0;
>    for (j = 0; j<  o_size / 5; j++) {
>        out[i] = in[int_part];
>        frac_part = 4;
>        i++;
>        for (k = 1; k<  5; k++) {
>            out[i] = ff_dot_productf(in0 + int_part, upsample_fir[4 -
> frac_part],
>                                         UPS_MEM_SIZE);
>            int_part++;
>            frac_part--;
>            i++;
>       }
>    }
>

Yes, I guess it works. Despite being very specific to the 5/4 case, it
should be more efficient. Fixed

[...]

>> +/**
>> + * Parses a speech frame, storing data in the Context
>> + *
>> + * @param[in,out] ctx              The context
>> + * @param[in] buf                  Pointer to the input buffer
>> + * @param[in] buf_size             Size of the input buffer
>> + *
>> + * @return The frame mode
>> + */
>> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t *buf,
>> +                                  int buf_size)
>> +{
>> +    GetBitContext gb;
>> +    uint16_t *data;
>> +
>> +    init_get_bits(&gb, buf, buf_size * 8);
>> +
>> +    /* decode frame header (1st octet) */
>> +    skip_bits(&gb, 1);  // padding bit
>> +    ctx->fr_cur_mode  = get_bits(&gb, 4);
>> +    ctx->fr_quality   = get_bits1(&gb);
>> +    skip_bits(&gb, 2);  // padding bits
>> +
>> +    // XXX: We are using only the "MIME/storage" format
>> +    // used by libopencore. This format is simpler and
>> +    // does not carry the auxiliary information of the frame
>> +
>> +    data = (uint16_t *)&ctx->frame;
>> +    memset(data, 0, sizeof(AMRWBFrame));
>> +    buf++;
>> +
>> +    if (ctx->fr_cur_mode<  MODE_SID) { /* Normal speech frame */
>> +        const uint16_t *perm =
>> amr_bit_orderings_by_mode[ctx->fr_cur_mode];
>> +        int field_size;
>> +
>> +        while ((field_size = *perm++)) {
>> +            int field = 0;
>> +            int field_offset = *perm++;
>> +            while (field_size--) {
>> +               uint16_t bit_idx = *perm++;
>> +               field<<= 1;
>> +               /* The bit index inside the byte is reversed (MSB->LSB) */
>> +               field |= BIT_POS(buf[bit_idx>>  3], 7 - (bit_idx&  7));
>> +            }
>> +            data[field_offset] = field;
>> +        }
>> +    }
>> +
>> +    return ctx->fr_cur_mode;
>> +}
>>
>
> Can't you reorder the bit indexes in a way it is in the LSB->MSB so you can
> use just the usual bitstream functions?
>

Which bitstream functions? If the tables were in LSB->MSB order, it
would be the same algorithm as in AMR-NB, the only difference is that
I do (7 - index) to reverse the order.

On 30 August 2010 10:00, Rob <robert.swain at gmail.com> wrote:
> On 30 August 2010 14:31, Vitor Sessak <vitor1001 at gmail.com> wrote:
>> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>>>> Marcelo Galvão Póvoa wrote:
>
> [...]
>
>>>>> +/**
>>>>> + * Generate the high-band excitation with the same energy from the
>>>>> lower
>>>>> + * one and scaled by the given gain
>>>>> + *
>>>>> + * @param[in] ctx                  The context
>>>>> + * @param[out] hb_exc              Buffer for the excitation
>>>>> + * @param[in] synth_exc            Low-band excitation used for
>>>>> synthesis
>>>>> + * @param[in] hb_gain              Wanted excitation gain
>>>>> + */
>>>>> +static void scaled_hb_excitation(AMRWBContext *ctx, float *hb_exc,
>>>>> +                                 const float *synth_exc, float hb_gain)
>>>>> +{
>>>>> +    int i;
>>>>> +    float energy = ff_dot_productf(synth_exc, synth_exc,
>>>>> AMRWB_SFR_SIZE);
>>>>> +
>>>>> +    /* Generate a white-noise excitation */
>>>>> +    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++)
>>>>> +        hb_exc[i] = 32768.0 - (uint16_t) av_lfg_get(&ctx->prng);
>>>>> +
>>>>> +    ff_scale_vector_to_given_sum_of_squares(hb_exc, hb_exc, energy,
>>>>> +                                            AMRWB_SFR_SIZE_16k);
>>>>> +
>>>>> +    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++)
>>>>> +        hb_exc[i] *= hb_gain;
>>>>> +}
>>>>
>>>> Why are you scaling it twice?
>>>
>>> The first scaling is to match the energy with the lower band part. The
>>> second is the high_band gain itself. This can be done in only one loop
>>> using ff_scale_vector_to_given_sum_of_squares() ?
>>
>> I think you will be obliged to duplicate a little of
>>  ff_scale_vector_to_given_sum_of_squares(), but it will make your code
>> faster...
>
> The point is to avoid iterating over the subframe to multiply by some
> value twice by merging the two beforehand and then applying to the
> vector. Looking at the code:
>
> void ff_scale_vector_to_given_sum_of_squares(float *out, const float *in,
>                                             float sum_of_squares, const int n)
> {
>    int i;
>    float scalefactor = ff_dot_productf(in, in, n);
>    if (scalefactor)
>        scalefactor = sqrt(sum_of_squares / scalefactor);
>    for (i = 0; i < n; i++)
>        out[i] = in[i] * scalefactor;
> }
>
> We want:
> scalefactor * hb_gain =
> sqrt(sum_of_squares / ff_dot_productf(in, in, n)) * hb_gain =
> sqrt(sum_of_squares * hb_gain * hb_gain / ff_dot_productf())
> hb_gain is [0.1, 1.0] according to find_hb_gain() so there's no sign
> issue
>
> So you should just be able to pass in energy * hb_gain * hb_gain I
> think. Or duplicate the code.
>

Yes, in fact it is better to pass energy * hb_gain * hb_gain. Fixed

-- 
Marcelo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: amrwb.patch
Type: application/octet-stream
Size: 155416 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-soc/attachments/20100830/10a06362/attachment.obj>


More information about the FFmpeg-soc mailing list