[FFmpeg-devel] [PATCH] WMA Voice postfilter

Vitor Sessak vitor1001
Mon Apr 12 01:48:07 CEST 2010


Ronald S. Bultje wrote:
> Hi,
> 
> On Thu, Apr 1, 2010 at 11:52 AM, Reimar D?ffinger
> <Reimar.Doeffinger at gmx.de> wrote:
>> On Thu, Apr 01, 2010 at 11:25:25AM -0400, Ronald S. Bultje wrote:
>>> I'm probably asking silly questions here, but can I allocate aligned
>>> memory on the stack? Or do they have to be in the struct?
>> You better avoid it on the stack, it won't work reliably on all
>> supported configurations.
> 
> Thanks for the advice, this patch is tested with yasm and does not
> crash, so I think alignment is OK now. By putting synth buffer in the
> context, I was also able to remove one memcpy(). (One or two more of
> those are possible in the original code, previously already suggested
> by Vitor, so I might go ahead and do that in a future patch also.)

Some comments:

> +static int kalman_smoothen(WMAVoiceContext *s, int pitch,
> +                           const float *in, float *out, int size)
> +{
> +    int n;
> +    float optimal_gain = 0, dot;
> +    const float *ptr = &in[-FFMAX(s->min_pitch_val, pitch - 3)],
> +                *end = &in[-FFMIN(s->max_pitch_val, pitch + 3)],
> +                *best_hist_ptr;
> +
> +    /* find best fitting point in history */
> +    do {
> +        dot = ff_dot_productf(in, ptr, size);
> +        if (dot > optimal_gain) {
> +            optimal_gain  = dot;
> +            best_hist_ptr = ptr;
> +        }
> +    } while (--ptr >= end);
> +
> +    if (optimal_gain <= 0)
> +        return -1;
> +    dot = ff_dot_productf(best_hist_ptr, best_hist_ptr, size);
> +    if (dot <= 0) // would be 1.0
> +        return -1;
> +
> +    if (optimal_gain <= dot) {
> +        dot = 0.5 / (0.5 + 0.3 * optimal_gain / dot); // 0.0625-1.0000

This can be done with one less division:

dot = (0.5 * dot) / (0.5 * dot + 0.3 * optimal_gain); // 0.0625-1.0000

> +/**
> + * Derive denoise filter coefficients from the LPCs.
> + */
> +static void calc_coeffs(WMAVoiceContext *s, float *lpcs,
> +                        int fcb_type, float *coeffs, int remainder)

I think this would be better named "calc_input_response()"

> +    /* Create frequency power spectrum of speech input (i.e. RDFT of LPCs);
> +     * we shift each value to an offset +1 so we don't have to create temp
> +     * values. */
> +    ff_rdft_calc(&s->rdft, lpcs);
> +#define log_range(x, assign) do { \
> +        float tmp = log10f(assign);  lpcs[x] = tmp; \
> +        max       = FFMAX(max, tmp); min     = FFMIN(min, tmp); \
> +    } while (0)
> +    for (n = 1; n < 0x40; n++)
> +        log_range(n + 1, lpcs[n * 2]     * lpcs[n * 2] +
> +                         lpcs[n * 2 + 1] * lpcs[n * 2 + 1]);
> +    log_range(0x41,      lpcs[1]         * lpcs[1]);
> +    log_range(1,         lpcs[0]         * lpcs[0]);
> +#undef log_range

I think that readability could be improved using a single temp value 
(float last_coef = lpcs[1]) and avoiding the shifts.

> +    /* Now, use this spectrum to pick out these frequencies with higher
> +     * (relative) power/energy (which we then take to be "not noise"),
> +     * and set up a table (still in lpc[]) of (relative) gains per frequency.
> +     * These frequencies will be maintained, while others ("noise") will be
> +     * decreased in the filter output. */
> +    irange    = 64.0 / range; // so irange*(max-value) is in the range [0, 63]
> +    gain_mul  = range * (fcb_type == FCB_TYPE_HARDCODED ? (5.0 / 13.0) :
> +                                                          (5.0 / 14.7));
> +    angle_mul = gain_mul * (8.0 * M_LN10 / M_PI);
> +    for (n = 1; n <= 0x41; n++) {
> +        float pow;
> +
> +        idx = FFMAX(0, lrint((max - lpcs[n]) * irange) - 1);
> +        pow = wmavoice_denoise_power_table[s->denoise_strength][idx];
> +        /* note how we shift it back to offset +0  here */
> +        lpcs[n - 1] = angle_mul * pow;
> +
> +        idx = (pow * gain_mul - 0.0295) * 70.570526123;
> +        if (idx > 0x7F) {
> +            coeffs[n] = wmavoice_energy_table[127] *
> +                        powf(1.0331663, idx - 127);
> +        } else
> +            coeffs[n] = wmavoice_energy_table[FFMAX(0, idx)];

I think a comment saying that 70.570526123 == 1./log10(1.0331663) would 
be useful (and the if() is just a failback for some value that didn't 
fit in the table).

> +    /* calculate the Hilbert transform of the gains, which we do (since this
> +     * is a sinus input) by doing a phase shift (in theory, H(sin())=cos()). */
> +    ff_dct_calc(&s->dct, lpcs);
> +    ff_dct_calc(&s->dst, lpcs);

I'd say the point of doing the Hilbert transform (that should be said in 
some comment) is that

Hilbert_Transform(RDFT(F)) == Laplace_Transform(F)

which is what you need to do a Wiener filter (unless I'm missing something).

> +    /* Project values as coefficients - note how this shifts them back
> +     * from offset=1 to offset=0. */

More importantly, before you calculated the norm of the coefficient 
indexes (which discards all phase data). Now, here, you are putting back 
  a (different) phase...

> +    idx = 255 + av_clip(lpcs[64],               -255, 255);
> +    coeffs[0] = coeffs[1]    * s->cos[idx];
> +    idx = 255 + av_clip(lpcs[64] - 2 * lpcs[63], -255, 255);
> +    coeffs[1] = coeffs[0x41] * s->cos[idx];
> +    for (n = 0x3F;; n--) {
> +        idx = 255 + av_clip(-lpcs[64] - 2 * lpcs[n - 1], -255, 255);
> +        coeffs[n * 2 + 1] = coeffs[n + 1] * s->sin[idx];
> +        coeffs[n * 2]     = coeffs[n + 1] * s->cos[idx];
> +
> +        if (!--n) break;
> +
> +        idx = 255 + av_clip( lpcs[64] - 2 * lpcs[n - 1], -255, 255);
> +        coeffs[n * 2 + 1] = coeffs[n + 1] * s->sin[idx];
> +        coeffs[n * 2]     = coeffs[n + 1] * s->cos[idx];
> +    }
> +

> +    /* fix scale and tilt correction */
> +    ff_rdft_calc(&s->irdft, coeffs);

... and doing the inverse RDFT.

> +    memset(&coeffs[remainder], 0, sizeof(coeffs[0]) * (0x80 - remainder));
> +    if (s->denoise_tilt_corr) {
> +        float tilt_mem = 0;
> +
> +        coeffs[remainder - 1] = 0;
> +        ff_tilt_compensation(&tilt_mem,
> +                             -1.8 * tilt_factor(coeffs, remainder - 1),
> +                             coeffs, remainder);
> +    }
> +    sq = (1.0 / 64.0) * sqrtf(1 / ff_dot_productf(coeffs, coeffs, remainder));
> +    for (n = 0; n < remainder; n++)
> +        coeffs[n] *= sq;
> +    ff_rdft_calc(&s->rdft, coeffs);
> +}

This last RDFT does not really belong to this function. It really would 
make much more sense in the caller, since it will read like

ff_rdft_calc(&s->rdft, synth_pf);
ff_rdft_calc(&s->rdft, coeffs);

for(i = 0; i < N ; i++)
      [complex multiplication synth_pf[i] *= coeffs[i]];

ff_rdft_calc(&s->irdft, synth_pf);

Which makes it clear that we are convolving synth_pf with coeffs.

> +        double i_lsps[MAX_LSPS];

i_?

Besides these comments, patch ok.

-Vitor



More information about the ffmpeg-devel mailing list