[FFmpeg-devel] [PATCH] WMA Voice postfilter

Vitor Sessak vitor1001
Thu Mar 18 21:33:30 CET 2010


Ronald S. Bultje wrote:
> Hi,
> 
> see attached, please be kind.

Small problem:

> vitor at vitor-laptop:~/ffmpeg/ffmpeg.2$ svn up libavcodec/wmavoice{.c,_data.h}
> At revision 22593.
> At revision 22593.
> vitor at vitor-laptop:~/ffmpeg/ffmpeg.2$ patch -p1 < /tmp/wmavoice-apf.patch 
> patching file libavcodec/wmavoice.c
> Hunk #8 FAILED at 1517.
> Hunk #9 succeeded at 1773 (offset 1 line).
> Hunk #10 succeeded at 1799 (offset 1 line).
> Hunk #11 succeeded at 1956 (offset 1 line).
> Hunk #12 succeeded at 1984 (offset 1 line).
> Hunk #13 succeeded at 2002 (offset 1 line).
> 1 out of 13 hunks FAILED -- saving rejects to file libavcodec/wmavoice.c.rej
> patching file libavcodec/wmavoice_data.h

> 
> Index: ffmpeg-svn/libavcodec/wmavoice.c
> ===================================================================
> --- ffmpeg-svn.orig/libavcodec/wmavoice.c	2010-03-16 18:57:01.000000000 -0400
> +++ ffmpeg-svn/libavcodec/wmavoice.c	2010-03-18 14:16:35.000000000 -0400
> @@ -36,6 +36,8 @@
>  #include "acelp_filters.h"
>  #include "lsp.h"
>  #include "libavutil/lzo.h"
> +#include "avfft.h"
> +#include "fft.h"
>  
>  #define MAX_BLOCKS           8   ///< maximum number of blocks per frame
>  #define MAX_LSPS             16  ///< maximum filter order
> @@ -142,6 +144,12 @@
>  
>      int do_apf;                   ///< whether to apply the averaged
>                                    ///< projection filter (APF)
> +    int denoise_strength;         ///< strength of denoising in Wiener filter
> +                                  ///< [0-11]
> +    int denoise_tilt_corr;        ///< Whether to apply tilt correction to the
> +                                  ///< Wiener filter coefficients (postfilter)
> +    int dc_level;                 ///< Predicted amount of DC noise, based
> +                                  ///< on which a DC removal filter is used

I would add a

/* postfilter specific */

comment to separate it from the other global values.

> +static void adaptive_gain_control(float *buf_out, const float *speech_synth,
> +                                  int size, float alpha, float *gain_mem)
> +{
> +    int i;
> +    float speech_energy = 0.0, postfilter_energy = 0.0, gain_scale_factor;
> +    float mem = *gain_mem;
> +
> +    for (i = 0; i < size; i++) {
> +        speech_energy     += fabs(speech_synth[i]);
> +        postfilter_energy += fabs(buf_out[i]);

fabsf() is probably faster on x64.
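
A minimal sketch of what I mean, assuming the loop only needs single-precision results. The helper name sum_abs() is hypothetical and not part of the patch. fabs() takes and returns a double, so each call on a float implies a conversion in both directions, whereas fabsf() stays in float:

```c
#include <math.h>

/* Hypothetical helper, not from the patch: sum of absolute values
 * using the single-precision fabsf(), so no float->double->float
 * conversions are needed inside the loop. */
static float sum_abs(const float *buf, int size)
{
    float sum = 0.0f;
    int i;

    for (i = 0; i < size; i++)
        sum += fabsf(buf[i]);

    return sum;
}
```

Whether this is measurably faster depends on the compiler and target, so it is worth benchmarking rather than assuming.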

> +    /* calculate the Hilbert transform of the gains, which we do (since this
> +     * is a sinus input) by doing a phase shift (in theory, H(sin())=cos()).
> +     * Because input is symmetric (mirror above), every im[n] is zero. */
> +    ff_rdft_calc(&s->rdft, &lpcs[1]);
> +    lpcs[1] = lpcs[2];
> +    lpcs[2] = lpcs[0] = 0;
> +    ff_rdft_calc(&s->irdft, lpcs);

I think this deserves to be in a separate function (one that would also
include the mirroring), so it could be reused if we need a Hilbert
transform in another codec. Also, I think it should be possible to do it
with an FFT half as big...

> +/**
> + * Averaging projection filter, the postfilter used in WMAVoice.
> + *
> + * This uses the following steps:
> + * - A zero-synthesis filter (generate excitation from synth signal)
> + * - Kalman smoothing on excitation, based on pitch
> + * - Re-synthesized smoothened output
> + * - Iterative Wiener denoise filter
> + * - Adaptive gain filter
> + * - DC filter
> + *
> + * @param s WMAVoice decoding context
> + * @param synth Speech synthesis output (before postfilter)
> + * @param samples Output buffer for filtered samples
> + * @param size Buffer size of synth & samples
> + * @param lpcs Generated LPCs used for speech synthesis
> + * @param fcb_type Frame type (silence, hardcoded, AW-pulses or FCB-pulses)
> + * @param pitch Pitch of the input signal
> + */
> +static void postfilter(WMAVoiceContext *s, const float *synth,
> +                       float *samples,    int size,
> +                       const float *lpcs, float *zero_exc_pf,
> +                       int fcb_type,      int pitch)

size is always 80, so it would be better to fix it with a #define.
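
Something along these lines, as a sketch only. The macro name POSTFILTER_BLOCK_SIZE and the helper copy_block() are hypothetical and do not appear in the patch:

```c
/* Hypothetical name, not from the patch: samples per postfilter block. */
#define POSTFILTER_BLOCK_SIZE 80

/* With a compile-time size, the size argument can be dropped, scratch
 * buffers can be plain fixed-size arrays, and loop bounds are constants. */
static void copy_block(float *dst, const float *src)
{
    int i;

    for (i = 0; i < POSTFILTER_BLOCK_SIZE; i++)
        dst[i] = src[i];
}
```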

I'll take a second look at it later when I have time.

-Vitor


