[FFmpeg-devel] [PATCH] add silenceremove filter

Clément Bœsch u at pkh.me
Sun Aug 31 20:05:46 CEST 2014


On Fri, Aug 29, 2014 at 06:22:36PM +0000, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol <onemda at gmail.com>
> ---
>  doc/filters.texi               |  62 ++++++
>  libavfilter/Makefile           |   1 +
>  libavfilter/af_silenceremove.c | 478 +++++++++++++++++++++++++++++++++++++++++
>  libavfilter/allfilters.c       |   1 +
>  4 files changed, 542 insertions(+)
>  create mode 100644 libavfilter/af_silenceremove.c
> 
> diff --git a/doc/filters.texi b/doc/filters.texi
> index cca15fc..ea7da88 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -1881,6 +1881,68 @@ ffmpeg -i silence.mp3 -af silencedetect=noise=0.0001 -f null -
>  @end example
>  @end itemize
>  
> + at section silenceremove
> +
> +Removes silence from the beginning, middle or end of the audio.

We tend to use the "Remove silence ..." form (no 's').

Also, what is exactly the meaning of "from" here? Shouldn't it be "at the
beginning, in the middle or at the end of..."? (Note: as you know, I'm not
a native English speaker).

> +
> +The filter accepts the following options:
> +
> + at table @option
> + at item start_periods
> +This value is used to indicate if audio should be trimmed at beginning of
> +the audio. A value of zero indicates no silence should be trimmed from the
> +beginning. When specifying an non-zero value, it trims audio up until it

"*a* non-zero value"?

> +finds non-silence. Normally, when trimming silence from beginning of audio
> +the @var{start_periods} will be @code{1} but it can be increased to higher
> +values to trim all audio up to specific count of non-silence periods.
> +

Please specify the default value, ditto below

> + at item start_duration
> +Specify amount of time that non-silence must be detected before it stops

Specify *the* amount of time

> +trimming audio. By increasing the duration, burst of noise can be treated
> +as silence and trimmed off.
> +
> + at item start_threshold
> +This indicate what sample value should be treated as silence. For digital
> +audio, a value of @code{0} may be fine but for audio recorded from analog,
> +you may wish to increase the value to account for background noise.
> +
> + at item stop_periods
> +Set the count for trimming silence from the end of audio.
> +To remove silence from the middle of a file, specify a @var{stop_periods}
> +that is negative. This value is the threated as a positive value and is also

is *then* *treated*?

drop the "also" maybe

> +used to indicate the effect should restart processing as specified by
> + at var{start_periods}, making it suitable for removing periods of silence
> +in the middle of the audio.
> +
> + at item stop_duration
> +Specify a duration of silence that must exist before audio is not copied any
> +more. By specifying a higher duration, silence that is wanted can be left
> +in the audio.
> +
> + at item stop_threshold
> +This indicate what sample value should be treated as silence but for
> +trimming silence from the end of audio.
> +
> + at item leave_silence
> +This indicate that @var{stop_duration} length of audio should be left intact
> +at the beginning of each period of silence.
> +For example, if you want to remove long pauses between words but do not want
> +to remove the pauses completely.
> +
> + at end table
> +

I must say I'm extremely confused at all of this; more examples would be
welcome if you don't mind.

> + at subsection Examples
> +
> + at itemize
> + at item
> +The following example shows how this filter can be used to start recording

*a* recording?

> +that does not contain the delay at the start which usually occurs between

> +pressing the record button and the star of the performance:

start?

[...]
> +typedef struct SilenceRemoveContext {
> +    const AVClass *class;
> +

> +    int mode;

Could be made an enum

[...]
> +static const AVOption silenceremove_options[] = {
> +    { "start_periods",   NULL, OFFSET(start_periods),   AV_OPT_TYPE_INT,      {.i64=0},     0, 9000, FLAGS },
> +    { "start_duration",  NULL, OFFSET(start_duration),  AV_OPT_TYPE_DURATION, {.i64=0},     0, 9000, FLAGS },
> +    { "start_threshold", NULL, OFFSET(start_threshold), AV_OPT_TYPE_DOUBLE,   {.dbl=0},     0,    1, FLAGS },
> +    { "stop_periods",    NULL, OFFSET(stop_periods),    AV_OPT_TYPE_INT,      {.i64=0}, -9000, 9000, FLAGS },
> +    { "stop_duration",   NULL, OFFSET(stop_duration),   AV_OPT_TYPE_DURATION, {.i64=0},     0, 9000, FLAGS },
> +    { "stop_threshold",  NULL, OFFSET(stop_threshold),  AV_OPT_TYPE_DOUBLE,   {.dbl=0},     0,    1, FLAGS },
> +    { "leave_silence",   NULL, OFFSET(leave_silence),   AV_OPT_TYPE_INT,      {.i64=0},     0,    1, FLAGS },
                            ^^^^
                        this column makes me sad :(

> +    { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(silenceremove);
> +
> +#define SILENCE_TRIM        0
> +#define SILENCE_TRIM_FLUSH  1
> +#define SILENCE_COPY        2
> +#define SILENCE_COPY_FLUSH  3
> +#define SILENCE_STOP        4
> +
[...]
> +static int config_input(AVFilterLink *inlink)
> +{
> +    AVFilterContext *ctx = inlink->dst;
> +    SilenceRemoveContext *s = ctx->priv;
> +

> +    s->window_size = (inlink->sample_rate / 50) * inlink->channels;

what is 50?

> +    s->window = av_malloc_array(s->window_size, sizeof(*s->window));
> +
> +    clear_rms(s);
> +
> +    s->start_duration = av_rescale(s->start_duration, inlink->sample_rate,
> +                                   AV_TIME_BASE);
> +    s->stop_duration  = av_rescale(s->stop_duration, inlink->sample_rate,
> +                                   AV_TIME_BASE);
> +
> +    if (s->start_duration) {
> +        s->start_holdoff = av_malloc_array(s->start_duration,
> +                                           sizeof(*s->start_holdoff) *
> +                                           inlink->channels);
> +        if (!s->start_holdoff)
> +            return AVERROR(ENOMEM);
> +    }
> +
> +    s->start_holdoff_offset = 0;
> +    s->start_holdoff_end    = 0;
> +    s->start_found_periods  = 0;
> +
> +    if (s->stop_duration) {
> +        s->stop_holdoff = av_malloc_array(s->stop_duration,
> +                                          sizeof(*s->stop_holdoff) *
> +                                          inlink->channels);
> +        if (!s->stop_holdoff)
> +            return AVERROR(ENOMEM);
> +    }
> +
> +    s->stop_holdoff_offset = 0;
> +    s->stop_holdoff_end    = 0;
> +    s->stop_found_periods  = 0;
> +
> +    if (s->start_periods)
> +        s->mode = SILENCE_TRIM;
> +    else
> +        s->mode = SILENCE_COPY;
> +
> +    return 0;
> +}
> +
> +static int config_output(AVFilterLink *outlink)
> +{
> +    outlink->flags |= FF_LINK_FLAG_REQUEST_LOOP;
> +
> +    return 0;
> +}
> +

nit: you probably can merge the 2 config callbacks

> +static int above_threshold(double value, double threshold)
> +{
> +    return value > threshold;
> +}
> +

Is this really necessary?

> +static double compute_rms(SilenceRemoveContext *s, double sample)
> +{
> +    double new_sum;
> +
> +    new_sum  = s->rms_sum;
> +    new_sum -= *s->window_current;
> +    new_sum += sample * sample;
> +

> +    return sqrt(new_sum / s->window_size);

Can't you remove the sqrt() and the div by updating the threshold
reference instead once (th=square(th*s->window_size)) ? I believe it will
kind of make the code faster.

> +}
> +
> +static void update_rms(SilenceRemoveContext *s, double sample)
> +{
> +    s->rms_sum -= *s->window_current;
> +    *s->window_current = sample * sample;
> +    s->rms_sum += *s->window_current;
> +
> +    s->window_current++;
> +    if (s->window_current >= s->window_end)
> +        s->window_current = s->window;
> +}
> +

> +static void flush(AVFrame *out, AVFilterLink *outlink,
> +                  int *nb_samples_written, int *ret)
> +{
> +    if (*nb_samples_written) {
> +        out->nb_samples = *nb_samples_written / outlink->channels;
> +        *ret = ff_filter_frame(outlink, out);
> +        *nb_samples_written = 0;
> +    } else {
> +        av_frame_free(&out);
> +    }
> +}

I suppose you don't want to return the ret value because it avoids
overwriting its already set value?

> +
> +static int filter_frame(AVFilterLink *inlink, AVFrame *in)
> +{
> +    AVFilterContext *ctx = inlink->dst;
> +    AVFilterLink *outlink = ctx->outputs[0];
> +    SilenceRemoveContext *s = ctx->priv;
> +    int i, j, threshold, ret = 0;
> +    int nbs, nb_samples_read, nb_samples_written;
> +    double *obuf, *ibuf = (double *)in->data[0];
> +    AVFrame *out;
> +
> +    nb_samples_read = nb_samples_written = 0;
> +
> +    switch (s->mode) {
> +    case SILENCE_TRIM:

> +silence_trim:

Can't you split all these cases/labels into function and do return func()
instead of all these weird gotos indirections?

[...]
> +static av_cold void uninit(AVFilterContext *ctx)
> +{
> +    SilenceRemoveContext *s = ctx->priv;
> +
> +    av_freep(&s->start_holdoff);
> +    av_freep(&s->stop_holdoff);

I misses the free of window I believe

[...]

No other comment from me, assuming updates for Changelog & RELEASE_NOTES
and avfilter bump.

-- 
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140831/3c85542b/attachment.asc>


More information about the ffmpeg-devel mailing list