[FFmpeg-devel] [RFC] libavfilter audio API and related issues

Wed Jun 16 01:00:26 CEST 2010

On date Sunday 2010-06-13 19:20:55 -0700, S.N. Hemanth Meenakshisundaram encoded:
> 
> On 06/05/2010 01:24 PM, S.N. Hemanth Meenakshisundaram wrote:
> >
> >>>>>On 05/02/2010 12:08 PM, Stefano Sabatini wrote:
> >>>>>>On date Wednesday 2010-04-28 07:07:54 +0000, S.N. Hemanth
> >>>>>>Meenakshisundaram encoded:
> >>>>>>
> >>>>>>>Stefano Sabatini<stefano.sabatini-lala<at>   poste.it>   writes:
> >>>>>>>
> >>>>>>>>Follow some notes about a possible design for the audio support in
> >>>>>>>>libavfilter.
> >>>>>>>>[...]
> >Hi All,
> >
> >Attached is a rough draft of the audio API. [...]
> 
> Hi All,
> 
> I now have a working audio filter framework. Made some changes to
> the audio API from last week and the earlier ffplay changes based on
> comments and some problems encountered along the way. Now when
> playing a video/audio file, the input and output filters are created
> and the audio data passes through these two filters and plays out.
> The video filters work without any breakages.
> 
> Here are the changes and comments incorporated:
> 
> 1. AVFilterSamplesRef now has a pts field as suggested. I can't find
> a pos field for audio in existing code so I haven't added that yet.

Maybe pts should be moved to AVFilterBuffer, and the same goes for pos;
see also Michael's comment.

> 2. avfilter_get_audio_buffer now takes buffer size as an input
> parameter instead of samples_nb (number of samples) since buffer
> size is readily available from audio_decode_frame. samples_nb is
> calculated based on buffer size and sample format+channel layout.
> 
> 3. Moved around some code in ffplay to initialize audio filters only
> after audio decoder is initialized.

Yes (BTW I have some patches lying around which I need to test more:
parsing should be done before playback actually starts, and ffplay
should fail if the parsing fails).
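
Regarding point 2: just to make the arithmetic explicit, here is a rough
sketch of how the number of samples per channel follows from the buffer
size. samples_per_channel() and its parameters are made-up names for
illustration, they are not part of the patch or of lavfi.

```c
#include <assert.h>

/* Hypothetical helper (not in the patch): number of samples per channel
 * stored in a buffer of buf_size bytes.
 * Returns -1 if buf_size is negative or is not a whole number of frames,
 * where a frame is one sample for every channel. */
static int samples_per_channel(int buf_size, int nb_channels, int bytes_per_sample)
{
    int frame_size;

    /* a zero channel count or sample size is a caller bug */
    assert(nb_channels > 0 && bytes_per_sample > 0);

    if (buf_size < 0)
        return -1;
    frame_size = nb_channels * bytes_per_sample;
    if (buf_size % frame_size)
        return -1;
    return buf_size / frame_size;
}
```

For example, 4096 bytes of 16-bit stereo give 4096 / (2 * 2) = 1024
samples per channel.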

> 4. Minor additions to avfiltergraph.c to support audio filters.
> 
> 5. Made macros for some repeated code in formats.c and using memset
> to set the values of the 'data[8]' array in defaults.c as Stefano
> suggested. Also, in case of packed formats, all data pointers of
> valid channels now point to the single monolithic buffer. Some more
> macros may be possible in formats.c
> 
> 6. Some other nits and alignments pointed out earlier.
> 
> Some questions:
> 
> 1. Audio decode frame uses parameters like num_channels, sample_fmt
> etc from 'is->audio_st->codec' (the audio codec context). That is
> also where I read these parameters and put them in the audio buffers
> given to the filter chain. So if these parameters change between
> audio frames, will the new values reflect in the codec context?

Do you mean whether they should be set again in the encoder? I don't
have a deep knowledge of lavc, so I cannot say whether this is actually
supported. As far as I remember, changing encoding parameters on the
fly is not supported (in that case you need to reset the encoder).

> 2. I can't find a variable indicating whether audio data is planar
> or packed. Is there one? I am assuming it is always packed right
> now. How can I find out whether a frame returned by
> audio_decode_frame is planar or packed?

I suppose they're always packed, but I really don't remember; in any
case this should be clearly documented. Also I'd like to move the
sample format definitions to lavu (samplefmt.h/sampledesc.h?), as is
done for the pixel formats; this would avoid a compile-time dependency
of lavfi on lavc.
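
To make the packed/planar distinction concrete, a small sketch of how a
single sample would be addressed in each case. get_sample_s16() is a
made-up helper; it assumes signed 16-bit samples and the data[] convention
used in the patch (one pointer per channel for planar, all pointers
aliasing the same interleaved buffer for packed).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return sample number idx of channel ch.
 * planar: data[ch] points to that channel's own run of samples.
 * packed: data[0] points to interleaved frames (ch0 ch1 ... ch0 ch1 ...). */
static int16_t get_sample_s16(uint8_t *const data[], int planar,
                              int nb_channels, int ch, int idx)
{
    const int16_t *p;

    assert(ch >= 0 && ch < nb_channels && idx >= 0);

    if (planar) {
        p = (const int16_t *)data[ch];
        return p[idx];
    }
    p = (const int16_t *)data[0];
    return p[idx * nb_channels + ch];
}
```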

> 3. The audio decoder's codec context has two variables num_channels
> and channel_layout. num_channels is the one currently used by
> audio_decode_frame when calculating number of samples in a frame
> etc. Will the decoders all populate channel_layout as well
> correctly?

Channel layout has been introduced recently, so expect some problems
with it (it may be left unset), but again I leave the reply to someone
with better lavc knowledge.
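
Note that if channel_layout is left unset, the channel count derived from
it in avfilter_default_get_audio_buffer() would presumably be 0, and
size/numchan would then divide by zero. A possible guard, only as a
sketch: all names below are made up, and the two mask values are assumed
to match lavc's mono and stereo layouts.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror lavc's mono (front center) and stereo
 * (front left | front right) layout masks. */
#define LAYOUT_MONO   0x4
#define LAYOUT_STEREO 0x3

/* Count the speaker positions set in a layout bitmask. */
static int layout_nb_channels(int64_t channel_layout)
{
    uint64_t mask = (uint64_t)channel_layout;
    int n = 0;

    for (; mask; mask &= mask - 1) /* clears the lowest set bit */
        n++;
    return n;
}

/* Keep the layout if it is set and agrees with the decoder's channel
 * count; otherwise guess one for mono/stereo, or return 0 (unknown). */
static int64_t resolve_layout(int64_t channel_layout, int channels)
{
    assert(channels >= 0);

    if (channel_layout && layout_nb_channels(channel_layout) == channels)
        return channel_layout;
    if (channels == 1)
        return LAYOUT_MONO;
    if (channels == 2)
        return LAYOUT_STEREO;
    return 0;
}
```

A return value of 0 means the caller has to fall back to the plain
channel count and must not divide by a layout-derived one.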

> 4. Currently, audio_decode_frame always outputs frames with a 16-bit
> sample size. If the codec output is in a different format, it calls
> a conversion function. I guess the plan is to eventually remove this
> and replace it with an output filter that can output in any required
> sample format?

Yes.
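
For reference, the per-sample work such an output filter would have to do
for float input is roughly the following. This is only a sketch:
float_to_s16() is a made-up name, it assumes float samples nominally in
[-1.0, 1.0], and the real code would presumably reuse the existing
conversion code in lavc rather than reimplement it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert n float samples to signed 16-bit,
 * clipping out-of-range input and rounding half away from zero. */
static void float_to_s16(int16_t *dst, const float *src, int n)
{
    int i;

    assert(dst && src && n >= 0);

    for (i = 0; i < n; i++) {
        float s = src[i] * 32768.0f;

        if (s >  32767.0f) s =  32767.0f;
        if (s < -32768.0f) s = -32768.0f;
        dst[i] = (int16_t)(s < 0 ? s - 0.5f : s + 0.5f);
    }
}
```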

> Please review and comment.
> 
> Regards,
> Hemanth
> 

> Index: ffplay.c
> ===================================================================
> --- ffplay.c	(revision 23486)
> +++ ffplay.c	(working copy)
> @@ -105,6 +105,7 @@
>  
>  #if CONFIG_AVFILTER
>      AVFilterPicRef *picref;
> +    AVFilterSamplesRef *samplesref;
>  #endif
>  } VideoPicture;
>  
> @@ -209,6 +210,8 @@
>  
>  #if CONFIG_AVFILTER
>      AVFilterContext *out_video_filter;          ///<the last filter in the video chain
> +    AVFilterContext *out_audio_filter;          ///<the last filter in the audio chain
> +    AVFilterGraph *agraph;
>  #endif
>  
>      float skip_frames;
> @@ -218,6 +221,7 @@
>  
>  static void show_help(void);
>  static int audio_write_get_buf_size(VideoState *is);
> +static int audio_decode_frame(VideoState *is, double *pts_ptr);
>  
>  /* options specified by the user */
>  static AVInputFormat *file_iformat;
> @@ -265,6 +269,7 @@
>  static int rdftspeed=20;
>  #if CONFIG_AVFILTER
>  static char *vfilters = NULL;
> +static char *afilters = NULL;
>  #endif
>  
>  /* current context */
> @@ -1775,6 +1780,131 @@
>                                    { .name = NULL }},
>      .outputs   = (AVFilterPad[]) {{ .name = NULL }},
>  };
> +
> +typedef struct {
> +    VideoState *is;
> +} AudioFilterPriv;
> +
> +static int input_audio_init(AVFilterContext *ctx, const char *args, void *opaque)
> +{
> +    AudioFilterPriv *priv = ctx->priv;
> +
> +    if(!opaque) return -1;

AVERROR(EINVAL)

> +    priv->is = opaque;
> +
> +    return 0;
> +}
> +
> +static void input_audio_uninit(AVFilterContext *ctx)
> +{
> +}
> +
> +static int input_request_samples(AVFilterLink *link)
> +{
> +    AudioFilterPriv *priv = link->src->priv;
> +    AVFilterSamplesRef *samplesref;
> +    AVCodecContext *c;

please: c -> avctx

> +    double pts = 0;
> +    int buf_size = 0;
> +
> +    buf_size = audio_decode_frame(priv->is, &pts);
> +    c = priv->is->audio_st->codec;
> +    if (buf_size <= 0)
> +        return -1;
> +
> +    /* FIXME Currently audio streams seem to have no info on planar/packed.
> +     * Assuming packed here and passing 0 as last attribute to get_audio_buffer.
> +     */
> +    samplesref = avfilter_get_audio_buffer(link, AV_PERM_WRITE, buf_size,
> +                                           c->channel_layout, c->sample_fmt, 0);
> +    memcpy(samplesref->data[0], priv->is->audio_buf, buf_size);
> +
> +    samplesref->pts         = (int64_t) pts;
> +    samplesref->sample_rate = (int64_t) c->sample_rate;
> +    avfilter_filter_samples(link, samplesref);
> +
> +    return 0;
> +}
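
If I read ffplay correctly, the pts returned by audio_decode_frame() is a
double in seconds, while AVFilterSamplesRef.pts is documented as being in
1/AV_TIME_BASE units. The plain (int64_t) cast then truncates to whole
seconds, and the value is read back as seconds in
get_filtered_audio_samples(). A sketch of an explicit conversion follows;
TIME_BASE stands in for AV_TIME_BASE and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define TIME_BASE 1000000 /* stand-in for AV_TIME_BASE */

/* Hypothetical helpers: convert between seconds and 1/TIME_BASE units. */
static int64_t seconds_to_pts(double seconds)
{
    assert(seconds >= 0);
    return (int64_t)(seconds * TIME_BASE + 0.5); /* round to nearest */
}

static double pts_to_seconds(int64_t pts)
{
    return (double)pts / TIME_BASE;
}
```

With this, 1.5 seconds becomes 1500000 instead of 1.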
> +
> +static int input_query_audio_formats(AVFilterContext *ctx)
> +{
> +    AudioFilterPriv *priv = ctx->priv;
> +    enum SampleFormat sample_fmts[] = {
> +        priv->is->audio_st->codec->sample_fmt, SAMPLE_FMT_NONE
> +    };
> +
> +    avfilter_set_common_formats(ctx, avfilter_make_aformat_list(sample_fmts));
> +    return 0;
> +}
> +
> +static int input_config_audio_props(AVFilterLink *link)
> +{
> +    return 0;
> +}
> +
> +static AVFilter input_audio_filter =
> +{
> +    .name      = "ffplay_audio_input",
> +
> +    .priv_size = sizeof(AudioFilterPriv),
> +
> +    .init      = input_audio_init,
> +    .uninit    = input_audio_uninit,
> +
> +    .query_formats = input_query_audio_formats,
> +
> +    .inputs    = (AVFilterPad[]) {{ .name = NULL }},
> +    .outputs   = (AVFilterPad[]) {{ .name = "default",
> +                                    .type = AVMEDIA_TYPE_AUDIO,
> +                                    .request_samples = input_request_samples,
> +                                    .config_props  = input_config_audio_props, },
> +                                  { .name = NULL }},
> +};
> +
> +static void output_filter_samples(AVFilterLink *link, AVFilterSamplesRef *samplesref)
> +{
> +}
> +
> +static int output_query_audio_formats(AVFilterContext *ctx)
> +{
> +    enum SampleFormat sample_fmts[] = { SAMPLE_FMT_S16, SAMPLE_FMT_NONE };
> +
> +    avfilter_set_common_formats(ctx, avfilter_make_aformat_list(sample_fmts));
> +    return 0;
> +}
> +
> +static int get_filtered_audio_samples(AVFilterContext *ctx, VideoState *is, double *pts)
> +{
> +    AVFilterSamplesRef *samplesref;
> +
> +    if(avfilter_request_samples(ctx->inputs[0]))
> +        return -1;

nit: if (...)

also propagate the error code.

> +    if(!(samplesref = ctx->inputs[0]->cur_samples))
> +        return -1;

maybe use a better error code here; I can't suggest one right now...

> +    ctx->inputs[0]->cur_samples = NULL;
> +
> +    *pts          = samplesref->pts;
> +
> +    memcpy(is->audio_buf1, samplesref->data[0], samplesref->size);
> +    is->audio_buf = is->audio_buf1;
> +
> +    return samplesref->size;
> +}
> +
> +static AVFilter output_audio_filter =
> +{
> +    .name      = "ffplay_audio_output",
> +
> +    .query_formats = output_query_audio_formats,
> +
> +    .inputs    = (AVFilterPad[]) {{ .name           = "default",
> +                                    .type           = AVMEDIA_TYPE_AUDIO,
> +                                    .filter_samples = output_filter_samples,
> +                                    .min_perms      = AV_PERM_READ, },
> +                                  { .name = NULL }},
> +    .outputs   = (AVFilterPad[]) {{ .name = NULL }},
> +};
>  #endif  /* CONFIG_AVFILTER */
>  
>  static int video_thread(void *arg)
> @@ -2166,7 +2296,11 @@
>  
>      while (len > 0) {
>          if (is->audio_buf_index >= is->audio_buf_size) {
> +#if CONFIG_AVFILTER
> +           audio_size = get_filtered_audio_samples(is->out_audio_filter, is, &pts);
> +#else
>             audio_size = audio_decode_frame(is, &pts);
> +#endif
>             if (audio_size < 0) {
>                  /* if error, just output silence */
>                 is->audio_buf = is->audio_buf1;
> @@ -2198,6 +2332,9 @@
>      AVCodecContext *avctx;
>      AVCodec *codec;
>      SDL_AudioSpec wanted_spec, spec;
> +#if CONFIG_AVFILTER
> +    AVFilterContext *afilt_src = NULL, *afilt_out = NULL;
> +#endif
>  
>      if (stream_index < 0 || stream_index >= ic->nb_streams)
>          return -1;
> @@ -2266,6 +2403,46 @@
>          is->audio_diff_threshold = 2.0 * SDL_AUDIO_BUFFER_SIZE / avctx->sample_rate;
>  
>          memset(&is->audio_pkt, 0, sizeof(is->audio_pkt));
> +

> +#if CONFIG_AVFILTER
> +    is->agraph = av_mallocz(sizeof(AVFilterGraph));
> +    if(!(afilt_src = avfilter_open(&input_audio_filter,  "asrc")))   goto the_end;
> +    if(!(afilt_out = avfilter_open(&output_audio_filter, "aout")))   goto the_end;
> +
> +    if(avfilter_init_filter(afilt_src, NULL, is))             goto the_end;
> +    if(avfilter_init_filter(afilt_out, NULL, NULL))           goto the_end;
> +
> +    if(afilters) {

nits: if_(...)

> +        AVFilterInOut *outputs = av_malloc(sizeof(AVFilterInOut));
> +        AVFilterInOut *inputs  = av_malloc(sizeof(AVFilterInOut));
> +
> +        outputs->name    = av_strdup("ain");
> +        outputs->filter  = afilt_src;
> +        outputs->pad_idx = 0;
> +        outputs->next    = NULL;
> +
> +        inputs->name    = av_strdup("aout");
> +        inputs->filter  = afilt_out;
> +        inputs->pad_idx = 0;
> +        inputs->next    = NULL;
> +
> +        if (avfilter_graph_parse(is->agraph, afilters, inputs, outputs, NULL) < 0)
> +            goto the_end;
> +        av_freep(&afilters);
> +    } else {
> +        if(avfilter_link(afilt_src, 0, afilt_out, 0) < 0)          goto the_end;
> +    }
> +    avfilter_graph_add_filter(is->agraph, afilt_src);
> +    avfilter_graph_add_filter(is->agraph, afilt_out);
> +
> +    if(avfilter_graph_check_validity(is->agraph, NULL))           goto the_end;
> +    if(avfilter_graph_config_formats(is->agraph, NULL))           goto the_end;
> +    if(avfilter_graph_config_links(is->agraph, NULL))             goto the_end;

nits: if_(...)

Also:
if ((ret = avfilter...())) goto end;
end:
 ...
 return ret;
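
A self-contained sketch of that pattern, so that the error code of the
failing call reaches the caller and cleanup lives in one place. step() and
setup() are made-up stand-ins for the avfilter_graph_*() calls, and the
negative errno values stand in for AVERROR().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for one configuration call: fails with -EINVAL on request. */
static int step(int fail)
{
    return fail ? -EINVAL : 0;
}

/* Allocate a resource, run two steps, and hand the resource over only
 * if both succeed. fail_at selects which step (1 or 2) fails, 0 = none. */
static int setup(int fail_at, void **graph_out)
{
    void *graph = malloc(16);
    int ret;

    assert(graph_out);
    *graph_out = NULL;
    if (!graph)
        return -ENOMEM;

    if ((ret = step(fail_at == 1)) < 0)
        goto end;
    if ((ret = step(fail_at == 2)) < 0)
        goto end;

    *graph_out = graph; /* success: the caller now owns the resource */
    return 0;

end:
    free(graph);
    return ret;
}
```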

> +
> +    is->out_audio_filter = afilt_out;
> +#endif
> +
>          packet_queue_init(&is->audioq);
>          SDL_PauseAudio(0);
>          break;
> @@ -2289,6 +2466,12 @@
>          break;
>      }
>      return 0;
> +#if CONFIG_AVFILTER
> +the_end:
> +    avfilter_graph_destroy(is->agraph);
> +    av_freep(&(is->agraph));
> +    return -1;
> +#endif
>  }
>  
>  static void stream_component_close(VideoState *is, int stream_index)
> @@ -2310,6 +2493,10 @@
>          if (is->reformat_ctx)
>              av_audio_convert_free(is->reformat_ctx);
>          is->reformat_ctx = NULL;
> +#if CONFIG_AVFILTER
> +        avfilter_graph_destroy(is->agraph);
> +        av_freep(&(is->agraph));
> +#endif
>          break;
>      case AVMEDIA_TYPE_VIDEO:
>          packet_queue_abort(&is->videoq);
> @@ -3069,6 +3256,7 @@
>      { "window_title", OPT_STRING | HAS_ARG, {(void*)&window_title}, "set window title", "window title" },
>  #if CONFIG_AVFILTER
>      { "vf", OPT_STRING | HAS_ARG, {(void*)&vfilters}, "video filters", "filter list" },
> +    { "af", OPT_STRING | HAS_ARG, {(void*)&afilters}, "audio filters", "filter list" },
>  #endif
>      { "rdftspeed", OPT_INT | HAS_ARG| OPT_AUDIO | OPT_EXPERT, {(void*)&rdftspeed}, "rdft speed", "msecs" },
>      { "default", OPT_FUNC2 | HAS_ARG | OPT_AUDIO | OPT_VIDEO | OPT_EXPERT, {(void*)opt_default}, "generic catch all option", "" },
> Index: libavfilter/avfiltergraph.c
> ===================================================================
> --- libavfilter/avfiltergraph.c	(revision 23486)
> +++ libavfilter/avfiltergraph.c	(working copy)
> @@ -170,7 +170,10 @@
>          return;
>  
>      link->in_formats->format_count = 1;
> -    link->format = link->in_formats->formats[0];
> +    if (link->type == AVMEDIA_TYPE_VIDEO)
> +        link->format = link->in_formats->formats[0];
> +    else if (link->type == AVMEDIA_TYPE_AUDIO)
> +        link->aformat = link->in_formats->aformats[0];
>  
>      avfilter_formats_unref(&link->in_formats);
>      avfilter_formats_unref(&link->out_formats);
> Index: libavfilter/defaults.c
> ===================================================================
> --- libavfilter/defaults.c	(revision 23486)
> +++ libavfilter/defaults.c	(working copy)
> @@ -21,6 +21,7 @@
>  
>  #include "libavcodec/imgconvert.h"
>  #include "avfilter.h"
> +#include "libavcodec/audioconvert.h"
>  
>  /* TODO: buffer pool.  see comment for avfilter_default_get_video_buffer() */
>  static void avfilter_default_free_video_buffer(AVFilterPic *pic)
> @@ -29,6 +30,12 @@
>      av_free(pic);
>  }
>  
> +static void avfilter_default_free_audio_buffer(AVFilterBuffer *buffer)

I don't think there should be a function to free an *audio* buffer:
AVFilterBuffer is generic, so that should be avfilter_default_free_buffer().

> +{
> +    av_free(buffer->data[0]);
> +    av_free(buffer);
> +}
> +
>  /* TODO: set the buffer's priv member to a context structure for the whole
>   * filter chain.  This will allow for a buffer pool instead of the constant
>   * alloc & free cycle currently implemented. */
> @@ -65,6 +72,66 @@
>      return ref;
>  }
>  
> +AVFilterSamplesRef *avfilter_default_get_audio_buffer(AVFilterLink *link, int perms,
> +                                                      int size, int64_t channel_layout,
> +                                                      enum SampleFormat sample_fmt, int planar)

naming nit: avfilter_default_get_samples_ref should be less confusing
(yes the corresponding video function maybe should be renamed as
well).

> +{
> +    AVFilterBuffer *buffer = av_mallocz(sizeof(AVFilterBuffer));
> +    AVFilterSamplesRef *ref = av_mallocz(sizeof(AVFilterSamplesRef));
> +    int i, sampsize, numchan, bufsize, per_channel_size, stepsize = 0;

sample_size, num_chans, step_size: easier to read for non-native speakers.

> +    char *buf;
> +
> +    ref->buffer         = buffer;
> +    ref->channel_layout = channel_layout;
> +    ref->sample_fmt     = sample_fmt;
> +    ref->size           = size;
> +    ref->planar         = planar;
> +
> +    /* make sure the buffer gets read permission or it's useless for output */
> +    ref->perms = perms | AV_PERM_READ;
> +
> +    buffer->refcount   = 1;
> +    buffer->free       = avfilter_default_free_audio_buffer;
> +

> +    sampsize = (av_get_bits_per_sample_format(sample_fmt))>>3;

superfluous ( ).

> +    numchan = avcodec_channel_layout_num_channels(channel_layout);
> +
> +    per_channel_size = size/numchan;
> +    ref->samples_nb = per_channel_size/sampsize;
> +
> +    /* Set the number of bytes to traverse to reach next sample of a particular channel:
> +     * For planar, this is simply the sample size.
> +     * For packed, this is the number of samples * sample_size.
> +     */

> +    for (i = 0; i < numchan; i++)
> +        buffer->linesize[i] = (planar > 0)?(per_channel_size):sampsize;
> +    for (i = numchan+1; i < 8; i++)
> +        buffer->linesize[i] = 0;

memset

> +
> +    /* Calculate total buffer size, round to multiple of 16 to be SIMD friendly */
> +    bufsize = (size + 15)&~15;
> +    buf = av_malloc(bufsize);
> +
> +    /* For planar, set the start point of each channel's data within the buffer
> +     * For packed, set the start point of the entire buffer only
> +     */
> +    buffer->data[0] = buf;
> +    if(planar > 0) {
> +        for(i = 1; i < numchan; i++) {
> +            stepsize += per_channel_size;
> +            buffer->data[i] = buf + stepsize;
> +        }
> +    } else {
> +        memset(&buffer->data[1], (long)buf, (numchan-1)*sizeof(buffer->data[0]));
> +    }
> +    memset(&buffer->data[numchan], 0, (8-numchan)*sizeof(buffer->data[0]));
> +
> +    memcpy(ref->data,     buffer->data,     sizeof(buffer->data));
> +    memcpy(ref->linesize, buffer->linesize, sizeof(buffer->linesize));
> +
> +    return ref;
> +}
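
Two more remarks on this function. memset() fills memory with a single
byte value (its int argument converted to unsigned char), so the
memset(&buffer->data[1], (long)buf, ...) in the packed branch does not
store the pointer buf in the data[] slots. Also the linesize loop starting
at numchan+1 skips index numchan, and the comment above it does not seem
to match what the code assigns. A sketch of how the data pointers could be
set with plain loops; set_data_pointers() and MAX_CHANNELS are made-up
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 8 /* size of the data[] array in the patch */

/* Hypothetical helper: point data[] into buf.
 * planar: channel i starts i * per_channel_size bytes into buf.
 * packed: every valid channel points to the start of buf.
 * Unused slots are set to NULL. */
static void set_data_pointers(uint8_t *data[MAX_CHANNELS], uint8_t *buf,
                              int nb_channels, int per_channel_size, int planar)
{
    int i;

    assert(nb_channels > 0 && nb_channels <= MAX_CHANNELS);
    assert(per_channel_size >= 0);

    for (i = 0; i < nb_channels; i++)
        data[i] = planar ? buf + (size_t)i * per_channel_size : buf;
    for (; i < MAX_CHANNELS; i++)
        data[i] = NULL;
}
```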
> +
>  void avfilter_default_start_frame(AVFilterLink *link, AVFilterPicRef *picref)
>  {
>      AVFilterLink *out = NULL;
> @@ -113,6 +180,23 @@
>      }
>  }
>  
> +void avfilter_default_filter_samples(AVFilterLink *link, AVFilterSamplesRef *samplesref)
> +{
> +    AVFilterLink *out = NULL;
> +
> +    if(link->dst->output_count)
> +        out = link->dst->outputs[0];
> +
> +    if(out) {
> +        out->outsamples = avfilter_default_get_audio_buffer(link, AV_PERM_WRITE, samplesref->size,
> +                                                            samplesref->channel_layout,
> +                                                            samplesref->sample_fmt, samplesref->planar);
> +        out->outsamples->pts            = samplesref->pts;
> +        out->outsamples->sample_rate    = samplesref->sample_rate;
> +        avfilter_filter_samples(out, avfilter_ref_samples(out->outsamples, ~0));
> +    }
> +}

nits: if_(...) (here and in the rest of the patch please)...

>  /**
>   * default config_link() implementation for output video links to simplify
>   * the implementation of one input one output video filters */
> @@ -157,6 +241,7 @@
>  
>      if(!count) {
>          av_free(formats->formats);
> +        av_free(formats->aformats);
>          av_free(formats->refs);
>          av_free(formats);
>      }
> @@ -183,8 +268,21 @@
>      avfilter_end_frame(link->dst->outputs[0]);
>  }
>  
> +void avfilter_null_filter_samples(AVFilterLink *link, AVFilterSamplesRef *samplesref)
> +{
> +    avfilter_filter_samples(link->dst->outputs[0], samplesref);
> +}
> +
>  AVFilterPicRef *avfilter_null_get_video_buffer(AVFilterLink *link, int perms, int w, int h)
>  {
>      return avfilter_get_video_buffer(link->dst->outputs[0], perms, w, h);
>  }
>  
> +AVFilterSamplesRef *avfilter_null_get_audio_buffer(AVFilterLink *link, int perms, int size,
> +                                                   int64_t channel_layout,
> +                                                   enum SampleFormat sample_fmt, int packed)
> +{
> +    return avfilter_get_audio_buffer(link->dst->outputs[0], perms, size,
> +                                     channel_layout, sample_fmt, packed);
> +}
> +
> Index: libavfilter/formats.c
> ===================================================================
> --- libavfilter/formats.c	(revision 23486)
> +++ libavfilter/formats.c	(working copy)
> @@ -39,6 +39,13 @@
>      av_free(a);
>  }
>  
> +#define CMP_AND_ADD(acount, bcount, afmt, bfmt, retfmt) { \
> +    for(i = 0; i < acount; i++) \
> +        for(j = 0; j < bcount; j++) \
> +            if(afmt[i] == bfmt[j]) \
> +                retfmt[k++] = afmt[i]; \
> +}        
> +
>  AVFilterFormats *avfilter_merge_formats(AVFilterFormats *a, AVFilterFormats *b)
>  {
>      AVFilterFormats *ret;
> @@ -46,13 +53,17 @@
>  
>      ret = av_mallocz(sizeof(AVFilterFormats));
>  
> -    /* merge list of formats */
> -    ret->formats = av_malloc(sizeof(*ret->formats) * FFMIN(a->format_count,
> +    if(a->type == AVMEDIA_TYPE_VIDEO) {
> +        /* merge list of formats */
> +        ret->formats = av_malloc(sizeof(*ret->formats) * FFMIN(a->format_count,
>                                                             b->format_count));
> -    for(i = 0; i < a->format_count; i ++)
> -        for(j = 0; j < b->format_count; j ++)
> -            if(a->formats[i] == b->formats[j])
> -                ret->formats[k++] = a->formats[i];
> +        CMP_AND_ADD(a->format_count, b->format_count, a->formats, b->formats, ret->formats);
> +    } else if(a->type == AVMEDIA_TYPE_AUDIO) {
> +        /* merge list of formats */
> +        ret->aformats = av_malloc(sizeof(*ret->aformats) * FFMIN(a->format_count,
> +                                                           b->format_count));
> +        CMP_AND_ADD(a->format_count, b->format_count, a->aformats, b->aformats, ret->aformats);
> +    }
>  
>      ret->format_count = k;
>      /* check that there was at least one common format */
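
About CMP_AND_ADD: the macro silently depends on i, j and k being
declared at the place where it is expanded. A small function would make
that explicit. The following is only a sketch with int standing in for the
format type; merge_format_lists() is a made-up name.

```c
#include <assert.h>

/* Hypothetical helper: copy to out every entry of a that also occurs
 * in b, preserving the order of a. out must have room for a_count
 * entries. Returns the number of entries written. */
static int merge_format_lists(const int *a, int a_count,
                              const int *b, int b_count, int *out)
{
    int i, j, k = 0;

    assert(a_count >= 0 && b_count >= 0);

    for (i = 0; i < a_count; i++)
        for (j = 0; j < b_count; j++)
            if (a[i] == b[j]) {
                out[k++] = a[i];
                break; /* add each entry of a at most once */
            }
    return k;
}
```

A return value of 0 corresponds to the "no common format" case checked
right after the merge.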
> @@ -81,6 +92,7 @@
>      formats               = av_mallocz(sizeof(AVFilterFormats));
>      formats->formats      = av_malloc(sizeof(*formats->formats) * count);
>      formats->format_count = count;
> +    formats->type = AVMEDIA_TYPE_VIDEO;
>      memcpy(formats->formats, pix_fmts, sizeof(*formats->formats) * count);
>  
>      return formats;
> @@ -115,6 +127,51 @@
>      return ret;
>  }
>  
> +AVFilterFormats *avfilter_make_aformat_list(const enum SampleFormat *sample_fmts)
> +{
> +    AVFilterFormats *formats;
> +    int count;
> +
> +    for (count = 0; sample_fmts[count] != SAMPLE_FMT_NONE; count++)
> +        ;
> +
> +    formats               = av_mallocz(sizeof(AVFilterFormats));
> +    formats->aformats     = av_malloc(sizeof(*formats->aformats) * count);
> +    formats->format_count = count;
> +    formats->type = AVMEDIA_TYPE_AUDIO;
> +    memcpy(formats->aformats, sample_fmts, sizeof(*formats->aformats) * count);
> +
> +    return formats;
> +}
> +
> +int avfilter_add_sampleformat(AVFilterFormats **avff, enum SampleFormat sample_fmt)
> +{
> +    enum SampleFormat *sample_fmts;
> +
> +    if (!(*avff) && !(*avff = av_mallocz(sizeof(AVFilterFormats))))
> +        return AVERROR(ENOMEM);
> +
> +    sample_fmts = av_realloc((*avff)->aformats,
> +                          sizeof((*avff)->aformats) * ((*avff)->format_count+1));
> +    if (!sample_fmts)
> +        return AVERROR(ENOMEM);
> +
> +    (*avff)->aformats = sample_fmts;
> +    (*avff)->aformats[(*avff)->format_count++] = sample_fmt;
> +    return 0;
> +}
> +
> +AVFilterFormats *avfilter_all_sampleformats(void)
> +{
> +    AVFilterFormats *ret = NULL;
> +    enum SampleFormat sample_fmt;
> +
> +    for (sample_fmt = 0; sample_fmt < SAMPLE_FMT_NB; sample_fmt++)
> +        avfilter_add_sampleformat(&ret, sample_fmt);
> +
> +    return ret;
> +}
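
In avfilter_add_sampleformat(), sizeof((*avff)->aformats) is the size of
the pointer, not of one element; it should be sizeof(*(*avff)->aformats).
Also, when the list is created there, its type stays at the zero value
from av_mallocz() and is never set to AVMEDIA_TYPE_AUDIO. A sketch of the
append logic with the element size spelled out; FormatList and
format_list_add() are made-up names, with int standing in for the enum and
a negative errno for AVERROR().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for AVFilterFormats. */
typedef struct FormatList {
    int *formats;
    int  count;
} FormatList;

/* Append fmt to the list, growing the array by one element.
 * On allocation failure the list is left untouched. */
static int format_list_add(FormatList *list, int fmt)
{
    int *tmp;

    assert(list && list->count >= 0);

    /* sizeof(*list->formats) is the size of one element */
    tmp = realloc(list->formats, sizeof(*list->formats) * (list->count + 1));
    if (!tmp)
        return -ENOMEM;

    list->formats = tmp;
    list->formats[list->count++] = fmt;
    return 0;
}
```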
> +
>  void avfilter_formats_ref(AVFilterFormats *f, AVFilterFormats **ref)
>  {
>      *ref = f;
> @@ -146,6 +203,7 @@
>  
>      if(!--(*ref)->refcount) {
>          av_free((*ref)->formats);
> +        av_free((*ref)->aformats);
>          av_free((*ref)->refs);
>          av_free(*ref);
>      }
> Index: libavfilter/avfilter.c
> ===================================================================
> --- libavfilter/avfilter.c	(revision 23486)
> +++ libavfilter/avfilter.c	(working copy)
> @@ -60,6 +60,22 @@
>      av_free(ref);
>  }
>  
> +AVFilterSamplesRef *avfilter_ref_samples(AVFilterSamplesRef *ref, int pmask)
> +{
> +    AVFilterSamplesRef *ret = av_malloc(sizeof(AVFilterSamplesRef));
> +    *ret = *ref;
> +    ret->perms &= pmask;
> +    ret->buffer->refcount++;
> +    return ret;
> +}
> +
> +void avfilter_unref_samples(AVFilterSamplesRef *ref)
> +{
> +    if(!(--ref->buffer->refcount))
> +        ref->buffer->free(ref->buffer);
> +    av_free(ref);
> +}
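
avfilter_ref_samples() dereferences the result of av_malloc() without
checking it. The ref/unref logic with that check added could look like the
sketch below. Buffer, Ref and the ref_*() functions are simplified, made-up
stand-ins for AVFilterBuffer, AVFilterSamplesRef and the functions in the
patch; freed_buffers exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Buffer {
    unsigned refcount;
    void (*free_fn)(struct Buffer *buf);
} Buffer;

typedef struct Ref {
    Buffer *buffer;
    int     perms;
} Ref;

static int freed_buffers; /* demo-only counter */

static void buffer_free(Buffer *buf)
{
    freed_buffers++;
    free(buf);
}

/* Create a buffer together with its first reference. */
static Ref *ref_new(int perms)
{
    Buffer *buf = calloc(1, sizeof(*buf));
    Ref    *ref = calloc(1, sizeof(*ref));

    if (!buf || !ref) {
        free(buf);
        free(ref);
        return NULL;
    }
    buf->refcount = 1;
    buf->free_fn  = buffer_free;
    ref->buffer   = buf;
    ref->perms    = perms;
    return ref;
}

/* New reference to the same buffer, permissions masked by pmask. */
static Ref *ref_copy(const Ref *ref, int pmask)
{
    Ref *ret;

    assert(ref && ref->buffer);
    if (!(ret = malloc(sizeof(*ret))))
        return NULL;
    *ret = *ref;
    ret->perms &= pmask;
    ret->buffer->refcount++;
    return ret;
}

/* Drop one reference; the last one also frees the buffer. */
static void ref_drop(Ref *ref)
{
    assert(ref && ref->buffer && ref->buffer->refcount > 0);
    if (!--ref->buffer->refcount)
        ref->buffer->free_fn(ref->buffer);
    free(ref);
}
```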
> +
>  void avfilter_insert_pad(unsigned idx, unsigned *count, size_t padidx_off,
>                           AVFilterPad **pads, AVFilterLink ***links,
>                           AVFilterPad *newpad)
> @@ -97,7 +113,9 @@
>      link->dst     = dst;
>      link->srcpad  = srcpad;
>      link->dstpad  = dstpad;
> +    link->type    = src->output_pads[srcpad].type;
>      link->format  = PIX_FMT_NONE;
> +    link->aformat = SAMPLE_FMT_NONE;
>  
>      return 0;
>  }
> @@ -210,6 +228,20 @@
>      return ret;
>  }
>  
> +AVFilterSamplesRef *avfilter_get_audio_buffer(AVFilterLink *link, int perms, int size,
> +                                              int64_t channel_layout, enum SampleFormat sample_fmt, int planar)
> +{
> +    AVFilterSamplesRef *ret = NULL;
> +
> +    if(link_dpad(link).get_audio_buffer)
> +        ret = link_dpad(link).get_audio_buffer(link, perms, size, channel_layout, sample_fmt, planar);
> +
> +    if(!ret)
> +        ret = avfilter_default_get_audio_buffer(link, perms, size, channel_layout, sample_fmt, planar);
> +
> +    return ret;
> +}
> +
>  int avfilter_request_frame(AVFilterLink *link)
>  {
>      DPRINTF_START(NULL, request_frame); dprintf_link(NULL, link, 1);
> @@ -221,6 +253,14 @@
>      else return -1;
>  }
>  
> +int avfilter_request_samples(AVFilterLink *link)
> +{
> +    if(link_spad(link).request_samples)
> +        return link_spad(link).request_samples(link);
> +    else if(link->src->inputs[0])
> +        return avfilter_request_samples(link->src->inputs[0]);
> +    else return AVERROR(EINVAL);
> +}
>  int avfilter_poll_frame(AVFilterLink *link)
>  {
>      int i, min=INT_MAX;
> @@ -334,6 +374,31 @@
>      draw_slice(link, y, h, slice_dir);
>  }
>  
> +void avfilter_filter_samples(AVFilterLink *link, AVFilterSamplesRef *samplesref)
> +{
> +    void (*filter_samples)(AVFilterLink *, AVFilterSamplesRef *);
> +    AVFilterPad *dst = &link_dpad(link);
> +
> +    if(!(filter_samples = dst->filter_samples))
> +        filter_samples = avfilter_default_filter_samples;
> +
> +    /* prepare to copy the samples if the buffer has insufficient permissions */
> +    if((dst->min_perms & samplesref->perms) != dst->min_perms ||
> +        dst->rej_perms & samplesref->perms) {
> +
> +        link->cur_samples = avfilter_default_get_audio_buffer(link, dst->min_perms,
> +                                                              samplesref->size, samplesref->channel_layout,
> +                                                              samplesref->sample_fmt, samplesref->planar);
> +        link->cur_samples->pts            = samplesref->pts;
> +        link->cur_samples->sample_rate    = samplesref->sample_rate;
> +        avfilter_unref_samples(samplesref);
> +    }
> +    else
> +        link->cur_samples = samplesref;
> +
> +    filter_samples(link, link->cur_samples);
> +}
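
As far as I can see, when the permissions are insufficient the replacement
buffer gets pts and sample_rate but the sample data itself is never copied
before the original reference is dropped. The permission test in
isolation, as a sketch: needs_copy() is a made-up name and the PERM_*
values mirror the AV_PERM_* defines in this patch.

```c
#include <assert.h>

#define PERM_READ  0x01
#define PERM_WRITE 0x02
#define PERM_REUSE 0x08

/* Hypothetical helper: a reference must be replaced by a copy when it
 * lacks a permission the destination pad requires (min_perms) or holds
 * one that the pad rejects (rej_perms). */
static int needs_copy(int perms, int min_perms, int rej_perms)
{
    /* a pad cannot both require and reject the same permission */
    assert(!(min_perms & rej_perms));

    return (min_perms & perms) != min_perms || (rej_perms & perms) != 0;
}
```

In the copy case the sample bytes would also have to be copied into the
new buffer before the old reference is released.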
> +
>  #define MAX_REGISTERED_AVFILTERS_NB 64
>  
>  static AVFilter *registered_avfilters[MAX_REGISTERED_AVFILTERS_NB + 1];
> Index: libavfilter/avfilter.h
> ===================================================================
> --- libavfilter/avfilter.h	(revision 23486)
> +++ libavfilter/avfilter.h	(working copy)
> @@ -36,6 +36,12 @@
>                                             LIBAVFILTER_VERSION_MICRO)
>  #define LIBAVFILTER_BUILD       LIBAVFILTER_VERSION_INT
>  
> +#define LIBAVFILTER_AUDIO_BUF_SIZE(ref) { \
> +    int num_channels = avcodec_channel_layout_num_channels(ref->channel_layout); \
> +    int bytes_per_sample = (av_get_bits_per_sample_format(ref->sample_fmt))>>3; \
> +    ref->size = ref->samples_nb * num_channels * bytes_per_sample; \
> +}
> +
>  #include <stddef.h>
>  #include "libavcodec/avcodec.h"
>  
> @@ -88,6 +94,29 @@
>      int w, h;                  ///< width and height of the allocated buffer
>  } AVFilterPic;
>  
> +/*
> + * Temporary structure for audio data used by the filter system. Later to be
> + * merged with FilterPic above and generalized.
> + */
> +typedef struct AVFilterBuffer
> +{
> +    uint8_t *data[8];           ///< audio data for each channel
> +    int linesize[8];            ///< number of bytes to next sample
> +
> +    unsigned refcount;          ///< number of references to this buffer
> +
> +    /** private data to be used by a custom free function */
> +    void *priv;
> +    /**
> +     * A pointer to the function to deallocate this buffer if the default
> +     * function is not sufficient. This could, for example, add the memory
> +     * back into a memory pool to be reused later without the overhead of
> +     * reallocating it from scratch.
> +     */
> +    void (*free)(struct AVFilterBuffer *buffer);
> +
> +} AVFilterBuffer;

Maybe pts and pos should be added here.

>  /**
>   * A reference to an AVFilterPic. Since filters can manipulate the origin of
>   * a picture to, for example, crop image without any memcpy, the picture origin
> @@ -96,31 +125,57 @@
>   *
>   * TODO: add anything necessary for frame reordering
>   */
> +#define AV_PERM_READ     0x01   ///< can read from the buffer
> +#define AV_PERM_WRITE    0x02   ///< can write to the buffer
> +#define AV_PERM_PRESERVE 0x04   ///< nobody else can overwrite the buffer
> +#define AV_PERM_REUSE    0x08   ///< can output the buffer multiple times, with the same contents each time
> +#define AV_PERM_REUSE2   0x10   ///< can output the buffer multiple times, modified each time
>  typedef struct AVFilterPicRef
>  {
>      AVFilterPic *pic;           ///< the picture that this is a reference to
>      uint8_t *data[4];           ///< picture data for each plane
>      int linesize[4];            ///< number of bytes per line

> +    int64_t pts;                ///< presentation timestamp in units of 1/AV_TIME_BASE
> +    int64_t pos;                ///< byte position in stream, -1 if unknown
> +
>      int w;                      ///< image width
>      int h;                      ///< image height
>  
> -    int64_t pts;                ///< presentation timestamp in units of 1/AV_TIME_BASE
> -    int64_t pos;                ///< byte position in stream, -1 if unknown

ehm...

> -
>      AVRational pixel_aspect;    ///< pixel aspect ratio
>  
>      int perms;                  ///< permissions
> -#define AV_PERM_READ     0x01   ///< can read from the buffer
> -#define AV_PERM_WRITE    0x02   ///< can write to the buffer
> -#define AV_PERM_PRESERVE 0x04   ///< nobody else can overwrite the buffer
> -#define AV_PERM_REUSE    0x08   ///< can output the buffer multiple times, with the same contents each time
> -#define AV_PERM_REUSE2   0x10   ///< can output the buffer multiple times, modified each time
>  
>      int interlaced;             ///< is frame interlaced
>      int top_field_first;
>  } AVFilterPicRef;
>  
>  /**
> + * A reference to an AVFilterBuffer for audio. Since filters can manipulate the
> + * origin of an audio buffer to, for example, reduce precision without any memcpy,
> + * sample format and channel_layout are per-reference properties. Sample step is

> + * also useful when reducing no. of channels, etc, and so is also per-reference.

"no." -> "the number of..."

Abbreviations of that kind are ugly, and we want to look professional ;-).

> + */
> +typedef struct AVFilterSamplesRef
> +{
> +    AVFilterBuffer *buffer;       ///< the audio buffer that this is a reference to
> +    uint8_t *data[8];             ///< audio data for each channel
> +    int linesize[8];              ///< number of bytes to next sample
> +    int64_t pts;                  ///< presentation timestamp in units of 1/AV_TIME_BASE
> +
> +    int64_t channel_layout;       ///< channel layout of current buffer (see avcodec.h)
> +    int64_t sample_rate;          ///< samples per second
> +    enum SampleFormat sample_fmt; ///< sample format (see avcodec.h)
> +
> +    int samples_nb;               ///< number of samples in this buffer
> +    /* Should this go here or in the AVFilterBuffer struct? */
> +    int size;                     ///< size of buffer
> +
> +    int perms;                    ///< permissions
> +
> +    int planar;                   ///< is buffer planar or packed
> +} AVFilterSamplesRef;
> +
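
Since the struct carries both data[8] and a planar flag, here is a minimal
standalone sketch of how I would expect a filter to address one sample in
either layout. It is only my reading of the flag, not something the patch
specifies: demo_sample_ptr() is a hypothetical helper, and it assumes
16-bit samples, with planar meaning one data[] pointer per channel and
packed meaning interleaved frames in data[0].

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return a pointer to the
 * 16-bit sample number `i` of channel `ch`.
 *
 * planar: data[ch] holds all samples of that channel contiguously.
 * packed: data[0] holds interleaved frames (ch0 ch1 ... ch0 ch1 ...).
 */
static int16_t *demo_sample_ptr(uint8_t *data[8], int planar,
                                int channels, int ch, int i)
{
    if (planar)
        return (int16_t *)data[ch] + i;
    return (int16_t *)data[0] + (i * channels + ch);
}
```

If that reading is right, the linesize[] doxy ("number of bytes to next
sample") should say which of the two strides it describes.
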
> +/**
>   * Adds a new reference to a picture.
>   * @param ref   an existing reference to the picture
>   * @param pmask a bitmask containing the allowable permissions in the new
> @@ -138,6 +193,23 @@
>  void avfilter_unref_pic(AVFilterPicRef *ref);
>  
>  /**
> + * Adds a new reference to an audio samples buffer.
> + * @param ref   an existing reference to the buffer

Nit: add an empty line before the first @param, same below.

> + * @param pmask a bitmask containing the allowable permissions in the new
> + *              reference
> + * @return      a new reference to the buffer with the same properties as the
> + *              old, excluding any permissions denied by pmask
> + */
> +AVFilterSamplesRef *avfilter_ref_samples(AVFilterSamplesRef *ref, int pmask);
> +
> +/**
> + * Removes a reference to a buffer of audio samples. If this is the last reference
> + * to the buffer, the buffer itself is also automatically freed.
> + * @param ref reference to the buffer
> + */
> +void avfilter_unref_samples(AVFilterSamplesRef *ref);
> +
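
To make the documented semantics concrete, here is a minimal standalone
sketch of the ref/unref behaviour described in the two comments above. It
does not use the libavfilter types: DemoBuffer, DemoRef, demo_new(),
demo_ref() and demo_unref() are hypothetical names invented for
illustration, and only the permission masking and the free-on-last-unref
rule are modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the AV_PERM_* flags. */
#define DEMO_PERM_READ  0x01
#define DEMO_PERM_WRITE 0x02

/* Hypothetical stand-in for AVFilterBuffer. */
typedef struct DemoBuffer {
    unsigned refcount;  /* number of live references */
    int *freed_flag;    /* set to 1 when the buffer is released (demo only) */
} DemoBuffer;

/* Hypothetical stand-in for AVFilterSamplesRef. */
typedef struct DemoRef {
    DemoBuffer *buffer;
    int perms;
} DemoRef;

/* Create a buffer together with its first reference; NULL on failure. */
static DemoRef *demo_new(int perms, int *freed_flag)
{
    DemoBuffer *buf = malloc(sizeof(*buf));
    DemoRef *ref = malloc(sizeof(*ref));
    if (!buf || !ref) {
        free(buf);
        free(ref);
        return NULL;
    }
    buf->refcount = 1;
    buf->freed_flag = freed_flag;
    ref->buffer = buf;
    ref->perms = perms;
    return ref;
}

/* New reference with the same properties, minus permissions denied by pmask. */
static DemoRef *demo_ref(const DemoRef *ref, int pmask)
{
    DemoRef *ret;
    assert(ref && ref->buffer);
    ret = malloc(sizeof(*ret));
    if (!ret)
        return NULL;
    *ret = *ref;
    ret->perms &= pmask;
    ret->buffer->refcount++;
    return ret;
}

/* Drop one reference; the buffer is freed together with the last one. */
static void demo_unref(DemoRef *ref)
{
    assert(ref && ref->buffer && ref->buffer->refcount > 0);
    if (--ref->buffer->refcount == 0) {
        if (ref->buffer->freed_flag)
            *ref->buffer->freed_flag = 1;
        free(ref->buffer);
    }
    free(ref);
}
```

Taking a second reference with pmask = DEMO_PERM_READ yields a read-only
view, and the buffer goes away only once both references have been
dropped.
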
> +/**
>   * A list of supported formats for one end of a filter link. This is used
>   * during the format negotiation process to try to pick the best format to
>   * use to minimize the number of necessary conversions. Each filter gives a
> @@ -181,15 +253,17 @@
>  struct AVFilterFormats
>  {
>      unsigned format_count;      ///< number of formats
> -    enum PixelFormat *formats;  ///< list of pixel formats
> +    enum AVMediaType type;      ///< filter type
> +    enum PixelFormat *formats;   ///< list of pixel formats for video
> +    enum SampleFormat *aformats; ///< list of sample formats for audio
>  
>      unsigned refcount;          ///< number of references to this list
>      AVFilterFormats ***refs;    ///< references to this list
>  };
>  
>  /**
> - * Creates a list of supported formats. This is intended for use in
> - * AVFilter->query_formats().
> + * Creates a list of supported pixel formats. This is intended for use in
> + * AVFilter->query_formats() for video filters.
>   * @param pix_fmt list of pixel formats, terminated by PIX_FMT_NONE
>   * @return the format list, with no existing references
>   */
> @@ -211,6 +285,29 @@
>  AVFilterFormats *avfilter_all_colorspaces(void);
>  
>  /**
> + * Creates a list of supported sample formats. This is intended for use in
> + * AVFilter->query_formats() for audio filters.
> + * @param sample_fmt list of sample formats, terminated by SAMPLE_FMT_NONE
> + * @return the format list, with no existing references
> + */
> +AVFilterFormats *avfilter_make_aformat_list(const enum SampleFormat *sample_fmts);
> +
> +/**
> + * Adds sample_fmt to the list of sample formats contained in *avff.
> + * If *avff is NULL the function allocates the filter formats struct
> + * and puts its pointer in *avff.
> + *
> + * @return a non negative value in case of success, or a negative
> + * value corresponding to an AVERROR code in case of error
> + */
> +int avfilter_add_sampleformat(AVFilterFormats **avff, enum SampleFormat sample_fmt);
> +
> +/**
> + * Returns a list of all sampleformats supported by FFmpeg.
> + */
> +AVFilterFormats *avfilter_all_sampleformats(void);
> +
> +/**
>   * Returns a format list which contains the intersection of the formats of
>   * a and b. Also, all the references of a, all the references of b, and
>   * a and b themselves will be deallocated.
> @@ -280,8 +377,7 @@
>      const char *name;
>  
>      /**
> -     * AVFilterPad type. Only video supported now, hopefully someone will
> -     * add audio in the future.
> +     * AVFilterPad type. Audio support still in progress.
>       */
>      enum AVMediaType type;
>  
> @@ -315,7 +411,7 @@
>      void (*start_frame)(AVFilterLink *link, AVFilterPicRef *picref);
>  
>      /**
> -     * Callback function to get a buffer. If NULL, the filter system will
> +     * Callback function to get a video buffer. If NULL, the filter system will
>       * use avfilter_default_get_video_buffer().
>       *
>       * Input video pads only.
> @@ -323,6 +419,16 @@
>      AVFilterPicRef *(*get_video_buffer)(AVFilterLink *link, int perms, int w, int h);
>  
>      /**
> +     * Callback function to get an audio buffer. If NULL, the filter system will
> +     * use avfilter_default_get_audio_buffer().
> +     *
> +     * Input audio pads only.
> +     */
> +    AVFilterSamplesRef *(*get_audio_buffer)(AVFilterLink *link, int perms,
> +                                            int size, int64_t channel_layout,
> +                                            enum SampleFormat sample_fmt, int planar);
> +
> +    /**
>       * Callback called after the slices of a frame are completely sent. If
>       * NULL, the filter layer will default to releasing the reference stored
>       * in the link structure during start_frame().
> @@ -340,6 +446,14 @@
>      void (*draw_slice)(AVFilterLink *link, int y, int height, int slice_dir);
>  
>      /**
> +     * Samples filtering callback. This is where a filter receives audio data
> +     * and should do its processing.
> +     *
> +     * Input audio pads only.
> +     */
> +    void (*filter_samples)(AVFilterLink *link, AVFilterSamplesRef *samplesref);
> +
> +    /**
>       * Frame poll callback. This returns the number of immediately available
>       * frames. It should return a positive value if the next request_frame()
>       * is guaranteed to return one frame (with no delay).
> @@ -360,6 +474,15 @@
>      int (*request_frame)(AVFilterLink *link);
>  
>      /**
> +     * Samples request callback. A call to this should result in at least one
> +     * sample being output over the given link. This should return zero on
> +     * success, and another value on error.
> +     *
> +     * Output audio pads only.
> +     */
> +    int (*request_samples)(AVFilterLink *link);
> +
> +    /**
>       * Link configuration callback.
>       *
>       * For output pads, this should set the link properties such as
> @@ -382,13 +505,19 @@
>  void avfilter_default_draw_slice(AVFilterLink *link, int y, int h, int slice_dir);
>  /** default handler for end_frame() for video inputs */
>  void avfilter_default_end_frame(AVFilterLink *link);
> -/** default handler for config_props() for video outputs */
> +/** default handler for filter_samples() for audio inputs */
> +void avfilter_default_filter_samples(AVFilterLink *link, AVFilterSamplesRef *samplesref);
> +/** default handler for config_props() for audio/video outputs */
>  int avfilter_default_config_output_link(AVFilterLink *link);
> -/** default handler for config_props() for video inputs */
> +/** default handler for config_props() for audio/video inputs */
>  int avfilter_default_config_input_link (AVFilterLink *link);
>  /** default handler for get_video_buffer() for video inputs */
>  AVFilterPicRef *avfilter_default_get_video_buffer(AVFilterLink *link,
>                                                    int perms, int w, int h);
> +/** default handler for get_audio_buffer() for audio inputs */
> +AVFilterSamplesRef *avfilter_default_get_audio_buffer(AVFilterLink *link, int perms,
> +                                                      int size, int64_t channel_layout,
> +                                                      enum SampleFormat sample_fmt, int planar);
>  /**
>   * A helper for query_formats() which sets all links to the same list of
>   * formats. If there are no links hooked to this filter, the list of formats is
> @@ -407,10 +536,18 @@
>  /** end_frame() handler for filters which simply pass video along */
>  void avfilter_null_end_frame(AVFilterLink *link);
>  
> +/** filter_samples() handler for filters which simply pass audio along */
> +void avfilter_null_filter_samples(AVFilterLink *link, AVFilterSamplesRef *samplesref);
> +
>  /** get_video_buffer() handler for filters which simply pass video along */
>  AVFilterPicRef *avfilter_null_get_video_buffer(AVFilterLink *link,
>                                                    int perms, int w, int h);
>  
> +/** get_audio_buffer() handler for filters which simply pass audio along */
> +AVFilterSamplesRef *avfilter_null_get_audio_buffer(AVFilterLink *link, int perms,
> +                                                   int size, int64_t channel_layout,
> +                                                   enum SampleFormat sample_fmt, int planar);
> +
>  /**
>   * Filter definition. This defines the pads a filter contains, and all the
>   * callback functions used to interact with the filter.
> @@ -498,10 +635,22 @@
>          AVLINK_INIT             ///< complete
>      } init_state;
>  
> +    /**
> +     * AVFilterPad type. Audio support still in progress.
> +     */
> +    enum AVMediaType type;
> +
> +    /* These three parameters apply only to video */
>      int w;                      ///< agreed upon image width
>      int h;                      ///< agreed upon image height
>      enum PixelFormat format;    ///< agreed upon image colorspace
>  
> +    /* These four parameters apply only to audio */
> +    int samples_nb;             ///< number of samples in this buffer
> +    int64_t channel_layout;     ///< channel layout of current buffer (see avcodec.h)
> +    int64_t sample_rate;        ///< samples per second
> +    enum SampleFormat aformat;  ///< sample format (see avcodec.h)
> + 
>      /**
>       * Lists of formats supported by the input and output filters respectively.
>       * These lists are used for negotiating the format to actually be used,
> @@ -511,16 +660,21 @@
>      AVFilterFormats *out_formats;
>  
>      /**
> -     * The picture reference currently being sent across the link by the source
> -     * filter. This is used internally by the filter system to allow
> -     * automatic copying of pictures which do not have sufficient permissions
> -     * for the destination. This should not be accessed directly by the
> -     * filters.
> +     * The picture (for video) or samples (for audio) reference currently being
> +     * sent across the link by the source filter. This is used internally by the
> +     * filter system to allow automatic copying of pictures/samples which do not
> +     * have sufficient permissions for the destination. This should not be accessed
> +     * directly by the filters.
>       */
>      AVFilterPicRef *srcpic;
>  
>      AVFilterPicRef *cur_pic;
>      AVFilterPicRef *outpic;
> +

> +    AVFilterSamplesRef *srcsamples;
> +
> +    AVFilterSamplesRef *cur_samples;
> +    AVFilterSamplesRef *outsamples;

This is inconsistent: I'd say src_samples, cur_samples, out_samples,
and the video names should be fixed accordingly.

>  };
>  
>  /**
> @@ -555,6 +709,22 @@
>                                            int w, int h);
>  
>  /**
> + * Requests an audio samples buffer with a specific set of permissions.
> + *
> + * @param link           the output link to the filter from which the buffer will
> + *                       be requested
> + * @param perms          the required access permissions

> + * @param samples_nb     the no. of samples in the buffer to allocate
> + * @param channel_layout the no. & type of channels per sample in the buffer to allocate

"no." -> number of...

> + * @param sample_fmt     the format of each sample in the buffer to allocate
> + * @return               A reference to the samples. This must be unreferenced with
> + *                       avfilter_unref_samples when you are finished with it.
> + */
> +AVFilterSamplesRef *avfilter_get_audio_buffer(AVFilterLink *link, int perms,
> +                                             int size, int64_t channel_layout,
> +                                             enum SampleFormat sample_fmt, int planar);
> +
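
Note that the prototype takes size and planar, while the @param list above
still documents samples_nb. For reference, a minimal sketch of how the
number of samples could be derived from a byte size is given below. It is
standalone and makes several assumptions: the demo_* names and the
DemoSampleFormat enum are hypothetical (the real enum SampleFormat lives in
avcodec.h), the channel layout is treated as a bitmask with one bit per
channel, and a size that is not a whole number of frames is rejected.

```c
#include <stdint.h>

/* Hypothetical sample formats, for illustration only. */
enum DemoSampleFormat {
    DEMO_FMT_U8,
    DEMO_FMT_S16,
    DEMO_FMT_S32,
    DEMO_FMT_FLT,
    DEMO_FMT_DBL
};

/* Size in bytes of one sample of one channel, 0 if the format is unknown. */
static int demo_bytes_per_sample(enum DemoSampleFormat fmt)
{
    switch (fmt) {
    case DEMO_FMT_U8:  return 1;
    case DEMO_FMT_S16: return 2;
    case DEMO_FMT_S32: return 4;
    case DEMO_FMT_FLT: return 4;
    case DEMO_FMT_DBL: return 8;
    }
    return 0;
}

/* Number of channels, counted as the number of bits set in the layout mask. */
static int demo_channel_count(int64_t channel_layout)
{
    uint64_t mask = (uint64_t)channel_layout;
    int n = 0;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Samples per channel held by `size` bytes, or -1 if the input is invalid. */
static int demo_samples_nb(int size, int64_t channel_layout,
                           enum DemoSampleFormat fmt)
{
    int frame_bytes = demo_bytes_per_sample(fmt) *
                      demo_channel_count(channel_layout);
    if (size < 0 || frame_bytes <= 0 || size % frame_bytes)
        return -1;
    return size / frame_bytes;
}
```

With a two-channel layout mask and 16-bit samples, each frame takes 4
bytes, so a 4096-byte buffer holds 1024 samples per channel.
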
> +/**
>   * Requests an input frame from the filter at the other end of the link.
>   * @param link the input link
>   * @return     zero on success
> @@ -562,6 +732,14 @@
>  int avfilter_request_frame(AVFilterLink *link);
>  
>  /**
> + * Requests input audio samples from the filter at the other end of the link.
> + *
> + * @param  link the input link
> + * @return zero on success
> + */
> +int avfilter_request_samples(AVFilterLink *link);
> +
> +/**
>   * Polls a frame from the filter chain.
>   * @param  link the input link
>   * @return the number of immediately available frames, a negative
> @@ -602,6 +780,14 @@
>   */
>  void avfilter_draw_slice(AVFilterLink *link, int y, int h, int slice_dir);
>  
> +/**
> + * Sends a buffer of audio samples to the next filter.
> + *
> + * @param link   the output link over which the audio samples are being sent
> + * @param planar samples are packed if 0 or planar if 1
> + */
> +void avfilter_filter_samples(AVFilterLink *link, AVFilterSamplesRef *samplesref);
> +
>  /** Initializes the filter system. Registers all builtin filters. */
>  void avfilter_register_all(void);

Anyway, nice work :-).

Regards.
-- 
FFmpeg = Formidable and Fast Miracolous Patchable Educated Geek