[FFmpeg-devel] [PATCH] avfilter: add QSV variants of the stack filters

Mon Feb 6 09:38:13 EET 2023

On Mon, 2023-01-30 at 09:48 +0100, Paul B Mahol wrote:
> On 1/30/23, Xiang, Haihao <haihao.xiang-at-intel.com at ffmpeg.org> wrote:
> > From: Haihao Xiang <haihao.xiang at intel.com>
> > 
> > Include hstack_qsv, vstack_qsv and xstack_qsv. They may accept input
> > streams with different sizes.
> > 
> > Examples:
> > $ ffmpeg -hwaccel qsv -hwaccel_output_format qsv -i input.mp4 \
> > -filter_complex "[0:v][0:v]hstack_qsv" -f null -
> > 
> > $ ffmpeg \
> > -hwaccel qsv -hwaccel_output_format qsv -i input.mp4 \
> > -hwaccel qsv -hwaccel_output_format qsv -i input.mp4 \
> > -hwaccel qsv -hwaccel_output_format qsv -i input.mp4 \
> > -hwaccel qsv -hwaccel_output_format qsv -i input.mp4 \
> > -filter_complex "[0:v][1:v][2:v][3:v]xstack_qsv=inputs=4:fill=0x000000:layout=0_0_1920x1080|w0_0_1920x1080|0_h0_1920x1080|w0_h0_1920x1080" \
> > -f null -
> > 
> > Signed-off-by: Haihao Xiang <haihao.xiang at intel.com>
> > ---
> >  Changelog                  |   1 +
> >  configure                  |   6 +
> >  doc/filters.texi           |  88 ++++++
> >  libavfilter/Makefile       |   3 +
> >  libavfilter/allfilters.c   |   3 +
> >  libavfilter/version.h      |   2 +-
> >  libavfilter/vf_stack_qsv.c | 563 +++++++++++++++++++++++++++++++++++++
> >  7 files changed, 665 insertions(+), 1 deletion(-)
> >  create mode 100644 libavfilter/vf_stack_qsv.c
> > 
> > diff --git a/Changelog b/Changelog
> > index a0f1ad7211..0d700320fd 100644
> > --- a/Changelog
> > +++ b/Changelog
> > @@ -34,6 +34,7 @@ version <next>:
> >  - ssim360 video filter
> >  - ffmpeg CLI new options: -enc_stats_pre[_fmt], -enc_stats_post[_fmt]
> >  - hstack_vaapi, vstack_vaapi and xstack_vaapi filters
> > +- hstack_qsv, vstack_qsv and xstack_qsv filters
> > 
> > 
> >  version 5.1:
> > diff --git a/configure b/configure
> > index 47790d10f5..037a47f2ab 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3770,6 +3770,12 @@ yadif_videotoolbox_filter_deps="metal corevideo videotoolbox"
> >  hstack_vaapi_filter_deps="vaapi_1"
> >  vstack_vaapi_filter_deps="vaapi_1"
> >  xstack_vaapi_filter_deps="vaapi_1"
> > +hstack_qsv_filter_deps="libmfx"
> > +hstack_qsv_filter_select="qsvvpp"
> > +vstack_qsv_filter_deps="libmfx"
> > +vstack_qsv_filter_select="qsvvpp"
> > +xstack_qsv_filter_deps="libmfx"
> > +xstack_qsv_filter_select="qsvvpp"
> > 
> >  # examples
> >  avio_list_dir_deps="avformat avutil"
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index 3a54c68f3e..43c77dc041 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -26772,6 +26772,94 @@ See @ref{xstack}.
> > 
> >  @c man end VAAPI VIDEO FILTERS
> > 
> > +@chapter QSV Video Filters
> > +@c man begin QSV VIDEO FILTERS
> > +
> > +Below is a description of the currently available QSV video filters.
> > +
> > +To enable compilation of these filters you need to configure FFmpeg with
> > +@code{--enable-libmfx} or @code{--enable-libvpl}.
> > +
> > +To use QSV filters, you need to set up the QSV device correctly. For more information, please read @url{https://trac.ffmpeg.org/wiki/Hardware/QuickSync}
> > +
> > +@section hstack_qsv
> > +Stack input videos horizontally.
> > +
> > +This is the QSV variant of the @ref{hstack} filter. Each input stream may
> > +have a different height; this filter will scale each input stream down/up while
> > +keeping the original aspect ratio.
> > +
> > +It accepts the following options:
> > +
> > +@table @option
> > +@item inputs
> > +See @ref{hstack}.
> > +
> > +@item shortest
> > +See @ref{hstack}.
> > +
> > +@item height
> > +Set height of output. If set to 0, this filter will set height of output to
> > +height of the first input stream. Default value is 0.
> > +@end table
> > +
> > +@section vstack_qsv
> > +Stack input videos vertically.
> > +
> > +This is the QSV variant of the @ref{vstack} filter. Each input stream may
> > +have a different width; this filter will scale each input stream down/up while
> > +keeping the original aspect ratio.
> > +
> > +It accepts the following options:
> > +
> > +@table @option
> > +@item inputs
> > +See @ref{vstack}.
> > +
> > +@item shortest
> > +See @ref{vstack}.
> > +
> > +@item width
> > +Set width of output. If set to 0, this filter will set width of output to
> > +width of the first input stream. Default value is 0.
> > +@end table
> > +
> > +@section xstack_qsv
> > +Stack video inputs into custom layout.
> > +
> > +This is the QSV variant of the @ref{xstack} filter.
> > +
> > +It accepts the following options:
> > +
> > +@table @option
> > +@item inputs
> > +See @ref{xstack}.
> > +
> > +@item shortest
> > +See @ref{xstack}.
> > +
> > +@item layout
> > +See @ref{xstack}.
> > +Moreover, this permits the user to supply output size for each input stream.
> > +@example
> > +xstack_qsv=inputs=4:layout=0_0_1920x1080|0_h0_1920x1080|w0_0_1920x1080|w0_h0_1920x1080
> > +@end example
> > +
> > +@item grid
> > +See @ref{xstack}.
> > +
> > +@item grid_tile_size
> > +Set output size for each input stream when @option{grid} is set. If this option
> > +is not set, this filter will set output size by default to the size of the
> > +first input stream. For the syntax of this option, check the
> > +@ref{video size syntax,,"Video size" section in the ffmpeg-utils manual,ffmpeg-utils}.
> > +
> > +@item fill
> > +See @ref{xstack}.
> > +@end table
> > +
> > +@c man end QSV VIDEO FILTERS
> > +
> >  @chapter Video Sources
> >  @c man begin VIDEO SOURCES
> > 
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index b45dcd00fc..23e7b89d09 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -561,6 +561,9 @@ OBJS-$(CONFIG_ZSCALE_FILTER)                 += vf_zscale.o
> >  OBJS-$(CONFIG_HSTACK_VAAPI_FILTER)           += vf_stack_vaapi.o framesync.o vaapi_vpp.o
> >  OBJS-$(CONFIG_VSTACK_VAAPI_FILTER)           += vf_stack_vaapi.o framesync.o vaapi_vpp.o
> >  OBJS-$(CONFIG_XSTACK_VAAPI_FILTER)           += vf_stack_vaapi.o framesync.o vaapi_vpp.o
> > +OBJS-$(CONFIG_HSTACK_QSV_FILTER)             += vf_stack_qsv.o framesync.o
> > +OBJS-$(CONFIG_VSTACK_QSV_FILTER)             += vf_stack_qsv.o framesync.o
> > +OBJS-$(CONFIG_XSTACK_QSV_FILTER)             += vf_stack_qsv.o framesync.o
> > 
> >  OBJS-$(CONFIG_ALLRGB_FILTER)                 += vsrc_testsrc.o
> >  OBJS-$(CONFIG_ALLYUV_FILTER)                 += vsrc_testsrc.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index 9cdcca4853..d7db46c2af 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -526,6 +526,9 @@ extern const AVFilter ff_vf_zscale;
> >  extern const AVFilter ff_vf_hstack_vaapi;
> >  extern const AVFilter ff_vf_vstack_vaapi;
> >  extern const AVFilter ff_vf_xstack_vaapi;
> > +extern const AVFilter ff_vf_hstack_qsv;
> > +extern const AVFilter ff_vf_vstack_qsv;
> > +extern const AVFilter ff_vf_xstack_qsv;
> > 
> >  extern const AVFilter ff_vsrc_allrgb;
> >  extern const AVFilter ff_vsrc_allyuv;
> > diff --git a/libavfilter/version.h b/libavfilter/version.h
> > index 057ab63415..93036a615d 100644
> > --- a/libavfilter/version.h
> > +++ b/libavfilter/version.h
> > @@ -31,7 +31,7 @@
> > 
> >  #include "version_major.h"
> > 
> > -#define LIBAVFILTER_VERSION_MINOR  56
> > +#define LIBAVFILTER_VERSION_MINOR  57
> >  #define LIBAVFILTER_VERSION_MICRO 100
> > 
> > 
> > diff --git a/libavfilter/vf_stack_qsv.c b/libavfilter/vf_stack_qsv.c
> > new file mode 100644
> > index 0000000000..f3a623f26c
> > --- /dev/null
> > +++ b/libavfilter/vf_stack_qsv.c
> > @@ -0,0 +1,563 @@
> > +/*
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +/**
> > + * @file
> > + * Hardware accelerated hstack, vstack and xstack filters based on Intel Quick Sync Video VPP
> > + */
> > +
> > +#include "config_components.h"
> > +
> > +#include "libavutil/opt.h"
> > +#include "libavutil/common.h"
> > +#include "libavutil/pixdesc.h"
> > +#include "libavutil/eval.h"
> > +#include "libavutil/hwcontext.h"
> > +#include "libavutil/avstring.h"
> > +#include "libavutil/avassert.h"
> > +#include "libavutil/imgutils.h"
> > +#include "libavutil/mathematics.h"
> > +#include "libavutil/parseutils.h"
> > +
> > +#include "internal.h"
> > +#include "filters.h"
> > +#include "formats.h"
> > +#include "video.h"
> > +
> > +#include "framesync.h"
> > +#include "qsvvpp.h"
> > +
> > +#define OFFSET(x) offsetof(QSVStackContext, x)
> > +#define FLAGS (AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_FILTERING_PARAM)
> > +
> > +enum {
> > +    QSV_STACK_H = 0,
> > +    QSV_STACK_V = 1,
> > +    QSV_STACK_X = 2
> > +};
> > +
> > +typedef struct QSVStackContext {
> > +    QSVVPPContext qsv;
> > +
> > +    FFFrameSync fs;
> > +    QSVVPPParam qsv_param;
> > +    mfxExtVPPComposite comp_conf;
> > +    int mode;
> > +
> > +    /* Options */
> > +    int nb_inputs;
> > +    int shortest;
> > +    int tile_width;
> > +    int tile_height;
> > +    int nb_grid_columns;
> > +    int nb_grid_rows;
> > +    char *layout;
> > +    uint8_t fillcolor[4];
> > +    char *fillcolor_str;
> > +    int fillcolor_enable;
> > +} QSVStackContext;
> > +
> > +static void rgb2yuv(float r, float g, float b, int *y, int *u, int *v, int depth)
> > +{
> > +    *y = ((0.21260*219.0/255.0) * r + (0.71520*219.0/255.0) * g +
> > +         (0.07220*219.0/255.0) * b) * ((1 << depth) - 1);
> > +    *u = (-(0.11457*224.0/255.0) * r - (0.38543*224.0/255.0) * g +
> > +         (0.50000*224.0/255.0) * b + 0.5) * ((1 << depth) - 1);
> > +    *v = ((0.50000*224.0/255.0) * r - (0.45415*224.0/255.0) * g -
> > +         (0.04585*224.0/255.0) * b + 0.5) * ((1 << depth) - 1);
> > +}
> > +
> > +static int process_frame(FFFrameSync *fs)
> > +{
> > +    AVFilterContext *ctx = fs->parent;
> > +    QSVVPPContext *qsv = fs->opaque;
> > +    AVFrame *frame = NULL;
> > +    int ret = 0;
> > +
> > +    for (int i = 0; i < ctx->nb_inputs; i++) {
> > +        ret = ff_framesync_get_frame(fs, i, &frame, 0);
> > +        if (ret == 0)
> > +            ret = ff_qsvvpp_filter_frame(qsv, ctx->inputs[i], frame);
> > +        if (ret < 0 && ret != AVERROR(EAGAIN))
> > +            break;
> > +    }
> > +
> > +    if (ret == 0 && qsv->got_frame == 0) {
> > +        for (int i = 0; i < ctx->nb_inputs; i++)
> > +            FF_FILTER_FORWARD_WANTED(ctx->outputs[0], ctx->inputs[i]);
> > +
> > +        ret = FFERROR_NOT_READY;
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +static int init_framesync(AVFilterContext *ctx)
> > +{
> > +    QSVStackContext *sctx = ctx->priv;
> > +    int ret;
> > +
> > +    ret = ff_framesync_init(&sctx->fs, ctx, ctx->nb_inputs);
> > +    if (ret < 0)
> > +        return ret;
> > +
> > +    sctx->fs.on_event = process_frame;
> > +    sctx->fs.opaque = sctx;
> > +
> > +    for (int i = 0; i < ctx->nb_inputs; i++) {
> > +        FFFrameSyncIn *in = &sctx->fs.in[i];
> > +        in->before = EXT_STOP;
> > +        in->after = sctx->shortest ? EXT_STOP : EXT_INFINITY;
> > +        in->sync = 1;
> > +        in->time_base = ctx->inputs[i]->time_base;
> > +    }
> > +
> > +    return ff_framesync_configure(&sctx->fs);
> > +}
> > +
> > +#define SET_INPUT_STREAM(is, x, y, w, h) do {   \
> > +        is->DstX = x;                           \
> > +        is->DstY = y;                           \
> > +        is->DstW = w;                           \
> > +        is->DstH = h;                           \
> > +        is->GlobalAlpha = 255;                  \
> > +        is->GlobalAlphaEnable = 0;              \
> > +        is->PixelAlphaEnable = 0;               \
> > +    } while (0)
> > +
> > +static int config_output(AVFilterLink *outlink)
> > +{
> > +    AVFilterContext *ctx = outlink->src;
> > +    QSVStackContext *sctx = ctx->priv;
> > +    AVFilterLink *inlink0 = ctx->inputs[0];
> > +    int width, height, ret;
> > +    enum AVPixelFormat in_format;
> > +    int depth = 8;
> > +
> > +    if (inlink0->format == AV_PIX_FMT_QSV) {
> > +         if (!inlink0->hw_frames_ctx || !inlink0->hw_frames_ctx->data)
> > +             return AVERROR(EINVAL);
> > +
> > +         in_format = ((AVHWFramesContext*)inlink0->hw_frames_ctx->data)->sw_format;
> > +    } else
> > +        in_format = inlink0->format;
> > +
> > +    sctx->qsv_param.out_sw_format = in_format;
> > +
> > +    for (int i = 1; i < sctx->nb_inputs; i++) {
> > +        AVFilterLink *inlink = ctx->inputs[i];
> > +
> > +        if (inlink0->format == AV_PIX_FMT_QSV) {
> > +            AVHWFramesContext *hwfc0 = (AVHWFramesContext *)inlink0->hw_frames_ctx->data;
> > +            AVHWFramesContext *hwfc = (AVHWFramesContext *)inlink->hw_frames_ctx->data;
> > +
> > +            if (inlink0->format != inlink->format) {
> > +                av_log(ctx, AV_LOG_ERROR, "Mixing hardware and software pixel formats is not supported.\n");
> > +
> > +                return AVERROR(EINVAL);
> > +            } else if (hwfc0->device_ctx != hwfc->device_ctx) {
> > +                av_log(ctx, AV_LOG_ERROR, "Inputs with different underlying QSV devices are forbidden.\n");
> > +
> > +                return AVERROR(EINVAL);
> > +            }
> > +        }
> > +    }
> > +
> > +    if (in_format == AV_PIX_FMT_P010)
> > +        depth = 10;
> > +
> > +    if (sctx->fillcolor_enable) {
> > +        int Y, U, V;
> > +
> > +        rgb2yuv(sctx->fillcolor[0] / 255.0, sctx->fillcolor[1] / 255.0,
> > +                sctx->fillcolor[2] / 255.0, &Y, &U, &V, depth);
> > +        sctx->comp_conf.Y = Y;
> > +        sctx->comp_conf.U = U;
> > +        sctx->comp_conf.V = V;
> > +    }
> > +
> > +    if (sctx->mode == QSV_STACK_H) {
> > +        height = sctx->tile_height;
> > +        width = 0;
> > +
> > +        if (height == 0)
> > +            height = inlink0->h;
> > +
> > +        for (int i = 0; i < sctx->nb_inputs; i++) {
> > +            AVFilterLink *inlink = ctx->inputs[i];
> > +            mfxVPPCompInputStream *is = &sctx->comp_conf.InputStream[i];
> > +
> > +            SET_INPUT_STREAM(is, width, 0, av_rescale(height, inlink->w, inlink->h), height);
> > +            width += av_rescale(height, inlink->w, inlink->h);
> > +        }
> > +    } else if (sctx->mode == QSV_STACK_V) {
> > +        height = 0;
> > +        width = sctx->tile_width;
> > +
> > +        if (width == 0)
> > +            width = inlink0->w;
> > +
> > +        for (int i = 0; i < sctx->nb_inputs; i++) {
> > +            AVFilterLink *inlink = ctx->inputs[i];
> > +            mfxVPPCompInputStream *is = &sctx->comp_conf.InputStream[i];
> > +
> > +            SET_INPUT_STREAM(is, 0, height, width, av_rescale(width, inlink->h, inlink->w));
> > +            height += av_rescale(width, inlink->h, inlink->w);
> > +        }
> > +    } else if (sctx->nb_grid_rows && sctx->nb_grid_columns) {
> > +        int xpos = 0, ypos = 0;
> > +        int ow, oh, k = 0;
> > +
> > +        ow = sctx->tile_width;
> > +        oh = sctx->tile_height;
> > +
> > +        if (!ow || !oh) {
> > +            ow = ctx->inputs[0]->w;
> > +            oh = ctx->inputs[0]->h;
> > +        }
> > +
> > +        for (int i = 0; i < sctx->nb_grid_columns; i++) {
> > +            ypos = 0;
> > +
> > +            for (int j = 0; j < sctx->nb_grid_rows; j++) {
> > +                mfxVPPCompInputStream *is = &sctx->comp_conf.InputStream[k];
> > +
> > +                SET_INPUT_STREAM(is, xpos, ypos, ow, oh);
> > +                k++;
> > +                ypos += oh;
> > +            }
> > +
> > +            xpos += ow;
> > +        }
> > +
> > +        width = ow * sctx->nb_grid_columns;
> > +        height = oh * sctx->nb_grid_rows;
> > +    } else {
> > +        char *arg, *p = sctx->layout, *saveptr = NULL;
> > +        char *arg2, *p2, *saveptr2 = NULL;
> > +        char *arg3, *p3, *saveptr3 = NULL;
> > +        int xpos, ypos, size;
> > +        int ow, oh;
> > +
> > +        width = ctx->inputs[0]->w;
> > +        height = ctx->inputs[0]->h;
> > +
> > +        for (int i = 0; i < sctx->nb_inputs; i++) {
> > +            AVFilterLink *inlink = ctx->inputs[i];
> > +            mfxVPPCompInputStream *is = &sctx->comp_conf.InputStream[i];
> > +
> > +            ow = inlink->w;
> > +            oh = inlink->h;
> > +
> > +            if (!(arg = av_strtok(p, "|", &saveptr)))
> > +                return AVERROR(EINVAL);
> > +
> > +            p = NULL;
> > +            p2 = arg;
> > +            xpos = ypos = 0;
> > +
> > +            for (int j = 0; j < 3; j++) {
> > +                if (!(arg2 = av_strtok(p2, "_", &saveptr2))) {
> > +                    if (j == 2)
> > +                        break;
> > +                    else
> > +                        return AVERROR(EINVAL);
> > +                }
> > +
> > +                p2 = NULL;
> > +                p3 = arg2;
> > +
> > +                if (j == 2) {
> > +                    if ((ret = av_parse_video_size(&ow, &oh, p3)) < 0) {
> > +                        av_log(ctx, AV_LOG_ERROR, "Invalid size '%s'\n", p3);
> > +                        return ret;
> > +                    }
> > +
> > +                    break;
> > +                }
> > +
> > +                while ((arg3 = av_strtok(p3, "+", &saveptr3))) {
> > +                    p3 = NULL;
> > +                    if (sscanf(arg3, "w%d", &size) == 1) {
> > +                        if (size == i || size < 0 || size >= sctx->nb_inputs)
> > +                            return AVERROR(EINVAL);
> > +
> > +                        if (!j)
> > +                            xpos += sctx->comp_conf.InputStream[size].DstW;
> > +                        else
> > +                            ypos += sctx->comp_conf.InputStream[size].DstW;
> > +                    } else if (sscanf(arg3, "h%d", &size) == 1) {
> > +                        if (size == i || size < 0 || size >= sctx->nb_inputs)
> > +                            return AVERROR(EINVAL);
> > +
> > +                        if (!j)
> > +                            xpos += sctx->comp_conf.InputStream[size].DstH;
> > +                        else
> > +                            ypos += sctx->comp_conf.InputStream[size].DstH;
> > +                    } else if (sscanf(arg3, "%d", &size) == 1) {
> > +                        if (size < 0)
> > +                            return AVERROR(EINVAL);
> > +
> > +                        if (!j)
> > +                            xpos += size;
> > +                        else
> > +                            ypos += size;
> > +                    } else {
> > +                        return AVERROR(EINVAL);
> > +                    }
> > +                }
> > +            }
> > +
> > +            SET_INPUT_STREAM(is, xpos, ypos, ow, oh);
> > +            width = FFMAX(width, xpos + ow);
> > +            height = FFMAX(height, ypos + oh);
> > +        }
> > +    }
> > +
> > +    outlink->w = width;
> > +    outlink->h = height;
> > +    outlink->frame_rate = inlink0->frame_rate;
> > +    outlink->time_base = av_inv_q(outlink->frame_rate);
> > +    outlink->sample_aspect_ratio = inlink0->sample_aspect_ratio;
> > +
> > +    ret = init_framesync(ctx);
> > +
> > +    if (ret < 0)
> > +        return ret;
> > +
> > +    return ff_qsvvpp_init(ctx, &sctx->qsv_param);
> > +}
> > +
> > +/*
> > + * Callback for qsvvpp
> > + * @Note: qsvvpp composition does not generate PTS for result frame.
> > + *        so we assign the PTS from framesync to the output frame.
> > + */
> > +
> > +static int filter_callback(AVFilterLink *outlink, AVFrame *frame)
> > +{
> > +    QSVStackContext *sctx = outlink->src->priv;
> > +
> > +    frame->pts = av_rescale_q(sctx->fs.pts,
> > +                              sctx->fs.time_base, outlink->time_base);
> > +    return ff_filter_frame(outlink, frame);
> > +}
> > +
> > +
> > +static int stack_qsv_init(AVFilterContext *ctx)
> > +{
> > +    QSVStackContext *sctx = ctx->priv;
> > +    int ret;
> > +
> > +    if (!strcmp(ctx->filter->name, "hstack_qsv"))
> > +        sctx->mode = QSV_STACK_H;
> > +    else if (!strcmp(ctx->filter->name, "vstack_qsv"))
> > +        sctx->mode = QSV_STACK_V;
> > +    else {
> > +        int is_grid;
> > +
> > +        av_assert0(strcmp(ctx->filter->name, "xstack_qsv") == 0);
> > +        sctx->mode = QSV_STACK_X;
> > +        is_grid = sctx->nb_grid_rows && sctx->nb_grid_columns;
> > +
> > +        if (sctx->layout && is_grid) {
> > +            av_log(ctx, AV_LOG_ERROR, "Both layout and grid were specified. Only one is allowed.\n");
> > +            return AVERROR(EINVAL);
> > +        }
> > +
> > +        if (!sctx->layout && !is_grid) {
> > +            if (sctx->nb_inputs == 2) {
> > +                sctx->nb_grid_rows = 1;
> > +                sctx->nb_grid_columns = 2;
> > +                is_grid = 1;
> > +            } else {
> > +                av_log(ctx, AV_LOG_ERROR, "No layout or grid specified.\n");
> > +                return AVERROR(EINVAL);
> > +            }
> > +        }
> > +
> > +        if (is_grid)
> > +            sctx->nb_inputs = sctx->nb_grid_rows * sctx->nb_grid_columns;
> > +
> > +        if (strcmp(sctx->fillcolor_str, "none") &&
> > +            av_parse_color(sctx->fillcolor, sctx->fillcolor_str, -1, ctx) >= 0) {
> > +            sctx->fillcolor_enable = 1;
> > +        } else {
> > +            sctx->fillcolor_enable = 0;
> > +        }
> > +    }
> > +
> > +    for (int i = 0; i < sctx->nb_inputs; i++) {
> > +        AVFilterPad pad = { 0 };
> > +
> > +        pad.type = AVMEDIA_TYPE_VIDEO;
> > +        pad.name = av_asprintf("input%d", i);
> > +
> > +        if (!pad.name)
> > +            return AVERROR(ENOMEM);
> > +
> > +        if ((ret = ff_append_inpad_free_name(ctx, &pad)) < 0)
> > +            return ret;
> > +    }
> > +
> > +    /* fill composite config */
> > +    sctx->comp_conf.Header.BufferId = MFX_EXTBUFF_VPP_COMPOSITE;
> > +    sctx->comp_conf.Header.BufferSz = sizeof(sctx->comp_conf);
> > +    sctx->comp_conf.NumInputStream = sctx->nb_inputs;
> > +    sctx->comp_conf.InputStream = av_calloc(sctx->nb_inputs,
> > +                                            sizeof(*sctx->comp_conf.InputStream));
> > +    if (!sctx->comp_conf.InputStream)
> > +        return AVERROR(ENOMEM);
> > +
> > +    /* initialize QSVVPP params */
> > +    sctx->qsv_param.filter_frame = filter_callback;
> > +    sctx->qsv_param.ext_buf = av_mallocz(sizeof(*sctx->qsv_param.ext_buf));
> > +
> > +    if (!sctx->qsv_param.ext_buf)
> > +        return AVERROR(ENOMEM);
> > +
> > +    sctx->qsv_param.ext_buf[0] = (mfxExtBuffer *)&sctx->comp_conf;
> > +    sctx->qsv_param.num_ext_buf = 1;
> > +    sctx->qsv_param.num_crop = 0;
> > +
> > +    return 0;
> > +}
> > +
> > +static av_cold void stack_qsv_uninit(AVFilterContext *ctx)
> > +{
> > +    QSVStackContext *sctx = ctx->priv;
> > +
> > +    ff_qsvvpp_close(ctx);
> > +    ff_framesync_uninit(&sctx->fs);
> > +    av_freep(&sctx->comp_conf.InputStream);
> > +    av_freep(&sctx->qsv_param.ext_buf);
> > +}
> > +
> > +static int stack_qsv_activate(AVFilterContext *ctx)
> > +{
> > +    QSVStackContext *sctx = ctx->priv;
> > +    return ff_framesync_activate(&sctx->fs);
> > +}
> > +
> > +static int stack_qsv_query_formats(AVFilterContext *ctx)
> > +{
> > +    static const enum AVPixelFormat pixel_formats[] = {
> > +        AV_PIX_FMT_NV12,
> > +        AV_PIX_FMT_P010,
> > +        AV_PIX_FMT_QSV,
> > +        AV_PIX_FMT_NONE,
> > +    };
> > +
> > +    return ff_set_common_formats_from_list(ctx, pixel_formats);
> > +}
> > +
> > +static const AVFilterPad stack_qsv_outputs[] = {
> > +    {
> > +        .name          = "default",
> > +        .type          = AVMEDIA_TYPE_VIDEO,
> > +        .config_props  = config_output,
> > +    },
> > +};
> > +
> > +#define STACK_COMMON_OPTS \
> > +    { "inputs", "Set number of inputs", OFFSET(nb_inputs), AV_OPT_TYPE_INT, { .i64 = 2 }, 2, UINT16_MAX, .flags = FLAGS },                   \
> > +    { "shortest", "Force termination when the shortest input terminates", OFFSET(shortest), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS },
> > +
> > +#if CONFIG_HSTACK_QSV_FILTER
> > +
> > +static const AVOption hstack_qsv_options[] = {
> > +    STACK_COMMON_OPTS
> > +
> > +    { "height", "Set output height (0 to use the height of input 0)", OFFSET(tile_height), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, UINT16_MAX, FLAGS },
> > +    { NULL }
> > +};
> > +
> > +AVFILTER_DEFINE_CLASS(hstack_qsv);
> > +
> > +const AVFilter ff_vf_hstack_qsv = {
> > +    .name           = "hstack_qsv",
> > +    .description    = NULL_IF_CONFIG_SMALL("Quick Sync Video hstack."),
> > +    .priv_size      = sizeof(QSVStackContext),
> > +    .priv_class     = &hstack_qsv_class,
> > +    FILTER_QUERY_FUNC(stack_qsv_query_formats),
> > +    FILTER_OUTPUTS(stack_qsv_outputs),
> > +    .init           = stack_qsv_init,
> > +    .uninit         = stack_qsv_uninit,
> > +    .activate       = stack_qsv_activate,
> > +    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
> > +    .flags          = AVFILTER_FLAG_DYNAMIC_INPUTS,
> > +};
> > +
> > +#endif
> > +
> > +#if CONFIG_VSTACK_QSV_FILTER
> > +
> > +static const AVOption vstack_qsv_options[] = {
> > +    STACK_COMMON_OPTS
> > +
> > +    { "width",   "Set output width (0 to use the width of input 0)", OFFSET(tile_width), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, UINT16_MAX, FLAGS },
> > +    { NULL }
> > +};
> > +
> > +AVFILTER_DEFINE_CLASS(vstack_qsv);
> > +
> > +const AVFilter ff_vf_vstack_qsv = {
> > +    .name           = "vstack_qsv",
> > +    .description    = NULL_IF_CONFIG_SMALL("Quick Sync Video vstack."),
> > +    .priv_size      = sizeof(QSVStackContext),
> > +    .priv_class     = &vstack_qsv_class,
> > +    FILTER_QUERY_FUNC(stack_qsv_query_formats),
> > +    FILTER_OUTPUTS(stack_qsv_outputs),
> > +    .init           = stack_qsv_init,
> > +    .uninit         = stack_qsv_uninit,
> > +    .activate       = stack_qsv_activate,
> > +    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
> > +    .flags          = AVFILTER_FLAG_DYNAMIC_INPUTS,
> > +};
> > +
> > +#endif
> > +
> > +#if CONFIG_XSTACK_QSV_FILTER
> > +
> > +static const AVOption xstack_qsv_options[] = {
> > +    STACK_COMMON_OPTS
> > +
> > +    { "layout", "Set custom layout", OFFSET(layout), AV_OPT_TYPE_STRING, {.str = NULL}, 0, 0, .flags = FLAGS },
> > +    { "grid",   "set fixed size grid layout", OFFSET(nb_grid_columns), AV_OPT_TYPE_IMAGE_SIZE, {.str=NULL}, 0, 0, .flags = FLAGS },
> > +    { "grid_tile_size",   "set tile size in grid layout", OFFSET(tile_width), AV_OPT_TYPE_IMAGE_SIZE, {.str=NULL}, 0, 0, .flags = FLAGS },
> > +    { "fill",   "Set the color for unused pixels", OFFSET(fillcolor_str), AV_OPT_TYPE_STRING, {.str = "none"}, .flags = FLAGS },
> > +    { NULL }
> > +};
> > +
> > +AVFILTER_DEFINE_CLASS(xstack_qsv);
> > +
> > +const AVFilter ff_vf_xstack_qsv = {
> > +    .name           = "xstack_qsv",
> > +    .description    = NULL_IF_CONFIG_SMALL("Quick Sync Video xstack."),
> > +    .priv_size      = sizeof(QSVStackContext),
> > +    .priv_class     = &xstack_qsv_class,
> > +    FILTER_QUERY_FUNC(stack_qsv_query_formats),
> > +    FILTER_OUTPUTS(stack_qsv_outputs),
> > +    .init           = stack_qsv_init,
> > +    .uninit         = stack_qsv_uninit,
> > +    .activate       = stack_qsv_activate,
> > +    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE,
> > +    .flags          = AVFILTER_FLAG_DYNAMIC_INPUTS,
> > +};
> > +
> > +#endif
> > --
> > 2.25.1
> > 
> 
> Please avoid duplicating code.

Thanks for the comment. I will factor out the code common to the vaapi and qsv
based stack filters in the new patchset. Note that the qsv / vaapi stack filters
don't require the input streams to have the same width or height. Moreover, the
user may specify the output width or height for each input stream (the HW may
scale up or down while stacking the input videos), so I won't share code between
the SW stack filters and the qsv / vaapi stack filters.
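
To illustrate why the HW variants differ from the SW ones, here is a minimal
standalone C sketch of how the hstack case can derive each tile's position and
size from inputs of different sizes. It mirrors the arithmetic in the hstack
branch of config_output() but is not the patch code: TileRect, rescale() and
hstack_layout() are hypothetical names used only for this illustration, and
rescale() is a simplified stand-in for av_rescale() that assumes small
non-negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the DstX/DstY/DstW/DstH fields of
 * mfxVPPCompInputStream. */
typedef struct TileRect {
    int x, y, w, h;
} TileRect;

/* a * b / c rounded to nearest; simplified stand-in for av_rescale(),
 * valid only for non-negative inputs that do not overflow int64_t. */
static int64_t rescale(int64_t a, int64_t b, int64_t c)
{
    assert(a >= 0 && b >= 0 && c > 0);
    return (a * b + c / 2) / c;
}

/*
 * Place n inputs side by side at a common height. Each input keeps its
 * aspect ratio, so its tile width is out_h * in_w / in_h. If out_h is 0,
 * the height of the first input is used. Returns the total output width.
 */
static int hstack_layout(const int *in_w, const int *in_h, int n,
                         int out_h, TileRect *tiles)
{
    int x = 0;

    if (out_h == 0)
        out_h = in_h[0];

    for (int i = 0; i < n; i++) {
        int w = (int)rescale(out_h, in_w[i], in_h[i]);

        tiles[i] = (TileRect){ x, 0, w, out_h };
        x += w;
    }

    return x;
}
```

With a 1920x1080 and a 1280x720 input and out_h set to 0, both tiles come out as
1920x1080, the second one starts at x = 1920, and the output is 3840 wide. The
SW hstack filter would reject this pair because the heights differ.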

BRs
Haihao