[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Michael Niedermayer michaelni at gmx.at
Sun Oct 30 14:42:38 CET 2011


On Sat, Oct 29, 2011 at 04:47:41PM +0200, Stefano Sabatini wrote:
> On date Saturday 2011-10-29 04:10:04 +0200, Michael Niedermayer encoded:
> > On Sat, Oct 29, 2011 at 12:56:15AM +0200, Stefano Sabatini wrote:
> [...]
> > > > please benchmark this with START/STOP_TIMER against the previous code
> > > 
> > > RGB path was disabled before this one, I split the present patch and
> > > did some tests.
> > > 
> > > * Test with no alpha in the main input
> > > 
> > > before alpha premultiplication
> > > 1287135 dezicycles in first, 2 runs, 0 skips
> > > 1335442 dezicycles in first, 4 runs, 0 skips
> > > 1245555 dezicycles in first, 8 runs, 0 skips
> > > 1162359 dezicycles in first, 16 runs, 0 skips
> > > 1144390 dezicycles in first, 32 runs, 0 skips
> > > 1134602 dezicycles in first, 64 runs, 0 skips
> > > 1133281 dezicycles in first, 128 runs, 0 skips
> > > 1114852 dezicycles in first, 256 runs, 0 skips
> > > 1108999 dezicycles in first, 512 runs, 0 skips
> > > 1101536 dezicycles in first, 1024 runs, 0 skips
> > > 1096821 dezicycles in first, 2048 runs, 0 skips
> > > 1090508 dezicycles in first, 4096 runs, 0 skips
> > > 1085896 dezicycles in first, 8192 runs, 0 skips
> > > 1084802 dezicycles in first, 16384 runs, 0 skips
> > > 1083604 dezicycles in first, 32768 runs, 0 skips
> > > 
> > > after alpha premultiplication
> > > 1224390 dezicycles in second, 2 runs, 0 skips
> > > 1202235 dezicycles in second, 4 runs, 0 skips
> > > 1191453 dezicycles in second, 8 runs, 0 skips
> > > 1183031 dezicycles in second, 16 runs, 0 skips
> > > 1230087 dezicycles in second, 32 runs, 0 skips
> > > 1227492 dezicycles in second, 64 runs, 0 skips
> > > 1230488 dezicycles in second, 128 runs, 0 skips
> > > 1215128 dezicycles in second, 256 runs, 0 skips
> > > 1207364 dezicycles in second, 512 runs, 0 skips
> > > 1199813 dezicycles in second, 1024 runs, 0 skips
> > > 1195857 dezicycles in second, 2048 runs, 0 skips
> > > 1193954 dezicycles in second, 4096 runs, 0 skips
> > > 1194128 dezicycles in second, 8192 runs, 0 skips
> > > 1187481 dezicycles in second, 16384 runs, 0 skips
> > > 1181874 dezicycles in second, 32768 runs, 0 skips
> > > 
> > > * Test with alpha in the main input:
> > > 28684935 dezicycles in first, 2 runs, 0 skips
> > > 28553902 dezicycles in first, 4 runs, 0 skips
> > > 28776015 dezicycles in first, 8 runs, 0 skips
> > > 29073680 dezicycles in first, 16 runs, 0 skips
> > > 28816918 dezicycles in first, 32 runs, 0 skips
> > > 28908704 dezicycles in first, 64 runs, 0 skips
> > > 28745401 dezicycles in first, 128 runs, 0 skips
> > > 28614980 dezicycles in first, 256 runs, 0 skips
> > > 28609710 dezicycles in first, 512 runs, 0 skips
> > > 28537037 dezicycles in first, 1024 runs, 0 skips
> > > 28517850 dezicycles in first, 2048 runs, 0 skips
> > > 28466515 dezicycles in first, 4096 runs, 0 skips
> > > 28438388 dezicycles in first, 8192 runs, 0 skips
> > > 28440383 dezicycles in first, 16384 runs, 0 skips
> > > 28426314 dezicycles in first, 32768 runs, 0 skips
> > > 
> > > 33347880 dezicycles in second, 2 runs, 0 skips
> > > 33131272 dezicycles in second, 4 runs, 0 skips
> > > 38018970 dezicycles in second, 8 runs, 0 skips
> > > 48715928 dezicycles in second, 16 runs, 0 skips
> > > 44290285 dezicycles in second, 32 runs, 0 skips
> > > 43696766 dezicycles in second, 64 runs, 0 skips
> > > 38599173 dezicycles in second, 128 runs, 0 skips
> > > 36112571 dezicycles in second, 256 runs, 0 skips
> > > 34737837 dezicycles in second, 512 runs, 0 skips
> > > 34066213 dezicycles in second, 1024 runs, 0 skips
> > > 33640178 dezicycles in second, 2048 runs, 0 skips
> > > 33368757 dezicycles in second, 4096 runs, 0 skips
> > > 33233522 dezicycles in second, 8192 runs, 0 skips
> > > 33132908 dezicycles in second, 16384 runs, 0 skips
> > > 33062949 dezicycles in second, 32768 runs, 0 skips
> > > 
> > > Results are as expected: alpha pre-multiplication is significantly
> > > slower, but it may also be what the user wants, so I could make it
> > > optional (and preserve the original alpha? enabled by default?).
> > 
> > that's not what i meant
> > 
> > the original code looked like this:
> > > -                d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128) >> 8;
> > > -                d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128) >> 8;
> > > -                d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128) >> 8;
> > 
> > when i saw what you replaced it by i was ... scared ;)
> > 
> > if and switch are added in the innermost loop
> > constants are replaced by variables
> > variables are replaced by reading out of arrays from structures
> > a division is added
> > 
> > all this makes the code significantly slower
> > 
> > Can you explain what equation you are trying to implement ?
> 
> 
> Changed the code (second patch), now testbench results changed from:
> 29891505 dezicycles in first, 2 runs, 0 skips
> 29780850 dezicycles in first, 4 runs, 0 skips
> 30056100 dezicycles in first, 8 runs, 0 skips
> 30378746 dezicycles in first, 16 runs, 0 skips
> 31263998 dezicycles in first, 32 runs, 0 skips
> 31422349 dezicycles in first, 64 runs, 0 skips
> 31441573 dezicycles in first, 128 runs, 0 skips
> 31319009 dezicycles in first, 256 runs, 0 skips
> 30925767 dezicycles in first, 512 runs, 0 skips
> 33965521 dezicycles in first, 1024 runs, 0 skips
> 32342480 dezicycles in first, 2048 runs, 0 skips
> 31631954 dezicycles in first, 4096 runs, 0 skips
> 31252298 dezicycles in first, 8192 runs, 0 skips
> 31572626 dezicycles in first, 16383 runs, 1 skips
> 31102288 dezicycles in first, 32767 runs, 1 skips
> 
> to:
> 26084640 dezicycles in first, 2 runs, 0 skips
> 23856690 dezicycles in first, 4 runs, 0 skips
> 24238267 dezicycles in first, 8 runs, 0 skips
> 26151311 dezicycles in first, 16 runs, 0 skips
> 25807400 dezicycles in first, 32 runs, 0 skips
> 27391090 dezicycles in first, 64 runs, 0 skips
> 26028030 dezicycles in first, 128 runs, 0 skips
> 23729756 dezicycles in first, 256 runs, 0 skips
> 22114165 dezicycles in first, 512 runs, 0 skips
> 21465190 dezicycles in first, 1024 runs, 0 skips
> 20951560 dezicycles in first, 2048 runs, 0 skips
> 20736770 dezicycles in first, 4096 runs, 0 skips
> 20573711 dezicycles in first, 8192 runs, 0 skips
> 20570483 dezicycles in first, 16384 runs, 0 skips
> 20634111 dezicycles in first, 32768 runs, 0 skips
> 
> With the second patch applied (no alpha in input):
> 24551340 dezicycles in second, 2 runs, 0 skips
> 23764147 dezicycles in second, 4 runs, 0 skips
> 23118037 dezicycles in second, 8 runs, 0 skips
> 22992204 dezicycles in second, 16 runs, 0 skips
> 22960603 dezicycles in second, 32 runs, 0 skips
> 23015486 dezicycles in second, 64 runs, 0 skips
> 23007612 dezicycles in second, 128 runs, 0 skips
> 22955180 dezicycles in second, 256 runs, 0 skips
> 23277693 dezicycles in second, 512 runs, 0 skips
> 23147960 dezicycles in second, 1024 runs, 0 skips
> 22940401 dezicycles in second, 2048 runs, 0 skips
> 22811952 dezicycles in second, 4096 runs, 0 skips
> 22760982 dezicycles in second, 8192 runs, 0 skips
> 22676573 dezicycles in second, 16384 runs, 0 skips
> 22622130 dezicycles in second, 32768 runs, 0 skips
> (due to the added ifs).
> 
> With alpha in the main input/output:
> 41009130 dezicycles in second, 2 runs, 0 skips
> 36964740 dezicycles in second, 4 runs, 0 skips
> 34723803 dezicycles in second, 8 runs, 0 skips
> 39728604 dezicycles in second, 16 runs, 0 skips
> 40790327 dezicycles in second, 32 runs, 0 skips
> 38958495 dezicycles in second, 64 runs, 0 skips
> 36674410 dezicycles in second, 128 runs, 0 skips
> 35057610 dezicycles in second, 256 runs, 0 skips
> 33985402 dezicycles in second, 512 runs, 0 skips
> 33323452 dezicycles in second, 1024 runs, 0 skips
> 32870493 dezicycles in second, 2048 runs, 0 skips
> 32565989 dezicycles in second, 4096 runs, 0 skips
> 32464448 dezicycles in second, 8192 runs, 0 skips
> 32574558 dezicycles in second, 16384 runs, 0 skips
> 32468892 dezicycles in second, 32768 runs, 0 skips
> 
> Regarding the second patch, I kept Mark's code but after some time
> spent tinkering on it I couldn't figure out the meaning of the
> equation:
>     d[da] = ( (d[da] << 8) + (256 - d[da]) * s[sa] ) >> 8;
> 
> Mark, could you take another look at this patch, check if it's correct
> and explain the meaning of this last assignment?
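
One possible reading of that assignment, offered as a sketch and not as
Mark's stated intent: expanding it gives d + ((256 - d) * s >> 8), which
looks like the usual "over" result alpha, a_out = a_d + (1 - a_d) * a_s,
with 256 standing in for 255 as the fixed-point scale. The helper names
below (alpha_fixed, alpha_reference, alpha_max_error) are hypothetical and
do not appear in the patch; the reference uses an exact division by 255
purely for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* The assignment from the patch, with d[da] and s[sa] passed as arguments. */
static inline uint8_t alpha_fixed(uint8_t dst_a, uint8_t src_a)
{
    return (uint8_t)(((dst_a << 8) + (256 - dst_a) * src_a) >> 8);
}

/* Assumed reference: a_out = a_d + (1 - a_d) * a_s on a 0..255 scale,
 * rounded to nearest. */
static inline uint8_t alpha_reference(uint8_t dst_a, uint8_t src_a)
{
    unsigned num = 255u * dst_a + (255u - dst_a) * src_a;
    return (uint8_t)((num + 127) / 255);
}

/* Hypothetical helper: largest absolute difference over all 8-bit inputs. */
static int alpha_max_error(void)
{
    int worst = 0;
    for (int d = 0; d < 256; d++) {
        for (int s = 0; s < 256; s++) {
            int e = alpha_fixed((uint8_t)d, (uint8_t)s) -
                    alpha_reference((uint8_t)d, (uint8_t)s);
            if (e < 0)
                e = -e;
            if (e > worst)
                worst = e;
        }
    }
    /* The two forms should never differ by more than one code value. */
    assert(worst <= 1);
    return worst;
}
```

Under this reading, a fully opaque main pixel stays opaque, a fully
transparent main pixel takes the overlay alpha unchanged, and the result
never decreases the main alpha.
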
> -- 
> FFmpeg = Fancy Fast Mythic Ponderous Evanescent Gangster

>  doc/filters.texi         |   15 +++++
>  libavfilter/vf_overlay.c |  120 +++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 115 insertions(+), 20 deletions(-)
> c09092b8ecd4d804398284c9d5a10469b7f28167  0002-vf_overlay-enable-RGB-path.patch
> From dfc4f58cfbf8edceef0a84d7779c237543dcd2ca Mon Sep 17 00:00:00 2001
> From: Stefano Sabatini <stefasab at gmail.com>
> Date: Sat, 29 Oct 2011 00:10:43 +0200
> Subject: [PATCH] vf_overlay: enable RGB path
> 
> Add option rgb which forces the RGB path.
> (note: remove timer when committing)
> ---
>  doc/filters.texi         |   15 +++++-
>  libavfilter/vf_overlay.c |  120 +++++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 115 insertions(+), 20 deletions(-)
> 
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 9530112..0da5702 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -1631,10 +1631,10 @@ Overlay one video on top of another.
>  It takes two inputs and one output, the first input is the "main"
>  video on which the second input is overlayed.
>  
> -It accepts the parameters: @var{x}:@var{y}.
> +It accepts the parameters: @var{x}:@var{y}[:@var{options}].
>  
>  @var{x} is the x coordinate of the overlayed video on the main video,
> -@var{y} is the y coordinate. The parameters are expressions containing
> +@var{y} is the y coordinate. @var{x} and @var{y} are expressions containing
>  the following parameters:
>  
>  @table @option
> @@ -1651,6 +1651,17 @@ overlay input width and height
>  same as @var{overlay_w} and @var{overlay_h}
>  @end table
>  
> +@var{options} is an optional list of @var{key}=@var{value} pairs,
> +separated by ":".
> +
> +The description of the accepted options follows.
> +
> +@table @option
> +@item rgb
> +If set to 1, force the filter to accept inputs in the RGB
> +colorspace. Default value is 0.
> +@end table
> +
>  Be aware that frames are taken from each input video in timestamp
>  order, hence, if their initial timestamps differ, it is a a good idea
>  to pass the two inputs through a @var{setpts=PTS-STARTPTS} filter to
> diff --git a/libavfilter/vf_overlay.c b/libavfilter/vf_overlay.c
> index 57c9fe9..1f06926 100644
> --- a/libavfilter/vf_overlay.c
> +++ b/libavfilter/vf_overlay.c
> @@ -32,7 +32,9 @@
>  #include "libavutil/pixdesc.h"
>  #include "libavutil/imgutils.h"
>  #include "libavutil/mathematics.h"
> +#include "libavutil/timer.h"
>  #include "internal.h"
> +#include "drawutils.h"
>  
>  static const char *var_names[] = {
>      "main_w",    "W", ///< width  of the main    video
> @@ -53,13 +55,31 @@ enum var_name {
>  #define MAIN    0
>  #define OVERLAY 1
>  
> +#define R 0
> +#define G 1
> +#define B 2
> +#define A 3
> +
> +#define Y 0
> +#define U 1
> +#define V 2
> +
>  typedef struct {
>      const AVClass *class;
>      int x, y;                   ///< position of overlayed picture
>  
> +    int allow_packed_rgb;
> +    uint8_t main_is_packed_rgb;
> +    uint8_t main_rgba_map[4];
> +    uint8_t main_has_alpha;
> +    uint8_t overlay_is_packed_rgb;
> +    uint8_t overlay_rgba_map[4];
> +    uint8_t overlay_has_alpha;
> +
>      AVFilterBufferRef *overpicref;
>  
> -    int max_plane_step[4];      ///< steps per pixel for each plane
> +    int main_pix_step[4];       ///< steps per pixel for each plane of the main output
> +    int overlay_pix_step[4];    ///< steps per pixel for each plane of the overlay
>      int hsub, vsub;             ///< chroma subsampling values
>  
>      char *x_expr, *y_expr;
> @@ -70,6 +90,7 @@ typedef struct {
>  static const AVOption overlay_options[] = {
>      { "x", "set the x expression", OFFSET(x_expr), AV_OPT_TYPE_STRING, {.str = "0"}, CHAR_MIN, CHAR_MAX },
>      { "y", "set the y expression", OFFSET(y_expr), AV_OPT_TYPE_STRING, {.str = "0"}, CHAR_MIN, CHAR_MAX },
> +    {"rgb", "force packed RGB in input and output", OFFSET(allow_packed_rgb), AV_OPT_TYPE_INT, {.dbl=0}, 0, 1 },
>      {NULL},
>  };
>  
> @@ -128,27 +149,59 @@ static av_cold void uninit(AVFilterContext *ctx)
>  
>  static int query_formats(AVFilterContext *ctx)
>  {
> -    const enum PixelFormat inout_pix_fmts[] = { PIX_FMT_YUV420P,  PIX_FMT_NONE };
> -    const enum PixelFormat blend_pix_fmts[] = { PIX_FMT_YUVA420P, PIX_FMT_NONE };
> -    AVFilterFormats *inout_formats = avfilter_make_format_list(inout_pix_fmts);
> -    AVFilterFormats *blend_formats = avfilter_make_format_list(blend_pix_fmts);
> +    OverlayContext *over = ctx->priv;
> +
> +    /* overlay formats contains alpha, for avoiding conversion with alpha information loss */
> +    const enum PixelFormat main_pix_fmts_yuv[] = { PIX_FMT_YUV420P,  PIX_FMT_NONE };
> +    const enum PixelFormat overlay_pix_fmts_yuv[] = { PIX_FMT_YUVA420P, PIX_FMT_NONE };
> +    const enum PixelFormat main_pix_fmts_rgb[] = {
> +        PIX_FMT_ARGB,  PIX_FMT_RGBA,
> +        PIX_FMT_ABGR,  PIX_FMT_BGRA,
> +        PIX_FMT_RGB24, PIX_FMT_BGR24,
> +        PIX_FMT_NONE
> +    };
> +    const enum PixelFormat overlay_pix_fmts_rgb[] = {
> +        PIX_FMT_ARGB,  PIX_FMT_RGBA,
> +        PIX_FMT_ABGR,  PIX_FMT_BGRA,
> +        PIX_FMT_NONE
> +    };
> +
> +    AVFilterFormats *main_formats;
> +    AVFilterFormats *overlay_formats;
> +
> +    if (over->allow_packed_rgb) {
> +        main_formats    = avfilter_make_format_list(main_pix_fmts_rgb);
> +        overlay_formats = avfilter_make_format_list(overlay_pix_fmts_rgb);
> +    } else {
> +        main_formats    = avfilter_make_format_list(main_pix_fmts_yuv);
> +        overlay_formats = avfilter_make_format_list(overlay_pix_fmts_yuv);
> +    }
>  
> -    avfilter_formats_ref(inout_formats, &ctx->inputs [MAIN   ]->out_formats);
> -    avfilter_formats_ref(blend_formats, &ctx->inputs [OVERLAY]->out_formats);
> -    avfilter_formats_ref(inout_formats, &ctx->outputs[MAIN   ]->in_formats );
> +    avfilter_formats_ref(main_formats,    &ctx->inputs [MAIN   ]->out_formats);
> +    avfilter_formats_ref(overlay_formats, &ctx->inputs [OVERLAY]->out_formats);
> +    avfilter_formats_ref(main_formats,    &ctx->outputs[MAIN   ]->in_formats );
>  
>      return 0;
>  }
>  
> +static enum PixelFormat alpha_pix_fmts[] = {
> +    PIX_FMT_YUVA420P, PIX_FMT_ARGB, PIX_FMT_ABGR, PIX_FMT_RGBA,
> +    PIX_FMT_BGRA, PIX_FMT_NONE
> +};
> +
>  static int config_input_main(AVFilterLink *inlink)
>  {
>      OverlayContext *over = inlink->dst->priv;
>      const AVPixFmtDescriptor *pix_desc = &av_pix_fmt_descriptors[inlink->format];
>  
> -    av_image_fill_max_pixsteps(over->max_plane_step, NULL, pix_desc);
> +    av_image_fill_max_pixsteps(over->main_pix_step,    NULL, pix_desc);
> +
>      over->hsub = pix_desc->log2_chroma_w;
>      over->vsub = pix_desc->log2_chroma_h;
>  
> +    over->main_is_packed_rgb =
> +        ff_fill_rgba_map(over->main_rgba_map, inlink->format) >= 0;
> +    over->main_has_alpha = ff_fmt_is_in(inlink->format, alpha_pix_fmts);
>      return 0;
>  }
>  
> @@ -159,6 +212,9 @@ static int config_input_overlay(AVFilterLink *inlink)
>      char *expr;
>      double var_values[VAR_VARS_NB], res;
>      int ret;
> +    const AVPixFmtDescriptor *pix_desc = &av_pix_fmt_descriptors[inlink->format];
> +
> +    av_image_fill_max_pixsteps(over->overlay_pix_step, NULL, pix_desc);
>  
>      /* Finish the configuration by evaluating the expressions
>         now when both inputs are configured. */
> @@ -181,6 +237,10 @@ static int config_input_overlay(AVFilterLink *inlink)
>          goto fail;
>      over->x = res;
>  
> +    over->overlay_is_packed_rgb =
> +        ff_fill_rgba_map(over->overlay_rgba_map, inlink->format) >= 0;
> +    over->overlay_has_alpha = ff_fmt_is_in(inlink->format, alpha_pix_fmts);
> +
>      av_log(ctx, AV_LOG_INFO,
>             "main w:%d h:%d fmt:%s overlay x:%d y:%d w:%d h:%d fmt:%s\n",
>             ctx->inputs[MAIN]->w, ctx->inputs[MAIN]->h,
> @@ -289,25 +349,49 @@ static void blend_slice(AVFilterContext *ctx,
>      start_y = FFMAX(y, slice_y);
>      height = end_y - start_y;
>  
> -    if (dst->format == PIX_FMT_BGR24 || dst->format == PIX_FMT_RGB24) {
> -        uint8_t *dp = dst->data[0] + x * 3 + start_y * dst->linesize[0];
> +    if (over->main_is_packed_rgb) {
> +        uint8_t *dp = dst->data[0] + x * over->main_pix_step[0] +
> +                      start_y * dst->linesize[0];
>          uint8_t *sp = src->data[0];
> -        int b = dst->format == PIX_FMT_BGR24 ? 2 : 0;
> -        int r = dst->format == PIX_FMT_BGR24 ? 0 : 2;
> +        uint8_t alpha;          ///< the amount of overlay to blend on to main
> +        const int dr = over->main_rgba_map[R];
> +        const int dg = over->main_rgba_map[G];
> +        const int db = over->main_rgba_map[B];
> +        const int dstep = over->main_pix_step[0];
> +        const int sr = over->overlay_rgba_map[R];
> +        const int sg = over->overlay_rgba_map[G];
> +        const int sb = over->overlay_rgba_map[B];
> +        const int sa = over->overlay_rgba_map[A];
> +        const int sstep = over->overlay_pix_step[0];
>          if (slice_y > y)
>              sp += (slice_y - y) * src->linesize[0];
> +        START_TIMER
>          for (i = 0; i < height; i++) {
>              uint8_t *d = dp, *s = sp;
>              for (j = 0; j < width; j++) {
> -                d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128) >> 8;
> -                d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128) >> 8;
> -                d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128) >> 8;
> -                d += 3;
> -                s += 4;
> +                alpha = s[sa];
> +                switch (alpha) {
> +                case 0:
> +                    break;
> +                case 255:
> +                    d[dr] = s[sr];
> +                    d[dg] = s[sg];
> +                    d[db] = s[sb];
> +                    break;
> +                default:
> +                    // main_value = main_value * (1 - alpha) + overlay_value * alpha

> +                    // apply a fast approximation: X/255 ~ (X+128)/256

please use +128*257>>16 (which is exact)
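
A minimal sketch of the difference, reading the shorthand as
((X + 128) * 257) >> 16 for X in the range of a product of two bytes
(0..255*255). The function names are hypothetical and not taken from
vf_overlay.c; blend_channel only illustrates the equation in the patch
comment, main * (1 - alpha) + overlay * alpha.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation from the patch comment: X/255 ~ (X + 128) / 256. */
static inline unsigned div255_approx(unsigned x)
{
    return (x + 128) >> 8;
}

/* Suggested form: X/255 rounded to nearest, valid for 0 <= X <= 255*255. */
static inline unsigned div255_exact(unsigned x)
{
    return ((x + 128) * 257) >> 16;
}

/* Hypothetical helper: blend one 8-bit channel with an 8-bit alpha. */
static inline uint8_t blend_channel(uint8_t dst, uint8_t src, uint8_t alpha)
{
    unsigned x = (unsigned)dst * (255u - alpha) + (unsigned)src * alpha;
    return (uint8_t)div255_exact(x);
}

/* Exhaustive check against a rounded integer division by 255. */
static void check_div255_exact(void)
{
    for (unsigned x = 0; x <= 255u * 255u; x++)
        assert(div255_exact(x) == (x + 127) / 255);
}
```

With the >>8 approximation, a full-scale product of 255*255 comes out as
254, so an opaque white overlay would not reach 255; the *257>>16 form
returns 255 for the same input.
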

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

You can kill me, but you cannot change the truth.

