[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Michael Niedermayer michaelni at gmx.at
Sat Oct 29 04:10:04 CEST 2011


On Sat, Oct 29, 2011 at 12:56:15AM +0200, Stefano Sabatini wrote:
> On date Thursday 2011-10-27 01:01:40 +0200, Michael Niedermayer encoded:
> > On Thu, Oct 27, 2011 at 12:25:43AM +0200, Stefano Sabatini wrote:
> > > From 72b3c79a550961b3e215e5f1e6d42da3c362751e Mon Sep 17 00:00:00 2001
> > > From: Stefano Sabatini <stefasab at gmail.com>
> > > Date: Mon, 24 Oct 2011 20:00:21 +0200
> > > Subject: [PATCH] vf_overlay: add support to RGB packed input and output
> > > 
> > > Also add support to alpha pre-multiplication in the RGBA path.
> > > 
> > > Based on the work of Mark Himsley <mark at mdsh.com>.
> > > 
> > > See thread:
> > > Subject: [FFmpeg-devel] libavfilter: extending overlay filter
> > > Date: Sun, 13 Mar 2011 14:18:42 +0000
> > > ---
> > >  doc/filters.texi         |   15 +++++-
> > >  libavfilter/vf_overlay.c |  134 +++++++++++++++++++++++++++++++++++++++------
> > >  2 files changed, 129 insertions(+), 20 deletions(-)
> [...]
> > >          for (i = 0; i < height; i++) {
> > >              uint8_t *d = dp, *s = sp;
> > >              for (j = 0; j < width; j++) {
> > 
> > > -                d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128) >> 8;
> > > -                d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128) >> 8;
> > > -                d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128) >> 8;
> > > -                d += 3;
> > > -                s += 4;
> > > +                // compute the blend multiplication of overlay over the main
> > > +                alpha = s[over->overlay_rgba_map[A]];
> > > +                // if the main channel has an alpha channel, alpha has to be calculated
> > > +                // to create an un-premultiplied (straight) alpha value
> > > +                if (over->main_has_alpha) {
> > > +                    // apply the general equation:
> > > +                    // alpha = alpha_overlay / ((alpha_main + alpha_overlay) - alpha_main * alpha_overlay)
> > > +                    //
> > > +                    // if alpha_main = 0 => alpha = 0
> > > +                    // if alpha_main = 1 => alpha = alpha_overlay
> > > +                    switch (alpha) {
> > > +                        case 0:
> > > +                        case 0xff:
> > > +                            break;
> > > +                        default:
> > > +                            // the un-premultiplied calculation is:
> > > +                            // (255 * 255 * overlay_alpha) / ( 255 * (overlay_alpha + main_alpha) - (overlay_alpha * main_alpha) )
> > > +                            alpha =
> > > +                            // the next line is a faster version of:  255 * 255 * alpha
> > > +                                ( (alpha << 16) - (alpha << 9) + alpha )
> > > +                                / (
> > > +                            // the next line is a faster version of: 255 * (blend + d[over->inout_rgba_map[A]])
> > > +                                    ((alpha + d[over->main_rgba_map[A]]) << 8 ) - (alpha + d[over->main_rgba_map[A]])
> > > +                                    - d[over->main_rgba_map[A]] * alpha
> > > +                                );
> > > +                    }
> > > +                }
> > > +                switch (alpha) {
> > > +                    case 0:
> > > +                        break;
> > > +                    case 0xff:
> > > +                        d[over->main_rgba_map[R]] = s[over->overlay_rgba_map[R]];
> > > +                        d[over->main_rgba_map[G]] = s[over->overlay_rgba_map[G]];
> > > +                        d[over->main_rgba_map[B]] = s[over->overlay_rgba_map[B]];
> > > +                        break;
> > > +                    default:
> > > +                        d[over->main_rgba_map[R]] = (d[over->main_rgba_map[R]] * (255 - alpha) + s[over->overlay_rgba_map[R]] * alpha) / 255;
> > > +                        d[over->main_rgba_map[G]] = (d[over->main_rgba_map[G]] * (255 - alpha) + s[over->overlay_rgba_map[G]] * alpha) / 255;
> > > +                        d[over->main_rgba_map[B]] = (d[over->main_rgba_map[B]] * (255 - alpha) + s[over->overlay_rgba_map[B]] * alpha) / 255;
> > > +                }
> > > +                if (over->main_has_alpha) {
> > > +                    switch (alpha) {
> > > +                    case 0:
> > > +                        break;
> > > +                    case 0xff:
> > > +                        d[over->main_rgba_map[A]] = s[over->overlay_rgba_map[A]];
> > > +                        break;
> > > +                    default:
> > > +                        d[over->main_rgba_map[A]] = (
> > > +                            (d[over->main_rgba_map[A]] << 8) + (0x100 - d[over->main_rgba_map[A]]) * s[over->overlay_rgba_map[A]]
> > > +                        ) >> 8;
> > > +                    }
> > > +                }
> > 
> 
> > please benchmark this with START/STOP_TIMER against the previous code
> 
> The RGB path was disabled before this patch, so I split the present
> patch and did some tests.
> 
> * Test with no alpha in the main input
> 
> before alpha premultiplication
> 1287135 dezicycles in first, 2 runs, 0 skips
> 1335442 dezicycles in first, 4 runs, 0 skips
> 1245555 dezicycles in first, 8 runs, 0 skips
> 1162359 dezicycles in first, 16 runs, 0 skips
> 1144390 dezicycles in first, 32 runs, 0 skips
> 1134602 dezicycles in first, 64 runs, 0 skips
> 1133281 dezicycles in first, 128 runs, 0 skips
> 1114852 dezicycles in first, 256 runs, 0 skips
> 1108999 dezicycles in first, 512 runs, 0 skips
> 1101536 dezicycles in first, 1024 runs, 0 skips
> 1096821 dezicycles in first, 2048 runs, 0 skips
> 1090508 dezicycles in first, 4096 runs, 0 skips
> 1085896 dezicycles in first, 8192 runs, 0 skips
> 1084802 dezicycles in first, 16384 runs, 0 skips
> 1083604 dezicycles in first, 32768 runs, 0 skips
> 
> after alpha premultiplication
> 1224390 dezicycles in second, 2 runs, 0 skips
> 1202235 dezicycles in second, 4 runs, 0 skips
> 1191453 dezicycles in second, 8 runs, 0 skips
> 1183031 dezicycles in second, 16 runs, 0 skips
> 1230087 dezicycles in second, 32 runs, 0 skips
> 1227492 dezicycles in second, 64 runs, 0 skips
> 1230488 dezicycles in second, 128 runs, 0 skips
> 1215128 dezicycles in second, 256 runs, 0 skips
> 1207364 dezicycles in second, 512 runs, 0 skips
> 1199813 dezicycles in second, 1024 runs, 0 skips
> 1195857 dezicycles in second, 2048 runs, 0 skips
> 1193954 dezicycles in second, 4096 runs, 0 skips
> 1194128 dezicycles in second, 8192 runs, 0 skips
> 1187481 dezicycles in second, 16384 runs, 0 skips
> 1181874 dezicycles in second, 32768 runs, 0 skips
> 
> * Test with alpha in the main input:
> 28684935 dezicycles in first, 2 runs, 0 skips
> 28553902 dezicycles in first, 4 runs, 0 skips
> 28776015 dezicycles in first, 8 runs, 0 skips
> 29073680 dezicycles in first, 16 runs, 0 skips
> 28816918 dezicycles in first, 32 runs, 0 skips
> 28908704 dezicycles in first, 64 runs, 0 skips
> 28745401 dezicycles in first, 128 runs, 0 skips
> 28614980 dezicycles in first, 256 runs, 0 skips
> 28609710 dezicycles in first, 512 runs, 0 skips
> 28537037 dezicycles in first, 1024 runs, 0 skips
> 28517850 dezicycles in first, 2048 runs, 0 skips
> 28466515 dezicycles in first, 4096 runs, 0 skips
> 28438388 dezicycles in first, 8192 runs, 0 skips
> 28440383 dezicycles in first, 16384 runs, 0 skips
> 28426314 dezicycles in first, 32768 runs, 0 skips
> 
> 33347880 dezicycles in second, 2 runs, 0 skips
> 33131272 dezicycles in second, 4 runs, 0 skips
> 38018970 dezicycles in second, 8 runs, 0 skips
> 48715928 dezicycles in second, 16 runs, 0 skips
> 44290285 dezicycles in second, 32 runs, 0 skips
> 43696766 dezicycles in second, 64 runs, 0 skips
> 38599173 dezicycles in second, 128 runs, 0 skips
> 36112571 dezicycles in second, 256 runs, 0 skips
> 34737837 dezicycles in second, 512 runs, 0 skips
> 34066213 dezicycles in second, 1024 runs, 0 skips
> 33640178 dezicycles in second, 2048 runs, 0 skips
> 33368757 dezicycles in second, 4096 runs, 0 skips
> 33233522 dezicycles in second, 8192 runs, 0 skips
> 33132908 dezicycles in second, 16384 runs, 0 skips
> 33062949 dezicycles in second, 32768 runs, 0 skips
> 
> Results are as expected: alpha pre-multiplication is significantly
> slower, but it may also be what the user wants, so I could make it
> optional (preserving the original alpha? enabled by default?).

That's not what I meant.

The original code looked like this:
> -                d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128) >> 8;
> -                d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128) >> 8;
> -                d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128) >> 8;

When I saw what you replaced it with, I was ... scared ;)

if and switch are added in the innermost loop
constants are replaced by variables
variables are replaced by reads out of arrays in structures
a division is added

All this makes the code significantly slower.
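
To make the points above concrete, here is a minimal sketch (not code from
either version of the filter) of a per-row blend that keeps an exact division
by 255 but avoids the other costs: the channel offsets are read once, before
the pixel loop, and the division is replaced by a multiply and a shift. The
names div255, blend_row, d_off and s_off are made up for illustration; the
patch itself indexes over->main_rgba_map and over->overlay_rgba_map per pixel.

```c
#include <stdint.h>

/* Rounded x / 255 for 0 <= x <= 255 * 255, using a multiply and a shift
 * instead of a division. */
static inline unsigned div255(unsigned x)
{
    return ((x + 128) * 257) >> 16;
}

/* Hypothetical helper: blend one row of packed overlay pixels (with alpha)
 * onto one row of packed main pixels (no alpha).
 * d_off = byte offsets of R, G, B in a main pixel,
 * s_off = byte offsets of R, G, B, A in an overlay pixel. */
static void blend_row(uint8_t *d, int d_step, const int d_off[3],
                      const uint8_t *s, int s_step, const int s_off[4],
                      int width)
{
    /* Loop-invariant lookups are done once, outside the pixel loop. */
    const int dr = d_off[0], dg = d_off[1], db = d_off[2];
    const int sr = s_off[0], sg = s_off[1], sb = s_off[2], sa = s_off[3];

    for (int j = 0; j < width; j++) {
        const unsigned a = s[sa];

        d[dr] = div255(d[dr] * (255 - a) + s[sr] * a);
        d[dg] = div255(d[dg] * (255 - a) + s[sg] * a);
        d[db] = div255(d[db] * (255 - a) + s[sb] * a);

        d += d_step;
        s += s_step;
    }
}
```

With exact rounding, alpha 0 leaves the destination pixel unchanged and alpha
0xff copies the overlay pixel, so those cases need no switch in the loop.
Whether this is actually faster than the branching version would still have to
be measured with START/STOP_TIMER.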

Can you explain what equation you are trying to implement?
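
For reference, the comments in the quoted patch state the formula as
alpha = alpha_overlay / (alpha_main + alpha_overlay - alpha_main * alpha_overlay),
which matches the blend factor of the usual "over" operation when both layers
carry straight (non-premultiplied) alpha. Below is a small sketch of it in
8-bit integers, following the scaled form given in the patch comment. The
function names are hypothetical, the zero-denominator guard is an addition of
this sketch (the patch skips alpha == 0 before dividing), and the rounding in
result_alpha is one possible choice rather than what the patch does.

```c
#include <stdint.h>

/* Blend factor applied to the overlay colour, scaled to 0..255:
 *   255 * 255 * ao / (255 * (ao + am) - ao * am)
 * where ao = overlay alpha and am = main alpha, both in 0..255. */
static unsigned straight_blend_factor(unsigned ao, unsigned am)
{
    const unsigned den = 255 * (ao + am) - ao * am;

    if (den == 0)   /* only happens when both alphas are 0 */
        return 0;
    return (255 * 255 * ao) / den;
}

/* Alpha of the composited pixel: ao + am - ao * am / 255, with the product
 * term rounded to nearest. */
static unsigned result_alpha(unsigned ao, unsigned am)
{
    return ao + am - (ao * am + 127) / 255;
}
```

With an opaque main pixel (am = 255) the factor reduces to ao, which is the
case the original three-line code handled. Evaluated as written, am = 0 gives
a factor of 255 for any non-zero ao, not 0 as one of the patch comments says.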


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored answer #1: FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.