[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Mon Oct 31 00:06:35 CET 2011

On date Sunday 2011-10-30 22:47:31 +0100, Michael Niedermayer encoded:
> On Sun, Oct 30, 2011 at 10:34:41PM +0100, Stefano Sabatini wrote:
> > On date Sunday 2011-10-30 14:42:38 +0100, Michael Niedermayer encoded:
> > > On Sat, Oct 29, 2011 at 04:47:41PM +0200, Stefano Sabatini wrote:
> > [...]
> > > > +                switch (alpha) {
> > > > +                case 0:
> > > > +                    break;
> > > > +                case 255:
> > > > +                    d[dr] = s[sr];
> > > > +                    d[dg] = s[sg];
> > > > +                    d[db] = s[sb];
> > > > +                    break;
> > > > +                default:
> > > > +                    // main_value = main_value * (1 - alpha) + overlay_value * alpha
> > > 
> > > > +                    // apply a fast approximation: X/255 ~ (X+128)/256
> > > 
> > > please use +128*257>>16 (which is exact)
> > 
> > Uhm I suppose you meant:
> > ((X * 257) + 257)>> 16
> 
> i think we want round to nearest which is
> (x+127)/255
> or
> ((x+127)*257 + 257)>>16
> 
> this can be simplified to
> ((x+128)*257)>>16
> 
> (above all untested!)
> 
> 
> > 
> > For the interested reader:
> > research.swtch.com/2008/01/division-via-multiplication.html
> > (or read TAOCP if you want the long version ;-)).
> > 
> > Then I tested with the plain version:
> > 22001580 dezicycles in first, 2 runs, 0 skips
> > 22377187 dezicycles in first, 4 runs, 0 skips
> > 22358670 dezicycles in first, 8 runs, 0 skips
> > 22430178 dezicycles in first, 16 runs, 0 skips
> > 27048690 dezicycles in first, 32 runs, 0 skips
> > 24722512 dezicycles in first, 64 runs, 0 skips
> > 23467227 dezicycles in first, 128 runs, 0 skips
> > 22707239 dezicycles in first, 256 runs, 0 skips
> > 22325824 dezicycles in first, 512 runs, 0 skips
> > 22106139 dezicycles in first, 1024 runs, 0 skips
> > 22007162 dezicycles in first, 2048 runs, 0 skips
> > 21959926 dezicycles in first, 4096 runs, 0 skips
> > 21978105 dezicycles in first, 8192 runs, 0 skips
> > 21927611 dezicycles in first, 16384 runs, 0 skips
> > 21889967 dezicycles in first, 32768 runs, 0 skips
> > 
> > With the optimized variant:
> > 20987625 dezicycles in first, 2 runs, 0 skips
> > 20781405 dezicycles in first, 4 runs, 0 skips
> > 20581886 dezicycles in first, 8 runs, 0 skips
> > 20787228 dezicycles in first, 16 runs, 0 skips
> > 21084062 dezicycles in first, 32 runs, 0 skips
> > 21028600 dezicycles in first, 64 runs, 0 skips
> > 20786884 dezicycles in first, 128 runs, 0 skips
> > 20671322 dezicycles in first, 256 runs, 0 skips
> > 20563223 dezicycles in first, 512 runs, 0 skips
> > 20527375 dezicycles in first, 1024 runs, 0 skips
> > 20481658 dezicycles in first, 2048 runs, 0 skips
> > 20452863 dezicycles in first, 4096 runs, 0 skips
> > 20535609 dezicycles in first, 8192 runs, 0 skips
> > 20503526 dezicycles in first, 16384 runs, 0 skips
> > 20465800 dezicycles in first, 32768 runs, 0 skips
> > 
> 
> > But I confess that I always build ffmpeg with optimizations disabled
> 
> you really should not when doing optimizations
> 
> 
> > (to ease debugging) and I suppose that most decent compilers
> > will know all about these numerical tricks, so I'm not sure if
> > these hand-crafted optimizations are worth the code obfuscation.
> 
> gcc has to prove that x*257 won't overflow, that x is within the range
> where the trick is valid, and that x is not negative,
> so i wouldn't bet that it can reliably do this on its own.
> A human will often just know something isn't negative while a compiler
> might just not be able to prove it. In this case it might work out;
> i haven't checked what gcc creates out of the divide
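
Since the formulas quoted above were posted as untested, here is a minimal
sketch of an exhaustive check over the range that matters for 8-bit
blending, 0 <= x <= 255*255. The helper names are hypothetical and are not
taken from the patch; the code assumes that unsigned is at least 32 bits
wide, so that (x+128)*257 cannot overflow.

```c
#include <assert.h>

/* Hypothetical helper names, for illustration only (not from the patch). */

/* The approximation from the patch comment: X/255 ~ (X+128)/256. */
static unsigned div255_approx(unsigned x) { return (x + 128) >> 8; }

/* Truncating variant: ((X * 257) + 257) >> 16. */
static unsigned div255_floor(unsigned x) { return (x * 257 + 257) >> 16; }

/* Round-to-nearest variant: ((x+128)*257) >> 16. */
static unsigned div255_round(unsigned x) { return ((x + 128) * 257) >> 16; }

/* Count the inputs in 0..255*255 where f(x) differs from (x + bias) / 255.
 * bias = 0 gives truncating division, bias = 127 gives round to nearest
 * (255 is odd, so there are no ties to worry about). */
static unsigned count_mismatches(unsigned (*f)(unsigned), unsigned bias)
{
    unsigned x, bad = 0;

    for (x = 0; x <= 255 * 255; x++)
        if (f(x) != (x + bias) / 255)
            bad++;
    return bad;
}

/* Self-check: both multiply-and-shift variants match the reference over
 * the whole range, while the (X+128)/256 approximation does not. */
static void check_div255(void)
{
    assert(count_mismatches(div255_floor, 0) == 0);
    assert(count_mismatches(div255_round, 127) == 0);
    assert(count_mismatches(div255_approx, 127) != 0);
}
```

Calling check_div255() from a small test program is enough to exercise all
three variants; the check is only meaningful for inputs up to 255*255.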

Makes sense, thanks for sharing your insight.

Patches updated; I used a macro for the fast division by 255, which
should improve readability.
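
For readers without the attachments, the following is a rough sketch of
what such a macro and its use in the blending path could look like. It is
not copied from the patches: the macro name DIV255_ROUND, the helper
functions, and the assumption that both buffers hold R, G, B at offsets
0..2 are all made up for illustration (the quoted code uses separate index
variables for source and destination).

```c
#include <stdint.h>

/* Hypothetical macro name. Rounds x/255 to the nearest integer;
 * only valid for 0 <= x <= 255*255. */
#define DIV255_ROUND(x) ((((x) + 128) * 257) >> 16)

/* Blend one 8-bit component:
 * main_value = main_value * (1 - alpha) + overlay_value * alpha,
 * with alpha scaled to 0..255 and a single rounded division at the end. */
static uint8_t blend_component(uint8_t dst, uint8_t src, uint8_t alpha)
{
    /* At most 255*(255 - alpha) + 255*alpha = 255*255, so the macro applies. */
    unsigned v = dst * (255 - alpha) + src * alpha;

    return (uint8_t)DIV255_ROUND(v);
}

/* Blend three components of one pixel, with the same fast paths for
 * fully transparent and fully opaque overlay pixels as the quoted switch.
 * Assumes the same component order in d and s (a simplification). */
static void blend_rgb(uint8_t *d, const uint8_t *s, uint8_t alpha)
{
    int i;

    switch (alpha) {
    case 0:                 /* overlay invisible: keep the main pixel */
        break;
    case 255:               /* overlay opaque: plain copy */
        for (i = 0; i < 3; i++)
            d[i] = s[i];
        break;
    default:
        for (i = 0; i < 3; i++)
            d[i] = blend_component(d[i], s[i], alpha);
        break;
    }
}
```

Summing both products before dividing keeps the intermediate value within
0..255*255 and rounds only once per component.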
-- 
FFmpeg = Free & Funny Minimal Purposeless Elfic Game
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-vf_overlay-enable-RGB-path.patch
Type: text/x-diff
Size: 10109 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111031/0c5c0cec/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-vf_overlay-add-support-to-alpha-pre-multiplication-i.patch
Type: text/x-diff
Size: 3270 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111031/0c5c0cec/attachment-0001.bin>