[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Michael Niedermayer michaelni at gmx.at
Sun Oct 30 22:47:31 CET 2011


On Sun, Oct 30, 2011 at 10:34:41PM +0100, Stefano Sabatini wrote:
> On date Sunday 2011-10-30 14:42:38 +0100, Michael Niedermayer encoded:
> > On Sat, Oct 29, 2011 at 04:47:41PM +0200, Stefano Sabatini wrote:
> [...]
> > > +                switch (alpha) {
> > > +                case 0:
> > > +                    break;
> > > +                case 255:
> > > +                    d[dr] = s[sr];
> > > +                    d[dg] = s[sg];
> > > +                    d[db] = s[sb];
> > > +                    break;
> > > +                default:
> > > +                    // main_value = main_value * (1 - alpha) + overlay_value * alpha
> > 
> > > +                    // apply a fast approximation: X/255 ~ (X+128)/256
> > 
> > please use +128*257>>16 (which is exact)
> 
> Uhm I suppose you meant:
> ((X * 257) + 257)>> 16

I think we want round to nearest, which is
(x+127)/255
or
((x+127)*257 + 257)>>16

This can be simplified to
((x+128)*257)>>16

(all of the above untested!)
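
Since the input range is small, the simplification above can be checked exhaustively. What follows is a minimal sketch, not code from the patch: the function names are made up for illustration, and it assumes x is the sum of two 8-bit products, so 0 <= x <= 255*255.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: round-to-nearest x/255 using a real division. */
static uint32_t div255_ref(uint32_t x)
{
    return (x + 127u) / 255u;
}

/* Multiply-and-shift form: ((x+128)*257)>>16.
 * Hypothetical helper; only meant for x in [0, 255*255]. */
static uint32_t div255_mul(uint32_t x)
{
    assert(x <= 255u * 255u);
    return ((x + 128u) * 257u) >> 16;
}

/* Shift-only approximation from the patch comment: (x+128)/256. */
static uint32_t div255_shift(uint32_t x)
{
    return (x + 128u) >> 8;
}

/* Count the inputs in [0, 255*255] where f disagrees with the reference. */
static unsigned count_mismatches(uint32_t (*f)(uint32_t))
{
    unsigned n = 0;
    for (uint32_t x = 0; x <= 255u * 255u; x++)
        n += f(x) != div255_ref(x);
    return n;
}
```

Calling count_mismatches(div255_mul) should report no disagreement over that range, while the shift-only form differs for some inputs; for example x = 255*255 yields 254 rather than 255.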


> 
> For the interested reader:
> research.swtch.com/2008/01/division-via-multiplication.html
> (or read TAOCP if you want the long version ;-)).
> 
> Then I tested with the plain version:
> 22001580 dezicycles in first, 2 runs, 0 skips
> 22377187 dezicycles in first, 4 runs, 0 skips
> 22358670 dezicycles in first, 8 runs, 0 skips
> 22430178 dezicycles in first, 16 runs, 0 skips
> 27048690 dezicycles in first, 32 runs, 0 skips
> 24722512 dezicycles in first, 64 runs, 0 skips
> 23467227 dezicycles in first, 128 runs, 0 skips
> 22707239 dezicycles in first, 256 runs, 0 skips
> 22325824 dezicycles in first, 512 runs, 0 skips
> 22106139 dezicycles in first, 1024 runs, 0 skips
> 22007162 dezicycles in first, 2048 runs, 0 skips
> 21959926 dezicycles in first, 4096 runs, 0 skips
> 21978105 dezicycles in first, 8192 runs, 0 skips
> 21927611 dezicycles in first, 16384 runs, 0 skips
> 21889967 dezicycles in first, 32768 runs, 0 skips
> 
> With the optimized variant:
> 20987625 dezicycles in first, 2 runs, 0 skips
> 20781405 dezicycles in first, 4 runs, 0 skips
> 20581886 dezicycles in first, 8 runs, 0 skips
> 20787228 dezicycles in first, 16 runs, 0 skips
> 21084062 dezicycles in first, 32 runs, 0 skips
> 21028600 dezicycles in first, 64 runs, 0 skips
> 20786884 dezicycles in first, 128 runs, 0 skips
> 20671322 dezicycles in first, 256 runs, 0 skips
> 20563223 dezicycles in first, 512 runs, 0 skips
> 20527375 dezicycles in first, 1024 runs, 0 skips
> 20481658 dezicycles in first, 2048 runs, 0 skips
> 20452863 dezicycles in first, 4096 runs, 0 skips
> 20535609 dezicycles in first, 8192 runs, 0 skips
> 20503526 dezicycles in first, 16384 runs, 0 skips
> 20465800 dezicycles in first, 32768 runs, 0 skips
> 

> But I confess that I always build ffmpeg with optimizations disabled

you really should not when doing optimizations


> (for easing debugging) and I suppose that most decent compilers
> will know all about these numerical tricks, so I'm not sure if
> these hand-crafted optimizations are worth the code obfuscation.

gcc has to prove that x*257 won't overflow, that x is within the range
where the transformation is valid, and that x is not negative,
so I wouldn't bet that it can reliably do this on its own.
A human will often just know something isn't negative while a compiler
might not be able to prove it. In this case it might work out;
I haven't checked what gcc generates for the divide.
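
One way to make the range argument explicit in the source is to compute the blend in unsigned arithmetic from 8-bit inputs, so that non-negativity and the upper bound follow from the types. This is only a sketch under that assumption: blend_channel is a hypothetical helper, not the code from the patch, and it says nothing about what any particular compiler generates.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel:
 *   main_value * (255 - alpha) / 255 + overlay_value * alpha / 255,
 * rounded to nearest. Hypothetical helper for illustration only. */
static uint8_t blend_channel(uint8_t main_value, uint8_t overlay_value,
                             uint8_t alpha)
{
    /* Both products are at most 255*255 and the weights sum to 255,
     * so x lies in [0, 255*255] and (x + 128) * 257 fits in 32 bits. */
    uint32_t x = (uint32_t)main_value * (255u - alpha)
               + (uint32_t)overlay_value * alpha;

    assert(x <= 255u * 255u);
    return (uint8_t)(((x + 128u) * 257u) >> 16);
}
```

With alpha 0 this returns main_value and with alpha 255 it returns overlay_value, so the special cases in the switch become a matter of speed rather than correctness.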

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No snowflake in an avalanche ever feels responsible. -- Voltaire