[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Michael Niedermayer michaelni at gmx.at
Sat Oct 29 17:50:37 CEST 2011


On Sat, Oct 29, 2011 at 04:33:59PM +0100, Mark Himsley wrote:
> On 29/10/2011 03:10, Michael Niedermayer wrote:
> >On Sat, Oct 29, 2011 at 12:56:15AM +0200, Stefano Sabatini wrote:
> >>On date Thursday 2011-10-27 01:01:40 +0200, Michael Niedermayer encoded:
> 
> [...]
> 
> >>the original code looked like this:
> >>>>  -                d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128) >> 8;
> >>>>  -                d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128) >> 8;
> >>>>  -                d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128) >> 8;
> >>when i saw what you replaced it by i was ... scared ;)
> >>
> >>if and switch are added in the innermost loop
> >>constants are replaced by variables
> >>variables are replaced by reading out of arrays from structures
> >>a division is added
> >>
> >>all this makes the code significantly slower
> 
> That is not correct.
> 
> Please correct me if I am wrong, but the code you quoted cannot
> currently be executed, because the overlay filter currently only
> outputs PIX_FMT_YUV420P, and the section you quoted can only be
> executed if the destination filter has negotiated PIX_FMT_BGR24 ||
> PIX_FMT_RGB24.
> 
> Further, I believe I added significant speed increases compared to
> the previous (unused) implementation.
> 
> An example of a speed improvement is the switch statement. Whereas
> the previous implementation always multiplied every pixel, in my
> implementation no multiplication happens if the key channel is zero
> or 255. For many real-world use cases, such as keying a bug over a
> video, this is of large benefit, speeding up such use cases by 15%
> or more.

If you have large areas of 0 or 255, it will be faster to detect them
in larger blocks, e.g. by checking aligned 32- or 64-bit words for
being all 255 or all 0.
This also makes the code more friendly to SIMD optimization, which
alone can make the code 4+ times faster.
Also, making sure the width/height of the overlay is minimal should
help.
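
To illustrate the block-wise check, here is a minimal sketch, not code
from the patch. It assumes packed RGBA for both source and destination
(alpha at byte offset 3 of each pixel), so one 64-bit word covers two
pixels; blend_pixel() and blend_row() are hypothetical names. The words
are loaded with memcpy() so the sketch does not depend on alignment or
endianness.

```c
#include <stdint.h>
#include <string.h>

/* Blend one packed RGBA source pixel onto one packed RGBA destination
 * pixel with the formula quoted earlier in this thread. The destination
 * alpha byte is left untouched. */
static void blend_pixel(uint8_t *d, const uint8_t *s)
{
    unsigned a = s[3];

    d[0] = (d[0] * (0xff - a) + s[0] * a + 128) >> 8;
    d[1] = (d[1] * (0xff - a) + s[1] * a + 128) >> 8;
    d[2] = (d[2] * (0xff - a) + s[2] * a + 128) >> 8;
}

/* Blend a row of `width` pixels, testing the alpha bytes of two pixels
 * at once through a single 64-bit word. */
static void blend_row(uint8_t *dst, const uint8_t *src, int width)
{
    /* 0xff at the alpha position of each of the two pixels in a word. */
    static const uint8_t alpha_bytes[8] = { 0, 0, 0, 0xff, 0, 0, 0, 0xff };
    uint64_t mask, w;
    int x = 0;

    memcpy(&mask, alpha_bytes, 8);

    for (; x + 2 <= width; x += 2) {
        memcpy(&w, src + 4 * x, 8);
        w &= mask;

        if (w == 0)                 /* both pixels fully transparent */
            continue;
        if (w == mask) {            /* both pixels fully opaque: plain copy */
            memcpy(dst + 4 * x, src + 4 * x, 8);
            continue;
        }
        /* Mixed alpha: fall back to the per-pixel formula. */
        blend_pixel(dst + 4 * x,     src + 4 * x);
        blend_pixel(dst + 4 * x + 4, src + 4 * x + 4);
    }
    if (x < width)                  /* odd width: one trailing pixel */
        blend_pixel(dst + 4 * x, src + 4 * x);
}
```

Note that the two fast paths are exact, while the ">> 8" formula can be
off by one at the extremes (for example, it maps 255 to 254 when the
alpha is 0 or 255), so the fast and slow paths are not bit-identical.
The opaque path also copies the source alpha byte into the destination.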


> 
> Of course, if further optimisations can be applied that's great, but
> since the RGB workflow is not used currently I hope you can accept
> additional functionality even if it is not 100% optimised.

There's a misunderstanding here somewhere. I am very happy to accept
the new functionality; I am unhappy about the included optimization,
because if I want to optimize this further I first have to reverse
engineer and undo this optimization.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator