[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Mark Himsley mark at mdsh.com
Sat Oct 29 17:33:59 CEST 2011

On 29/10/2011 03:10, Michael Niedermayer wrote:
> On Sat, Oct 29, 2011 at 12:56:15AM +0200, Stefano Sabatini wrote:
>> On date Thursday 2011-10-27 01:01:40 +0200, Michael Niedermayer encoded:


>> the original code looked like this:
>>> >  -                d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128) >> 8;
>>> >  -                d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128) >> 8;
>>> >  -                d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128) >> 8;
>> When I saw what you replaced it with, I was ... scared ;)
>> if and switch statements are added in the innermost loop
>> constants are replaced by variables
>> variables are replaced by reads from arrays in structures
>> a division is added
>> all of this makes the code significantly slower

That is not correct.

Please correct me if I am wrong, but the code you quoted cannot currently
be executed: at present the overlay filter only outputs
PIX_FMT_YUV420P, and the section you quoted can only be executed if the
destination filter has negotiated PIX_FMT_BGR24 or PIX_FMT_RGB24.

Furthermore, I believe my implementation is significantly faster than the
previous (unused) one.

One example of a speed improvement is the switch statement. Whereas the
previous implementation always multiplied every pixel, my implementation
performs no multiplication at all when the key channel is 0 or 255. For
many real-world use-cases, such as keying a bug (station logo) over a
video, most pixels are either fully transparent or fully opaque, so this
is a large benefit, speeding such use-cases up by 15% or more.
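A minimal C sketch of the fast path described above, assuming packed 8-bit RGBA source pixels and a destination with the same channel order (3 bytes per pixel for RGB24, 4 for RGBA). The function name `blend_row` and its `dst_step` parameter are hypothetical, and the blend formula is taken from the quoted original code rather than from the patch itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: blend one row of packed RGBA source pixels onto a
 * packed RGB(A) destination with the same channel order, using the 8-bit
 * formula from the quoted code:
 *     d = (d * (255 - a) + s * a + 128) >> 8
 * Fully transparent (a == 0) and fully opaque (a == 255) source pixels
 * skip the multiplications entirely. */
static void blend_row(uint8_t *dst, const uint8_t *src, size_t width,
                      int dst_step)
{
    for (size_t x = 0; x < width; x++, dst += dst_step, src += 4) {
        const unsigned a = src[3];
        switch (a) {
        case 0:
            /* Transparent: leave the destination untouched. */
            break;
        case 255:
            /* Opaque: plain copy of the three colour channels. */
            dst[0] = src[0];
            dst[1] = src[1];
            dst[2] = src[2];
            break;
        default:
            /* Partial coverage: weighted blend, >> 8 approximating / 255. */
            dst[0] = (dst[0] * (255 - a) + src[0] * a + 128) >> 8;
            dst[1] = (dst[1] * (255 - a) + src[1] * a + 128) >> 8;
            dst[2] = (dst[2] * (255 - a) + src[2] * a + 128) >> 8;
            break;
        }
    }
}
```

Besides saving work, the two early-outs are exact: with the `>> 8` approximation of division by 255, an alpha of 0 would turn a destination value of 255 into 254, and an alpha of 255 would turn a source value of 255 into 254.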

Of course, if further optimisations can be applied, that's great, but
since the RGB workflow is not currently used, I hope you can accept
additional functionality even if it is not 100% optimised.


> Can you explain what equation you are trying to implement?

Two things: an RGBA workflow through the overlay filter, and a
non-premultiplied overlay.
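For reference, the textbook Porter-Duff "over" operator for non-premultiplied (straight) alpha is shown below, with a foreground of colour C_f and alpha α_f composited over a background of colour C_b and alpha α_b, all normalised to [0, 1]. This is the standard formulation, offered as an assumption about what a non-premultiplied overlay computes rather than a transcription of the patch; the normalisation by α_o is where a division enters.

```latex
\alpha_o = \alpha_f + \alpha_b\,(1 - \alpha_f)
\qquad
C_o = \frac{C_f\,\alpha_f + C_b\,\alpha_b\,(1 - \alpha_f)}{\alpha_o}
\quad (\alpha_o > 0)
```

Two special cases tie this to the thread: with an opaque background (α_b = 1) it reduces to C_o = C_f α_f + C_b (1 - α_f), the blend the quoted original code computes with `>> 8` standing in for division by 255; with α_f = 0 it gives α_o = α_b and C_o = C_b, so the background fill and key pass through unchanged.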

As the original proposer of the patch, I would like to explain a use case:

In news organisations it is a requirement that name-captions are produced
in a standard format. I will call the completed name-caption a
"lower-third" graphic.

Lower-thirds usually consist of text rendered in a specific font with
specific positioning and kerning, keyed (overlaid) over a moving video
background (which itself will have a key channel).

News video editors can be short of time, so preparing lower-thirds in
advance that can simply be used in video editing software is a vital part of

In my example, a non-technical journalist could enter the name and
designation of a contributor into an application. The application could 
render out a video file that is the complete lower-third. The video 
editor could then import that video file into their editing software and 
"drop" it on top of the video of the contributor, adding dissolves to 
make the lower-third fade on and off at the appropriate times.

Imagine that the lower-third's video background is pure red but with a
50% alpha. I want to key text over that and output a file with fill and 
key channels.

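Where there is no text, the keyer MUST output the video background and
key EXACTLY as they were in the source background video: the fill stays
pure red at full value, and the 50% exists only in the key channel. This
is what is meant by non-premultiplied; a premultiplied output would
instead carry the fill already scaled by the key.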
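A minimal per-pixel C sketch of this non-premultiplied behaviour, assuming 8-bit packed RGBA for both layers and the standard straight-alpha "over" operator. The function name `over_straight_rgba` is hypothetical, and this is an illustration of the technique, not the code from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: composite a straight-alpha (non-premultiplied) RGBA
 * foreground pixel `fg` over a straight-alpha RGBA background pixel `bg`.
 * Colour channels are 0..2, the key (alpha) is channel 3. */
static void over_straight_rgba(uint8_t out[4], const uint8_t fg[4],
                               const uint8_t bg[4])
{
    const unsigned af = fg[3], ab = bg[3];

    if (af == 0) {
        /* No text here: background fill and key pass through exactly. */
        for (int i = 0; i < 4; i++)
            out[i] = bg[i];
        return;
    }
    if (af == 255) {
        /* Opaque foreground replaces the background. */
        for (int i = 0; i < 4; i++)
            out[i] = fg[i];
        return;
    }

    /* Background weight alpha_b * (1 - alpha_f), scaled by 255 * 255. */
    const unsigned ab_w = ab * (255 - af);
    /* Output alpha alpha_o, scaled by 255 * 255; never 0 here since af > 0. */
    const unsigned ao = af * 255 + ab_w;

    out[3] = (ao + 127) / 255;                /* rounded alpha_o * 255 */
    for (int i = 0; i < 3; i++) {
        /* C_o = (C_f * alpha_f + C_b * alpha_b * (1 - alpha_f)) / alpha_o,
         * rounded to nearest. */
        const unsigned num = fg[i] * af * 255 + bg[i] * ab_w;
        out[i] = (num + ao / 2) / ao;
    }
}
```

The `af == 0` branch is what guarantees the "EXACTLY as it was" requirement without any rounding, and it also avoids a 0/0 division where both the foreground and background keys are zero.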

That is what I implemented.

In the future, I plan to propose further patches to the overlay filter.
The overlay filter needs to accept more pixel formats, the most important
to me being yuv422 and yuva422. The YUV overlay section also needs to be
able to output a non-premultiplied fill and a calculated key channel.

