[FFmpeg-devel] [PATCH 2/2] Add hflip filter.

Stefano Sabatini stefano.sabatini-lala
Mon Aug 16 21:57:44 CEST 2010


On date Monday 2010-08-16 09:07:42 -0400, Ronald S. Bultje encoded:
> Hi,
> 
> On Thu, Aug 12, 2010 at 2:35 PM, Stefano Sabatini
> <stefano.sabatini-lala at poste.it> wrote:
> > On date Thursday 2010-08-12 12:49:25 -0400, Ronald S. Bultje encoded:
[...]
> >> > I didn't manage to understand how bswap/dsputils may be used, I don't
> >> > know if that would improve it.
> >>
> >> You could create a VideoFilterDSPContext (or a
> >> HFlipVideoFilterDSPContext), add a function hflip to it, and then any
> >> one of us could optimize it. E.g. for RGBA32, where step is probably
> >> 4, we would read it as 8/16-bytes-at-once, flip them using e.g. pshufw
> >> or something, (do the same for the opposite pixels at the end of the
> >> row, ) and then write them out again -> you just did 2x 2/4 pixels at
> >> once. By using multiple registries and making sure there's enough
> >> padding (which I think is always the case), this'd get even faster,
> >> also because for at least the left read/write, we can use aligned r/w
> >> which is faster.
> >>
> >> Not sure if that's what Michael meant, but I guess it's sort of in the
> >> right direction.
> >
> > OK I see thanks, I suggest anyway to commit this simple variant, and
> > then work on the optimizations.
> [..]
> > +            case 3:
> > +            {
> > +                uint8_t *in  =  inrow;
> > +                uint8_t *out = outrow;
> > +                for (j = 0; j < (inlink->w >> hsub); j++, out += 3, in -= 3) {
> > +                    out[0] = in[0];
> > +                    out[1] = in[1];
> > +                    out[2] = in[2];
> > +                }
> > +            }
> > +            break;
> 
> You can use a uint16+t + uint8_t write here instead of 3 uint8_t writes.

I followed Mans' hint, using AV_RB24 now.
 
> [..]
> +            default:
> +                for (j = 0; j < (inlink->w >> hsub); j++)
> +                    memcpy(outrow + j*step, inrow - j*step, step);
> +            }
> 
> Do we have pixelformats like this?

No, but this way looks more robust (I plan to change the hardcoded
list of supported pixfmts with some code to add all the possibly
supported formats, and I don't want to fail in case of addition of new
pixel formats).

> Can you write an implementation that uses a draw_slice[5] vfunc array,
> which is easier to modify into a HFlipVFilterBlaDSPContext, which you
> address like func[FFMIN(step, 5)-1](..), so that optimized functions
> can be written easier?

My problem with dsputil is that:
* it's defined in lavc, while I'm trying hard at keeping lavfi as
independant as possible from lavc.
* I really don't know it, and now I prefer to focus on other things.

Both problems can be addressed (moving dsputil to lavcore and studying
it), but for the moment I'd prefer just get the thing committed and
only hereafter work on optimizations.

Updated with the addition of a pixfmts test.

Regards.
-- 
FFmpeg = Friendly and Fancy Mere Programmable Ecletic Gadget



More information about the ffmpeg-devel mailing list