[FFmpeg-devel] [PATCH] yadif port to libavfitler

Michael Niedermayer michaelni
Mon May 24 12:49:50 CEST 2010


On Sun, May 23, 2010 at 09:20:29PM -0400, Alexander Strange wrote:
> 
> On May 23, 2010, at 9:03 PM, Baptiste Coudurier wrote:
> 
> > On 5/23/10 4:22 AM, Michael Niedermayer wrote:
> >> On Sun, May 23, 2010 at 04:10:57AM -0700, Baptiste Coudurier wrote:
> >>> Hi Michael,
> >>> 
> >>> On 5/23/10 3:53 AM, Michael Niedermayer wrote:
> >>>> On Thu, May 20, 2010 at 07:52:31PM -0700, Baptiste Coudurier wrote:
> >>>>> On 05/20/2010 01:37 AM, Baptiste Coudurier wrote:
> >>>>>> Hi guys,
> >>>>>> 
> >>>>>> Here is my first attempt to port yadif to libavfilter.
> >>>>>> 
> >>>>>> I chose the request_frame paradigm since the filter has delay.
> >>>>>> What happens when no prev frame is available ? Original code seems to
> >>>>>> read from
> >>>> 
> >>>> read from?
> >>>> (it obviously should use spatial/directional interpolation for the very
> >>>>   first frame)
> >>>> 
> >>>> 
> >>>>>> 
> >>>>>> I'd like extensive review if possible and especially instructions on how
> >>>>>> to retrieve cpu flags at runtime for MMX.
> >>>> 
> >>>> hmm, i guess we need to move libavcodec/x86/cpuid.c to libavutil
> >>>> 
> >>>> 
> >>>>>> 
> >>>>>> Also Michael, can you please check the calls to filter() ?
> >>>> 
> >>>> what do you want me to check exactly?
> >>> 
> >>> The parameters in the filter() call. I'm not sure I fully understand the
> >>> logic width fields and parity. Should it be 1 ^ tff if we output only
> >>> frames for now ?
> >> 
> >> parity selects if we interpolate even or odd lines iirc so that should stay
> >> (that is from memory havnt rechecked the code)
> >> 
> >> 
> >>> 
> >>>>>> I'm also not sure how the field output mode should be supported.
> >>>> 
> >>>> for every 2nd request_frame() we would just call our sources
> >>>> request_frame()
> >>>> (the variant with calling 2 start/slice/end in one request_frame() might
> >>>> work
> >>>>   too but i thinks its less ideal)
> >>>> 
> >>>> 
> >>>>> 
> >>>>> Patch updated, offseting the buffers as needed (please double check), and
> >>>>> memleak free :>
> >>>>> Questions still stands :)
> >>>>> The filter works without copying any input picture which is nice.
> >>>> 
> >>>> does md5 match to mplayers ?
> >>> 
> >>> Is there a md5 in ffmpeg available already ? Otherwise I can cook something
> >>> up.
> >> 
> >> hmm, seems not, i am just spotting crcenc.c which uses adler32 and our
> >> md5 utility code
> > 
> > Thanks to Reimar we have one now, framemd5 should be applied asap, it's very useful when porting filters.
> > 
> >>> 
> >>>> [...]
> >>>>> +static void filter_line_mmx2(AVFilterContext *ctx, uint8_t *dst,
> >>>>> +                             uint8_t *prev, uint8_t *cur, uint8_t *next,
> >>>>> +                             int w, int refs, int parity)
> >>>>> +{
> >>>>> +    YADIFContext *yadif = ctx->priv;
> >>>>> +    static const uint64_t pw_1 = 0x0001000100010001ULL;
> >>>>> +    static const uint64_t pb_1 = 0x0101010101010101ULL;
> >>>>> +    const int mode = yadif->mode;
> >>>>> +    uint64_t tmp0, tmp1, tmp2, tmp3;
> >>>>> +    int x;
> >>>>> +
> >>>>> +#define FILTER\
> >>>>> +    for(x=0; x<w; x+=4){\
> >>>>> +        __asm__ volatile(\
> >>>>> +            "pxor      %%mm7, %%mm7 \n\t"\
> >>>>> +            LOAD4("(%[cur],%[mrefs])", %%mm0) /* c = cur[x-refs] */\
> >>>>> +            LOAD4("(%[cur],%[prefs])", %%mm1) /* e = cur[x+refs] */\
> >>>>> +            LOAD4("(%["prev2"])", %%mm2) /* prev2[x] */\
> >>>>> +            LOAD4("(%["next2"])", %%mm3) /* next2[x] */\
> >>>>> +            "movq      %%mm3, %%mm4 \n\t"\
> >>>>> +            "paddw     %%mm2, %%mm3 \n\t"\
> >>>>> +            "psraw     $1,    %%mm3 \n\t" /* d = (prev2[x] +
> >>>>> next2[x])>>1 */\
> >>>>> +            "movq      %%mm0, %[tmp0] \n\t" /* c */\
> >>>>> +            "movq      %%mm3, %[tmp1] \n\t" /* d */\
> >>>>> +            "movq      %%mm1, %[tmp2] \n\t" /* e */\
> >>>>> +            "psubw     %%mm4, %%mm2 \n\t"\
> >>>>> +            PABS(      %%mm4, %%mm2) /* temporal_diff0 */\
> >>>>> +            LOAD4("(%[prev],%[mrefs])", %%mm3) /* prev[x-refs] */\
> >>>>> +            LOAD4("(%[prev],%[prefs])", %%mm4) /* prev[x+refs] */\
> >>>>> +            "psubw     %%mm0, %%mm3 \n\t"\
> >>>>> +            "psubw     %%mm1, %%mm4 \n\t"\
> >>>>> +            PABS(      %%mm5, %%mm3)\
> >>>>> +            PABS(      %%mm5, %%mm4)\
> >>>>> +            "paddw     %%mm4, %%mm3 \n\t" /* temporal_diff1 */\
> >>>>> +            "psrlw     $1,    %%mm2 \n\t"\
> >>>>> +            "psrlw     $1,    %%mm3 \n\t"\
> >>>>> +            "pmaxsw    %%mm3, %%mm2 \n\t"\
> >>>>> +            LOAD4("(%[next],%[mrefs])", %%mm3) /* next[x-refs] */\
> >>>>> +            LOAD4("(%[next],%[prefs])", %%mm4) /* next[x+refs] */\
> >>>>> +            "psubw     %%mm0, %%mm3 \n\t"\
> >>>>> +            "psubw     %%mm1, %%mm4 \n\t"\
> >>>>> +            PABS(      %%mm5, %%mm3)\
> >>>>> +            PABS(      %%mm5, %%mm4)\
> >>>>> +            "paddw     %%mm4, %%mm3 \n\t" /* temporal_diff2 */\
> >>>>> +            "psrlw     $1,    %%mm3 \n\t"\
> >>>>> +            "pmaxsw    %%mm3, %%mm2 \n\t"\
> >>>>> +            "movq      %%mm2, %[tmp3] \n\t" /* diff */\
> >>>>                                   ^^^^^^
> >>>> code like this does not wor with all gcc versions, (it needs a equivalent
> >>>> of
> >>>> the NAMED_ASM_ARGS from mplayer
> >>> 
> >>> How should this be changed ?
> >>> Or can I just enable it for a specific gcc version to get it into svn
> >>> quickly ?
> >> 
> >> depends on if you are afraid of mans
> >> i wouldnt mind but mans will want a proper configure check i think
> >> 
> >> 
> >>> 
> >>>> [...]
> >>>>> +
> >>>>> +static void filter(AVFilterContext *ctx, AVFilterPicRef *dstpic,
> >>>>> +                   int parity, int tff)
> >>>>> +{
> >>>>> +    YADIFContext *yadif = ctx->priv;
> >>>>> +    int y, i;
> >>>>> +
> >>>>> +    for (i = 0; i<   3; i++) {
> >>>>> +        int is_chroma = !!i;
> >>>>> +        int w = dstpic->w>>   is_chroma;
> >>>>> +        int h = dstpic->h>>   is_chroma;
> >>>>> +        int refs = yadif->cur->linesize[i];
> >>>>> +
> >>>>> +        for (y = 0; y<   h; y++) {
> >>>>> +            if ((y ^ parity)&   1) {
> >>>>> +                uint8_t *prev =&yadif->prev->data[i][y*refs];
> >>>>> +                uint8_t *cur  =&yadif->cur ->data[i][y*refs];
> >>>>> +                uint8_t *next =&yadif->next->data[i][y*refs];
> >>>>> +                uint8_t *dst  =&dstpic->data[i][y*dstpic->linesize[i]];
> >>>>> +                yadif->filter_line(ctx, dst, prev, cur, next, w, refs,
> >>>>> parity ^ tff);
> >>>>> +            } else {
> >>>>> +                memcpy(&dstpic->data[i][y*dstpic->linesize[i]],
> >>>>> +&yadif->cur->data[i][y*refs], w);
> >>>> 
> >>>> switching from fast_memcpy() (that works with SIMD) to memcpy() possibly
> >>>> looses speed
> >>> 
> >>> Do we have a replacement ?
> >> 
> >> i dont think we do.
> >> 
> >> 
> >>> 
> >>>> [...]
> >>>>> +static int query_formats(AVFilterContext *ctx)
> >>>>> +{
> >>>>> +    static const enum PixelFormat pix_fmts[] = {
> >>>>> +        PIX_FMT_YUV420P,
> >>>>> +        PIX_FMT_GRAY8,
> >>>>> +        PIX_FMT_NONE
> >>>>> +    };
> >>>> 
> >>>> it should be trivial to support other yuv formats
> >>>> 
> >>>> other improvments that could be tried once its all in svn are to
> >>>> try to pass draw_slice() calls through from the source and maybe to
> >>>> pass get_video_buffer() through to avoid the every 2nd line memcpy.
> >>> 
> >>> Well, with the delay it might be tricky.
> >>> In short what's required before svn inclusion ?
> >> 
> >> cpuid + matching md5 to mplayer
> >> (id also like to see double frame output to work if possible but if thats
> >>  a problem we could look into it later)
> > 
> > I'd go for later :)
> > 
> >> and if fast_memcpy isnt ported we need a todo/fixme somewhere so we dont
> >> forget it
> > 
> > md5 matches, cpuid is a bit complicated to port since MM_FLAGS are defined in avcodec.h
> > 
> > Should memcpy be ported to libavutil ?
> 
> I'm not confident fastmemcpy is always faster than system memcpy.

iam not suggesting we always use it, but just when its faster, if here it
is faster it should be used

and if you think you can improve it (with additional temporality amount or
therwise), a patch on mplayer-dev is likely welcome

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Observe your enemies, for they first find out your faults. -- Antisthenes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100524/10979f3f/attachment.pgp>



More information about the ffmpeg-devel mailing list