[FFmpeg-devel] [PATCH] lavfi: edgedetect filter

Clément Bœsch ubitux at gmail.com
Wed Aug 8 14:12:02 CEST 2012


On Wed, Aug 08, 2012 at 12:55:15PM +0200, Stefano Sabatini wrote:
[...]
> > +static void gaussian_blur(AVFilterContext *ctx, int w, int h,
> > +                                uint8_t *dst, int dst_linesize,
> > +                          const uint8_t *src, int src_linesize)
> > +{
> > +    int i, j;
> > +
> > +    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
> > +    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
> > +    for (j = 2; j < h - 2; j++) {
> > +        dst[0] = src[0];
> > +        dst[1] = src[1];
> > +        for (i = 2; i < w - 2; i++) {
> > +            dst[i] = ((src[-2*src_linesize + i-2] + src[2*src_linesize + i-2]) * 2
> > +                    + (src[-2*src_linesize + i-1] + src[2*src_linesize + i-1]) * 4
> > +                    + (src[-2*src_linesize + i  ] + src[2*src_linesize + i  ]) * 5
> > +                    + (src[-2*src_linesize + i+1] + src[2*src_linesize + i+1]) * 4
> > +                    + (src[-2*src_linesize + i+2] + src[2*src_linesize + i+2]) * 2
> > +
> > +                    + (src[  -src_linesize + i-2] + src[  src_linesize + i-2]) *  4
> > +                    + (src[  -src_linesize + i-1] + src[  src_linesize + i-1]) *  9
> > +                    + (src[  -src_linesize + i  ] + src[  src_linesize + i  ]) * 12
> > +                    + (src[  -src_linesize + i+1] + src[  src_linesize + i+1]) *  9
> > +                    + (src[  -src_linesize + i+2] + src[  src_linesize + i+2]) *  4
> > +
> > +                    + src[i-2] *  5
> > +                    + src[i-1] * 12
> > +                    + src[i  ] * 15
> > +                    + src[i+1] * 12
> > +                    + src[i+2] *  5) / 159;
> > +        }
> 
> My feeling is that we should avoid hardcoding convolution operations
> and write generic code for them. Also we may want to make the size of
> the gaussian mask parametric, as well as the sigma parameter used to
> compute the mask.
> 
> Convolution operations are useful per-se, and could be used to
> implement ad-hoc filters.
> 

I've just used the standard matrix for that algorithm; it also has some
benefits from a performance point of view. Note that it is relatively
trivial to write, so I'm not sure such generic code would be required.
Maybe we could macro-generate various versions of it, but I'm not sure
that's really important in this case.

This step really is just to quickly get rid of some noise.

Do we already use these in some other filters?

> > +        dst[i    ] = src[i    ];
> > +        dst[i + 1] = src[i + 1];
> > +
> > +        dst += dst_linesize;
> > +        src += src_linesize;
> > +    }
> > +    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
> > +    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
> > +}
> > +
> > +enum {
> > +    DIRECTION_45UP,
> > +    DIRECTION_45DOWN,
> > +    DIRECTION_HORIZONTAL,
> > +    DIRECTION_VERTICAL,
> > +};
> > +
> > +static int get_rounded_direction(int gx, int gy)
> > +{
> > +    /* reference angles:
> > +     *   tan( pi/8) = sqrt(2)-1
> > +     *   tan(3pi/8) = sqrt(2)+1
> > +     * Gy/Gx is the tangent of the angle (theta), so Gy/Gx is compared against
> > +     * <ref-angle>, or more simply Gy against <ref-angle>*Gx
> > +     *
> > +     * Gx and Gy bounds = [-1020;1020], using 16-bit arithmetic:
> > +     *   round((sqrt(2)-1) * (1<<16)) =  27146
> > +     *   round((sqrt(2)+1) * (1<<16)) = 158218
> > +     */
> > +    if (gx) {
> > +        int tanpi8gx, tan3pi8gx;
> > +
> > +        if (gx < 0)
> > +            gx = -gx, gy = -gy;
> > +        gy <<= 16;
> > +        tanpi8gx  =  27145 * gx;

(Note: 27145 changed locally to 27146, test unchanged)

> > +        tan3pi8gx = 158218 * gx;
> > +        if (gy > -tan3pi8gx && gy < -tanpi8gx)  return DIRECTION_45UP;
> > +        if (gy > -tanpi8gx  && gy <  tanpi8gx)  return DIRECTION_HORIZONTAL;
> > +        if (gy >  tanpi8gx  && gy <  tan3pi8gx) return DIRECTION_45DOWN;
> > +    }
> > +    return DIRECTION_VERTICAL;
> > +}
> > +
> 
> > +static void sobel(AVFilterContext *ctx, int w, int h,
> > +                        uint16_t *dst, int dst_linesize,
> > +                  const uint8_t  *src, int src_linesize)
> > +{
> > +    int i, j;
> > +    EdgeDetectContext *edgedetect = ctx->priv;
> > +
> > +    for (j = 1; j < h - 1; j++) {
> > +        dst += dst_linesize;
> > +        src += src_linesize;
> > +        for (i = 1; i < w - 1; i++) {
> > +            const int gx =
> > +                -1*src[-src_linesize + i-1] + 1*src[-src_linesize + i+1]
> > +                -2*src[                i-1] + 2*src[                i+1]
> > +                -1*src[ src_linesize + i-1] + 1*src[ src_linesize + i+1];
> > +            const int gy =
> > +                -1*src[-src_linesize + i-1] + 1*src[ src_linesize + i-1]
> > +                -2*src[-src_linesize + i  ] + 2*src[ src_linesize + i  ]
> > +                -1*src[-src_linesize + i+1] + 1*src[ src_linesize + i+1];
> > +
> > +            dst[i] = FFABS(gx) + FFABS(gy);
> > +            edgedetect->directions[j*w + i] = get_rounded_direction(gx, gy);
> > +        }
> > +    }
> > +}
> 
> Same consideration as above.
> 

I'm not sure what could really be generalized here, since the directions
are also computed from these gradients.
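Just to illustrate what such a generalization might look like: the two gradients above are 3x3 convolutions with the Sobel kernels, so a generic helper could be something like the following (hypothetical sketch, names are mine):

```c
#include <stdint.h>

/* Hypothetical sketch of a generic per-pixel 3x3 convolution; gx and gy
 * in sobel() would then become two calls with the Sobel kernels. src
 * points at the current row, i is the column. */
static int convolve3x3(const uint8_t *src, int src_linesize, int i,
                       const int k[3][3])
{
    int x, y, sum = 0;

    for (y = -1; y <= 1; y++)
        for (x = -1; x <= 1; x++)
            sum += k[y+1][x+1] * src[y*src_linesize + i + x];
    return sum;
}
```

The direction rounding would stay as-is, since it only needs the resulting gx/gy pair.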

> > +
> > +static void non_maximum_suppression(AVFilterContext *ctx, int w, int h,
> > +                                          uint8_t  *dst, int dst_linesize,
> > +                                    const uint16_t *src, int src_linesize)
> > +{
> > +    int i, j;
> > +    EdgeDetectContext *edgedetect = ctx->priv;
> > +
> > +#define COPY_MAXIMA(ay, ax, by, bx) do {                \
> > +    if (src[i] > src[(ay)*src_linesize + i+(ax)] &&     \
> > +        src[i] > src[(by)*src_linesize + i+(bx)])       \
> > +        dst[i] = av_clip_uint8(src[i]);                 \
> > +} while (0)
> > +
> > +    for (j = 1; j < h - 1; j++) {
> > +        dst += dst_linesize;
> > +        src += src_linesize;
> > +        for (i = 1; i < w - 1; i++) {
> > +            switch (edgedetect->directions[j*w + i]) {
> > +            case DIRECTION_45UP:        COPY_MAXIMA( 1, -1, -1,  1); break;
> > +            case DIRECTION_45DOWN:      COPY_MAXIMA(-1, -1,  1,  1); break;
> > +            case DIRECTION_HORIZONTAL:  COPY_MAXIMA( 0, -1,  0,  1); break;
> > +            case DIRECTION_VERTICAL:    COPY_MAXIMA(-1,  0,  1,  0); break;
> > +            }
> > +        }
> > +    }
> > +}
> > +
> > +static void double_threshold(AVFilterContext *ctx, int w, int h,
> > +                                   uint8_t *dst, int dst_linesize,
> > +                             const uint8_t *src, int src_linesize)
> > +{
> > +    int i, j;
> > +
> > +#define THRES_HIGH 80
> > +#define THRES_LOW  20
> 
> These values should be made parametric (and expressed as float values
> in the [0, 1] range).
> 

Yes, but right now the filter is called "edgedetect" and only implements
the Canny algorithm. If we were to add more algorithms, we would end up
with various modes, such as -vf edgedetect=canny or
edgedetect=marrhildreth. So I wasn't confident about making it
customizable yet.

This parametrization can be added later without breaking existing usage;
that really was the main point.
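For later reference, the mapping could be as simple as scaling the float options onto the 8-bit plane; a hypothetical sketch (option handling omitted, names are mine):

```c
#include <stdint.h>

/* Hypothetical sketch: convert a user-visible threshold in [0,1]
 * (e.g. future low/high options) to the 8-bit value double_threshold()
 * compares against. Values outside [0,1] are clamped. */
static uint8_t threshold_to_u8(double t)
{
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return (uint8_t)(t * 255.0 + 0.5);
}
```

With that, the current hardcoded behaviour would correspond to low = 20/255 and high = 80/255.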

[...]
> > +static int null_draw_slice(AVFilterLink *inlink, int y, int h, int slice_dir) { return 0; }
> > +
> > +AVFilter avfilter_vf_edgedetect = {
> > +    .name          = "edgedetect",
> > +    .description   = NULL_IF_CONFIG_SMALL("Detect and draw edge."),
> > +    .priv_size     = sizeof(EdgeDetectContext),
> > +    .uninit        = uninit,
> > +    .query_formats = query_formats,
> > +
> 
> > +    .inputs    = (const AVFilterPad[]) {{ .name             = "default",
> > +                                          .type             = AVMEDIA_TYPE_VIDEO,
> > +                                          .draw_slice       = null_draw_slice,
> > +                                          .config_props     = config_props,
> > +                                          .end_frame        = end_frame,
> > +                                          .min_perms        = AV_PERM_READ
> > +                                        },
> > +                                        { .name = NULL }
> 
> Nit+: weird indent, I would prefer:
>    .inputs    = (const AVFilterPad[]) {
>       {
>           .name             = "default",
>           .type             = AVMEDIA_TYPE_VIDEO,
>           .draw_slice       = null_draw_slice,
>           .config_props     = config_props,
>           .end_frame        = end_frame,
>           .min_perms        = AV_PERM_READ
>        },
>        { .name = NULL }
>    }
> 

Will change before pushing.

> [...]
> 
> Another consideration: we could optionally support faster less
> accurate algorithms (e.g. Marr-Hildreth).

Feel free to implement it. I was just having fun with lavfi.

-- 
Clément B.