[FFmpeg-devel] [PATCH] libavutil/libavfilter:add opencl warpper and opencl deshake filter

Stefano Sabatini stefasab at gmail.com
Tue Mar 19 19:36:45 CET 2013


On date Monday 2013-03-18 11:37:58 +0800, Wei Gao encoded:
> Hi,
> Stefano Sabatini, thank you for your comments.And there are some questions
> and some explanations as follows:
> 
> Thanks
> Best regards
> 
> 2013/3/18 Stefano Sabatini <stefasab at gmail.com>
> 
> > On date Sunday 2013-03-10 22:20:33 +0800, Wei Gao encoded:
> > > Hi,
> > >
> > > Stefano, the attachment is the patch modified according to your comments.
> > [...]
> >
> > > From dd254c133be45ebe07cfd3ae0ebf14d9c6fa151f Mon Sep 17 00:00:00 2001
> > > From: highgod0401 <highgod0401 at gmail.com>
> > > Date: Sun, 10 Mar 2013 22:16:12 +0800
> > > Subject: [PATCH] add opencl warpper and opencl deshake filter
> >
> > wrapper
> >
> > >
> > > ---
> > >  configure                      |   5 +
> > >  libavfilter/Makefile           |   2 +
> > >  libavfilter/allfilters.c       |  21 +
> > >  libavfilter/deshake_kernel.h   | 201 ++++++++++
> > >  libavfilter/transform_opencl.c | 155 +++++++
> > >  libavfilter/transform_opencl.h |  40 ++
> > >  libavfilter/vf_deshake.c       | 222 ++++++++++
> > >  libavutil/Makefile             |   4 +
> > >  libavutil/opencl.c             | 892
> > +++++++++++++++++++++++++++++++++++++++++
> > >  libavutil/opencl.h             | 246 ++++++++++++
> >
> > Please split the patch in two distinct lavu and lavfi patches.
> >

> About this comment, you have answerd in previous mail, and you said they
> can be submitted in one patch because the deshake opencl patch must use the
> opencl wrapper code.

On the other hand I think smaller patches are better, since this will
simplify review (reviewing a small patches takes less time). Also for
detecting regressions it is better to have two distinct patches,
especially when the belong to two distinct libraries.

[...]
> > > +kernel void avfilter_transform(global  unsigned char *src,
> > > +                               global  unsigned char *dst,
> > > +                               global          float *matrix,
> > > +                               global          float *matrix2,
> > > +                                                 int interpolate,
> > > +                                                 int fillmethod,
> > > +                                                 int src_stride_lu,
> > > +                                                 int dst_stride_lu,
> > > +                                                 int src_stride_ch,
> > > +                                                 int dst_stride_ch,
> > > +                                                 int height,
> > > +                                                 int width,
> > > +                                                 int ch,
> > > +                                                 int cw)
> > > +{
> > > +     int global_id = get_global_id(0);
> > > +
> > > +     global unsigned char *dst_y = dst;
> > > +     global unsigned char *dst_u = dst_y + height * dst_stride_lu;
> > > +     global unsigned char *dst_v = dst_u + ch * dst_stride_ch;
> > > +
> > > +     global unsigned char *src_y = src;
> > > +     global unsigned char *src_u = src_y + height * src_stride_lu;
> > > +     global unsigned char *src_v = src_u + ch * src_stride_ch;
> > > +
> > > +     global unsigned char *tempdst;
> > > +     global unsigned char *tempsrc;
> >
> > sorry for the ignorance, what is "global" used for?
> >
> This is an OpenCL memory type defined in  "OpenCL Specification
> Version: 1.2 ",it
> defined that "A memory region accessible to all work-items executing in a
> context. It is accessible to the host using commands such as read, write
> and map"

Thanks for explaining.

[...]
> > > +static int filter_frame_opencl(AVFilterLink *link, AVFilterBufferRef
> > *in)
> > > +{
> > > +    DeshakeContext *deshake = link->dst->priv;
> > > +    AVFilterLink *outlink = link->dst->outputs[0];
> > > +    AVFilterBufferRef *out;
> >
> > Uhm this needs to be updated after the recent merge (you should
> > directly make use of AVFrame).
> >
> should I need to get the latest git version and merge my code to it then
> submit it?

This is usually a good idea, otherwise the committer will have to do
the work. Fortunately the changes required for the update should be
minimal.

[...]
> > > +typedef struct OpenclErrorMsg {
> > > +    int err_code;
> > > +    const char *err_str;
> > > +}OpenclErrorMsg;
> > > +
> > > +static OpenclErrorMsg opencl_err_msg[] = {
> > > +        {CL_DEVICE_NOT_FOUND,                               "DEVICE NOT
> > FOUND"},
> > > +        {CL_DEVICE_NOT_AVAILABLE,                           "DEVICE NOT
> > AVAILABLE"},
> > > +        {CL_COMPILER_NOT_AVAILABLE,                         "COMPILER
> > NOT AVAILABLE"},
> > > +        {CL_MEM_OBJECT_ALLOCATION_FAILURE,                  "MEM OBJECT
> > ALLOCATION FAILURE"},
> > > +        {CL_OUT_OF_RESOURCES,                               "OUT OF
> > RESOURCES"},
> > > +        {CL_OUT_OF_HOST_MEMORY,                             "OUT OF
> > HOST MEMORY"},
> > > +        {CL_PROFILING_INFO_NOT_AVAILABLE,                   "PROFILING
> > INFO NOT AVAILABLE"},
> > > +        {CL_MEM_COPY_OVERLAP,                               "MEM COPY
> > OVERLAP"},
> > > +        {CL_IMAGE_FORMAT_MISMATCH,                          "IMAGE
> > FORMAT MISMATCH"},
> > > +        {CL_IMAGE_FORMAT_NOT_SUPPORTED,                     "IMAGE
> > FORMAT NOT_SUPPORTED"},
> > > +        {CL_BUILD_PROGRAM_FAILURE,                          "BUILD
> > PROGRAM FAILURE"},
> > > +        {CL_MAP_FAILURE,                                    "MAP
> > FAILURE"},
> > > +        {CL_MISALIGNED_SUB_BUFFER_OFFSET,                   "MISALIGNED
> > SUB BUFFER OFFSET"},
> > > +        {CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST,      "EXEC
> > STATUS ERROR FOR EVENTS IN WAIT LIST"},
> > > +        {CL_COMPILE_PROGRAM_FAILURE,                        "COMPILE
> > PROGRAM FAILURE"},
> > > +        {CL_LINKER_NOT_AVAILABLE,                           "LINKER NOT
> > AVAILABLE"},
> > > +        {CL_LINK_PROGRAM_FAILURE,                           "LINK
> > PROGRAM FAILURE"},
> > > +        {CL_DEVICE_PARTITION_FAILED,                        "DEVICE
> > PARTITION FAILED"},
> > > +        {CL_KERNEL_ARG_INFO_NOT_AVAILABLE,                  "KERNEL ARG
> > INFO NOT AVAILABLE"},
> > > +        {CL_INVALID_VALUE,                                  "INVALID
> > VALUE"},
> > > +        {CL_INVALID_DEVICE_TYPE,                            "INVALID
> > DEVICE TYPE"},
> > > +        {CL_INVALID_PLATFORM,                               "INVALID
> > PLATFORM"},
> > > +        {CL_INVALID_DEVICE,                                 "INVALID
> > DEVICE"},
> > > +        {CL_INVALID_CONTEXT,                                "INVALID
> > CONTEXT"},
> > > +        {CL_INVALID_QUEUE_PROPERTIES,                       "INVALID
> > QUEUE PROPERTIES"},
> > > +        {CL_INVALID_COMMAND_QUEUE,                          "INVALID
> > COMMAND QUEUE"},
> > > +        {CL_INVALID_HOST_PTR,                               "INVALID
> > HOST PTR"},
> > > +        {CL_INVALID_MEM_OBJECT,                             "INVALID
> > MEM OBJECT"},
> > > +        {CL_INVALID_IMAGE_FORMAT_DESCRIPTOR,                "INVALID
> > IMAGE FORMAT DESCRIPTOR"},
> > > +        {CL_INVALID_IMAGE_SIZE,                             "INVALID
> > IMAGE SIZE"},
> > > +        {CL_INVALID_SAMPLER,                                "INVALID
> > SAMPLER"},
> > > +        {CL_INVALID_BINARY,                                 "INVALID
> > BINARY"},
> > > +        {CL_INVALID_BUILD_OPTIONS,                          "INVALID
> > BUILD OPTIONS"},
> > > +        {CL_INVALID_PROGRAM,                                "INVALID
> > PROGRAM"},
> > > +        {CL_INVALID_PROGRAM_EXECUTABLE,                     "INVALID
> > PROGRAM EXECUTABLE"},
> > > +        {CL_INVALID_KERNEL_NAME,                            "INVALID
> > KERNEL NAME"},
> > > +        {CL_INVALID_KERNEL_DEFINITION,                      "INVALID
> > KERNEL DEFINITION"},
> > > +        {CL_INVALID_KERNEL,                                 "INVALID
> > KERNEL"},
> > > +        {CL_INVALID_ARG_INDEX,                              "INVALID
> > ARG INDEX"},
> > > +        {CL_INVALID_ARG_VALUE,                              "INVALID
> > ARG VALUE"},
> > > +        {CL_INVALID_ARG_SIZE,                               "INVALID
> > ARG_SIZE"},
> > > +        {CL_INVALID_KERNEL_ARGS,                            "INVALID
> > KERNEL ARGS"},
> > > +        {CL_INVALID_WORK_DIMENSION,                         "INVALID
> > WORK DIMENSION"},
> > > +        {CL_INVALID_WORK_GROUP_SIZE,                        "INVALID
> > WORK GROUP SIZE"},
> > > +        {CL_INVALID_WORK_ITEM_SIZE,                         "INVALID
> > WORK ITEM SIZE"},
> > > +        {CL_INVALID_GLOBAL_OFFSET,                          "INVALID
> > GLOBAL OFFSET"},
> > > +        {CL_INVALID_EVENT_WAIT_LIST,                        "INVALID
> > EVENT WAIT LIST"},
> > > +        {CL_INVALID_EVENT,                                  "INVALID
> > EVENT"},
> > > +        {CL_INVALID_OPERATION,                              "INVALID
> > OPERATION"},
> > > +        {CL_INVALID_GL_OBJECT,                              "INVALID GL
> > OBJECT"},
> > > +        {CL_INVALID_BUFFER_SIZE,                            "INVALID
> > BUFFER SIZE"},
> > > +        {CL_INVALID_MIP_LEVEL,                              "INVALID
> > MIP LEVEL"},
> > > +        {CL_INVALID_GLOBAL_WORK_SIZE,                       "INVALID
> > GLOBAL WORK SIZE"},
> > > +        {CL_INVALID_PROPERTY,                               "INVALID
> > PROPERTY"},
> > > +        {CL_INVALID_IMAGE_DESCRIPTOR,                       "INVALID
> > IMAGE DESCRIPTOR"},
> > > +        {CL_INVALID_COMPILER_OPTIONS,                       "INVALID
> > COMPILER OPTIONS"},
> > > +        {CL_INVALID_LINKER_OPTIONS,                         "INVALID
> > LINKER OPTIONS"},
> > > +        {CL_INVALID_DEVICE_PARTITION_COUNT,                 "INVALID
> > DEVICE PARTITION COUNT"},
> >
> > This is going to be a pain to maintain. Rather return a generic error
> > message (OpenCL error with code %d occurred) in the code.
> >
> > Ideally the library should provide error string utilities (e.g. like
> > the ones in libavutil/error.h) so that it is not required to keep them
> > in sync in each and every single one OpenCL application.
> >
> These error code are defined in OpenCL Specification Version: 1.2, and they
> will not changed frequency,so it may easy to maintain.And in your pervious
> mail, you have suggested that:
>  "Something like this should be good:
> if (status != CL_SUCCESS) {
>     av_log(&openclwrapperutils, AV_LOG_ERROR, "Error creating OpenCL
> kernel: %s", opencl_errstr(status));"
> I also agree that the string is a better way.

I leave the choice to you. But in a global perspective it would be
better if the OpenCL implementation supported this (think if all C
application had to implement their own version of strerror_r()).

Do you know where it is possible to file feature requests related to
OpenCL?

[...]
> > > +++ b/libavutil/opencl.h
> > > @@ -0,0 +1,246 @@
> > > +/*
> > > + * Copyright (C) 2012 Peng Gao <peng at multicorewareinc.com>
> > > + * Copyright (C) 2012 Li   Cao <li at multicorewareinc.com>
> > > + * Copyright (C) 2012 Wei  Gao <weigao at multicorewareinc.com>
> > > + *
> > > + * This file is part of FFmpeg.
> > > + *
> > > + * FFmpeg is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU Lesser General Public
> > > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * FFmpeg is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with FFmpeg; if not, write to the Free Software
> > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> > 02110-1301 USA
> > > + */
> > > +
> > > +#include "config.h"
> > > +
> > > +#ifndef LIBAVUTIL_OPENCLWRAPPER_H
> > > +#define LIBAVUTIL_OPENCLWRAPPER_H
> > > +
> > > +#include <CL/cl.h>
> > > +
> >
> > > +#define AV_OPENCL_CHECK(method, ...)\
> > > +    status = method( __VA_ARGS__ ); if( status != CL_SUCCESS ) {\
> > > +        av_log(NULL,AV_LOG_ERROR, " error %s %d\n", # method, status );
> >  return status; }
> > > +
> > > +#define AV_OPENCL_SET_KERNEL_ARG(arg_ptr)\
> > > +    status =
> > clSetKernelArg((kernel),(arg_no++),(sizeof(arg_ptr)),(void*)(&(arg_ptr)));if(
> > status != CL_SUCCESS ) {\
> > > +        av_log(NULL,AV_LOG_ERROR, " error %s %d\n", "clSetKernelArg",
> > status );  return status; }
> >
> > macros assuming a certain variable in the context are not acceptable
> > in a public header. Please move them to the place where they are
> > defined, they can be changed later if the need arises.
> >

> every opencl filer maybe need this to easy write code and check error
> code,if move to the filter file, every opencl filter may define them.

The principle is that we should avoid to overdesign things. Since they
are used only in one file, I think it's better to define them
there. Then we can move the macros to a common header when the need
arises.

My problem with these macros is that they rely on the presence of
specific variables ("status", "kernel") so this is not very
robust/handy to be included in a public header. For example the user
may have already a "status" variable used for an entirely different
purpose.

> >
> > > +
> > > +
> > > +#define FF_OPENCL_KERNEL( ... )# __VA_ARGS__
> >
> > FF_ if reserved for internal API, use AV_
> >
> > > +
> > > +typedef struct AVOpenCLKernelEnv {
> > > +    cl_context context;
> > > +    cl_command_queue command_queue;
> > > +    cl_program program;
> > > +    cl_kernel kernel;
> > > +    char kernel_name[150];
> > > +}AVOpenCLKernelEnv;
> > > +
> > > +typedef struct AVOpenCLExternalInfo {
> > > +    cl_platform_id platform;
> > > +    cl_device_type device_type;
> > > +    cl_context context;
> >
> > > +    cl_device_id *devices_id;
> > > +    cl_device_id  dev;
> >
> > device_ids
> > device_id
> >
> > may be better names
> >
> > > +    cl_command_queue command_queue;
> > > +    char *platform_name;
> > > +}AVOpenCLExternalInfo;
> > > +
> > > +/**
> > > + * user defined, this is function wrapper which is used to set the
> > input parameters.
> > > + * launch kernel and copy data from GPU to CPU or CPU to GPU.
> > > + */
> >
> > User defined function, used to set the input parameter in the kernel
> > environment. This function launches kernel and copies data from GPU to
> > CPU, or from CPU to GPU.
> >
> > ...
> >
> > I'm not yet sure what "to launch kernel" means.
> >

> The program will use it to run a kernel the user data is the data witch
> will be trasform to GPU to compute and the OpenCL kernel info is
> in AVOpenCLKernelEnv *kenv

May be "Load kernel" a better description?

[...]
> > > +
> > > +/**
> > > + *  Save OpenCL buffer as share buffer.
> >
> > what is a share buffer?
> >

> It is an opencl buffer that avoid the copy data between GPU and CPU, the
> opencl filter will detect whether the next filter is opencl fiter, if yes,
> it will save the opencl buffer and don't copy the processed data to memory,
> and the next filter will detect whether the previous data is opencl filter,
> if yes, it will get the buffer porinter and use the data directly.So it can
> avoid the copy operation GPU-CPU-GPU, and it can save much time.I want to
> remove it in this patch because there is only one opencl filter(deshake),
> If the wrapper patch is accepted, I will add it and submit the sacle
> filter,is it OK?

Sure, it's best to start with smaller patches and add pieces as we go.

About OpenCL buffers, we have a framework for working with H/W
buffers, but I need someone who has more experience than me to comment
on it.
 
> >
> > > + *
> > > + *@param filtername                  filter name
> > > + *@param cl_inbuf                     OpenCL buffer
> > > + */
> > > +
> > > +int av_opencl_save_buffer(const char *filtername,void *cl_inbuf,int
> > buf_size);
> > > +
> > > +/**
> > > + *  Get the OpenCL share buffer.
> > > + *
> > > + *@param filtername                  filter name
> > > + */
> > > +
> > > +void *av_opencl_get_buffer(const char *filtername,int buf_size);
> > > +

> > > +/**
> > > + *  Regist kernels.
> >
> > Register kernels? kernel?
> >
> > > + *
> > > + *@param kernel_name                  Regist kernel name
> > > + *@param kernel_code                   Kernel code
> > > + */
> > > +
> > > +void av_opencl_regist_kernel(const char *kernel_name,const char
> > *kernel_code);
> >
> > av_opencl_register_kernel, characters are cheap this days.
> >
> > Sorry, I don't get what your mean, Do you mean that I should update the
> latest version?

I mean that there is a mismatch between the doxy (register kernels)
and the function name (register_kernel), and I can't say what's more
correct.

> 
> > Also this will plainly crash unless you perform some operations
> > before, which are not documented.
> >

> Sorry, I don't get what your mean.Do you mean that I should add some code
> to make sure the function is run correct?

Ideally yes, or you specify which are the conditions which allow to
use the function. Ideally reading the docs should be enough to get how
it works, and it should be almost impossible for the user to get it
wrong.
-- 
FFmpeg = Faithful and Free Mastering Proud Ecumenical Guru


More information about the ffmpeg-devel mailing list