[FFmpeg-devel] [PATCH V2] lavf: add transpose_opencl filter

Mark Thompson sw at jkqxz.net
Mon Dec 3 02:10:09 EET 2018

On 28/11/2018 02:27, Ruiling Song wrote:
> Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> ---
>  configure                         |   1 +
>  libavfilter/Makefile              |   1 +
>  libavfilter/allfilters.c          |   1 +
>  libavfilter/opencl/transpose.cl   |  35 +++++
>  libavfilter/opencl_source.h       |   1 +
>  libavfilter/transpose.h           |  34 +++++
>  libavfilter/vf_transpose.c        |  14 +-
>  libavfilter/vf_transpose_opencl.c | 288 ++++++++++++++++++++++++++++++++++++++
>  8 files changed, 362 insertions(+), 13 deletions(-)
>  create mode 100644 libavfilter/opencl/transpose.cl
>  create mode 100644 libavfilter/transpose.h
>  create mode 100644 libavfilter/vf_transpose_opencl.c

Testing the passthrough option here reveals a slightly unfortunate interaction with mapping - if this is the only filter in use, then skipping the (normally redundant) copy can make the pipeline fall over.

For example, on Rockchip (Mali) decoding with rkmpp then using:

-vf hwmap=derive_device=opencl,transpose_opencl=dir=clock:passthrough=landscape,hwdownload,format=nv12

fails at the download in the passthrough case, because the mapped memory doesn't allow the read (the extension does explicitly document this constraint - <https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_import_memory.txt>).

VAAPI has a similar problem with a decode followed by:

-vf hwmap=derive_device=opencl,transpose_opencl,hwmap=derive_device=vaapi:reverse=1

because the reverse mapping tries to replace the inlink hw_frames_ctx in a way which doesn't actually work.

All of these cases do of course work if anything else is in the way - any additional opencl filter on either side makes it work.  I think it's fine to ignore this (after all, the hwmap immediately followed by hwdownload case can already fail in the same way), but any thoughts you have on making that better are welcome.

>> Does the dependency on dir have any effect on speed here?  Any call is only ever
>> going to use one side of each of the dir cases, so it feels like it might be nicer to
>> hard-code that so they aren't included in the compiled code at all.
> For such a memory-bound OpenCL kernel, a few extra arithmetic operations would not affect the overall performance.
> I did some more testing and saw no obvious performance difference between the different 'dir' values, so I have just kept it as it is.

That makes sense, thank you for checking.

So, LGTM and applied.


- Mark
