[FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap
Lynne
dev at lynne.ee
Tue Jan 12 06:55:44 EET 2021
Jan 11, 2021, 23:27 by felix.leclair123 at hotmail.com:
> Hi guys and gals, this is my first post on this mailing list, so apologies for any formatting or stylistic snafus.
>
> TL;DR: we currently have tone mapping filters (typically used to map content from a 10-bit HDR source to an 8-bit SDR output) that either run on the CPU with zscale (from the zimg library) or are hardware implementations using VAAPI or OpenCL. Having a version implemented in CUDA would round out the main hwaccel types.
>
> Context:
> I'm a computer engineering student up in Canada with an interest in high-efficiency distributed processing. As a personal project, I'm trying to build a cluster of Nvidia Jetson Nanos that can handle a few dozen streams at once (a mix of SD, HD, FHD, UHD and 4K HDR) while drawing south of 100W at peak. Depending on resolution and framerate, each of these little devices can handle anywhere from 1 to 9 streams at a time in hardware, in any mix of HEVC or H.264, so 3 of them should get me most of the way to where I want to go (this would be a 30W package capable of roughly twelve 2160p30 10-bit to 1080p30 8-bit streams).
>
> The issue is that four little arm64 cores just won't be able to tonemap with zscale in real time, even with the encoders and decoders sharing memory with the CPU (so there is no PCIe memory-copy penalty). On the other hand, the built-in GPU and the relative simplicity of most tone mapping algorithms (say, Hable) should make quick work of this. Unfortunately (or fortunately, since it gives me something to learn from?) there isn't a CUDA version of the filter.
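To illustrate how little per-pixel work a curve like Hable needs, here is a minimal C sketch of the operator. It assumes linear-light input where 1.0 is SDR reference white and `peak` is the source peak in the same units. The constants are the widely published "Uncharted 2" values, which I believe match the ones in the CPU vf_tonemap; `tonemap_hable` is a hypothetical helper name. PQ decoding, gamut mapping and re-encoding to SDR gamma are deliberately left out. In a CUDA port, both functions would become `__device__` functions called from the kernel.

```c
/* Hable ("Uncharted 2") filmic curve. The constants are the commonly
 * published defaults for this operator; treat them as tunable. */
float hable(float x)
{
    const float a = 0.15f, b = 0.50f, c = 0.10f;
    const float d = 0.20f, e = 0.02f, f = 0.30f;
    return (x * (x * a + b * c) + d * e) / (x * (x * a + b) + d * f) - e / f;
}

/* Hypothetical helper: map linear light `sig` (1.0 = SDR reference white)
 * into [0, 1], normalised so that the source `peak` lands exactly on 1.0.
 * Anything brighter than `peak` is clipped to 1.0. */
float tonemap_hable(float sig, float peak)
{
    if (sig < 0.0f)
        sig = 0.0f;
    float out = hable(sig) / hable(peak);
    return out > 1.0f ? 1.0f : out;
}
```

The whole operator is a handful of multiply-adds and one division per sample, with no neighbourhood access, which is why it maps well onto one GPU thread per pixel.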
>
> Question/guidance:
> I've read through the documentation on how to write filters and looked at the other CUDA filters currently in the source, so I have a general idea of where I'm going. However, I haven't fully nailed down how to access the frames that hwupload_cuda passes to vf_tonemap_cuda.c, which in turn hands them to the kernel in vf_tonemap_cuda.cu for processing. I have a repo with everything I've been pulling together for this project; the piece of interest is under */cuda_filter/ in the source tree. <https://github.com/Camofelix/Jetson_ffmpeg_trancode_cluster/>
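On the frame-access question: as far as I understand the existing CUDA filters, the .c side receives an AV_PIX_FMT_CUDA frame whose `data[]` entries are device pointers and whose `linesize[]` entries are row pitches in bytes, and it launches a kernel in which each thread handles one pixel. The C sketch below models only that per-thread indexing on the CPU, assuming a P010 luma plane in and an 8-bit plane out. The function names and the `pixel_map_fn` hook are hypothetical, and the nested loop stands in for a kernel launch; in the .cu file, `x` and `y` would come from `blockIdx * blockDim + threadIdx`, with a bounds check against width and height.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-sample hook: normalised [0, 1] in, normalised [0, 1] out.
 * PQ decoding, the tone curve and SDR re-encoding would live behind it. */
typedef float (*pixel_map_fn)(float);

/* Identity map, handy for checking the memory layout on its own. */
float map_identity(float v) { return v; }

/* What ONE CUDA thread would do for pixel (x, y). */
static void map_pixel_p010_to_u8(const uint8_t *src, size_t src_pitch,
                                 uint8_t *dst, size_t dst_pitch,
                                 int x, int y, pixel_map_fn map)
{
    /* P010: little-endian 16-bit words with the 10 significant bits in
     * the MSBs. Rows start at y * pitch bytes, not y * width * 2. */
    const uint8_t *p = src + (size_t)y * src_pitch + (size_t)x * 2;
    uint16_t word = (uint16_t)(p[0] | (p[1] << 8));
    float in = (float)(word >> 6) / 1023.0f;

    float out = map(in);
    if (out < 0.0f) out = 0.0f;
    if (out > 1.0f) out = 1.0f;

    /* Round to nearest 8-bit code value. */
    dst[(size_t)y * dst_pitch + (size_t)x] = (uint8_t)(out * 255.0f + 0.5f);
}

/* CPU stand-in for the kernel launch: one "thread" per luma pixel. */
void map_plane_p010_to_u8(const uint8_t *src, size_t src_pitch,
                          uint8_t *dst, size_t dst_pitch,
                          int width, int height, pixel_map_fn map)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            map_pixel_p010_to_u8(src, src_pitch, dst, dst_pitch, x, y, map);
}
```

Keeping the pitch separate from the width matters because hardware surfaces are usually padded, so indexing with `width * bytes_per_pixel` instead of the pitch tends to produce skewed or garbled output. The chroma plane would need its own pass at half resolution.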
>
> Would anyone mind helping me out with how to architect this?
>
The tonemap filter is just a (very old by now) copy of libplacebo's
tonemapping code. No one has bothered to keep it in sync.
I'm currently working on a libplacebo wrapper, so once that's merged
there will be up-to-date hardware tonemapping.