[FFmpeg-devel] Sharing cuda context between transcode sessions to reduce initialization overhead

Tue Jun 13 00:33:12 EEST 2017

Am 12.06.2017 10:38 nachm. schrieb "Ganapathy Raman Kasi" <gkasi at nvidia.com
>:

Hi,

Currently incase of using 1 -> N transcode (1 SW decode -> N  NVENC
encodes) without HW upload filter, we end up allocating multiple Cuda
contexts for the N transcode sessions for the same underlying gpu device.
This comes with the cuda context initialization overhead. (~100 ms per
context creation with 4th gen i5 with GTX 1080 in ubuntu 16.04).  Also in
case of  M * (1->N) full HW accelerated transcode we face this issue where
the cuda context is not shared between the M transcode sessions. Sharing
the context would greatly reduce the initialization time which will matter
in case of short clip transcodes.

I currently have a global array in avutil/hwcontext_cuda.c which keeps
track of the cuda contexts created and reuses existing contexts when
request for hwdevice ctx create occurs. This is shared in the attached
patch. Please check the approach and let me know if there is better/cleaner
way to do this. Thanks

Global state in the libraries is something we absolutely try to stay away
from, so this approach is not quite appropriate.

If you want to somehow share this, it should be in the ffmpeg command line
tool somewhere, however we also try to reduce hardware specific magic in
favor of abstractions.

- Hendrik