[Libav-user] Integrating CUDA-based video decoder into libavcodec/ffmpeg

Tim Lenertz t.lenertz at intopix.com
Mon Feb 4 11:16:35 EET 2019

I have a CUDA-based decoder for a video format that runs on the GPU. I am
trying to add a "codec" to libavcodec that uses it as an external
decoder. It is for an internal-use-only fork of ffmpeg (to demonstrate
the decoder).

Currently, I have it working such that I can play a sequence of pictures
with ffplay, which are decoded on the GPU by the external decoder.

But with the current implementation, the codec module copies its output
(in the RGB24 pixel format) from GPU memory to host memory after each
frame and passes it to libavcodec in an AVFrame. So when using ffplay,
every output image crosses between GPU and host twice: once from GPU to
host in the codec, and once back to the GPU because ffplay has to upload
it for display.
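
To put the double copy in numbers, a small sketch assuming packed RGB24
(3 bytes per pixel, no row padding) and an example resolution and frame
rate; the figures are arithmetic, not measurements, and the function
names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one packed RGB24 frame: 3 bytes per pixel, assuming no row
 * padding (a real AVFrame may use a larger linesize). */
static uint64_t rgb24_frame_bytes(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * 3u;
}

/* Bytes moved over the bus per second on the current path: every decoded
 * frame goes device -> host (codec output) and host -> device (upload for
 * display), i.e. two transfers per frame. */
static uint64_t round_trip_bytes_per_second(uint32_t width, uint32_t height,
                                            uint32_t fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    return 2u * rgb24_frame_bytes(width, height) * fps;
}
```

At 1920x1080 and 60 fps, one frame is 6,220,800 bytes and the round
trip comes to 746,496,000 bytes/s (about 0.75 GB/s), which is the
traffic that keeping the output on the GPU is meant to avoid.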

My goal is to leave the uncompressed data on the GPU in a CUDA device
buffer, and have ffmpeg use it from there.

ffmpeg seems to support this through AVHWAccel. I have a few questions
on how to implement this in libavcodec:

* Is there an example implementation that uses this with a CUDA-based
decoder (i.e. not one using the dedicated hardware decoders through
NVDEC, CUVID, ...)?

* Does ffmpeg need the output in a particular pixel format in a linear
CUDA device buffer, or can it also be in texture memory, i.e. in a CUDA
array?

* Is it possible to have the hardware decoder as the primary decoder of
the AVCodec? It seems that hardware acceleration is intended as an
add-on, with the software decoder implemented by the AVCodec available
as a fallback.

* It seems that ffmpeg will allocate a pool of CUDA buffers to receive
the output. Is it also possible to allocate the output buffers oneself
in the module's implementation, and to control how many buffers there
are?

* Is it possible to control how many CPU threads the decoder will be
called from? Given the external decoder's interface, the ideal setup
would be one writer thread that pushes compressed codestreams, and one
reader thread that pulls the uncompressed output into a CUDA buffer.
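
On allocating the output buffers in the module: a minimal sketch of a
fixed-size pool owned by the decoder module, assuming the external
decoder exposes a known number of output surfaces. `SurfacePool`,
`pool_acquire` and `pool_release` are hypothetical names, and the
surfaces are plain pointers here rather than CUDA allocations. In
FFmpeg, each slot could presumably be wrapped with `av_buffer_create()`
using a free callback that calls `pool_release()`, so a slot returns to
the pool once the last AVFrame reference to it is dropped. The
hwcontext.h documentation also describes `AVHWFramesContext.pool` as
settable by the caller before `av_hwframe_ctx_init()`, which is worth
checking against the FFmpeg version in use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_MAX_SLOTS 16

/* Hypothetical pool of output surfaces owned by the decoder module. In a
 * real module, ptr[i] would be a CUDA device pointer from the external
 * decoder; here it is just an opaque pointer. */
typedef struct {
    void *ptr[POOL_MAX_SLOTS];
    int in_use[POOL_MAX_SLOTS];
    int count;
    pthread_mutex_t lock;   /* release may happen on another thread */
} SurfacePool;

static void pool_init(SurfacePool *p, void **surfaces, int count)
{
    assert(count > 0 && count <= POOL_MAX_SLOTS);
    pthread_mutex_init(&p->lock, NULL);
    p->count = count;
    for (int i = 0; i < count; i++) {
        p->ptr[i] = surfaces[i];
        p->in_use[i] = 0;
    }
}

/* Returns the index of a free slot, or -1 if every slot is still held
 * downstream (the decoder would then have to wait for a release). */
static int pool_acquire(SurfacePool *p)
{
    int idx = -1;
    pthread_mutex_lock(&p->lock);
    for (int i = 0; i < p->count; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            idx = i;
            break;
        }
    }
    pthread_mutex_unlock(&p->lock);
    return idx;
}

/* Called when the consumer drops its last reference to the frame. */
static void pool_release(SurfacePool *p, int idx)
{
    pthread_mutex_lock(&p->lock);
    assert(idx >= 0 && idx < p->count && p->in_use[idx]);
    p->in_use[idx] = 0;
    pthread_mutex_unlock(&p->lock);
}

static void pool_destroy(SurfacePool *p)
{
    pthread_mutex_destroy(&p->lock);
}
```

With this design the slot count, not ffmpeg, bounds how many decoded
frames can be in flight at once.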
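
For the threading question, a sketch of the intended one-writer /
one-reader model, assuming the external decoder behaves like a bounded
FIFO. `FrameQueue`, `writer_thread` and `run_pipeline` are hypothetical;
frame numbers stand in for codestreams and decoded CUDA buffers. On the
FFmpeg side, the public `avcodec_send_packet()` /
`avcodec_receive_frame()` API already separates input from output; as
far as I know, libavcodec only calls a decoder from several threads when
the codec advertises frame or slice threading capabilities and
`AVCodecContext.thread_count` allows it.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 4

/* Bounded FIFO standing in for the external decoder's pipeline. */
typedef struct {
    int items[QUEUE_CAP];
    int head, tail, size;
    int closed;                       /* set after the writer's last push */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} FrameQueue;

static void queue_init(FrameQueue *q)
{
    q->head = q->tail = q->size = q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

static void queue_destroy(FrameQueue *q)
{
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->not_full);
    pthread_cond_destroy(&q->not_empty);
}

/* Blocks while the queue is full (back-pressure on the writer). */
static void queue_push(FrameQueue *q, int v)
{
    pthread_mutex_lock(&q->lock);
    while (q->size == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->size++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Marks end of stream and wakes a reader waiting on an empty queue. */
static void queue_close(FrameQueue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 with *out set, or 0 once the queue is closed and drained. */
static int queue_pop(FrameQueue *q, int *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->size == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->size == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->size--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}

typedef struct { FrameQueue *q; int n; } WriterArgs;

/* Writer thread: pushes n codestreams, then signals end of stream. */
static void *writer_thread(void *arg)
{
    WriterArgs *w = arg;
    for (int i = 0; i < w->n; i++)
        queue_push(w->q, i);
    queue_close(w->q);
    return NULL;
}

/* Runs the writer in its own thread while the calling thread acts as the
 * reader. Returns the number of frames received, or -1 if thread creation
 * failed or a frame arrived out of order. */
static int run_pipeline(int n)
{
    FrameQueue q;
    WriterArgs w = { &q, n };
    pthread_t tid;
    int received = 0, in_order = 1, v;

    queue_init(&q);
    if (pthread_create(&tid, NULL, writer_thread, &w) != 0) {
        queue_destroy(&q);
        return -1;
    }
    while (queue_pop(&q, &v)) {   /* would pull output into a CUDA buffer */
        if (v != received)
            in_order = 0;
        received++;
    }
    pthread_join(tid, NULL);
    queue_destroy(&q);
    return in_order ? received : -1;
}
```

The bounded capacity gives the writer back-pressure, so it cannot run
arbitrarily far ahead of the reader.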
