[FFmpeg-devel] [PATCH] (for discussion): cuvid: allow to crop and resize in decoder

Mark Thompson sw at jkqxz.net
Sat Mar 4 23:44:27 EET 2017

On 01/03/17 10:58, Timo Rothenpieler wrote:
>>> We recently just had all sorts of discussions what decoders should and
>>> should not do, I don't think scaling in a decoder is a good thing to
>>> start doing here.
>> scaling in some decoders is mandated by some specs
>> some standards support reduced resolution which can switch from frame
>> to frame without the decoder output changing
>> There is also the possibility of scalability where the reference stream
>> has lower resolution IIRC.
>> This is kind of different of course but, scaling code in decoders is
>> part of some specifications.
> Would like to bring this back up.
> I'd like to merge this, as specially the scaling is freely done by the
> video asic, offering a possibility to scale without requiring non-free
> libnpp. And cropping so far is not possible at all.
> Yes, scaling and cropping is not something a decoder usually does, but
> it exposes a hardware feature that has no other way of accessing it,
> which offers valuable functionality to users.

To offer an alternative approach to this:

* Make a new CUVID hwcontext implementation - each frame in it consists of some decode parameters (including input bitstream) and a reference to a decoder instance.

* The CUVID decoder in lavc would create a decoder instance, but when asked to decode a packet it would create a new CUVID frame with the appropriate decoding parameters attached to it and return that.

* CUVID scale/crop/deinterlace filters could then be written which just tag the frame with the appropriate transformation to happen later.

* The decoder then actually runs when you try to get the frame data - either by mapping to CUDA (av_hwframe_map() / vf_hwmap) or actually downloading the frame to system memory (av_hwframe_transfer_data() / vf_hwdownload).
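
To make the four steps above concrete, here is a rough toy model in plain C. It is only a sketch of the deferred-decode idea: every name in it (ToyDecoder, ToyLazyFrame, toy_decode_packet() and so on) is hypothetical and exists neither in FFmpeg nor in the CUVID API, the "decode" is just a counter, and it assumes the crop is applied before the scale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: none of these names exist in FFmpeg or CUVID. */

typedef struct ToyDecoder {
    int decodes_issued;              /* how many real decodes have run */
} ToyDecoder;

/* A "frame" that holds decode parameters instead of pixel data. */
typedef struct ToyLazyFrame {
    ToyDecoder *dec;                 /* reference to the decoder instance */
    const unsigned char *bitstream;  /* input bitstream for this frame */
    size_t bitstream_size;
    int coded_w, coded_h;            /* size of the coded picture */
    int crop_left, crop_top;         /* tagged crop, relative to coded picture */
    int crop_w, crop_h;
    int out_w, out_h;                /* tagged output (scaled) size */
    int decoded;                     /* set once the decode has actually run */
} ToyLazyFrame;

/* Decoder step: attach the parameters to a new frame, decode nothing yet. */
static ToyLazyFrame toy_decode_packet(ToyDecoder *dec,
                                      const unsigned char *data, size_t size,
                                      int w, int h)
{
    ToyLazyFrame f = {0};
    assert(dec && data && w > 0 && h > 0);
    f.dec = dec;
    f.bitstream = data;
    f.bitstream_size = size;
    f.coded_w = f.crop_w = f.out_w = w;
    f.coded_h = f.crop_h = f.out_h = h;
    return f;
}

/* Filter step: only tag the frame with the crop to apply later. */
static void toy_filter_crop(ToyLazyFrame *f, int left, int top, int w, int h)
{
    assert(f && !f->decoded);
    assert(left >= 0 && top >= 0 && w > 0 && h > 0);
    assert(left + w <= f->coded_w && top + h <= f->coded_h);
    f->crop_left = left;
    f->crop_top  = top;
    f->crop_w = f->out_w = w;        /* a new crop resets the output size */
    f->crop_h = f->out_h = h;
}

/* Filter step: only tag the frame with the output size to apply later. */
static void toy_filter_scale(ToyLazyFrame *f, int w, int h)
{
    assert(f && !f->decoded && w > 0 && h > 0);
    f->out_w = w;
    f->out_h = h;
}

/* Map/download step: the only place where the decode is issued. */
static int toy_frame_download(ToyLazyFrame *f, int *w, int *h)
{
    assert(f && w && h);
    if (!f->decoded) {
        f->dec->decodes_issued++;    /* stands in for the hardware decode */
        f->decoded = 1;
    }
    *w = f->out_w;
    *h = f->out_h;
    return 0;
}
```

In this model, calling toy_decode_packet(), toy_filter_crop() and toy_filter_scale() leaves decodes_issued at zero; it only becomes one on the first toy_frame_download(), which is also the point where the tagged crop and scale would be handed to the hardware.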

Now, while this has rather nice outward behaviour in having the API work like all other hwcontext implementations, it also has a number of difficulties:

* It's even less clear how to get asynchronicity for performance than it is now - decodes are only issued when you try to use the output, so pretty much all overlap possibilities are lost.  Maybe that could be avoided by adding some sort of "crystallise frame" call to hwcontext, but it's still somewhat clumsy.

* The decoder has to be able to determine the intrinsic delay of the stream in advance, because it can't output a frame until it will definitely be decodable without more packets on the input (av_hwframe_transfer_data() can't return AVERROR(EAGAIN) to indicate that you should supply more data with avcodec_send_packet()).

* The non-native output formats of the decoder in lavc (i.e. all current ones - system memory and CUDA) become unwanted, but compatibility would force them to continue to exist as some sort of auto-download setup.  (ffmpeg.c wouldn't use it - the download would happen there (or not) like it does with the true hwaccels, since like them the decoder doesn't actually support system memory or even CUDA frame output without copying at all.)

* This multiple-library approach putting the decoder in lavu might be regarded as madness.
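
The intrinsic-delay point in the list above can be illustrated with another toy model in plain C. Again this is only a sketch under stated assumptions: the names (ToyDelayDecoder, toy_send_packet(), toy_receive_frame()) are hypothetical and not FFmpeg API, and the delay is modelled as a fixed number of extra packets that must have arrived before a frame is guaranteed to be decodable.

```c
#include <assert.h>

/* Hypothetical toy model: none of these names exist in FFmpeg. */

enum { TOY_OK = 0, TOY_AGAIN = 1, TOY_EOF = 2 };

typedef struct ToyDelayDecoder {
    int delay;     /* assumed known in advance: extra packets a frame needs */
    int sent;      /* packets supplied so far */
    int returned;  /* deferred frames handed out so far */
    int eof;       /* no more packets will arrive */
} ToyDelayDecoder;

static void toy_send_packet(ToyDelayDecoder *d)
{
    assert(d && !d->eof);
    d->sent++;
}

static void toy_send_eof(ToyDelayDecoder *d)
{
    assert(d);
    d->eof = 1;
}

/*
 * Hand out deferred frame number *index only once every packet it may
 * depend on has arrived, because the later download step has no way to
 * ask for more input.
 */
static int toy_receive_frame(ToyDelayDecoder *d, int *index)
{
    assert(d && index);
    if (d->returned >= d->sent)
        return d->eof ? TOY_EOF : TOY_AGAIN;
    /* Frame i needs packets 0..i+delay unless the stream has ended. */
    if (!d->eof && d->sent - d->returned <= d->delay)
        return TOY_AGAIN;
    *index = d->returned++;
    return TOY_OK;
}
```

With delay set to 2, the first frame is only released after the third packet has been sent, and the remaining frames are flushed once toy_send_eof() has been called. If the delay were underestimated, a frame would be handed out whose later download could not complete, which is the failure the bullet describes.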

Not really advocating this solution exactly (I rather agree with the final point above), but I think something like this should be considered so that CUVID doesn't end up behaving entirely differently to all other decoders in this respect.

- Mark
