[FFmpeg-devel] [PATCH 6/6] ffv1enc_vulkan: switch to receive_packet
Lynne
dev at lynne.ee
Sun Nov 24 05:41:48 EET 2024
On 11/23/24 23:10, Jerome Martinez wrote:
> Le 23/11/2024 à 20:58, Lynne via ffmpeg-devel a écrit :
>> This allows the encoder to fully saturate all queues the GPU
>> has, giving a good 10% in certain cases and resolutions.
>
>
> Using a RTX 4070:
> +50% (!!!) with 2K 10-bit content.
> +17% with 4K 16-bit content.
> Also the speed with 2K content is now 4x the speed of 4K content, which
> is similar to the SW encoder (with a similar count of slices) and is
> the expected result; it seems a bottleneck with smaller resolutions
> has been removed.
>
>
> Unfortunately, it has a drawback: 6K5K content, which was handled well
> without this patch, now fails immediately:
> [vost#0:0/ffv1_vulkan @ 0x10467840] [enc:ffv1_vulkan @ 0x12c011c0]
> Error submitting video frame to the encoder
> [vost#0:0/ffv1_vulkan @ 0x10467840] [enc:ffv1_vulkan @ 0x12c011c0]
> Error encoding a frame: Cannot allocate memory
> [vost#0:0/ffv1_vulkan @ 0x10467840] Task finished with error code: -12
> (Cannot allocate memory)
> [vost#0:0/ffv1_vulkan @ 0x10467840] Terminating thread with return
> code -12 (Cannot allocate memory)
>
> Which is a problem, since 6K5K handling was good on the RTX 4070
> (3x faster than a CPU at the same price) before this patch.
> Is it possible to keep support for bigger resolutions on such a card
> while keeping the performance boost of this patch?
To an extent. At high resolutions, -async_depth 0 (maximum) harms
performance; I get the best results with it set to 2 or 3 for 6K
content, on my odd setup.
Increasing async_depth increases the amount of VRAM used, so that's the
tradeoff.
Automatically detecting it is difficult, as Vulkan doesn't give you
metrics on how much free VRAM there is, so there's nothing we can do
other than document it and hope users follow the instructions in case
they run out of memory.
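For reference, bounding async_depth can be done on the command line. A hedged sketch only: the input filename, pixel format, and Vulkan device selection below are placeholder assumptions, not taken from this thread; the -async_depth value of 2 is the setting suggested above for 6K content.

```shell
# Encode with ffv1_vulkan while capping the number of in-flight frames,
# trading some queue saturation for lower VRAM usage (placeholder paths).
ffmpeg -init_hw_device vulkan=vk -filter_hw_device vk \
    -i input_6k.mov \
    -vf 'format=yuv420p10,hwupload' \
    -c:v ffv1_vulkan -async_depth 2 \
    output.nut
```

Lowering -async_depth toward 1 reduces VRAM further at some cost in throughput; 0 means maximum depth.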
The good news is that -async_depth 1 uses less VRAM than before this patch.
Most of the VRAM used comes from somewhere within Nvidia's black-box
driver, as RADV uses a third of the VRAM with the same content and
async_depth settings. Nothing we can do about that either.
>> This also improves error resilience if an allocation fails,
>> and properly cleans up after itself if it does.
>
> Looks like this part does not work; there is still a freeze if an
> allocation fails.
This is due to Nvidia's drivers. If you switch to using their GSP
firmware, recovery is pretty much instant.