[FFmpeg-devel] [PATCH 11/13] ffv1dec: reference the current packet into the main context

Lynne dev at lynne.ee
Thu Mar 13 03:56:26 EET 2025



On 13/03/2025 02:24, Andreas Rheinhardt wrote:
> Lynne:
>> On 10/03/2025 18:42, Lynne wrote:
>>> On 10/03/2025 04:14, Andreas Rheinhardt wrote:
>>>> Lynne:
>>>>> ---
>>>>>    libavcodec/ffv1.h    |  3 +++
>>>>>    libavcodec/ffv1dec.c | 19 +++++++++++++++++--
>>>>>    2 files changed, 20 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h
>>>>> index 8c0e71284d..860a5c14b1 100644
>>>>> --- a/libavcodec/ffv1.h
>>>>> +++ b/libavcodec/ffv1.h
>>>>> @@ -174,6 +174,9 @@ typedef struct FFV1Context {
>>>>>         * NOT shared between frame threads.
>>>>>         */
>>>>>        uint8_t           frame_damaged;
>>>>> +
>>>>> +    /* Reference to the current packet */
>>>>> +    AVPacket *pkt_ref;
>>>>>    } FFV1Context;
>>>>>    int ff_ffv1_common_init(AVCodecContext *avctx, FFV1Context *s);
>>>>> diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c
>>>>> index eaa21eebdf..6396f22f79 100644
>>>>> --- a/libavcodec/ffv1dec.c
>>>>> +++ b/libavcodec/ffv1dec.c
>>>>> @@ -469,6 +469,10 @@ static av_cold int decode_init(AVCodecContext
>>>>> *avctx)
>>>>>        f->pix_fmt = AV_PIX_FMT_NONE;
>>>>>        f->configured_pix_fmt = AV_PIX_FMT_NONE;
>>>>> +    f->pkt_ref = av_packet_alloc();
>>>>> +    if (!f->pkt_ref)
>>>>> +        return AVERROR(ENOMEM);
>>>>> +
>>>>>        if ((ret = ff_ffv1_common_init(avctx, f)) < 0)
>>>>>            return ret;
>>>>> @@ -701,6 +705,10 @@ static int decode_frame(AVCodecContext *avctx,
>>>>> AVFrame *rframe,
>>>>>        /* Start */
>>>>>        if (hwaccel) {
>>>>> +        ret = av_packet_ref(f->pkt_ref, avpkt);
>>>>> +        if (ret < 0)
>>>>> +            return ret;
>>>>> +
>>>>>            ret = hwaccel->start_frame(avctx, avpkt->data, avpkt->size);
>>>>>            if (ret < 0)
>>>>>                return ret;
>>>>> @@ -720,15 +728,21 @@ static int decode_frame(AVCodecContext *avctx,
>>>>> AVFrame *rframe,
>>>>>                uint32_t len;
>>>>>                ret = find_next_slice(avctx, avpkt->data, buf_end, i,
>>>>>                                      &pos, &len);
>>>>> -            if (ret < 0)
>>>>> +            if (ret < 0) {
>>>>> +                av_packet_unref(f->pkt_ref);
>>>>>                    return ret;
>>>>> +            }
>>>>>                buf_end -= len;
>>>>>                ret = hwaccel->decode_slice(avctx, pos, len);
>>>>> -            if (ret < 0)
>>>>> +            if (ret < 0) {
>>>>> +                av_packet_unref(f->pkt_ref);
>>>>>                    return ret;
>>>>> +            }
>>>>>            }
>>>>> +
>>>>> +        av_packet_unref(f->pkt_ref);
>>>>>        } else {
>>>>>            ret = decode_slices(avctx, c, avpkt);
>>>>>            if (ret < 0)
>>>>> @@ -827,6 +841,7 @@ static av_cold int
>>>>> ffv1_decode_close(AVCodecContext *avctx)
>>>>>        ff_progress_frame_unref(&s->last_picture);
>>>>>        av_refstruct_unref(&s->hwaccel_last_picture_private);
>>>>> +    av_packet_free(&s->pkt_ref);
>>>>>        ff_ffv1_close(s);
>>>>>        return 0;
>>>>
>>>> Why not simply use a const AVPacket*?
>>>
>>> No reason. Fixed locally.
>>> Thanks.
>>
>> *reverted this change.
>> We need to ref the packet, since we map its memory and let the GPU use
>> it directly without copying the contents. 6k16bit content at 24fps is
>> typically around 2Gbps when compressed, so avoiding copies is important.
> 
> How long does the hwaccel need this data?

Until the frame has been asynchronously decoded. We give out an output 
frame together with a semaphore that receivers must wait on to know when 
decoding has completed.

On the decoder side, the hardware has a fixed number of queues to which 
submissions can be sent asynchronously. We treat them as a ring buffer 
and keep a reference on our side to all resources for each submission, 
until we need to reuse the slot, at which point we wait for that frame's 
decoding to complete (which it usually already has) and release all 
resources used.

Output frames also carry a bit of state that has to be freed once the 
decoder marks the frame (by unreferencing it) as no longer needed as a 
reference; this is done in the FFHWAccel.free_frame_priv callback. 
There, we have to wait for the last internal use of the frame to finish 
(done via the vp->wait_semaphores() call in vulkan_decode.c).

This applies to both ASIC hardware decoders and the compute-shader-based 
implementation, since the two share the same code except for decode 
submissions.
