[FFmpeg-devel] [PATCH] AVHWAccel infrastructure v2 (take 6) [ping^3]

Sat Oct 30 00:32:31 CEST 2010

Hi,

Le 29 oct. 10 ? 23:45, Michael Niedermayer a ?crit :

> On Fri, Oct 29, 2010 at 02:52:41PM +0200, Gwenole Beauchesne wrote:
>> Hi,
>>
>> Here is a new patch that applies to current SVN tree with some  
>> further
>> cosmetics (comments) added.
>>
>> Regards,
>> Gwenole.
>>
>> On Mon, 20 Sep 2010, Gwenole Beauchesne wrote:
>>
>>> It seems I didn't reply to this one or the message got lost  
>>> because I
>>> had a v5 lying around for some time. Here is a v6 anyway, that
>>> intersects with the tidsp needs.
>>>
>>> On Mon, 22 Feb 2010, Michael Niedermayer wrote:
>>>
>>>> On Mon, Jan 04, 2010 at 04:39:02PM +0100, Gwenole Beauchesne wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, 4 Jan 2010, Michael Niedermayer wrote:
>>>>>
>>>>>>> In the new model I wanted to use, hwaccel_context was to be  
>>>>>>> solely
>>>>>>> maintained by libavcodec. However, it still needed a way to  
>>>>>>> get some
>>>>>>> hwaccel context initialized by the user-application. e.g. a  
>>>>>>> VADisplay
>>>>>>> (void
>>>>>>> *), an LPDIRECTXVIDEODECODER, etc.
>>>>>>>
>>>>>>> I felt interesting to pass this information through hwaccel  
>>>>>>> attrs along
>>>>>>> with other int attributes.
>>>>>>
>>>>>> We have existing means to pass information, that is through
>>>>>> AVCodecContext
>>>>>> and using AVOption where appropriate
>>>>>> As well as using AVFrame instead of AVCodecContext where its  
>>>>>> per frame
>>>>>> data
>>>>>
>>>>> OK, I nuked the hwaccel attrs and replaced it with
>>>>> AVCodecContext.hwaccel_{id,flags}. hwaccel_flags is handled by  
>>>>> AVOption. I
>>>>> left hwaccel_id as is to match existing practise (pix_fmt et al.).
>>>>>
>>>>> I haven't tested with MPlayer yet but this new approach still  
>>>>> works
>>>>> with my
>>>>> modified ffplay.
>>>>>
>>>>> So, HW accelerator selection works as is:
>>>>> - Set AVCodecContext.hwaccel_id to the desired accelerator
>>>>> - Set AVCodecContext.hwaccel_flags to HWACCEL_CAP_GET_PIXELS if  
>>>>> the user
>>>>> wants AVFrame.data[0-2] populated with YV12 or NV12 pixels
>>>>>
>>>>> This means we can handle multiple accelerators automatically at  
>>>>> once.
>>>>> Anyway, this lived in an ideal world without practical use since  
>>>>> the
>>>>> user-application would still have to handle the HW accelerator
>>>>> specifically.
>>>>>
>>>>> Regards,
>>>>> Gwenole.
>>>>
>>>> ]...]
>>>>> @@ -2497,7 +2498,7 @@ typedef struct AVCodecContext {
>>>>>      * provided by the user. In that case, this holds display- 
>>>>> dependent
>>>>>      * data FFmpeg cannot instantiate itself. Please refer to the
>>>>>      * FFmpeg HW accelerator documentation to know how to fill  
>>>>> this
>>>>> -     * is. e.g. for VA API, this is a struct vaapi_context.
>>>>> +     * is. e.g. for VA API, this is a VADisplay.
>>>>>      * - encoding: unused
>>>>>      * - decoding: Set by user
>>>>>      */
>>>>
>>>> hunk ok probably
>>>>
>>>> [...]
>>>>> +    /**
>>>>> +     * Called in codec initialization.
>>>>> +     *
>>>>> +     * This is used to initialize the  
>>>>> AVCodecContext.hwaccel_context
>>>>> +     * hardware accelerator context.
>>>>> +     *
>>>>> +     * @param avctx the codec context
>>>>> +     * @return zero if successful, a negative value otherwise
>>>>> +     */
>>>>> +    int (*init)(AVCodecContext *avctx);
>>>>> +
>>>>> +    /**
>>>>> +     * Called in codec finalization.
>>>>> +     *
>>>>> +     * This is used to clean-up any data from the hardware  
>>>>> accelerator
>>>>> +     * context. Should this function be implemented, it shall  
>>>>> reset
>>>>> +     * AVCodecContext.hwaccel_context to NULL.
>>>>> +     *
>>>>> +     * @param avctx the codec context
>>>>> +     * @return zero if successful, a negative value otherwise
>>>>> +     */
>>>>> +    int (*close)(AVCodecContext *avctx);
>>>>> +
>>>>> +    /**
>>>>> +     * Called at the beginning of each frame to allocate a HW  
>>>>> surface.
>>>>> +     *
>>>>> +     * The returned surface will be stored in Picture.data[3].
>>>>> +     *
>>>>> +     * @param avctx the codec context
>>>>> +     * @param surface the pointer to the HW accelerator surface
>>>>> +     * @return zero if successful, a negative value otherwise
>>>>> +     */
>>>>> +    int (*alloc_surface)(AVCodecContext *avctx, uint8_t  
>>>>> **surface);
>>>>> +
>>>>> +    /**
>>>>> +     * Called to free the HW surface that was allocated with
>>>>> alloc_surface().
>>>>> +     *
>>>>> +     * @param avctx the codec context
>>>>> +     * @param surface the HW accelerator surface
>>>>> +     * @return zero if successful, a negative value otherwise
>>>>> +     */
>>>>> +    void (*free_surface)(AVCodecContext *avctx, uint8_t  
>>>>> *surface);
>>>>> } AVHWAccel;
>>>>
>>>> iam not in favor of these. What are these good for?
>>>> Why are they needed?
>>>
>>> This is used to allocate the HW surfaces (VdpSurface, VASurface,  
>>> etc.)
>>> from within FFmpeg. i.e. the AVFrame.data[3] part, hence the  
>>> uint8_t *
>>> instead of the void *.
>>
>> Those functions are also used to help user implement
>> AVCodecContext.get_buffer() and release_buffer() if he wants to do so
>> manually.
>>
>> e.g. by default, AVCodecContext.get_buffer() will point to
>> avcodec_default_get_buffer(). If a user overrides that function, he  
>> will
>> lose automatic data[3] handling. So, two possible cases:
>> 1) He really knows how to do so, so he allocates the HW surface  
>> himself
>> 2) He can delegate that to FFmpeg that will know how to do that  
>> just fine
>> too. e.g. calling the right vaCreateSurfaces(),  
>> VdpVideoSurfaceCreate(),
>> etc.
>
> the user can call the default get/release buffer. And if the user  
> doesnt know
> how to allocate the surface what is he doing with the functions  
> thats their
> purpose. Your explanations are
> so incomplete and terse that i cannot really comment. Because i do  
> not know
> what problem this solves or where this problem would arrise

I want the default avcodec_default_get_buffer() allocate HW surfaces  
(Picture.data[3]) itself the same way it would allocate memory for SW  
decoding (Picture.data[0-2]). Since the way a HW surface is allocated  
is HW-dependent, this needs per-accelerator hooks.

There are two cases to consider:
(a) The user requested the decoded pixels back to Picture.data[0-2]  
(AV_HWACCEL_CAP_GET_PIXELS), thus we need that along with the  
Picture.data[3].
(b) The user wants to retain the decoded pixels in GPU memory and  
avoid a round-trip GPU -> CPU -> GPU (for presentation). Thus,  
data[0-2] doesn't need to be allocated but Picture.data[3] must still  
be allocated, the HW surface.

In the V6 revision of the patch, I proposed a way to account for both.  
i.e. if HW accelerator is enabled, only allocate the Picture.data[0-2]  
if the user requested for it through the AV_HWACCEL_CAP_GET_PIXELS  
flag. Otherwise, don't allocate anything in there. This can save up to  
50 MB of memory. e.g. 16x 1080p reference frames.

Now, imagine the user wants to use his own version of get_buffer() to  
e.g. manage reordered_opaque or whatever other field. In his  
get_buffer() for SW decoding, he would initially have:

/* SW */
pic->type = FF_BUFFER_TYPE_USER;
pic->reordered_opaque = avctx->reordered_opaque;
// allocate pic->data[0-2] planes

/* HW: case (a) */
pic->type = FF_BUFFER_TYPE_USER;
pic->reordered_opaque = avctx->reordered_opaque;
// allocate pic->data[0-2] planes with his own memory
if (avctx->hwaccel && avctx->hwaccel->alloc_surface)
   pic->data[3] = avctx->hwaccel->alloc_surface(avctx);

/* HW: case (b) */
pic->type = FF_BUFFER_TYPE_USER;
pic->reordered_opaque = avctx->reordered_opaque;
if (avctx->hwaccel) {
   pic->data[0..2] = NULL;
   pic->data[3] = avctx->hwaccel->alloc_surface(avctx);
} else {
// allocate pic->data[0-2] planes with his own memory
}

Now, if we don't have the {alloc/free}_surface() functions, we would  
need to:

/* HW: case (b) */
pic->type = FF_BUFFER_TYPE_USER;
pic->reordered_opaque = avctx->reordered_opaque;
if (avctx->hwaccel) {
   switch (avctx->hwaccel_id) {
   case HWACCEL_ID_API_1: pic->data[3] =  
allocate_hw_surface_for_API_1(avctx); break;
   case HWACCEL_ID_API_2: pic->data[3] =  
allocate_hw_surface_for_API_2(avctx); break;
   // ...
   }
} else {
// allocate pic->data[0-2] planes with his own memory
}

i.e. mass code duplication accross players with plenty of cases to  
chose (and implement) for the favorite API the end-user might want. As  
Reimar pointed out, the end effect of hwaccel should be that only as  
few as 10 lines are needed in the user applications to actually use  
hwaccel.

Regards,
Gwenole.