[FFmpeg-devel] [PATCH] Generic part of frame multithreading

Tue Aug 26 17:26:49 CEST 2008

On Tue, Aug 26, 2008 at 03:23:06AM -0400, Alexander Strange wrote:
>
> On Aug 20, 2008, at 2:26 PM, Michael Niedermayer wrote:
>
>> On Tue, Aug 19, 2008 at 10:55:21PM -0400, Alexander Strange wrote:
[...]
>> * your decoder returns "no frames" while it actually still has frames
>>  at EOF
>> -> If i understand this correctly its a bug in the affected decoder ...
>
> In this case, it returns whatever the codec's decode() would have returned 
> 3 frames ago (assuming 3 threads). Since h264 returns "no picture" if 
> passed a top field, it returns that. I think "no picture" isn't really the 
> same thing as "no frames", and if the threading dropped "no picture" 
> results from the codec it might confuse clients.

Frame threading can already confuse clients and clearly breaks the API/ABI
by introducing extra delay.
For example current API _requires_ DTS from the demuxer to match the
timestamps of each outputted frame from the decoder.
a quick grep shows no match for "dts" in your patch so it seems this hasnt
been considered or documented at all.

If decode_video would take AVPackets and AVFrame has a AVPacket field then
this may be trivial to solve ...

[...]
>>
>>
>> [...]
>>> Index: libavcodec/w32thread.c
>>> ===================================================================
>>> --- libavcodec/w32thread.c	(revision 14856)
>>> +++ libavcodec/w32thread.c	(working copy)
>>> @@ -104,7 +104,10 @@
>>>     ThreadContext *c;
>>>     uint32_t threadid;
>>>
>>> +    if(s->thread_opaque) return 0;
>>> +
>>
>> why is this needed?
>
> avcodec_thread_init needs to see the codec capabilities field, so I call it 
> from avcodec_open. This means it's called twice, so it needs to handle 

sounds like a hack ....

> that. Maybe this could be removed when slice+frame threading is supported, 
> but I've never understood why clients need to call avcodec_thread_init 
> instead of just setting thread_count.

Feel free to change it but dont hack it please!
Especially not with something like
if(opaque) return 0
and no comment at all, why such hack would be needed and be unavoidable

>
>>>     s->thread_count= thread_count;
>>> +    s->thread_algorithm= FF_THREAD_MULTISLICE;
>>
>> i suspect this should rather be a if( != ) error, at least when the user
>> explicitly requested frame threading
>
> Done.
>
>> [...]
>>> @@ -847,7 +852,17 @@
>>>     avctx->codec = codec;
>>>     avctx->codec_id = codec->id;
>>>     avctx->frame_number = 0;
>>> -    if(avctx->codec->init){
>>> +
>>> +    if (ENABLE_THREADS) {
>>> +        ret = avcodec_thread_init(avctx, avctx->thread_count);
>>> +        if (ret < 0) {
>>> +            av_freep(&avctx->priv_data);
>>> +            avctx->codec= NULL;
>>> +            goto end;
>>> +        }
>>> +    }
>>> +
>>
>>> +    if(avctx->codec->init && !USE_FRAME_THREADING(avctx)){
>>
>> iam against the use of weird macros, exceptions to this are only
>> when it improves readablity or speed. (it cannot improve speed in
>> functions that are irrelevant to speed)
>> And considering that this macro is not defined anywhere or a simple
>> grep is not enough to find it, as is it significnatly reduces
>> readablity as well.
>
> This expands to (ENABLE_THREADS && avctx->active_thread_algorithm == 
> FF_THREAD_MULTIFRAME). The first part isn't needed, but leads to a very 
> small optimization since we can have any codec threading code under if (0) 
> when threads aren't compiled in. This is only about 3% size reduction for 
> h264.o, so it's not really important - the only real reason is because the 
> real code is a bit long. I think the name of this macro is pretty clear, 
> but I can remove it if you want.

iam somewhat against the macro you can write
if(ENABLE_THREADS && avctx->frame_threads)
its almost as terse and much cleaner

It would even allow to control the exact amount of frame and slice threading
though thats seperate of course and not critical ...

>
>> [...]
>>> @@ -980,6 +998,7 @@
>>>
>>> int avcodec_close(AVCodecContext *avctx)
>>> {
>>> +    int threaded = USE_FRAME_THREADING(avctx);
>>>     entangled_thread_counter++;
>>>     if(entangled_thread_counter != 1){
>>>         av_log(avctx, AV_LOG_ERROR, "insufficient thread locking around 
>>> avcodec_open/close()\n");
>>> @@ -989,7 +1008,7 @@
>>>
>>>     if (ENABLE_THREADS && avctx->thread_opaque)
>>>         avcodec_thread_free(avctx);
>>> -    if (avctx->codec->close)
>>> +    if (avctx->codec->close && !threaded)
>>>         avctx->codec->close(avctx);
>>>     avcodec_default_free_buffers(avctx);
>>>     av_freep(&avctx->priv_data);
>>
>> why?
>
> codec close has to be called inside frame thread close. If it's called 
> before, another thread might be accessing it; if it's called after, it 
> might call some threading support function after the threading context has 
> been freed. This is avoidable, but I think the current way is simplest.

The current way is ugly, but when the alternaives are less simple then lets
leave it for now, maybe i can think of some solution later ...

>
>> and the the threaded var is redundant
>
> It was needed there (not obvious because of the missing header); I 
> simplified it.
>

>>
>>> @@ -1251,7 +1270,9 @@
>>>
>>> void avcodec_flush_buffers(AVCodecContext *avctx)
>>> {
>>> -    if(avctx->codec->flush)
>>> +    if(USE_FRAME_THREADING(avctx))
>>> +        ff_frame_thread_flush(avctx);
>>> +    else if(avctx->codec->flush)
>>>         avctx->codec->flush(avctx);
>>> }
>>
>> why not
>> if(frame_thraading)
>>    ff_frame_thread_flush(avctx);
>> if(avctx->codec->flush)
>>    avctx->codec->flush(avctx);
>> ?
>>
>> having ff_frame_thread_flush call avctx->codec->flush behind is not good 
>> for
>> sake of readablity
>
> Same reason as above; doing it that way will work, but because of how 
> release_buffer works with threading, the frames won't end up actually being 
> released for a while, which is surprising. Now that I look at it, maybe I 
> should comment that.

I really dislike this kind of special casing of the frame thread code
all over the place

Besides i do not understand why buffer releases are delayed
When each frame is finished it should be signalled to the user app with a
callback similar to release_buffer/get_buffer, the user app can then if it
needs to, place it in a que/mark it to prevent get_buffer from reusing it.
The default return_buffer() implementation could do something similar,
this wouldnt work with user apps that override get_buffer but not
return_buffer and enable frame threads, but then i really would say the
user app should be fixed instead of hacks being added to lavc.
Especially if these hacks cause performance loss ...

[...]
>>
>>>      * - encoding: unused
>>>      * - decoding: Set by user.
>>>      * @param height the height of the slice
>>> @@ -951,7 +969,7 @@
>>>      * Samples per packet, initialized when calling 'init'.
>>>      */
>>>     int frame_size;
>>> -    int frame_number;   ///< audio or video frame number
>>> +    int frame_number;   ///< Number of audio or video frames returned so 
>>> far
>>>     int real_pict_num;  ///< Returns the real picture number of previous 
>>> encoded frame.
>>>
>>>     /**
>>
>>> @@ -1166,6 +1184,7 @@
>>>      * If pic.reference is set then the frame will be read later by 
>>> libavcodec.
>>>      * avcodec_align_dimensions() should be used to find the required 
>>> width and
>>>      * height, as they normally need to be rounded up to the next 
>>> multiple of 16.
>>> +     * May be called by different threads, but not more than one at the 
>>> same time.
>>>      * - encoding: unused
>>>      * - decoding: Set by libavcodec., user can override.
>>>      */
>>> @@ -1175,6 +1194,7 @@
>>>      * Called to release buffers which were allocated with get_buffer.
>>>      * A released buffer can be reused in get_buffer().
>>>      * pic.data[*] must be set to NULL.
>>> +     * May be called by different threads, but not more than one at the 
>>> same time.
>>>      * - encoding: unused
>>>      * - decoding: Set by libavcodec., user can override.
>>>      */
>>
>> did i miss the similar change to reget_buffer() ?
>
> It seems like the purpose of reget_buffer() is to reuse the buffer with the 
> data from the last frame, for codecs that draw on top of it? If that's the 
> case, it's not compatible with frame threading.

hmm probably it wouldnt be trivial, lets forget about it ...

[...]
>>
>>> +static int submit_frame(PerThreadContext * volatile p, const uint8_t 
>>> *buf, int buf_size)
>>> +{
>>> +    FrameThreadContext *fctx = p->parent;
>>> +    PerThreadContext *prev_thread = fctx->prev_thread;
>>> +    AVCodec *codec = p->context->codec;
>>> +    int err = 0;
>>> +
>>> +    if (!buf_size && !(codec->capabilities & CODEC_CAP_DELAY)) return 0;
>>> +
>>> +    pthread_mutex_lock(&p->mutex);
>>> +    if (prev_thread) err = update_context_from_copy(p->context, 
>>> prev_thread->context, 0);
>>> +    if (err) return err;
>>> +
>>> +    p->buf = av_fast_realloc(p->buf, &p->allocated_buf_size, buf_size);
>>> +    memcpy(p->buf, buf, buf_size);
>>
>> copying every frame does not seem particularly reasonable to me, 
>> considering
>> that it should be easy for most user apps to keep a few buffers around or
>> if they cant they can still copy ...
>>
>> This also reminds me that the lavc API should be changed to use AVPackets
>> as input to the decode functions. This would allow av_packet_dup() to be
>> used which would just copy packets when needed ...
>
> Well, with the current API I can't guarantee anything about buffers so I 
> did it the safe way. CODEC_FLAG_INPUT_PRESERVED exists but doesn't cover 
> quite enough, and I don't see many users for it (only vlc) in google code 
> search. I can fix it if the API is better or if more clients can guarantee 
> not freeing them.

well if this isnt fixed than it should at least be very clearly marked as
something that needs to be fixed.

[...]
> @@ -775,6 +779,8 @@
>  
>      s->palctrl = NULL;
>      s->reget_buffer= avcodec_default_reget_buffer;
> +
> +    s->active_thread_algorithm= FF_THREAD_DEFAULT;
>  }
>  
>  AVCodecContext *avcodec_alloc_context2(enum CodecType codec_type){

This can be set through AVOptions default i think

[...]
> +int ff_get_buffer(AVCodecContext *avctx, AVFrame *f)
> +{
> +    int ret, *progress;
> +    PerThreadContext *p = avctx->thread_opaque;
> +
> +    f->owner = avctx;

> +    f->thread_opaque = progress = av_malloc(sizeof(int));

What is the sense behind this malloc?
also thread_opaque is clearly not the correct field semantically to keep
track of decoding progress of each frame/field

[...]
> +/// Set the threading algorithm used, or none if an algorithm was set but no thread count.
> +static void ff_validate_thread_parameters(AVCodecContext *avctx)
> +{

when its static then it doesnt need a ff_

[..]
> Restrictions on clients
> ==============================================
> 
> Slice threading -
> * If the client uses draw_horiz_band, it must handle it being called from
> separate threads.
> 
> Frame threading -
> * Restrictions with slice threading also apply.

> * get_buffer and release_buffer will be called by separate threads, but are
>   protected by a mutex.

... "so the functions do not need to be reentrant."

> * There is one frame of delay added for every thread. Use of reordered_opaque
>   will help with A/V sync problems, and clients should not assume that no frame
>   being returned after all frames have been submitted means there are no frames
>   left.

this one needs more discussion. As is it breaks API/ABI and actual applications
so it will have to be changed or you will have to provide some solid evidence
why the other way around would be worse or tricky.

> 
> Restrictions on codecs
> ==============================================
> 
> Slice threading -
> None.
> 
> Frame threading -

> * Relying on previous contents of buffers no longer works. This includes using
>   reget_buffer() and not copying skipped MBs.

elaborate on skipped MBs please.
I do not see how they would be affected by frame threading

> * Error resilience may not work as well, or deterministically.

It has to be fixed to work (yes this is a reason to reject the patch),
about not deterministically that can be discussed, i dont see immedeatly
why it would loose deterministicity but i woundnt mind if fixing it is hard

> * Accepting randomly truncated packets (CODEC_FLAG_TRUNCATED) no longer works.

thats ok

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have often repented speaking, but never of holding my tongue.
-- Xenocrates
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080826/8c25f8dc/attachment.pgp>