[FFmpeg-devel] [PATCH] Generic part of frame multithreading

Michael Niedermayer michaelni
Wed Aug 20 20:26:32 CEST 2008


On Tue, Aug 19, 2008 at 10:55:21PM -0400, Alexander Strange wrote:
> This patch adds the new API and generic implementation for frame 
> multithreading. It doesn't cause any behavior changes, since it's just more 
> functions with no users yet.
>
> Some issues:

> - No support for field pictures or slice+frame threading. Without these 
> there'll be some speed regressions for some TV streams, especially for 
> PAFF. I'll try to get the first one done soon, as it's pretty easy.

I assume this regression is limited to when threading is enabled? So that
there is no regression when it's disabled?


> - The thread.h functions with "decode" in the name need to be renamed, 
> since they'll be used for encoding as well. I haven't thought of a better 
> name for "ff_report_predecode_done" that isn't too vague yet.

> - https://roundup.mplayerhq.hu/roundup/ffmpeg/issue590

Let's see if I understand this correctly:
* user apps never feed NULL into the decode function except at EOF
  before as well as after your patch
* user apps feed NULL into the decoder to flush out frames at EOF
* your decoder returns "no frames" while it actually still has frames
  at EOF
-> If I understand this correctly, it's a bug in the affected decoder ...


> - Some clients can't handle the extra decoder delay right, so I attached 
> another patch that disables it by default. I don't know whether it's a good 
> idea since we won't know when it's safe to to make it default again.

> - I'm not sure the documentation is good enough.

what documentation? ;)

Besides that, I wouldn't mind if you could describe how it works from a
high-level point of view ...
(it could help me with the review, as well as anyone who later wants to do
something with it)

[...]
> Index: libavcodec/w32thread.c
> ===================================================================
> --- libavcodec/w32thread.c	(revision 14856)
> +++ libavcodec/w32thread.c	(working copy)
> @@ -104,7 +104,10 @@
>      ThreadContext *c;
>      uint32_t threadid;
>  
> +    if(s->thread_opaque) return 0;
> +

why is this needed?


>      s->thread_count= thread_count;
> +    s->thread_algorithm= FF_THREAD_MULTISLICE;

I suspect this should rather be an if( != ) error, at least when the user
explicitly requested frame threading.


[...]


> @@ -847,7 +852,17 @@
>      avctx->codec = codec;
>      avctx->codec_id = codec->id;
>      avctx->frame_number = 0;
> -    if(avctx->codec->init){
> +
> +    if (ENABLE_THREADS) {
> +        ret = avcodec_thread_init(avctx, avctx->thread_count);
> +        if (ret < 0) {
> +            av_freep(&avctx->priv_data);
> +            avctx->codec= NULL;
> +            goto end;
> +        }
> +    }
> +

> +    if(avctx->codec->init && !USE_FRAME_THREADING(avctx)){

I am against the use of weird macros; the only exceptions are when they
improve readability or speed. (They cannot improve speed in functions
that are irrelevant to speed.)
And considering that this macro is not defined anywhere, or at least a
simple grep is not enough to find it, as is it significantly reduces
readability as well.


[...]
> @@ -980,6 +998,7 @@
>  
>  int avcodec_close(AVCodecContext *avctx)
>  {
> +    int threaded = USE_FRAME_THREADING(avctx);
>      entangled_thread_counter++;
>      if(entangled_thread_counter != 1){
>          av_log(avctx, AV_LOG_ERROR, "insufficient thread locking around avcodec_open/close()\n");
> @@ -989,7 +1008,7 @@
>  
>      if (ENABLE_THREADS && avctx->thread_opaque)
>          avcodec_thread_free(avctx);
> -    if (avctx->codec->close)
> +    if (avctx->codec->close && !threaded)
>          avctx->codec->close(avctx);
>      avcodec_default_free_buffers(avctx);
>      av_freep(&avctx->priv_data);

why?
and the threaded var is redundant


> @@ -1251,7 +1270,9 @@
>  
>  void avcodec_flush_buffers(AVCodecContext *avctx)
>  {
> -    if(avctx->codec->flush)
> +    if(USE_FRAME_THREADING(avctx))
> +        ff_frame_thread_flush(avctx);
> +    else if(avctx->codec->flush)
>          avctx->codec->flush(avctx);
>  }

why not
if(frame_threading)
    ff_frame_thread_flush(avctx);
if(avctx->codec->flush)
    avctx->codec->flush(avctx);
?

Having ff_frame_thread_flush() call avctx->codec->flush behind the scenes
is not good for the sake of readability.


[...]

> @@ -767,7 +771,21 @@
>       * - encoding: Set by user.\
>       * - decoding: Set by libavcodec.\
>       */\
> -    int8_t *ref_index[2];
> +    int8_t *ref_index[2];\
> +\
> +    /**\
> +     * context owning the internal buffers\
> +     * - encoding: Set by libavcodec.\
> +     * - decoding: Set by libavcodec.\
> +     */\
> +    struct AVCodecContext *avctx;\

I'd say this needs some more extensive documentation, and likely a better
variable name as well.
When there are several AVCodecContexts, avctx just isn't a good name
anymore.


> +\
> +    /**\
> +     * Can be used by multithreading to track frames\
> +     * - encoding: Set by libavcodec.\
> +     * - decoding: Set by libavcodec.\
> +     */\
> +    void *thread_opaque;
>  
>  #define FF_QSCALE_TYPE_MPEG1 0
>  #define FF_QSCALE_TYPE_MPEG2 1

> @@ -923,7 +941,7 @@
>       * If non NULL, 'draw_horiz_band' is called by the libavcodec
>       * decoder to draw a horizontal band. It improves cache usage. Not
>       * all codecs can do that. You must check the codec capabilities
> -     * beforehand.
> +     * beforehand. May be called by different threads at the same time.

... when frame threading is enabled


>       * - encoding: unused
>       * - decoding: Set by user.
>       * @param height the height of the slice
> @@ -951,7 +969,7 @@
>       * Samples per packet, initialized when calling 'init'.
>       */
>      int frame_size;
> -    int frame_number;   ///< audio or video frame number
> +    int frame_number;   ///< Number of audio or video frames returned so far
>      int real_pict_num;  ///< Returns the real picture number of previous encoded frame.
>  
>      /**

> @@ -1166,6 +1184,7 @@
>       * If pic.reference is set then the frame will be read later by libavcodec.
>       * avcodec_align_dimensions() should be used to find the required width and
>       * height, as they normally need to be rounded up to the next multiple of 16.
> +     * May be called by different threads, but not more than one at the same time.
>       * - encoding: unused
>       * - decoding: Set by libavcodec., user can override.
>       */
> @@ -1175,6 +1194,7 @@
>       * Called to release buffers which were allocated with get_buffer.
>       * A released buffer can be reused in get_buffer().
>       * pic.data[*] must be set to NULL.
> +     * May be called by different threads, but not more than one at the same time.
>       * - encoding: unused
>       * - decoding: Set by libavcodec., user can override.
>       */

Did I miss the similar change to reget_buffer()?


> @@ -2230,6 +2250,25 @@
>       * - decoding: Set by user.
>       */
>      float drc_scale;
> +
> +    /**
> +     * Whether this is a copy of the context which had init() called on it.
> +     * This is used by multithreading - shared tables and picture pointers
> +     * should be freed from the original context only.
> +     * - encoding: Set by libavcodec.
> +     * - decoding: Set by libavcodec.
> +     */
> +    int is_copy;
> +

> +    /**
> +     * Which multithreading method to use, for codecs that support more than one.
> +     * - encoding: Set by user, otherwise the default is used.
> +     * - decoding: Set by user, otherwise the default is used.
> +     */
> +    int thread_algorithm;

> +#define FF_THREAD_AUTO       0 //< Use the default. Will be changed by libavcodec after init.

The "Will be changed" part is not so great ...
Having lavc change variables the user set is something exceptional, not
something to use for handling "defaults", IMO.
Besides, DEFAULT is a better term than AUTO, I think.


> +#define FF_THREAD_MULTIFRAME 1 //< Decode more than one frame at once
> +#define FF_THREAD_MULTISLICE 2 //< Decode more than one part of a single frame at once

with
#define FF_THREAD_MULTIFRAME 1
#define FF_THREAD_MULTISLICE 2
#define FF_THREAD_AUTO       3

a simple thread_algorithm & FF_THREAD_MULTIFRAME could be used


>  } AVCodecContext;
>  
>  /**
> @@ -2271,6 +2310,14 @@
>      const char *long_name;
>      const int *supported_samplerates;       ///< array of supported audio samplerates, or NULL if unknown, array is terminated by 0
>      const enum SampleFormat *sample_fmts;   ///< array of supported sample formats, or NULL if unknown, array is terminated by -1
> +
> +    /**
> +     * @defgroup multithreading Multithreading support functions.
> +     * @{
> +     */
> +    int (*init_copy)(AVCodecContext *);     ///< called after copying initially, re-allocate all writable tables
> +    int (*update_context)(AVCodecContext *, AVCodecContext *from); ///< copy everything needed from the last thread before every new frame
> +    /** @} */
>  } AVCodec;

I do think they need a more precise and elaborate description.
Besides, something tells me a
clone(AVCodecContext *new, AVCodecContext *old)
would be better than a memcpy(); init_copy(dst)

[...]
> +typedef struct PerThreadContext {
> +    pthread_t thread;
> +    pthread_cond_t input_cond, progress_cond, output_cond;
> +    pthread_mutex_t mutex, progress_mutex, buffer_mutex;
> +

> +    AVCodecContext *context;

And here I think avctx would be better than context.


> +
> +    uint8_t *buf;
> +    int buf_size, allocated_buf_size;
> +
> +    AVFrame picture;
> +    int got_picture, output_res;
> +
> +    struct FrameThreadContext *parent;
> +
> +    int decode_progress;
> +
> +    enum {
> +        STATE_INPUT_READY,
> +        STATE_PREDECODING,
> +        STATE_DECODING
> +    } state;
> +
> +    AVFrame released_buffers[MAX_DELAYED_RELEASED_BUFFERS];
> +    int nb_released_buffers;
> +} PerThreadContext;
> +
> +typedef struct FrameThreadContext {
> +    PerThreadContext *threads;
> +    PerThreadContext *prev_thread;
> +
> +    int next_available, next_ready;
> +    int delaying;
> +
> +    int die;
> +} FrameThreadContext;

Please document all structures and all structure members.


[...]
> @@ -105,6 +146,9 @@
>      ThreadContext *c= avctx->thread_opaque;
>      int dummy_ret;
>  
> +    if (!USE_AVCODEC_EXECUTE(avctx) || avctx->thread_count <= 1)
> +        return avcodec_default_execute(avctx, func, arg, ret, job_count);
> +

Another one of these opaque macros :/
If only I could remember the address of the git repo to find out if it
maybe is defined there ...


[...]
> +
> +static void handle_delayed_releases(PerThreadContext *p)
> +{
> +    while (p->nb_released_buffers > 0)
> +        ff_release_buffer(&p->released_buffers[--p->nb_released_buffers]);
> +}
> +

Please document all functions.


> +static int submit_frame(PerThreadContext * volatile p, const uint8_t *buf, int buf_size)
> +{
> +    FrameThreadContext *fctx = p->parent;
> +    PerThreadContext *prev_thread = fctx->prev_thread;
> +    AVCodec *codec = p->context->codec;
> +    int err = 0;
> +
> +    if (!buf_size && !(codec->capabilities & CODEC_CAP_DELAY)) return 0;
> +
> +    pthread_mutex_lock(&p->mutex);
> +    if (prev_thread) err = update_context_from_copy(p->context, prev_thread->context, 0);
> +    if (err) return err;
> +
> +    p->buf = av_fast_realloc(p->buf, &p->allocated_buf_size, buf_size);
> +    memcpy(p->buf, buf, buf_size);

Copying every frame does not seem particularly reasonable to me, considering
that it should be easy for most user apps to keep a few buffers around, or,
if they can't, they can still copy ...

This also reminds me that the lavc API should be changed to use AVPackets
as input to the decode functions. That would allow av_packet_dup() to be
used, which would copy packets only when needed ...


[...]
> +int ff_decode_frame_threaded(AVCodecContext *avctx,
> +                             void *data, int *data_size,
> +                             const uint8_t *buf, int buf_size)
> +{
> +    FrameThreadContext *fctx;
> +    PerThreadContext * volatile p;
> +    int thread_count = avctx->thread_count, err = 0;
> +
> +    if (!avctx->thread_opaque) ff_frame_thread_init(avctx);
> +
> +    fctx = avctx->thread_opaque;
> +    p = &fctx->threads[fctx->next_available];
> +    update_context_from_user(p->context, avctx);
> +    err = submit_frame(p, buf, buf_size);
> +
> +    if (err) return err;
> +
> +    fctx->next_available++;
> +
> +    if (fctx->delaying) {
> +        if (fctx->next_available >= (thread_count-1)) fctx->delaying = 0;
> +
> +        *data_size=0;
> +        return 0;
> +    }
> +
> +    p = &fctx->threads[fctx->next_ready++];
> +
> +    pthread_mutex_lock(&p->progress_mutex);

> +    while (p->state != STATE_INPUT_READY) pthread_cond_wait(&p->output_cond, &p->progress_mutex);

too much on one line ...


[...]
> +void ff_await_decode_progress(AVFrame *f, int n)
> +{

A non-static function that is not used anywhere, not mentioned in any
header, and not documented.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
