[FFmpeg-devel] [RFC] threading API

Reimar Döffinger Reimar.Doeffinger
Tue Oct 6 16:34:12 CEST 2009


On Tue, Oct 06, 2009 at 03:23:14PM +0200, Michael Niedermayer wrote:
> On Tue, Oct 06, 2009 at 03:13:46PM +0200, Reimar Döffinger wrote:
> > On Tue, Oct 06, 2009 at 01:52:47PM +0200, Michael Niedermayer wrote:
> > > > On Tue, Oct 06, 2009 at 09:17:30AM +0200, Reimar Döffinger wrote:
> > > > > On Tue, Oct 06, 2009 at 09:01:15AM +0200, Reimar Döffinger wrote:
> > > > > The latter is only a very minor optimization, and if an API change is
> > > > > to be avoided at all costs, the former can also be achieved by adding
> > > > > a set_thread_number callback to AVCodecContext that, if set, is run
> > > > > before each call of the main worker function with the same void *arg, i.e.
> > > > > void (*set_thread_number)(struct AVCodecContext *c, void *arg, int thread)
> > > > > 
> > > > > So, what are the opinions? Other ideas?
> > > > 
> > > > Actually, a better idea: add a
> > > > void *(*get_worker_arg)(struct AVCodecContext *c, void *arg_array, int arg_size,
> > > > int arg_elem_size, int jobnr, int threadnr)
> > > > where the arg_* parameters match the avctx->execute (arg2, count, size)
> > > > arguments.
> > > 
> > > > The default implementation would then be
> > > > default_get_worker_arg:
> > > > return (char*)arg_array + jobnr*arg_elem_size;
> > > > whereas dnxhdenc would use something like
> > > > DNXHDContext *c = ((DNXHDContext **)arg_array)[threadnr];
> > > > c->start_slice = jobnr;
> > > > return c;
> > > 
> > > I see no reason why we should ever do
> > > return (char*)arg_array + jobnr*arg_elem_size;
> > > it's always better to reuse contexts.
> > 
> > Well, reason 1: it allows the existing code to be changed gradually.
> > Reason 2: some codecs may not need a separate context for each
> > thread anyway; they can all use the same AVCodecContext (and its
> > priv_data), but they still need some per-task data.
> 
> maybe we should have 2 arrays, one with per-thread and one with per-task data, and either can be NULL

Again, that requires an API change. Though thinking about it again, my
approach isn't really possible without one either...
Still, at least it wouldn't require changing all decoders/encoders at
once.
Apart from that, it seems less flexible than simply having a single
void * that is the same for every call and passing the thread and task
numbers along.
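
Roughly, the field I have in mind would look like this (just a sketch,
none of the names are meant as a final proposal):

/* the worker gets one opaque pointer that is the same for every call,
 * plus the job and thread number, and fetches whatever per-thread or
 * per-task data it needs from that itself */
int (*execute_mt)(struct AVCodecContext *c,
                  int (*func)(struct AVCodecContext *c, void *arg,
                              int jobnr, int threadnr),
                  void *arg, int *ret, int count);

A codec like dnxhdenc could then keep one context per thread reachable
through arg, use threadnr to pick it and jobnr as the slice to start at.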

> > I think that is what the current implementation was intended for,
> > and it seems like a useful feature to me.
> 
> the current implementation was mostly aimed at threads == tasks. It has its
> advantages if one assumes that the OS keeps each thread on the same CPU:
> if the image is split in 2, the top is always handled on CPU #1 and the
> bottom always on CPU #2, and you get an advantage due to better cache use
> for P and B frames

You'd also have to assume that thread 1 gets task 1, for which I think
there is no reason at all in the current code (I haven't checked,
though). Also, with low_delay (or in general without B-frames) only the
B-frames get any advantage, unless you assume the decoded data isn't
(immediately) used by the application.
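
Just to illustrate why I doubt there is any fixed mapping: a typical
pull-based worker loop looks roughly like the sketch below (purely
illustrative, not the actual libavcodec pthread code, all names made
up), and nothing in it ties job N to thread N; jobs are simply grabbed
in whatever order the threads happen to get scheduled.

#include <pthread.h>

typedef struct ThreadPool {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int next_job, job_count, die;
    int (*func)(struct AVCodecContext *c, void *arg);
    struct AVCodecContext *avctx;
    char *args;
    int arg_elem_size;
} ThreadPool;

/* each worker repeatedly grabs the next unclaimed job index; which
 * thread ends up with which job depends entirely on OS scheduling */
static void *worker(void *v)
{
    ThreadPool *p = v;
    int job;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->next_job >= p->job_count && !p->die)
            pthread_cond_wait(&p->cond, &p->lock);
        if (p->die) {
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        job = p->next_job++;
        pthread_mutex_unlock(&p->lock);
        p->func(p->avctx, p->args + job * p->arg_elem_size);
    }
}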


