[FFmpeg-devel] [RFC] threading API

Reimar Döffinger Reimar.Doeffinger
Thu Oct 8 09:31:02 CEST 2009

On Wed, Oct 07, 2009 at 06:13:46PM +0200, Michael Niedermayer wrote:
> On Tue, Oct 06, 2009 at 04:34:12PM +0200, Reimar Döffinger wrote:
> > On Tue, Oct 06, 2009 at 03:23:14PM +0200, Michael Niedermayer wrote:
> > > maybe we should have 2 arrays, 1 per thread, 1 per task, and they can be NULL
> > 
> > Again, that requires an API change. Though on thinking again my approach
> > isn't really possible without either...
> i have no matches in ffmpeg.c for execute() ...

What does that have to do with it? The documentation for execute says:
     * The user may replace this with some multithreaded implementation,
     * the default implementation will execute the parts serially.
     * @param count the number of things to execute
     * - encoding: Set by libavcodec, user can override.
     * - decoding: Set by libavcodec, user can override.

Apart from the "default implementation will execute the parts serially"
which is wrong anyway, execute can be set by users, thus changing it is
an API change.
Of course it would be possible to just add an execute2 function.

> > Apart from that it seems less flexible than just having a void * that is
> > the same for all and passing thread and task number.
> iam fine with that as well

I'll see if/when I'll find the time to implement a proposal using a new
execute2 function.
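
To make the idea concrete, here is a rough sketch of what such a hook
could look like: one void * shared by all jobs, with the job and thread
number passed to the callback. The names job_func, execute2_serial and
square_job are made up for illustration and the context is a plain
void * rather than the real codec context, so do not read this as the
actual libavcodec API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: one shared argument for all jobs, plus the
 * job number and the number of the thread that runs it. */
typedef int (*job_func)(void *ctx, void *arg, int jobnr, int threadnr);

/* Hypothetical serial fallback for an execute2-style hook: runs count jobs
 * in order as "thread 0" and stores each return value in ret if non-NULL. */
int execute2_serial(void *ctx, job_func func, void *arg, int *ret, int count)
{
    assert(func != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        int r = func(ctx, arg, i, 0);
        if (ret)
            ret[i] = r;
    }
    return 0;
}

/* Example job: writes the square of its job number into a shared array. */
int square_job(void *ctx, void *arg, int jobnr, int threadnr)
{
    int *out = arg;
    (void)ctx;
    (void)threadnr;
    out[jobnr] = jobnr * jobnr;
    return jobnr;
}
```

A threaded implementation would keep the same signature and only differ
in which threadnr each job sees, so per-thread scratch data can be
indexed by threadnr and per-task data by jobnr through the shared
pointer.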

> > You'd have to also assume that thread 1 also gets task 1, for which I
> > think there is no reason at all with the current code (haven't checked
> > though).
> i think thread 1 gets task 1 if not that would be a bug IMO

That is not how any of the threading APIs are designed. It is of course
possible to hack this in manually, but that will probably cost
performance, e.g. due to causing excessive context switches.
E.g. for pthreads the current code works like this:
whichever thread the scheduler happens to wake up first will get the
first task.
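
A simplified sketch of that pattern, not the actual libavcodec code:
all workers pull from one shared task counter under a mutex, so the
mapping from task to thread depends purely on scheduling. Pool, Worker,
worker_main, run_tasks and MAX_THREADS are names invented for this
example.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

/* Hypothetical pool state: a single task counter guarded by a mutex. */
typedef struct {
    pthread_mutex_t lock;
    int next_task;   /* next unclaimed task number */
    int task_count;
    int *owner;      /* owner[t] = number of the thread that ran task t */
} Pool;

typedef struct {
    Pool *pool;
    int threadnr;
} Worker;

void *worker_main(void *p)
{
    Worker *w = p;
    Pool *pool = w->pool;

    for (;;) {
        /* Whoever gets the lock first takes the lowest unclaimed task. */
        pthread_mutex_lock(&pool->lock);
        int task = pool->next_task < pool->task_count ? pool->next_task++ : -1;
        pthread_mutex_unlock(&pool->lock);
        if (task < 0)
            break;
        pool->owner[task] = w->threadnr; /* stand-in for the real work */
    }
    return NULL;
}

/* Runs task_count tasks on thread_count threads; returns 0 on success. */
int run_tasks(int thread_count, int task_count, int *owner)
{
    pthread_t tid[MAX_THREADS];
    Worker workers[MAX_THREADS];
    Pool pool = { .next_task = 0, .task_count = task_count, .owner = owner };
    int started = 0, err = 0;

    assert(thread_count > 0 && thread_count <= MAX_THREADS);
    if (pthread_mutex_init(&pool.lock, NULL))
        return -1;
    for (; started < thread_count; started++) {
        workers[started] = (Worker){ &pool, started };
        if (pthread_create(&tid[started], NULL, worker_main, &workers[started])) {
            err = -1;
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&pool.lock);
    return err;
}
```

After run_tasks returns, every entry of owner holds some valid thread
number, but nothing guarantees owner[1] == 1; one thread may even end
up running all tasks.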

> > Also with low_delay (or in general without B-frames) only the
> > B-frames get any advantage unless you assume the decoded data isn't
> > (immediately) used by the application.
> the app could also have 2 threads on the same cpus or whatever to
> use the data split to maintain cache efficiency ...

So the app is supposed to hack into the kernel (well, there might be an
API) to find out on which CPU thread 1 of FFmpeg runs and then explicitly
schedule its own threads to run there too, and of course check all the
time whether the kernel moved them around? (E.g. Windows before 7 swapped
threads around CPUs all the time.)
Yes, this can be implemented, but it is not done now.
But this can also be done for a slice-based approach, combining both
advantages (with more effort though, I admit):
make each thread prefer tasks with (task % threadcount) == threadnum, and
only if none of these are available switch to other tasks (preferably
starting there from the task with the highest number, though it is
simpler without that: you'd only need thread_count current-task counters
instead of the one you currently have).
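
A minimal sketch of the simpler variant, with one counter per thread
and no "start from the highest number" refinement. Sched, sched_init,
sched_claim and the limit of 16 threads are assumptions made for this
example, and the caller is assumed to hold whatever lock protects the
state.

```c
#include <assert.h>

/* Hypothetical scheduler state: next[k] is the next unclaimed task in
 * class k, where class k holds tasks k, k + thread_count, and so on. */
typedef struct {
    int thread_count;
    int task_count;
    int next[16];
} Sched;

void sched_init(Sched *s, int thread_count, int task_count)
{
    assert(thread_count > 0 && thread_count <= 16);
    s->thread_count = thread_count;
    s->task_count = task_count;
    for (int k = 0; k < thread_count; k++)
        s->next[k] = k;
}

/* Claims a task for threadnum, or returns -1 when none are left.
 * The thread's own class is tried first; only when it is exhausted
 * does the thread take tasks from the other classes. */
int sched_claim(Sched *s, int threadnum)
{
    assert(threadnum >= 0 && threadnum < s->thread_count);
    for (int i = 0; i < s->thread_count; i++) {
        int k = (threadnum + i) % s->thread_count; /* i == 0: own class */
        if (s->next[k] < s->task_count) {
            int task = s->next[k];
            s->next[k] += s->thread_count;
            return task;
        }
    }
    return -1;
}
```

With 2 threads and 5 tasks, thread 1 claims tasks 1 and 3 from its own
class and only then falls back to task 0 from thread 0's class.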
