[FFmpeg-devel] Fwd: pipeline multithreading

Daniel Oberhoff danieloberhoff at gmail.com
Wed Nov 26 00:21:03 CET 2014



Sent from my iPhone

Begin forwarded message:

> From: Daniel Oberhoff <danieloberhoff at gmail.com>
> Date: 26 November 2014 00:19:30 CET
> To: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
> Cc: "Daniel Oberhoff (Privat)" <danieloberhoff at gmail.com>
> Subject: Re: [FFmpeg-devel] pipeline multithreading
> 
> 
> 
> Sent from my iPhone
> 
>> On 25.11.2014, at 21:37, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>> 
>>> On 25.11.2014, at 21:27, "Daniel Oberhoff (Privat)" <danieloberhoff at gmail.com> wrote:
>>>> On 25.11.2014, at 11:33, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>>>> 
>>>> On 25.11.2014, at 10:01, Daniel Oberhoff <danieloberhoff at gmail.com> wrote:
>>>>>>> On 24.11.2014, at 17:16, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>>>>>>> 
>>>>>>> On Mon, Nov 24, 2014 at 12:35:58PM +0100, Daniel Oberhoff wrote:
>>>>>>> input -> filter1 -> filter2 -> output
>>>>>>> 
>>>>>>> some threads processing frame n in the output (i.e. encoding), other threads processing frame n+1 in filter2, others processing frame n+2 in filter1, and yet others processing frame n+3 in the decoder. This way non-parallel filters can be sped up, and diminishing returns from too much striping can be avoided. With modern CPUs scaling easily up to 24 hardware threads I see this as necessary to fully utilize the hardware.
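
The pipelining scheme described in the quoted paragraph can be illustrated with a minimal sketch: one thread per stage, connected by bounded FIFO queues, so that stage k can work on frame n while stage k+1 works on frame n-1. Everything here is an assumption for illustration: `run_pipeline`, `stage`, and the `decode`/`filter1`/`filter2`/`encode` stand-ins are hypothetical names, not FFmpeg or libavfilter API. In CPython the global interpreter lock also prevents pure-Python stages from running in parallel, so this shows the structure rather than an actual speedup.

```python
import queue
import threading

SENTINEL = object()  # unique end-of-stream marker


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end of stream downstream
            return
        outbox.put(func(item))


def run_pipeline(frames, funcs, depth=2):
    """Run one thread per stage; return the outputs in input order."""
    # Bounded queues keep a fast stage from running far ahead of a slow one.
    queues = [queue.Queue(maxsize=depth) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    results = []

    def drain():
        # Collect finished frames so the last queue never fills up.
        while True:
            item = queues[-1].get()
            if item is SENTINEL:
                return
            results.append(item)

    sink = threading.Thread(target=drain)
    for t in workers + [sink]:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(SENTINEL)
    for t in workers + [sink]:
        t.join()
    return results


# Hypothetical stand-ins for the real decode/filter/encode work.
def decode(n):
    return ("decoded", n)


def filter1(frame):
    return frame + ("filter1",)


def filter2(frame):
    return frame + ("filter2",)


def encode(frame):
    return frame + ("encoded",)


if __name__ == "__main__":
    for out in run_pipeline(range(4), [decode, filter1, filter2, encode]):
        print(out)
```

Because every stage is a single thread reading from a FIFO queue, frame order is preserved without any extra bookkeeping.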
>>>>>> 
>>>>>> Keep in mind the two things:
>>>>>> 1) It only works for cases where many filters are used, which is not
>>>>>> necessarily a common case
>>>>> 
>>>>> Also, not quite. Even just decode/encode has a pipeline depth of 2 (the decoder could decode frame n+1 while the encoder encodes frame n). Every filter deepens this further...
>>>> 
>>>> If you run encode and decode with multithreading, they already run in different threads.
>>>> So if you have only one filter, you should not have any gains at all from per-frame filter multithreading.
>>> 
>>> So you are saying if I have
>>> 
>>> decode->filter(1)->encode
>>> 
>>> I will have n(decode) + n(slices) + n(encode) threads? Even here I think a clever setup of data-following threads would improve performance, but probably not pressingly so.
>>> 
>>> Anyhow, frame multithreading within the filter graph would help us a lot, so I guess my question shifts towards how hard it would be to implement and, if I end up starting the endeavor, whether there is interest in merging this work.
>> 
>> I am sure there is interest in principle, I just expect that it is quite some effort and not all that useful in the most common use-cases, which reduces chances of someone else working on it.
>> Of course, in the case where filters only act on a single frame, independent of previous state, an easy hack is possible (even purely application-side): just instantiate the same filter graph n times, each with its own thread, and push/pull round-robin.
>> However any more advanced filter (deshake, deinterlace, temporal denoise etc) will horribly break...
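
The round-robin hack, and the way it breaks for stateful filters, can be sketched as follows. This is a toy model under stated assumptions: frames are plain integers, and `Brighten`, `TemporalDiff`, and `run_round_robin` are hypothetical names standing in for cloned filter graphs, not libavfilter API.

```python
import threading


class Brighten:
    """Stateless: the output depends only on the current frame."""

    def process(self, frame):
        return frame + 10


class TemporalDiff:
    """Stateful: the output depends on the previous frame this instance saw."""

    def __init__(self):
        self.prev = 0

    def process(self, frame):
        out = frame - self.prev
        self.prev = frame
        return out


def run_round_robin(frames, make_filter, n_clones):
    """Clone k handles frames k, k+n, k+2n, ... in its own thread."""
    frames = list(frames)
    results = [None] * len(frames)

    def worker(k):
        filt = make_filter()  # private instance, no state shared between clones
        for i in range(k, len(frames), n_clones):
            results[i] = filt.process(frames[i])

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_clones)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    frames = [1, 3, 6, 10, 15]
    # Stateless filter: cloning does not change the result.
    print(run_round_robin(frames, Brighten, 1), run_round_robin(frames, Brighten, 2))
    # Stateful filter: each clone only sees every second frame.
    print(run_round_robin(frames, TemporalDiff, 1), run_round_robin(frames, TemporalDiff, 2))
```

With two clones, each `TemporalDiff` instance computes differences against the frame two positions back rather than the immediate predecessor, which is the kind of breakage the quoted text warns about.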
>> Also Stefano is the main avfilter maintainer, I don't really work on it at all, just commenting from the "sidelines".
> 
> Hmm, that clone hack sounds nasty. I don't believe locally temporally dependent filters would break (non-hacked) frame multithreading, since at some point they have to produce a frame, which the next stage can then work on, albeit only after having consumed n frames, n being their range of dependency (or even 1 if they are implemented recursively). Well, I will have a look at the code...
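
The argument in the paragraph above can be made concrete with a small sketch: a temporally dependent stage keeps its own window of the last n frames and only starts emitting once the window is full, while the next stage simply consumes whatever arrives. The names `windowed_mean_stage`, `scale_stage`, and `run_two_stage` are hypothetical, frames are modelled as numbers, and none of this reflects how libavfilter is actually structured.

```python
import collections
import queue
import threading

END = object()  # unique end-of-stream marker


def windowed_mean_stage(inbox, outbox, n):
    """Temporal filter: emits the mean of the last n frames once n have arrived."""
    window = collections.deque(maxlen=n)
    while True:
        frame = inbox.get()
        if frame is END:
            outbox.put(END)
            return
        window.append(frame)
        if len(window) == n:  # nothing is emitted for the first n-1 frames
            outbox.put(sum(window) / n)


def scale_stage(inbox, outbox):
    """Stateless downstream stage: starts as soon as the first frame shows up."""
    while True:
        frame = inbox.get()
        if frame is END:
            outbox.put(END)
            return
        outbox.put(frame * 2)


def run_two_stage(frames, n=3):
    """Run both stages in their own threads and collect the output in order."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=windowed_mean_stage, args=(q_in, q_mid, n)),
        threading.Thread(target=scale_stage, args=(q_mid, q_out)),
    ]
    for t in threads:
        t.start()
    for frame in frames:
        q_in.put(frame)
    q_in.put(END)
    for t in threads:
        t.join()
    results = []
    while True:
        item = q_out.get()
        if item is END:
            return results
        results.append(item)


if __name__ == "__main__":
    print(run_two_stage([3, 6, 9, 12, 15], n=3))
```

Since the stateful stage runs in a single thread and sees every frame in order, its state stays consistent; the only cost is a startup latency of n-1 frames before the downstream stage has anything to do.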

