[FFmpeg-devel] Fwd: pipeline multithreading
Daniel Oberhoff
danieloberhoff at gmail.com
Wed Nov 26 00:21:03 CET 2014
Sent from my iPhone
Begin forwarded message:
> From: Daniel Oberhoff <danieloberhoff at gmail.com>
> Date: 26 November 2014 00:19:30 CET
> To: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
> Cc: "Daniel Oberhoff (Privat)" <danieloberhoff at gmail.com>
> Subject: Re: [FFmpeg-devel] pipeline multithreading
>
>
>
> Sent from my iPhone
>
>> On 25.11.2014, at 21:37, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>>
>>> On 25.11.2014, at 21:27, "Daniel Oberhoff (Privat)" <danieloberhoff at gmail.com> wrote:
>>>> On 25.11.2014, at 11:33, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>>>>
>>>> On 25.11.2014, at 10:01, Daniel Oberhoff <danieloberhoff at gmail.com> wrote:
>>>>>>> On 24.11.2014, at 17:16, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>>>>>>>
>>>>>>> On Mon, Nov 24, 2014 at 12:35:58PM +0100, Daniel Oberhoff wrote:
>>>>>>> input -> filter1 -> filter2 -> output
>>>>>>>
>>>>>>> some threads processing frame n in the output (i.e. encoding), other threads processing frame n+1 in filter2, others processing frame n+2 in filter1, and yet others decoding frame n+3. This way non-parallel filters can be sped up, and diminishing returns from too much striping can be avoided. With modern CPUs easily scaling up to 24 hardware threads, I see this as necessary to fully utilize the hardware.
>>>>>>
>>>>>> Keep in mind the two things:
>>>>>> 1) It only works for cases where many filters are used, which is not
>>>>>> necessarily a common case
>>>>>
>>>>> Also, not quite. Even just decode/encode has a pipeline depth of 2 (the decoder can decode frame n+1 while the encoder encodes frame n). Every filter deepens this further...
>>>>
>>>> If you run encode and decode with multithreading, they already run in different threads.
>>>> So if you have only one filter, you should not have any gains at all from per-frame filter multithreading.
>>>
>>> So you are saying if I have
>>>
>>> decode->filter(1)->encode
>>>
>>> I will have n(decode) + n(slices) + n(encode) threads? Even here I think a clever setup of data-following threads would improve performance, but probably not pressingly so.
>>>
>>> Anyhow, frame multithreading within the filter graph would help us a lot, so I guess my question shifts towards how hard it would be to implement and, if I do start the endeavor, whether there is interest in merging this work.
>>
>> I am sure there is interest in principle, I just expect that it is quite some effort and not all that useful in the most common use-cases, which reduces chances of someone else working on it.
>> Of course in the case where filters only act on one single frame and independent of previous state an easy hack is possible (even purely application-side): just instantiate the same filter graph n times, each with its own thread, and push/pull round-robin.
>> However any more advanced filter (deshake, deinterlace, temporal denoise etc) will horribly break...
>> Also Stefano is the main avfilter maintainer, I don't really work on it at all, just commenting from the "sidelines".
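[Editorial sketch, not part of the original message.] The round-robin "clone" idea described above, and the reason stateful filters break under it, can be illustrated with a toy model. This is only a sketch under loud assumptions: frames are plain integers, `Invert` and `TemporalAverage` are hypothetical stand-ins for a stateless and a temporally dependent filter, and nothing here uses the libavfilter API.

```python
from concurrent.futures import ThreadPoolExecutor


class Invert:
    """Hypothetical stateless filter: output depends only on the current frame."""

    def process(self, frame):
        return 255 - frame


class TemporalAverage:
    """Hypothetical stateful filter: averages each frame with the previous one."""

    def __init__(self):
        self.prev = None

    def process(self, frame):
        out = frame if self.prev is None else (frame + self.prev) // 2
        self.prev = frame
        return out


def run_serial(make_filter, frames):
    """Reference result: one filter instance sees every frame in order."""
    flt = make_filter()
    return [flt.process(f) for f in frames]


def run_round_robin(make_filter, frames, n_clones):
    """Instantiate the filter n times, one thread each, push/pull round-robin."""
    clones = [make_filter() for _ in range(n_clones)]

    def worker(k):
        # Clone k only ever sees frames k, k+n, k+2n, ...
        return [clones[k].process(f) for f in frames[k::n_clones]]

    with ThreadPoolExecutor(max_workers=n_clones) as pool:
        per_clone = list(pool.map(worker, range(n_clones)))

    # Pull the results back in round-robin order to restore frame order.
    out = [None] * len(frames)
    for k, results in enumerate(per_clone):
        out[k::n_clones] = results
    return out


frames = [10, 20, 30, 40, 50, 60]
stateless_ok = run_round_robin(Invert, frames, 3) == run_serial(Invert, frames)
stateful_ok = run_round_robin(TemporalAverage, frames, 3) == run_serial(TemporalAverage, frames)
```

In this toy model the stateless filter gives identical output either way, while each clone of the stateful filter sees only every third frame, so its "previous frame" is the wrong one and the output diverges from the serial result.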
>
> Hmm, that clone hack sounds nasty. I don't believe locally temporally dependent filters would break a (non-hacked) frame multithreading, since at some point they have to produce a frame, which the next stage can work on, albeit only after having consumed n frames, n being their range of dependency (or even 1 if they are implemented recursively). Well, I will have a look at the code...
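[Editorial sketch, not part of the original message.] The pipelined scheme argued for above, with one thread per stage and frames handed downstream in order, might look roughly like the following. Everything here is an assumption made for illustration: frames are integers, `decode`, `encode` and `WindowMean` are hypothetical toy stages, and the queue-based hand-off is just one possible design, not how FFmpeg is structured. Python threads also do not run CPU-bound work in parallel, so this only shows the data flow, not a speedup.

```python
import queue
import threading

EOF = object()  # end-of-stream marker passed down the pipeline


def stage(fn, inbox, outbox):
    """Run fn on each item from inbox; fn returns a list of zero or more outputs."""
    while True:
        item = inbox.get()
        if item is EOF:
            outbox.put(EOF)
            return
        for out in fn(item):
            outbox.put(out)


class WindowMean:
    """Toy temporal filter: emits the mean of the last n frames once n have arrived."""

    def __init__(self, n):
        self.n = n
        self.window = []

    def __call__(self, frame):
        self.window.append(frame)
        if len(self.window) < self.n:
            return []  # still filling: nothing for the next stage yet
        out = sum(self.window) // self.n
        self.window.pop(0)
        return [out]


def decode(packet):
    return [packet * 2]  # stand-in for decoding


def encode(frame):
    return [f"pkt{frame}"]  # stand-in for encoding


def run_pipeline(frames, fns, depth=4):
    """Connect the stages with bounded queues and run each in its own thread."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(fns) + 1)]

    def feed():
        for f in frames:
            queues[0].put(f)
        queues[0].put(EOF)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()

    results = []
    while True:  # drain the last queue until the end-of-stream marker arrives
        item = queues[-1].get()
        if item is EOF:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


packets = run_pipeline(list(range(8)), [decode, WindowMean(3), encode])
```

Because every stage sees all frames in their original order, the temporal filter keeps its state intact; it simply produces nothing until its window of n frames is full, after which the downstream stage can start working.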