[FFmpeg-devel] Once again: Multithreaded H.264 decoding with ffmpeg?
Fri May 30 07:36:11 CEST 2008
On Thu, May 29, 2008 at 3:16 PM, Siegmar Buss <lists at siegmar-buss.de> wrote:
> I am a newbie to ffmpeg development and I know that I am not the first
> person thinking about a multithreaded H.264 decoder.
This has been discussed several times; please check the archives.
> Is anyone out there currently working on parallelizing the macroblock
> level? My incomplete comprehension suggests that there are several ways
> to do this. Or is this idea a stupid one?
I believe there was a post about a month ago from someone attempting
frame-level parallelisation.
> Could it be realistic to achieve, let's say, a speedup between 30% and 50%
> on a modern dual core CPU or are there limitations that make this a
> dream? What about using NVIDIA's CUDA or AMD's stream computing to make
> use of modern GPUs (given that the "native" acceleration APIs are not
> available under Linux)?
There is definitely room for a 50% decode speed improvement in
ffmpeg's h264 decoder, but it would take some work. CUDA et al. have
been discussed in the past (again, see the archives), but the gist of it
is that while they are very fast, the latency can be quite significant,
so you would have to do some fancy coding to hide it. The h264
acceleration 'featured' by most modern GPUs is actually a HW decoder;
these are generally limited to Level 4.1 decode and make use of poorly
designed MS Acceleration APIs.
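
To put the dual-core target in perspective, Amdahl's law gives an upper
bound on what threading alone can deliver. The sketch below is only an
illustration: it reads "a speedup between 30% and 50%" as 1.3x to 1.5x
throughput (an assumption), and the function names are made up for this
example. It ignores synchronisation and cache effects.

```c
#include <assert.h>

/* Amdahl's law: upper bound on speedup when a fraction `parallel` of the
 * single-threaded run time is spread evenly over `cores` cores and the
 * remainder stays serial. */
static double amdahl_speedup(double parallel, int cores)
{
    assert(parallel >= 0.0 && parallel <= 1.0 && cores >= 1);
    return 1.0 / ((1.0 - parallel) + parallel / cores);
}

/* Inverse of the above: the fraction of run time that must be
 * parallelised to reach a given speedup on `cores` cores. */
static double required_parallel_fraction(double speedup, int cores)
{
    assert(speedup >= 1.0 && cores >= 2);
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores);
}
```

Under these assumptions, required_parallel_fraction(1.3, 2) is about
0.46 and required_parallel_fraction(1.5, 2) is about 0.67, so roughly
half to two thirds of the decode time would have to run in parallel on
two cores.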
> I have been looking into the h264 code and each piece of H.264
> documentation I could get my hands on. And I have the impression that
> some of the decoding steps (namely residual decoding, deblocking) could
> be parallelized quite well. But I don't have any idea how much time the
> individual decoding steps take. Does someone happen to have some
> numbers? Or a hint how to measure this myself?
The best way would be to profile a build with oprofile under Linux and
measure it yourself.
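
If you want rough per-stage numbers before setting up a profiler, one
option is to wrap each decoding step in a monotonic timer. This is a
minimal sketch, assuming POSIX clock_gettime(); the stage labels and the
stage_timer helpers are hypothetical and are not ffmpeg identifiers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stage labels for illustration only. */
enum stage { STAGE_ENTROPY, STAGE_RESIDUAL, STAGE_MC, STAGE_DEBLOCK, STAGE_COUNT };

static const char *const stage_name[STAGE_COUNT] = {
    "entropy", "residual", "mc", "deblock"
};

struct stage_timer {
    uint64_t total_ns[STAGE_COUNT]; /* accumulated time per stage */
    uint64_t start_ns;              /* timestamp of the last timer_begin() */
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void timer_begin(struct stage_timer *t)
{
    t->start_ns = now_ns();
}

/* Add the time elapsed since timer_begin() to stage `s`. */
static void timer_end(struct stage_timer *t, enum stage s)
{
    assert(s >= 0 && s < STAGE_COUNT);
    t->total_ns[s] += now_ns() - t->start_ns;
}

/* Share of the total measured time spent in stage `s`, in percent. */
static double stage_share_percent(const struct stage_timer *t, enum stage s)
{
    uint64_t sum = 0;
    for (int i = 0; i < STAGE_COUNT; i++)
        sum += t->total_ns[i];
    return sum ? 100.0 * (double)t->total_ns[s] / (double)sum : 0.0;
}

static void timer_report(const struct stage_timer *t, FILE *out)
{
    for (int i = 0; i < STAGE_COUNT; i++)
        fprintf(out, "%-10s %6.2f%%\n", stage_name[i],
                stage_share_percent(t, (enum stage)i));
}
```

You would call timer_begin() before a step and timer_end() with the
matching label after it, then timer_report(&t, stderr) once decoding is
done. The timer calls add overhead of their own, so treat the shares as
a first approximation and confirm them with a real profiler.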
> Would it e.g. make any sense to parallelize the deblocking filter?
I haven't looked at that myself. It seems like something that could be
done, but I think the biggest single gain would come from implementing
frame-level parallelisation.
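
To illustrate what frame-level parallelisation involves: a thread
decoding one frame can start before its reference frame is finished, as
long as it waits until the reference rows it needs are decoded. The
sketch below shows that synchronisation with pthreads. It is an
assumption-laden illustration, not how ffmpeg does it; every name here
(frame_progress, decode_job, mv_reach_rows, ...) is hypothetical, and
the actual row decoding is left as a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Per-frame decoding progress, shared between threads. */
struct frame_progress {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int rows_done; /* macroblock rows fully decoded so far */
};

static void progress_init(struct frame_progress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->rows_done = 0;
}

/* Called by the thread decoding this frame after finishing rows. */
static void progress_report(struct frame_progress *p, int rows_done)
{
    assert(rows_done >= 0);
    pthread_mutex_lock(&p->lock);
    if (rows_done > p->rows_done) {
        p->rows_done = rows_done;
        pthread_cond_broadcast(&p->cond);
    }
    pthread_mutex_unlock(&p->lock);
}

/* Block until at least `rows_needed` rows of this frame are decoded. */
static void progress_await(struct frame_progress *p, int rows_needed)
{
    pthread_mutex_lock(&p->lock);
    while (p->rows_done < rows_needed)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

struct decode_job {
    struct frame_progress *self; /* progress of the frame being decoded */
    struct frame_progress *ref;  /* reference frame, or NULL for intra */
    int mb_rows;                 /* macroblock rows in the frame */
    int mv_reach_rows;           /* assumed max downward motion-vector reach */
};

static void *decode_frame(void *arg)
{
    struct decode_job *job = arg;
    for (int row = 0; row < job->mb_rows; row++) {
        if (job->ref) {
            /* Rows of the reference that motion compensation may read. */
            int need = row + 1 + job->mv_reach_rows;
            if (need > job->mb_rows)
                need = job->mb_rows;
            progress_await(job->ref, need);
        }
        /* ... decode macroblock row `row` here ... */
        progress_report(job->self, row + 1);
    }
    return NULL;
}
```

Each frame gets its own thread running decode_frame(); a frame with a
reference then trails it by mv_reach_rows rows instead of waiting for
the whole picture. The larger the vertical motion vectors in a stream,
the less overlap this leaves.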