[FFmpeg-devel] h264 multithreading, chapter 2
Wed Sep 26 14:14:24 CEST 2007
On Tue, Sep 25, 2007 at 10:44:19PM +0200, Andreas Öman wrote:
> Even though I previously dismissed speeding up h264 decoding
> by splitting entropy decoding from the rest in two threads I
> decided to take a quick shot at it.
> The attached patch gives a ~20-30% speedup on single-slice CABAC content.
> Basically, the code entropy-decodes up to 128 macroblocks in one
> thread while doing prediction+IDCT+deblocking of the previously
> decoded 128 macroblocks in another thread.
> 20-30% is not exceptionally much; I would have expected a bit more.
> If you look at the patch, you'll notice that it brutally
> copies the relevant fields back and forth from the H264Context.
> I tried embedding the H264mb into H264Context so I could reduce
> it to one single memcpy(); this does not make any difference
> (except that it makes the patch about 3000 lines longer :-)
> I also tried passing an additional pointer around to all functions
> to get rid of the memcpy()ing entirely. This does not make any
> significant difference either (and I know from my first attempts with
> slice-level multithreading that this slows down the single-threaded
> case).
> All this kind of makes sense (I think) if you consider that the data
> needs to be transferred from CPU1's cache to CPU2's cache (if it's even
> still in CPU1's cache).
> Even with some kind of shared cache mechanism (I'm not actually
> sure how the Core 2 Duo does this), the chance that the data from
> the previous 128 macroblocks is still in cache is probably minimal.
> I tried lowering the number of macroblocks processed per round,
> but the overhead of thread synchronization quickly defeats any
> gain (if you only do one macroblock per round, it's about 7-8
> times slower :-)
> well well...
> It is (obviously) not intended/ready for any formal review.
> MBAFF support is lacking, and there is some code duplication and
> other ugliness.
> But before I spend any more time on it, I'd like to know whether
> people think it is worth finishing.
> If we also add frame-based parallelism in the future, we would
> end up with three different techniques (== bloated == hard to
> maintain). Then again, perhaps one of them can be removed
> once we've got the full picture. I dunno...
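The two-thread split described in the quoted mail (entropy-decode one batch of macroblocks while the previous batch is reconstructed) can be sketched as a double-buffered producer/consumer pipeline. This is a minimal illustration under stated assumptions, not code from the patch: POSIX threads are assumed, `MBData`, `Batch`, `Pipeline`, `entropy_decode_mb()`, `reconstruct_mb()` and `decode_frame_pipelined()` are hypothetical names, and the per-macroblock work is a trivial stand-in for CABAC parsing and prediction+IDCT+deblocking.

```c
#include <pthread.h>

/* Batch size taken from the mail ("up to 128 macroblocks"); tunable here. */
#define MB_BATCH 128

/* Hypothetical per-macroblock state handed from the entropy stage to the
 * reconstruction stage (plays the role of the fields copied out of H264Context). */
typedef struct {
    int mb_index;
    int coeff;              /* stand-in for parsed residual/syntax data */
} MBData;

typedef struct {
    MBData mb[MB_BATCH];
    int count;              /* number of valid macroblocks in this batch */
    int full;               /* 1: ready for reconstruction, 0: free for parsing */
} Batch;

typedef struct {
    Batch buf[2];           /* double buffer: one parsed while the other is reconstructed */
    int total_mbs;
    pthread_mutex_t lock;
    pthread_cond_t cond;
} Pipeline;

/* Hypothetical stand-in for CABAC decoding of one macroblock. */
static void entropy_decode_mb(MBData *d, int idx)
{
    d->mb_index = idx;
    d->coeff = idx % 7;
}

/* Hypothetical stand-in for prediction + IDCT + deblocking of one macroblock. */
static long reconstruct_mb(const MBData *d)
{
    return d->coeff;
}

/* Producer thread: entropy-decodes batches and hands them over. */
static void *entropy_thread(void *arg)
{
    Pipeline *p = arg;
    int b = 0;
    for (int start = 0; start < p->total_mbs; start += MB_BATCH) {
        Batch *cur = &p->buf[b];
        int left = p->total_mbs - start;
        int n = left < MB_BATCH ? left : MB_BATCH;

        pthread_mutex_lock(&p->lock);
        while (cur->full)                 /* wait until reconstruction released it */
            pthread_cond_wait(&p->cond, &p->lock);
        pthread_mutex_unlock(&p->lock);

        for (int i = 0; i < n; i++)
            entropy_decode_mb(&cur->mb[i], start + i);

        pthread_mutex_lock(&p->lock);
        cur->count = n;
        cur->full = 1;                    /* publish the batch */
        pthread_cond_broadcast(&p->cond);
        pthread_mutex_unlock(&p->lock);
        b ^= 1;
    }
    return NULL;
}

/* Consumer on the calling thread: reconstructs each batch while the producer
 * parses the next one. Returns a checksum of the stand-in reconstruction work. */
long decode_frame_pipelined(int total_mbs)
{
    Pipeline p = { .total_mbs = total_mbs };
    pthread_t tid;
    long sum = 0;
    int b = 0;

    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cond, NULL);
    pthread_create(&tid, NULL, entropy_thread, &p);

    for (int done = 0; done < total_mbs; ) {
        Batch *cur = &p.buf[b];

        pthread_mutex_lock(&p.lock);
        while (!cur->full)                /* wait for the parser to fill it */
            pthread_cond_wait(&p.cond, &p.lock);
        pthread_mutex_unlock(&p.lock);

        for (int i = 0; i < cur->count; i++)
            sum += reconstruct_mb(&cur->mb[i]);
        done += cur->count;

        pthread_mutex_lock(&p.lock);
        cur->full = 0;                    /* give the buffer back to the parser */
        pthread_cond_broadcast(&p.cond);
        pthread_mutex_unlock(&p.lock);
        b ^= 1;
    }

    pthread_join(tid, NULL);
    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.lock);
    return sum;
}
```

For example, decode_frame_pipelined(1000) performs eight handoffs (seven full batches of 128 plus one of 104). Every handoff costs a lock/wait/broadcast round trip on each side, so shrinking MB_BATCH toward 1 multiplies the synchronization per frame, which is one plausible reason for the slowdown reported for one macroblock per round. The sketch deliberately ignores the cache effect discussed in the mail: every MBData written on one core is read on the other.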
You already summarized everything...
I hope we will have frame-based parallelism soon; when will you implement
it? After that we can decide whether this makes sense or not. We could
decide before, but that would be a decision based on guessing, which isn't
good: we might end up wasting time cleaning up and reviewing code, and likely
even fixing bugs, only to find out there's no speed gain from it relative
to frame-based parallelism...
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates