[FFmpeg-devel] h264 multithreading, chapter 2

Andreas Öman andreas
Tue Sep 25 22:44:19 CEST 2007


Even though I previously dismissed the idea of speeding up h264
decoding by splitting entropy decoding from the rest into two
threads, I decided to take a quick shot at it.

The attached patch gives a ~20-30% speedup on single-slice CABAC
content. Basically, the code entropy-decodes up to 128 macroblocks
in one thread while doing prediction+idct+deblock of the previously
decoded 128 macroblocks in another thread.
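
For illustration, the hand-off between the two threads can be sketched
roughly as below. This is not the code from the patch: MBData, MBBatch,
Pipeline, entropy_decode_mb(), reconstruct_mb() and decode_picture() are
hypothetical names, the two stages are trivial stand-ins, and the use of
two alternating buffers guarded by a pthread mutex and condition
variable is an assumption about how such a split could be organised.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define BATCH_MBS 128  /* macroblocks per round, as in the post */
#define NUM_BUFS  2    /* one batch being filled, one being reconstructed */

/* Hypothetical per-macroblock data handed from stage 1 to stage 2. */
typedef struct { int16_t coeffs[16]; } MBData;

typedef struct {
    MBData mb[BATCH_MBS];
    int    count;  /* valid macroblocks in this batch */
    int    full;   /* 1 = waiting for reconstruction, 0 = free to fill */
} MBBatch;

typedef struct {
    MBBatch         batch[NUM_BUFS];
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             total_mbs;  /* macroblocks in the picture */
    long            checksum;   /* stand-in for the reconstructed picture */
} Pipeline;

/* Stage 1 stand-in for entropy decoding one macroblock. */
static void entropy_decode_mb(MBData *mb, int index)
{
    for (int i = 0; i < 16; i++)
        mb->coeffs[i] = (int16_t)((index + i) & 0xff);
}

/* Stage 2 stand-in for prediction + IDCT + deblocking of one macroblock. */
static long reconstruct_mb(const MBData *mb)
{
    long sum = 0;
    for (int i = 0; i < 16; i++)
        sum += mb->coeffs[i];
    return sum;
}

/* Block until batch b reaches the wanted state (1 = full, 0 = empty). */
static void wait_for(Pipeline *p, MBBatch *b, int full)
{
    pthread_mutex_lock(&p->lock);
    while (b->full != full)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Set the state of batch b and wake the other thread. */
static void hand_over(Pipeline *p, MBBatch *b, int full)
{
    pthread_mutex_lock(&p->lock);
    b->full = full;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

static void *reconstruct_thread(void *arg)
{
    Pipeline *p = arg;
    for (int n = 0, done = 0; done < p->total_mbs; n++) {
        MBBatch *b = &p->batch[n % NUM_BUFS];
        wait_for(p, b, 1);
        assert(b->count > 0 && b->count <= BATCH_MBS);
        for (int i = 0; i < b->count; i++)
            p->checksum += reconstruct_mb(&b->mb[i]);
        done += b->count;
        hand_over(p, b, 0);
    }
    return NULL;
}

/* Process total_mbs macroblocks with two threads; returns the checksum. */
long decode_picture(int total_mbs)
{
    Pipeline p;
    pthread_t worker;
    int next = 0;
    memset(&p, 0, sizeof(p));
    p.total_mbs = total_mbs;
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cond, NULL);
    if (pthread_create(&worker, NULL, reconstruct_thread, &p) != 0)
        return -1;
    for (int n = 0; next < total_mbs; n++) {
        MBBatch *b = &p.batch[n % NUM_BUFS];
        wait_for(&p, b, 0);
        for (b->count = 0; b->count < BATCH_MBS && next < total_mbs; b->count++)
            entropy_decode_mb(&b->mb[b->count], next++);
        hand_over(&p, b, 1);
    }
    pthread_join(worker, NULL);
    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.lock);
    return p.checksum;
}
```

Calling decode_picture(396) would push one CIF-sized picture (22x18
macroblocks) through the two stages in rounds of 128, 128, 128 and 12
macroblocks. Build with -pthread. With only two buffers the calling
thread stalls as soon as it gets one full batch ahead of the
reconstruction thread, so the slower stage sets the pace.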

20-30% is not all that much; I would have expected a bit more.

If you have a look at the patch you'll notice that it brutally
copies the relevant fields back and forth from the H264Context.

I tried embedding the H264mb into H264Context so I could reduce
it to one single memcpy(). This does not make any difference
(except that it makes the patch about 3000 lines longer :-)

I also tried passing an additional pointer around to all functions
to get rid of the memcpy()ing entirely. This does not make any
significant difference either (and I know from my first attempts
with slice-level multithreading that this slows down the
single-threaded case).
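
The three variants discussed above (field-by-field copying, one
memcpy() of an embedded struct, and passing a pointer instead of
copying) can be sketched as follows. The struct layouts and the names
MBState, FlatCtx, EmbeddedCtx, save_fieldwise(), save_embedded() and
coeff_sum() are made up for illustration and are far smaller than the
real H264Context; this only shows the shape of each approach, not the
patch itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the per-macroblock state; names are illustrative. */
typedef struct {
    int     mb_x, mb_y;
    int     mb_type;
    int16_t coeffs[16];
} MBState;

/* Variant A: fields scattered in the context, copied one by one. */
typedef struct {
    int     mb_x, mb_y;
    int     other_state;   /* unrelated decoder state in between */
    int     mb_type;
    int16_t coeffs[16];
} FlatCtx;

void save_fieldwise(MBState *dst, const FlatCtx *h)
{
    dst->mb_x    = h->mb_x;
    dst->mb_y    = h->mb_y;
    dst->mb_type = h->mb_type;
    memcpy(dst->coeffs, h->coeffs, sizeof(dst->coeffs));
}

/* Variant B: the state is one embedded struct, so one memcpy() suffices. */
typedef struct {
    int     other_state;
    MBState mb;
} EmbeddedCtx;

void save_embedded(MBState *dst, const EmbeddedCtx *h)
{
    memcpy(dst, &h->mb, sizeof(*dst));
}

/* Variant C: no copy at all; every function takes the state by pointer. */
int coeff_sum(const MBState *mb)
{
    int sum = 0;
    for (int i = 0; i < 16; i++)
        sum += mb->coeffs[i];
    return sum;
}
```

Variant B changes every h->field access into h->mb.field, which is
where the extra patch lines come from. Variant C adds an indirection
to every access, which fits the observation that it slows down the
single-threaded case.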

All this kind of makes sense (I think) if you consider that the data
needs to be transferred from CPU1's cache to CPU2's cache (if it's even
still around in CPU1's cache). Even with some type of shared cache
mechanism (I'm not actually sure how Core 2 Duo does this), the
probability that the data from the previous 128 macroblocks is still
in place is probably minimal.

I tried lowering the number of macroblocks processed per round,
but the overhead of thread synchronization quickly defeats any
gain (if you only do one macroblock per round, it's about 7-8
times as slow :-)
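
A back-of-the-envelope way to look at this trade-off is a simple
amortization model: every round pays one fixed hand-off cost, spread
over the macroblocks in that round. The function below is only that
model. cost_per_mb() is a hypothetical helper, the assumption that the
hand-off cost is constant per round is a simplification, and the cache
effects discussed above are ignored.

```c
#include <assert.h>

/*
 * Average time per macroblock when each round of batch_size macroblocks
 * pays one thread hand-off. work_cost and sync_cost share the same
 * (arbitrary) time unit; both are inputs, not measurements.
 */
double cost_per_mb(double work_cost, double sync_cost, int batch_size)
{
    assert(batch_size > 0);
    return work_cost + sync_cost / batch_size;
}
```

As an example with assumed numbers: if the per-macroblock work is 1
unit and one hand-off costs 6.5 units (a value picked to land in the
7-8x range, not a measurement), cost_per_mb(1.0, 6.5, 1) gives 7.5,
while cost_per_mb(1.0, 6.5, 128) gives about 1.05.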

well well...

The patch is (obviously) not intended/ready for any formal review.
MBAFF support is lacking, and there is some code duplication and
other ugliness.
But before I spend any more time on it, I'd like to know
if people think it is worth finishing.

If we also add frame-based parallelism in the future, we would
end up with three different techniques (== bloated == hard to
maintain). Then again, perhaps one of them can be removed
once we've got the full picture. I dunno...

Ideas, hints, test results and flames are welcome.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: h264-mbparallel.patch
Type: text/x-patch
Size: 10195 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070925/119379c1/attachment.bin>
