[Ffmpeg-devel] [PATCH] H.264 deblocking mmx

Skal skal
Thu Apr 28 22:08:01 CEST 2005


	Hi,

On Mon, 2005-04-25 at 00:39, Loren Merritt wrote:
> I noticed that the inloop deblocking filter was taking a large fraction of 
> the decode time, and it is inherently parallel, so...

	just some remarks about the patch:

	a) chroma deblocking filters 4 pixels at a time, whereas
	it seems to me that only 2 chroma pixels share the same
	strength (deduced from the contents of the co-located 4x4 luma block).
	Moreover, for MBAFF, you sometimes have to filter only 1
	vertical chroma sample at a time (in the case of Field->Frame or
	Frame->Field vertical filtering).
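
	To make remark a) concrete, here is a minimal C sketch. It assumes
	4:2:0 chroma and no MBAFF, so that the 8 chroma samples along a
	macroblock edge reuse the 4 strengths of the co-located luma edge.
	chroma_bs and run_is_uniform are hypothetical helper names, not
	functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4:2:0, non-MBAFF. bs[k] is the strength of the k-th 4-pixel
 * luma segment along a 16-pixel edge; the same edge has 8 chroma samples. */
int chroma_bs(const uint8_t bs[4], int i) {
    assert(i >= 0 && i < 8);
    return bs[i >> 1]; /* two chroma samples per luma segment */
}

/* Returns 1 if the n chroma samples starting at 'start' share one strength. */
int run_is_uniform(const uint8_t bs[4], int start, int n) {
    for (int i = start + 1; i < start + n; i++)
        if (chroma_bs(bs, i) != chroma_bs(bs, start))
            return 0;
    return 1;
}
```

	With strengths {3, 3, 1, 0}, a run of 4 samples starting at sample 4
	mixes strengths 1 and 0, while any aligned pair of samples is uniform.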

	b) the ASM code computes the ABS(a-b) value, and afterwards
	compares it to Alpha/Beta. It uses 16-bit words.
	But in fact, only the result of the test (not the abs value itself)
	matters, and it could advantageously be computed using
	unsigned 8-bit values only, since that would both avoid an 8b->16b
	conversion and allow testing the lower and upper bounds
	of ABS(a-b) in one shot.

	Here's an example for the test ABS(P0-Q0)<Alpha, using 8b only:

	  input: mm7 = Alpha value, 8 bits, replicated 8 times
	  (comments below show the register contents as: lower 32 bits | higher 32 bits)

	  movd      mm0, [Q0]            ; four pixels 'Q0' in the lower 32 bits
	  punpckldq mm0, [P0]            ; four pixels 'P0' in the higher 32 bits

	  pshufw  mm1, mm0, 01001110b    ;    P0       | Q0       (swap P0 and Q0)
	  paddusb mm0, mm7               ;    Q0+Alpha | P0+Alpha
	  psubusb mm0, mm1               ; Q0+Alpha-P0 | P0+Alpha-Q0

	At this point, for each pixel, the corresponding byte of mm0 is zero
	in the lower 32 bits if P0>=Q0+Alpha, and zero in the higher 32 bits
	if Q0>=P0+Alpha.
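
	The equivalence this relies on can be checked exhaustively. Below is
	a minimal scalar C model of one byte lane of the three instructions
	above, compared against the plain ABS(P0-Q0)<Alpha test. All function
	names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-byte models of paddusb / psubusb (unsigned saturating add / subtract). */
uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}

uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : 0);
}

/* One byte lane of the sequence: returns 1 when the bytes for this pixel
 * are non-zero in both halves of mm0, i.e. both one-sided tests pass. */
int test_8bit(uint8_t p0, uint8_t q0, uint8_t alpha) {
    uint8_t lo = sat_sub_u8(sat_add_u8(q0, alpha), p0); /* Q0+Alpha-P0 */
    uint8_t hi = sat_sub_u8(sat_add_u8(p0, alpha), q0); /* P0+Alpha-Q0 */
    return lo != 0 && hi != 0;
}

/* Reference: the 16-bit style test ABS(P0-Q0) < Alpha. */
int test_ref(uint8_t p0, uint8_t q0, uint8_t alpha) {
    return abs((int)p0 - (int)q0) < (int)alpha;
}

/* Count disagreements over all Alpha and all pixel pairs <= max_pixel. */
long count_mismatches(int max_pixel) {
    long n = 0;
    assert(max_pixel >= 0 && max_pixel <= 255);
    for (int p = 0; p <= max_pixel; p++)
        for (int q = 0; q <= max_pixel; q++)
            for (int a = 0; a <= 255; a++)
                n += test_8bit((uint8_t)p, (uint8_t)q, (uint8_t)a)
                     != test_ref((uint8_t)p, (uint8_t)q, (uint8_t)a);
    return n;
}
```

	In this model the two tests agree for every Alpha as long as both
	pixels are at most 254 (count_mismatches(254) returns 0). Because the
	add saturates at 255, a pixel equal to 255 always yields a zero byte,
	so the 8-bit version reports "do not filter" there even when the
	difference is below Alpha; that case would need separate handling if
	exact agreement is required.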

	Note: you can repeat/pair the above 3 instructions for the other tests
	(ABS(P1-P0)<Beta, etc...), and accumulate the results in mm0 using
	a 'pminub' instruction, so that a zero byte from any failed test
	is preserved...
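
	As a rough illustration of the accumulation, here is a scalar C model
	of the 64-bit register for four pixels. It assumes the three tests
	are ABS(P0-Q0)<Alpha, ABS(P1-P0)<Beta and ABS(Q1-Q0)<Beta, and it
	folds the results together with a per-byte minimum (what pminub
	computes), since a zero byte marks a failed test and has to survive
	the accumulation. The type mm_t and all function names are
	hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t b[8]; } mm_t; /* b[0..3]: low 32 bits, b[4..7]: high */

uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}

uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : 0);
}

/* Models the load + pshufw/paddusb/psubusb steps for four pixel pairs:
 * low half = y+t-x, high half = x+t-y (all saturating). */
mm_t one_sided_tests(const uint8_t x[4], const uint8_t y[4], uint8_t t) {
    mm_t r;
    for (int i = 0; i < 4; i++) {
        r.b[i]     = sat_sub_u8(sat_add_u8(y[i], t), x[i]);
        r.b[i + 4] = sat_sub_u8(sat_add_u8(x[i], t), y[i]);
    }
    return r;
}

/* Per-byte unsigned minimum, as pminub computes. */
mm_t min_u8(mm_t a, mm_t b) {
    mm_t r;
    for (int i = 0; i < 8; i++)
        r.b[i] = a.b[i] < b.b[i] ? a.b[i] : b.b[i];
    return r;
}

/* All three tests for four pixels; a zero byte means some test failed. */
mm_t accumulate_tests(const uint8_t p1[4], const uint8_t p0[4],
                      const uint8_t q0[4], const uint8_t q1[4],
                      uint8_t alpha, uint8_t beta) {
    mm_t acc = one_sided_tests(p0, q0, alpha);        /* ABS(P0-Q0) < Alpha */
    acc = min_u8(acc, one_sided_tests(p1, p0, beta)); /* ABS(P1-P0) < Beta  */
    acc = min_u8(acc, one_sided_tests(q1, q0, beta)); /* ABS(Q1-Q0) < Beta  */
    return acc;
}

/* Small self-check: pixel 0 passes everything, pixels 1 and 3 fail a test. */
void accumulate_demo(void) {
    const uint8_t p1[4] = {10, 10, 10, 10}, p0[4] = {12, 12, 12, 40};
    const uint8_t q0[4] = {14, 30, 14, 42}, q1[4] = {15, 31, 15, 43};
    mm_t acc = accumulate_tests(p1, p0, q0, q1, 8, 4);
    assert(acc.b[0] != 0 && acc.b[4] != 0);
    assert(acc.b[5] == 0 && acc.b[7] == 0);
}
```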

        In the end, when one wants the final result:

	  pminub mm0, [One]          ; mask is now made of '0' or '1'. [One] is 1, replicated 8 times
	  pshufw mm1, mm0, 01001110b ; swap the hi/lo 32 bits
	  pand   mm1, mm0            ; => the higher 4 bytes of mm1 (identical to the lower 4) are 1 only where both one-sided tests passed, i.e. they tell whether the pixels should be filtered or not.
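
	The final reduction can be modelled the same way. This scalar C
	sketch clamps every byte to 0 or 1, swaps the two halves and ANDs
	them with the unswapped flags, so that a byte ends up 1 only when
	the pixel passed in both directions. mm_t and the function names are
	hypothetical, and the input is assumed to hold non-zero bytes for
	passed one-sided tests.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t b[8]; } mm_t; /* b[0..3]: low 32 bits, b[4..7]: high */

/* pminub with a register of ones: every non-zero byte becomes 1. */
mm_t clamp_to_one(mm_t a) {
    for (int i = 0; i < 8; i++)
        a.b[i] = a.b[i] ? 1 : 0;
    return a;
}

/* pshufw ..., 01001110b: exchange the low and high 32 bits. */
mm_t swap_halves(mm_t a) {
    mm_t r;
    for (int i = 0; i < 4; i++) {
        r.b[i]     = a.b[i + 4];
        r.b[i + 4] = a.b[i];
    }
    return r;
}

/* pand: per-byte bitwise AND. */
mm_t and_u8(mm_t a, mm_t b) {
    for (int i = 0; i < 8; i++)
        a.b[i] &= b.b[i];
    return a;
}

/* Byte i (and byte i+4) is 1 only if pixel i passed in both directions. */
mm_t filter_mask(mm_t acc) {
    mm_t flags = clamp_to_one(acc);
    return and_u8(swap_halves(flags), flags);
}

/* Small self-check: pixels 0 and 3 passed both ways, pixels 1 and 2 did not. */
void filter_mask_demo(void) {
    mm_t acc = {{3, 3, 0, 7, 2, 0, 5, 9}};
    mm_t m = filter_mask(acc);
    assert(m.b[4] == 1 && m.b[5] == 0 && m.b[6] == 0 && m.b[7] == 1);
}
```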

        Hope it helps.
-Skal


Before you ask: why don't I supply a patch for that? Simply because I very much dislike inline ASM code.
I can hardly read it, let alone write any. But fortunately, Michael is around here ;)





