[Ffmpeg-devel] Re: about mmx instructions

Michael Niedermayer michaelni
Thu Sep 1 16:07:29 CEST 2005


On Wed, Aug 31, 2005 at 07:26:17PM +0200, thomas.kunlin at free.fr wrote:
> Hello,
> I am a PhD student working on H.264, and I have a question concerning
> a calculation found in the loop filter implementation of ffmpeg.
> In the H264_DEBLOCK_P0_Q0 macro (h264dsp_mmx.c):
>   delta = (q0-p0+((p1-q1)>>2)+1)>>1
> is obtained by the following calculation:
>   delta = e-f, or: -delta = f-e
> where :
>   f = ((p0+(q1>>2)+1)>>1) + (d&~a)
>   e = ((q0+(p1>>2)+1)>>1) + (d&a)
>   d = (c^b)&~(b^a)^1

Typo, it should be d = (c^b)&~(b^a)&1

>   c = q0^(p1>>2)
>   b = p0^(q1>>2)
>   a = p0^q0^((p1-q1)>>2)
> I have had a hard time trying to understand how this does the trick.
> Could you give me some explanations/pointers about how such a
> magical formula was created :-)?

Pointers: hmm, ffmpeg's source & http://www.aggregate.org/MAGIC/
Explanation: OK, that's easier :)

The reason it can't be calculated with the trivial formula is that the
intermediates and the result would not fit within 8 bits (q0-p0 alone
ranges from -255 to 255), and doing it in 16 bits would be half the speed,
plus the cost of converting 8<->16 bit.
So we first need to decide how to represent the result in 8 bits.
e-f, with both e and f unsigned 8-bit integers, seems like an obvious choice,
at least when reading the code; maybe not before writing it though :)
As the inputs are also unsigned 8-bit, e and f should be
f = ((p0+(q1>>2)+1)>>1)
e = ((q0+(p1>>2)+1)>>1)
ignoring the rounding of the >> operations ...
so the only thing left is to fix the least significant bits.
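A minimal scalar sketch of this first step, assuming plain C `int` arithmetic as a stand-in for the MMX byte lanes and reading `>>` on a negative value as a flooring (arithmetic) shift; all function names here are hypothetical, not ffmpeg's. It compares the uncorrected e-f with the trivial formula over a grid of byte inputs and reports the largest difference, which should be a single LSB.

```c
/* Flooring division, i.e. an arithmetic ">> 1" / ">> 2" that is
   well-defined in C even for negative operands (plain >> on a negative
   int is implementation-defined). */
static int floor_div2(int x) { return x >= 0 ? x / 2 : -((-x + 1) / 2); }
static int floor_div4(int x) { return x >= 0 ? x / 4 : -((-x + 3) / 4); }

/* The trivial formula: its intermediates need more than 8 bits. */
static int delta_ref(int p1, int p0, int q0, int q1)
{
    return floor_div2(q0 - p0 + floor_div4(p1 - q1) + 1);
}

/* First approximation: e - f with no LSB fix-up.  For byte inputs both
   e and f stay within 0..159, so each fits in an unsigned byte. */
static int delta_uncorrected(int p1, int p0, int q0, int q1)
{
    int e = (q0 + (p1 >> 2) + 1) >> 1;
    int f = (p0 + (q1 >> 2) + 1) >> 1;
    return e - f;
}

/* Largest |delta_ref - delta_uncorrected| over inputs 0, step, 2*step, ... < 256. */
static int max_abs_error(int step)
{
    int worst = 0;
    for (int p1 = 0; p1 < 256; p1 += step)
        for (int p0 = 0; p0 < 256; p0 += step)
            for (int q0 = 0; q0 < 256; q0 += step)
                for (int q1 = 0; q1 < 256; q1 += step) {
                    int err = delta_ref(p1, p0, q0, q1)
                            - delta_uncorrected(p1, p0, q0, q1);
                    if (err < 0)
                        err = -err;
                    if (err > worst)
                        worst = err;
                }
    return worst;
}
```

For instance, with p1=q1=0, p0=5 and q0=0 the trivial formula gives -2 while the uncorrected e-f gives -3, so `max_abs_error(5)` is expected to return 1.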

a = p0^q0^((p1-q1)>>2)
gives the correct least significant bit of q0-p0+((p1-q1)>>2), i.e. of the
value before the +1)>>1 (XOR, + and - all agree in bit 0)

c = q0^(p1>>2) 
b = p0^(q1>>2)
produce the incorrect least significant bits before the +1)>>1, namely those
of q0+(p1>>2) and p0+(q1>>2), which are the ones used in the calculation of
e and f
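The LSB claims can be checked the same way. This is a sketch under the same assumptions (scalar integers instead of byte lanes, hypothetical helper names); `a` is formed the way a byte lane would form it, with p1-q1 wrapping mod 256 before the shift, which does not change its bit 0.

```c
/* Flooring ">> 2", well-defined in C for negative operands. */
static int floor_div4(int x) { return x >= 0 ? x / 4 : -((-x + 3) / 4); }

/* 1 if x is odd, 0 if even; also correct for negative x. */
static int odd(int x) { return x % 2 != 0; }

/* Count inputs where bit 0 of a, b or c disagrees with the parity of
   the sum it is supposed to track. */
static int lsb_mismatches(int step)
{
    int bad = 0;
    for (int p1 = 0; p1 < 256; p1 += step)
        for (int p0 = 0; p0 < 256; p0 += step)
            for (int q0 = 0; q0 < 256; q0 += step)
                for (int q1 = 0; q1 < 256; q1 += step) {
                    /* Byte-lane style: p1-q1 wraps mod 256, then >> 2. */
                    unsigned a = (unsigned)(p0 ^ q0)
                               ^ (((unsigned)(p1 - q1) & 0xffu) >> 2);
                    unsigned b = (unsigned)(p0 ^ (q1 >> 2));
                    unsigned c = (unsigned)(q0 ^ (p1 >> 2));
                    /* a: parity of the exact numerator before "+1)>>1". */
                    if ((int)(a & 1u) != odd(q0 - p0 + floor_div4(p1 - q1)))
                        bad++;
                    /* c, b: parities of the sums inside e and f. */
                    if ((int)(c & 1u) != odd(q0 + (p1 >> 2)))
                        bad++;
                    if ((int)(b & 1u) != odd(p0 + (q1 >> 2)))
                        bad++;
                }
    return bad;
}
```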

Now, how do the rounding differences behave?
(p1>>2)-(q1>>2) - 1 == (p1-q1)>>2 iff (p1&3) < (q1&3); otherwise they are
equal: (p1>>2)-(q1>>2) == (p1-q1)>>2.
So we could fix this part by changing f to
f = ((p0+(q1>>2)+X+1)>>1), where X is 1 iff (p1&3) < (q1&3) and 0 otherwise
(subtracting X from p1>>2 is the same as adding it to q1>>2).
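This identity only involves p1 and q1, so a hypothetical exhaustive check over all 65536 byte pairs is cheap; `floor_div4` again stands in for an arithmetic `>> 2` (an assumption, since `>>` on a negative `int` is implementation-defined in C).

```c
/* Flooring ">> 2", well-defined in C for negative operands. */
static int floor_div4(int x) { return x >= 0 ? x / 4 : -((-x + 3) / 4); }

/* X = 1 iff the low two bits of p1 are smaller than those of q1. */
static int borrow_x(int p1, int q1) { return (p1 & 3) < (q1 & 3); }

/* Count byte pairs violating (p1>>2) - (q1>>2) - X == (p1-q1)>>2. */
static int shift_identity_failures(void)
{
    int bad = 0;
    for (int p1 = 0; p1 < 256; p1++)
        for (int q1 = 0; q1 < 256; q1++)
            if ((p1 >> 2) - (q1 >> 2) - borrow_x(p1, q1) != floor_div4(p1 - q1))
                bad++;
    return bad;
}
```

For example p1=1, q1=2 gives (p1>>2)-(q1>>2) = 0 but (p1-q1)>>2 = -1, so X=1 there.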

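Can we detect this case from the LSBs of a, b, c?
Yes: X = (a^c^b)&1, because p0 and q0 cancel out, leaving the LSB of
((p1-q1)>>2) ^ (p1>>2) ^ (q1>>2), which by the identity above is X.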
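A sketch of that check, again with hypothetical names and scalar `unsigned` values in place of byte lanes. Since p0 and q0 cancel out of a^c^b, p1 and q1 are enumerated exhaustively while p0 and q0 take only a few sample values.

```c
/* X = 1 iff the low two bits of p1 are smaller than those of q1. */
static int borrow_x(int p1, int q1) { return (p1 & 3) < (q1 & 3); }

/* Count inputs where (a^c^b)&1 does not reproduce X. */
static int x_detection_failures(void)
{
    int bad = 0;
    for (int p0 = 0; p0 < 256; p0 += 51)        /* 0, 51, ..., 255 */
        for (int q0 = 0; q0 < 256; q0 += 51)
            for (int p1 = 0; p1 < 256; p1++)
                for (int q1 = 0; q1 < 256; q1++) {
                    /* Byte-lane style: p1-q1 wraps mod 256, then >> 2. */
                    unsigned a = (unsigned)(p0 ^ q0)
                               ^ (((unsigned)(p1 - q1) & 0xffu) >> 2);
                    unsigned b = (unsigned)(p0 ^ (q1 >> 2));
                    unsigned c = (unsigned)(q0 ^ (p1 >> 2));
                    if ((int)((a ^ c ^ b) & 1u) != borrow_x(p1, q1))
                        bad++;
                }
    return bad;
}
```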

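Now write A = q0+(p1>>2) and B = p0+(q1>>2)+X, so that the exact result is
delta = (A-B+1)>>1, while the uncorrected terms are e = (A+1)>>1 and
f = (p0+(q1>>2)+1)>>1.

For X=0, f = (B+1)>>1 and
((A+1)>>1) - ((B+1)>>1) + 1 == (A-B+1)>>1 iff B&1=1 and A&1=0;
otherwise they are equal.

For X=1, f = B>>1 and
((A+1)>>1) - (B>>1) - 1 == (A-B+1)>>1 iff B&1=1 and A&1=1;
otherwise they are equal.
Since here b&1 == (B-1)&1 (and c&1 == A&1), we must subtract 1 iff b&1=0
and c&1=1.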
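Both rounding identities hold for arbitrary non-negative integers A and B, so a sketch can check them directly over 0 <= A, B < 320, which covers every A and B that byte inputs can produce here; the names are hypothetical and `floor_div2` is assumed to model the flooring `>> 1`.

```c
/* Flooring ">> 1", well-defined in C for negative operands. */
static int floor_div2(int x) { return x >= 0 ? x / 2 : -((-x + 1) / 2); }

/* Count (A, B) pairs where either rounding identity fails.  Because the
   fix-up flags are exact, this checks both the "iff" and the "otherwise
   they are equal" halves. */
static int rounding_identity_failures(void)
{
    int bad = 0;
    for (int A = 0; A < 320; A++)
        for (int B = 0; B < 320; B++) {
            int want = floor_div2(A - B + 1);
            /* Identity 1: +1 needed iff B odd and A even. */
            int fix1 = (B & 1) && !(A & 1);
            if (((A + 1) >> 1) - ((B + 1) >> 1) + fix1 != want)
                bad++;
            /* Identity 2: -1 needed iff B odd and A odd. */
            int fix2 = (B & 1) && (A & 1);
            if (((A + 1) >> 1) - (B >> 1) - fix2 != want)
                bad++;
        }
    return bad;
}
```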

So for the X=0 case we need to correct by (b&~c)&1,
and for the X=1 case we need to correct by -((~b&c)&1).
And since a&1 == (b^c^X)&1 while a correction is only ever needed when b and c
differ, X = ~a if we limit ourselves to the cases where a correction is needed.
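The correction table can also be checked numerically. This sketch (hypothetical names, scalar integers, flooring shifts as assumed above) computes the correction actually needed, exact delta minus the uncorrected e-f, compares it with (b&~c)&1 for X=0 and -((~b&c)&1) for X=1, and checks that X equals bit 0 of ~a whenever a correction is nonzero.

```c
/* Flooring ">> 1" and ">> 2", well-defined in C for negative operands. */
static int floor_div2(int x) { return x >= 0 ? x / 2 : -((-x + 1) / 2); }
static int floor_div4(int x) { return x >= 0 ? x / 4 : -((-x + 3) / 4); }

static int correction_table_failures(int step)
{
    int bad = 0;
    for (int p1 = 0; p1 < 256; p1 += step)
        for (int p0 = 0; p0 < 256; p0 += step)
            for (int q0 = 0; q0 < 256; q0 += step)
                for (int q1 = 0; q1 < 256; q1 += step) {
                    unsigned a = (unsigned)(p0 ^ q0)
                               ^ (((unsigned)(p1 - q1) & 0xffu) >> 2);
                    unsigned b = (unsigned)(p0 ^ (q1 >> 2));
                    unsigned c = (unsigned)(q0 ^ (p1 >> 2));
                    int x = (p1 & 3) < (q1 & 3);
                    int e0 = (q0 + (p1 >> 2) + 1) >> 1;
                    int f0 = (p0 + (q1 >> 2) + 1) >> 1;
                    /* Correction actually needed on top of e0 - f0. */
                    int needed = floor_div2(q0 - p0 + floor_div4(p1 - q1) + 1)
                               - (e0 - f0);
                    /* Correction predicted by the table. */
                    int table = x ? -(int)(~b & c & 1u) : (int)(b & ~c & 1u);
                    if (needed != table)
                        bad++;
                    if (needed != 0 && x != (int)(~a & 1u))
                        bad++;
                }
    return bad;
}
```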

f = ((p0+(q1>>2)+1)>>1) + (d&~a)
e = ((q0+(p1>>2)+1)>>1) + (d&a)
d = (c^b)&~(b^a)&1

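That should be obvious based upon the above: d is 1 exactly when b != c and
b == a (bit 0), i.e. in the two cases that need a correction, and a then
decides whether that 1 is added to e (the +1 case, X=0) or to f (the -1 case,
X=1).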
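Putting it together, a hypothetical scalar model of the whole formula (not the actual MMX code) can be compared against the trivial formula. `use_typo` switches d between the corrected `&1` form and the `^1` form quoted in the question, to show that the typo matters.

```c
/* Flooring ">> 1" and ">> 2", well-defined in C for negative operands. */
static int floor_div2(int x) { return x >= 0 ? x / 2 : -((-x + 1) / 2); }
static int floor_div4(int x) { return x >= 0 ? x / 4 : -((-x + 3) / 4); }

/* Reference: the trivial formula in plain int arithmetic. */
static int delta_ref(int p1, int p0, int q0, int q1)
{
    return floor_div2(q0 - p0 + floor_div4(p1 - q1) + 1);
}

/* delta = e - f built only from byte-sized intermediates.
   use_typo != 0 selects d = (c^b)&~(b^a)^1 instead of ...&1. */
static int delta_ef(int p1, int p0, int q0, int q1, int use_typo)
{
    unsigned a = (unsigned)(p0 ^ q0) ^ (((unsigned)(p1 - q1) & 0xffu) >> 2);
    unsigned b = (unsigned)(p0 ^ (q1 >> 2));
    unsigned c = (unsigned)(q0 ^ (p1 >> 2));
    unsigned d = use_typo ? (((c ^ b) & ~(b ^ a)) ^ 1u)
                          : ((c ^ b) & ~(b ^ a) & 1u);
    unsigned e = (unsigned)((q0 + (p1 >> 2) + 1) >> 1) + (d & a);
    unsigned f = (unsigned)((p0 + (q1 >> 2) + 1) >> 1) + (d & ~a);
    return (int)e - (int)f;
}

/* Number of grid inputs where delta_ef disagrees with delta_ref. */
static int count_mismatches(int step, int use_typo)
{
    int bad = 0;
    for (int p1 = 0; p1 < 256; p1 += step)
        for (int p0 = 0; p0 < 256; p0 += step)
            for (int q0 = 0; q0 < 256; q0 += step)
                for (int q1 = 0; q1 < 256; q1 += step)
                    if (delta_ef(p1, p0, q0, q1, use_typo)
                        != delta_ref(p1, p0, q0, q1))
                        bad++;
    return bad;
}
```

With the `&1` form the grid check is expected to find no mismatches, while the `^1` form already fails at p0=q0=p1=q1=0 (it returns -1 instead of 0). As a design note, the (x+y+1)>>1 terms in e and f are exactly what the MMX2/SSE `pavgb` instruction computes on unsigned bytes, and with the `&1` form e and f never exceed 160, so they stay within a byte.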

BTW, I'm CCing this to ffmpeg-dev as it might be interesting for others.

Anyone got a nicer derivation/proof?
Or even a faster implementation?

