# [Ffmpeg-devel] Re: about mmx instructions

Michael Niedermayer michaelni
Thu Sep 1 16:07:29 CEST 2005

```
Hi

On Wed, Aug 31, 2005 at 07:26:17PM +0200, thomas.kunlin at free.fr wrote:
> Hello,
>
> I am a PhD student working on H.264, and I have a question concerning
> a calculation found in the loop filter implementation of ffmpeg.
> In the H264_DEBLOCK_P0_Q0 macro (h264dsp_mmx.c) :
>   delta = (q0-p0+((p1-q1)>>2)+1)>>1
> is obtained by the following calculation :
>   delta = e-f , or: -delta=f-e
> where :
>   f = ((p0+(q1>>2)+1)>>1) + (d&~a)
>   e = ((q0+(p1>>2)+1)>>1) + (d&a)
>   d = (c^b)&~(b^a)^1

typo, it should be d = (c^b)&~(b^a)&1

>   c = q0^(p1>>2)
>   b = p0^(q1>>2)
>   a = p0^q0^((p1-q1)>>2)
>
> I have had a bad time trying to understand how this does the trick.
> Could you give me some explanations/pointers about the creation of such a
> magical formula :-) ?

pointers: hmm, ffmpeg's source & http://www.aggregate.org/MAGIC/
explanation: ok, that's easier :)

the reason why it can't be calculated with the trivial
(q0-p0+((p1-q1)>>2)+1)>>1
is that the intermediates and the result would not fit within 8 bits, and
doing it in 16 bits would be half the speed, plus the cost of converting
8<->16 bit.
so we first need to decide how to represent the result in 8 bits.
e-f with both e and f unsigned 8-bit integers seems like an obvious choice,
at least when reading the code, maybe not before writing it though :)
as the inputs are also unsigned 8-bit, e and f should be
f = ((p0+(q1>>2)+1)>>1)
e = ((q0+(p1>>2)+1)>>1)
ignoring the rounding of the >> operations ...
so the only thing left is to fix the least significant bits

a = p0^q0^((p1-q1)>>2)
gives the correct least significant bit before the +1)>>1

c = q0^(p1>>2)
b = p0^(q1>>2)
give the (possibly incorrect) least significant bits before the +1)>>1
that are actually used in the calculation of e and f

now, how do the rounding differences look/behave
(p1>>2)-(q1>>2) - 1 == ((p1-q1)>>2) iff (p1&3) < (q1&3) otherwise they are
equal ((p1>>2)-(q1>>2) == (p1-q1)>>2)
so we could fix this part by changing f to
f = ((p0+(q1>>2)+1-X)>>1) where X is 1 iff (p1&3) < (q1&3)

can we detect this case from the LSBs of a, b and c?
yes,  X= (a^c^b)&1

and ((A+1)>>1) - ((B+1)>>1) + 1 == (A-B+1)>>1 iff B&1=1 and A&1=0
otherwise they are equal

and ((A+1)>>1) - ((B  )>>1) - 1 == (A-B+1)>>1 iff B&1=1 and A&1=1
otherwise they are equal
and b&1 == (B-1)&1 -> we must subtract 1 iff b&1=0 and c&1=1

so  for the X=0 case we need to correct by    (b&~c)&1
and for the X=1 case we need to correct by  -((~b&c)&1)
-> X = ~a (in the LSB) if we limit ourselves to the cases where a correction
is needed

f = ((p0+(q1>>2)+1)>>1) + (d&~a)
e = ((q0+(p1>>2)+1)>>1) + (d&a)
d = (c^b)&~(b^a)&1

should be obvious based upon the above

btw, I am CCing this to ffmpeg-dev as it might be interesting for others
too

anyone got a nicer derivation/proof?
or even a faster implementation?

[...]
--
Michael

```
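For readers who want to convince themselves that e-f really reproduces delta, here is a small brute-force check in Python. It is only a sketch and not code from ffmpeg: the function names (`delta_reference`, `lsb_terms`, `delta_bytewise`, `check_all`) are made up for illustration, plain Python integers stand in for the 8-bit MMX lanes, and d is taken in its corrected form, `d = (c^b)&~(b^a)&1`. The sampled grid covers both parities of every term and all values of p1&3 and q1&3, but it is not an exhaustive test of all 2^32 inputs.

```python
def delta_reference(p1, p0, q0, q1):
    """Direct formula; its intermediates do not fit in 8 bits."""
    return (q0 - p0 + ((p1 - q1) >> 2) + 1) >> 1


def lsb_terms(p1, p0, q0, q1):
    """The three XOR terms; only their least significant bits matter."""
    a = p0 ^ q0 ^ ((p1 - q1) >> 2)  # LSB of q0-p0+((p1-q1)>>2)
    b = p0 ^ (q1 >> 2)              # LSB of p0+(q1>>2)
    c = q0 ^ (p1 >> 2)              # LSB of q0+(p1>>2)
    return a, b, c


def delta_bytewise(p1, p0, q0, q1):
    """Return (e, f) as in the mail, so that e - f should equal delta."""
    a, b, c = lsb_terms(p1, p0, q0, q1)
    d = (c ^ b) & ~(b ^ a) & 1      # 1 only where a correction is needed
    f = ((p0 + (q1 >> 2) + 1) >> 1) + (d & ~a)
    e = ((q0 + (p1 >> 2) + 1) >> 1) + (d & a)
    return e, f


def check_all():
    """Check the rounding steps and the final identity on a sample grid."""
    checked = 0
    for p0 in range(0, 256, 17):
        for q0 in range(0, 256, 17):
            for p1 in range(0, 256, 5):
                for q1 in range(0, 256, 5):
                    a, b, c = lsb_terms(p1, p0, q0, q1)
                    # X is 1 iff the separate shifts round differently.
                    x = 1 if (p1 & 3) < (q1 & 3) else 0
                    assert (p1 >> 2) - (q1 >> 2) - x == (p1 - q1) >> 2
                    assert x == (a ^ b ^ c) & 1
                    e, f = delta_bytewise(p1, p0, q0, q1)
                    # Both halves stay representable as unsigned bytes.
                    assert 0 <= e <= 255 and 0 <= f <= 255
                    assert e - f == delta_reference(p1, p0, q0, q1)
                    checked += 1
    return checked


if __name__ == "__main__":
    print("cases checked:", check_all())
```

A single case shows the correction at work: for (p1, p0, q0, q1) = (0, 1, 0, 0) the plain averages would give e-f = 0-1 = -1 while the direct formula gives 0. Here d&a is 1, so e is bumped to 1 and the difference comes out right.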