[FFmpeg-devel] [PATCH] VP8 V simple loopfilter in MMX/MMX2/SSE2

Pascal Massimino pascal.massimino
Thu Jul 1 18:41:35 CEST 2010


On Thu, Jul 1, 2010 at 8:10 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:

> Hi,
>
> On Thu, Jul 1, 2010 at 10:46 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> > On Thu, Jul 1, 2010 at 10:32 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> >> see attached, my first try at doing a loopfilter in SIMD.
> >>
> >> C takes about 660 cycles for the main MB one, or 1760 for the 3
> >> together in the splitmv case. MMX didn't really measure since it's
> >> only 1 instruction difference as per MMX2 (I just tested that they
> >> gave identical output). MMX2 takes 190/350 cycles, SSE2 takes 180/330
> >> cycles (which is weird, should be faster, but who knows what my crappy
> >> CPU is doing, this machine is 5 yrs old - Intel Core Duo 2GHz on a
> >> MacBook Pro).
> >
> > Now with vp8dsp-init.c changes also.
>
> Now with proper alignment for constants, thanks to Vitor for noticing.
>

+    mova      m0, [pb_80]
+    pxor      m2, m0
+    pxor      m4, m0
+    psubsb    m2, m4        ; m2=p1-q1 (signed) backup for below
+    pand      m3, [pb_FE]
+    psrlq     m3, 1         ; m3=FFABS(p1-q1)/2, this can be used signed

I think you can avoid loading pb_FE by reusing pb_80, as in:

+    mova      m0, [pb_80]
+    pxor      m2, m0
+    pxor      m4, m0
+    psubsb    m2, m4        ; m2=p1-q1 (signed) backup for below
+    psrlq     m3, 1         ; shift first; bit 7 of each byte now holds a stray bit
+    pandn     m0, m3        ; m0=~pb_80 & m3 = FFABS(p1-q1)/2 (result in m0, not m3), this can be used signed
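
To make the reordering concrete, here is a small scalar C sketch of the two variants on one 64-bit lane: masking with 0xFE before the shift versus shifting first and then clearing bit 7 of each byte with the inverted 0x80 mask (which is what pandn against pb_80 does). The function names are hypothetical and only model the arithmetic; this is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Original order: pand with pb_FE (0xFE in every byte), then psrlq by 1. */
static uint64_t half_mask_then_shift(uint64_t v)
{
    return (v & 0xFEFEFEFEFEFEFEFEULL) >> 1;
}

/* Suggested order: psrlq by 1, then pandn with pb_80, i.e. ~pb_80 & shifted. */
static uint64_t half_shift_then_pandn(uint64_t v)
{
    const uint64_t pb_80 = 0x8080808080808080ULL;
    return ~pb_80 & (v >> 1);
}

/* Reference: halve each of the 8 unsigned bytes independently. */
static uint64_t half_per_byte(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t b = (v >> (8 * i)) & 0xFF;
        r |= (b >> 1) << (8 * i);
    }
    return r;
}

/* Asserts that both orderings match the per-byte reference on many inputs. */
void check_halving_variants(void)
{
    uint64_t v = 0x0123456789ABCDEFULL;
    for (int i = 0; i < 100000; i++) {
        assert(half_mask_then_shift(v) == half_per_byte(v));
        assert(half_shift_then_pandn(v) == half_per_byte(v));
        v = v * 6364136223846793005ULL + 1442695040888963407ULL; /* LCG step */
    }
}
```

Both orderings give the same bytes, so the only practical differences are which register ends up holding the result and that pb_FE no longer has to be loaded.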

Also, have you considered using something along the lines of:

pxor  m0, m0
pavgb m3, m0

for computing FFABS(p1-q1)/2?
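
One caveat, sketched below in scalar C with hypothetical helper names: pavgb computes a rounded average, (a + b + 1) >> 1, so averaging with zero rounds odd values up, whereas the pand/psrlq pair truncates. It also needs MMX2 or later, so the plain MMX version would still want the shift. For an input of 255 the rounded result is 128, which no longer fits a signed byte; whether that matters depends on how the value is compared later, which I have not checked against the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte model of pavgb: rounded average of two unsigned bytes. */
static uint8_t pavgb_byte(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Per-byte model of the pand/psrlq pair: truncating x / 2. */
static uint8_t half_trunc(uint8_t x)
{
    return (uint8_t)(x >> 1);
}

/* Asserts that pavgb against zero equals the truncated half plus the low bit. */
void check_pavgb_vs_shift(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t avg = pavgb_byte((uint8_t)x, 0);
        assert(avg == (uint8_t)(half_trunc((uint8_t)x) + (x & 1)));
    }
}
```

So the two agree for even inputs and differ by one for odd inputs.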




> Ronald