[FFmpeg-devel] [PATCH] VP8 V simple loopfilter in MMX/MMX2/SSE2
Pascal Massimino
pascal.massimino
Thu Jul 1 18:41:35 CEST 2010
On Thu, Jul 1, 2010 at 8:10 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Thu, Jul 1, 2010 at 10:46 AM, Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
> > On Thu, Jul 1, 2010 at 10:32 AM, Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
> >> see attached, my first try at doing a loopfilter in SIMD.
> >>
> >> C takes about 660 cycles for the main MB one or 1760 for the 3
> >> together in the splitmv case. MMX didn't really measure since it's
> >> only 1 instruction difference as per MMX2 (I just tested that they
> >> gave identical output). MMX2 takes 190/350 cycles, SSE2 takes 180/330
> >> cycles (which is weird, should be faster, but who knows what my crappy
> >> CPU is doing, this machine is 5 yrs old - Intel Core Duo 2GHz on a
> >> MacBook Pro).
> >
> > Now with vp8dsp-init.c changes also.
>
> Now with proper alignment for constants, thanks to Vitor for noticing.
>
+ mova m0, [pb_80]
+ pxor m2, m0
+ pxor m4, m0
+ psubsb m2, m4 ; m2=p1-q1 (signed) backup for below
+ pand m3, [pb_FE]
+ psrlq m3, 1 ; m3=FFABS(p1-q1)/2, this can be used signed
I think you can avoid loading pb_FE by reusing pb_80: shift first, then mask with pandn:
+ mova m0, [pb_80]
+ pxor m2, m0
+ pxor m4, m0
+ psubsb m2, m4 ; m2=p1-q1 (signed) backup for below
+ psrlq m3, 1 ; bit 7 of each byte may now hold the neighbouring byte's LSB
+ pandn m0, m3 ; m0=~pb_80&m3=FFABS(p1-q1)/2, this can be used signed
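As a sanity check on the reordering above, here is a minimal scalar sketch in C that models one 64-bit MMX register as a uint64_t. It assumes pandn dst, src computes ~dst & src, and that pb_80 and pb_FE are the byte values 0x80 and 0xFE replicated across the register. The function names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate one byte into all 8 lanes of a 64-bit word (stand-in for an MMX register). */
uint64_t splat8(uint8_t b) { return 0x0101010101010101ULL * b; }

/* Order used in the patch: pand with pb_FE, then psrlq by 1. */
uint64_t half_mask_then_shift(uint64_t x) {
    return (x & splat8(0xFE)) >> 1;
}

/* Suggested order: psrlq by 1, then pandn with pb_80 (~pb_80 & shifted value). */
uint64_t half_shift_then_pandn(uint64_t x) {
    uint64_t pb_80 = splat8(0x80);
    return ~pb_80 & (x >> 1);
}

/* Per-byte reference: floor(b / 2) computed independently in each lane. */
uint64_t half_reference(uint64_t x) {
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t b = (x >> (8 * i)) & 0xFF;
        r |= (b >> 1) << (8 * i);
    }
    return r;
}

/* Count disagreements with the reference over a pseudo-random sample of inputs. */
int count_mismatches(void) {
    uint64_t x = 0x0123456789ABCDEFULL;
    int bad = 0;
    for (int i = 0; i < 100000; i++) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL; /* LCG step */
        uint64_t ref = half_reference(x);
        if (half_mask_then_shift(x) != ref) bad++;
        if (half_shift_then_pandn(x) != ref) bad++;
    }
    return bad;
}

/* Both orderings should match the per-byte reference on every sampled input. */
void self_check(void) { assert(count_mismatches() == 0); }
```

Note that in the reordered version the result ends up in m0 rather than m3, so any later use of m3 or of the pb_80 constant in m0 would need adjusting.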
Also, have you considered using something along the lines of:
pxor m0, m0
pavgb m3, m0
for computing fabs(p1-q1)/2?
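One caveat with the pavgb idea, sketched in C below: pavgb rounds up, computing (a + b + 1) >> 1 per byte, so averaging with zero gives (x + 1) >> 1 rather than x >> 1. The two differ by one for every odd x. Whether that difference is acceptable for the filter threshold is not settled here; the helper names are hypothetical. pavgb is also unavailable on plain MMX, so this would only apply to the MMX2 and SSE2 versions.

```c
#include <assert.h>
#include <stdint.h>

/* pavgb semantics for one byte lane: rounded-up average, no intermediate overflow. */
uint8_t pavgb_byte(uint8_t a, uint8_t b) {
    return (uint8_t)((a + b + 1) >> 1);
}

/* What pand/psrlq (or psrlq/pandn) computes per byte: floor(a / 2). */
uint8_t half_floor(uint8_t a) {
    return (uint8_t)(a >> 1);
}

/* Count the byte values for which pavgb against zero differs from floor(a / 2). */
int count_differences(void) {
    int diff = 0;
    for (int a = 0; a < 256; a++) {
        if (pavgb_byte((uint8_t)a, 0) != half_floor((uint8_t)a))
            diff++;
    }
    return diff;
}

/* Even inputs agree, odd inputs come out one higher with pavgb. */
void self_check(void) {
    assert(pavgb_byte(4, 0) == half_floor(4));
    assert(pavgb_byte(5, 0) == half_floor(5) + 1);
}
```

If bit-exact output with the C version is required, the rounding would have to be compensated for, or the shift-and-mask form kept.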
> Ronald
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at mplayerhq.hu
> https://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-devel
>