[FFmpeg-devel] [PATCH] x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_sse2()
u at pkh.me
Tue Jan 28 13:24:34 CET 2014
On Tue, Jan 28, 2014 at 12:05:41PM +0100, Christophe Gisquet wrote:
> 2014-01-28 James Almer <jamrial at gmail.com>:
> > +%if cpuflag(ssse3)
> > mova m0, [mask_mix]
> > +%endif
> > movd m2, Id
> > movd m3, Ed
> > - pshufb m2, m0
> > - pshufb m3, m0
> > + SPLATB_MASK m2, m0
> > + SPLATB_MASK m3, m0
> Is there any gain in loading mask_mix into m0, in particular considering that:
The register was available, and iirc splat macros need the value in a
> > %endif
> > mova m0, [pb_80]
> > pxor m2, m0
> > @@ -456,7 +469,7 @@ SECTION .text
> > SPLATB_REG m7, H, m0 ; H H H H ...
> > %else
> > movd m7, Hd
> > - pshufb m7, [mask_mix]
> > + SPLATB_MASK m7, [mask_mix]
> > %endif
> It is not loaded here?
I couldn't keep the register available until then.
> I'm asking because I have noticed it sometimes (not in vp9 scope) does
> not matter, or is even 1 cycle faster.
In that particular case we need to use it twice, so we just avoid another
read. I admit I didn't bench, but that's probably not relevant.
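
For anyone following along, here is a rough model of what the splat is meant to produce, written in Python so it can be run anywhere. It is only a sketch under assumptions: the definition of mask_mix is not shown in this thread, so I assume it is eight 0 bytes followed by eight 1 bytes; the function names are made up for illustration; and the SSE2 sequence (punpcklbw, punpcklwd, pshufd) is one possible pshufb-free equivalent, not necessarily what SPLATB_MASK expands to in the patch.

```python
# Byte-level models of a few x86 SIMD instructions, operating on 16-element lists.

def movd(value):
    """movd xmm, r32: low 32 bits into bytes 0..3, remaining bytes zeroed."""
    return list((value & 0xFFFFFFFF).to_bytes(4, "little")) + [0] * 12

def pshufb(src, mask):
    """SSSE3 pshufb: out[i] = src[mask[i] & 15], or 0 when bit 7 of mask[i] is set."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]

def punpcklbw(a, b):
    """Interleave the low 8 bytes of a and b."""
    out = []
    for i in range(8):
        out += [a[i], b[i]]
    return out

def punpcklwd(a, b):
    """Interleave the low 4 words (2 bytes each) of a and b."""
    out = []
    for i in range(4):
        out += a[2 * i:2 * i + 2] + b[2 * i:2 * i + 2]
    return out

def pshufd(src, imm):
    """Select each output dword from src using 2 bits of imm per dword."""
    out = []
    for i in range(4):
        sel = (imm >> (2 * i)) & 3
        out += src[4 * sel:4 * sel + 4]
    return out

# ASSUMPTION: layout of mask_mix (not shown in this thread).
MASK_MIX = [0] * 8 + [1] * 8

def splatb_ssse3(value):
    """movd + pshufb with the assumed mask: byte 0 fills the low half, byte 1 the high half."""
    return pshufb(movd(value), MASK_MIX)

def splatb_sse2(value):
    """Hypothetical pshufb-free sequence giving the same result."""
    x = movd(value)
    x = punpcklbw(x, x)    # a a b b ...
    x = punpcklwd(x, x)    # a a a a b b b b ...
    return pshufd(x, 0x50) # dwords 0,0,1,1
```

Under these assumptions both paths turn a 16-bit value holding two packed bytes into a vector whose low 8 bytes repeat the first byte and whose high 8 bytes repeat the second. The SSSE3 path needs the mask as an operand for each pshufb, which is why keeping it in a register saves a memory read when it is used twice.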