[FFmpeg-devel] [PATCH 2/2] vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd.
Ronald S. Bultje
rsbultje at gmail.com
Wed Sep 30 19:26:32 CEST 2015
Hi,
thanks for review!
On Wed, Sep 30, 2015 at 11:54 AM, Henrik Gramner <henrik at gramner.com> wrote:
> On Fri, Sep 25, 2015 at 5:25 PM, Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
> > diff --git a/libavcodec/x86/vp9dsp_init_16bpp_template.c
> b/libavcodec/x86/vp9dsp_init_16bpp_template.c
>
> > +#define init_lpf_8_func(idx1, idx2, dir, wd, bpp, opt) \
> > + dsp->loop_filter_8[idx1][idx2] =
> ff_loop_filter_##dir##_##wd##_##bpp##_##opt
> > +#define init_lpf_16_func(idx, dir, bpp, opt) \
> > + dsp->loop_filter_16[idx] = loop_filter_##dir##_16_##bpp##_##opt
> > +#define init_lpf_mix2_func(idx1, idx2, idx3, dir, wd1, wd2, bpp, opt) \
> > + dsp->loop_filter_mix2[idx1][idx2][idx3] =
> loop_filter_##dir##_##wd1##wd2##_##bpp##_##opt
>
> Are those defines required for macro expansion? If so, ok, otherwise
> I'd drop them since they don't really simplify anything.
>
We need double macros for expansion of "BPC", yes. :(
> diff --git a/libavcodec/x86/vp9lpf_16bpp.asm
> b/libavcodec/x86/vp9lpf_16bpp.asm
>
> > +pw_4095: times 16 dw 4095
>
> This one already exists in vp9mc_16bpp.asm, merge them.
Oh right that was in the wrong patch (next one), moved here instead, sorry.
> +%macro FILTER_STEP 6-10 "", "", "", 0 ; tmp, reg, mask, shift, dst, \
> > + ; src/sub1, sub2, add1, add2,
> dont_store
> > + psrlw %1, %2, %4
> > +%ifnidn %7, ""
> > + psubw %2, %6
> > +%endif
> > + psubw %1, %6 ; abs->delta
> > +%ifnidn %7, ""
> > + psubw %2, %7
> > +%endif
> > + pand %1, reg_%3 ; apply mask
> > +%ifnidn %7, ""
> > + paddw %2, %8
> > +%endif
> > +%if %10 == 1
> > + paddw %6, %1 ; delta->abs
> > +%else
> > + paddw %1, %6 ; delta->abs
> > +%endif
> > +%ifnidn %7, ""
> > + paddw %2, %9
> > +%endif
> > +%if %10 != 1
> > + mova [%5], %1
> > +%endif
> > +%endmacro
>
> Is there a reason for not merging most of those %ifs to make it more
> readable?
>
Pairing. I can remove if you don't like it.
> + psllw m5, 1
> [...]
> > + psllw m5, m3, 1
> [...]
> > + psllw m5, m1, 1
>
> Use paddw instead of left-shifting by 1.
>
Done.
Otherwise LGTM. Also seems like AVX2 would indeed be quite useful here
> if you feel like doing that.
Anyone want to donate me a machine? :D
Seriously, I've tried to write it such that avx2 should be as simple as
adding another macro invocation, I can't exactly guarantee that it works,
but I'm hoping it does. I'll try it on Kieran's new machine, but that would
be a separate patch. No promises yet, trying to finish ssse3 coverage first
;)
Ronald
More information about the ffmpeg-devel
mailing list