[FFmpeg-devel] [PATCH 2/2] vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd.

Ronald S. Bultje rsbultje at gmail.com
Wed Sep 30 19:26:32 CEST 2015


Hi,

thanks for review!

On Wed, Sep 30, 2015 at 11:54 AM, Henrik Gramner <henrik at gramner.com> wrote:

> On Fri, Sep 25, 2015 at 5:25 PM, Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
> > diff --git a/libavcodec/x86/vp9dsp_init_16bpp_template.c
> b/libavcodec/x86/vp9dsp_init_16bpp_template.c
>
> > +#define init_lpf_8_func(idx1, idx2, dir, wd, bpp, opt) \
> > +    dsp->loop_filter_8[idx1][idx2] =
> ff_loop_filter_##dir##_##wd##_##bpp##_##opt
> > +#define init_lpf_16_func(idx, dir, bpp, opt) \
> > +    dsp->loop_filter_16[idx] = loop_filter_##dir##_16_##bpp##_##opt
> > +#define init_lpf_mix2_func(idx1, idx2, idx3, dir, wd1, wd2, bpp, opt) \
> > +    dsp->loop_filter_mix2[idx1][idx2][idx3] =
> loop_filter_##dir##_##wd1##wd2##_##bpp##_##opt
>
> Are those defines required for macro expansion? If so, ok, otherwise
> I'd drop them since they don't really simplify anything.
>

We need double macros for expansion of "BPC", yes. :(

> diff --git a/libavcodec/x86/vp9lpf_16bpp.asm
> b/libavcodec/x86/vp9lpf_16bpp.asm
>
> > +pw_4095: times 16 dw 4095
>
> This one already exists in vp9mc_16bpp.asm, merge them.


Oh right that was in the wrong patch (next one), moved here instead, sorry.

> +%macro FILTER_STEP 6-10 "", "", "", 0 ; tmp, reg, mask, shift, dst, \
> > +                                      ; src/sub1, sub2, add1, add2,
> dont_store
> > +    psrlw               %1, %2, %4
> > +%ifnidn %7, ""
> > +    psubw               %2, %6
> > +%endif
> > +    psubw               %1, %6                      ; abs->delta
> > +%ifnidn %7, ""
> > +    psubw               %2, %7
> > +%endif
> > +    pand                %1, reg_%3                  ; apply mask
> > +%ifnidn %7, ""
> > +    paddw               %2, %8
> > +%endif
> > +%if %10 == 1
> > +    paddw               %6, %1                      ; delta->abs
> > +%else
> > +    paddw               %1, %6                      ; delta->abs
> > +%endif
> > +%ifnidn %7, ""
> > +    paddw               %2, %9
> > +%endif
> > +%if %10 != 1
> > +    mova              [%5], %1
> > +%endif
> > +%endmacro
>
> Is there a reason for not merging most of those %ifs to make it more
> readable?
>

Pairing. I can remove if you don't like it.

> +    psllw               m5, 1
> [...]
> > +    psllw               m5, m3, 1
> [...]
> > +    psllw               m5, m1, 1
>
> Use paddw instead of left-shifting by 1.
>

Done.

Otherwise LGTM. Also seems like AVX2 would indeed be quite useful here
> if you feel like doing that.


Anyone want to donate me a machine? :D

Seriously, I've tried to write it such that avx2 should be as simple as
adding another macro invocation, I can't exactly guarantee that it works,
but I'm hoping it does. I'll try it on Kieran's new machine, but that would
be a separate patch. No promises yet, trying to finish ssse3 coverage first
;)

Ronald


More information about the ffmpeg-devel mailing list