[FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

Henrik Gramner henrik at gramner.com
Mon Jun 26 03:24:40 EEST 2017

On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev <ikalvachev at gmail.com> wrote:
> +%define HADDPS_IS_FAST 0
> +%define PHADDD_IS_FAST 0
> +        haddps      %1,   %1
> +        haddps      %1,   %1
> +       phaddd       xmm%1,xmm%1
> +       phaddd       xmm%1,xmm%1

You can safely assume that those instructions are always slow and that
this is virtually never the correct way to use them, so just use the
shuffle + add method.
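
As a rough illustration of the shuffle + add method, here is a minimal
sketch written with C intrinsics rather than x86inc asm. The helper names
hsum_ps and hsum_epi32 are made up for this example and are not taken from
the patch. Each function reduces four lanes to one with two shuffle/add
steps instead of two haddps/phaddd instructions.

```c
#include <emmintrin.h> /* SSE2 integer intrinsics; also pulls in the SSE float ones */

/* Hypothetical helper: horizontal sum of four floats without haddps. */
static float hsum_ps(__m128 v)
{
    __m128 hi  = _mm_movehl_ps(v, v);       /* hi  = [v2, v3, v2, v3]           */
    __m128 sum = _mm_add_ps(v, hi);         /* sum = [v0+v2, v1+v3, ...]        */
    __m128 odd = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(1, 1, 1, 1)); /* lane 1   */
    sum = _mm_add_ss(sum, odd);             /* lane 0 = v0+v2+v1+v3             */
    return _mm_cvtss_f32(sum);
}

/* Hypothetical helper: horizontal sum of four 32-bit ints without phaddd. */
static int hsum_epi32(__m128i v)
{
    __m128i hi  = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));   /* swap 64-bit halves  */
    __m128i sum = _mm_add_epi32(v, hi);                            /* [v0+v2, v1+v3, ...] */
    __m128i odd = _mm_shuffle_epi32(sum, _MM_SHUFFLE(2, 3, 0, 1)); /* swap adjacent lanes */
    sum = _mm_add_epi32(sum, odd);                                 /* lane 0 = full sum   */
    return _mm_cvtsi128_si32(sum);
}
```

In asm the same pattern would typically map to movhlps/shufps + addps for
floats and pshufd + paddd for integers.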

You can unconditionally use non-destructive 3-arg instructions
(without v-prefix) in non-AVX code to reduce ifdeffery. The x86inc
abstraction layer will automatically insert register-register moves as
needed.

I'm a bit doubtful if it's worth the complexity to emulate 256-bit
integer math using floating-point instruction hacks, especially since
that's only relevant on two 5+ year old Intel µarchs (SNB & IVB). It's
probably fine to simply require AVX2 if you need 256-bit integer SIMD.
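
One way to picture "simply require AVX2" is a runtime check that only
selects the 256-bit integer path when AVX2 is reported, and otherwise falls
back to 128-bit SSE. The sketch below assumes GCC or Clang on x86 and uses
their __builtin_cpu_supports builtin; the function name pvq_vector_bits is
hypothetical. FFmpeg itself makes this kind of decision in its DSP init
code via av_get_cpu_flags() and AV_CPU_FLAG_AVX2 rather than compiler
builtins.

```c
/* Hypothetical helper: pick the integer SIMD width for the search.
 * Assumes GCC/Clang on x86; no AVX1-only 256-bit integer emulation. */
static int pvq_vector_bits(void)
{
    __builtin_cpu_init();                   /* initialise the CPU feature data */
    if (__builtin_cpu_supports("avx2"))
        return 256;                         /* real 256-bit integer SIMD       */
    return 128;                             /* SNB/IVB and older: stay on SSE  */
}
```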

Be aware that most SSE SIMD instructions are actually implemented as
x86inc macros and redefining them can have unexpected consequences and
is therefore discouraged.

