[FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

Henrik Gramner henrik at gramner.com
Mon Jun 26 03:24:40 EEST 2017

On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev <ikalvachev at gmail.com> wrote:
> +%define HADDPS_IS_FAST 0
> +%define PHADDD_IS_FAST 0
> +        haddps      %1,   %1
> +        haddps      %1,   %1
> +       phaddd       xmm%1,xmm%1
> +       phaddd       xmm%1,xmm%1

You can safely assume that those instructions are always slow and that
this is virtually never the correct way to use them, so just use the
shuffle + add method.
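
For reference, a minimal sketch of the shuffle + add pattern, written with C
intrinsics rather than x86inc asm so that it compiles on its own. The function
names hsum_ps and hsum_epi32 are made up for illustration and are not taken
from the patch; the asm version would use the corresponding shufps/movhlps/addps
and pshufd/paddd instructions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Horizontal sum of four floats without haddps. */
static float hsum_ps(__m128 v)
{
    /* Swap neighbours inside each pair: (v1, v0, v3, v2). */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    /* Lanes 0 and 1 hold v0+v1, lanes 2 and 3 hold v2+v3. */
    __m128 sums = _mm_add_ps(v, shuf);
    /* Move the high half of sums into the low half of shuf. */
    shuf = _mm_movehl_ps(shuf, sums);
    /* Lane 0: (v0+v1) + (v2+v3). */
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}

/* Horizontal sum of four 32-bit integers without phaddd. */
static int hsum_epi32(__m128i v)
{
    /* Swap the 64-bit halves: (v2, v3, v0, v1). */
    __m128i shuf = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    /* Lane 0 holds v0+v2, lane 1 holds v1+v3. */
    __m128i sums = _mm_add_epi32(v, shuf);
    /* Swap neighbouring dwords so lane 0 sees lane 1. */
    shuf = _mm_shuffle_epi32(sums, _MM_SHUFFLE(2, 3, 0, 1));
    /* Lane 0: (v0+v2) + (v1+v3). */
    sums = _mm_add_epi32(sums, shuf);
    return _mm_cvtsi128_si32(sums);
}
```

Each reduction is two shuffles and two adds, and it only needs SSE/SSE2, so
there is no need for a HADDPS_IS_FAST or PHADDD_IS_FAST switch.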

You can unconditionally use non-destructive 3-arg instructions
(without v-prefix) in non-AVX code to reduce ifdeffery. The x86inc
abstraction layer will automatically insert register-register moves as
needed.

I'm a bit doubtful if it's worth the complexity to emulate 256-bit
integer math using floating-point instruction hacks, especially since
that's only relevant on two 5+ year old Intel µarchs (SNB & IVB). It's
probably fine to simply require AVX2 if you need 256-bit integer SIMD.

Be aware that most SSE SIMD instructions are actually implemented as
x86inc macros and redefining them can have unexpected consequences and
is therefore discouraged.
