[FFmpeg-devel] [PATCH] vp9: add 32x32 idct AVX2 implementation.

Henrik Gramner henrik at gramner.com
Sat Jul 16 12:55:24 EEST 2016


On Wed, Jul 13, 2016 at 6:37 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> +cglobal vp9_idct_idct_32x32_add, 4, 9, 16, 2048, dst, stride, block, eob
[...]
> +    movd               xm0, [blockq]
> +    mova                m1, [pw_11585x2]
> +    pmulhrsw            m0, m1
> +    pmulhrsw            m0, m1
> +    vpbroadcastw        m0, xm0
> +    pmulhrsw            m0, [pw_512]

The vpbroadcastw could load directly from memory ([blockq]) at the
start, which would get rid of the movd.

Is it mathematically possible to merge consecutive pmulhrsw
instructions into a single one using a different constant? I'm
guessing no, but I'm not sure.
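
One way to probe the question above is a scalar model of a single 16-bit lane. The sketch below is only an illustration, not part of the patch: it assumes pw_11585x2 holds 23170 (11585 * 2), that the result must be bit-exact for arbitrary 16-bit inputs, and the helper names (pmulhrsw, twice, find_single_constant) are made up for this example. It searches every signed 16-bit constant for one that reproduces two consecutive pmulhrsw operations on the inputs 1 and 3.

```python
def pmulhrsw(a: int, b: int) -> int:
    """Scalar model of one signed 16-bit lane of PMULHRSW."""
    r = (((a * b) >> 14) + 1) >> 1      # round-to-nearest of (a * b) / 2**15
    r &= 0xFFFF                         # keep 16 bits, as the instruction does
    return r - 0x10000 if r >= 0x8000 else r

C = 23170  # assumed value of pw_11585x2 (11585 * 2)

def twice(x: int) -> int:
    """Two consecutive multiplies by C, as in the quoted code."""
    return pmulhrsw(pmulhrsw(x, C), C)

def find_single_constant(inputs):
    """All int16 constants k with pmulhrsw(x, k) == twice(x) for every x."""
    return [k for k in range(-32768, 32768)
            if all(pmulhrsw(x, k) == twice(x) for x in inputs)]

if __name__ == "__main__":
    print(twice(1), twice(3))            # both inputs round to 1
    print(find_single_constant((1, 3)))  # empty list: no constant fits both
```

Under these assumptions no single constant works: mapping 1 to 1 needs k >= 16384, while mapping 3 to 1 needs k < 16384, because the intermediate rounding of the first multiply is lost when the two are merged. If the input range were restricted, or bit-exactness were not required, the conclusion could differ.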

[...]

> +    ; at the end of the loop, m7 should still be zero
> +    ; use that to zero out block coefficients
> +    ZERO_BLOCK      blockq, 64, 16, m1

The comment says m7, but the code uses m1.

[...]

> +    ; at the end of the loop, m7 should still be zero
> +    ; use that to zero out block coefficients
> +    ZERO_BLOCK      blockq, 64, 32, m1

Ditto.

