[FFmpeg-devel] [PATCH 1/2] x86: move horizontal add macros to x86util

Ronald S. Bultje rsbultje at gmail.com
Sat Apr 12 01:14:33 CEST 2014


Hi

On Fri, Apr 11, 2014 at 7:00 PM, James Almer <jamrial at gmail.com> wrote:

> Also port relevant AVX2/XOP optimizations from x264
>

Did you get permission from them to relicense to LGPL? I know it's trivial
code, but better safe than sorry.


> +%macro HADDD 2 ; sum junk
> +%if sizeof%1 == 32
> +%define %2 xmm%2
> +    vextracti128 %2, %1, 1
> +%define %1 xmm%1
> +    paddd   %1, %2
> +%endif
> +%if mmsize >= 16
> +%if cpuflag(xop) && sizeof%1 == 16
> +    vphadddq %1, %1
> +%endif
> +    movhlps %2, %1
> +    paddd   %1, %2
> +%endif
> +%if notcpuflag(xop)
> +    PSHUFLW %2, %1, q0032
> +    paddd   %1, %2
> +%endif
> +%undef %1
> +%undef %2
> +%endmacro
> +
> +%macro HADDW 2 ; reg, tmp
> +%if cpuflag(xop) && sizeof%1 == 16
> +    vphaddwq  %1, %1
> +    movhlps   %2, %1
> +    paddd     %1, %2
> +%else
> +    pmaddwd %1, [pw_1]
> +    HADDD   %1, %2
> +%endif
> +%endmacro


So, these require some comments on what they do - the naming is terrible.
It suggests that they act like phaddw/d, but they actually just act on the
lower half of the output register (or, put differently, on the whole of one
input register rather than on both). You probably want to make that
explicit in a comment, maybe even rename them just to prevent the obvious
confusion.
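
To make the difference concrete, here is a rough scalar model in C. It is
only a sketch of how I read the quoted macros for the 16-byte, non-XOP
path; the function names (phaddd_model, haddd_model, haddw_model,
hadd_model_selfcheck) are made up for illustration and are not part of the
patch. The upper lanes that the macros leave as junk are simply not
modelled.

```c
#include <assert.h>
#include <stdint.h>

/* phaddd dst, src (SSSE3, 128-bit): pairwise sums taken from BOTH
 * operands; all four output lanes carry a meaningful result. */
static void phaddd_model(uint32_t dst[4], const uint32_t src[4])
{
    const uint32_t out[4] = {
        dst[0] + dst[1], dst[2] + dst[3],
        src[0] + src[1], src[2] + src[3],
    };
    for (int i = 0; i < 4; i++)
        dst[i] = out[i];
}

/* HADDD reg, tmp (assumed reading of the quoted macro): the total of
 * all four dwords of ONE register ends up in lane 0. The remaining
 * lanes are junk, so this model only returns lane 0. */
static uint32_t haddd_model(const uint32_t reg[4])
{
    return reg[0] + reg[1] + reg[2] + reg[3];
}

/* HADDW reg, tmp (assumed reading): pmaddwd against pw_1 widens and
 * pairs the eight signed words into four dwords, then HADDD folds
 * those into lane 0. The net effect is the sum of all eight words. */
static int32_t haddw_model(const int16_t reg[8])
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += reg[i];
    return sum;
}

/* Small self-check that shows the two behaviours side by side. */
static void hadd_model_selfcheck(void)
{
    uint32_t a[4] = { 1, 2, 3, 4 };
    const uint32_t b[4] = { 10, 20, 30, 40 };
    const int16_t w[8] = { 1, -2, 3, -4, 5, -6, 7, -8 };

    assert(haddd_model(a) == 10);   /* one scalar total from one register */
    assert(haddw_model(w) == -4);   /* signed words, one scalar total */

    phaddd_model(a, b);             /* four pairwise sums from two registers */
    assert(a[0] == 3 && a[1] == 7 && a[2] == 30 && a[3] == 70);
}
```

So a reader who expects phaddd-like behaviour gets four pairwise sums out
of two registers, whereas the macros reduce a single register to one
scalar in its lowest element.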

Ronald

