[FFmpeg-devel] [PATCH 06/10] lavu/aes: add x86 AESNI optimizations

Henrik Gramner henrik at gramner.com
Tue Oct 13 17:42:43 CEST 2015


On Tue, Oct 13, 2015 at 2:33 AM, Rodger Combs <rodger.combs at gmail.com> wrote:
> +%macro AES_CRYPT 1
> +%if %1 == 1
> +%define CRYPT aesdec
> +%define LAST  aesdeclast
> +cglobal aes_decrypt, 6,6,2
> +%else
> +%define CRYPT aesenc
> +%define LAST  aesenclast
> +cglobal aes_encrypt, 6,6,2
> +%endif
> +    pxor xm1, xm1
> +    shl r5d, 4
> +    sub r5, 0x60
> +    test r4, r4
> +    je .block
> +    movdqu xm1, [r4] ; iv
> +.block:
> +    movdqu xm0, [r2] ; state
> +%if %1 == 0
> +    pxor xm0, xm1
> +%endif
> +    pxor  xm0, [r0 + r5 + 0x60]
> +    CRYPT xm0, [r0 + r5 + 0x50]
> +    CRYPT xm0, [r0 + r5 + 0x40]
> +    CRYPT xm0, [r0 + r5 + 0x30]
> +    CRYPT xm0, [r0 + r5 + 0x20]
> +    CRYPT xm0, [r0 + r5 + 0x10]
> +    CRYPT xm0, [r0 + r5 + 0x00]
> +    CRYPT xm0, [r0 + r5 - 0x10]
> +    CRYPT xm0, [r0 + r5 - 0x20]
> +    CRYPT xm0, [r0 + r5 - 0x30]
> +    cmp r5, 0x60
> +    jl .last
> +    CRYPT xm0, [r0 + r5 - 0x40]
> +    CRYPT xm0, [r0 + r5 - 0x50]
> +    cmp r5, 0x80
> +    jl .last
> +    CRYPT xm0, [r0 + 0x20]
> +    CRYPT xm0, [r0 + 0x10]
> +.last:
> +    LAST xm0, [r0]
> +    test r4, r4
> +    je .noiv
> +%if %1 == 1
> +    pxor xm0, xm1
> +    movdqu xm1, [r2]
> +%else
> +    movdqa xm1, xm0
> +%endif
> +.noiv
> +    movdqu [r1], xm0
> +    dec r3d
> +    add r2, 16
> +    add r1, 16
> +    test r3d, r3d
> +    jne .block
> +%if %1 == 0
> +    test r4, r4
> +    je .ret
> +    movdqu [r4], xm0
> +.ret:
> +%endif
> +    REP_RET
> +%endmacro

If you use enc and dec as macro arguments instead of 0 and 1 you could
get rid of some of the %if blocks, e.g. by using cglobal aes_%1rypt and
the instructions aes%1 and aes%1last instead of the CRYPT/LAST macros.
"%if %1 == 0" can then be replaced by "%ifidn %1, enc" as well (which is
also clearer).
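
A rough, untested sketch of that suggestion, keeping the register names and instruction order of the patch and leaving the unchanged lines as comments. It assumes x86inc.asm is included; the two invocations at the bottom are an assumption about how the macro gets instantiated, since that part is not quoted above.

```nasm
%macro AES_CRYPT 1 ; %1 = enc or dec
cglobal aes_%1rypt, 6,6,2           ; expands to aes_encrypt / aes_decrypt
    ; (iv setup and state load as in the patch)
%ifidn %1, enc
    pxor      xm0, xm1
%endif
    pxor      xm0, [r0 + r5 + 0x60]
    aes%1     xm0, [r0 + r5 + 0x50] ; aesenc / aesdec
    ; (remaining rounds as in the patch, with CRYPT replaced by aes%1)
.last:
    aes%1last xm0, [r0]             ; aesenclast / aesdeclast
    ; (iv update, store and loop as in the patch, with
    ;  "%ifidn %1, dec" / "%ifidn %1, enc" in place of "%if %1 == 1" / "%if %1 == 0")
    REP_RET
%endmacro

AES_CRYPT enc
AES_CRYPT dec
```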

Use m# instead of xm# since you're not dealing with mixing xmm and ymm
registers.

Use mova instead of movdqa and movu instead of movdqu.

Vertically align the lines on the first comma.
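
Applied to the load/store lines of the patch, those three points would look roughly like this. This is only a sketch: it assumes the function is declared under INIT_XMM, so that m0/m1 and mova/movu resolve to the xmm forms, and the rounds themselves are left out.

```nasm
    movu     m1, [r4]  ; iv
.block:
    movu     m0, [r2]  ; state
    ; (rounds as in the patch)
    mova     m1, m0    ; encrypt path: keep the ciphertext as the next iv
    movu   [r1], m0
    add      r2, 16
    add      r1, 16
```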

You can also adjust the r5 offset to make the "cmp r5, 0x80" fit in a
signed byte as well, but that's pretty minor.
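
For instance (untested, and 0x70 is just one possible bias): subtracting 0x70 instead of 0x60 gives r5 = 0x30/0x50/0x70 for 10/12/14 rounds, so both compares use an immediate that fits in a signed byte. Every r5-relative displacement moves up by 0x10 and still fits in a signed byte. Only the affected lines are shown, with CRYPT and LAST as defined in the patch.

```nasm
    shl   r5d, 4
    sub    r5, 0x70                 ; was 0x60
    ; (iv setup and state load as in the patch)
    pxor  xm0, [r0 + r5 + 0x70]
    CRYPT xm0, [r0 + r5 + 0x60]
    CRYPT xm0, [r0 + r5 + 0x50]
    CRYPT xm0, [r0 + r5 + 0x40]
    CRYPT xm0, [r0 + r5 + 0x30]
    CRYPT xm0, [r0 + r5 + 0x20]
    CRYPT xm0, [r0 + r5 + 0x10]
    CRYPT xm0, [r0 + r5 + 0x00]
    CRYPT xm0, [r0 + r5 - 0x10]
    CRYPT xm0, [r0 + r5 - 0x20]
    cmp    r5, 0x50                 ; was 0x60
    jl .last
    CRYPT xm0, [r0 + r5 - 0x30]
    CRYPT xm0, [r0 + r5 - 0x40]
    cmp    r5, 0x70                 ; was 0x80, which does not fit in a signed byte
    jl .last
    CRYPT xm0, [r0 + 0x20]
    CRYPT xm0, [r0 + 0x10]
.last:
    LAST  xm0, [r0]
```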

