[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

Anton Khirnov anton at khirnov.net
Fri Dec 4 15:00:15 EET 2020

Quoting Alan Kelly (2020-11-19 09:41:56)
> ---
>  All of Henrik's suggestions have been implemented. Additionally,
>  m3 and m6 are permuted in avx2 before storing to ensure bit by bit
>  identical results in avx2.
>  libswscale/x86/Makefile     |   1 +
>  libswscale/x86/swscale.c    |  75 +++--------------------
>  libswscale/x86/yuv2yuvX.asm | 118 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+), 65 deletions(-)
>  create mode 100644 libswscale/x86/yuv2yuvX.asm
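The quoted note says m3 and m6 are permuted before the store in the AVX2 path. The patch body is not reproduced here, so the reason given next is an assumption. On AVX2, pack instructions such as `vpackuswb` work independently within each 128-bit lane. Packing two 16-element word vectors into bytes therefore leaves their halves interleaved, and a cross-lane `vpermq` (commonly with immediate 0xD8) puts the bytes back in linear order, so the AVX2 output can match the SSE output byte for byte. The plain-C model below only mimics those two instructions. All function names are hypothetical and no intrinsics are used, so it runs on any CPU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Saturate a signed 16-bit value to an unsigned byte, as (v)packuswb does. */
static uint8_t sat_u8(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model of 256-bit vpackuswb a, b: each 128-bit lane is packed on
 * its own, so the 32 output bytes come out as
 * a[0..7], b[0..7], a[8..15], b[8..15]. */
static void model_vpackuswb_256(const int16_t a[16], const int16_t b[16],
                                uint8_t out[32])
{
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++) {
            out[lane * 16 + i]     = sat_u8(a[lane * 8 + i]);
            out[lane * 16 + 8 + i] = sat_u8(b[lane * 8 + i]);
        }
}

/* Scalar model of vpermq with immediate 0xD8: the four 64-bit quadwords
 * are reordered as 0, 2, 1, 3, restoring a[0..15] followed by b[0..15]. */
static void model_vpermq_d8(const uint8_t in[32], uint8_t out[32])
{
    static const int order[4] = { 0, 2, 1, 3 };
    for (int q = 0; q < 4; q++)
        memcpy(out + 8 * q, in + 8 * order[q], 8);
}

/* Usage: pack two rows of 16 values, then fix the lane order. */
static void check_lane_order(void)
{
    int16_t a[16], b[16];
    uint8_t packed[32], fixed[32];
    for (int i = 0; i < 16; i++) {
        a[i] = (int16_t)i;
        b[i] = (int16_t)(100 + i);
    }
    model_vpackuswb_256(a, b, packed);
    assert(packed[8] == 100);   /* b[0] lands in the middle of lane 0 */
    model_vpermq_d8(packed, fixed);
    for (int i = 0; i < 16; i++) {
        assert(fixed[i] == i);
        assert(fixed[16 + i] == 100 + i);
    }
}
```

Storing `packed` directly would write the second row's first eight bytes where the first row's ninth byte belongs. This lane-crossing detail is a common way for an AVX2 port to differ from its SSE counterpart.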

Is this function tested by FATE?
I did some brief testing and apparently it gets called during
fate-filter-shuffleplanes-dup-luma, but the results do not change even
if I comment out the whole function.
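The observation above suggests the FATE test does not actually check this routine's output. One way to close such a gap is a direct bit-exactness comparison against a C reference, in the spirit of FFmpeg's checkasm tests. The sketch below assumes the arithmetic of the generic 8-bit vertical scaler: the dither value is shifted left by 12 to seed the accumulator, the filter taps are summed, and the result is shifted right by 19 and clipped to 8 bits. Treat that as an approximation of the real C path. `ref_planeX`, `unrolled_planeX` and `count_mismatches` are hypothetical names. A real test would call the SIMD function in place of the unrolled C stand-in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*planeX_fn)(const int16_t *filter, int filter_size,
                          const int16_t **src, uint8_t *dst, int w,
                          const uint8_t *dither, int offset);

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Assumed reference: dither seeds the sum, taps accumulate, >> 19, clip. */
static void ref_planeX(const int16_t *filter, int filter_size,
                       const int16_t **src, uint8_t *dst, int w,
                       const uint8_t *dither, int offset)
{
    for (int i = 0; i < w; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dst[i] = clip_u8(val >> 19);
    }
}

/* Stand-in for an optimized version: two pixels per iteration plus a tail. */
static void unrolled_planeX(const int16_t *filter, int filter_size,
                            const int16_t **src, uint8_t *dst, int w,
                            const uint8_t *dither, int offset)
{
    int i = 0;
    for (; i + 1 < w; i += 2) {
        int v0 = dither[(i + offset) & 7] << 12;
        int v1 = dither[(i + 1 + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++) {
            v0 += src[j][i] * filter[j];
            v1 += src[j][i + 1] * filter[j];
        }
        dst[i]     = clip_u8(v0 >> 19);
        dst[i + 1] = clip_u8(v1 >> 19);
    }
    for (; i < w; i++) {             /* scalar tail for odd widths */
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dst[i] = clip_u8(val >> 19);
    }
}

enum { MAX_W = 512, MAX_TAPS = 8 };

/* Feed both functions identical random input; return mismatching bytes. */
static int count_mismatches(planeX_fn a, planeX_fn b, int w,
                            int filter_size, int offset, unsigned seed)
{
    static int16_t rows[MAX_TAPS][MAX_W];
    const int16_t *src[MAX_TAPS];
    int16_t filter[MAX_TAPS];
    uint8_t dither[8], out_a[MAX_W], out_b[MAX_W];
    int bad = 0;

    assert(w > 0 && w <= MAX_W);
    assert(filter_size > 0 && filter_size <= MAX_TAPS);
    srand(seed);
    for (int j = 0; j < filter_size; j++) {
        filter[j] = (int16_t)(rand() % 8192 - 4096);
        for (int i = 0; i < w; i++)
            rows[j][i] = (int16_t)(rand() % 32768); /* assumed 15-bit input */
        src[j] = rows[j];
    }
    for (int k = 0; k < 8; k++)
        dither[k] = (uint8_t)(rand() % 128);

    a(filter, filter_size, src, out_a, w, dither, offset);
    b(filter, filter_size, src, out_b, w, dither, offset);
    for (int i = 0; i < w; i++)
        bad += out_a[i] != out_b[i];
    return bad;
}
```

Odd widths such as 255 exercise the tail loop, a common place for unrolled or vectorized code to diverge from the reference. Commenting out the function under test should make a check like this fail immediately.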

Also, it seems like you are adding an AVX2 version of the function, but
I don't see it being used.
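On the AVX2 point: in FFmpeg a SIMD kernel only runs if the architecture init code assigns it based on the runtime CPU flags. For libswscale on x86 this happens in `libswscale/x86/swscale.c`, using `av_get_cpu_flags()` and checks such as `EXTERNAL_AVX2_FAST()`. The sketch below shows the general pattern with hypothetical flag values and stand-in kernels rather than the real swscale structures. If the AVX2 branch were missing, `select_kernel` would never return the AVX2 function, which is the situation the question above describes.

```c
#include <assert.h>

/* Hypothetical CPU-feature bits; FFmpeg's real ones come from av_get_cpu_flags(). */
enum {
    CPU_FLAG_SSE3 = 1 << 0,
    CPU_FLAG_AVX2 = 1 << 1,
};

typedef int (*kernel_fn)(void);

/* Stand-in kernels; each returns an id instead of doing real work. */
static int kernel_c(void)    { return 0; }
static int kernel_sse3(void) { return 1; }
static int kernel_avx2(void) { return 2; }

/* Start from the C fallback and upgrade as features allow. Later checks
 * override earlier ones, so the widest supported version wins. */
static kernel_fn select_kernel(int cpu_flags)
{
    kernel_fn fn = kernel_c;
    if (cpu_flags & CPU_FLAG_SSE3)
        fn = kernel_sse3;
    if (cpu_flags & CPU_FLAG_AVX2)   /* without this branch AVX2 is dead code */
        fn = kernel_avx2;
    return fn;
}

/* Usage: check that each feature combination picks the expected kernel. */
static void check_dispatch(void)
{
    assert(select_kernel(0)() == 0);
    assert(select_kernel(CPU_FLAG_SSE3)() == 1);
    assert(select_kernel(CPU_FLAG_SSE3 | CPU_FLAG_AVX2)() == 2);
}
```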

Anton Khirnov
