[FFmpeg-devel] [PATCH v2] swscale/aarch64: dotprod implementation of rgba32_to_Y

Martin Storsjö martin at martin.st
Tue Mar 4 10:27:42 EET 2025


On Mon, 3 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> The idea is to split the 16-bit coefficients into lower and upper halves,
> invoke udot for the lower half, shift by 8, and follow with udot for the
> upper half.
>
> Benchmarks on Cortex-A78:
> bgra_to_y_128_c:                                       682.0 ( 1.00x)
> bgra_to_y_128_neon:                                    181.2 ( 3.76x)
> bgra_to_y_128_dotprod:                                 117.8 ( 5.79x)
> bgra_to_y_1080_c:                                     5742.5 ( 1.00x)
> bgra_to_y_1080_neon:                                  1472.5 ( 3.90x)
> bgra_to_y_1080_dotprod:                                906.5 ( 6.33x)
> bgra_to_y_1920_c:                                    10194.0 ( 1.00x)
> bgra_to_y_1920_neon:                                  2589.8 ( 3.94x)
> bgra_to_y_1920_dotprod:                               1573.8 ( 6.48x)
> ---
> libswscale/aarch64/input.S   | 88 ++++++++++++++++++++++++++++++++++++
> libswscale/aarch64/swscale.c | 17 +++++++
> 2 files changed, 105 insertions(+)

LGTM, thanks; I've pushed this one now.

// Martin
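The split described in the quoted patch mail works because of a simple identity. Writing each 16-bit coefficient as c = 256 * hi + lo, the weighted sum becomes (sum of lo * px) + 256 * (sum of hi * px), and UDOT can compute each half as four u8 x u8 products accumulated into a 32-bit lane. The sketch below is a scalar C model of that idea, not the code from the patch. It assumes that "shift by 8" means a right shift of the low-half partial sum. The mail does not state the direction, but with the lower half processed first, a right shift is the reading that keeps the result exact. The names udot_lane, luma_ref and luma_split, and the rnd/shift parameters, are hypothetical illustrations rather than swscale's actual functions or constants. The coefficients are assumed to be non-negative, because UDOT is unsigned.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of the AArch64 UDOT
 * instruction: four u8 x u8 products are added to a u32 accumulator. */
static uint32_t udot_lane(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int k = 0; k < 4; k++)
        acc += (uint32_t)a[k] * b[k];
    return acc;
}

/* Reference: (rnd + sum of coef[k] * px[k]) >> shift, computed with the
 * full 16-bit coefficients (assumed non-negative). */
static uint32_t luma_ref(const uint8_t px[4], const uint16_t coef[4],
                         uint32_t rnd, unsigned shift)
{
    uint32_t sum = rnd;
    for (int k = 0; k < 4; k++)
        sum += (uint32_t)coef[k] * px[k];
    return sum >> shift;
}

/* Split version (assumption: "shift by 8" is a right shift).
 * With c = 256 * hi + lo:
 *   rnd + sum(c * px) = (rnd + sum(lo * px)) + 256 * sum(hi * px)
 * so shifting the low-half partial sum right by 8 and then adding the
 * high-half products gives exactly floor(total / 256). The remaining
 * (shift - 8) bits are dropped at the end. Requires shift >= 8. */
static uint32_t luma_split(const uint8_t px[4], const uint16_t coef[4],
                           uint32_t rnd, unsigned shift)
{
    uint8_t lo[4], hi[4];
    for (int k = 0; k < 4; k++) {
        lo[k] = (uint8_t)(coef[k] & 0xff); /* lower byte of each coefficient */
        hi[k] = (uint8_t)(coef[k] >> 8);   /* upper byte of each coefficient */
    }
    uint32_t acc = udot_lane(rnd, px, lo); /* first UDOT: lower half          */
    acc >>= 8;                             /* exact: 256 * hi-sum is unaffected */
    acc = udot_lane(acc, px, hi);          /* second UDOT: upper half         */
    return acc >> (shift - 8);             /* finish the normalization        */
}
```

Under this reading, the right shift by 8 is simply the first part of the final normalization shift, so no separate multiply-by-256 step is needed between the two dot products. The low-half partial sum also stays small (at most 4 * 255 * 255 plus the rounding term) before it is shifted.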


