[FFmpeg-devel] [PATCH v2] swscale/aarch64: dotprod implementation of rgba32_to_Y
Martin Storsjö
martin at martin.st
Tue Mar 4 10:27:42 EET 2025
On Mon, 3 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> The idea is to split the 16-bit coefficients into lower and upper halves,
> invoke udot for the lower half, shift by 8, and follow with udot for the
> upper half.
>
> Benchmark on A78:
> bgra_to_y_128_c: 682.0 ( 1.00x)
> bgra_to_y_128_neon: 181.2 ( 3.76x)
> bgra_to_y_128_dotprod: 117.8 ( 5.79x)
> bgra_to_y_1080_c: 5742.5 ( 1.00x)
> bgra_to_y_1080_neon: 1472.5 ( 3.90x)
> bgra_to_y_1080_dotprod: 906.5 ( 6.33x)
> bgra_to_y_1920_c: 10194.0 ( 1.00x)
> bgra_to_y_1920_neon: 2589.8 ( 3.94x)
> bgra_to_y_1920_dotprod: 1573.8 ( 6.48x)
> ---
> libswscale/aarch64/input.S | 88 ++++++++++++++++++++++++++++++++++++
> libswscale/aarch64/swscale.c | 17 +++++++
> 2 files changed, 105 insertions(+)
LGTM, thanks, I pushed this one now.
// Martin
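
For readers unfamiliar with the trick in the quoted commit message, here is
a rough scalar sketch in C of why splitting the coefficients works. It is
not taken from the patch: udot_lane() is a hypothetical scalar model of one
32-bit lane of UDOT, and the function names, the bias parameter and the
final shift amount are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of AArch64 UDOT:
 * accumulate the sum of four unsigned 8-bit x 8-bit products. */
static uint32_t udot_lane(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* Reference: multiply by the full 16-bit coefficients, then shift. */
static uint32_t dot16_ref(const uint8_t px[4], const uint16_t coef[4],
                          uint32_t bias, unsigned shift)
{
    uint32_t acc = bias;
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)coef[i] * px[i];
    return acc >> shift;
}

/* Split variant: udot with the low bytes, shift by 8, udot with the
 * high bytes, then apply the remaining shift. Assumes shift >= 8. */
static uint32_t dot16_split(const uint8_t px[4], const uint16_t coef[4],
                            uint32_t bias, unsigned shift)
{
    uint8_t lo[4], hi[4];
    for (int i = 0; i < 4; i++) {
        lo[i] = (uint8_t)(coef[i] & 0xff); /* lower half of coefficient */
        hi[i] = (uint8_t)(coef[i] >> 8);   /* upper half of coefficient */
    }
    uint32_t acc = udot_lane(bias, px, lo); /* low partial sum + bias */
    acc >>= 8;                              /* align with the high half */
    acc = udot_lane(acc, px, hi);           /* add high partial sum */
    return acc >> (shift - 8);
}
```

Because the high partial sum is an integer multiple of 256 in the full
result, shifting the low partial sum right by 8 before adding it loses
nothing that the final shift would have kept, so both functions should
return the same value whenever shift is at least 8 and nothing overflows.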