[FFmpeg-devel] [PATCH] lavc/aarch64: Add neon implementation for pix_abs16_y2
Martin Storsjö
martin at martin.st
Thu Aug 4 11:08:01 EEST 2022
On Mon, 25 Jul 2022, Hubert Mazur wrote:
> Provide an optimized implementation of the pix_abs16_y2 function for arm64.
>
> Performance comparison tests are shown below.
> pix_abs_0_2_c: 308.5
> pix_abs_0_2_neon: 39.2
>
> Benchmarks and tests were run with the checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <hum at semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c | 3 +
> libavcodec/aarch64/me_cmp_neon.S | 73 ++++++++++++++++++++++++
> 2 files changed, 76 insertions(+)
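For reference, pix_abs16_y2 computes a sum of absolute differences (SAD) between a 16-pixel-wide block and the vertical half-pel interpolation of a reference block, where each reference row is averaged with the row below it and rounded up. The sketch below is a simplified standalone C model under that assumption. The name `pix_abs16_y2_ref` is hypothetical, and the real libavcodec comparison functions also take a context pointer as their first argument, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounding average of two pixels, (a + b + 1) >> 1. */
static inline int avg2(int a, int b)
{
    return (a + b + 1) >> 1;
}

/* Hypothetical standalone model of pix_abs16_y2: SAD between the 16-wide
 * block pix1 and the average of each pix2 row with the row below it.
 * Reads h + 1 rows from pix2. */
static int pix_abs16_y2_ref(const uint8_t *pix1, const uint8_t *pix2,
                            ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride; /* row below pix2 */
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - avg2(pix2[x], pix3[x]));
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}
```

Note that the reference block must provide h + 1 readable rows, since the last row is averaged with the one below it.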
Please apply the same optimizations to this function as were done for
pix_abs16_xy2 in commits
b46de9aba436dea0cff76f3ed0f7c98448367fd0,
68a03f64240dcbe408c3fd43d1071a105508a588 and
4136405c86162063e45d40d55c9985f348d4ea0a
("aarch64: me_cmp: Interleave some of the loads in ff_pix_abs16_xy2_neon",
"aarch64: me_cmp: Switch from uabd to uabal in ff_pix_abs16_xy2_neon" and
"aarch64: me_cmp: Don't do uaddlv once per iteration").
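The accumulation strategy behind the last two commits can be illustrated with a scalar C model. This is not NEON code and not the code from those commits; it is a sketch assuming per-lane 16-bit accumulators that are updated every row (the role of uabal/uabal2 accumulating into two 8x16-bit vectors) and reduced horizontally only once after the loop (the role of a single uaddlv). The function name `pix_abs16_y2_lanes` is hypothetical. The rounding average matches what the urhadd instruction computes.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of lane-wise SAD accumulation with a single final
 * horizontal reduction. Hypothetical helper, not the FFmpeg code. */
static int pix_abs16_y2_lanes(const uint8_t *pix1, const uint8_t *pix2,
                              ptrdiff_t stride, int h)
{
    uint16_t acc[16] = {0}; /* 16 lanes of 16-bit partial sums */
    const uint8_t *pix3 = pix2 + stride;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            /* Rounding average, (a + b + 1) >> 1, as urhadd computes. */
            int avg = (pix2[x] + pix3[x] + 1) >> 1;
            /* Absolute difference, widened and accumulated per lane. */
            int d = pix1[x] > avg ? pix1[x] - avg : avg - pix1[x];
            acc[x] = (uint16_t)(acc[x] + d);
        }
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }

    /* One horizontal reduction after the loop instead of one per row. */
    int sum = 0;
    for (int x = 0; x < 16; x++)
        sum += acc[x];
    return sum;
}
```

Each lane grows by at most 255 per row, so 16-bit lanes cannot overflow for h up to 257 (255 * 257 = 65535), which comfortably covers 16-row blocks.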
// Martin