[FFmpeg-devel] [PATCH v2] avcodec/aarch64/vvc: Optimize NEON version of vvc_dmvr
Martin Storsjö
martin at martin.st
Tue Mar 4 10:36:08 EET 2025
On Mon, 3 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> This patch replaces blocks of instructions performing rounding and
> widening shifts with one-liners achieving the same result.
>
> Benchmarks on A78, before (first block) and after (second block):
> dmvr_8_12x20_neon: 86.2 ( 6.90x)
> dmvr_8_20x12_neon: 94.8 ( 5.93x)
> dmvr_8_20x20_neon: 141.5 ( 6.50x)
> dmvr_12_12x20_neon: 158.0 ( 3.76x)
> dmvr_12_20x12_neon: 151.2 ( 3.73x)
> dmvr_12_20x20_neon: 247.2 ( 3.71x)
> dmvr_hv_8_12x20_neon: 423.2 ( 3.75x)
> dmvr_hv_8_20x12_neon: 434.0 ( 3.69x)
> dmvr_hv_8_20x20_neon: 706.0 ( 3.69x)
>
> dmvr_8_12x20_neon: 77.2 ( 7.70x)
> dmvr_8_20x12_neon: 66.5 ( 8.49x)
> dmvr_8_20x20_neon: 92.2 ( 9.90x)
> dmvr_12_12x20_neon: 80.2 ( 7.38x)
> dmvr_12_20x12_neon: 58.2 ( 9.59x)
> dmvr_12_20x20_neon: 90.0 (10.15x)
> dmvr_hv_8_12x20_neon: 369.0 ( 4.34x)
> dmvr_hv_8_20x12_neon: 355.8 ( 4.49x)
> dmvr_hv_8_20x20_neon: 574.2 ( 4.51x)
> ---
> libavcodec/aarch64/vvc/inter.S | 72 ++++++++++------------------------
> 1 file changed, 20 insertions(+), 52 deletions(-)
LGTM, pushed.
// Martin
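
For readers unfamiliar with the idiom the patch description refers to, here is a
minimal scalar sketch in C of why a multi-step rounding or widening shift can be
collapsed into one operation. It is not taken from inter.S: the helper names are
hypothetical, unsigned lanes are an assumption, and urshr/ushll are named only as
examples of NEON instructions that perform a rounding shift right and a widening
shift left. Which instructions the patch actually uses is not stated in this thread.

```c
#include <stdint.h>

/* Illustrative scalar model only; the helper names are hypothetical
 * and are not taken from libavcodec/aarch64/vvc/inter.S. */

/* Rounding shift spelled out in steps: add half of 2^shift, then shift. */
static uint16_t rshift_steps(uint16_t x, unsigned shift)
{
    uint32_t offset = 1u << (shift - 1);  /* rounding constant */
    uint32_t sum = (uint32_t)x + offset;  /* widen so the add cannot wrap */
    return (uint16_t)(sum >> shift);
}

/* Per-lane result of a single rounding shift (what e.g. urshr computes):
 * the truncated shift plus the last bit shifted out. */
static uint16_t rshift_fused(uint16_t x, unsigned shift)
{
    return (uint16_t)((x >> shift) + ((x >> (shift - 1)) & 1u));
}

/* Widening shift in steps: zero-extend to 16 bits, then shift left. */
static uint16_t wshift_steps(uint8_t x, unsigned shift)
{
    uint16_t wide = x;
    return (uint16_t)(wide << shift);
}

/* Per-lane result of a single widening shift (what e.g. ushll computes). */
static uint16_t wshift_fused(uint8_t x, unsigned shift)
{
    return (uint16_t)((uint32_t)x << shift);
}

/* Count the inputs on which the stepwise and fused forms disagree,
 * over all 16-bit and 8-bit inputs and shift amounts 1..7. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned shift = 1; shift <= 7; shift++) {
        for (uint32_t x = 0; x <= UINT16_MAX; x++)
            bad += rshift_steps((uint16_t)x, shift) !=
                   rshift_fused((uint16_t)x, shift);
        for (unsigned x = 0; x <= UINT8_MAX; x++)
            bad += wshift_steps((uint8_t)x, shift) !=
                   wshift_fused((uint8_t)x, shift);
    }
    return bad;
}
```

Calling count_mismatches() compares the two forms exhaustively. In vector code
the saving comes from issuing one instruction per register instead of a short
sequence, which is the kind of reduction the diffstat above suggests.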