[FFmpeg-devel] [PATCH v2] avcodec/aarch64/vvc: Optimize NEON version of vvc_dmvr

Martin Storsjö martin at martin.st
Tue Mar 4 10:36:08 EET 2025


On Mon, 3 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> This patch replaces blocks of instructions performing rounding and
> widening shifts with one-liners achieving the same result.
>
> Before and after on A78
> dmvr_8_12x20_neon:                                      86.2 ( 6.90x)
> dmvr_8_20x12_neon:                                      94.8 ( 5.93x)
> dmvr_8_20x20_neon:                                     141.5 ( 6.50x)
> dmvr_12_12x20_neon:                                    158.0 ( 3.76x)
> dmvr_12_20x12_neon:                                    151.2 ( 3.73x)
> dmvr_12_20x20_neon:                                    247.2 ( 3.71x)
> dmvr_hv_8_12x20_neon:                                  423.2 ( 3.75x)
> dmvr_hv_8_20x12_neon:                                  434.0 ( 3.69x)
> dmvr_hv_8_20x20_neon:                                  706.0 ( 3.69x)
>
> dmvr_8_12x20_neon:                                      77.2 ( 7.70x)
> dmvr_8_20x12_neon:                                      66.5 ( 8.49x)
> dmvr_8_20x20_neon:                                      92.2 ( 9.90x)
> dmvr_12_12x20_neon:                                     80.2 ( 7.38x)
> dmvr_12_20x12_neon:                                     58.2 ( 9.59x)
> dmvr_12_20x20_neon:                                     90.0 (10.15x)
> dmvr_hv_8_12x20_neon:                                  369.0 ( 4.34x)
> dmvr_hv_8_20x12_neon:                                  355.8 ( 4.49x)
> dmvr_hv_8_20x20_neon:                                  574.2 ( 4.51x)
> ---
> libavcodec/aarch64/vvc/inter.S | 72 ++++++++++------------------------
> 1 file changed, 20 insertions(+), 52 deletions(-)

LGTM, pushed.

// Martin



More information about the ffmpeg-devel mailing list