[FFmpeg-devel] [PATCH 2/2] avcodec/aarch64/vvc: Use rounding shift NEON instruction

Martin Storsjö martin at martin.st
Sun Mar 2 00:34:57 EET 2025


On Wed, 19 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> ---
>
> Before and after on A78
>
> dmvr_8_12x20_neon:                                      86.2 ( 6.90x)
> dmvr_8_20x12_neon:                                      94.8 ( 5.93x)
> dmvr_8_20x20_neon:                                     141.5 ( 6.50x)
> dmvr_12_12x20_neon:                                    158.0 ( 3.76x)
> dmvr_12_20x12_neon:                                    151.2 ( 3.73x)
> dmvr_12_20x20_neon:                                    247.2 ( 3.71x)
> dmvr_hv_8_12x20_neon:                                  423.2 ( 3.75x)
> dmvr_hv_8_20x12_neon:                                  434.0 ( 3.69x)
> dmvr_hv_8_20x20_neon:                                  706.0 ( 3.69x)
>
> dmvr_8_12x20_neon:                                      77.2 ( 7.70x)
> dmvr_8_20x12_neon:                                      66.5 ( 8.49x)
> dmvr_8_20x20_neon:                                      92.2 ( 9.90x)
> dmvr_12_12x20_neon:                                     80.2 ( 7.38x)
> dmvr_12_20x12_neon:                                     58.2 ( 9.59x)
> dmvr_12_20x20_neon:                                     90.0 (10.15x)
> dmvr_hv_8_12x20_neon:                                  369.0 ( 4.34x)
> dmvr_hv_8_20x12_neon:                                  355.8 ( 4.49x)
> dmvr_hv_8_20x20_neon:                                  574.2 ( 4.51x)
>
> libavcodec/aarch64/vvc/inter.S | 72 ++++++++++------------------------
> 1 file changed, 20 insertions(+), 52 deletions(-)
>
> diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
> index c9d698ee29..45add44b6e 100644
> --- a/libavcodec/aarch64/vvc/inter.S
> +++ b/libavcodec/aarch64/vvc/inter.S
> @@ -369,22 +369,18 @@ function ff_vvc_dmvr_8_neon, export=1
> 1:
>         cbz             w15, 2f
>         ldr             q0, [src], #16
> -        uxtl            v1.8h, v0.8b
> -        uxtl2           v2.8h, v0.16b
> -        ushl            v1.8h, v1.8h, v16.8h
> -        ushl            v2.8h, v2.8h, v16.8h
> +        ushll           v1.8h, v0.8b, #2
> +        ushll2          v2.8h, v0.16b, #2

In addition to what's mentioned in the commit message, this bit is a 
semantically different change (uxtl + ushl replaced by a single ushll), 
so we should probably mention it in the commit message as well. If you're 
reposting patch 1/2 of this set, can you update the commit message on this 
one to mention this, and move the measurements into the actual commit 
message?
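
To illustrate why this hunk is a separate kind of change from the rounding
shift, here is a rough scalar C model of the two instruction sequences. It
is only a sketch: the helper names are made up for illustration, and the
shift amount of 2 in v16 is an assumption inferred from the #2 immediate in
the replacement, not something visible in this hunk.

```c
#include <stddef.h>
#include <stdint.h>

/* Old sequence, modelled per lane: uxtl zero-extends 8 -> 16 bits,
 * then ushl shifts left by a count held in a register.
 * Assumption: the register (v16) holds 2 in every lane. */
static void widen_then_shift(uint16_t *dst, const uint8_t *src,
                             size_t n, int shift)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t wide = src[i];               /* uxtl / uxtl2 */
        dst[i] = (uint16_t)(wide << shift);   /* ushl, positive count */
    }
}

/* New sequence, modelled per lane: ushll #2 widens and shifts left
 * by an immediate in a single instruction. */
static void widen_shift_long(uint16_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)((uint16_t)src[i] << 2);   /* ushll / ushll2 */
}

/* Hypothetical check: returns 1 if both models agree for every
 * possible input byte, 0 otherwise. */
static int sequences_match(void)
{
    uint8_t  src[256];
    uint16_t a[256], b[256];

    for (int i = 0; i < 256; i++)
        src[i] = (uint8_t)i;

    widen_then_shift(a, src, 256, 2);
    widen_shift_long(b, src, 256);

    for (int i = 0; i < 256; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Under that assumption the results are identical, so this part saves
instructions by folding the widening and the shift together, rather than by
using a rounding shift.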

Other than that, this patch looks very good to me, thanks!

// Martin


