[FFmpeg-devel] [PATCH 2/2] avcodec/aarch64/vvc: Use rounding shift NEON instruction
Martin Storsjö
martin at martin.st
Sun Mar 2 00:34:57 EET 2025
On Wed, 19 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> ---
>
> Before and after on A78
>
> dmvr_8_12x20_neon: 86.2 ( 6.90x)
> dmvr_8_20x12_neon: 94.8 ( 5.93x)
> dmvr_8_20x20_neon: 141.5 ( 6.50x)
> dmvr_12_12x20_neon: 158.0 ( 3.76x)
> dmvr_12_20x12_neon: 151.2 ( 3.73x)
> dmvr_12_20x20_neon: 247.2 ( 3.71x)
> dmvr_hv_8_12x20_neon: 423.2 ( 3.75x)
> dmvr_hv_8_20x12_neon: 434.0 ( 3.69x)
> dmvr_hv_8_20x20_neon: 706.0 ( 3.69x)
>
> dmvr_8_12x20_neon: 77.2 ( 7.70x)
> dmvr_8_20x12_neon: 66.5 ( 8.49x)
> dmvr_8_20x20_neon: 92.2 ( 9.90x)
> dmvr_12_12x20_neon: 80.2 ( 7.38x)
> dmvr_12_20x12_neon: 58.2 ( 9.59x)
> dmvr_12_20x20_neon: 90.0 (10.15x)
> dmvr_hv_8_12x20_neon: 369.0 ( 4.34x)
> dmvr_hv_8_20x12_neon: 355.8 ( 4.49x)
> dmvr_hv_8_20x20_neon: 574.2 ( 4.51x)
>
> libavcodec/aarch64/vvc/inter.S | 72 ++++++++++------------------------
> 1 file changed, 20 insertions(+), 52 deletions(-)
>
> diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
> index c9d698ee29..45add44b6e 100644
> --- a/libavcodec/aarch64/vvc/inter.S
> +++ b/libavcodec/aarch64/vvc/inter.S
> @@ -369,22 +369,18 @@ function ff_vvc_dmvr_8_neon, export=1
> 1:
> cbz w15, 2f
> ldr q0, [src], #16
> - uxtl v1.8h, v0.8b
> - uxtl2 v2.8h, v0.16b
> - ushl v1.8h, v1.8h, v16.8h
> - ushl v2.8h, v2.8h, v16.8h
> + ushll v1.8h, v0.8b, #2
> + ushll2 v2.8h, v0.16b, #2
In addition to what's mentioned in the commit message, this bit is a
semantically different change (it folds the widening and the left shift
into a single instruction, rather than introducing a rounding shift), so
we should probably mention that in the commit message as well. If you're
reposting patch 1/2 of this set, can you update the commit message on this
one to mention this, and move the measurements into the actual commit
message?
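
For reference, the hunk above can be modeled per lane in plain C. This is
only a scalar sketch, not the NEON code itself: the helper names are
hypothetical, and it assumes that v16 held a left-shift count of 2 in every
lane, which is what the replacement immediate #2 suggests.

```c
#include <stdint.h>

/* Old sequence, modeled for one lane: uxtl/uxtl2 zero-extends an 8-bit
 * pixel to 16 bits, then ushl shifts it left by a count taken from a
 * register (assumed to be 2 here; the hunk does not show how v16 is set). */
static uint16_t widen_then_shift(uint8_t px, int shift)
{
    uint16_t wide = px;                 /* uxtl / uxtl2 */
    return (uint16_t)(wide << shift);   /* ushl with a positive count */
}

/* New sequence, modeled for one lane: ushll/ushll2 zero-extends and
 * shifts left by an immediate in a single instruction. */
static uint16_t shift_left_long(uint8_t px)
{
    return (uint16_t)((uint16_t)px << 2);   /* ushll / ushll2 #2 */
}
```

Under that assumption both helpers return the same value for every 8-bit
input, so the hunk changes the instruction count but not the result. No
rounding is involved in this part.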
Other than that, this patch looks very good to me, thanks!
// Martin