[FFmpeg-devel] [PATCH 1/2] avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_extract_exponents
Martin Storsjö
martin at martin.st
Sun Mar 2 00:59:20 EET 2025
On Fri, 28 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> Before and after:
>
> A78
> ac3_extract_exponents_n512_neon: 503.2 ( 3.36x)
> ac3_extract_exponents_n3072_neon: 2986.2 ( 3.35x)
>
> ac3_extract_exponents_n512_neon: 211.2 ( 8.02x)
> ac3_extract_exponents_n3072_neon: 1251.5 ( 8.00x)
>
> A72
> ac3_extract_exponents_n512_neon: 964.7 ( 2.39x)
> ac3_extract_exponents_n3072_neon: 5434.5 ( 2.47x)
>
> ac3_extract_exponents_n512_neon: 465.6 ( 4.87x)
> ac3_extract_exponents_n3072_neon: 2696.3 ( 4.97x)
> ---
> This version handles 16 ints in one go and consolidates separate
> extractions and writes into one. I assume the length of the input is a
> multiple of 16 (there are no constraints defined in the template file),
> but the tests are passing.
I have no clue whether this assumption is ok or not (it may be good to
check whether other assembly implementations, e.g. on x86, assume the
same). Codewise, the patch looks good, thanks!
This description of the patch, what it does and the assumptions it makes,
is probably nice to keep in the final commit as well, so it could be
included above "---" too.
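
To make the length assumption concrete, here is a minimal scalar sketch in C of what I understand the function to compute: the exponent is the number of leading zero bits of the coefficient's magnitude within 24 bits, and 24 for a zero coefficient. This is an illustration, not the code from the patch or from the tree. The names extract_exponents_scalar, highest_bit and length_fits_16_wide_loop are hypothetical, and the guard only restates the "multiple of 16" assumption from the patch description.

```c
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit of v (v must be nonzero). */
static int highest_bit(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Scalar sketch, assuming coefficients fit in 24 bits plus sign:
 * exponent = 23 - index of the highest set bit of |coef|, or 24 if coef is 0. */
static void extract_exponents_scalar(uint8_t *exp_out, const int32_t *coef,
                                     int nb_coefs)
{
    for (int i = 0; i < nb_coefs; i++) {
        /* Take the magnitude in unsigned arithmetic to avoid signed overflow. */
        uint32_t v = coef[i] < 0 ? 0u - (uint32_t)coef[i] : (uint32_t)coef[i];
        exp_out[i] = v ? (uint8_t)(23 - highest_bit(v)) : 24;
    }
}

/* Hypothetical guard: a loop that consumes 16 ints per iteration with no
 * scalar tail only covers the input exactly when this holds. */
static int length_fits_16_wide_loop(int nb_coefs)
{
    return nb_coefs > 0 && nb_coefs % 16 == 0;
}
```

Both benchmarked sizes, 512 and 3072, pass this guard, so the tests passing does not show what happens for a length that is not a multiple of 16.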
// Martin