[FFmpeg-devel] [PATCH 1/2] avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_extract_exponents

Martin Storsjö martin at martin.st
Sun Mar 2 00:59:20 EET 2025


On Fri, 28 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> Before and after:
>
> A78
> ac3_extract_exponents_n512_neon:                       503.2 ( 3.36x)
> ac3_extract_exponents_n3072_neon:                     2986.2 ( 3.35x)
>
> ac3_extract_exponents_n512_neon:                       211.2 ( 8.02x)
> ac3_extract_exponents_n3072_neon:                     1251.5 ( 8.00x)
>
> A72
> ac3_extract_exponents_n512_neon:                       964.7 ( 2.39x)
> ac3_extract_exponents_n3072_neon:                     5434.5 ( 2.47x)
>
> ac3_extract_exponents_n512_neon:                       465.6 ( 4.87x)
> ac3_extract_exponents_n3072_neon:                     2696.3 ( 4.97x)
> ---
> This version handles 16 ints in one go and consolidates separate
> extractions and writes into one. I assume the length of the input is a
> multiple of 16 (there are no constraints defined in the template file),
> but the tests are passing.

I have no clue whether this is ok or not (it may be good to check whether 
other assembly implementations, e.g. on x86, make the same assumption). 
Codewise, the patch looks good, thanks!
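
To make the length assumption concrete, here is a rough C model of what 
"16 ints in one go" implies. It is a sketch, not the patch: the helper 
names (ilog2_u32, exponent_of, extract_exponents_by16) are hypothetical, 
and the exponent formula (24 for a zero coefficient, otherwise 23 minus 
floor(log2(|coef|)), i.e. 24-bit coefficients) is my understanding of the C 
reference rather than something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: floor(log2(v)) for v > 0. */
static int ilog2_u32(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Assumed scalar behaviour: 24 for zero, else 23 - floor(log2(|coef|)).
 * Coefficients are assumed to fit in 24 bits, so abs() cannot overflow. */
static uint8_t exponent_of(int32_t coef)
{
    uint32_t v = (uint32_t)abs(coef);
    return v ? (uint8_t)(23 - ilog2_u32(v)) : 24;
}

/* Blocked model: consumes 16 coefficients per outer iteration, like a
 * 16-wide SIMD loop with no scalar tail. The assert spells out the
 * "multiple of 16" assumption that such a loop relies on. */
static void extract_exponents_by16(uint8_t *exp, const int32_t *coef,
                                   int nb_coefs)
{
    assert(nb_coefs % 16 == 0);
    for (int i = 0; i < nb_coefs; i += 16)
        for (int j = 0; j < 16; j++)
            exp[i + j] = exponent_of(coef[i + j]);
}
```

Both sizes exercised by the benchmark above (512 and 3072) are multiples of 
16, so passing tests alone would not expose a missing tail loop.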

This description of the patch, what it does and the assumptions it makes, 
is probably nice to keep in the final commit as well, so it could be 
included above "---" too.

// Martin


