[FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

Sebastian Pop sebpop at gmail.com
Mon Nov 25 23:59:33 EET 2019


This patch implements ff_hscale_8_to_15_neon with NEON fused multiply-accumulate
instructions and raises the vectorization factor from 2 to 4. I have seen speedups
of up to 15% on AWS Graviton A1 instances, which use Cortex-A72 CPUs.
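
For readers unfamiliar with the routine, here is a rough sketch in plain C of
what a horizontal 8-bit to 15-bit scaler computes, and of what producing 4
outputs per iteration looks like. It is not the code from the attached patch,
which is AArch64 assembly. The function names (hscale_8_to_15_ref,
hscale_8_to_15_x4, clip15) and the simplified argument list are hypothetical,
chosen for illustration only. The sketch assumes filter coefficients with 14
fractional bits, so an 8-bit sample becomes a 15-bit result after the shift
by 7.

```c
#include <assert.h>
#include <stdint.h>

/* Drop 7 fractional bits and saturate to the 15-bit maximum. */
static int16_t clip15(int32_t val)
{
    int32_t v = val >> 7;
    return (int16_t)(v < 32767 ? v : 32767);
}

/* Scalar sketch: one output sample per iteration of the outer loop. */
void hscale_8_to_15_ref(int16_t *dst, int dst_w, const uint8_t *src,
                        const int16_t *filter, const int32_t *filter_pos,
                        int filter_size)
{
    assert(filter_size > 0);
    for (int i = 0; i < dst_w; i++) {
        int32_t acc = 0;
        for (int j = 0; j < filter_size; j++)
            acc += (int32_t)src[filter_pos[i] + j] * filter[filter_size * i + j];
        dst[i] = clip15(acc);
    }
}

/* Same computation with 4 independent accumulators per iteration.
 * This only mimics the "factor of 4" idea; a NEON version would keep
 * the accumulators in vector registers and use multiply-accumulate
 * instructions instead of the inner k loop. */
void hscale_8_to_15_x4(int16_t *dst, int dst_w, const uint8_t *src,
                       const int16_t *filter, const int32_t *filter_pos,
                       int filter_size)
{
    assert(filter_size > 0);
    int i = 0;
    for (; i + 4 <= dst_w; i += 4) {
        int32_t acc[4] = { 0, 0, 0, 0 };
        for (int j = 0; j < filter_size; j++)
            for (int k = 0; k < 4; k++)
                acc[k] += (int32_t)src[filter_pos[i + k] + j] *
                          filter[filter_size * (i + k) + j];
        for (int k = 0; k < 4; k++)
            dst[i + k] = clip15(acc[k]);
    }
    /* Leftover outputs when dst_w is not a multiple of 4. */
    for (; i < dst_w; i++) {
        int32_t acc = 0;
        for (int j = 0; j < filter_size; j++)
            acc += (int32_t)src[filter_pos[i] + j] * filter[filter_size * i + j];
        dst[i] = clip15(acc);
    }
}
```

Both functions should produce identical output for the same inputs. The
second one exposes four independent accumulation chains, which is the kind of
parallelism a wider vectorization factor relies on.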

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.037339 avg:0.037327 max:0.037550 min:0.036992

Tested with `make check` on aarch64-linux.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-use-FMA-and-increase-vector-factor-to-4.patch
Type: application/octet-stream
Size: 3791 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20191125/afa00d17/attachment.obj>
