[FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

Wed Nov 27 20:30:35 EET 2019

On Mon, Nov 25, 2019 at 11:20 PM Jean-Baptiste Kempf <jb at videolan.org> wrote:
> > Is there a coding rule in ffmpeg that restricts the use of intrinsics?
>
> Yes. See doc/optimization.txt.
> Use external asm (nasm/yasm) or inline asm (__asm__()), do not use intrinsics.

Thanks for the pointer.

> Also, here, you're changing some existing code, please improve the code and do not duplicate code.
>
> > If that is the case, I can adapt my code to the existing asm code.
>
> Please.

Please find attached a patch that improves the existing code in aarch64/hscale.S.
Performance tests with gcc and clang show that the patch improves
performance by about 34% on Graviton A1 instances (the avg figure reported
by the bench filter drops from 0.040287 to 0.030102):

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -

before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.030079 avg:0.030102 max:0.030462 min:0.030051
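
For reference, the 34% figure corresponds to the ratio of the two avg
values (a throughput gain), not to the fraction of time saved. A minimal
sketch of both calculations follows; the function names are hypothetical
and are not part of FFmpeg.

```c
#include <assert.h>

/* Throughput gain in percent: how much more work fits in the same time.
 * Hypothetical helper for illustration only. */
double speedup_percent(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return (before / after - 1.0) * 100.0;
}

/* Reduction of the measured time in percent.
 * Hypothetical helper for illustration only. */
double time_saved_percent(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return (1.0 - after / before) * 100.0;
}
```

With the avg values above, speedup_percent(0.040287, 0.030102) gives
roughly 33.8 and time_saved_percent(0.040287, 0.030102) gives roughly 25.3.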

Tested with `make check` on aarch64-linux.
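
For readers unfamiliar with the routine: ff_hscale_8_to_15_neon applies a
horizontal scaling filter to 8-bit samples and produces 15-bit results.
The scalar sketch below is loosely modeled on the C fallback in libswscale
as I understand it. The name hscale_8_to_15_ref and the simplified
signature (no SwsContext argument) are assumptions made for illustration;
this is not the code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference sketch (hypothetical name and signature).
 * For each output sample i, multiply filter_size source bytes starting at
 * src[filter_pos[i]] by the 16-bit coefficients of row i, sum the products,
 * shift right by 7 and clip to the 15-bit maximum. */
void hscale_8_to_15_ref(int16_t *dst, int dst_w, const uint8_t *src,
                        const int16_t *filter, const int32_t *filter_pos,
                        int filter_size)
{
    assert(filter_size > 0);
    for (int i = 0; i < dst_w; i++) {
        const uint8_t *s = src + filter_pos[i];
        const int16_t *f = filter + (size_t)i * (size_t)filter_size;
        int32_t acc = 0;

        for (int j = 0; j < filter_size; j++)
            acc += (int32_t)s[j] * f[j];

        /* Assumes an arithmetic right shift for negative sums, which is
         * what common compilers provide. */
        acc >>= 7;
        dst[i] = (int16_t)(acc > 32767 ? 32767 : acc);
    }
}
```

The inner loop is a multiply-accumulate over independent output samples,
which is why it maps well onto fused multiply-add instructions and onto
computing several outputs per iteration.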

Please let me know if I can make the patch better.

Thank you,
Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-use-FMA-and-increase-vector-factor-to-4.patch
Type: application/x-patch
Size: 12499 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20191127/1c248059/attachment.bin>