[FFmpeg-devel] [PATCH] Some ARM VFP optimizations (vector_fmul, vector_fmul_reverse, float_to_int16)

Siarhei Siamashka siarhei.siamashka
Sun Apr 20 17:41:04 CEST 2008


Hello,

Here is a patch which adds some initial optimizations for ARM VFP (the
floating point coprocessor available in some ARM11 cores).
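
For orientation, here is a minimal scalar sketch of what the three routines
named in the subject line compute. It is an illustration, not the code from
the patch or from the FFmpeg tree: the `_ref` names are hypothetical, and the
rounding and saturation behaviour of the conversion routine is an assumption
(input already scaled to the 16-bit range, finite values only).

```c
#include <stdint.h>

/* In-place element-wise product: dst[i] *= src[i]. */
void vector_fmul_ref(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* dst[i] = src0[i] * src1[len - 1 - i]: the second operand is read backwards. */
void vector_fmul_reverse_ref(float *dst, const float *src0,
                             const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Round to nearest (halves away from zero) and saturate to int16_t.
 * Assumption: samples are finite and already scaled to [-32768, 32767]. */
void float_to_int16_ref(int16_t *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++) {
        float x = src[i];
        if (x >= 32767.0f)
            dst[i] = INT16_MAX;
        else if (x <= -32768.0f)
            dst[i] = INT16_MIN;
        else
            dst[i] = (int16_t)(x < 0.0f ? x - 0.5f : x + 0.5f);
    }
}
```

All three are simple streaming loops over float arrays, which is the kind of
work a vector-capable floating point unit can process several elements at a
time.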

The standard FFmpeg regression tests pass. Switching to ALT_BITSTREAM_READER
is needed for that, though, because A32_BITSTREAM_READER does not work with
the flashsv decoder. That is not an ARM-specific problem; it can be
reproduced on x86 too.

My additional test program 'test-vfp.c' from
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libavcodec/tests/?root=mplayer
also runs successfully. It checks performance, correctness, and the absence
of memory accesses outside the supplied buffers.
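
One common way to check an optimized routine for both correctness and stray
writes is to surround the destination buffer with canary bytes and compare
the result against a plain C reference. The sketch below shows that idea; it
is not taken from 'test-vfp.c', and every name in it is hypothetical. It only
catches writes outside the buffer; catching stray reads would need something
like guard pages or a memory checker.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_FLOATS 16    /* padding elements on each side of the buffer */
#define GUARD_BYTE   0xA5  /* canary pattern */

typedef void (*fmul_fn)(float *dst, const float *src, int len);

/* Plain C reference the candidate is compared against. */
void fmul_reference(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Deliberately broken candidate: writes one element past the end. */
void fmul_overrun(float *dst, const float *src, int len)
{
    fmul_reference(dst, src, len);
    dst[len] = 0.0f;
}

/* Returns 1 if every byte in [p, p + n) still holds the canary pattern. */
static int guard_intact(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Returns 0 on success, 1 on wrong output, 2 on a write outside dst,
 * -1 on allocation failure. Assumes len > 0. */
int check_fmul(fmul_fn candidate, int len)
{
    size_t guard_bytes = GUARD_FLOATS * sizeof(float);
    size_t data_bytes  = (size_t)len * sizeof(float);
    unsigned char *raw = malloc(data_bytes + 2 * guard_bytes);
    float *src    = malloc(data_bytes);
    float *expect = malloc(data_bytes);
    int result = 0;

    if (!raw || !src || !expect) {
        free(raw); free(src); free(expect);
        return -1;
    }

    /* Fill everything with the canary, then place dst between the guards. */
    memset(raw, GUARD_BYTE, data_bytes + 2 * guard_bytes);
    float *dst = (float *)(raw + guard_bytes);

    for (int i = 0; i < len; i++) {
        dst[i] = expect[i] = 1.0f + 0.5f * (float)i;
        src[i] = 2.0f - 0.25f * (float)i;
    }

    fmul_reference(expect, src, len);
    candidate(dst, src, len);

    /* Bit-exact comparison; a real test may need a tolerance instead. */
    if (memcmp(dst, expect, data_bytes) != 0)
        result = 1;
    else if (!guard_intact(raw, guard_bytes) ||
             !guard_intact(raw + guard_bytes + data_bytes, guard_bytes))
        result = 2;

    free(raw); free(src); free(expect);
    return result;
}
```

Calling `check_fmul(fmul_reference, 64)` returns 0, while
`check_fmul(fmul_overrun, 64)` returns 2 because the trailing guard was
overwritten.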

Right now I'm more interested in getting ARM VFP support into the FFmpeg build
infrastructure (configure script, etc.). More optimizations will follow
(vector_fmul_add_add, vorbis_inverse_coupling, imdct/fft, ...).
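
As an illustration of what build-infrastructure support usually means on the
C side, here is a minimal sketch of compile-time selection through a function
pointer table. The macro HAVE_ARMVFP_SKETCH, the struct, and the init
function are hypothetical names made up for this example; only the symbol
name vector_fmul_vfp is taken from the profile in this post, and its
prototype here is assumed.

```c
/* Hypothetical feature macro; a configure script would define it as 0 or 1. */
#ifndef HAVE_ARMVFP_SKETCH
#define HAVE_ARMVFP_SKETCH 0
#endif

/* Hypothetical table of DSP entry points. */
typedef struct {
    void (*vector_fmul)(float *dst, const float *src, int len);
} DSPSketch;

/* Portable fallback, always compiled. */
void vector_fmul_generic(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

#if HAVE_ARMVFP_SKETCH
/* Would be provided by a separately assembled VFP source file. */
void vector_fmul_vfp(float *dst, const float *src, int len);
#endif

/* Start with the portable version, then override when VFP is enabled. */
void dsp_sketch_init(DSPSketch *c)
{
    c->vector_fmul = vector_fmul_generic;
#if HAVE_ARMVFP_SKETCH
    c->vector_fmul = vector_fmul_vfp;
#endif
}
```

With this layout, callers always go through `c->vector_fmul`, so builds
without VFP support need no changes in the decoder code.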

There are also many data cache misses in the vorbis decoding code. Reducing
memory use (if that is possible, of course) may improve performance. Adding
prefetch instructions to the ARM VFP optimized functions might also help, but
the PLD instruction has no effect in the OS2008 firmware (I'm currently
researching this particular problem and I think I already know what causes
it).
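
To make the prefetch idea concrete, here is a minimal C sketch using the GCC
builtin `__builtin_prefetch`, which on ARM targets that support it is
typically compiled to a PLD instruction. This is not the patch's assembly;
the function name, the prefetch distance, and the 32-byte cache line size are
assumptions that would need measuring on the real hardware.

```c
/* Distance in floats to prefetch ahead; a guess that would need tuning. */
#define PREFETCH_AHEAD 32

/* dst[i] *= src[i], with software prefetch hints for upcoming data. */
void vector_fmul_prefetch(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++) {
        /* One hint per 8 floats (32 bytes, the assumed cache line size),
         * and only while the target address is still inside the arrays. */
        if ((i & 7) == 0 && i + PREFETCH_AHEAD < len) {
            __builtin_prefetch(&src[i + PREFETCH_AHEAD], 0, 0); /* for reading */
            __builtin_prefetch(&dst[i + PREFETCH_AHEAD], 1, 0); /* for writing */
        }
        dst[i] *= src[i];
    }
}
```

The prefetch is only a hint, so the result is identical with or without it;
if the instruction is a no-op on a given system, the loop just carries a
little extra overhead.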

Current benchmark results (Nokia N810, OS2008 firmware, ARM11 400MHz):

64 kbit/s Ogg Vorbis file ('ffmpeg -benchmark -i test64.ogg -f null /dev/null')

Sample file decoding time before the patch, in seconds:
16.227 16.289 16.242 (average 16.253, stddev 0.032)

Sample file decoding time after the patch, in seconds:
14.406 14.336 14.281 (average 14.341, stddev 0.063)

That's ~11.8% less decoding time overall (a ~13.3% speedup).
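
The summary figures can be rechecked with a few lines of C. The sketch below
computes the mean and sample standard deviation of a set of run times and
expresses the change between two means in two ways: as a reduction in time
relative to the old time, and as a speedup relative to the new time. All
function names are hypothetical, and the square root uses a small Newton
iteration only to keep the example free of libm.

```c
#include <stddef.h>

/* Arithmetic mean of n run times. */
double bench_mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Square root by Newton iteration (avoids a libm dependency). */
static double sqrt_newton(double v)
{
    if (v <= 0.0)
        return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Sample standard deviation (divides by n - 1); requires n >= 2. */
double bench_stddev(const double *x, size_t n)
{
    double m = bench_mean(x, n);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (x[i] - m) * (x[i] - m);
    return sqrt_newton(acc / (double)(n - 1));
}

/* Percentage of the old time that was saved. */
double time_reduction_percent(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Percentage by which throughput increased. */
double speedup_percent(double before, double after)
{
    return 100.0 * (before / after - 1.0);
}
```

Feeding in the three "before" and three "after" times listed above
reproduces the quoted averages and standard deviations. The two percentage
definitions differ by more than one point for these runs, so it matters which
one is quoted.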

Report from oprofile before patch:

samples  %        image name               symbol name
61766    18.9862  ffmpeg_g                 ff_imdct_calc
55035    16.9171  ffmpeg_g                 ff_fft_calc_c
33879    10.4140  ffmpeg_g                 vorbis_decode_frame
31120     9.5659  ffmpeg_g                 ff_vector_fmul_add_add_c
21592     6.6371  ffmpeg_g                 vorbis_inverse_coupling
20999     6.4549  ffmpeg_g                 vector_fmul_c
18154     5.5803  ffmpeg_g                 vector_fmul_reverse_c
17366     5.3381  ffmpeg_g                 pcm_encode_frame
13632     4.1903  ffmpeg_g                 ff_float_to_int16_c
11375     3.4965  ffmpeg_g                 ff_vorbis_floor1_render_list
6839      2.1022  libc-2.5.so              memset
5367      1.6498  ffmpeg_g                 vorbis_floor1_decode
4193      1.2889  libc-2.5.so              memcpy
2350      0.7224  ffmpeg_g                 output_packet
2216      0.6812  ffmpeg_g                 main
1423      0.4374  libc-2.5.so              _int_malloc
975       0.2997  ffmpeg_g                 __aeabi_idiv
960       0.2951  ffmpeg_g                 __udivsi3
951       0.2923  ffmpeg_g                 __divdi3
935       0.2874  ffmpeg_g                 compute_pkt_fields
909       0.2794  libc-2.5.so              memalign
866       0.2662  libc-2.5.so              malloc_consolidate
824       0.2533  ffmpeg_g                 av_rescale_rnd
782       0.2404  libc-2.5.so              _int_free
717       0.2204  ffmpeg_g                 compute_pkt_fields2
663       0.2038  ffmpeg_g                 av_interleaved_write_frame
637       0.1958  libc-2.5.so              _int_memalign
588       0.1807  ffmpeg_g                 av_read_frame_internal
553       0.1700  ffmpeg_g                 build_table
505       0.1552  ffmpeg_g                 av_interleave_packet_per_dts
432       0.1328  ffmpeg_g                 ogg_packet
385       0.1183  ffmpeg_g                 .plt
381       0.1171  ffmpeg_g                 ogg_read_packet
352       0.1082  ffmpeg_g                 __gnu_ldivmod_helper
282       0.0867  ffmpeg_g                 avcodec_decode_audio2
254       0.0781  libc-2.5.so              select
237       0.0729  libc-2.5.so              free

Report from oprofile after patch:

samples  %        image name               symbol name
59798    20.6286  ffmpeg_g.vfp             ff_imdct_calc
54855    18.9234  ffmpeg_g.vfp             ff_fft_calc_c
33664    11.6131  ffmpeg_g.vfp             vorbis_decode_frame
32138    11.0867  ffmpeg_g.vfp             ff_vector_fmul_add_add_c
21674     7.4769  ffmpeg_g.vfp             vorbis_inverse_coupling
17204     5.9349  ffmpeg_g.vfp             pcm_encode_frame
11785     4.0655  ffmpeg_g.vfp             ff_vorbis_floor1_render_list
7472      2.5776  ffmpeg_g.vfp             float_to_int16_vfp
6731      2.3220  libc-2.5.so              memset
6678      2.3037  ffmpeg_g.vfp             vector_fmul_vfp
5284      1.8228  ffmpeg_g.vfp             vorbis_floor1_decode
4820      1.6628  ffmpeg_g.vfp             vector_fmul_reverse_vfp
3975      1.3713  libc-2.5.so              memcpy
2358      0.8134  ffmpeg_g.vfp             output_packet
2239      0.7724  ffmpeg_g.vfp             main
1461      0.5040  libc-2.5.so              _int_malloc
1247      0.4302  ffmpeg_g.vfp             __divdi3
1078      0.3719  ffmpeg_g.vfp             __udivsi3
1059      0.3653  ffmpeg_g.vfp             __aeabi_idiv
881       0.3039  ffmpeg_g.vfp             compute_pkt_fields
805       0.2777  libc-2.5.so              malloc_consolidate
744       0.2567  libc-2.5.so              memalign
714       0.2463  libc-2.5.so              _int_free
679       0.2342  ffmpeg_g.vfp             compute_pkt_fields2
616       0.2125  libc-2.5.so              _int_memalign
589       0.2032  ffmpeg_g.vfp             av_interleaved_write_frame
565       0.1949  ffmpeg_g.vfp             av_rescale_rnd
550       0.1897  ffmpeg_g.vfp             build_table
537       0.1852  ffmpeg_g.vfp             av_interleave_packet_per_dts
504       0.1739  ffmpeg_g.vfp             av_read_frame_internal
501       0.1728  ffmpeg_g.vfp             ogg_packet
382       0.1318  ffmpeg_g.vfp             avcodec_decode_audio2
339       0.1169  ffmpeg_g.vfp             .plt
321       0.1107  ffmpeg_g.vfp             av_get_bits_per_sample
293       0.1011  ffmpeg_g.vfp             ogg_read_packet
278       0.0959  ffmpeg_g.vfp             __aeabi_uidivmod
275       0.0949  libc-2.5.so              select
272       0.0938  ffmpeg_g.vfp             __gnu_ldivmod_helper
243       0.0838  libm-2.5.so              lrintf
236       0.0814  libc-2.5.so              free
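
The effect on the three replaced routines can be read off the two reports by
dividing the sample counts before and after. The sketch below does that with
the counts copied from the tables above; the struct and function names are
hypothetical. It assumes both profiling runs used the same input and sampling
setup, which the post implies but does not state.

```c
#include <stdio.h>

/* One replaced routine: its symbol and sample count in each report. */
struct profile_row {
    const char *before_symbol;
    const char *after_symbol;
    unsigned before;
    unsigned after;
};

/* Sample counts copied from the two oprofile reports. */
const struct profile_row replaced_rows[3] = {
    { "vector_fmul_c",         "vector_fmul_vfp",         20999, 6678 },
    { "vector_fmul_reverse_c", "vector_fmul_reverse_vfp", 18154, 4820 },
    { "ff_float_to_int16_c",   "float_to_int16_vfp",      13632, 7472 },
};

/* How many times fewer samples the replacement routine collected. */
double sample_ratio(const struct profile_row *row)
{
    return (double)row->before / (double)row->after;
}

/* Print one line per replaced routine. */
void print_sample_ratios(void)
{
    for (int i = 0; i < 3; i++)
        printf("%-22s -> %-24s %.2fx fewer samples\n",
               replaced_rows[i].before_symbol,
               replaced_rows[i].after_symbol,
               sample_ratio(&replaced_rows[i]));
}
```

By this measure the two multiply routines drop by more than a factor of
three, and the float-to-int16 conversion by somewhat less than a factor of
two.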

-- 
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: armvfp.diff
Type: text/x-diff
Size: 11320 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080420/f1777ecb/attachment.diff>