[FFmpeg-devel] [RFC] Loop unrolling in C code for 'vector_fmul_*' functions
Thu Jan 10 21:39:47 CET 2008
On Tue, Jan 08, 2008 at 02:20:07AM +0200, Siarhei Siamashka wrote:
> But at least for ARM, it looks like the compiler is quite stupid and can't
> schedule instructions properly, as seen from the benchmark results (just
> unrolling the loop is not enough and some extra tweaks are needed
> in 'vector_fmul_c_other_unrolled'). The VFP coprocessor has a high result latency
> (8 cycles), though throughput is quite good (1 cycle), and some other nice
> features which can improve performance exist (documentation for VFP can be
> found at http://www.arm.com). The compiler (gcc) does not even try to reorder
> instructions and the pipeline is just stalled most of the time. I would not be
> surprised if the compiler screwed up and generated something suboptimal on
> more complicated floating point stuff as well (fft and imdct).
Please submit reports to the gcc devels for every case of suboptimal code
generated by gcc you stumble across!
It's much better if gcc were improved than for everyone to have to
hand-schedule C code.
> By tweaking the C code, performance can be improved quite a lot
> ('vector_fmul_c_other_unrolled' vs. 'vector_fmul_c_unrolled').
> But unnecessarily cluttering the code because of inefficient compilers is not
> a good option. Anyway, perhaps at least the loops can just be unrolled to help
> the compiler do its job? The compiler itself does not know that 'len is a
> multiple of 8', so manual loop unrolling seems reasonable.
Add an assert((len & 7) == 0); and the compiler can know it.
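A minimal sketch of what that could look like, next to a manually unrolled
loop body. The function name vector_fmul_hinted is hypothetical and this is
not the actual FFmpeg code; it only assumes the operation is dst[i] *= src[i]
and that callers always pass a len that is a multiple of 8. Whether a given
gcc version really uses the assert for optimization is not guaranteed, and the
assert disappears entirely when NDEBUG is defined.

```c
#include <assert.h>

/* Hypothetical sketch, not the FFmpeg implementation.
 * Multiplies dst element-wise by src; len must be a multiple of 8. */
static void vector_fmul_hinted(float *dst, const float *src, int len)
{
    int i;

    /* States the precondition in the code. Compiled out under NDEBUG,
     * in which case the compiler gets no hint from it at all. */
    assert((len & 7) == 0);

    /* Eight independent multiplies per iteration. No remainder loop is
     * needed because len is assumed to be a multiple of 8. */
    for (i = 0; i < len; i += 8) {
        dst[i + 0] *= src[i + 0];
        dst[i + 1] *= src[i + 1];
        dst[i + 2] *= src[i + 2];
        dst[i + 3] *= src[i + 3];
        dst[i + 4] *= src[i + 4];
        dst[i + 5] *= src[i + 5];
        dst[i + 6] *= src[i + 6];
        dst[i + 7] *= src[i + 7];
    }
}
```

The eight statements in the body do not depend on each other, so a scheduler
is free to interleave their loads and multiplies to cover the result latency.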
> Well, I will do the rest of ARM VFP optimizations for all
> these 'vector_fmul_*' functions anyway :)
Michael
GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope