[FFmpeg-devel] [RFC] Loop unrolling in C code for 'vector_fmul_*' functions
Fri Jan 11 03:23:19 CET 2008
On Thu, Jan 10, 2008 at 09:39:47PM +0100, Michael Niedermayer wrote:
> > Tweaking C code, performance can be improved quite a lot
> > ('vector_fmul_c_other_unrolled' vs. 'vector_fmul_c_unrolled').
> > But unnecessarily cluttering the code because of inefficient compilers is
> > not a good option. Anyway, perhaps at least the loops can be unrolled to
> > help the compiler do its job? The compiler itself does not know that 'len
> > is a multiple of 8', so manual loop unrolling seems reasonable.
> Add an assert((len & 7) == 0); and the compiler can know it.
I doubt the compiler will actually use that, though. Instead, why not mask
off the low bits of len or right-shift it to be a direct iteration count?
Then it's obvious to the compiler. While I agree that the coder should not have to
hand-schedule instructions in C code, I think it's quite reasonable
for the coder to write code in a way that minimizes the amount of
intelligence needed to generate good asm. Not only does this help the
compiler; following that principle also tends to make assumptions more
clear to humans reading the code.
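
A minimal sketch of the two suggestions above, assuming a plain
element-wise multiply loop. The function names and signatures here are
hypothetical, chosen for illustration only, and are not taken from the
FFmpeg tree:

```c
#include <stddef.h>

/* Hypothetical variant 1: mask off the low bits so the trip count is
 * visibly a multiple of 8. Any trailing len % 8 elements are skipped. */
static void vector_fmul_masked(float *dst, const float *src, int len)
{
    int i;

    len &= ~7;                      /* round len down to a multiple of 8 */
    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Hypothetical variant 2: right-shift len into a count of 8-element
 * blocks. The inner loop has a constant trip count, which makes it a
 * straightforward candidate for unrolling. */
static void vector_fmul_blocks8(float *dst, const float *src, int len)
{
    int blocks = len >> 3;          /* number of full 8-element blocks */
    int i;

    while (blocks-- > 0) {
        for (i = 0; i < 8; i++)
            dst[i] *= src[i];
        dst += 8;
        src += 8;
    }
}
```

Both variants silently drop a remainder if len is not a multiple of 8,
so they only match the original loop when that assumption already holds.
Whether a given compiler actually unrolls or vectorizes either form is
something to check in the generated asm, not something the C guarantees.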