[FFmpeg-devel] [RFC] Loop unrolling in C code for 'vector_fmul_*' functions
Tue Jan 8 09:11:00 CET 2008
On Tue, 8 Jan 2008, Siarhei Siamashka wrote:
> And 'vector_fmul_*' functions look like a 'low hanging fruit' in the sense
> that they seem to be quite easy to optimize :) But there is another
> interesting thing, C implementation of these functions is very straightforward
> and it does not even unroll loops. But assembly or other SIMD optimizations
> exist only for x86 and ppc at the moment for these functions. Is it
> intentional and code readability is the main priority for them? Or some tweaks
> could be added to improve 'generic C' code performance?
My logic was: I could tell at a glance that SIMD would be faster than
scalar x86 code, so I wrote the SSE version. Once that was done, the C
version was no longer used on any of my CPUs. So there was no point in
optimizing it when I couldn't benchmark what effect an optimization would
have on any CPU it's actually used on.
The same goes for you: if you write a VFP version, what reason do you
have for tweaking the C to run better on your ARM? None, unless you have
reason to believe that other ARM processors without VFP have the same
scalar float characteristics.
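
For concreteness, here is a minimal sketch of the kind of unrolling under
discussion. The function names are hypothetical, and the signature (an
in-place multiply of dst by src over len floats) is an assumption made for
illustration, not a copy of the code in the tree. Whether the unrolled form
is any faster depends on the compiler and the target CPU, which is exactly
why it would need benchmarking on a CPU that actually runs the C version.

```c
/* Hypothetical names; assumed semantics: dst[i] = dst[i] * src[i]. */

/* Straightforward form: one multiply per loop iteration. */
static void vector_fmul_plain(float *dst, const float *src, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Manually unrolled by 4. The second loop handles the remaining
 * 0..3 elements when len is not a multiple of 4. */
static void vector_fmul_unrolled4(float *dst, const float *src, int len)
{
    int i;
    for (i = 0; i + 3 < len; i += 4) {
        dst[i]     *= src[i];
        dst[i + 1] *= src[i + 1];
        dst[i + 2] *= src[i + 2];
        dst[i + 3] *= src[i + 3];
    }
    for (; i < len; i++)
        dst[i] *= src[i];
}
```

Both functions perform the same multiplies in the same order per element, so
their results should be identical; only the loop overhead differs.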