[FFmpeg-devel] [PATCH] swr/resample: use fma when it is faster

Ganesh Ajjanagadde gajjanag at mit.edu
Mon Dec 14 01:29:52 CET 2015

On Sun, Dec 13, 2015 at 6:55 PM, James Almer <jamrial at gmail.com> wrote:
> FP_FAST_FMA is apparently not defined on mingw-w64 even though it supports
> fma() and generates FMA3/4 instructions when targeting relevant CPUs.

Guess some implementer took the "optional" literally and decided not to bother.

> I also noticed that GCC will on x86_32 generate a call to an external fma
> function instead of inlining the relevant FMA3/4 instructions, same as it
> does when the target lacks fast fma instructions, so simply checking the
> target CPU is not enough. On said builds this patch will probably mean a
> slowdown. No idea what GCC does with other arches.

I measured the slowdown myself by building without -march: it gives
numbers around 60, as opposed to the current 50 (or 40 with a proper
fma). That is a ~20% slowdown relative to the current code.

This is yet more broken stuff, and it really should be fixed upstream.
But until then, we need to work around it or not bother at all.
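
For reference, the kind of compile-time guard being discussed can be
sketched roughly as below. This is a minimal illustration, not the code
from the patch: the helper name muladd() is made up here. It relies only
on the C99 FP_FAST_FMA macro, so on a toolchain that does not define the
macro (as reported above for mingw-w64) it silently takes the plain
multiply-add path even when the hardware has a fast fma.

```c
#include <math.h>

/* Hypothetical helper, not taken from the patch: call fma() only when
 * the toolchain advertises it as fast through FP_FAST_FMA (C99, and
 * optional, so it may be missing even where fma is in fact fast). */
static inline double muladd(double a, double b, double c)
{
#ifdef FP_FAST_FMA
    return fma(a, b, c);  /* fused: a single rounding */
#else
    return a * b + c;     /* plain C multiply-add fallback */
#endif
}
```

Both branches give the same result whenever the product is exactly
representable, e.g. muladd(2.0, 3.0, 1.0) is 7.0 either way; they can
differ in the last bit otherwise.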

The trouble with runtime CPU detection is that, as far as I can tell,
if we want fma support we will have to write assembly here: with
default compiler flags, a generic fma() call is compiled into a slow
libm call that emulates the fma using some combination of
instructions.
The worst part is that it is a bad idea to do runtime dispatch on the
fma() itself, as the function call overhead will be non-negligible, and
so one can't create a helper API in avutil or elsewhere. Thus, it can
only be used when a function is in a critical hotspot, where the
duplication of code and maintenance burden can be justified by the
performance benefits. I might be missing something here though.
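
To make the trade-off concrete, here is a rough sketch of dispatching
once per kernel rather than once per fma. Everything in it is an
assumption for illustration: dot_c(), dot_fma() and select_dot() are
made-up names, not FFmpeg functions, and it uses the GCC/clang
target("fma") attribute, __builtin_fma() and __builtin_cpu_supports()
instead of hand-written assembly or FFmpeg's own CPU detection. It is
limited to x86_64, since x86_32 builds were reported above to fall back
to an external fma call.

```c
#include <stddef.h>

/* Plain C kernel: safe on every target. */
static double dot_c(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Duplicated kernel, built with FMA enabled for this function only.
 * This duplication is the maintenance cost mentioned above. */
__attribute__((target("fma")))
static double dot_fma(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = __builtin_fma(a[i], b[i], acc);
    return acc;
}
#endif

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Pick a kernel once; the indirect call is then paid once per buffer
 * instead of once per multiply-add. */
static dot_fn select_dot(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    if (__builtin_cpu_supports("fma"))
        return dot_fma;
#endif
    return dot_c;
}
```

A caller would do something like "dot_fn dot = select_dot();" at init
time and then call dot(a, b, n) in the hot path. The whole loop has to
be duplicated to get the benefit, which is why this only seems
justifiable for a few hotspots.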

In summary: fma's are very useful for obtaining performance even at
the level of C code across FFmpeg. Unfortunately, due to other issues,
it seems like their application can only be targeted at very specific,
high-profile instances.

Patch dropped; a clean solution for getting fma into various regions
of the C codebase is beyond my understanding.

Thanks all.
