[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2

Uoti Urpala uoti.urpala
Thu Aug 24 22:16:19 CEST 2006

On Thu, 2006-08-24 at 13:28 -0400, Rich Felker wrote:
> On Thu, Aug 24, 2006 at 07:47:12PM +0300, Uoti Urpala wrote:
> > On Thu, 2006-08-24 at 12:15 -0400, Rich Felker wrote:
> > > disabled.. Any viable compiler for high-performance needs to have full
> > > inline asm available, not just a limited set of intrinsics for vector
> > > ops.
> > 
> > Not necessarily, and certainly not gcc-compatible inline asm. How many
> > asm routines are there in FFmpeg or MPlayer that could not achieve
> > comparable speed with intrinsics only?
> s/comparable/same or better/. 1-5% slowdown is not acceptable. And
> with this correction I suspect the answer is _NONE_.

I don't know whether 1% slowdowns would occur (and apparently you don't
know either), but I wouldn't consider them unacceptable anyway. Speedups
bigger than 1% are rather easy to achieve on a particular platform.
Didn't some of the workarounds for old gcc versions cause slowdowns in
that range?

> > > It takes maybe 5-10 minutes more to write the obvious handwritten asm
> > > than to write the code with intrinsics, and performance should be same
> > > or better.
> > 
> > It takes much more at least if you don't already have a lot of
> > experience writing general asm. If you don't do much asm programming
> > otherwise practicing it just for FFmpeg/MPlayer usage doesn't pay off.
> If you don't have this experience you're probably not qualified for
> performance coding anyway.

You need an understanding of processor-level issues, and perhaps
some ability to read the generated assembly for a particular target to
understand performance profiles and platform-specific problems. You
certainly do not need to be fluent in writing gcc inline asm to write
high-performance code.
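
To illustrate what "intrinsics only" code can look like, here is a minimal
sketch. It is not taken from FFmpeg or MPlayer: vector_mul is a hypothetical
helper invented for this example, and the only intrinsics assumed are the
standard SSE ones from <xmmintrin.h>. Whether a compiler generates code as
good as handwritten asm for it is exactly the open question above, so treat
it as an illustration rather than a benchmark.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper, not an FFmpeg function: dst[i] = a[i] * b[i]. */
static void vector_mul(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Process four floats per iteration. Unaligned loads and stores are
     * used so the sketch makes no assumption about buffer alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_mul_ps(va, vb));
    }
#endif
    /* Scalar tail; this is also the whole loop when SSE is unavailable. */
    for (; i < n; i++)
        dst[i] = a[i] * b[i];
}
```

Register allocation and instruction scheduling are left to the compiler
here, so reading the output of something like "gcc -O2 -S" for the target
in question is how you would check what was actually generated.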
