[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics

Uoti Urpala uoti.urpala
Fri Feb 29 01:22:50 CET 2008


On Thu, 2008-02-28 at 23:37 +0100, Michael Niedermayer wrote:
> On Thu, Feb 28, 2008 at 10:59:18PM +0200, Uoti Urpala wrote:
> > On Thu, 2008-02-28 at 20:37 +0100, Michael Niedermayer wrote:
> > > As far as I can see, the only people supporting intrinsics either
> > > A. can't code asm
> > > B. never properly compared asm and intrinsics
> > 
> > Or C. given enough time to write everything in asm can do more
> > productive things during that time instead (such as optimizing C code,
> > converting more C to use intrinsics, fixing bugs, or adding new
> > features).
> 
> No one stops people from optimizing C code, fixing bugs and adding new
> features if they prefer these over asm coding.

Is there or should there be something stopping them from converting more
C code to use intrinsics? And your earlier messages gave the impression
that you'd consider converting intrinsics to asm to be important; if so
that implies you think people who'd do that probably wouldn't find more
useful things to do in the other areas.

> Also, even your patches used asm and not intrinsics in the past. Are you
> arguing that you chose the inefficient approach?

I haven't posted any patches containing large new asm blocks, just
modifications based on existing ones. And I'm not saying that you should
never write asm blocks.

> If one wants to have the fastest program, one has to spend the time to
> optimize the code. If one doesn't care, one can of course use intrinsics.

"Optimize everything as close to the theoretical optimum as you can no
matter what the cost" and "don't care" are not the only alternatives.
Instead of polishing something to 100% perfection, doing
harder/larger/more tasks at a "merely" good level can often give more
practical benefit.

> > There is a lot in FFmpeg that is obviously far from
> > perfect, both in areas of performance and features. Development efforts
> > are best directed in areas where you can achieve the most with the least
> > effort. The right comparison is whether the effort to convert intrinsics
> > to asm could achieve more benefit than spending equal effort to improve
> > any alternative area.
> 
> I agree here, but one also must consider that such a comparison is hard to
> do in practice, and it could easily take longer to answer the question of
> "which way to go" than to go both to the end and back.

An exact comparison is of course very hard to do, but if you want to
encourage people to work on converting intrinsics to asm, that should
mean you consider it likely to be one of the top areas. And no rule such as
"if it gives a 5% speedup it must be done" can work without considering
the effort needed (or benefit too - a 5% speedup on x86/amd64 or in a
commonly used feature is probably worth doing more work for than a 5%
speedup on PPC or in obscure features).

> > You're kidding yourself if you think you're not accepting a 5% speedloss
> > in many features even on x86. I wonder if there's any nontrivial feature
> > in FFmpeg that IS within 5% of optimal...
> 
> You are misunderstanding me, I do not accept something if I think (or know)
> that it can be done 5% faster.
> The things you speak about are >5% away from a global optimum, and we do
> not know where it is nor how to reach it. The difference is that the first is
> constructive (this is bad, do that as it's 5% faster) vs. the second being
> destructive (this is bad, it's rejected, and we don't know how to improve it).

How about "convert all asm blocks to amd64-specific asm"? You think that
wouldn't give a 5% performance increase for anything or that we don't
know how to do it? Or "implement better parallelism in h264 decoder"?
(If you think that goes into "don't know how" territory, change to "find
out what high-level parallelization techniques CoreAVC uses and implement
something similar").

Given the roughly 5% "random" changes I've seen in h264 decoding
performance between compiles of slightly different code, it should be
possible to optimize at least the worst-case performance too. Include
the result from a "good" compile (or perhaps an ICC compile) as an asm
block, and you'll have improved by at least 5% over the worst versions
from gcc :)




