[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2
Thu Aug 24 21:46:41 CEST 2006
On Thu, Aug 24, 2006 at 06:35:07PM +0200, Guillaume Poirier wrote:
> >>Rich, you should really consider that some ppl aren't willing to spend
> >>their youth on writing killer hand-tuned asm code.
> > It takes maybe 5-10 minutes more to write the obvious handwritten asm
> > than to write the code with intrinsics, and performance should be the same
> > or better. If you want to make it even faster you may spend somewhat
> > longer but your claims of "spending their youth" are exaggerated and
> > misleading.
> Well, you forgot to consider several things:
> appropriate register allocation (gcc may not be too good at that, it's
> still easier to write code with named variables rather than with
> anonymous reg names).
are you complaining about the names like %1 %%eax and such?
quoting from the manual:
As of GCC version 3.1, it is also possible to specify input and output
operands using symbolic names which can be referenced within the
assembler code. These names are specified inside square brackets
preceding the constraint string, and can be referenced inside the
assembler code using `%[NAME]' instead of a percentage sign followed by
the operand number. Using named operands the above example could look like:
    asm ("fsinx %[angle],%[output]"
         : [output] "=f" (result)
         : [angle] "f" (angle));
> appropriate scheduling (fair enough, GCC is not all that good at that,
> but ICC is better)
> appropriate clobbering of inputs
> ah, I almost forgot:
> writing a 2nd version of the code that _takes advantage_ of x86-64
> (using REG_xx is cheating as you limit yourself to just half of the
do you have an example where generic intrinsics perform better than
asm written so it works on both x86 and x86-64?
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is