[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics
Thu Feb 28 09:18:32 CET 2008
Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
>> On Wed, Feb 27, 2008 at 09:33:09PM +0000, Måns Rullgård wrote:
>>> Michael Niedermayer <michaelni at gmx.at> writes:
>> Also, one can always write asm code that is as fast as intrinsic
>> code; it's not necessarily possible to write intrinsics code that is
>> as fast as asm.
> One can write assembler that is as fast as intrinsics for *one* CPU
> variant. Even a moderately clever compiler may well compile the
> intrinsics into code outperforming code that was hand-tuned for the
> wrong CPU.
And don't forget: x86-64 has twice as many SSE registers as x86-32
(16 XMM registers instead of 8), so the compiler can in theory make use
of that and generate faster code for x86-64 from the same source as for
x86-32. And in fact that is what happened when I once used SSE
intrinsics in a project built for both x86-32 and x86-64.
Plain asm was 5% faster than intrinsics on x86-32, but on x86-64 the
speed of the intrinsics build was just amazing. I would have needed to
rewrite the plain asm to get that, but got it for free out of gcc (4.1
that was). The particular code was rather long, so it saved tons of
coding and debugging time...
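
To illustrate the point about getting both targets from one source, here
is a minimal sketch. It is not the code from that project; the function
name dot_sse and the two-accumulator layout are made up for the example.
Nothing in the source names a register, so the compiler is free to
allocate XMM registers differently on x86-32 and x86-64:

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics (__m128, _mm_*_ps) */

/* Hypothetical example: dot product of two float arrays.
 * Register allocation is left entirely to the compiler. */
float dot_sse(const float *a, const float *b, size_t n)
{
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();
    float tmp[4];
    float sum;
    size_t i;

    /* Main loop: 8 floats per iteration, two independent accumulators. */
    for (i = 0; i + 8 <= n; i += 8) {
        __m128 p0 = _mm_mul_ps(_mm_loadu_ps(a + i),     _mm_loadu_ps(b + i));
        __m128 p1 = _mm_mul_ps(_mm_loadu_ps(a + i + 4), _mm_loadu_ps(b + i + 4));
        acc0 = _mm_add_ps(acc0, p0);
        acc1 = _mm_add_ps(acc1, p1);
    }

    /* Horizontal sum of the four lanes. */
    _mm_storeu_ps(tmp, _mm_add_ps(acc0, acc1));
    sum = tmp[0] + tmp[1] + tmp[2] + tmp[3];

    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

The same file should build with something like "gcc -O2 -m32 -msse" for
x86-32 and plain "gcc -O2" for x86-64 (SSE is part of the x86-64
baseline). Whether the 64-bit build actually comes out faster depends on
the code and the compiler version; a loop this small will hardly run out
of registers, the effect shows up with longer routines.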
Maybe the gcc developers would optimize the code generated from
intrinsics better if intrinsics were used more often - chicken & egg?