[FFmpeg-devel] Inline ASM vs. Intrinsics

Michael Niedermayer michaelni
Sat May 12 01:00:49 CEST 2007


On Fri, May 11, 2007 at 05:10:56PM -0400, Dave Dodge wrote:
> On Fri, May 11, 2007 at 09:20:04PM +0200, Michael Niedermayer wrote:
> > and from what i remember it's a nightmare for a compiler to generate
> > good code for it ...
> Instructions are 41 bits and are bundled three at a time into 128-bit

i know and this means the code is MANY times bigger than on other cpus
which means it needs a many times larger code cache to reach the same
instead of inventing a simple yet efficient packing, Intel removed the
packing (compared to x86)

> blocks; the remaining 5 bits have to do with specifying which
> execution units to use.  I don't think the chip does any internal
> reordering, instead relying on the compiler to figure it all out
> explicitly. 

which the compiler cannot do, as it depends on branches and cache hit/miss
behavior, which is only known at runtime ...
a normal (out-of-order) cpu can execute other stuff while waiting for a
memory read, the IA64 has to wait ...
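
To make the bundle arithmetic quoted above concrete (3 slots x 41 bits + 5
template bits = 128 bits), here is a rough C sketch that pulls the fields out
of one bundle. The names ia64_bundle, bundle_bits, bundle_template and
bundle_slot are made up for illustration and do not come from ffmpeg or any
real library; the sketch assumes the template sits in the lowest 5 bits with
the three slots following it in order.

```c
#include <stdint.h>

/* Hypothetical container for one 128-bit bundle:
 * lo holds bits 0..63, hi holds bits 64..127. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} ia64_bundle;

/* Extract len bits (1..63) starting at bit pos (0..127), pos + len <= 128. */
static uint64_t bundle_bits(ia64_bundle b, unsigned pos, unsigned len)
{
    uint64_t v;

    if (pos >= 64)
        v = b.hi >> (pos - 64);
    else if (pos == 0)
        v = b.lo;
    else
        v = (b.lo >> pos) | (b.hi << (64 - pos)); /* field may straddle both halves */

    return v & ((UINT64_C(1) << len) - 1);
}

/* Assumed layout: 5-bit template in bits 0..4. */
static unsigned bundle_template(ia64_bundle b)
{
    return (unsigned)bundle_bits(b, 0, 5);
}

/* Assumed layout: slot n (0..2) is 41 bits wide and starts at bit 5 + 41*n. */
static uint64_t bundle_slot(ia64_bundle b, unsigned n)
{
    return bundle_bits(b, 5 + 41 * n, 41);
}
```

At 128 bits per three slots that is 16/3, or about 5.3 bytes per instruction,
before counting any slots that end up filled with NOPs.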

> If you look at IA64 assembly you often see a lot of NOPs

if i look at IA64 asm i see a lot of vomit and need to clean my keyboard

> If an open source project requires a lot of assembly language to do
> its job (such as JIT compiler library), you'll typically find that
> it's been ported to everything _but_ IA64, because nobody is
> masochistic enough to try it.

the "fact" that no human can write asm for it is another huge disadvantage
just look at how much speed you lose with, let's say, ffmpeg if you disable
all asm ...

> The chips are large and expensive, but you can get them with 9MB of
> on-chip cache if you've got the money.  

they also need that amount of cache to cover up the total misdesign

> What IA64 _can_ do is scale way, way, WAY up.  If you want a single
> machine with hundreds of CPUs, several terabytes of shared,
> cache-coherent RAM, 90 PCI-X slots, and 16 GPUs, you can have it right
> now.  

yes i do want that :)

> That's the only reason I ended up dealing with IA64 for my
> project: when we were working out the RAM and CPU requirements IA64
> was simply the only practical option.  Today there are several other
> possibilities and I've kept my code mostly ready to drop onto x86_64
> when the time comes.

i am not disagreeing that some misdesigned CPUs are sometimes the
best option when everything is considered ...

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates