[FFmpeg-devel] [RFC] snow SSE2 optimizations

Michael Niedermayer michaelni
Tue Aug 28 18:06:06 CEST 2007


Hi

On Tue, Aug 28, 2007 at 05:10:18PM +0200, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > On Tue, Aug 28, 2007 at 01:09:54PM +0200, Guillaume POIRIER wrote:
> >> Exactly. You need a CPU that has a full-width (128-bit) ALU to almost
> >> guarantee that SSE will be faster. Core2 and the upcoming K10 have
> >> full-width SSE ALUs.
> > 
> > another way to say it is that you need a cpu which has 2 mmx units and
> > can use both for sse instructions but can only use 1 for mmx
> > 
> > if that is a step in the correct direction well ...
> > 
> 
> Start guessing why there is just one altivec (across 3 generations of
> cpus) and SPU is still quite similar...
> 
> The Intel instruction set design wasn't and isn't the smartest, and
> they keep adding irregular changes...

True, but it also isn't the most stupid; sparc-vis beats them by quite a bit.
And I don't think a perfectly regular set is a good idea either, because
90% of the resulting instructions are never used by anyone, but the
CPU must support them, which makes the CPU more complex and slower.

Where I think Intel did mess up is:
* the mmx design which uses the floating point registers is sick

* the fact that both mmx and sse have just 8 registers is sick
  it was well known that 8 is a limiting factor in many cases
  and with IA64 Intel demonstrated that you can as well do it wrong
  in the opposite direction by having hundreds of registers ...

* I want 8-bit shifts, signed average, pack with shift and rounding and
  some lea-like instruction for mmx

* the stack based FPU registers ...

* having implicit source and destination registers for some instructions
  like the 32x32->64 bit multiply

* integer fixed point multiply (multiply + rounding + shift down) like 
  pmulhrsw but for normal integers is missing ...
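
To make the last item concrete, here is a minimal scalar sketch in C of what
pmulhrsw computes per 16-bit lane, next to the 32-bit equivalent that has no
single general-purpose x86 instruction. The function names pmulhrsw16 and
mulhrs32 are made up for illustration and are not FFmpeg functions. The code
assumes that right-shifting a negative signed value is an arithmetic shift and
that narrowing conversions wrap, which holds on the usual compilers but is
implementation-defined in C.

```c
#include <stdint.h>

/* One lane of pmulhrsw: multiply two Q15 values, round, shift down.
 * result = (((a * b) >> 14) + 1) >> 1, truncated to 16 bits.
 * Hypothetical helper name, for illustration only. */
static int16_t pmulhrsw16(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;            /* full 32-bit product */
    return (int16_t)(((p >> 14) + 1) >> 1); /* round to nearest, keep high part */
}

/* The same operation on "normal" 32-bit integers (Q31), which is what
 * the wished-for instruction would do in one step.
 * Hypothetical helper name, for illustration only. */
static int32_t mulhrs32(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;            /* full 64-bit product */
    return (int32_t)(((p >> 30) + 1) >> 1); /* round to nearest, keep high part */
}
```

For example, pmulhrsw16(3, 8192) multiplies 3 by 0.25 in Q15 and returns 1
(0.75 rounded to nearest), where a plain high multiply without rounding would
return 0. Without a dedicated instruction the 32-bit version costs a widening
multiply plus separate shift and add steps.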

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I know you won't believe me, but the highest form of Human Excellence is
to question oneself and others. -- Socrates