[FFmpeg-devel] Mixed data type in SIMD code?

Loren Merritt lorenm
Thu Mar 6 10:06:52 CET 2008

On Mon, 3 Mar 2008, Zuxy Meng wrote:

> I've always been unsure about the performance penalty (if there is any)
> incurred by using SIMD instructions that do not match the actual
> data type. For example, using pxor to clear a register that is later
> used as packed single, or using movlhps to move part of a register
> that is later used as packed integer or packed double.
> These instructions are bitwise equivalent, but instructions operating
> on packed integer and packed single are usually shorter than those on
> packed double, so Intel recommends differentiating only between integer
> and floating point, and using *ps instructions even for packed doubles.
> I don't know what AMD suggests. So does that mean we should replace things
> like movapd, movlhpd, xorpd and andpd with their *ps equivalents?
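
To make the "bitwise equivalent" point concrete, here is a minimal sketch in C with SSE2 intrinsics that checks that the integer, single and double forms of xor produce identical bits. The function name same_bits is made up for illustration, and the compiler is free to pick a different encoding than the intrinsic suggests, so treat this as a demonstration of equivalent results only, not of which instruction ends up in the binary.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86 / x86-64 only) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if xor gives the same 128 bits whether it
 * is expressed in the integer, packed-single or packed-double domain. */
static int same_bits(const uint8_t a[16], const uint8_t b[16])
{
    __m128i ia = _mm_loadu_si128((const __m128i *)a);
    __m128i ib = _mm_loadu_si128((const __m128i *)b);

    /* Nominally pxor, xorps and xorpd. The casts are free reinterpretations;
     * the compiler may still choose another encoding when it optimizes. */
    __m128i r_int = _mm_xor_si128(ia, ib);
    __m128  r_ps  = _mm_xor_ps(_mm_castsi128_ps(ia), _mm_castsi128_ps(ib));
    __m128d r_pd  = _mm_xor_pd(_mm_castsi128_pd(ia), _mm_castsi128_pd(ib));

    uint8_t o_int[16], o_ps[16], o_pd[16];
    _mm_storeu_si128((__m128i *)o_int, r_int);
    _mm_storeu_si128((__m128i *)o_ps, _mm_castps_si128(r_ps));
    _mm_storeu_si128((__m128i *)o_pd, _mm_castpd_si128(r_pd));

    return memcmp(o_int, o_ps, 16) == 0 && memcmp(o_int, o_pd, 16) == 0;
}
```

Calling same_bits() on any two 16-byte buffers should return 1. Which instruction was actually emitted can be checked by disassembling the object file.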

I measured no difference on a Core2 or a netburst Celeron.

But on K8 I got (same results on Opteron 240 and Athlon64 3400):
cycles per copy, throughput bound
  2.01  xorps; addps
  3.53  pxor;  addps
  3.02  xorps; paddd
  2.01  pxor;  paddd

cycles per copy, latency bound
  6.05  andps; addps
 45.86  pand;  addps
  6.05  andps; paddd
  4.02  pand;  paddd

There does not appear to be any difference between ps and pd versions,
nor between movdqa/movaps/movapd.
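
For anyone who wants to repeat the latency-bound measurement, the following is a rough sketch of how such a test could be set up, not the code used for the numbers above. It assumes GCC or Clang on x86 (inline asm, __rdtsc); the macro and function names, ITERS and the unroll factor are all made up for illustration. Every instruction reads and writes the same register, so each one has to wait for the previous result.

```c
#include <stdint.h>
#include <emmintrin.h>   /* __m128, _mm_setzero_ps */
#include <x86intrin.h>   /* __rdtsc (GCC/Clang, x86 only) */

#define ITERS  100000    /* loop iterations (arbitrary choice) */
#define UNROLL 8         /* instruction pairs per iteration */

/* One "OP1; OP2" pair, both with destination %0, forming a dependency chain. */
#define PAIR(OP1, OP2)  OP1 " %1, %0\n\t" OP2 " %1, %0\n\t"
#define PAIR8(OP1, OP2) PAIR(OP1, OP2) PAIR(OP1, OP2) PAIR(OP1, OP2) \
                        PAIR(OP1, OP2) PAIR(OP1, OP2) PAIR(OP1, OP2) \
                        PAIR(OP1, OP2) PAIR(OP1, OP2)

/* Defines time_<name>(): average TSC ticks per pair. Note that on recent
 * CPUs the TSC counts reference ticks, which may differ from core cycles. */
#define DEFINE_CHAIN(name, OP1, OP2)                                    \
static double time_##name(void)                                         \
{                                                                       \
    __m128 acc = _mm_setzero_ps(); /* all zeros: no NaNs or denormals */ \
    __m128 src = _mm_setzero_ps();                                      \
    uint64_t t0 = __rdtsc();                                            \
    for (int i = 0; i < ITERS; i++) {                                   \
        __asm__ volatile(PAIR8(OP1, OP2) : "+x"(acc) : "x"(src));       \
    }                                                                   \
    uint64_t t1 = __rdtsc();                                            \
    return (double)(t1 - t0) / ((double)ITERS * UNROLL);                \
}

/* The four combinations from the latency-bound table. */
DEFINE_CHAIN(andps_addps, "andps", "addps")
DEFINE_CHAIN(pand_addps,  "pand",  "addps")
DEFINE_CHAIN(andps_paddd, "andps", "paddd")
DEFINE_CHAIN(pand_paddd,  "pand",  "paddd")
```

Printing time_pand_addps() next to time_andps_addps() gives the kind of comparison shown in the table. A throughput-bound variant would spread the same instructions over several independent registers instead of one chain.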

--Loren Merritt
