[FFmpeg-devel] Mixed data type in SIMD code?
Thu Mar 6 10:06:52 CET 2008
On Mon, 3 Mar 2008, Zuxy Meng wrote:
> I've always been unsure about the performance penalty (if there's any)
> brought in by using SIMD instructions that mismatch with the actual
> data type. For example, to use pxor to clear a register that is later
> used as packed single, or to use movlhps to move part of a register
> that is later used as packed integer or packed double.
> These instructions are bitwise equivalent, but instructions operating
> on packed integer and packed single are usually shorter than those on
> packed double, so Intel recommends differentiating only between integer
> and floating point, and using *ps instructions even for packed doubles.
> I don't know what AMD suggests. So does that mean we should replace things
> like movapd, movlhpd, xorpd and andpd with their *ps equivalents?
I measured no difference on a Core2 or a NetBurst Celeron.
But on K8 I got the following (same on Opteron 240 and Athlon64 3400):
cycles per copy, throughput bound
 2.01  xorps; addps
 3.53  pxor; addps
 3.02  xorps; paddd
 2.01  pxor; paddd

cycles per copy, latency bound
 6.05  andps; addps
45.86  pand; addps
 6.05  andps; paddd
 4.02  pand; paddd
There does not appear to be any difference between ps and pd versions,
nor between movdqa/movaps/movapd.
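
For reference, the "bitwise equivalent" claim in the quoted question can be
checked with a small sketch like the one below. It assumes an x86 target
with SSE2 and uses compiler intrinsics; the helper names (and_ps, and_pd,
and_si, and_variants_agree) are made up for illustration. Note that a
compiler is free to pick a different encoding than the intrinsic suggests,
so this only shows that the results match, not which instruction gets
emitted or how fast it runs. Timing the mixed pairs above would need
hand-written asm.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target */
#include <stdint.h>
#include <string.h>

/* AND two 16-byte blocks via the packed-single form (andps). */
static void and_ps(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128 va = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)a));
    __m128 vb = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)b));
    _mm_storeu_si128((__m128i *)out, _mm_castps_si128(_mm_and_ps(va, vb)));
}

/* Same operation via the packed-double form (andpd). */
static void and_pd(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128d va = _mm_castsi128_pd(_mm_loadu_si128((const __m128i *)a));
    __m128d vb = _mm_castsi128_pd(_mm_loadu_si128((const __m128i *)b));
    _mm_storeu_si128((__m128i *)out, _mm_castpd_si128(_mm_and_pd(va, vb)));
}

/* Same operation via the packed-integer form (pand). */
static void and_si(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(va, vb));
}

/* Returns 1 if all three forms produce identical bytes, 0 otherwise. */
static int and_variants_agree(const uint8_t *a, const uint8_t *b)
{
    uint8_t r_ps[16], r_pd[16], r_si[16];

    and_ps(a, b, r_ps);
    and_pd(a, b, r_pd);
    and_si(a, b, r_si);

    return memcmp(r_ps, r_si, 16) == 0 && memcmp(r_pd, r_si, 16) == 0;
}
```

Calling and_variants_agree() on any two 16-byte buffers should return 1,
since all three forms compute the same 128-bit AND; only the execution
domain the CPU associates with the instruction differs.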