[Ffmpeg-devel] [RFC] AltiVec optimizations, try 2
Thu Aug 3 11:44:08 CEST 2006
Guillaume POIRIER wrote:
> Just out of curiosity, is it necessary to explicitly use vec_splat_s32 so
> that gcc uses the "splat" asm instruction, or will it otherwise allocate
> 64, 7, ... on the stack and load each register with these constants?
You want to avoid the stack entirely and have the constant inlined as a
direct operation, since vec_splat_(s|u)(8|16|32) doesn't require a memory
access.
> Also, as far as I understood how vec_splat_s32 works, it should be
> possible to generate a vector full of "64" with a single
Nope, a PPC instruction can only carry an immediate value in the range
-16 .. 15.
vec_splat_* takes an immediate, not a register.
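
To make the range concrete, here is a small scalar sketch of a 5-bit signed
immediate field, which is the reason for the -16 .. 15 limit. The helper
names decode_simm5 and fits_splat_immediate are made up for illustration;
they are not part of altivec.h or of FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend a 5-bit immediate field to 32 bits,
 * the way a signed splat-immediate would interpret it. */
static int32_t decode_simm5(uint32_t field)
{
    field &= 0x1Fu;                       /* keep only the 5 encoded bits */
    return (field & 0x10u) ? (int32_t)field - 32 : (int32_t)field;
}

/* Hypothetical helper: can this constant be splatted directly? */
static int fits_splat_immediate(int32_t value)
{
    return value >= -16 && value <= 15;
}
```

With these, 4 and 7 fit directly, while 64 does not and has to be built
from smaller pieces.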
> so why is it desirable to use the form with more
> instructions (more decoding bw, more dependencies, more computation unit
> slots used up)... is this an optimization specific to G4 or to Altivec
> in general?
Generic optimization: in AltiVec the most expensive operation is memory
access (I think it is about 3-4 times slower than any other instruction).
> Or am I just too blind to see the obvious solution?
not blind, just not used to it.
In theory you'd like to get those constant values into registers without
paying a visit to memory, and then keep them there.
Since we already splatted 4 somewhere, vec_sl will just reuse that
register as long as no dependency on it sits too close, so even splatting
64 comes down to a single arithmetic op.
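
A minimal sketch of that idea, assuming the 4 comes from vec_splat_u32 and
that 64 is wanted in 32-bit lanes (4 << 4 = 64). The name splat_64 is made
up for this example. The scalar branch only mimics the per-lane result so
the snippet also builds on machines without AltiVec, and the copy into
out[] is there purely to inspect the result.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper: fill out[0..3] with 64 without loading the
 * constant from memory. */
static void splat_64(uint32_t out[4])
{
#if defined(__ALTIVEC__)
    /* 4 fits the splat immediate range, so no memory access is needed. */
    const vector unsigned int four = vec_splat_u32(4);
    /* Shift each lane left by the count in the same lane: 4 << 4 = 64. */
    const vector unsigned int v64 = vec_sl(four, four);
    union { vector unsigned int v; uint32_t s[4]; } u;
    u.v = v64;
    for (int i = 0; i < 4; i++)
        out[i] = u.s[i];
#else
    /* Scalar model of the same two steps, one lane at a time. */
    for (int i = 0; i < 4; i++) {
        const uint32_t four = 4;          /* models vec_splat_u32(4) */
        out[i] = four << (four & 31u);    /* models vec_sl(four, four) */
    }
#endif
}
```

In real code the splatted 4 would normally already be live in a register
for other uses, so only the vec_sl is extra work.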