[Ffmpeg-devel] [RFC] VC1 Transform in AltiVec

Michael Niedermayer michaelni
Wed Jul 19 08:18:30 CEST 2006


Hi

On Wed, Jul 19, 2006 at 07:23:59AM +0300, Kostya wrote:
[...]
> > > +    ssrc7 = vec_ld(112, block);
> > > +
> > > +    TRANSPOSE8(ssrc0, ssrc1, ssrc2, ssrc3, ssrc4, ssrc5, ssrc6, ssrc7);
> > 
> > the TRANSPOSE is unneeded, the scantables can be transposed to get the same
> > effect

I meant that only the first TRANSPOSE is unneeded: transposing the
scantables has the same effect. The second is of course still needed.
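
A minimal sketch of the scantable idea, assuming the decoder scatters
coefficients with block[scan[i]] = coeffs[i]. The helper names below are
hypothetical, not existing FFmpeg functions. Swapping row and column in every
table entry makes the coefficients land already transposed, so the transform
no longer needs its first TRANSPOSE.

```c
#include <stdint.h>

/* Hypothetical helper: build a transposed copy of an 8x8 scan table.
 * Each entry is a block position row*8 + col; the transposed entry
 * is col*8 + row. */
static void transpose_scantable(uint8_t dst[64], const uint8_t src[64])
{
    for (int i = 0; i < 64; i++) {
        int pos = src[i];
        int row = pos >> 3;
        int col = pos & 7;
        dst[i] = (uint8_t)((col << 3) | row);
    }
}

/* Hypothetical helper: place coefficients into the block in scan order. */
static void scatter(int16_t block[64], const int16_t coeffs[64],
                    const uint8_t scan[64])
{
    for (int i = 0; i < 64; i++)
        block[scan[i]] = coeffs[i];
}
```

The table is transposed once at init time, so the per-block cost of the
first transpose disappears entirely.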


> 
> I'm not sure about this. It looks like the simplest way to do the horizontal
> transform with AltiVec.
> > 
> > 
[...]
> > > +    sA = vec_unpackh(ssrc2);
> > > +    sB = vec_unpackh(ssrc3);
> > > +    sC = vec_unpackh(ssrc4);
> > > +    sD = vec_unpackh(ssrc5);
> > > +    sE = vec_unpackh(ssrc6);
> > > +    sF = vec_unpackh(ssrc7);
> > > +    STEP8(s0, s1, s2, s3, s4, s5, s6, s7, vec_4);
> > > +    SHIFT_VERT(s0, s1, s2, s3, s4, s5, s6, s7);
> > > +    STEP8(s8, s9, sA, sB, sC, sD, sE, sF, vec_4);
> > > +    SHIFT_VERT(s8, s9, sA, sB, sC, sD, sE, sF);
> > 
> > the vertical transform can also be done in 16 bits, though it's a little trickier
> > 
> >             t1 = 6 * (src[ 0] + src[32]);
> >             t2 = 6 * (src[ 0] - src[32]);
> >             t3 = 8 * src[16] +  3 * src[48];
> >             t4 = 3 * src[16] -  8 * src[48];
> > 
> >             t5 = t1 + t3;
> >             t6 = t2 + t4;
> >             t7 = t2 - t4;
> >             t8 = t1 - t3;
> > 
> >             t1 = (8 * src[ 8] + 8 * src[24] + 4 * src[40] + 2 * src[56]) + ((- src[24] + src[40])>>1);
> >             t2 = (8 * src[ 8] - 2 * src[24] - 8 * src[40] - 4 * src[56]) + ((- src[ 8] - src[56])>>1);
> >             t3 = (4 * src[ 8] - 8 * src[24] + 2 * src[40] + 8 * src[56]) + ((  src[ 8] - src[56])>>1);
> >             t4 = (2 * src[ 8] - 4 * src[24] + 8 * src[40] - 8 * src[56]) + ((- src[24] - src[40])>>1);
> > 
> >             dst[ 0] = (t5 + t1 + 32) >> 6;
> >             dst[ 8] = (t6 + t2 + 32) >> 6;
> >             dst[16] = (t7 + t3 + 32) >> 6;
> >             dst[24] = (t8 + t4 + 32) >> 6;
> >             dst[32] = (t8 - t4 + 32) >> 6;
> >             dst[40] = (t7 - t3 + 32) >> 6;
> >             dst[48] = (t6 - t2 + 32) >> 6;
> >             dst[56] = (t5 - t1 + 32) >> 6;
> > 
> > it's also interesting to note that Microsoft must be aware of this, given the
> > way rounding is done on the second half of the coefficients, but they
> > apparently don't mention it in the spec ... I am wondering what other stuff
> > they have hidden ...
> > 
> > and the + 32 can be added to t1/t2 instead of at the end
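
As a sketch of the rounding remark, the half-scale column transform quoted
above can be written once with a switch that either adds 32 at each output
or folds it into the even-part t1/t2. The function name and the
fold_rounding flag are made up for illustration. The arithmetic is done in
plain int, so this only shows that both placements give the same result; it
does not by itself prove 16-bit safety. It also assumes >> on negative values
is an arithmetic shift, which C leaves implementation-defined.

```c
#include <stdint.h>

/* One column of the vertical pass (stride 8), using the half-scale
 * formulas from the thread. fold_rounding != 0 adds the +32 to the
 * even-part t1/t2 instead of to every output. */
static void col_transform(int16_t *dst, const int16_t *src, int fold_rounding)
{
    const int early = fold_rounding ? 32 : 0;
    const int late  = fold_rounding ? 0 : 32;
    const int a = src[8], b = src[24], c = src[40], d = src[56];

    int t1 = 6 * (src[0] + src[32]) + early;
    int t2 = 6 * (src[0] - src[32]) + early;
    int t3 = 8 * src[16] + 3 * src[48];
    int t4 = 3 * src[16] - 8 * src[48];

    const int t5 = t1 + t3;
    const int t6 = t2 + t4;
    const int t7 = t2 - t4;
    const int t8 = t1 - t3;

    /* Odd part: half of (16,15,9,4), (15,-4,-16,-9), ... */
    t1 = (8 * a + 8 * b + 4 * c + 2 * d) + ((-b + c) >> 1);
    t2 = (8 * a - 2 * b - 8 * c - 4 * d) + ((-a - d) >> 1);
    t3 = (4 * a - 8 * b + 2 * c + 8 * d) + (( a - d) >> 1);
    t4 = (2 * a - 4 * b + 8 * c - 8 * d) + ((-b - c) >> 1);

    dst[ 0] = (int16_t)((t5 + t1 + late) >> 6);
    dst[ 8] = (int16_t)((t6 + t2 + late) >> 6);
    dst[16] = (int16_t)((t7 + t3 + late) >> 6);
    dst[24] = (int16_t)((t8 + t4 + late) >> 6);
    dst[32] = (int16_t)((t8 - t4 + late) >> 6);
    dst[40] = (int16_t)((t7 - t3 + late) >> 6);
    dst[48] = (int16_t)((t6 - t2 + late) >> 6);
    dst[56] = (int16_t)((t5 - t1 + late) >> 6);
}
```

Every output is t5..t8 plus or minus an odd term, and t5..t8 each contain
exactly one of t1/t2, so the constant reaches all eight outputs either way
and eight additions shrink to two.
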
> 
> Well, here is my version converted back to C:
> 
> t1 = ((src[0] + src[4]) << 2) * 3 + 4;
> t2 = ((src[0] - src[4]) << 2) * 3 + 4;
> t3 = ((src[6] * 3) << 1) + (src[2] << 4);
> t4 = ((src[2] * 3) << 1) - (src[6] << 4);
> 
> t5 = t1 + t3;
> t6 = t2 + t4;
> t7 = t2 - t4;
> t8 = t1 - t3;
> 
> // t1 = 16 * src[1] + 15 * src[3] + 9 * src[5] + 4 * src[7]
> t1 = ((((((src[1] + src[3]) << 1) + src[5]) << 1) + src[7]) << 2) + src[5] - src[3];
> ...etc

Your version can overflow with 16-bit variables.
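
A small sketch of the overflow concern, with made-up helper names. It
evaluates the first odd-part term in 32 bits, once at full scale
(16, 15, 9, 4) and once in the half-scale form from earlier in the thread,
and checks whether the result would still fit a signed 16-bit lane. An input
of 1000 in all four positions is only an illustrative value, not a statement
about the real range after the first pass.

```c
#include <stdint.h>

/* Full-scale term: 16*a + 15*b + 9*c + 4*d, computed in 32 bits so the
 * true value is visible. */
static int32_t odd_term_full(int16_t a, int16_t b, int16_t c, int16_t d)
{
    return 16 * a + 15 * b + 9 * c + 4 * d;
}

/* Half-scale term: (8,8,4,2) plus a half-weight correction.
 * Assumes >> on a negative int is an arithmetic shift. */
static int32_t odd_term_half(int16_t a, int16_t b, int16_t c, int16_t d)
{
    return (8 * a + 8 * b + 4 * c + 2 * d) + ((-b + c) >> 1);
}

/* Would this value survive being stored in a signed 16-bit element? */
static int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

With a = b = c = d = 1000 the full-scale term is 44000, which does not fit
in 16 bits, while the half-scale term is 22000, which does.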

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is



