[Ffmpeg-devel] [RFC] VC1 Transform in AltiVec
Michael Niedermayer
michaelni
Wed Jul 19 08:18:30 CEST 2006
Hi
On Wed, Jul 19, 2006 at 07:23:59AM +0300, Kostya wrote:
[...]
> > > + ssrc7 = vec_ld(112, block);
> > > +
> > > + TRANSPOSE8(ssrc0, ssrc1, ssrc2, ssrc3, ssrc4, ssrc5, ssrc6, ssrc7);
> >
> > the TRANSPOSE is unneeded, the scantables can be transposed to get the same
> > effect
I meant that only the first TRANSPOSE is unneeded; transposing the
scantables achieves the same effect. The second is of course still needed.
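To illustrate the scantable idea: a scan table maps the i-th decoded
coefficient to a position row*8 + col in the 8x8 block, so swapping row and
col in every entry makes the coefficients land in the block already
transposed. A minimal C sketch, assuming a 64-entry table of such positions;
the helper name transpose_scantable is hypothetical and is not taken from the
patch or from FFmpeg:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a transposed copy of an 8x8 scan table.
 * Each entry is assumed to be a block position row * 8 + col (0..63). */
static void transpose_scantable(uint8_t dst[64], const uint8_t src[64])
{
    for (int i = 0; i < 64; i++) {
        int pos = src[i];
        assert(pos < 64);
        /* swap the row (high 3 bits) and the column (low 3 bits) */
        dst[i] = (uint8_t)(((pos & 7) << 3) | (pos >> 3));
    }
}
```

Applying the helper twice gives back the original table, which is a cheap
sanity check. The table would be built once at init time, so the transform
itself pays nothing for it.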
>
> I'm not sure about this. It looks like the simplest way to do the horizontal
> transform with AltiVec.
> >
> >
[...]
> > > + sA = vec_unpackh(ssrc2);
> > > + sB = vec_unpackh(ssrc3);
> > > + sC = vec_unpackh(ssrc4);
> > > + sD = vec_unpackh(ssrc5);
> > > + sE = vec_unpackh(ssrc6);
> > > + sF = vec_unpackh(ssrc7);
> > > + STEP8(s0, s1, s2, s3, s4, s5, s6, s7, vec_4);
> > > + SHIFT_VERT(s0, s1, s2, s3, s4, s5, s6, s7);
> > > + STEP8(s8, s9, sA, sB, sC, sD, sE, sF, vec_4);
> > > + SHIFT_VERT(s8, s9, sA, sB, sC, sD, sE, sF);
> >
> > the vertical transform can also be done in 16 bits, though it's a little trickier
> >
> > t1 = 6 * (src[ 0] + src[32]);
> > t2 = 6 * (src[ 0] - src[32]);
> > t3 = 8 * src[16] + 3 * src[48];
> > t4 = 3 * src[16] - 8 * src[48];
> >
> > t5 = t1 + t3;
> > t6 = t2 + t4;
> > t7 = t2 - t4;
> > t8 = t1 - t3;
> >
> > t1 = (8 * src[ 8] + 8 * src[24] + 4 * src[40] + 2 * src[56]) + ((- src[24] + src[40])>>1);
> > t2 = (8 * src[ 8] - 2 * src[24] - 8 * src[40] - 4 * src[56]) + ((- src[ 8] - src[56])>>1);
> > t3 = (4 * src[ 8] - 8 * src[24] + 2 * src[40] + 8 * src[56]) + (( src[ 8] - src[56])>>1);
> > t4 = (2 * src[ 8] - 4 * src[24] + 8 * src[40] - 8 * src[56]) + ((- src[24] - src[40])>>1);
> >
> > dst[ 0] = (t5 + t1 + 32) >> 6;
> > dst[ 8] = (t6 + t2 + 32) >> 6;
> > dst[16] = (t7 + t3 + 32) >> 6;
> > dst[24] = (t8 + t4 + 32) >> 6;
> > dst[32] = (t8 - t4 + 32) >> 6;
> > dst[40] = (t7 - t3 + 32) >> 6;
> > dst[48] = (t6 - t2 + 32) >> 6;
> > dst[56] = (t5 - t1 + 32) >> 6;
> >
> > it's also interesting to note that Microsoft must be aware of this due to the
> > way rounding is done on the second half of the coeffs, but they apparently
> > don't mention it in the spec ... i am wondering what other stuff they have
> > hidden ...
> >
> > and the + 32 can be added to t1/t2 instead of at the end
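One way to check the halved-coefficient version quoted above is to compare it
against a wider reference for a single column. The sketch below assumes that
the reference column pass uses the coefficients 12/16/6 and 16/15/9/4 with a
+64 bias, a final >> 7, and an extra +1 on outputs 4..7; none of that is
stated in this mail, so treat it as an assumption. s[k] stands for src[8*k],
the names col_ref, col_half and cols_match are hypothetical, and >> on
negative values is assumed to be an arithmetic shift.

```c
#include <stdint.h>

/* ASSUMED reference column pass, computed in plain int. */
static void col_ref(const int16_t s[8], int d[8])
{
    int t1 = 12 * (s[0] + s[4]) + 64;
    int t2 = 12 * (s[0] - s[4]) + 64;
    int t3 = 16 * s[2] +  6 * s[6];
    int t4 =  6 * s[2] - 16 * s[6];
    int t5 = t1 + t3, t6 = t2 + t4, t7 = t2 - t4, t8 = t1 - t3;

    t1 = 16 * s[1] + 15 * s[3] +  9 * s[5] +  4 * s[7];
    t2 = 15 * s[1] -  4 * s[3] - 16 * s[5] -  9 * s[7];
    t3 =  9 * s[1] - 16 * s[3] +  4 * s[5] + 15 * s[7];
    t4 =  4 * s[1] -  9 * s[3] + 15 * s[5] - 16 * s[7];

    d[0] = (t5 + t1) >> 7;      d[1] = (t6 + t2) >> 7;
    d[2] = (t7 + t3) >> 7;      d[3] = (t8 + t4) >> 7;
    d[4] = (t8 - t4 + 1) >> 7;  d[5] = (t7 - t3 + 1) >> 7;
    d[6] = (t6 - t2 + 1) >> 7;  d[7] = (t5 - t1 + 1) >> 7;
}

/* Halved coefficients as in the mail, with the + 32 folded into t1/t2. */
static void col_half(const int16_t s[8], int d[8])
{
    int t1 = 6 * (s[0] + s[4]) + 32;
    int t2 = 6 * (s[0] - s[4]) + 32;
    int t3 = 8 * s[2] + 3 * s[6];
    int t4 = 3 * s[2] - 8 * s[6];
    int t5 = t1 + t3, t6 = t2 + t4, t7 = t2 - t4, t8 = t1 - t3;

    t1 = (8 * s[1] + 8 * s[3] + 4 * s[5] + 2 * s[7]) + ((-s[3] + s[5]) >> 1);
    t2 = (8 * s[1] - 2 * s[3] - 8 * s[5] - 4 * s[7]) + ((-s[1] - s[7]) >> 1);
    t3 = (4 * s[1] - 8 * s[3] + 2 * s[5] + 8 * s[7]) + (( s[1] - s[7]) >> 1);
    t4 = (2 * s[1] - 4 * s[3] + 8 * s[5] - 8 * s[7]) + ((-s[3] - s[5]) >> 1);

    d[0] = (t5 + t1) >> 6;  d[1] = (t6 + t2) >> 6;
    d[2] = (t7 + t3) >> 6;  d[3] = (t8 + t4) >> 6;
    d[4] = (t8 - t4) >> 6;  d[5] = (t7 - t3) >> 6;
    d[6] = (t6 - t2) >> 6;  d[7] = (t5 - t1) >> 6;
}

/* Returns 1 if both versions produce the same eight outputs. */
static int cols_match(const int16_t s[8])
{
    int a[8], b[8];
    col_ref(s, a);
    col_half(s, b);
    for (int k = 0; k < 8; k++)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```

Under those assumptions the floor in each >> 1 term makes the subtracted
outputs round the same way as the extra +1 does in the reference, which would
explain the remark about rounding on the second half. The sketch computes in
int, so it only checks that the results agree; whether every intermediate
fits in 16 bits depends on the input range and is not checked here.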
>
> Well, here is my version converted back to C:
>
> t1 = ((src[0] + src[4]) << 2) * 3 + 4;
> t2 = ((src[0] - src[4]) << 2) * 3 + 4;
> t3 = ((src[6] * 3) << 1) + (src[2] << 4);
> t4 = ((src[2] * 3) << 1) - (src[6] << 4);
>
> t5 = t1 + t3;
> t6 = t2 + t4;
> t7 = t2 - t4;
> t8 = t1 - t3;
>
> // t1 = 16 * src[1] + 15 * src[3] + 9 * src[5] + 4 * src[7]
> t1 = ((((((src[1] + src[3]) << 1) + src[5]) << 1) + src[7]) << 2) + src[5] - src[3];
> ...etc
your version can overflow with 16-bit variables
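To make the overflow concrete, here is a small C sketch of two terms from the
quoted code. The input values in the usage note are hypothetical, chosen only
to trigger the wrap, and wrap16 emulates a modular (non-saturating) signed
16-bit lane, which is an assumption about how the vector code would be
written. The shifts are written as multiplications so that negative inputs
are well defined in C.

```c
#include <stdint.h>

/* Even-part term from the quoted code, ((a + b) << 2) * 3 + 4, in full int. */
static int even_t1_wide(int a, int b)
{
    return (a + b) * 4 * 3 + 4;
}

/* Truncate to a signed 16-bit lane (wraps on common two's complement targets). */
static int16_t wrap16(int v)
{
    return (int16_t)(uint16_t)v;
}

/* Shift-add form of the odd-part term from the quoted code. */
static int odd_t1_shiftadd(int s1, int s3, int s5, int s7)
{
    return ((((s1 + s3) * 2 + s5) * 2 + s7) * 4) + s5 - s3;
}

/* The same term written with explicit multiplies, as in the quoted comment. */
static int odd_t1_direct(int s1, int s3, int s5, int s7)
{
    return 16 * s1 + 15 * s3 + 9 * s5 + 4 * s7;
}
```

With a = b = 1500, even_t1_wide gives 36004, which is above 32767 and wraps
to a negative number in a 16-bit lane; any sum of 2731 or more has the same
problem. The shift-add form does compute 16/15/9/4 correctly, but its
intermediate 16 * (src[1] + src[3]) needs even more headroom than the final
result.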
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is