[Ffmpeg-devel] [PATCH] idct8 in Altivec for H.264 decoding
Mon Oct 9 10:40:05 CEST 2006
On Mon, Oct 09, 2006 at 12:05:30AM +0200, Guillaume POIRIER wrote:
> Attached patch should provide a 2% decoding speed-up if I do the math right.
> This patch isn't meant to be merged as it is now, as in addition to
> adding idct8 routine, it moves TRANSPOSE8 macro to dsputil_altivec.h as
> this macro is already duplicated in vc1dsp_altivec.c, and
> This patch also carries some macros that are useful in Altivec
> programming. They are taken from x264 project, and I have permission
> from the author to re-licence them in LGPL.
could you send a separate patch for the TRANSPOSE move and these?
> One more thing: if the dst array is 8 or 16 bytes aligned, it should be
> possible to make the routine even faster. Unfortunately, I can't manage
> to make an implementation that works.
> I've left the optimized routines ALTIVEC_STORE_SUM_CLIP_ALIGN8_A (16
> bytes aligned *dst) and ALTIVEC_STORE_SUM_CLIP_ALIGN8_B (8 bytes aligned
> *dst, but _not_ 16 bytes aligned) so ppl can have a look at them and
> hopefully find what is wrong.
> As far as I can see, ALTIVEC_STORE_SUM_CLIP_ALIGN8_A works as expected,
> but ALTIVEC_STORE_SUM_CLIP_ALIGN8_B doesn't (that's really surprising
> considering how much alike they are).
1. check that the stuff is really 8-byte aligned (yes it should be but ...)
2. maybe some print_vec() function which prints the contents of a vec*
together with a check at the end if the calculation matches what you
expect could help
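my idea is something like (temp being a 16-byte aligned uint8_t[16]):
    vec_u8_t dstv = vec_ld(0, dest);
    vec_st(sum8, 0, temp);
    for(i=0; i<16; i++)
        if(temp[i] != dest[i])
            printf("mismatch at byte %d\n", i); /* or whatever report you prefer */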
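
A portable sketch of both checks, in plain C so it can be tried without Altivec. The names is_aligned, print_bytes16 and first_mismatch16 are made up for illustration and are not part of the patch or of ffmpeg. It assumes the vector has already been stored with vec_st() into a 16-byte aligned buffer, which is then passed in as a byte array.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if p is aligned to n bytes
 * (n must be a power of two, e.g. 8 or 16). */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Hypothetical stand-in for print_vec(): dumps the 16 bytes that were
 * previously stored from a vector register into v. */
static void print_bytes16(const char *name, const uint8_t *v)
{
    int i;
    printf("%s:", name);
    for (i = 0; i < 16; i++)
        printf(" %02x", v[i]);
    printf("\n");
}

/* Returns the index of the first differing byte,
 * or -1 if all 16 bytes match. */
static int first_mismatch16(const uint8_t *expected, const uint8_t *got)
{
    int i;
    for (i = 0; i < 16; i++)
        if (expected[i] != got[i])
            return i;
    return -1;
}
```

For the B case one could test is_aligned(dest, 8) && !is_aligned(dest, 16) on entry, then compare the stored temp against the result of the known-good unaligned routine and dump both buffers with print_bytes16() when first_mismatch16() returns something other than -1.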
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is