[Ffmpeg-devel] [PATCH] idct8 in Altivec for H.264 decoding
Mon Oct 9 00:05:30 CEST 2006
The attached patch should provide a 2% decoding speed-up, if I've done the math right.
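For anyone testing the Altivec version, a plain C sketch of what an H.264 idct8 add routine computes may be a useful point of comparison. It follows the 8x8 inverse transform described in the H.264 specification: a 1-D pass over each row, then over each column, then (x + 32) >> 6 rounding and a clipped add into dst. It is not the code from the patch. clip_uint8, idct8_1d and idct8_add_ref are hypothetical names, and the sketch assumes >> on negative values is an arithmetic shift, as it is with the usual compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* One 8-point pass of the H.264 8x8 inverse transform, done in place on
 * in[0], in[step], ..., in[7 * step]. Assumes arithmetic >> on negatives. */
static void idct8_1d(int *in, int step)
{
    int d[8];
    for (int i = 0; i < 8; i++)
        d[i] = in[i * step];

    /* Even part. */
    int a0 = d[0] + d[4];
    int a2 = d[0] - d[4];
    int a4 = (d[2] >> 1) - d[6];
    int a6 = d[2] + (d[6] >> 1);
    int b0 = a0 + a6, b2 = a2 + a4, b4 = a2 - a4, b6 = a0 - a6;

    /* Odd part. */
    int a1 = -d[3] + d[5] - d[7] - (d[7] >> 1);
    int a3 =  d[1] + d[7] - d[3] - (d[3] >> 1);
    int a5 = -d[1] + d[7] + d[5] + (d[5] >> 1);
    int a7 =  d[3] + d[5] + d[1] + (d[1] >> 1);
    int b1 = a1 + (a7 >> 2), b3 = a3 + (a5 >> 2);
    int b5 = (a3 >> 2) - a5, b7 = a7 - (a1 >> 2);

    /* Final butterfly. */
    in[0 * step] = b0 + b7;
    in[1 * step] = b2 + b5;
    in[2 * step] = b4 + b3;
    in[3 * step] = b6 + b1;
    in[4 * step] = b6 - b1;
    in[5 * step] = b4 - b3;
    in[6 * step] = b2 - b5;
    in[7 * step] = b0 - b7;
}

/* Hypothetical scalar reference: inverse-transform the 8x8 coefficients,
 * round with (x + 32) >> 6, and add the residual to dst with clipping. */
static void idct8_add_ref(uint8_t *dst, int stride, const int16_t coef[64])
{
    int blk[64];

    assert(stride >= 8);
    for (int i = 0; i < 64; i++)
        blk[i] = coef[i];
    for (int r = 0; r < 8; r++)
        idct8_1d(&blk[r * 8], 1);   /* horizontal pass on each row */
    for (int c = 0; c < 8; c++)
        idct8_1d(&blk[c], 8);       /* vertical pass on each column */
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] =
                clip_uint8(dst[y * stride + x] + ((blk[y * 8 + x] + 32) >> 6));
}
```

Running the Altivec routine and this reference on the same random coefficient blocks and comparing dst byte for byte is one simple way to catch mismatches.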
This patch isn't meant to be merged as it is now: in addition to
adding the idct8 routine, it moves the TRANSPOSE8 macro to dsputil_altivec.h
(this macro is already duplicated in vc1dsp_altivec.c), and it also
carries some macros that are useful in Altivec
programming. They are taken from the x264 project, and I have permission
from the author to relicense them under the LGPL.
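TRANSPOSE8-style macros are typically built from three rounds of vec_mergeh/vec_mergel on eight vectors of eight 16-bit elements. The sketch below assumes that pattern (it is not copied from dsputil_altivec.h or vc1dsp_altivec.c) and emulates the two merge instructions in portable C, so the network can be checked on any machine. v8s16, mergeh, mergel, merge_round and transpose8 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a vector of eight signed 16-bit elements. */
typedef struct { int16_t e[8]; } v8s16;

/* Emulates vec_mergeh: interleave the first halves, {a0,b0,a1,b1,...,a3,b3}. */
static v8s16 mergeh(v8s16 a, v8s16 b)
{
    v8s16 r;
    for (int i = 0; i < 4; i++) {
        r.e[2 * i]     = a.e[i];
        r.e[2 * i + 1] = b.e[i];
    }
    return r;
}

/* Emulates vec_mergel: interleave the second halves, {a4,b4,...,a7,b7}. */
static v8s16 mergel(v8s16 a, v8s16 b)
{
    v8s16 r;
    for (int i = 0; i < 4; i++) {
        r.e[2 * i]     = a.e[i + 4];
        r.e[2 * i + 1] = b.e[i + 4];
    }
    return r;
}

/* One round of the network: vector i is merged with vector i + 4. */
static void merge_round(const v8s16 in[8], v8s16 out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = mergeh(in[i], in[i + 4]);
        out[2 * i + 1] = mergel(in[i], in[i + 4]);
    }
}

/* Three identical rounds transpose an 8x8 block stored as 8 row vectors. */
static void transpose8(v8s16 v[8])
{
    v8s16 t1[8], t2[8];
    merge_round(v, t1);
    merge_round(t1, t2);
    merge_round(t2, v);
}
```

Because each round uses the same pairing, the macro boils down to the same eight merges written out three times, which is also why it is easy to share between the H.264 and VC-1 code.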
One more thing: if the dst array is 8- or 16-byte aligned, it should be
possible to make the routine even faster. Unfortunately, I haven't managed
to write an implementation that works.
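Since vec_ld and vec_st only ever access 16-byte aligned quadwords, the difference between the two aligned cases is which half of the quadword an 8-pixel row of dst lands in. A minimal sketch of a hypothetical dispatch helper follows; classify_dst, all_rows_same_case and the dst_case values are illustrative names, not part of the patch. The case is a property of each row, so it also depends on the stride.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of an 8-pixel row by where it sits inside
 * the 16-byte aligned quadword that vec_ld/vec_st would access. */
enum dst_case {
    DST_CASE_A,       /* 16-byte aligned: row is bytes 0..7 of the quadword */
    DST_CASE_B,       /* 8-byte aligned only: row is bytes 8..15            */
    DST_CASE_GENERIC  /* anything else: needs a fully unaligned path        */
};

static enum dst_case classify_dst(const uint8_t *dst)
{
    uintptr_t a = (uintptr_t)dst;
    if ((a & 15) == 0)
        return DST_CASE_A;
    if ((a & 7) == 0)
        return DST_CASE_B;
    return DST_CASE_GENERIC;
}

/* Returns 1 if all 8 rows of the block fall in the same case as row 0.
 * With a stride that is a multiple of 16 they always do; with a stride of
 * 8 mod 16 the rows alternate between case A and case B. */
static int all_rows_same_case(const uint8_t *dst, int stride)
{
    for (int y = 1; y < 8; y++)
        if (classify_dst(dst + y * stride) != classify_dst(dst))
            return 0;
    return 1;
}
```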
I've left in the optimized routines ALTIVEC_STORE_SUM_CLIP_ALIGN8_A (for a
16-byte aligned *dst) and ALTIVEC_STORE_SUM_CLIP_ALIGN8_B (for an 8-byte
aligned, but _not_ 16-byte aligned, *dst) so people can have a look at them
and hopefully find what is wrong.
As far as I can see, ALTIVEC_STORE_SUM_CLIP_ALIGN8_A works as expected,
but ALTIVEC_STORE_SUM_CLIP_ALIGN8_B doesn't (which is really surprising,
considering how similar they are).
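One possible place to look in ALTIVEC_STORE_SUM_CLIP_ALIGN8_B is which half of the quadword it writes. The portable sketch below emulates the read-modify-write that an 8-byte store needs when only whole-quadword vec_ld/vec_st are available. The byte loop stands in for whatever vec_perm/vec_sel sequence the real macros use, and qbase and store_row_via_quadword are hypothetical names. It is offered as a reference to compare against, not as a diagnosis of the bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Start of the 16-byte aligned quadword containing p: this is the address
 * vec_ld/vec_st actually access, since they ignore the low 4 address bits. */
static uint8_t *qbase(uint8_t *p)
{
    return p - ((uintptr_t)p & 15);
}

/* Hypothetical scalar emulation of storing one 8-pixel row of
 * clip(dst + res) using whole-quadword loads and stores only. */
static void store_row_via_quadword(uint8_t *dst, const int16_t res[8])
{
    uint8_t *base = qbase(dst);
    size_t off = (size_t)(dst - base);  /* 0: 16-byte aligned, 8: 8-byte only */
    uint8_t q[16];

    assert(off == 0 || off == 8);
    memcpy(q, base, 16);                /* stands in for vec_ld(0, dst) */

    /* The row occupies q[off] .. q[off + 7]. For off == 8 that is the
     * second half of the quadword, so any permute or select mask must
     * target bytes 8..15 and keep bytes 0..7 as loaded; a mask built for
     * the off == 0 case would modify the wrong half. */
    for (int i = 0; i < 8; i++) {
        int v = q[off + i] + res[i];
        q[off + i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }

    memcpy(base, q, 16);                /* stands in for vec_st(q, 0, dst) */
}
```

Comparing the bytes around dst (the untouched half of each quadword included) after the real _B macro and after this emulation should show whether the wrong half is being written or the neighbouring bytes are being clobbered.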
Anyway, comments and tests welcome.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 13964 bytes
Desc: not available