[Ffmpeg-devel] [PATCH] idct8 in Altivec for H.264 decoding
Guillaume Poirier
gpoirier
Mon Oct 9 11:41:26 CEST 2006
Hi,
Luca Barbato wrote:
> Guillaume POIRIER wrote:
>
>
>>+
>>+/***********************************************************************
>>+ * Vector types
>>+ **********************************************************************/
>>+#define vec_u8_t vector unsigned char
>>+#define vec_s8_t vector signed char
>>+#define vec_u16_t vector unsigned short
>>+#define vec_s16_t vector signed short
>>+#define vec_u32_t vector unsigned int
>>+#define vec_s32_t vector signed int
>>+
>>+/***********************************************************************
>>+ * Null vector
>>+ **********************************************************************/
>>+#define LOAD_ZERO const vec_u8_t zerov = vec_splat_u8( 0 )
>>+
>>+#define zero_u8v (vec_u8_t) zerov
>>+#define zero_s8v (vec_s8_t) zerov
>>+#define zero_u16v (vec_u16_t) zerov
>>+#define zero_s16v (vec_s16_t) zerov
>>+#define zero_u32v (vec_u32_t) zerov
>>+#define zero_s32v (vec_s32_t) zerov
>
>
> move them in a types_altivec.h
ok.
>
>
>>+
>>+/***********************************************************************
>>+* VEC_DIFF_H_8BYTE_ALIGNED
>>+***********************************************************************
>>+* p1, p2: u8 *
>>+* i1, i2, n: int
>>+* d: s16v
>>+*
>>+* Loads n bytes from p1 and p2, does the diff of the high elements into
>>+* d, increments p1 and p2 by i1 and i2
>>+* Slightly faster when we know we are loading/diffing 8 bytes which
>>+* are 8-byte aligned. Reduces need for two loads and two vec_lvsl()'s
>>+**********************************************************************/
>>+#define PREP_DIFF_8BYTEALIGNED \
>>+LOAD_ZERO; \
>>+vec_s16_t pix1v, pix2v; \
>>+vec_u8_t permPix1, permPix2; \
>>+permPix1 = vec_lvsl(0, pix1); \
>>+permPix2 = vec_lvsl(0, pix2); \
>>+
>>+#define VEC_DIFF_H_8BYTE_ALIGNED(p1,i1,p2,i2,n,d) \
>>+pix1v = vec_perm(vec_ld(0,p1), zero_u8v, permPix1); \
>>+pix2v = vec_perm(vec_ld(0, p2), zero_u8v, permPix2); \
>>+pix1v = vec_u8_to_s16( pix1v ); \
>
>
> missing macro?
Whoops, good catch.
It may be simpler to just add all the macros from x264 to FFmpeg.
They are quite useful IMHO.
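To make the intent of VEC_DIFF_H_8BYTE_ALIGNED concrete, and to show what the missing vec_u8_to_s16 has to do, here is a portable scalar model of one step of the macro. It is only a sketch: the emu_* helpers and diff_h_8byte_aligned are hypothetical names that emulate vec_ld, vec_lvsl, vec_perm and vec_mergeh on plain byte arrays in big-endian element order, and vec_u8_to_s16 is assumed here to zero-extend the eight high bytes via vec_mergeh(zero_u8v, v). The pointer increments by i1/i2 are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit AltiVec register, modeled as 16 bytes in big-endian
 * element order (element 0 comes from the lowest address). */
typedef struct { uint8_t b[16]; } v128;

/* vec_ld(0, p): ignores the low 4 address bits and loads the
 * 16-byte-aligned block that contains p. */
static v128 emu_vec_ld(const uint8_t *p)
{
    v128 r;
    memcpy(r.b, (const uint8_t *)((uintptr_t)p & ~(uintptr_t)15), 16);
    return r;
}

/* vec_lvsl(0, p): permute control {s, s+1, ..., s+15}, s = p & 15. */
static v128 emu_vec_lvsl(const uint8_t *p)
{
    v128 r;
    unsigned s = (unsigned)((uintptr_t)p & 15);
    for (unsigned i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(s + i);
    return r;
}

/* vec_perm(a, b, c): byte i of the result is byte (c[i] & 31)
 * of the 32-byte concatenation a||b. */
static v128 emu_vec_perm(v128 a, v128 b, v128 c)
{
    v128 r;
    for (int i = 0; i < 16; i++) {
        unsigned k = c.b[i] & 31u;
        r.b[i] = k < 16 ? a.b[k] : b.b[k - 16];
    }
    return r;
}

/* vec_mergeh(a, b): interleave the 8 high (first) elements of a and b. */
static v128 emu_vec_mergeh(v128 a, v128 b)
{
    v128 r;
    for (int i = 0; i < 8; i++) {
        r.b[2 * i] = a.b[i];
        r.b[2 * i + 1] = b.b[i];
    }
    return r;
}

/* Assumed behaviour of vec_u8_to_s16: merge a zero byte above each of
 * the 8 high bytes, then read the result as big-endian 16-bit lanes,
 * which zero-extends every byte. */
static void emu_u8_to_s16_h(v128 v, int16_t out[8])
{
    v128 zero = {{0}};
    v128 m = emu_vec_mergeh(zero, v);
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)((m.b[2 * i] << 8) | m.b[2 * i + 1]);
}

/* Hypothetical scalar model of one VEC_DIFF_H_8BYTE_ALIGNED step.
 * Because p1 and p2 are 8-byte aligned, the 8 wanted bytes never
 * straddle a 16-byte boundary, so one load per pointer is enough;
 * bytes pulled from the zero vector land only in the low half,
 * which is discarded. */
static void diff_h_8byte_aligned(const uint8_t *p1, const uint8_t *p2,
                                 int16_t d[8])
{
    assert(((uintptr_t)p1 & 7) == 0 && ((uintptr_t)p2 & 7) == 0);
    v128 zero = {{0}};
    int16_t a[8], b[8];
    emu_u8_to_s16_h(emu_vec_perm(emu_vec_ld(p1), zero, emu_vec_lvsl(p1)), a);
    emu_u8_to_s16_h(emu_vec_perm(emu_vec_ld(p2), zero, emu_vec_lvsl(p2)), b);
    for (int i = 0; i < 8; i++)
        d[i] = (int16_t)(a[i] - b[i]);
}
```

The model also shows why the macro needs only one permute vector per pointer: as long as the stride keeps the pointer at the same offset modulo 16, the vec_lvsl() result computed once in PREP_DIFF_8BYTEALIGNED stays valid.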
>>+#define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A
>>+#define ALTIVEC_STORE_SUM_CLIP_ALIGN8_B
>
>
> if it is 8-byte aligned, you have to pick either the high part or the low
> part of it: B should take the low part, while A takes the high part.
Isn't that what they already do?
You can check that by replacing, in the code,
ALTIVEC_STORE_SUM_CLIP(&dst[0*stride], idct0, perm_ldv, perm_stv, sel);
with
ALTIVEC_STORE_SUM_CLIP_ALIGN8_A(&dst[0*stride], idct0);
(or the _B variant).
BTW, the names of these variants suck; they should be named HIGH/LOW
instead of A/B.
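For reference, here is a scalar model of what the two 8-byte-aligned variants are expected to do, using the HIGH/LOW naming. It is a sketch under stated assumptions, not the patch's code: store_sum_clip_align8_high/_low and store_sum_clip_half are hypothetical names, and the model assumes big-endian element order, so the "high" half (elements 0-7) sits at the lower 8 addresses of the 16-byte block. A 16-byte-aligned dst therefore needs the high half, and a dst at offset 8 needs the low half.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the 0..255 pixel range, as vec_packsu does per lane. */
static uint8_t clip_u8(int x)
{
    return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x;
}

/* Read the whole aligned 16-byte block (like vec_ld), add the residual
 * to the 8 pixels starting at half_offset (0 = high, 8 = low), clip,
 * and write the whole block back (like vec_st). The other half is
 * written back unchanged. */
static void store_sum_clip_half(uint8_t block[16], const int16_t idct[8],
                                int half_offset)
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = block[i];
    for (int i = 0; i < 8; i++)
        tmp[half_offset + i] = clip_u8(block[half_offset + i] + idct[i]);
    for (int i = 0; i < 16; i++)
        block[i] = tmp[i];
}

/* Hypothetical HIGH variant (currently _A): dst is 16-byte aligned. */
static void store_sum_clip_align8_high(uint8_t *dst, const int16_t idct[8])
{
    assert(((uintptr_t)dst & 15) == 0);
    store_sum_clip_half(dst, idct, 0);
}

/* Hypothetical LOW variant (currently _B): dst is at offset 8 inside
 * its 16-byte block, so the aligned block starts at dst - 8. */
static void store_sum_clip_align8_low(uint8_t *dst, const int16_t idct[8])
{
    assert(((uintptr_t)dst & 15) == 8);
    store_sum_clip_half(dst - 8, idct, 8);
}
```

Since the choice depends only on bit 3 of the destination address, a caller could pick the variant with (dst & 8); in the idct8 case, where dst and stride are typically known, the choice can instead be made once outside the row loop.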
Guillaume