[Ffmpeg-devel] [PATCH] idct8 in Altivec for H.264 decoding

Guillaume Poirier gpoirier
Mon Oct 9 11:41:26 CEST 2006


Hi,

Luca Barbato wrote:
> Guillaume POIRIER wrote:
> 
> 
>>+
>>+/***********************************************************************
>>+ * Vector types
>>+ **********************************************************************/
>>+#define vec_u8_t  vector unsigned char
>>+#define vec_s8_t  vector signed char
>>+#define vec_u16_t vector unsigned short
>>+#define vec_s16_t vector signed short
>>+#define vec_u32_t vector unsigned int
>>+#define vec_s32_t vector signed int
>>+
>>+/***********************************************************************
>>+ * Null vector
>>+ **********************************************************************/
>>+#define LOAD_ZERO const vec_u8_t zerov = vec_splat_u8( 0 )
>>+
>>+#define zero_u8v  (vec_u8_t)  zerov
>>+#define zero_s8v  (vec_s8_t)  zerov
>>+#define zero_u16v (vec_u16_t) zerov
>>+#define zero_s16v (vec_s16_t) zerov
>>+#define zero_u32v (vec_u32_t) zerov
>>+#define zero_s32v (vec_s32_t) zerov
> 
> 
> move them into a types_altivec.h

ok.


> 
> 
>>+
>>+/***********************************************************************
>>+* VEC_DIFF_H_8BYTE_ALIGNED
>>+***********************************************************************
>>+* p1, p2:    u8 *
>>+* i1, i2, n: int
>>+* d:         s16v
>>+*
>>+* Loads n bytes from p1 and p2, do the diff of the high elements into
>>+* d, increments p1 and p2 by i1 and i2
>>+* Slightly faster when we know we are loading/diffing 8bytes which
>>+* are 8 byte aligned. Reduces need for two loads and two vec_lvsl()'s
>>+**********************************************************************/
>>+#define PREP_DIFF_8BYTEALIGNED \
>>+LOAD_ZERO;                     \
>>+vec_s16_t pix1v, pix2v;        \
>>+vec_u8_t permPix1, permPix2;   \
>>+permPix1 = vec_lvsl(0, pix1);  \
>>+permPix2 = vec_lvsl(0, pix2);  \
>>+
>>+#define VEC_DIFF_H_8BYTE_ALIGNED(p1,i1,p2,i2,n,d)    \
>>+pix1v = vec_perm(vec_ld(0,p1), zero_u8v, permPix1);  \
>>+pix2v = vec_perm(vec_ld(0, p2), zero_u8v, permPix2); \
>>+pix1v = vec_u8_to_s16( pix1v );                      \
> 
> 
> missing macro?

Whoops, good catch.

It may be simpler to just import all the macros from x264 into ffmpeg.
They are quite useful IMHO.
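
For anyone reading along without the x264 tree at hand: as far as I recall, the
missing vec_u8_to_s16() there widens the high eight bytes of a vector to eight
signed 16-bit elements by merging them with a zero vector. The plain C sketch
below is only a scalar model of that behaviour, not the AltiVec macro itself;
the function name u8_to_s16_high is made up for illustration.

```c
#include <stdint.h>

/* Scalar model (an assumption, not the real macro): take the first
 * ("high") 8 bytes of a 16-byte vector and zero-extend each one into
 * a signed 16-bit element. */
static void u8_to_s16_high(const uint8_t src[16], int16_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)src[i]; /* 0..255 always fits, stays non-negative */
}
```

The widening is needed so that the later subtraction of two pixel vectors can
produce negative differences without wrapping.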



>>+#define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A
>>+#define ALTIVEC_STORE_SUM_CLIP_ALIGN8_B
> 
> 
> if it is 8-byte aligned you have to pick the high part or the low part of
> it; B should take the low part, while A takes the high part.

Isn't that what they already do?

You can check that by replacing, in the code,
ALTIVEC_STORE_SUM_CLIP(&dst[0*stride], idct0, perm_ldv, perm_stv, sel);
with
ALTIVEC_STORE_SUM_CLIP_ALIGN8_A/B(&dst[0*stride], idct0);

BTW, the names of these variants suck; they should be named HIGH/LOW
instead of A/B.
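
To make the high/low distinction concrete, here is a rough scalar model in
plain C. It is only a sketch under assumptions: the names (clip_u8,
store_sum_clip_half, HALF_HIGH, HALF_LOW) are hypothetical, the residuals are
assumed to be already scaled, and the real macros work on AltiVec vectors
rather than byte loops. The point is just that an 8-byte-aligned destination
occupies either the first or the second half of a 16-byte line, and only that
half may be modified.

```c
#include <stdint.h>

/* Clamp a sum to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Byte offset of the half to update inside a 16-byte line. */
enum half { HALF_HIGH = 0, HALF_LOW = 8 };

/* Hypothetical scalar model: add 8 residuals to the chosen half of a
 * 16-byte line, clip to u8, and leave the other half untouched. */
static void store_sum_clip_half(uint8_t line[16], const int16_t res[8],
                                enum half h)
{
    for (int i = 0; i < 8; i++)
        line[h + i] = clip_u8(line[h + i] + res[i]);
}
```

With (addr & 15) == 0 the destination is the high half, with (addr & 15) == 8
it is the low half, which is the choice the two variants encode.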

Guillaume



