[Ffmpeg-devel] [PATCH] Some MMX optimizations for Chinese AVS

Michael Niedermayer michaelni
Fri Jul 28 21:42:33 CEST 2006


Hi

On Fri, Jul 28, 2006 at 07:57:41PM +0200, Stefan Gehrer wrote:
[...]

> @@ -2779,6 +2795,8 @@
>                      c->idct_permutation_type= FF_PARTTRANS_IDCT_PERM;
>                  }
>  #endif
> +            }else if(idct_algo==FF_IDCT_H264){
> +                    c->idct_permutation_type= FF_TRANSPOSE_IDCT_PERM;

cavs idct != h.264 idct IIRC


[...]
> +static const uint64_t ff_pw_4  __attribute__ ((aligned(8))) = 0x0004000400040004ULL;
> +static const uint64_t ff_pw_5  __attribute__ ((aligned(8))) = 0x0005000500050005ULL;
> +static const uint64_t ff_pw_7  __attribute__ ((aligned(8))) = 0x0007000700070007ULL;
> +static const uint64_t ff_pw_42 __attribute__ ((aligned(8))) = 0x002A002A002A002AULL;
> +static const uint64_t ff_pw_64 __attribute__ ((aligned(8))) = 0x0040004000400040ULL;
> +static const uint64_t ff_pw_96 __attribute__ ((aligned(8))) = 0x0060006000600060ULL;

DECLARE_ALIGNED_8 should be used here
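
A minimal sketch of what that could look like for the constants quoted above. The macro definition here is a local stand-in written for illustration (assumed to take a type and a name and append the GCC aligned attribute); the real DECLARE_ALIGNED_8 comes from the FFmpeg headers and may be defined differently. is_aligned_8() is a hypothetical helper, not part of the patch.

```c
#include <stdint.h>

/* Local stand-in for illustration only. FFmpeg ships its own
 * DECLARE_ALIGNED_8; this assumed form takes (type, name). */
#ifndef DECLARE_ALIGNED_8
#define DECLARE_ALIGNED_8(t, v) t v __attribute__ ((aligned (8)))
#endif

/* Same constants as in the quoted hunk, declared through the macro
 * instead of a hand-written attribute on every line. */
static DECLARE_ALIGNED_8(const uint64_t, ff_pw_4)  = 0x0004000400040004ULL;
static DECLARE_ALIGNED_8(const uint64_t, ff_pw_5)  = 0x0005000500050005ULL;
static DECLARE_ALIGNED_8(const uint64_t, ff_pw_7)  = 0x0007000700070007ULL;
static DECLARE_ALIGNED_8(const uint64_t, ff_pw_42) = 0x002A002A002A002AULL;
static DECLARE_ALIGNED_8(const uint64_t, ff_pw_64) = 0x0040004000400040ULL;
static DECLARE_ALIGNED_8(const uint64_t, ff_pw_96) = 0x0060006000600060ULL;

/* Hypothetical helper: nonzero if p sits on an 8-byte boundary. */
static int is_aligned_8(const void *p)
{
    return ((uintptr_t)p & 7) == 0;
}
```

Keeping the attribute behind one macro means a compiler with a different alignment syntax only needs the macro changed, not every declaration.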


[...]
> +        "psllw  $1,    %%mm4  \n\t" /* mm4 = 2*src7 */
> +        "psllw  $1,    %%mm3  \n\t" /* mm3 = 2*src1 */
> +        "psllw  $1,    %%mm6  \n\t" /* mm6 = 2*src5 */
> +        "psllw  $1,    %%mm1  \n\t" /* mm1 = 2*src3 */

i think paddw is faster than psllw $1 on some cpus and equally fast on the
rest
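
In the asm that would mean e.g. "paddw %%mm4, %%mm4" in place of "psllw $1, %%mm4". The sketch below is a portable scalar model, not MMX code, showing only that the two give the same result in every 16-bit lane, since both wrap modulo 2^16. The function names are made up for illustration, and it says nothing about speed.

```c
#include <stdint.h>

/* Scalar model of "psllw $1, reg": shift each of the four 16-bit
 * lanes left by one, discarding the bit shifted out of the lane. */
static uint64_t psllw1_model(uint64_t v)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t w = (uint16_t)(v >> (16 * lane));
        w = (uint16_t)(w << 1);
        r |= (uint64_t)w << (16 * lane);
    }
    return r;
}

/* Scalar model of "paddw reg, reg": add each 16-bit lane to itself
 * with wrap-around, i.e. no carry into the neighbouring lane. */
static uint64_t paddw_self_model(uint64_t v)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t w = (uint16_t)(v >> (16 * lane));
        w = (uint16_t)(w + w);
        r |= (uint64_t)w << (16 * lane);
    }
    return r;
}
```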


[...]
> +static void cavs_idct8_add_mmx(uint8_t *dst, int16_t *block, int stride)
> +{
> +    int i;
> +    int16_t __attribute__ ((aligned(8))) b2[64];
> +
> +    for(i=0; i<2; i++){
> +        uint64_t tmp;

tmp should be aligned as well
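
A small sketch of the local declarations with both objects aligned, using the same GCC attribute spelling the patch already uses for b2. The function name is hypothetical and the body only checks the addresses. On 32-bit x86 a plain uint64_t local may be guaranteed only 4-byte alignment by the ABI, which is presumably why it matters for MMX loads and stores here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the start of cavs_idct8_add_mmx():
 * only the declarations are of interest. Returns nonzero if both
 * locals ended up on an 8-byte boundary. */
static int locals_are_aligned_8(void)
{
    int16_t  __attribute__ ((aligned(8))) b2[64];
    uint64_t __attribute__ ((aligned(8))) tmp;

    /* Touch both objects so they are really placed in memory. */
    b2[0] = 0;
    tmp   = 0;

    return (((uintptr_t)b2 | (uintptr_t)&tmp) & 7) == 0;
}
```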

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is



