[FFmpeg-devel] 4xm idct computation

yann.lepetitcorps at free.fr yann.lepetitcorps at free.fr
Thu Dec 29 02:42:28 CET 2011


> > Perhaps the y*stride computation could be factored out in
> > ff_snow_pred_block(), since it is very redundant?
> > (The same goes for y*src_stride in ff_snow_inner_add_yblock().)
>
>
> Written like this, I now see that the color4 copy can be done in blocks :)
>
>
> diff --git a/libavcodec/snow.c b/libavcodec/snow.c
> index 0ce9b28..432d1d4 100644
> --- a/libavcodec/snow.c
> +++ b/libavcodec/snow.c
> @@ -288,32 +288,33 @@ static void mc_block(Plane *p, uint8_t *dst, const uint8_t *src, int stride, int
>  }
>
>  void ff_snow_pred_block(SnowContext *s, uint8_t *dst, uint8_t *tmp, int stride, int sx, int sy, int b_w, int b_h, BlockNode *block, int plane_index, int w, int h){
> +
>      if(block->type & BLOCK_INTRA){
>          int x, y;
>          const unsigned color  = block->color[plane_index];
>          const unsigned color4 = color*0x01010101;
>          if(b_w==32){
> -            for(y=0; y < b_h; y++){
> -                *(uint32_t*)&dst[0 + y*stride]= color4;
> -                *(uint32_t*)&dst[4 + y*stride]= color4;
> -                *(uint32_t*)&dst[8 + y*stride]= color4;
> -                *(uint32_t*)&dst[12+ y*stride]= color4;
> -                *(uint32_t*)&dst[16+ y*stride]= color4;
> -                *(uint32_t*)&dst[20+ y*stride]= color4;
> -                *(uint32_t*)&dst[24+ y*stride]= color4;
> -                *(uint32_t*)&dst[28+ y*stride]= color4;
> +            for(y=0; y < b_h; y++, dst += stride){
> +                *(uint32_t*)&dst[0]= color4;
> +                *(uint32_t*)&dst[4]= color4;
> +                *(uint32_t*)&dst[8]= color4;
> +                *(uint32_t*)&dst[12]= color4;
> +                *(uint32_t*)&dst[16]= color4;
> +                *(uint32_t*)&dst[20]= color4;
> +                *(uint32_t*)&dst[24]= color4;
> +                *(uint32_t*)&dst[28]= color4;
>              }
>          }else if(b_w==16){
> -            for(y=0; y < b_h; y++){
> -                *(uint32_t*)&dst[0 + y*stride]= color4;
> -                *(uint32_t*)&dst[4 + y*stride]= color4;
> -                *(uint32_t*)&dst[8 + y*stride]= color4;
> -                *(uint32_t*)&dst[12+ y*stride]= color4;
> +            for(y=0; y < b_h; y++, dst += stride){
> +                *(uint32_t*)&dst[0]= color4;
> +                *(uint32_t*)&dst[4]= color4;
> +                *(uint32_t*)&dst[8]= color4;
> +                *(uint32_t*)&dst[12]= color4;
>              }
>          }else if(b_w==8){
> -            for(y=0; y < b_h; y++){
> -                *(uint32_t*)&dst[0 + y*stride]= color4;
> -                *(uint32_t*)&dst[4 + y*stride]= color4;
> +            for(y=0; y < b_h; y++, dst += stride){
> +                *(uint32_t*)&dst[0]= color4;
> +                *(uint32_t*)&dst[4]= color4;
>              }
>          }else if(b_w==4){
>              for(y=0; y < b_h; y++){
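
For reference, the row-pointer form of the fixed-width fill in the quoted diff can be sketched as a standalone function. This is only an illustration: the name fill_rows_u32 is hypothetical and not part of snow.c, b_w is assumed to be a multiple of 4, and memcpy stands in for the *(uint32_t*) casts so that the sketch does not depend on the alignment of dst.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of snow.c: fill a b_w x b_h block with one
 * byte value, writing 4 bytes at a time. b_w is assumed to be a multiple of 4. */
static void fill_rows_u32(uint8_t *dst, int stride, int b_w, int b_h, unsigned color)
{
    /* Replicate the low byte into all four bytes, as color4 does in the diff. */
    const uint32_t color4 = (color & 0xFF) * 0x01010101u;
    int x, y;

    /* Advance dst by one row per iteration instead of computing y*stride. */
    for (y = 0; y < b_h; y++, dst += stride)
        for (x = 0; x < b_w; x += 4)
            memcpy(dst + x, &color4, 4); /* alignment-safe 4-byte store */
}
```

The inner loop replaces the hand-unrolled stores of the diff. Whether that is as fast as the unrolled version depends on the compiler and would need to be measured.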

We can also add this small optimisation:

@@ -321,8 +322,9 @@ void ff_snow_pred_block(SnowContext *s, uint8_t *dst, uint8_t *tmp, int stride,
             }
         }else{
             for(y=0; y < b_h; y++){
+                const int ystride = y * stride;
                 for(x=0; x < b_w; x++){
-                    dst[x + y*stride]= color;
+                    dst[x + ystride]= color;
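
Outside the diff, the same idea can be sketched as a self-contained function. The name fill_block_generic is hypothetical and only mirrors the generic (any b_w) branch shown above. The row offset is invariant in the inner loop, so it is computed once per row.

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the generic per-byte fill loop;
 * not the actual snow.c code. */
static void fill_block_generic(uint8_t *dst, int stride, int b_w, int b_h, uint8_t color)
{
    int x, y;

    for (y = 0; y < b_h; y++) {
        /* Invariant for the inner loop: compute the row offset once. */
        const int ystride = y * stride;

        for (x = 0; x < b_w; x++)
            dst[x + ystride] = color;
    }
}
```

Optimising compilers usually hoist such loop-invariant expressions by themselves, so the gain from doing it by hand may be small and would need to be measured.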

But I now see that only the BLOCK_INTRA case has been modified ...

=> and the other blocks, which use mc_block(), seem to be the more important
part to optimise :(

==> that is why it seemed too simple :)

See you,
Yannoo

