[FFmpeg-devel] snow.c optimisations

Reimar Döffinger Reimar.Doeffinger at gmx.de
Thu Dec 29 14:05:51 CET 2011



On 29 Dec 2011, at 04:07, yann.lepetitcorps at free.fr wrote:
> 
> diff --git a/libavcodec/snow.c b/libavcodec/snow.c
> index 0ce9b28..4aae985 100644
> --- a/libavcodec/snow.c
> +++ b/libavcodec/snow.c
> @@ -190,16 +190,26 @@ static void mc_block(Plane *p, uint8_t *dst, const uint8_t *src, int stride, int
>     tmp2= tmp2t[1];
> 
>     if(b&2){
> +
> +        int  s_1 = (HTAPS_MAX/2-4)*stride;
> +        int  s0 = (HTAPS_MAX/2-3)*stride;
> +        int  s1 = (HTAPS_MAX/2-2)*stride;
> +        int  s2 = (HTAPS_MAX/2-1)*stride;
> +        int  s3 = (HTAPS_MAX/2-0)*stride;
> +        int  s4 = (HTAPS_MAX/2+1)*stride;
> +        int  s5 = (HTAPS_MAX/2+2)*stride;
> +        int  s6 = (HTAPS_MAX/2+3)*stride;

That does not even remotely fit into the register set on x86, and since the multiplication is by a constant, it is probably significantly slower as well.
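
To make the register-pressure point concrete, here is a minimal sketch. It uses a hypothetical 4-tap vertical sum, not the actual snow.c filter, and the names vsum_offsets and vsum_pointer are made up for illustration. Variant A keeps one variable per row offset, as the patch does; variant B advances a single row pointer, so only the pointer and the stride need to stay live. Both return the same value. Which one is faster depends on the compiler and target and would have to be benchmarked.

```c
#include <stdint.h>

/* Hypothetical 4-tap vertical sum, NOT the snow.c filter.
 * It only illustrates two ways of addressing rows. */

/* Variant A: every row offset is held in its own variable, as in the patch. */
static int vsum_offsets(const uint8_t *src, int stride)
{
    int s0 = 0 * stride;
    int s1 = 1 * stride;
    int s2 = 2 * stride;
    int s3 = 3 * stride;

    return src[s0] + src[s1] + src[s2] + src[s3];
}

/* Variant B: a single row pointer is advanced by stride on each step,
 * so no per-row offset has to be kept around. */
static int vsum_pointer(const uint8_t *src, int stride)
{
    int sum = 0;
    int i;

    for (i = 0; i < 4; i++, src += stride)
        sum += *src;
    return sum;
}
```

With eight offsets instead of four, variant A needs eight extra live values on top of the pointers and accumulators, which is the concern on 32-bit x86.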

>         const unsigned color  = block->color[plane_index];
>         const unsigned color4 = color*0x01010101;
>         if(b_w==32){
> -            for(y=0; y < b_h; y++){
> -                *(uint32_t*)&dst[0 + y*stride]= color4;
> -                *(uint32_t*)&dst[4 + y*stride]= color4;
> -                *(uint32_t*)&dst[8 + y*stride]= color4;
> -                *(uint32_t*)&dst[12+ y*stride]= color4;
> -                *(uint32_t*)&dst[16+ y*stride]= color4;
> -                *(uint32_t*)&dst[20+ y*stride]= color4;
> -                *(uint32_t*)&dst[24+ y*stride]= color4;
> -                *(uint32_t*)&dst[28+ y*stride]= color4;
> +            for(y=0; y < b_h; y++, dst += stride){
> +                memset(dst,color4, 32);

Using color4 with memset makes for rather confusing code IMO, since memset only uses the low byte of its value argument.
Also, relying on the compiler to inline a suitably optimized variant of memset in performance-critical code is risky at best.
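
A small sketch of why color4 is confusing here: memset converts its value argument to unsigned char, so the replicated 32-bit word and the plain byte produce identical output. The helpers fill_block32 and memset_ignores_high_bytes are hypothetical names for illustration and are not part of snow.c. The sketch assumes color fits in 8 bits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from snow.c: fill b_h rows of 32 bytes each
 * with a single byte value. Passing the plain byte states the intent. */
static void fill_block32(uint8_t *dst, int stride, int b_h, unsigned color)
{
    int y;

    for (y = 0; y < b_h; y++, dst += stride)
        memset(dst, (int)(color & 0xFF), 32);
}

/* Returns 1 if memset with the replicated word writes the same bytes as
 * memset with the plain byte. memset converts its value to unsigned char,
 * so the upper three bytes of color4 are ignored. */
static int memset_ignores_high_bytes(unsigned color)
{
    uint8_t a[32], b[32];
    unsigned color4 = (color & 0xFF) * 0x01010101u;

    memset(a, (int)color4, sizeof(a));
    memset(b, (int)(color & 0xFF), sizeof(b));
    return memcmp(a, b, sizeof(a)) == 0;
}
```

So memset(dst, color, 32) would do the same thing and read more clearly. Whether the call is expanded inline or ends up as a library call per row is still up to the compiler.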

