[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER

Kostya kostya.shishkov
Sun Nov 12 05:54:06 CET 2006


On Fri, Nov 10, 2006 at 11:48:16PM +0100, skal wrote:
>    btw, while i have the mike:
> 
>    seems to me the following replacement functions for 
>    vc1_v_overlap_c() and vc1_h_overlap_c() in vc1dsp.c:31
>    are likely to be faster (and bitwise equivalent of course)
> 
> static void vc1_v_overlap_c(uint8_t* src, int stride, int rnd)
> {
>     int i;
>     for(i = 0; i < 8; i++) {
>         const int a = src[-2*stride];
>         const int b = src[-stride];
>         const int c = src[0];
>         const int d = src[stride];
>         const int d1 = ( a-d       + 3 + rnd ) >> 3;
>         const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
>         src[-2*stride] = clip_uint8(a-d1);
>         src[-stride]   = clip_uint8(b+d2);
>         src[0]         = clip_uint8(c-d2);
>         src[stride]    = clip_uint8(d+d1);
>         src++;
>     }
> }
> 
>    but i might of course be wrong...

They are almost correct (it should read 'b-d2' and 'c+d2' instead), except for the rounding:
original:
 4-rnd
 3+rnd
 4-rnd
 3+rnd
yours:
 -3-rnd
 -4-rnd
 4+rnd
 3+rnd
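
One way to check a rewrite like this, rounding included, is to compare it against the direct form over a grid of inputs. The sketch below is only an illustration, not the code in vc1dsp.c: overlap_direct, overlap_diff, clip_u8 and count_mismatches are hypothetical names, the direct-form coefficients are assumed from expanding the difference form, and its rounding constants are taken from the 'original' list above. It also assumes that >> on a negative int is an arithmetic shift, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Stand-in for the clip_uint8() used in the quoted code: clamp to 0..255. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Direct form (assumed shape): coefficients obtained by expanding the
 * difference form, rounding constants from the 'original' list. */
static void overlap_direct(const int in[4], int rnd, uint8_t out[4])
{
    const int a = in[0], b = in[1], c = in[2], d = in[3];
    out[0] = clip_u8(( 7*a             + d + 4 - rnd) >> 3);
    out[1] = clip_u8((  -a + 7*b +   c + d + 3 + rnd) >> 3);
    out[2] = clip_u8((   a +   b + 7*c - d + 4 - rnd) >> 3);
    out[3] = clip_u8((   a           + 7*d + 3 + rnd) >> 3);
}

/* Difference form from the quoted patch, with the b-d2 / c+d2 signs applied. */
static void overlap_diff(const int in[4], int rnd, uint8_t out[4])
{
    const int a = in[0], b = in[1], c = in[2], d = in[3];
    const int d1 = (a - d         + 3 + rnd) >> 3;
    const int d2 = (a - d + b - c + 4 - rnd) >> 3;
    out[0] = clip_u8(a - d1);
    out[1] = clip_u8(b - d2);
    out[2] = clip_u8(c + d2);
    out[3] = clip_u8(d + d1);
}

/* Count the sampled (a, b, c, d) tuples on which the two forms disagree.
 * The grid steps by 5 (0, 5, ..., 255) to keep the run short. */
static long count_mismatches(int rnd)
{
    long bad = 0;
    int in[4];
    uint8_t x[4], y[4];

    for (in[0] = 0; in[0] <= 255; in[0] += 5)
    for (in[1] = 0; in[1] <= 255; in[1] += 5)
    for (in[2] = 0; in[2] <= 255; in[2] += 5)
    for (in[3] = 0; in[3] <= 255; in[3] += 5) {
        overlap_direct(in, rnd, x);
        overlap_diff(in, rnd, y);
        for (int k = 0; k < 4; k++) {
            if (x[k] != y[k]) {
                bad++;   /* count each tuple at most once */
                break;
            }
        }
    }
    return bad;
}
```

count_mismatches(0) and count_mismatches(1) give the number of disagreeing tuples for each rnd value. The grid is only a sample; an exhaustive check needs a step of 1 in all four loops.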

> 
>    bye!
> 
> Skal
> 
> 