[FFmpeg-devel] [PATCH] Use av_clip_uint8 in swscale.

Sun Aug 16 16:46:33 CEST 2009

On Sun, Aug 16, 2009 at 01:19:39AM +0100, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> 
> > On Sat, Aug 15, 2009 at 05:53:49PM +0100, Måns Rullgård wrote:
> >> Reimar Döffinger <Reimar.Doeffinger at gmx.de> writes:
> >> 
> >> > On Sat, Aug 15, 2009 at 12:27:49PM -0300, Ramiro Polla wrote:
> >> >> diff --git a/swscale.c b/swscale.c
> >> >> index c513066..340acfc 100644
> >> >> --- a/swscale.c
> >> >> +++ b/swscale.c
> >> >> @@ -688,21 +688,12 @@ static inline void yuv2nv12XinC(const int16_t *lumFilter, const int16_t **lumSrc
> >> >>  
> >> >>  #define YSCALE_YUV_2_PACKEDX_C(type,alpha) \
> >> >>          YSCALE_YUV_2_PACKEDX_NOCLIP_C(type,alpha)\
> >> >> -        if ((Y1|Y2|U|V)&256)\
> >> >> -        {\
> >> >> -            if (Y1>255)   Y1=255; \
> >> >> -            else if (Y1<0)Y1=0;   \
> >> >> -            if (Y2>255)   Y2=255; \
> >> >> -            else if (Y2<0)Y2=0;   \
> >> >> -            if (U>255)    U=255;  \
> >> >> -            else if (U<0) U=0;    \
> >> >> -            if (V>255)    V=255;  \
> >> >> -            else if (V<0) V=0;    \
> >> >> -        }\
> >> >> -        if (alpha && ((A1|A2)&256)){\
> >> >> -            A1=av_clip_uint8(A1);\
> >> >> -            A2=av_clip_uint8(A2);\
> >> >> -        }
> >> >> +        Y1 = av_clip_uint8(Y1); \
> >> >> +        Y2 = av_clip_uint8(Y2); \
> >> >> +        U  = av_clip_uint8(U ); \
> >> >> +        V  = av_clip_uint8(V ); \
> >> >> +        A1 = av_clip_uint8(A1); \
> >> >> +        A2 = av_clip_uint8(A2); \
> >> >
> >> > This
> >> >
> >> >> -            if ((u|v)&256){
> >> >> -                if (u<0)        u=0;
> >> >> -                else if (u>255) u=255;
> >> >> -                if (v<0)        v=0;
> >> >> -                else if (v>255) v=255;
> >> >> -            }
> >> >> -
> >> >> -            uDest[i]= u;
> >> >> -            vDest[i]= v;
> >> >> +            uDest[i]= av_clip_uint8((chrSrc[i       ]+64)>>7);
> >> >> +            vDest[i]= av_clip_uint8((chrSrc[i + VOFW]+64)>>7);
> >> >
> >> > And this need to be benchmarked (well, or at least have a look at the
> >> > generated code).
> >> > If clipping is very, very rare the original code might be faster.
> >> 
> >> Depends on hardware.  On processors with fast clipping instructions,
> >> always clipping is likely to be faster.
> >
> > if they are fast enough, sure, but which cpu would that be?
> 
> ARM and AVR32 to name two.

I don't really know ARM & AVR32 asm ...
but I must admit that I am surprised that some CPU has clipping instructions
that match the throughput of a simple bitwise OR. I guess I should spend
more time with non-x86 asm.

> 
> > besides which compiler would turn the pure C av_clip_uint8 into such
> > instructions ?
> 
> We could write an asm version of it.

yes but that brings us back to the issue of cpu specific optimizations
in libavutil headers ...

Besides, we would need more than an optimized av_clip_uint8(), because on
x86 four ORs and one clip check are faster than four clipping checks.
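
To make the comparison concrete, here is a rough sketch of the two shapes
being discussed. clip_uint8() is a hypothetical stand-in for
av_clip_uint8(), not the libavutil source, and the clip4_* helpers are
made-up names for illustration. Note that the single bit-8 test is only
a valid guard if every input lies in [-256, 511]; that range is an
assumption here, not something the code checks.

```c
#include <stdint.h>

/* Hypothetical stand-in for av_clip_uint8(): clamp an int to 0..255. */
static inline uint8_t clip_uint8(int a)
{
    if (a < 0)   return 0;
    if (a > 255) return 255;
    return (uint8_t)a;
}

/* Variant A: clip every value unconditionally, as the patch does. */
static void clip4_always(int v[4])
{
    for (int i = 0; i < 4; i++)
        v[i] = clip_uint8(v[i]);
}

/* Variant B: OR the four values together and test bit 8 once.
 * Assumes each value is in [-256, 511], so that bit 8 is set exactly
 * when a value is outside 0..255. The clips run only in that case. */
static void clip4_guarded(int v[4])
{
    if ((v[0] | v[1] | v[2] | v[3]) & 256) {
        for (int i = 0; i < 4; i++)
            v[i] = clip_uint8(v[i]);
    }
}
```

Within that input range both variants give the same results; which one
is faster depends on how often clipping actually triggers and on the
target CPU, so it would still have to be benchmarked.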

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

> ... defining _GNU_SOURCE...
For the love of all that is holy, and some that is not, don't do that.
-- Luca & Mans