[FFmpeg-devel] Fwd: a64multienc.c and drawutils.c optimisations

yann.lepetitcorps at free.fr
Thu Dec 29 00:47:37 CET 2011


> > +    for(i=0;i<num;i++)
> > +        dst[i] = set16;
>
> If you don't trust the compiler, this variant should make it more
> explicit what you want the end-result to look like:
> int16_t *end = dst + num;
> while (dst < end)
>   *dst++ = set16;
> Disadvantage: compiler potentially will not recognize it as a loop
> and thus not do advanced optimizations like auto-vectorization etc.
> Of course depending on the specifics it might make a little to a lot
> more sense to unroll the loop.

Something like this?

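    #define LOOP_UNROLL_SIZE 8

    int16_t *end = dst + num;

    /* Unrolled part: store LOOP_UNROLL_SIZE values per iteration */
    while (num >= LOOP_UNROLL_SIZE)
    {
        dst[0] = set16;
        dst[1] = set16;
        dst[2] = set16;
        dst[3] = set16;
        dst[4] = set16;
        dst[5] = set16;
        dst[6] = set16;
        dst[7] = set16;   /* dst[LOOP_UNROLL_SIZE-1] */
        dst += LOOP_UNROLL_SIZE;
        num -= LOOP_UNROLL_SIZE;
    }

    /* Tail: the remaining num % LOOP_UNROLL_SIZE values */
    while (dst < end)
        *dst++ = set16;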


(The unrolled while loop could also use MMX/SSE registers to store the set16
values in blocks of four/eight.)
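Following up on that parenthetical, here is a minimal sketch of the SSE2 variant, assuming an x86 compiler that defines `__SSE2__` (GCC and Clang do on x86-64) and using a hypothetical helper name `fill16()`. One 128-bit register holds eight copies of `set16`, written with `_mm_storeu_si128()` because `dst` is not assumed to be 16-byte aligned. A 64-bit MMX register would hold only four values and would also require `emms` before later x87 floating-point code, so the sketch sticks to SSE2 and falls back to the plain scalar loop when SSE2 is unavailable.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper: fill num int16_t slots of dst with set16. */
static void fill16(int16_t *dst, int16_t set16, int num)
{
    int16_t *end = dst + num;
#ifdef __SSE2__
    /* One 128-bit SSE register holds eight copies of set16. */
    __m128i v = _mm_set1_epi16(set16);
    /* Unaligned stores: dst is not assumed to be 16-byte aligned. */
    while (end - dst >= 8) {
        _mm_storeu_si128((__m128i *)dst, v);
        dst += 8;
    }
#endif
    /* Scalar tail, or the whole fill when SSE2 is not available. */
    while (dst < end)
        *dst++ = set16;
}
```

For example, `fill16(buf, -3, 19)` writes two 8-value blocks through the SSE2 path and the remaining three values in the scalar tail.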

> > +void ff_memset_sized(char *dst, char *src, int num, int stepsize)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < num; i++, dst += stepsize)
> > +        memcpy(dst, src, stepsize);
> > +}
>
> Of course there's the question if one single macro (or av_always_inline
> function) with this content wouldn't serve the same purpose as all those
> different functions.
> For non-x86 alignment might be a bit of an issue though (as in, this
> variant doesn't tell the compiler that dst will always be aligned to
> stepsize).

Perhaps we could use some #defines to handle the problematic platforms?
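One possible reading of the single-function suggestion, combined with a #define hook for strict-alignment platforms, is sketched below. Assumptions: `ALWAYS_INLINE` is a hypothetical stand-in for FFmpeg's `av_always_inline`; `memset_sized_inline()`, `memset_sized_2()` and `memset_sized_4()` are hypothetical names; and `ASSUME_ALIGNED()` maps to the GCC/Clang builtin `__builtin_assume_aligned()`, passing the pointer through unchanged on other compilers. Because each wrapper passes a constant `stepsize`, the compiler sees a fixed-size `memcpy()` plus the alignment hint after inlining. The caller must really guarantee that `dst` is aligned to `stepsize`; otherwise the hint is undefined behavior.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for av_always_inline (GCC/Clang attribute). */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Hypothetical per-compiler hook: promise that p is aligned to n bytes
 * (n must be a power of two), so strict-alignment targets can emit
 * aligned stores. Elsewhere it just passes the pointer through. */
#if defined(__GNUC__)
#define ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#else
#define ASSUME_ALIGNED(p, n) (p)
#endif

/* Single generic body: copy the stepsize-byte pattern at src into
 * num consecutive slots of dst. */
static ALWAYS_INLINE void memset_sized_inline(uint8_t *dst, const uint8_t *src,
                                              int num, int stepsize)
{
    for (int i = 0; i < num; i++, dst += stepsize)
        memcpy(dst, src, stepsize);
}

/* Thin wrappers with a constant stepsize: after inlining, the memcpy()
 * has a known size and is typically lowered to a single store.
 * Callers must pass dst aligned to 2 (resp. 4) bytes. */
static void memset_sized_2(uint8_t *dst, const uint8_t *src, int num)
{
    memset_sized_inline(ASSUME_ALIGNED(dst, 2), src, num, 2);
}

static void memset_sized_4(uint8_t *dst, const uint8_t *src, int num)
{
    memset_sized_inline(ASSUME_ALIGNED(dst, 4), src, num, 4);
}
```

With this layout, each new pixel size costs one short wrapper rather than a separate copy of the loop, and only the `ASSUME_ALIGNED()` definition needs per-platform attention.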


See you,
Yannoo

