[FFmpeg-devel] [PATCH] Optimization of original IFF codec

Måns Rullgård mans
Mon Apr 26 19:50:28 CEST 2010


Sebastian Vater <cdgs.basty at googlemail.com> writes:

> Sebastian Vater wrote:
>> Hi Mans!
>>
>> I'll just commit a series of patches which should prepare the
>> decodeplane8/32 stuff.
>> The first one simply removes that #define hack and really makes two
>> functions out of them. Could you review it, please?
>>
>>   
> So here follows the decodeplane8 optimized stuff, please review that also...
>
> -- 
>
> Best regards,
>                    :-) Basty/CDGS (-:
>
> diff --git a/libavcodec/iff.c b/libavcodec/iff.c
> index b57c0a7..b0e2118 100644
> --- a/libavcodec/iff.c
> +++ b/libavcodec/iff.c
> @@ -94,15 +94,39 @@ static av_cold int decode_init(AVCodecContext *avctx)
>   * @param bps bits_per_coded_sample (must be <= 8)
>   * @param plane plane number to decode as
>   */
> -static void decodeplane8(uint8_t *dst, const uint8_t *const buf, int buf_size, int bps, int plane)
> +static inline void decodeplane8(uint8_t *dst,
> +                                const uint8_t *const buf,
> +                                const unsigned buf_size,
> +                                const unsigned bps,
> +                                const unsigned plane)
>  {
>      GetBitContext gb;
> -    int i, b;
> +    unsigned i;
> +    const unsigned b = (buf_size * 8) + bps - 1;
> +    const unsigned b32 = b & ~3;
> +    const uint32_t lut[] = {0x0000000,
> +                            0x1000000 << plane,
> +                            0x0010000 << plane,
> +                            0x1010000 << plane,
> +                            0x0000100 << plane,
> +                            0x1000100 << plane,
> +                            0x0010100 << plane,
> +                            0x1010100 << plane,
> +                            0x0000001 << plane,
> +                            0x1000001 << plane,
> +                            0x0010001 << plane,
> +                            0x1010001 << plane,
> +                            0x0000101 << plane,
> +                            0x1000101 << plane,
> +                            0x0010101 << plane,
> +                            0x1010101 << plane};
>      init_get_bits(&gb, buf, buf_size * 8);
> -    for(i = 0; i < (buf_size * 8 + bps - 1) / bps; i++) {
> -        for (b = 0; b < bps; b++) {
> -            dst[ i*bps + b ] |= get_bits1(&gb) << plane;
> -        }
> +    for(i = 0; i < b32; i += 4) {
> +        const uint32_t v = lut[get_bits(&gb, 4)];
> +        AV_WN32A(dst+i, AV_RN32A(dst+i) | v);
> +    }
> +    for(i = b32; i < b; i++) {
> +        dst[i] |= get_bits1(&gb) << plane;
>      }
>  }

This is inefficient.  You are building the table afresh on each call
to the function.  Make the table static const, dropping the shift, and
instead shift the table value inside the loop.
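
A minimal standalone sketch of that suggestion follows. It is not the
actual iff.c code: the names plane8_lut and decodeplane8_sketch are
hypothetical, it reads nibbles straight from the buffer instead of
going through GetBitContext, and it assumes one source bit per output
byte (so dst holds buf_size * 8 bytes). The table is stored as bytes so
the result does not depend on host endianness; each byte lane holds
only 0 or 1, so shifting the packed word by plane < 8 never carries
into the next lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Built once at compile time: entry n expands the 4 bits of n
 * (most significant bit first) into 4 bytes of 0 or 1. */
static const uint8_t plane8_lut[16][4] = {
    {0,0,0,0}, {0,0,0,1}, {0,0,1,0}, {0,0,1,1},
    {0,1,0,0}, {0,1,0,1}, {0,1,1,0}, {0,1,1,1},
    {1,0,0,0}, {1,0,0,1}, {1,0,1,0}, {1,0,1,1},
    {1,1,0,0}, {1,1,0,1}, {1,1,1,0}, {1,1,1,1},
};

/* OR one bitplane into dst. Assumption: dst has buf_size * 8 bytes. */
static void decodeplane8_sketch(uint8_t *dst, const uint8_t *buf,
                                size_t buf_size, unsigned plane)
{
    assert(plane < 8);
    for (size_t i = 0; i < buf_size; i++) {
        /* High nibble covers the first four pixels, low nibble the next four. */
        const unsigned nib[2] = { buf[i] >> 4, buf[i] & 0x0Fu };
        for (int h = 0; h < 2; h++) {
            uint8_t *p = dst + i * 8 + h * 4;
            uint32_t v, cur;
            memcpy(&v, plane8_lut[nib[h]], 4);  /* constant table value */
            memcpy(&cur, p, 4);
            cur |= v << plane;                  /* shift inside the loop */
            memcpy(p, &cur, 4);
        }
    }
}
```

In the patch itself the analogous change would presumably be a static
const lut[] without the "<< plane" terms, and
"lut[get_bits(&gb, 4)] << plane" inside the loop.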

-- 
Måns Rullgård
mans at mansr.com


