[FFmpeg-devel] [PATCH] Optimization of original IFF codec

Sebastian Vater cdgs.basty
Mon Apr 26 22:06:42 CEST 2010


Hi Mans!

Måns Rullgård wrote:
> Sebastian Vater <cdgs.basty at googlemail.com> writes:  
>   
>> Btw, you brought me to a nice idea with your complaints...I could
>> precalculate all these values for each plane in decode_init and then
>> just memcpy it in decodeplane8/24 to local stack, what do you think of this?
>>     
>
> Skip the memcpy and make the table static const.
>
>   
>> This will yield in 8 (planes)*4 (uint32_t's)*16 (sizeof (struct lut)) =
>> 512 bytes of tables for decodeplane8
>>     
>
> 512 bytes is nothing to worry about.
>
>   
>> and 24 (planes)*4 (uint32_t's)*16 (sizeof (struct lut))*4
>> (lut[0123]) = 6144 bytes.
>>     
>
> 6k isn't a lot either.  Just store it statically.
>
>   
Bad news here...
I tried almost everything, and the table-based code is not faster than
the code I had before. :-(

It just uses extra memory and gains nothing.

I tried:
    uint32_t lut[16];
    memcpy(lut, decodeplane8_tab[plane], sizeof(lut));

At best this is as fast as the original; usually it is slower.

Then I tried:
    const uint32_t *lut = decodeplane8_tab[plane];

The results are the same as above.

Finally I tried indexing the table directly, without a local copy or pointer:
        const uint32_t v = decodeplane8_tab[plane][get_bits(&gb, 4)];
        AV_WN32A(dst+i, AV_RN32A(dst+i) | v);

This is the slowest of them all...
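
For readers who want to reproduce the comparison, here is a minimal, self-contained sketch of the table-lookup inner loop. It is not the code from the patch: build_plane_lut and decodeplane8_sketch are hypothetical names, memcpy stands in for AV_RN32A/AV_WN32A, and the source bytes are read directly instead of through get_bits. The table row is built bytewise, so the sketch does not depend on host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one row of decodeplane8_tab: for each 4-bit
 * value n, the four bytes of lut[n] are the bits of n (MSB first), each
 * moved to bit position `plane`. */
static void build_plane_lut(uint32_t lut[16], unsigned plane)
{
    assert(plane < 8);
    for (unsigned n = 0; n < 16; n++) {
        uint8_t px[4];
        for (unsigned k = 0; k < 4; k++)
            px[k] = (uint8_t)(((n >> (3 - k)) & 1u) << plane);
        memcpy(&lut[n], px, sizeof(px));
    }
}

/* OR one bitplane row into a chunky 8-bit destination.
 * src holds src_size bytes; dst must hold 8 * src_size pixels. */
static void decodeplane8_sketch(uint8_t *dst, const uint8_t *src,
                                size_t src_size, const uint32_t lut[16])
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < src_size; i++) {
        const unsigned nib[2] = { (unsigned)src[i] >> 4,
                                  (unsigned)src[i] & 0x0Fu };
        for (unsigned h = 0; h < 2; h++) {
            uint8_t *p = dst + 8 * i + 4 * h;
            uint32_t v;
            memcpy(&v, p, sizeof(v));  /* portable read of 4 pixels  */
            v |= lut[nib[h]];          /* merge this plane's bits    */
            memcpy(p, &v, sizeof(v));  /* portable write of 4 pixels */
        }
    }
}
```

Calling decodeplane8_sketch once per plane, each time with the row for that plane, accumulates all bitplanes into dst. Timing this against a shift-and-mask loop on the target machine is the only reliable way to tell which is faster.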

Please don't ask me why, but it's not worth the hassle. I think the best
option is to discard the table and keep the code the way I submitted it
in the patch. :-(

Here is the code that initializes these tables:
#define DECODEPLANE8(plane) {0x00000000u, \
                             0x01000000u << (plane), \
                             0x00010000u << (plane), \
                             0x01010000u << (plane), \
                             0x00000100u << (plane), \
                             0x01000100u << (plane), \
                             0x00010100u << (plane), \
                             0x01010100u << (plane), \
                             0x00000001u << (plane), \
                             0x01000001u << (plane), \
                             0x00010001u << (plane), \
                             0x01010001u << (plane), \
                             0x00000101u << (plane), \
                             0x01000101u << (plane), \
                             0x00010101u << (plane), \
                             0x01010101u << (plane)}

// 8 planes * 4-bit mask
static const uint32_t decodeplane8_tab[8][16] = {DECODEPLANE8(0),
                                                 DECODEPLANE8(1),
                                                 DECODEPLANE8(2),
                                                 DECODEPLANE8(3),
                                                 DECODEPLANE8(4),
                                                 DECODEPLANE8(5),
                                                 DECODEPLANE8(6),
                                                 DECODEPLANE8(7)};
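
The pattern in the 8-plane table can be written as a closed form, which makes it easy to spot-check individual entries. The sketch below is only an illustration; plane8_mask is a hypothetical helper, not part of the patch. Note that bit 3 of the 4-bit index (the leftmost pixel) lands in the lowest byte of the value, so the listed constants put the leftmost pixel first in memory only on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>

/* Closed form for decodeplane8_tab[plane][nibble] as listed above:
 * nibble bit 3 maps to byte 0 of the value, nibble bit 0 to byte 3,
 * and each set bit is moved to bit position `plane` within its byte. */
static uint32_t plane8_mask(unsigned plane, unsigned nibble)
{
    uint32_t v = 0;
    assert(plane < 8 && nibble < 16);
    for (unsigned k = 0; k < 4; k++)
        if (nibble & (8u >> k))
            v |= (uint32_t)1 << (8 * k + plane);
    return v;
}
```

For example, plane8_mask(0, 1) gives 0x01000000 and plane8_mask(0, 8) gives 0x00000001, matching the second and ninth entries of DECODEPLANE8(0).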

// 24 planes * 4 lookup tables each * 4-bit mask
#define DECODEPLANE24(plane) {{0, \
                               0, \
                               0, \
                               0, \
                               0, \
                               0, \
                               0, \
                               0, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane}, \
                              {0, \
                               0, \
                               0, \
                               0, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane, \
                               0, \
                               0, \
                               0, \
                               0, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane, \
                               1 << plane}, \
                              {0, \
                               0, \
                               1 << plane, \
                               1 << plane, \
                               0, \
                               0, \
                               1 << plane, \
                               1 << plane, \
                               0, \
                               0, \
                               1 << plane, \
                               1 << plane, \
                               0, \
                               0, \
                               1 << plane, \
                               1 << plane}, \
                              {0, \
                               1 << plane, \
                               0, \
                               1 << plane, \
                               0, \
                               1 << plane, \
                               0, \
                               1 << plane, \
                               0, \
                               1 << plane, \
                               0, \
                               1 << plane, \
                               0, \
                               1 << plane, \
                               0, \
                               1 << plane}}

static const uint32_t decodeplane24_tab[24][4][16] = {DECODEPLANE24( 0),
                                                      DECODEPLANE24( 1),
                                                      DECODEPLANE24( 2),
                                                      DECODEPLANE24( 3),
                                                      DECODEPLANE24( 4),
                                                      DECODEPLANE24( 5),
                                                      DECODEPLANE24( 6),
                                                      DECODEPLANE24( 7),
                                                      DECODEPLANE24( 8),
                                                      DECODEPLANE24( 9),
                                                      DECODEPLANE24(10),
                                                      DECODEPLANE24(11),
                                                      DECODEPLANE24(12),
                                                      DECODEPLANE24(13),
                                                      DECODEPLANE24(14),
                                                      DECODEPLANE24(15),
                                                      DECODEPLANE24(16),
                                                      DECODEPLANE24(17),
                                                      DECODEPLANE24(18),
                                                      DECODEPLANE24(19),
                                                      DECODEPLANE24(20),
                                                      DECODEPLANE24(21),
                                                      DECODEPLANE24(22),
                                                      DECODEPLANE24(23)};
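
The 24-plane table follows an even simpler rule: sub-table k (0 to 3) selects bit 3 - k of the 4-bit index and moves it to bit position plane. The sketch below spells that out. plane24_mask and or_nibble24 are hypothetical names, and the way the four sub-tables are applied to four consecutive 32-bit pixels is an assumption about the intended use, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Closed form for decodeplane24_tab[plane][k][nibble] as listed above. */
static uint32_t plane24_mask(unsigned plane, unsigned k, unsigned nibble)
{
    assert(plane < 24 && k < 4 && nibble < 16);
    return (uint32_t)((nibble >> (3 - k)) & 1u) << plane;
}

/* Assumed usage: one 4-bit index covers four consecutive 32-bit pixels,
 * with sub-table k supplying the bit for pixel k. */
static void or_nibble24(uint32_t dst[4], unsigned plane, unsigned nibble)
{
    for (unsigned k = 0; k < 4; k++)
        dst[k] |= plane24_mask(plane, k, nibble);
}
```

Since every entry is either 0 or 1 << plane, the full table stores 24 * 4 * 16 four-byte values, which is the 6144 bytes mentioned earlier in the thread.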

-- 

Best regards,
                   :-) Basty/CDGS (-:

Why am I spiritual? Quite simply, because I would rather search for
the formula for world peace than for the world formula.



