[FFmpeg-devel] [PATCH] Optimization of original IFF codec

Sebastian Vater cdgs.basty
Mon Apr 26 17:50:37 CEST 2010


Hi Micha, I made a new patch which also optimizes 24bpp decoding.

Again, I ran some benchmarks ;)

Michael Niedermayer wrote:
> On Mon, Apr 26, 2010 at 12:19:28AM +0200, Sebastian Vater wrote:
>   
>> Hi Michael!
>>
>> Michael Niedermayer wrote:
>>     
>>> On Sun, Apr 25, 2010 at 01:49:54PM +0200, Sebastian Vater wrote:
>>>
>>>>> that loop can then be unrolled by a factor of 4, and for the uint8_t
>>>>> type case its body can be implemented like:
>>>>>     v= lut[get_bits(&gb, 4)];
>>>>>     AV_WN32A(dst+b, AV_RN32A(dst+b) | v);
>>>>>
>>>> The thing is that the type can be either uint8_t or uint32_t: it's a
>>>> #define macro which gets the type (uint8_t or uint32_t) passed in.
>>>>
>>>> So I haven't fixed it yet, because I'm unsure whether those two lines
>>>> also work with dst being uint32_t.
>>>>
>>> they can, and it will speed the uint8 case up significantly
>>>
>> If I understand you correctly, I have to create a lookup table as
>> follows, with one entry for each group of 4 bits read:
>> {0000 = 0, 0001 = 1 << plane, 0010 = 0x100 << plane,
>>  0011 = (1 << plane) | (0x100 << plane), 0100 = 0x10000 << plane, ...}
>>
>> Is that correct?
>>     
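A minimal sketch of the nibble lookup table discussed above. It is written in plain C so it compiles outside FFmpeg: memcpy stands in for AV_RN32A/AV_WN32A, and the two nibbles are taken straight from each byte instead of through get_bits(&gb, 4). The names init_nibble_lut and decodeplane8_lut are hypothetical, and this is not necessarily what the attached patch does. The sketch assumes, as in ILBM, that the most significant bit is the leftmost pixel. Under that assumption, the table in the question comes out mirrored on a little-endian CPU: nibble 1000, not 0001, maps to 1 << plane. To avoid depending on byte order, the sketch builds each entry byte by byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lut[n] holds the 4 output bytes for a 4-bit group n
 * of bitplane `plane`, packed into one 32-bit word in native byte order.
 * Assumption: the nibble's MSB is the leftmost pixel, so it must land in
 * the first byte in memory, whatever the CPU endianness. */
static void init_nibble_lut(uint32_t lut[16], int plane)
{
    assert(plane >= 0 && plane < 8);
    for (int n = 0; n < 16; n++) {
        uint8_t px[4];
        for (int i = 0; i < 4; i++)
            px[i] = ((n >> (3 - i)) & 1) << plane; /* pixel i <- bit 3-i */
        memcpy(&lut[n], px, 4);                     /* endian-independent */
    }
}

/* decodeplane8-style loop: ORs one bitplane into 8-bit palette indices,
 * 4 pixels per 32-bit load/OR/store. dst must hold buf_size * 8 bytes. */
static void decodeplane8_lut(uint8_t *dst, const uint8_t *buf, int buf_size,
                             const uint32_t *lut)
{
    for (int i = 0; i < buf_size; i++, dst += 8) {
        uint32_t v;
        memcpy(&v, dst, 4);          /* pixels 0..3: high nibble */
        v |= lut[buf[i] >> 4];
        memcpy(dst, &v, 4);
        memcpy(&v, dst + 4, 4);      /* pixels 4..7: low nibble */
        v |= lut[buf[i] & 15];
        memcpy(dst + 4, &v, 4);
    }
}
```

Call init_nibble_lut(lut, plane) once per plane, then OR in a whole row with decodeplane8_lut(dst, buf, buf_size, lut). Each input byte then costs two table lookups and two 32-bit read-modify-writes instead of eight single-bit operations.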
Benchmark of the original code (my latest patch, without the LUT):
basty at cdgs-basty:~/src/ffmpeg/build$ ./ffplay ../patches/Ooze.iff
FFplay version git-36b1b3c, Copyright (c) 2003-2010 the FFmpeg developers
  built on Apr 26 2010 00:00:19 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
  configuration:
  libavutil     50.14. 0 / 50.14. 0
  libavcodec    52.66. 0 / 52.66. 0
  libavformat   52.61. 0 / 52.61. 0
  libavdevice   52. 2. 0 / 52. 2. 0
  libswscale     0.10. 0 /  0.10. 0
[IFF @ 0x8b32790]Estimating duration from bitrate, this may be inaccurate
Input #0, IFF, from '../patches/Ooze.iff':
  Duration: N/A, bitrate: N/A
    Stream #0.0: Video: iff_byterun1, rgba, 666x536, PAR 1:1 DAR 333:268, 90k tbr, 90k tbn, 90k tbc
55480 dezicycles in decodeplane32, 1 runs, 0 skips
54105 dezicycles in decodeplane32, 2 runs, 0 skips
53517 dezicycles in decodeplane32, 4 runs, 0 skips
53095 dezicycles in decodeplane32, 8 runs, 0 skips
52895 dezicycles in decodeplane32, 16 runs, 0 skips
52772 dezicycles in decodeplane32, 32 runs, 0 skips
52663 dezicycles in decodeplane32, 64 runs, 0 skips
52584 dezicycles in decodeplane32, 128 runs, 0 skips
52938 dezicycles in decodeplane32, 256 runs, 0 skips
52717 dezicycles in decodeplane32, 512 runs, 0 skips
52682 dezicycles in decodeplane32, 1023 runs, 1 skips
52675 dezicycles in decodeplane32, 2045 runs, 3 skips
52710 dezicycles in decodeplane32, 4088 runs, 8 skips
52810 dezicycles in decodeplane32, 8165 runs, 27 skips
   0.39 A-V:  0.000 s:0.0 aq=    0KB vq=    0KB sq=    0B f=0/0   0/0

Benchmark with your lookup-table idea, using my new implementation of this patch without the inline keyword:
FFplay version git-36b1b3c, Copyright (c) 2003-2010 the FFmpeg developers
  built on Apr 26 2010 00:00:19 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
  configuration:
  libavutil     50.14. 0 / 50.14. 0
  libavcodec    52.66. 0 / 52.66. 0
  libavformat   52.61. 0 / 52.61. 0
  libavdevice   52. 2. 0 / 52. 2. 0
  libswscale     0.10. 0 /  0.10. 0
[IFF @ 0x8b32790]Estimating duration from bitrate, this may be inaccurate
Input #0, IFF, from '../patches/Ooze.iff':
  Duration: N/A, bitrate: N/A
    Stream #0.0: Video: iff_byterun1, rgba, 666x536, PAR 1:1 DAR 333:268, 90k tbr, 90k tbn, 90k tbc
41950 dezicycles in decodeplane32, 1 runs, 0 skips
31175 dezicycles in decodeplane32, 2 runs, 0 skips
51882 dezicycles in decodeplane32, 4 runs, 0 skips
35820 dezicycles in decodeplane32, 8 runs, 0 skips
27796 dezicycles in decodeplane32, 16 runs, 0 skips
23752 dezicycles in decodeplane32, 32 runs, 0 skips
21754 dezicycles in decodeplane32, 64 runs, 0 skips
20713 dezicycles in decodeplane32, 128 runs, 0 skips
20193 dezicycles in decodeplane32, 256 runs, 0 skips
19934 dezicycles in decodeplane32, 512 runs, 0 skips
19814 dezicycles in decodeplane32, 1023 runs, 1 skips
19756 dezicycles in decodeplane32, 2047 runs, 1 skips
19752 dezicycles in decodeplane32, 4092 runs, 4 skips
19724 dezicycles in decodeplane32, 8184 runs, 8 skips
   2.35 A-V:  0.000 s:0.0 aq=    0KB vq=    0KB sq=    0B f=0/0   0/0

Benchmark with your lookup-table idea, using my new implementation of this patch with the inline keyword:
basty at cdgs-basty:~/src/ffmpeg/build$ ./ffplay ../patches/Ooze.iff
FFplay version git-36b1b3c, Copyright (c) 2003-2010 the FFmpeg developers
  built on Apr 26 2010 00:00:19 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
  configuration:
  libavutil     50.14. 0 / 50.14. 0
  libavcodec    52.66. 0 / 52.66. 0
  libavformat   52.61. 0 / 52.61. 0
  libavdevice   52. 2. 0 / 52. 2. 0
  libswscale     0.10. 0 /  0.10. 0
[IFF @ 0x8b32790]Estimating duration from bitrate, this may be inaccurate
Input #0, IFF, from '../patches/Ooze.iff':
  Duration: N/A, bitrate: N/A
    Stream #0.0: Video: iff_byterun1, rgba, 666x536, PAR 1:1 DAR 333:268, 90k tbr, 90k tbn, 90k tbc
48350 dezicycles in decodeplane32, 1 runs, 0 skips
38375 dezicycles in decodeplane32, 2 runs, 0 skips
32605 dezicycles in decodeplane32, 4 runs, 0 skips
29513 dezicycles in decodeplane32, 8 runs, 0 skips
27933 dezicycles in decodeplane32, 16 runs, 0 skips
30328 dezicycles in decodeplane32, 32 runs, 0 skips
29113 dezicycles in decodeplane32, 64 runs, 0 skips
27651 dezicycles in decodeplane32, 128 runs, 0 skips
26902 dezicycles in decodeplane32, 256 runs, 0 skips
26522 dezicycles in decodeplane32, 512 runs, 0 skips
26341 dezicycles in decodeplane32, 1023 runs, 1 skips
26305 dezicycles in decodeplane32, 2046 runs, 2 skips
26296 dezicycles in decodeplane32, 4092 runs, 4 skips
26239 dezicycles in decodeplane32, 8182 runs, 10 skips
   1.30 A-V:  0.000 s:0.0 aq=    0KB vq=    0KB sq=    0B f=0/0   0/0

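So, unlike decodeplane8, where adding the inline keyword makes it much faster, decodeplane32 shows just the opposite: it settles at about 26,200 dezicycles with inline versus about 19,700 without. Without inline, the LUT version is about 2.7 times as fast as the original code (about 52,800 dezicycles).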
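For comparison, here is a rough sketch of a LUT-based decodeplane32 for the 24bpp path. It assumes that each set bit in bitplane `plane` sets bit `plane` of the corresponding 32-bit pixel, and that the most significant bit is the leftmost pixel. init_lut32 and decodeplane32_lut are hypothetical names, and the table is flattened to 16 groups of 4 masks. Whether marking such a function `static inline` helps depends on the compiler and the call site, as the two gcc 4.2.4 measurements above show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: lut[4*n + i] is the mask ORed into pixel i of a 4-pixel
 * group when the 4-bit group read from bitplane `plane` equals n
 * (assumption: MSB of the group = leftmost pixel). */
static void init_lut32(uint32_t lut[64], int plane)
{
    assert(plane >= 0 && plane < 32);
    for (int n = 0; n < 16; n++)
        for (int i = 0; i < 4; i++)
            lut[4 * n + i] = (uint32_t)((n >> (3 - i)) & 1) << plane;
}

/* Sketch of a LUT-based decodeplane32: ORs one bitplane into 32-bit
 * pixels. dst must hold buf_size * 8 pixels. */
static void decodeplane32_lut(uint32_t *dst, const uint8_t *buf, int buf_size,
                              const uint32_t *lut)
{
    for (int i = 0; i < buf_size; i++, dst += 8) {
        const uint32_t *hi = lut + 4 * (buf[i] >> 4);  /* pixels 0..3 */
        const uint32_t *lo = lut + 4 * (buf[i] & 15);  /* pixels 4..7 */
        for (int k = 0; k < 4; k++) {
            dst[k]     |= hi[k];
            dst[k + 4] |= lo[k];
        }
    }
}
```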

If you look at the patch, you'll notice that I commented out two AV_WN64A lines; I did this because they made everything much slower.

That might not be the case on a 64-bit CPU, though. I don't have one, so could someone check this?
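A hypothetical sketch of the 64-bit variant in question uses a 256-entry table indexed by a whole input byte, so that each byte becomes a single 64-bit load, OR and store. Here memcpy stands in for AV_RN64A/AV_WN64A, and the names are made up for illustration. No timings are implied. The table is 2 KB per plane instead of 64 bytes, and on a 32-bit CPU the compiler may have to split each 64-bit access, so only a benchmark on a 64-bit machine can settle whether it pays off.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: lut[v] holds the 8 output bytes (one per pixel, assuming
 * MSB = leftmost pixel) for input byte v of bitplane `plane`, packed in
 * native byte order so the result is endian-independent. */
static void init_lut64(uint64_t lut[256], int plane)
{
    assert(plane >= 0 && plane < 8);
    for (int v = 0; v < 256; v++) {
        uint8_t px[8];
        for (int i = 0; i < 8; i++)
            px[i] = ((v >> (7 - i)) & 1) << plane;
        memcpy(&lut[v], px, 8);
    }
}

/* One 64-bit load, OR and store per input byte (8 pixels).
 * dst must hold buf_size * 8 bytes. */
static void decodeplane8_lut64(uint8_t *dst, const uint8_t *buf, int buf_size,
                               const uint64_t *lut)
{
    for (int i = 0; i < buf_size; i++, dst += 8) {
        uint64_t v;
        memcpy(&v, dst, 8);
        v |= lut[buf[i]];
        memcpy(dst, &v, 8);
    }
}
```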

-- 

Best regards,
                   :-) Basty/CDGS (-:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: iff-optimize-lut.patch
Type: text/x-patch
Size: 6215 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100426/aa130de4/attachment.bin>


