[FFmpeg-devel] [PATCH] Fix non-rounding up to next 16-bit aligned bug in IFF decoder

Sebastian Vater cdgs.basty
Fri Apr 30 14:05:25 CEST 2010


Sebastian Vater wrote:
> Sebastian Vater wrote:
>
>> Ronald S. Bultje wrote:
>>   
>>     
>>> Hi,
>>>
>>> On Thu, Apr 29, 2010 at 2:45 PM, Sebastian Vater
>>> <cdgs.basty at googlemail.com> wrote:
>>>   
>>>     
>>>       
>>>> Did another version:
>>>>     
>>>>       
>>>>         
>>> [..]
>>>   
>>>     
>>>       
>>>>    START_TIMER;
>>>>    const uint32_t *lut = plane8_lut[plane];
>>>>    for(; --buf_size != 0; dst += 8) {
>>>>        uint32_t v;
>>>>        const unsigned x = *buf++;
>>>>        v = AV_RN32A(dst) | lut[x >> 4];
>>>>        AV_WN32A(dst, v);
>>>>        v = AV_RN32A(dst + 4) | lut[x & 0x0F];
>>>>        AV_WN32A(dst + 4, v);
>>>>    }
>>>>    STOP_TIMER("decodeplane8");
>>>>     
>>>>       
>>>>         
>>> [..]
>>>   
>>>     
>>>       
>>>>      58:       0f b6 16                movzbl (%esi),%edx
>>>>      5b:       83 c6 01                add    $0x1,%esi
>>>>      5e:       89 d0                   mov    %edx,%eax
>>>>      60:       83 e2 0f                and    $0xf,%edx
>>>>      63:       c1 e8 04                shr    $0x4,%eax
>>>>      66:       8b 04 87                mov    (%edi,%eax,4),%eax
>>>>      69:       09 01                   or     %eax,(%ecx)
>>>>      6b:       8b 04 97                mov    (%edi,%edx,4),%eax
>>>>      6e:       09 41 04                or     %eax,0x4(%ecx)
>>>>      71:       83 c1 08                add    $0x8,%ecx
>>>>      74:       83 eb 01                sub    $0x1,%ebx
>>>>      77:       75 df                   jne    58 <decodeplane8+0x58>
>>>>     
>>>>       
>>>>         
>>> [..]
>>>   
>>>     
>>>       
>>>> 9067 dezicycles in decodeplane8, 32 runs, 0 skips
>>>> 8562 dezicycles in decodeplane8, 64 runs, 0 skips
>>>> 8318 dezicycles in decodeplane8, 128 runs, 0 skips
>>>> 8195 dezicycles in decodeplane8, 256 runs, 0 skips
>>>> 8132 dezicycles in decodeplane8, 512 runs, 0 skips
>>>> 8096 dezicycles in decodeplane8, 1023 runs, 1 skips
>>>> 8077 dezicycles in decodeplane8, 2046 runs, 2 skips
>>>> 8070 dezicycles in decodeplane8, 4094 runs, 2 skips
>>>>     
>>>>       
>>>>         
>>> That looks good to me.
>>>   
>>>     
>>>       
>> Using uint64_t again with 8-bit lut:
>> /**
>>  * Decode interleaved plane buffer up to 8bpp
>>  * @param dst Destination buffer
>>  * @param buf Source buffer
>>  * @param buf_size
>>  * @param bps bits_per_coded_sample (must be <= 8)
>>  * @param plane plane number to decode as
>>  */
>> static void decodeplane8(uint8_t *dst,
>>                          const uint8_t *buf,
>>                          unsigned buf_size,
>>                          const unsigned bps,
>>                          const unsigned plane)
>> {
>>     START_TIMER;
>>     const uint64_t *lut = plane8_lut[plane];
>>     for(; --buf_size != 0; dst += 8) {
>>         const uint64_t v = AV_RN64A(dst) | lut[*buf++];
>>         AV_WN64A(dst, v);
>>     }
>>     STOP_TIMER("decodeplane8");
>> }
>>
>>       58:       8b 54 24 64             mov    0x64(%esp),%edx
>>       5c:       8b 4c 24 44             mov    0x44(%esp),%ecx
>>       60:       8b 5d 04                mov    0x4(%ebp),%ebx
>>       63:       8b 45 00                mov    0x0(%ebp),%eax
>>       66:       0f b6 32                movzbl (%edx),%esi
>>       69:       83 44 24 64 01          addl   $0x1,0x64(%esp)
>>       6e:       8b 54 f1 04             mov    0x4(%ecx,%esi,8),%edx
>>       72:       0b 04 f1                or     (%ecx,%esi,8),%eax
>>       75:       09 da                   or     %ebx,%edx
>>       77:       89 45 00                mov    %eax,0x0(%ebp)
>>       7a:       89 55 04                mov    %edx,0x4(%ebp)
>>       7d:       83 c5 08                add    $0x8,%ebp
>>       80:       83 ef 01                sub    $0x1,%edi
>>       83:       75 d3                   jne    58 <decodeplane8+0x58>
>>
>> Benchmark results:
>> basty at cdgs-basty:~/src/ffmpeg/build$ ./ffplay ../patches/MRLake.iff
>> FFplay version git-5b9f10d, Copyright (c) 2003-2010 the FFmpeg developers
>>   built on Apr 29 2010 15:19:11 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
>>   configuration:
>>   libavutil     50.14. 0 / 50.14. 0
>>   libavcodec    52.66. 0 / 52.66. 0
>>   libavformat   52.61. 0 / 52.61. 0
>>   libavdevice   52. 2. 0 / 52. 2. 0
>>   libswscale     0.10. 0 /  0.10. 0
>> [IFF @ 0x8b33790]Estimating duration from bitrate, this may be inaccurate
>> Input #0, IFF, from '../patches/MRLake.iff':
>>   Duration: N/A, bitrate: N/A
>>     Stream #0.0: Video: iff_byterun1, pal8, 737x595, PAR 17:20 DAR 737:700, 90k tbr, 90k tbn, 90k tbc
>> 40560 dezicycles in decodeplane8, 1 runs, 0 skips
>> 35145 dezicycles in decodeplane8, 2 runs, 0 skips
>> 23632 dezicycles in decodeplane8, 4 runs, 0 skips
>> 17127 dezicycles in decodeplane8, 8 runs, 0 skips
>> 11680 dezicycles in decodeplane8, 16 runs, 0 skips
>> 8965 dezicycles in decodeplane8, 32 runs, 0 skips
>> 7628 dezicycles in decodeplane8, 64 runs, 0 skips
>> 6939 dezicycles in decodeplane8, 128 runs, 0 skips
>> 6565 dezicycles in decodeplane8, 256 runs, 0 skips
>> 6385 dezicycles in decodeplane8, 512 runs, 0 skips
>> 6290 dezicycles in decodeplane8, 1024 runs, 0 skips
>> 6246 dezicycles in decodeplane8, 2048 runs, 0 skips
>> 6224 dezicycles in decodeplane8, 4096 runs, 0 skips
>>    1.94 A-V:  0.000 s:0.0 aq=    0KB vq=    0KB sq=    0B f=0/0   0/0
>>
>> Way faster, but when I look at the code I wonder why...probably because
>> it handles pipeline stalling much better...
>>   
>>     
> I have attached the uint64_t patch. Please review it!
>
> The 8-bit table declaration looks a little bit long...
>   

Sorry, yesterday's patch contained a wrong dp8 8-bit table, which caused
graphics glitches. The new patch attached here fixes that.

BTW, I have received confirmation that this patch now also works on
big-endian! I tested little-endian myself, so it now works on both. If
someone could help me shorten the #define stuff for the 8-bit table,
I'd be glad.
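
To illustrate the idea behind the 8-bit table (this is only a sketch, not
the code in the attached patch): each table entry holds the eight output
pixels for one source byte of one bitplane. If the entry is filled byte by
byte and copied into a uint64_t, its memory layout should be the same on
little- and big-endian hosts, and the long #define list can be replaced by
a small init loop at the cost of building the table at run time. The names
plane8_lut_sketch, init_plane8_lut_sketch and decodeplane8_sketch are
hypothetical, and memcpy stands in for AV_RN64A/AV_WN64A so that the
example is self-contained. It assumes the usual IFF bit order, where the
most significant bit of a source byte is the leftmost pixel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: one 8-byte entry per (plane, source byte) pair. */
static uint64_t plane8_lut_sketch[8][256];

/* Fill the table at run time instead of spelling it out with #defines. */
static void init_plane8_lut_sketch(void)
{
    for (unsigned plane = 0; plane < 8; plane++) {
        for (unsigned x = 0; x < 256; x++) {
            uint8_t px[8];
            /* Bit 7 of x is the leftmost pixel; each set bit
             * contributes bit number 'plane' to its output pixel. */
            for (unsigned i = 0; i < 8; i++)
                px[i] = (uint8_t)(((x >> (7 - i)) & 1) << plane);
            /* Copying bytes keeps the memory layout endian-independent. */
            memcpy(&plane8_lut_sketch[plane][x], px, sizeof(px));
        }
    }
}

/* OR one bitplane row of buf_size bytes into 8 * buf_size chunky pixels. */
static void decodeplane8_sketch(uint8_t *dst, const uint8_t *buf,
                                unsigned buf_size, unsigned plane)
{
    assert(plane < 8);
    const uint64_t *lut = plane8_lut_sketch[plane];
    for (; buf_size != 0; buf_size--, dst += 8) {
        uint64_t v;
        memcpy(&v, dst, sizeof(v));   /* stand-in for AV_RN64A(dst) */
        v |= lut[*buf++];
        memcpy(dst, &v, sizeof(v));   /* stand-in for AV_WN64A(dst, v) */
    }
}
```

A caller would run init_plane8_lut_sketch() once, zero the destination row,
and then call decodeplane8_sketch() once per bitplane of that row.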

-- 

Best regards,
                   :-) Basty/CDGS (-:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: iff-decoder-fix-heavy-dp8.patch
Type: text/x-diff
Size: 17260 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100430/7faf6e85/attachment.patch>
