[FFmpeg-devel] [PATCH] Fix non-rounding up to next 16-bit aligned bug in IFF decoder

Måns Rullgård mans
Thu Apr 29 15:35:22 CEST 2010


Sebastian Vater <cdgs.basty at googlemail.com> writes:

> Måns Rullgård wrote:
>> Sebastian Vater <cdgs.basty at googlemail.com> writes:
>>
>>   
>>> I just had an idea: we can get rid of the GetBitContext
>>> completely. Instead of reading 4 bits, we simply read a byte:
>>> const uint8_t lut_offsets = *buf++; // instead of get_bits(gb,4);
>>
>> That's a separate thing.
>
> Separate in what way? What did you mean exactly?

Separate from the LUT byte order.

>>> Then we unroll the loop by 8 and do two accesses to the lut, one with >> 4
>>> and one with & 0x0F, or we even get rid of this and create a lut table
>>> with 256 entries using AV_WN64A / AV_RN64A ;-)
>>>
>>> The advantage here is that on a 64-bit CPU we get another nice speed
>>> improvement ;-)
>>> If we avoid calculations with AV_RN64A etc.
>>>     
>>
>> Those macros don't do any calculations.  All they do is some magic to
>> avoid type aliasing errors.
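
To illustrate what "no calculations" means here, the following is a
minimal sketch of an aliasing-safe 64-bit load and store in portable C.
The names load_u64 and store_u64 are hypothetical, and going through
memcpy is an assumption made for illustration; it is not a claim about
how AV_RN64A / AV_WN64A are actually defined.  The point is only that
such an access moves the bytes as they are, without shifting or
combining anything.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an FFmpeg macro): read 8 bytes in native
 * byte order without casting the pointer to uint64_t *, so the
 * compiler's strict-aliasing assumptions are not violated. */
static inline uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: write 8 bytes in native byte order. */
static inline void store_u64(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof(v));
}
```

A load followed by a store of the same value reproduces the original
bytes exactly, whatever the machine's endianness.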
>
> Yes, I know, but I meant stuff like (lut0[...] << 32ULL) | lut1[...];

Why on earth would you do that?

> But this isn't necessary if we use an 8-bit table storing uint64_t's...

That would fall apart completely on 32-bit machines.  I doubt any
speedup you might see on 64-bit is worth the added complexity of
doing it conditionally.  Just leave it as 32-bit.
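
For reference, a rough sketch of the 32-bit variant under discussion:
one source byte is split into two nibbles, and each nibble indexes a
16-entry table of 32-bit words.  Everything here (nibble_lut,
init_nibble_lut, expand_plane, and the table layout of one output byte
per input bit) is an assumption for illustration, not the code from the
patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: entry n holds four bytes, one per bit of n
 * (most significant bit first), each byte being 0 or 1. */
static uint32_t nibble_lut[16];

static void init_nibble_lut(void)
{
    for (int n = 0; n < 16; n++) {
        uint8_t bytes[4];
        for (int i = 0; i < 4; i++)
            bytes[i] = (n >> (3 - i)) & 1;
        /* Filling the entry bytewise keeps the in-memory order the
         * same on little- and big-endian machines. */
        memcpy(&nibble_lut[n], bytes, sizeof(bytes));
    }
}

/* OR the bits of one bitplane into 8-bit chunky pixels.
 * Each source byte yields 8 destination bytes; plane must be 0..7,
 * so the shift never carries from one byte into the next. */
static void expand_plane(uint8_t *dst, const uint8_t *src,
                         size_t src_len, int plane)
{
    for (size_t i = 0; i < src_len; i++) {
        uint32_t hi = nibble_lut[src[i] >> 4]   << plane;
        uint32_t lo = nibble_lut[src[i] & 0x0F] << plane;
        uint32_t v;

        memcpy(&v, dst, 4);     v |= hi; memcpy(dst, &v, 4);
        memcpy(&v, dst + 4, 4); v |= lo; memcpy(dst + 4, &v, 4);
        dst += 8;
    }
}
```

After calling init_nibble_lut() once, expand_plane() is called once per
bitplane on a zeroed destination row.  Only 32-bit operations are
involved, so nothing depends on how the compiler handles 64-bit values
on a 32-bit target.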

>>> gcc just should use 2 registers on 32-bit CPU and that's it.
>>
>> Should, but doesn't.
>
> With the approach I described above, it should... I'll test that now,
> though without a completed table, and tell you what it does.

Believe me, it doesn't.  GCC is terrible with 64-bit data on 32-bit
machines.  Do not tempt it.

-- 
Måns Rullgård
mans at mansr.com


