[FFmpeg-devel] [PATCH] Fix non-rounding up to next 16-bit aligned bug in IFF decoder

Thu Apr 29 20:18:36 CEST 2010

Hi,

On Thu, Apr 29, 2010 at 2:08 PM, Sebastian Vater
<cdgs.basty at googlemail.com> wrote:
> Ronald S. Bultje wrote:
>> On Thu, Apr 29, 2010 at 10:37 AM, Sebastian Vater
>> <cdgs.basty at googlemail.com> wrote:
>>>       66:       0f b6 c0                movzbl %al,%eax
>>
>> Err...?
>
> It zero-extends %al into %eax so that %eax can be used as the table index
> in the next instruction.

I know that. It shouldn't be necessary. gcc is screwing up here, and
we might be able to help gcc to not screw up.

> The question remains whether an 8-bit table or a 4-bit table (indexed
> with >> and &) is better.
>
> To me it seems that the 8-bit table is slower in the first iterations
> but becomes faster in later ones (very probably because its larger size
> causes more cache misses at the start).
>
> I think the speed advantage grows with image size, i.e. bigger images
> benefit more from an 8-bit table and smaller ones more from a 4-bit
> table.
>
> Since speed matters more for larger images, the 8-bit table is
> probably the way to go. What do you think?
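For reference, here is a minimal sketch of the two table layouts under discussion. It assumes an ILBM-style bitplane (one bit per pixel, MSB = leftmost pixel) being ORed into bit `plane` of eight chunky output bytes. The names lut8, lut4, init_luts, decodeplane_lut8 and decodeplane_lut4 are hypothetical and not taken from libavcodec/iff.c. The 8-bit table does one lookup per source byte (256 x 8 bytes = 2 KiB); the 4-bit table does two lookups per byte, split with >> and & (16 x 4 bytes = 64 bytes).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tables: each entry expands a bit pattern into one byte per
 * pixel (0 or 1), stored in native memory order via memcpy. */
static uint64_t lut8[256]; /* 256 entries * 8 bytes = 2 KiB  */
static uint32_t lut4[16];  /*  16 entries * 4 bytes = 64 bytes */

static void init_luts(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t px[8];
        for (int i = 0; i < 8; i++)
            px[i] = (v >> (7 - i)) & 1; /* MSB is the leftmost pixel */
        memcpy(&lut8[v], px, sizeof(px));
    }
    for (int v = 0; v < 16; v++) {
        uint8_t px[4];
        for (int i = 0; i < 4; i++)
            px[i] = (v >> (3 - i)) & 1;
        memcpy(&lut4[v], px, sizeof(px));
    }
}

/* OR bitplane `plane` (0..7) of n packed bytes into 8*n chunky pixels,
 * one lookup per source byte. Every table byte is 0 or 1, so a shift of
 * at most 7 never carries into a neighbouring byte. */
static void decodeplane_lut8(uint8_t *dst, const uint8_t *src, int n, int plane)
{
    assert(plane >= 0 && plane < 8);
    for (int i = 0; i < n; i++, dst += 8) {
        uint64_t v;
        memcpy(&v, dst, 8);
        v |= lut8[src[i]] << plane;
        memcpy(dst, &v, 8);
    }
}

/* Same result with two lookups per byte: high nibble via >>, low via &.
 * Widening the byte to unsigned int once is one way to try to avoid
 * redundant zero-extensions; only the disassembly shows whether it helps. */
static void decodeplane_lut4(uint8_t *dst, const uint8_t *src, int n, int plane)
{
    assert(plane >= 0 && plane < 8);
    for (int i = 0; i < n; i++, dst += 8) {
        unsigned b = src[i];
        uint32_t hi, lo;
        memcpy(&hi, dst, 4);
        memcpy(&lo, dst + 4, 4);
        hi |= lut4[b >> 4] << plane; /* left four pixels  */
        lo |= lut4[b & 15] << plane; /* right four pixels */
        memcpy(dst, &hi, 4);
        memcpy(dst + 4, &lo, 4);
    }
}
```

Both variants produce identical output. The 8-bit one halves the number of lookups but touches a 32x larger table. That size difference fits the warm-up effect described above.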

Ideally, we'd see optimal code for both cases before we can decide
which is better. Right now the code for the 4-bit table path is
clearly suboptimal, which makes any kind of fair comparison between
the two impossible.
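A fair comparison could look something like the following harness. It is a sketch only: the function names, the seeded rand() input and the use of clock() are assumptions for illustration, not FFmpeg's benchmarking tools. It decodes all eight planes from identical pseudo-random input with both table variants, times each, and checks that the outputs agree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical harness: table layout and names are illustrative only. */
static uint64_t lut8[256];
static uint32_t lut4[16];

static void init_luts(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t px[8];
        for (int i = 0; i < 8; i++)
            px[i] = (v >> (7 - i)) & 1; /* MSB = leftmost pixel */
        memcpy(&lut8[v], px, 8);
        if (v < 16) {
            for (int i = 0; i < 4; i++)
                px[i] = (v >> (3 - i)) & 1;
            memcpy(&lut4[v], px, 4);
        }
    }
}

typedef void (*plane_fn)(uint8_t *dst, const uint8_t *src, size_t n, int plane);

static void plane_lut8(uint8_t *dst, const uint8_t *src, size_t n, int plane)
{
    assert(plane >= 0 && plane < 8);
    for (size_t i = 0; i < n; i++, dst += 8) {
        uint64_t v;
        memcpy(&v, dst, 8);
        v |= lut8[src[i]] << plane;
        memcpy(dst, &v, 8);
    }
}

static void plane_lut4(uint8_t *dst, const uint8_t *src, size_t n, int plane)
{
    assert(plane >= 0 && plane < 8);
    for (size_t i = 0; i < n; i++, dst += 8) {
        unsigned b = src[i];
        uint32_t hi, lo;
        memcpy(&hi, dst, 4);
        memcpy(&lo, dst + 4, 4);
        hi |= lut4[b >> 4] << plane;
        lo |= lut4[b & 15] << plane;
        memcpy(dst, &hi, 4);
        memcpy(dst + 4, &lo, 4);
    }
}

/* Decodes 8 planes of row_bytes bytes each, reps times, from the same seeded
 * input; returns CPU seconds (-1 on allocation failure) and stores a XOR
 * checksum of the output in *sum. */
static double bench(plane_fn fn, size_t row_bytes, int reps, uint8_t *sum)
{
    uint8_t *src = malloc(row_bytes * 8), *dst = malloc(row_bytes * 8);
    if (!src || !dst) { free(src); free(dst); return -1.0; }
    srand(1234);
    for (size_t i = 0; i < row_bytes * 8; i++)
        src[i] = (uint8_t)rand();
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) {
        memset(dst, 0, row_bytes * 8); /* timed equally for both variants */
        for (int p = 0; p < 8; p++)
            fn(dst, src + (size_t)p * row_bytes, row_bytes, p);
    }
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    *sum = 0;
    for (size_t i = 0; i < row_bytes * 8; i++)
        *sum ^= dst[i];
    free(src);
    free(dst);
    return secs;
}

/* Times both decoders on identical input; returns 1 if both ran and agree. */
static int compare(size_t row_bytes, int reps)
{
    uint8_t s8 = 0, s4 = 0;
    double t8 = bench(plane_lut8, row_bytes, reps, &s8);
    double t4 = bench(plane_lut4, row_bytes, reps, &s4);
    printf("%8zu bytes/plane x %d: lut8 %.3f s, lut4 %.3f s%s\n",
           row_bytes, reps, t8, t4, s8 == s4 ? "" : " (MISMATCH)");
    return t8 >= 0 && t4 >= 0 && s8 == s4;
}
```

After init_luts(), calling compare() for several widths on the target machine is one way to test the image-size hypothesis. Useful widths run from 40 bytes per plane (a 320-pixel row) up to much larger buffers. Results depend heavily on compiler, flags and CPU, so no numbers are quoted here.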

But in general, sure, an 8-bit code path is fine with me.

Ronald