[FFmpeg-devel] [PATCH]Fix alpha for iff deep

Don Moir donmoir at comcast.net
Sun Mar 11 15:01:17 CET 2012


>>
>> Seems to be a strange lack of any optimization.
>>
>> if you look at this code in the patch:
>>
>> +        for(x = 0; x < avctx->width; x++)
>> +            row[4 * x + 3] = row[4 * x + 3] & 0xF0 | (row[4 * x + 3] >> 
>> 4);
>>
>> would be better if:
>>
>> +      row += 3;
>> +      for(x = 0; x < avctx->width; x++, row+=4)
>> +          *row = *row & 0xF0 | (*row >> 4);
>>
>> But this is more of a minor point then the rest of the code being
>> less than optimal in iff.c. It should be obvious but I could
>> elaborate. I bring this up because sometimes optimization issues are
>> sometime nic-picked to death and sometimes not.
>
> There are several reasons for that.
> In the case that you mention, it is because the code you propose
> can easily be slower than the current code, particularly
> on register-starved CISC configurations like 32 bit x86 with PIC.
> The compiler might be able to encode the 4*x+3 into a single
> instruction in the first case (thus it is "free"), whereas your
> proposal might consume an extra register.

The MS compiler does a pretty good job on optimizing the above and the 
difference was pretty minor.

Assuming row is in eax:

In this case MS added 7 to the row offset first so reason you see BYTE PTR 
[eax - 4]

row[4 * x + 3] = row[4 * x + 3] & 0xF0 | (row[4 * x + 3] >> 4); compiled as:

mov  cl, BYTE PTR [eax-4]
mov  dl, cl
and  dl, 240   ; 000000f0H
shr  cl, 4
or  dl, cl
mov  BYTE PTR [eax-4], dl

In this case the hints to compiler was enough to reduce [eax - 4] to [eax]

*row = *row & 0xF0 | (*row >> 4); compiler as:

mov  cl, BYTE PTR [eax]
mov  dl, cl
and  dl, 240   ; 000000f0H
shr  cl, 4
or  dl, cl
mov  BYTE PTR [eax], dl

It could be argued that MS got this wrong and the 2 should be identical. Not 
sure what other compilers might do.

I did say I thought this was a minor case because I figured the compiler 
would take care of the above case mostly.

I was more looking at the usage of FFMIN thru-out the code and this could be 
eliminated with a little more complex code. A few other cases that seemed 
odd to me as well.

As in: memcpy (row,buf,FFMIN(raw_width,buf_end - buf));

> Fringe formats generally aren't optimized well,
> and easy to read code also is rather more relevant than performance
> most of the time.

Certainly there is a balance here and the reason I don't use terse variable 
names like 's' for a context etc.



More information about the ffmpeg-devel mailing list