[FFmpeg-devel] [PATCH] Fix apply_welch_window_sse2 compilation on Mac OS X/x86

Loren Merritt lorenm
Thu Oct 18 13:17:48 CEST 2007


On Thu, 18 Oct 2007, Pierre d'Herbemont wrote:
> On Oct 18, 2007, at 2:23 AM, Trent Piepho wrote:
>> On Thu, 18 Oct 2007, Pierre d'Herbemont wrote:
>>> On Oct 17, 2007, at 10:05 PM, Loren Merritt wrote:
>>>
>>>> But I don't think the fix is correct: It's not valid to jmp from
>>>> one asm block to another. If you split the asm block, you also have
>>>> to write the loop part in C. And the reason I put it in asm is that
>>>> gcc doesn't generate as efficient a loop.
>>>
>>> There is no problem in jumping from one asm block to an other.
>>> However there can be problem with labels.
>>
>> Suppose gcc _did_ spill a register to the stack?
>
> Right. But again, in this case, we are only assuming that %xmmx
> registers won't be messed up, we have don't use the stack directly,
> nor any general registers. We may not rely on this for sure.
> Experience prove that in this case it just work.

The whole point of splitting the asm block was to allow gcc to spill 
registers in between, because it doesn't have 6 general regs free. And 
look at your own disassembly: it did. So you jump from the 2nd asm block 
to the 1st without running the appropriate spilling code, and run the 1st 
block with the register values from the 2nd. Then you run the 
initialization code for the 2nd block again, which gcc expected to only 
run once.

BTW, spilling shouldn't be needed. It's possible to write the loop with 5 
regs, but that's slower than 6 if you have 6. Ideally I'd be able write 
the loop part in C and gcc would use 5 or 6 regs for addressing depending 
on what's available, but that's not what happens in practice.

>> You also have no guarantee that gcc won't move code around so that
>> what was before or after an asm is now on the other side.  And no, 
>> making the asm volatile does not work!  Suppose you code this:
>>
>> for(i=0;i<100;i++) {
>>    asm volatile("1: ..." : : "=r"(i));
>>    asm volatile("... je 1b" : :);
>> }
>>
>> gcc might decide the perfect place to put the i++ is between the
>> two asm
>> blocks, which will completely mess up the for loop, since now i is
>> incremented inside the asm loop.
>
> I am not sure of that. Since we are using "=r"(i), gcc can't put the i
> ++ part afterwards. Or that's a bug.
> Again, in our specific case we don't have that kind of problem anyway.

Details. Here's a trivial case that breaks in gcc-4.1.2:

void func0(void)
{
     int i=9, j=0;
     asm volatile("1: \n inc %0 \n dec %1 \n jg 1b \n" :"+r"(j), "+r"(i));
}

void func1(void)
{
     int i=9, j=0;
     asm volatile("1: \n inc %0 \n" :"+r"(j));
     asm volatile("dec %0 \n jg 1b \n" :"+r"(i));
}

Disassembly of section .text:
00000000 <func0>:
    0:   31 c0                   xor    %eax,%eax
    2:   ba 09 00 00 00          mov    $0x9,%edx
    7:   40                      inc    %eax
    8:   4a                      dec    %edx
    9:   7f fc                   jg     7 <func0+0x7>
    b:   c3                      ret

00000010 <func1>:
   10:   31 c0                   xor    %eax,%eax
   12:   40                      inc    %eax
   13:   ba 09 00 00 00          mov    $0x9,%edx
   18:   4a                      dec    %edx
   19:   7f f7                   jg     12 <func1+0x2>
   1b:   c3                      ret

Note how the initialization of i was moved inside the loop, thus making
it an infinite loop. And this is the same kind of breakage as occurs in 
your patch.

--Loren Merritt





More information about the ffmpeg-devel mailing list