[FFmpeg-devel] Subject: Re: swscale-test segfault with 64-bit icc 11.1

Måns Rullgård mans
Wed Jul 21 01:23:41 CEST 2010


"Winterton, Richard" <richard.winterton at intel.com> writes:

>> On Sat, Jul 17, 2010 at 04:50:10PM -0300, Ramiro Polla wrote:
>>> Hi,
>>>
>>> swscale-test segfaults when built with 64-bit icc 11.1 (20100414). The
>>> function that fails is hyscale_fast_MMX2(). Here's a disassembly of
>>> the function:
>>>     a4b0:       53                      push   %rbx
>>>     a4b1:       48 8b 87 c8 30 00 00    mov    0x30c8(%rdi),%rax
>>>     a4b8:       4c 8b 9f a8 30 00 00    mov    0x30a8(%rdi),%r11
>>>     a4bf:       48 89 74 24 d8          mov    %rsi,-0x28(%rsp)
>>>     a4c4:       45 89 ca                mov    %r9d,%r10d
>>>     a4c7:       48 89 54 24 e0          mov    %rdx,-0x20(%rsp)
>>>     a4cc:       41 f7 da                neg    %r10d
>>>     a4cf:       83 bf 10 31 00 00 00    cmpl   $0x0,0x3110(%rdi)
>>>     a4d6:       48 89 4c 24 e8          mov    %rcx,-0x18(%rsp)
>>>     a4db:       48 89 44 24 d0          mov    %rax,-0x30(%rsp)
>>>     a4e0:       48 8b 87 00 31 00 00    mov    0x3100(%rdi),%rax
>>>     a4e7:       4c 89 5c 24 f0          mov    %r11,-0x10(%rsp)
>>>     a4ec:       48 89 44 24 f8          mov    %rax,-0x8(%rsp)
>>>     a4f1:       0f 84 05 01 00 00       je     a5fc <hyscale_fast_MMX2+0x14c>
>>>     a4f7:       0f ef ff                pxor   %mm7,%mm7
>>>     a4fa:       48 8b 4c 24 e8          mov    -0x18(%rsp),%rcx
>>>     a4ff:       48 8b 7c 24 d8          mov    -0x28(%rsp),%rdi
>>>     a504:       48 8b 54 24 f0          mov    -0x10(%rsp),%rdx
>>>     a509:       48 8b 5c 24 d0          mov    -0x30(%rsp),%rbx
>>>     a50e:       48 31 c0                xor    %rax,%rax
>>>     a511:       0f 18 01                prefetchnta (%rcx)
>>>     a514:       0f 18 41 20             prefetchnta 0x20(%rcx)
>>>     a518:       0f 18 41 40             prefetchnta 0x40(%rcx)
>>>     a51c:       8b 33                   mov    (%rbx),%esi
>>>     a51e:       ff 54 24 f8             callq  *-0x8(%rsp)
>>>     a522:       8b 34 03                mov    (%rbx,%rax,1),%esi
>>>     a525:       48 01 f1                add    %rsi,%rcx
>>>     a528:       48 01 c7                add    %rax,%rdi
>>>     a52b:       48 31 c0                xor    %rax,%rax
>>>     a52e:       8b 33                   mov    (%rbx),%esi
>>>     a530:       ff 54 24 f8             callq  *-0x8(%rsp)
>>> [...]
>>>
>>> Since no functions are being called in C inside hyscale_fast_MMX2(),
>>> the compiler decides it's ok to use -0x8(%rsp) instead of properly
>>> sub'ing rsp, as it supposedly won't get overwritten. But in this case
>>> we call the mmx2 code inside asm, overwriting -0x8(%rsp). The second
>>> callq goes to a522, and when run again, it tries to run some random
>>> code that was the next pointer on the stack. gcc does the same thing,
>>> but it seems it leaves -0x8(%rsp) alone and uses the stack -0x10(%rsp)
>>> and below.
>>>
>>> Is this a compiler bug (as in should it detect a call inside asm)?
>>> Could (or should) we hint to the compiler that a call is being made
>>> inside the asm block (I don't even know if this is possible)?
>> I would suggest that you ask Intel (after checking the manual).
>> It's surely possible to work around this in various ways, but that
>> feels unclean.
>
> I believe I was able to reproduce the segmentation fault described
> with a small snippet.  I checked with a compiler engineer and he
> replied with the following:
>
> The compiler is unable to detect which stack locations the user uses
> in inline asm, so it cannot avoid them. As a workaround, you can use
> -mno-red-zone to disable the optimization where we use the area
> below RSP in leaf functions, but this will disable the red zone for
> all other leaf functions also, and may cost performance.
>
> I can look into a modification of the assembly to work around the
> problem if you still have the issue.

This problem could potentially appear with any gcc version as well; I
ran into it on PPC64 a while ago.  There is no point making only one
compiler safe in this manner, since we'd still need to solve it for
the other ones.
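
One compiler-independent way to do that is to have the asm block itself
step over the red zone before issuing the call, so the pushed return
address cannot land on anything the compiler keeps below %rsp.  What
follows is only a minimal sketch under stated assumptions, not the
swscale code: add_one() and call_skipping_red_zone() are hypothetical
names, the callee is assumed to touch only integer registers (a real
callee that clobbers MMX/SSE registers would need those in the clobber
list too), and the x86-64 SysV ABI with GNU-style inline asm is assumed.
On other targets the sketch falls back to a plain C call.

```c
/* Hypothetical stand-in for the run-time generated MMX2 scaler code. */
static long add_one(long x)
{
    return x + 1;
}

/*
 * Call fn(arg) from inside an asm block without touching the caller's
 * red zone (the 128 bytes below %rsp on x86-64 SysV).
 */
static long call_skipping_red_zone(long (*fn)(long), long arg)
{
#if defined(__x86_64__) && defined(__GNUC__) && !defined(_WIN32)
    long ret;
    __asm__ volatile (
        "mov  %%rsp, %%rbx \n\t" /* remember the original stack pointer    */
        "sub  $128, %%rsp  \n\t" /* step over the 128-byte red zone        */
        "and  $-16, %%rsp  \n\t" /* 16-byte alignment expected at a call   */
        "call *%[fn]       \n\t" /* return address lands below the red zone */
        "mov  %%rbx, %%rsp \n\t" /* restore the stack pointer              */
        : "=a"(ret), "+D"(arg)   /* result in rax, argument in rdi         */
        : [fn] "r"(fn)           /* register operand, never %rsp-relative  */
        : "rbx", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11",
          "memory", "cc");
    return ret;
#else
    /* Assumption: no inline-asm call is needed elsewhere; call directly. */
    return fn(arg);
#endif
}
```

With this, call_skipping_red_zone(add_one, 41) behaves like add_one(41).
The function pointer is passed in a register on purpose: a memory
operand addressed relative to %rsp would be wrong once %rsp has been
moved.  The cost is a few extra instructions per call, which is why
-mno-red-zone for just the affected file is the other obvious option.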

-- 
Måns Rullgård
mans at mansr.com


