[Ffmpeg-devel] Re: [PATCH] fix snow x86 SIMD
Thu Aug 10 16:52:10 CEST 2006
On Thu, 10 Aug 2006, Michael Niedermayer wrote:
> On Thu, Aug 10, 2006 at 02:42:11PM +0200, Martin von Gagern wrote:
>> The code currently fails regression tests when configured with
>> --extra-cflags="-DPIC -fPIC -fomit-frame-pointer"
>> --disable-static --enable-shared
>> The program dies with SIGSEGV at the position corresponding to line 776:
>> "paddd (%%"REG_D"), %%xmm0 \n\t"
>> After some debugging I found out that REG_D % 16 == 8 at this location,
>> whereas IA32 Handbook requires it to be 16 byte aligned, causing a
>> general protection exception otherwise, which explains an SIGSEGV imho.
>> Would you go through your code and see if it correctly ensures 16 byte
>> alignment for the address assembled in REG_D? Or maybe add a few
>> assertions so I can test where things are no longer aligned that should
>> have been? I would very much appreciate that.
> if the code where not guranteeing alignment it likely would have failed
> on someone elses system ...
I had a similar failure with OS X on a different paddd instruction in the
ff_snow_horizontal_compose97i_sse2 function (line 159). ffmpeg crashed
with an illegal instruction signal.
> * check that av_malloc() provides aligned memory
> * check that gcc assembled the code correctly, ive seen cases where gcc
> silently violated some constrains
> * try valgrind, maybe it finds some out of array or uninitalized accesses
> * look at sb->line which is where edx comes from i think
In my case the problem IMHO is gcc. Despite the efforts of this line 28:
DWTELEM * const temp = temp_buf + 4 - (((int)temp_buf & 0xF) >> 2);
.. temp does not end up aligned on a 16 byte boundary.
The first for loop in Lift 3 then effectively aligns temp but offsets the
src and b pointers. So I think that loop is quite useless because if it
ever executes it will break the asm code below (unless I am missing
The reason that the code to align temp doesn't work is because gcc has
optimised out the subexpression: '(((int)temp_buf & 0xF) >> 2)' giving it
a value of 0. It appears that it believes that temp_buf will always be
a multiple of 16. However when it executes it is not so.
I didn't post this earlier because I hadn't finished investigating it, to
see if I can reproduce it in a simple program (so far no - gcc always
aligns var-len arrays to 16 bytes), or to see if it's been fixed in a
later gcc or acknowledged as a bug (I'm using 4.0.1).
I'm too tired to continue this tonight, but here is the workaround I am
DWTELEM * const temp = (DWTELEM*)(((intptr_t)&temp_buf)&~0xF);
Disclaimer: I don't know if my crash is related. I couldn't easily locate
a var-len array used by the line quoted above. So just FYI then.
btw Not 20 hours I know... got back early enough for a quick email :)
More information about the ffmpeg-devel