[Ffmpeg-devel] [MacIntel] Testers welcome!
Mon Aug 14 16:06:57 CEST 2006
On Mon, 14 Aug 2006, Marco Manfredini wrote:
> On Sunday 13 August 2006 04:46, John Dalgliesh wrote:
>> If you want to continue the investigation instead, then please be my
>> guest! :)
> Hi John,
> It looks like the compiler has a personal feud with 'i' in
> ff_snow_horizontal_compose97i_sse and tries to kill it as best as he can.
Well, in general, if the compiler wants to optimise it out, I would say
that's a good thing...
> Getting i in again is not so easy:
> For example, none of these work
> These work:
> my copy of ffmpeg passes the tests after this modification.
OK, I think you are missing a couple of steps of explanation here. Why do
you care about that variable? Or are you implying that, because the
compiler has reclaimed the stack space originally intended for 'i', the
variable-length array isn't 16-byte aligned, despite the optimiser's
assumption that it is? If so, then please say so explicitly; it's not at
all clear to me that this is the reasoning you're following.
> Potentially bad is, that both solutions force 'i' into memory after
Yes, if it affects the optimisation of the routine, I don't think it'll be
My main question is: did my workaround work for you? The code obviously
already does not expect temp_buf to be aligned; that's why it calculates
temp at some offset into temp_buf. I think that if the reason for the problem
is found and addressed, then a simple patch like that workaround would be
The approach should be either to reproduce the behaviour in a simpler test
case and figure out which compilers it breaks on, or to find the gcc bug /
patch / changelog entry where it is fixed. It doesn't happen for me with
gcc 4.0.3 on Linux ... but there are too many variations there to conclude
that it has been fixed by 4.0.3.
I've attached my workaround as a patch to this email so you don't have to
go hunting for it.
-------------- next part --------------
--- snowdsp_mmx.c (revision 5992)
+++ snowdsp_mmx.c (working copy)
@@ -25,7 +25,7 @@
const int w2= (width+1)>>1;
// SSE2 code runs faster with pointers aligned on a 32-byte boundary.
DWTELEM temp_buf[(width>>1) + 4];
- DWTELEM * const temp = temp_buf + 4 - (((int)temp_buf & 0xF) >> 2);
+ DWTELEM * const temp = (DWTELEM*)(((intptr_t)&temp_buf)&~0xF);
const int w_l= (width>>1);
const int w_r= w2 - 1;