[Libav-user] gcc auto-vectorisation
klaussfreire at gmail.com
Tue Feb 26 16:53:58 CET 2013
On Tue, Feb 26, 2013 at 7:05 AM, "René J.V. Bertin" <rjvbertin at gmail.com> wrote:
> On Feb 26, 2013, at 02:21, Claudio Freire wrote:
>> I wouldn't assume. Even if they are in effect aligned, if the compiler
>> doesn't know it (i.e., if malloc doesn't mark them as such),
>> vectorization will still assume out-of-alignment access.
> I may be wrong, but if that were the case (and glue code were added to ensure proper alignment), auto-vectorisation should not be able to provoke a crash on win32 because of... incorrect alignment. And yet that is exactly what happens: it crashes.
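Um... AFAIR, SSE's unaligned loads and stores (movups/movdqu) don't crash on misaligned access; they just perform poorly. Only the aligned forms (movaps/movdqa) fault on a misaligned address.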
>> Architecture-mandated and SSE/2/3/MMX/Whatever alignment requirements
>> tend to be different.
> Of course, but as far as I understand, not in this case, because Apple makes such extensive use of SIMD throughout its APIs/SDKs.
Ok, but intrinsics and auto-vectorization aren't the same thing; I was
just wondering whether gcc knew about the alignment.
>> You can write a very simple test case to check it out.
> Done. More precisely, I was comparing hand-coded SIMD and straightforward scalar versions of some functions I'd found when I discovered, because the scalar version was almost 2.5x faster than the SIMD version, that gcc-4.7 has auto-vectorisation on by default (at least on OS X). That's what set the whole thing rolling, raising the question of whether there might be gains (albeit undoubtedly smaller) to be had by letting the compiler do its thing on the ffmpeg sources.
I remember that thread; I just didn't gather from your last post that you
had actually tested it. Sorry for wasting your time, then.
I guess the only option, then, is to compare the resulting assembly
to try to spot why the SIMD code isn't outperforming the scalar code. It's all in
the details, but it should outperform it, if there isn't too much