[FFmpeg-devel] Intrinsics (and NEON in particular)
pascal.massimino at gmail.com
Wed Sep 3 14:06:39 CEST 2014
On Wed, Sep 3, 2014 at 9:16 AM, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
> On 03.09.2014, at 08:38, Pascal Massimino <pascal.massimino at gmail.com> wrote:
> > On Tue, Sep 2, 2014 at 10:26 PM, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
> >> On 03.09.2014, at 00:49, Pascal Massimino <pascal.massimino at gmail.com> wrote:
> >>> On Tue, Sep 2, 2014 at 9:39 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >>> [ahem: ffmpeg doesn't feel like using intrinsics, by chance?]
> >> I tried that once more, about 5 months back.
> >> It still results in code that is slower than the plain C version, even
> >> when using SIMD, on a trivial NEON audio format conversion (the same
> >> thing in asm was about 8x faster).
> >> So you can get the same effect with less effort by just disabling the
> >> asm code.
> > Strange. I used intrinsics exclusively for libwebp (x86, but also
> > NEON/AArch64) and was pretty pleased with the result (say <2% perf loss,
> > but 10x easier maintenance and friendliness to non-guru contributors).
> I guess you never used uint16x8x2 and similar types then, because almost
> any access to them seems to go via the stack.
> See the last file of
> , it spilled the data to the stack twice per loop iteration.
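For readers unfamiliar with the style being debated, here is a minimal sketch of an s16-to-float audio conversion written with intrinsics: a NEON path, an SSE2 path and a scalar fallback in one C source. It is not the patch referred to in this thread, and the function name convert_s16_to_flt() and the 1/32768 scaling are assumptions chosen for illustration. Whether the compiled NEON loop gets anywhere near hand-written asm depends entirely on the compiler, which is exactly the point in dispute above.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not FFmpeg's or libwebp's API): convert signed
 * 16-bit PCM samples to float in [-1.0, 1.0) by scaling with 1/32768. */
void convert_s16_to_flt(float *dst, const int16_t *src, size_t len)
{
    const float scale = 1.0f / 32768.0f;
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const float32x4_t vscale = vdupq_n_f32(scale);
    for (; i + 8 <= len; i += 8) {
        int16x8_t s = vld1q_s16(src + i);            /* load 8 samples */
        int32x4_t lo = vmovl_s16(vget_low_s16(s));   /* sign-extend to 32 bit */
        int32x4_t hi = vmovl_s16(vget_high_s16(s));
        vst1q_f32(dst + i,     vmulq_f32(vcvtq_f32_s32(lo), vscale));
        vst1q_f32(dst + i + 4, vmulq_f32(vcvtq_f32_s32(hi), vscale));
    }
#elif defined(__SSE2__)
    const __m128 vscale = _mm_set1_ps(scale);
    for (; i + 8 <= len; i += 8) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        /* SSE2 has no 16->32 bit sign extension: duplicate each sample into
         * both halves of a 32-bit lane, then shift right arithmetically. */
        __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(s, s), 16);
        __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(s, s), 16);
        _mm_storeu_ps(dst + i,     _mm_mul_ps(_mm_cvtepi32_ps(lo), vscale));
        _mm_storeu_ps(dst + i + 4, _mm_mul_ps(_mm_cvtepi32_ps(hi), vscale));
    }
#endif
    /* Scalar tail, and the whole job when neither SIMD path is built. */
    for (; i < len; i++)
        dst[i] = src[i] * scale;
}
```

Keeping one readable C source with a block per instruction set is the maintenance argument made for libwebp above; the performance argument is about what a particular compiler version emits for those blocks.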
Indeed, I just tried to compile the patch (gcc 4.8.3) and the output is
It's likely related to the poor support for post-incremented instructions.
that on several occasions.
But on the bright side, things seem to be moving in the right direction,
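The uint16x8x2 types mentioned above are NEON structure types (spelled uint16x8x2_t and so on in arm_neon.h) that bundle two vectors, as returned by de-interleaving loads such as vld2q_u16. As a minimal sketch, assuming a hypothetical deinterleave_u16() helper that splits interleaved 16-bit stereo into planar buffers, this is the kind of loop where such a type shows up. In hand-written asm the load can use post-increment addressing to advance the source pointer; with intrinsics, both that and keeping `v` in registers are left to the compiler, so check the generated assembly (e.g. with gcc -S) for the specific compiler version rather than assume either outcome.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: split interleaved stereo samples (L R L R ...)
 * into planar left/right buffers. nb_frames counts L/R pairs. */
void deinterleave_u16(uint16_t *left, uint16_t *right,
                      const uint16_t *src, size_t nb_frames)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 8 <= nb_frames; i += 8) {
        /* vld2q_u16 reads 16 elements and de-interleaves them into a
         * uint16x8x2_t: val[0] gets the even (left) elements, val[1] the
         * odd (right) ones. Whether v stays in registers or is spilled
         * to the stack is decided by the compiler. */
        uint16x8x2_t v = vld2q_u16(src + 2 * i);
        vst1q_u16(left + i, v.val[0]);
        vst1q_u16(right + i, v.val[1]);
    }
#endif
    /* Scalar tail, and the whole job on non-NEON targets. */
    for (; i < nb_frames; i++) {
        left[i] = src[2 * i];
        right[i] = src[2 * i + 1];
    }
}
```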