[FFmpeg-devel] [PATCH 04/10] x86: synth filter float: implement SSE2 version

Michael Niedermayer michaelni at gmx.at
Fri Feb 28 20:54:00 CET 2014


On Fri, Feb 14, 2014 at 04:00:48PM +0000, Christophe Gisquet wrote:
> Timings for Arrandale:
>           C    SSE
> win32:  2108   334
> win64:  1152   322
> 
> Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
> the jmp destination being aligned.
> 
> Unrolling for ARCH_X86_64 is a 20 cycles gain.

applied

thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire

