[FFmpeg-devel] [PATCH] x86/sbrdsp: add ff_sbr_autocorrelate_{sse, sse3}
James Almer
jamrial at gmail.com
Sun Jan 25 22:23:38 CET 2015
On 25/01/15 10:11 AM, Christophe Gisquet wrote:
> Hi,
>
> 2015-01-25 2:05 GMT+01:00 James Almer <jamrial at gmail.com>:
>> 2 to 2.5 times faster.
>>
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>> libavcodec/x86/sbrdsp.asm | 114 +++++++++++++++++++++++++++++++++++++++++++
>
> This is not the first time I have noticed it, but memory moves are often
> suboptimal when done with the old SSE instructions.
> On my old Core i5, movlhps is fine but movlps is not. You may want
> to validate this with the attached patch, where storing ps_mask3 in m8
> is also a gain on Win64 (the gain does not scale with the number of loop
> iterations, but it is still there).
I can reproduce the gains using mov{q,sd} instead of movlps, but not with the
mask loaded into m8 (tested on Win64 with a K10 CPU and on Linux x64 with a
Haswell CPU).
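
For reference, a minimal sketch of the load variants being compared, written
with C intrinsics rather than the x86inc assembly used in the patch. The helper
names load_low_merge and load_low_zero are hypothetical, and a compiler is free
to pick different instructions than the ones named in the comments. The sketch
only shows the semantic difference: a movlps load merges into the existing
register contents, while a movq (or movsd) load zeroes the upper half and so
does not depend on the register's previous value. Whether that dependency is
what explains the gain measured here is an assumption, not something verified
in this thread.

```c
#include <emmintrin.h> /* SSE2 intrinsics; also provides the SSE ones */

/* movlps-style load: replaces the low 64 bits with two floats from p
 * and keeps the high 64 bits of `old` (the result depends on `old`). */
static __m128 load_low_merge(__m128 old, const float *p)
{
    return _mm_loadl_pi(old, (const __m64 *)p);
}

/* movq-style load: reads the same 64 bits from p but zeroes the high
 * 64 bits, so no previous register value is needed. A movsd load from
 * memory has the same zeroing behaviour. */
static __m128 load_low_zero(const float *p)
{
    return _mm_castsi128_ps(_mm_loadl_epi64((const __m128i *)p));
}
```

Both helpers read the same two floats; they differ only in what ends up in the
upper two lanes of the result.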
>
> Benchmarks:
> x64: 6023 decicycles in g, 262108 runs, 36 skips
> SSE: 3049 decicycles in g, 262130 runs, 14 skips
> SSE3: 2843 decicycles in g, 262086 runs, 58 skips
> movq: 2693 decicycles in g, 262117 runs, 27 skips
> m8: 2648 decicycles in g, 262083 runs, 61 skips
>
> Thanks for doing it. I only had three-year-old scraps left and no further
> motivation to tackle the start/tail parts.
I applied the first part for now.
Thanks.