[FFmpeg-devel] [PATCH 0/3] synth filter float ASM

James Almer jamrial at gmail.com
Sat Mar 1 09:35:24 CET 2014


On 01/03/14 4:22 AM, Christophe Gisquet wrote:
> Hi,
> 
> 2014-03-01 4:32 GMT+01:00 James Almer <jamrial at gmail.com>:
>> Here are some extra implementations that extend Christophe's work.
> 
> Thanks for this, it looks very nice.
> 
>> The first one (SSE) could very well replace SSE2 considering the only difference
>> is in essence one extra mova.
>> I benched a bit and there didn't seem to be any difference in speed at all between
>> the two.
> 
> Actually, I would have written an SSE-only version (there was some
> difference for me though), but I remember people wanting no further
> SSE asm when there is a SSE2 version, up to the point that it made my
> life simpler doing what they asked rather than argue with them. I hope
> you'll be saved the trouble.

Having both SSE and SSE2 versions is pointless on x64, where SSE2 is always
available, which is why I made the SSE version 32-bit x86 only.

The function is pretty much SSE for that matter. pxor/xorps work exactly the same
to zero the registers, and so do pshufd/shufps to spread the 32 bits of data across
registers.
The only difference is in one pshufd/shufps case, since the former takes one source
and the latter takes two (dst being the second), so for a memory source an extra
movaps was needed to get the same result as pshufd.
I didn't notice a performance hit from those extra movaps, but if you or others do
then maybe it's better to keep both versions.
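
To make the one-source vs. two-source difference concrete, here is a rough scalar
model in C. It is only a sketch: vec4 and the model_* names are made up for
illustration and are not taken from the patch. It shows that shufps picks its two
low lanes from the old dst, so dst has to be loaded from memory first (the extra
movaps) to match pshufd with a memory source.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit xmm register (four 32-bit lanes). */
typedef struct { uint32_t v[4]; } vec4;

/* pshufd dst, src, imm8: all four output lanes are selected from src. */
static vec4 model_pshufd(vec4 src, unsigned imm8)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = src.v[(imm8 >> (2 * i)) & 3];
    return r;
}

/* shufps dst, src, imm8: lanes 0-1 come from the old dst, lanes 2-3 from src. */
static vec4 model_shufps(vec4 dst, vec4 src, unsigned imm8)
{
    vec4 r;
    r.v[0] = dst.v[(imm8 >> 0) & 3];
    r.v[1] = dst.v[(imm8 >> 2) & 3];
    r.v[2] = src.v[(imm8 >> 4) & 3];
    r.v[3] = src.v[(imm8 >> 6) & 3];
    return r;
}

/* SSE-only replacement for "pshufd dst, [mem], imm8". */
static vec4 model_movaps_shufps(vec4 mem, unsigned imm8)
{
    vec4 dst = mem;                      /* movaps dst, [mem]    */
    return model_shufps(dst, dst, imm8); /* shufps dst, dst, imm8 */
}
```

With the same register used as both operands, the shufps model matches the pshufd
model for every imm8. Without the copy, lanes 0-1 would come from whatever dst held
before.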

> 
>> Second patch is an implementation of AVX using ymm registers.
>> In my tests it was about 30 cycles faster than SSE2 on a Sandy Bridge CPU, 150
>> cycles vs 180 cycles.
> 
> Nice, maybe update the patch comment with this reference number,
> because it underlines it is a 15% speedup, which is not small.

Personally I was expecting a bigger boost than that, considering the main loop is
being run only once on x64 and twice on x86, compared to two and four times
respectively with SSE2. But I guess things aren't as linear as I thought.
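
As a back-of-the-envelope check on those numbers, the sketch below works through
the arithmetic in C. The helper names are hypothetical, and the last one rests on
a loud assumption: that a ymm iteration costs the same as an xmm iteration, which
need not hold on real hardware.

```c
/* Fraction of cycles saved when going from `before` to `after`. */
static double cycles_saved(double before, double after)
{
    return (before - after) / before;
}

/* Speedup expressed as a throughput ratio. */
static double speedup(double before, double after)
{
    return before / after;
}

/* Amdahl-style estimate: total cycles if only `loop_fraction` of the
 * runtime gets `loop_speedup` times faster and the rest is unchanged. */
static double expected_cycles(double total, double loop_fraction,
                              double loop_speedup)
{
    return total * ((1.0 - loop_fraction) + loop_fraction / loop_speedup);
}

/* Inverse of the above: the share of runtime that must have scaled,
 * ASSUMING each wide iteration costs the same as a narrow one. */
static double implied_loop_fraction(double before, double after,
                                    double loop_speedup)
{
    return (1.0 - after / before) / (1.0 - 1.0 / loop_speedup);
}
```

For 180 vs 150 cycles, cycles_saved() gives about 0.167 and speedup() gives 1.2.
Under the equal-cost assumption, halving the iteration count would only explain
the measurement if roughly a third of the 180 cycles scaled with the iteration
count, which is one way to read "not as linear".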

> I don't have comments on the asm otherwise, as I don't know avx. I
> know your code passes fate-dts so that should be ok.
> 

