[FFmpeg-devel] [PATCH 2/2] tta/x86: add ff_ttafilter_process_dec_{ssse3, sse4}

James Almer jamrial at gmail.com
Tue Feb 11 04:34:34 CET 2014


On 10/02/14 10:26 PM, Christophe Gisquet wrote:
> 2014-02-11 2:12 GMT+01:00 Christophe Gisquet <christophe.gisquet at gmail.com>:
>> I haven't quite checked if the code is optimal, but I haven't seen any
>> other issue. Maybe using more registers to break dependencies, but
>> that's a short function, and there's no loop to amortize their use.
> 
> There are a few spots where some dependencies might exist (not
> checked) and could be lifted, e.g.
> +    paddd      m6, m7
> +
> +    movd       m7, [filterq + 0x4]
> 
> I think at that point m2 and m3 are free, you should use them instead,
> because those 2 insns may not execute in parallel.
> 

Changing this didn't affect speed at all in my tests (still 455 decicycles).

What did affect speed negatively, however, was calling the asm functions with
all seven elements from TTAFilter as separate arguments, as I mentioned I'd do
in my previous email. I lost about 10 cycles on Win64 and 38 on Win32 just by
doing that.
I assume this is because of the prologue code in x86inc.
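
To make the two calling shapes concrete, here is a rough C sketch. The struct
layout, field names and function names are hypothetical stand-ins, not the
actual TTAFilter or the code in the patch. With seven arguments, some of them
arrive on the stack (all of them with the usual 32-bit cdecl convention, those
past the fourth on Win64), so a prologue has to load them into registers
before the real work starts. With a single pointer, the fields are read at
fixed offsets instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for a filter state; NOT FFmpeg's real TTAFilter. */
typedef struct FilterSketch {
    int32_t shift, round, error;
    int32_t qm[8], dx[8], dl[8];
} FilterSketch;

/* Variant A: one pointer argument; fields are read at fixed offsets. */
static int32_t sum_by_struct(const FilterSketch *f, int32_t in)
{
    int32_t sum = f->round;
    for (int i = 0; i < 8; i++)
        sum += f->dl[i] * f->qm[i];
    return in + (sum >> f->shift);
}

/* Variant B: every element passed separately (seven arguments). */
static int32_t sum_by_args(const int32_t *qm, const int32_t *dx,
                           const int32_t *dl, const int32_t *error,
                           int32_t in, int32_t shift, int32_t round)
{
    int32_t sum = round;
    (void)dx;    /* unused here; kept only to show the argument count */
    (void)error; /* unused here; kept only to show the argument count */
    for (int i = 0; i < 8; i++)
        sum += dl[i] * qm[i];
    return in + (sum >> shift);
}
```

Both variants compute the same value; only the way the inputs reach the
function differs, which is where the extra prologue work would come from.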

I'll send an updated patch soon. If you find any dependencies, please let me know.

