[FFmpeg-devel] [PATCH 2/2] tta/x86: add ff_ttafilter_process_dec_{ssse3, sse4}

James Almer jamrial at gmail.com
Tue Feb 11 06:02:12 CET 2014


On 11/02/14 12:34 AM, James Almer wrote:
> On 10/02/14 10:26 PM, Christophe Gisquet wrote:
>> 2014-02-11 2:12 GMT+01:00 Christophe Gisquet <christophe.gisquet at gmail.com>:
>>> I haven't quite checked if the code is optimal, but I haven't seen any
>>> other issue. Maybe using more registers to break dependencies, but
>>> that's a short function, and there's no loop to amortize their use.
>>
>> There are a few spots where some dependencies might exist (not
>> checked) and could be lifted, e.g.
>> +    paddd      m6, m7
>> +
>> +    movd       m7, [filterq + 0x4]
>>
>> I think at that point m2 and m3 are free, you should use them instead,
>> because those 2 insns may not execute in parallel.
>>
> 
> Changing this didn't affect speed at all in my tests (still 455 decicycles).
> 
> What did affect speed negatively, however, was calling the asm functions with
> all seven elements from TTAFilter as arguments, as I mentioned I'd do in my
> previous email. I lost about 10 cycles on Win64 and 38 on Win32 just by doing
> that.
> I assume this is because of the prologue code in x86inc.
> 
> I'll send an updated patch soon. If you find any dependencies, please let me know.
> 

New patchset sent. I'm a bit bummed about the performance loss from using seven
general-purpose registers for the arguments, but if it's safer, it can't
be helped.

Thanks for reviewing.

