[FFmpeg-devel] [PATCH] x86/swr: make int32_to_int32 un/pack_2ch functions SSE

James Almer jamrial at gmail.com
Wed Jan 14 22:23:54 CET 2015


On 14/01/15 1:59 PM, Michael Niedermayer wrote:
> On Wed, Jan 14, 2015 at 01:53:48AM -0300, James Almer wrote:
>> unpack_2ch is already using sse float ops only, and pack_2ch is a trivial change.
>> Rename both to float_to_float for consistency.
>>
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>>  libswresample/x86/audio_convert.asm    | 14 ++++++++------
>>  libswresample/x86/audio_convert_init.c | 11 +++++++----
>>  2 files changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/libswresample/x86/audio_convert.asm b/libswresample/x86/audio_convert.asm
>> index 1617e0b..c13c26f 100644
>> --- a/libswresample/x86/audio_convert.asm
>> +++ b/libswresample/x86/audio_convert.asm
>> @@ -60,8 +60,8 @@ pack_2ch_%2_to_%1_u_int %+ SUFFIX
>>      punpcklwd m0, m2
>>      punpckhwd m1, m2
>>  %else
>> -    punpckldq m0, m2
>> -    punpckhdq m1, m2
>> +    unpcklps  m0, m2
>> +    unpckhps  m1, m2
>>  %endif
>>      %6 m0,m1,m2,m3,m4,m5
>>  %else
> 
> Did you benchmark this?
> I've just checked, and on Pentium M, Core Solo and Core Duo these are
> listed as having only 1/5 the throughput.
> On Sandy Bridge they are still listed with half the throughput of
> their integer counterparts.
> I didn't benchmark it, though.

No, I didn't benchmark it. And you're right: even on recent CPUs the float
unpack instructions seem to have half the throughput of their integer
counterparts.
Do you think that will mean a considerable performance hit? These functions
aren't that important in audio processing anyway (perf shows they account
for less than 1% of total CPU time when doing pcm -> pcm).

Nonetheless, considering this, maybe the other functions should be changed
so that they don't use SBUTTERFLYPS.
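
For reference, here is a scalar C model of the lane movement under
discussion. It is only a sketch, not the swresample code: the function names
are hypothetical, and it assumes pack_2ch interleaves two planar channels of
32-bit samples. The point it illustrates is that punpckldq/punpckhdq and
unpcklps/unpckhps shuffle 32-bit lanes in the same pattern without
interpreting them, so float samples pass through bit-exact either way and
the choice between the two forms affects speed only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Low-half interleave: the lane pattern of punpckldq and unpcklps. */
static void interleave_lo32(uint32_t dst[4], const uint32_t a[4],
                            const uint32_t b[4])
{
    dst[0] = a[0]; dst[1] = b[0];
    dst[2] = a[1]; dst[3] = b[1];
}

/* High-half interleave: the lane pattern of punpckhdq and unpckhps. */
static void interleave_hi32(uint32_t dst[4], const uint32_t a[4],
                            const uint32_t b[4])
{
    dst[0] = a[2]; dst[1] = b[2];
    dst[2] = a[3]; dst[3] = b[3];
}

/* Hypothetical helper: pack four samples from each of two planar channels
 * into eight interleaved samples, treating every sample as an opaque
 * 32-bit word. */
static void pack_2ch_32bit(uint32_t out[8], const uint32_t ch0[4],
                           const uint32_t ch1[4])
{
    interleave_lo32(out,     ch0, ch1);
    interleave_hi32(out + 4, ch0, ch1);
}

/* Returns 1 if float samples moved as raw 32-bit words come out
 * interleaved and unchanged, 0 otherwise. */
int pack_2ch_float_roundtrip_ok(void)
{
    const float ch0[4] = {  0.5f, -1.0f, 0.25f,  1.0f   };
    const float ch1[4] = { -0.5f,  0.0f, 0.75f, -0.125f };
    uint32_t w0[4], w1[4], wout[8];
    float out[8];
    int i;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(w0, ch0, sizeof(w0));
    memcpy(w1, ch1, sizeof(w1));
    pack_2ch_32bit(wout, w0, w1);
    memcpy(out, wout, sizeof(out));

    for (i = 0; i < 4; i++)
        if (out[2 * i] != ch0[i] || out[2 * i + 1] != ch1[i])
            return 0;
    return 1;
}
```

Calling pack_2ch_float_roundtrip_ok() checks that the interleaved output
matches the two input channels sample for sample. It says nothing about
throughput, which would still need an actual benchmark on the CPUs
mentioned above.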

