[FFmpeg-devel] [PATCH] x86/dsputilenc: optimize sum_abs_dctelem functions
James Almer
jamrial at gmail.com
Sun May 25 01:18:01 CEST 2014
On 24/05/14 7:45 PM, Michael Niedermayer wrote:
> On Sat, May 24, 2014 at 07:39:22PM -0300, James Almer wrote:
>> Use a single register as accumulator, and make the SUM_ABS_DCTELEM
>> macro more readable
>>
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>> libavcodec/x86/dsputilenc.asm | 44 ++++++++++++++++++-------------------------
>> 1 file changed, 18 insertions(+), 26 deletions(-)
>
> what effect does this have on speed ?
SSE2
Before: 300 decicycles in dctelem, 1048574 runs, 2 skips
After: 298 decicycles in dctelem, 1048574 runs, 2 skips
SSSE3
Before: 289 decicycles in dctelem, 1048574 runs, 2 skips
After: 293 decicycles in dctelem, 1048573 runs, 3 skips
This was encoding a 1-minute-long 1920x1080 video using the snow encoder.
I originally tested this on an SSE2-only machine, so I didn't see the hit
on the SSSE3 version. Sorry about that.
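
For a rough sense of scale, the relative change can be worked out from the
figures above. This is only an illustrative sketch: percent_change and
print_changes are hypothetical helpers, not part of FFmpeg, and the inputs
are simply the decicycle counts quoted in this message.

```c
#include <stdio.h>

/* Hypothetical helper (not an FFmpeg function): relative change in percent.
 * A negative result means the patched code needs fewer decicycles. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Prints the relative change for both measurements quoted above. */
static void print_changes(void)
{
    printf("SSE2:  %+.2f%%\n", percent_change(300.0, 298.0));
    printf("SSSE3: %+.2f%%\n", percent_change(289.0, 293.0));
}
```

By this measure the SSE2 version gets faster by less than 1%, while the
SSSE3 version gets slower by a bit over 1%.
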
I'll send a patch that only adds the macro changes but leaves the assembly
intact.
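
For readers unfamiliar with the function being timed, what sum_abs_dctelem
computes can be sketched in plain C. This is a scalar illustration only,
assuming an 8x8 block of 16-bit coefficients. The name sum_abs_dctelem_ref
is hypothetical, and this is not the assembly from the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch: sum of the absolute values of the 64
 * coefficients in one 8x8 block (block size is an assumption here). */
static int sum_abs_dctelem_ref(const int16_t block[64])
{
    int sum = 0;
    for (int i = 0; i < 64; i++)
        sum += abs(block[i]);   /* int16_t promotes to int, so abs() is safe */
    return sum;
}
```

A SIMD version does the same reduction several coefficients at a time. The
accumulator mentioned in the patch description corresponds to the running
sum in this loop.
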