[FFmpeg-devel] [PATCH 7/9] sbcenc: add MMX optimizations
James Almer
jamrial at gmail.com
Sat Dec 23 22:47:04 EET 2017
On 12/23/2017 5:44 PM, Aurelien Jacobs wrote:
> On Sat, Dec 23, 2017 at 03:35:28PM -0300, James Almer wrote:
>> On 12/23/2017 3:01 PM, Aurelien Jacobs wrote:
>>> This was originally based on libsbc, and was fully integrated into ffmpeg.
>>>
>>> Rough speed test:
>>> C version: speed= 592x
>>> MMX version: speed= 785x
>>> ---
>>> libavcodec/sbcdsp.c | 3 +
>>> libavcodec/sbcdsp.h | 2 +
>>> libavcodec/x86/Makefile | 2 +
>>> libavcodec/x86/sbcdsp.asm | 284 +++++++++++++++++++++++++++++++++++++++++++
>>> libavcodec/x86/sbcdsp_init.c | 51 ++++++++
>>> 5 files changed, 342 insertions(+)
>>> create mode 100644 libavcodec/x86/sbcdsp.asm
>>> create mode 100644 libavcodec/x86/sbcdsp_init.c
>>
>> [...]
>>
>>> +;*******************************************************************
>>> +;void ff_sbc_calc_scalefactors(int32_t sb_sample_f[16][2][8],
>>> +; uint32_t scale_factor[2][8],
>>> +; int blocks, int channels, int subbands)
>>> +;*******************************************************************
>>> +INIT_MMX mmx
>>> +cglobal sbc_calc_scalefactors, 5, 7, 3, sb_sample_f, scale_factor, blocks, channels, subbands, ptr, blk
>>> + ; subbands = 4 * subbands * channels
>>> + shl subbandsd, 2
>>> + cmp channelsd, 2
>>> + jl .loop_1
>>> + shl subbandsd, 1
>>> +
>>> +.loop_1:
>>> + sub subbandsq, 8
>>> + lea ptrq, [sb_sample_fq + subbandsq]
>>> +
>>> + ; blk = (blocks - 1) * 64;
>>> + lea blkq, [blocksq - 1]
>>> + shl blkd, 6
>>> +
>>> + movq m0, [scale_mask]
>>
>> I insist, this can be easily loaded outside the loop. You have enough
>> spare regs to store a copy.
>
> Oh, I forgot to reply to this. There isn't any register left available
> on x86_32, which is why I kept that load inside the loop.
You're not using a GPR to store the mask, nor do you need to. You're using
MMX regs and have 5 left.
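
An untested sketch of what I mean, assuming m3 is not used anywhere else in
the function (the "3" in your cglobal line suggests m0-m2 are the only ones
in use). The choice of m3 and the bumped register count are my assumptions,
not something taken from your patch. Note the load has to go before the
"jl .loop_1" branch, otherwise the single-channel path would skip it:

```nasm
; hypothetical variant: 4 vector regs declared now that m3 is used
cglobal sbc_calc_scalefactors, 5, 7, 4, sb_sample_f, scale_factor, blocks, channels, subbands, ptr, blk
    ; load the constant once, before any branch into the loop
    movq          m3, [scale_mask]

    ; subbands = 4 * subbands * channels
    shl           subbandsd, 2
    cmp           channelsd, 2
    jl            .loop_1
    shl           subbandsd, 1

.loop_1:
    sub           subbandsq, 8
    lea           ptrq, [sb_sample_fq + subbandsq]

    ; blk = (blocks - 1) * 64;
    lea           blkq, [blocksq - 1]
    shl           blkd, 6

    ; register-to-register copy instead of a memory load on every iteration
    movq          m0, m3
```

No extra GPR is needed for this, so the x86_32 register pressure doesn't
change.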