[FFmpeg-devel] [PATCH 1/4] lavc/flacenc: add sse4 version of the 16-bit lpc encoder
James Almer
jamrial at gmail.com
Tue Feb 25 06:51:50 CET 2014
On 25/02/14 12:42 AM, James Almer wrote:
> On 20/02/14 3:48 PM, James Darnley wrote:
>> From 1.8 to 2.4 times faster. Runtime is reduced by 2 to 39%. The
>> speed-up generally increases with compression_level.
>>
>> This lpc encoder is not used with levels < 3 so it provides no speed-up
>> in these cases.
>> ---
>> LICENSE | 1 +
>> libavcodec/flacenc.c | 2 +-
>> libavcodec/x86/Makefile | 3 +
>> libavcodec/x86/flac_dsp_gpl.asm | 83 +++++++++++++++++++++++++++++++++++++++
>> libavcodec/x86/flacdsp_init.c | 4 ++
>> 5 files changed, 92 insertions(+), 1 deletions(-)
>> create mode 100644 libavcodec/x86/flac_dsp_gpl.asm
>>
>
> [...]
>
>> +.looplen:
>> + pxor m0, m0
>> + mov posj, orderq
>> + xor negj, negj
>> +
>> + .looporder:
>> + movd m2, [coefsq+posj*4] ; c = coefs[j]
>> + SPLATD m2
>> + movu m1, [smpq+negj*4-4] ; s = smp[i-j-1]
>
>> + pmulld m1, m2
>> + paddd m0, m1 ; p += c * s
>
> PMACSDD m0, m1, m2, m0, m1
>
> Same with the encoder (PMACSDQL instead in there). Do it of course with the
> unrolling patches as well.
> You can then make the functions into macros to get both SSE4 and XOP versions,
> as i mentioned in a previous email.
>
I meant to say "Same with the 32-bit encoder". Sorry for the confusion.
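For readers following the review, here is a rough scalar C model of what the quoted inner loop computes, going by its own comments (c = coefs[j], s = smp[i-j-1], p += c * s). It is only a sketch under assumptions: the function names lpc_predict32 and lpc_predict32_x4 are hypothetical and not part of the patch, the four lanes are assumed to hold four consecutive sample positions, and the sum is kept to the low 32 bits as pmulld/paddd would do. The suggested PMACSDD is understood here as fusing the multiply and the add into one step per lane; that should leave the result unchanged.

```c
#include <stdint.h>

/* Hypothetical scalar model of one pass of the inner loop for sample i:
 * p = sum over j of coefs[j] * smp[i-j-1], truncated to 32 bits.
 * Unsigned arithmetic is used so that wraparound is well defined in C.
 * The caller must guarantee i >= order so that smp[i-j-1] is in range. */
static int32_t lpc_predict32(const int32_t *smp, const int32_t *coefs,
                             int order, int i)
{
    uint32_t p = 0;
    for (int j = 0; j < order; j++) {
        uint32_t c = (uint32_t)coefs[j];        /* c = coefs[j]     */
        uint32_t s = (uint32_t)smp[i - j - 1];  /* s = smp[i-j-1]   */
        p += c * s;  /* separate multiply and add, or one fused step */
    }
    /* Conversion back to signed assumes two's complement behaviour. */
    return (int32_t)p;
}

/* Hypothetical four-lane model: lane k holds the prediction for sample
 * i + k, mirroring a 128-bit register of four 32-bit values. */
static void lpc_predict32_x4(const int32_t *smp, const int32_t *coefs,
                             int order, int i, int32_t out[4])
{
    for (int k = 0; k < 4; k++)
        out[k] = lpc_predict32(smp, coefs, order, i + k);
}
```

With smp = {1, 2, 3, 4, 5, 6, 7, 8}, coefs = {2, -1} and order = 2, calling lpc_predict32_x4 at i = 2 fills the four lanes with the predictions for samples 2 through 5. The 32-bit encoder mentioned above needs 64-bit products instead, which this model does not cover.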