[FFmpeg-devel] [PATCH] Add x86-optimized versions of lshift_tab().
Måns Rullgård
mans
Sat Feb 12 21:30:46 CET 2011
"Ronald S. Bultje" <rsbultje at gmail.com> writes:
> Hi,
>
> On Sat, Feb 12, 2011 at 2:31 PM, Justin Ruggles
> <justin.ruggles at gmail.com> wrote:
>> New function name AC3DSPContext.ac3_lshift_int16().
>> ---
>>  libavcodec/ac3dsp.c         |   11 +++++++++++
>>  libavcodec/ac3dsp.h         |   11 +++++++++++
>>  libavcodec/ac3enc_fixed.c   |   19 +------------------
>>  libavcodec/x86/ac3dsp.asm   |   35 +++++++++++++++++++++++++++++++++++
>>  libavcodec/x86/ac3dsp_mmx.c |    7 +++++++
>>  5 files changed, 65 insertions(+), 18 deletions(-)
> [..]
>> + /**
>> + * Left-shift each value in an array of int16_t by a specified amount.
>> + * @param src input array
>> + * constraints: align 16
>> + * @param len number of values in the array
>> + * constraints: multiple of 32 greater than 0
>> + * @param shift left shift amount
>> + * constraints: range [0,15]
>> + */
>> + void (*ac3_lshift_int16)(int16_t *src, int len, unsigned int shift);
>
> See below on this.
>
>> +cglobal ac3_lshift_int16_%1, 3,3,5, src, offset, shift
>> + cmp shiftd, 0
>> + je .end
>> + shl offsetq, 1
>> + sub offsetq, mmsize*4
>> + movd m0, shiftd
>> +.loop:
>> + mova m1, [srcq+offsetq ]
>> + mova m2, [srcq+offsetq+mmsize ]
>> + mova m3, [srcq+offsetq+mmsize*2]
>> + mova m4, [srcq+offsetq+mmsize*3]
>> + psllw m1, m0
>> + psllw m2, m0
>> + psllw m3, m0
>> + psllw m4, m0
>> + mova [srcq+offsetq ], m1
>> + mova [srcq+offsetq+mmsize ], m2
>> + mova [srcq+offsetq+mmsize*2], m3
>> + mova [srcq+offsetq+mmsize*3], m4
>> + sub offsetq, mmsize*4
>> + jge .loop
>> +.end:
>> + RET
>> +%endmacro
>> +
>> +INIT_MMX
>> +AC3_LSHIFT_INT16 mmx
>> +INIT_XMM
>> +AC3_LSHIFT_INT16 sse2
>
> Doesn't this do 64 per loop iteration for sse2? If so, doesn't that
> conflict with the function definition and/or overflow?
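No. The sse2 version handles 64 bytes per iteration, which is 32 int16_t
elements, so it matches the documented constraint on len.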
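
The per-iteration arithmetic can be sketched in C. This is only an
illustration, not part of the patch: the names lshift_int16_ref,
elems_per_iter and count_iterations are hypothetical, and the model assumes
mmsize is 8 for the mmx build and 16 for the sse2 build, with four registers
loaded per iteration as in the quoted loop.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference (hypothetical name): left-shift each int16_t in place.
 * The shift is done on the unsigned 16-bit pattern so the high bits are
 * simply discarded, which is what psllw does per word. */
static void lshift_int16_ref(int16_t *src, int len, unsigned int shift)
{
    for (int i = 0; i < len; i++)
        src[i] = (int16_t)(uint16_t)((uint16_t)src[i] << shift);
}

/* Elements handled per loop iteration: four registers of mmsize bytes. */
static int elems_per_iter(int mmsize)
{
    return (mmsize * 4) / (int)sizeof(int16_t);
}

/* Model of the loop's offset arithmetic. Returns the iteration count and
 * asserts that every access stays inside the len * 2 bytes of the array. */
static int count_iterations(int len, int mmsize)
{
    long total  = (long)len * 2;          /* shl offsetq, 1         */
    long offset = total - mmsize * 4;     /* sub offsetq, mmsize*4  */
    int iters = 0;

    do {
        assert(offset >= 0 && offset + mmsize * 4 <= total);
        iters++;
        offset -= mmsize * 4;             /* sub offsetq, mmsize*4  */
    } while (offset >= 0);                /* jge .loop              */

    return iters;
}
```

With these assumptions, elems_per_iter(16) is 32 and elems_per_iter(8) is 16,
so a len that is a multiple of 32 is consumed exactly by either version:
count_iterations(32, 16) is 1 and count_iterations(32, 8) is 2.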
--
Måns Rullgård
mans at mansr.com