[FFmpeg-devel] [PATCH] Altivec implementation of int32_to_float_fmul_scalar

Guillaume POIRIER poirierg
Tue Dec 16 10:29:45 CET 2008


Hello,

On Tue, Dec 16, 2008 at 10:21 AM, Luca Barbato <lu_zero at gentoo.org> wrote:
> Guillaume POIRIER wrote:
>> Damn, I feel stupid! (all the more since I didn't understand why you
>> wrote that at first....)
>>
>> Here it is now!
>
> What about unaligned cases?

Assuming that the SSE2 version is correct:

static void int32_to_float_fmul_scalar_sse2(float *dst, const int *src,
                                            float mul, int len)
{
    x86_reg i = -4*len;
    __asm__ volatile(
        "movss  %3, %%xmm4 \n"
        "shufps $0, %%xmm4, %%xmm4 \n"
        "1: \n"
        "cvtdq2ps   (%2,%0), %%xmm0 \n"
        "cvtdq2ps 16(%2,%0), %%xmm1 \n"
        "mulps    %%xmm4,    %%xmm0 \n"
        "mulps    %%xmm4,    %%xmm1 \n"
        "movaps   %%xmm0,   (%1,%0) \n"
        "movaps   %%xmm1, 16(%1,%0) \n"
        "add $32, %0 \n"
        "jl 1b \n"
        :"+r"(i)
        :"r"(dst+len), "r"(src+len), "m"(mul)
    );
}


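Then we don't need to worry about the unaligned case, since the SSE2
version doesn't handle it either: movaps and cvtdq2ps with a memory
operand both require 16-byte-aligned addresses, so callers must already
pass aligned buffers.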
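
To make those implicit requirements visible, here is a minimal scalar
sketch of what the loop above computes. The name
int32_to_float_fmul_scalar_ref is hypothetical (it is not FFmpeg code),
and the asserts only spell out what I read from the asm: 8 elements per
iteration, at least one iteration, and 16-byte-aligned pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference, not FFmpeg code: dst[i] = src[i] * mul.
 * The asserts state what the SSE2 loop assumes without checking. */
static void int32_to_float_fmul_scalar_ref(float *dst, const int *src,
                                           float mul, int len)
{
    /* The asm loop advances 32 bytes (8 elements) per iteration and
     * always runs at least once. */
    assert(len > 0 && len % 8 == 0);
    /* movaps and cvtdq2ps with a memory operand need 16-byte alignment. */
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);

    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}
```

An AltiVec version that makes the same assumptions can use aligned
loads and stores throughout.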

> Beside that looks ok

Good. BTW, can you confirm that AltiVec has no instruction for a
"plain" multiplication and only provides a vectorized multiply-add?
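
As far as I know, classic AltiVec only offers the fused multiply-add
(vmaddfp, exposed as vec_madd), so the usual workaround is to pass a
zero vector as the addend. Below is a rough sketch under that
assumption. The helper name scale4 is hypothetical, the AltiVec branch
assumes 16-byte-aligned pointers, and the scalar branch is only there
so the snippet also builds on machines without AltiVec.

```c
#include <assert.h>
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical helper, not FFmpeg code: dst[0..3] = src[0..3] * mul,
 * written as a multiply-add whose addend is zero. */
static void scale4(float *dst, const float *src, float mul)
{
#ifdef __ALTIVEC__
    /* Assumes dst and src are 16-byte aligned (vec_ld/vec_st ignore
     * the low four address bits). */
    const __vector float zero = (__vector float){ 0.0f, 0.0f, 0.0f, 0.0f };
    const __vector float vmul = (__vector float){ mul, mul, mul, mul };
    __vector float v = vec_ld(0, src);
    /* v * vmul + 0: adding zero leaves every non-zero product unchanged. */
    vec_st(vec_madd(v, vmul, zero), 0, dst);
#else
    /* Portable stand-in for the same multiply-add shape, one lane at a time. */
    for (int i = 0; i < 4; i++)
        dst[i] = src[i] * mul + 0.0f;
#endif
}
```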

Guillaume
-- 
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.

Stephen Leacock  - "I detest life-insurance agents: they always argue
that I shall some day die, which is not so."



