[FFmpeg-devel] Mixed data type in SIMD code?

Zuxy Meng zuxy.meng
Wed Mar 5 08:22:00 CET 2008


Hi,

2008/3/5, Zuxy Meng <zuxy.meng at gmail.com>:
> Hi,
>
> 2008/3/5, Loren Merritt <lorenm at u.washington.edu>:
> > On Tue, 4 Mar 2008, Michael Niedermayer wrote:
> > > On Mon, Mar 03, 2008 at 04:30:08PM -0700, Loren Merritt wrote:
> > >> On Mon, 3 Mar 2008, Michael Niedermayer wrote:
> > >>>
> > >>> Also I doubt we use or ever will use packed double.
> > >>
> > >> The flac encoder does. Single isn't precise enough for a linear sum of up
> > >> to 16k elements. Reordering the sum into a tree gave it half-way
> > >> decent precision, but also made it as slow as double.
> > >
> > > What about something like:
> > >
> > > for(i=0; i<16000;){
> > >    float sum=0;
> > >    do{
> > >        sum+= whatever[i++];
> > >    }while(i&127);
> > >    double_sum += sum;
> > > }
> >
> > done.
> >
> > core2:
> > 2039632 dezicycles in autocorr_double_c, 65536 runs, 0 skips
> > 771026 dezicycles in autocorr_double_sse2, 65536 runs, 0 skips
> > 524713 dezicycles in autocorr_float_sse1, 65536 runs, 0 skips
> > 500609 dezicycles in autocorr_float_sse2, 65534 runs, 2 skips
> > 432458 dezicycles in autocorr_float_ssse3, 65535 runs, 1 skips
> > overall: 4.8%
> >
> > k8:
> > 1776170 dezicycles in autocorr_double_c, 65534 runs, 2 skips
> > 1062022 dezicycles in autocorr_double_sse2, 65535 runs, 1 skips
> > 932452 dezicycles in autocorr_float_sse1, 65533 runs, 3 skips
> > 911259 dezicycles in autocorr_float_sse2, 65534 runs, 2 skips
> > overall: 2.5%
> >
>
> It seems to me that
>
> +        OP2(movhlps,  6,0, 7,1)\
> +        OP2(addsd,    6,0, 7,1)\
> +        "movsd   %%xmm0,    %2  \n\t"\
> +        "movsd   %%xmm1,  8+%2  \n\t"\
>
> can be optimized to
>
>          "haddpd  %%xmm7,  %%xmm6 \n\t"
>          "movapd  %%xmm6,    %2   \n\t"
>
> when SSE3 is available.
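Michael's blocking idea from the thread, written out as a self-contained C sketch. The names `blocked_sum`, `naive_sum` and `BLOCK` are hypothetical, not FFmpeg code. Each block of 128 terms is accumulated in a `float`, and only the per-block partial sums go into a `double`, so single-precision rounding error builds up over at most 128 additions before being handed to the wider accumulator. Unlike the quoted loop, which relies on the length being a multiple of 128 (16000 happens to be 125 * 128), this version also handles a short final block.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 128 /* hypothetical block length, matching the i&127 test */

/* Sum n floats: a float accumulator within each block of BLOCK terms,
 * a double accumulator across blocks. */
double blocked_sum(const float *x, size_t n)
{
    assert(n == 0 || x != NULL);
    double total = 0.0;
    size_t i = 0;
    while (i < n) {
        size_t end = (n - i > BLOCK) ? i + BLOCK : n; /* last block may be short */
        float partial = 0.0f;
        for (; i < end; i++)
            partial += x[i];
        total += partial; /* widen to double once per block */
    }
    return total;
}

/* Plain single-precision sum over the whole array, for comparison. */
float naive_sum(const float *x, size_t n)
{
    assert(n == 0 || x != NULL);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```

Calling `blocked_sum(x, 16384)` on an array filled with `0.1f` should stay much closer to the exact result than `naive_sum`, whose single accumulator drops low-order bits once it grows into the thousands. In an autocorrelation, the same pattern would apply to each lag's sum of products.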

Benchmarking only this piece of code (the 6-instruction SSE2 sequence vs.
the 2-instruction SSE3 one), SSE3 is merely about 1% faster on a K8 but
85% faster on a Prescott. I don't have access to any Core 2, though.
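The movhlps/addsd vs. haddpd equivalence quoted above can be checked with compiler intrinsics instead of inline asm. A minimal sketch, assuming an x86 compiler (GCC or Clang) that supports `__attribute__((target(...)))`, so the SSE3 function builds without a global `-msse3`; the function names are hypothetical. The SSE2 path uses `_mm_unpackhi_pd` (unpckhpd) in place of movhlps to fetch the high lane, then `addsd` and two scalar stores; the SSE3 path is a single `_mm_hadd_pd` (haddpd) and one 16-byte store. The loads exist only because this sketch takes its inputs from memory; in the quoted code the values are already in xmm6 and xmm7.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <pmmintrin.h> /* SSE3 intrinsics: _mm_hadd_pd */

/* SSE2 path: out[0] = a[0] + a[1], out[1] = b[0] + b[1].
 * Mirrors the quoted sequence: fetch the high lane, addsd, two movsd stores. */
__attribute__((target("sse2")))
void hsum2_sse2(const double a[2], const double b[2], double out[2])
{
    assert(a && b && out);
    __m128d va  = _mm_loadu_pd(a);
    __m128d vb  = _mm_loadu_pd(b);
    __m128d ahi = _mm_unpackhi_pd(va, va);      /* {a[1], a[1]} */
    __m128d bhi = _mm_unpackhi_pd(vb, vb);      /* {b[1], b[1]} */
    _mm_store_sd(&out[0], _mm_add_sd(va, ahi)); /* low lane: a[0] + a[1] */
    _mm_store_sd(&out[1], _mm_add_sd(vb, bhi)); /* low lane: b[0] + b[1] */
}

/* SSE3 path: haddpd yields {a[0] + a[1], b[0] + b[1]} in one instruction,
 * so a single 16-byte store replaces the two scalar stores. */
__attribute__((target("sse3")))
void hsum2_sse3(const double a[2], const double b[2], double out[2])
{
    assert(a && b && out);
    _mm_storeu_pd(out, _mm_hadd_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

`_mm_storeu_pd` is used so `out` need not be 16-byte aligned; the quoted `movapd` store is only valid if `%2` is 16-byte aligned. Choosing between the two paths at run time would also need a CPU-feature check, since not every SSE2-capable CPU has SSE3.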

-- 
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6



