[Ffmpeg-devel] SSE load and store doubts

Balatoni Denes dbalatoni
Thu Oct 6 12:59:07 CEST 2005


Hi!

What I should have written is: explicit loading and storing is not necessarily 
needed, as e.g. many instructions accept a memory operand. And even if they 
don't, the compiler apparently loads (and stores) the data on demand.
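
To illustrate, here is a minimal sketch (not taken from ffmpeg; the names
scale_via_pointer, scale_via_load_store, variants_agree and MANY are made up
for this example, and it assumes gcc on x86 with SSE enabled). Both variants
multiply an aligned float array by a constant. The first dereferences a
__m128 pointer, the second uses explicit _mm_load_ps()/_mm_store_ps(). With
optimization turned on I would expect both to end up as ordinary SSE loads,
multiplies and stores, but check the disassembly of your own build.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

#define MANY 16  /* hypothetical array length, must be a multiple of 4 */

/* Variant 1: access the data through a __m128 pointer.
 * No explicit load/store intrinsic; the compiler generates the memory
 * accesses itself when *reg is read and written. */
static void scale_via_pointer(float *data, int n, float factor)
{
    __m128 *reg = (__m128 *)data;
    const __m128 f = _mm_set1_ps(factor);
    int i;

    assert(((uintptr_t)data & 15) == 0);  /* aligned access needs 16-byte alignment */
    for (i = 0; i < n; i += 4) {
        *reg = _mm_mul_ps(*reg, f);
        reg++;  /* skip to the next 4 floats */
    }
}

/* Variant 2: explicit aligned load and store around the computation. */
static void scale_via_load_store(float *data, int n, float factor)
{
    const __m128 f = _mm_set1_ps(factor);
    int i;

    assert(((uintptr_t)data & 15) == 0);
    for (i = 0; i < n; i += 4) {
        __m128 reg = _mm_load_ps(&data[i]);
        reg = _mm_mul_ps(reg, f);
        _mm_store_ps(&data[i], reg);
    }
}

/* Runs both variants on identical input; returns 1 if the results match. */
static int variants_agree(void)
{
    float a[MANY] __attribute__((aligned(16)));
    float b[MANY] __attribute__((aligned(16)));
    int i;

    for (i = 0; i < MANY; i++)
        a[i] = b[i] = i * 0.5f;

    scale_via_pointer(a, MANY, 3.0f);
    scale_via_load_store(b, MANY, 3.0f);

    for (i = 0; i < MANY; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling variants_agree() from a small main() and comparing "objdump -d"
output at -O0 and -O2 should show whether the two forms differ in your setup.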

bye
Denes

On Thursday, 6 October 2005 at 12:08, Roberto Pariset wrote these wise 
thoughts:
> hello and thanks a lot for your prompt answer!
>
> On Thu, 06/10/2005 at 11.44 +0200, Balatoni Denes wrote:
> > Hi!
> >
> > If do_stuff doesn't use the MMX/SSE intrinsics, then SSE instructions
> > usually won't be used (the compiler itself could still decide to use
> > SSE, but then it is not because of you using the intrinsics).
>
> well, let's assume do_something() actually does SSE :)
>
> > On another note I didn't actually find anything resembling your example
> > in current ffmpeg.
>
> (source: fft_sse.c)
> try looking at r and z: the code actually uses no _mm_store_ps() call for
> them... hence my doubts...
>
> void ff_fft_calc_sse(FFTContext *s, FFTComplex *z)
> {
>     int ln = s->nbits;
>     int j, np, np2;
>     int nblocks, nloops;
>     register FFTComplex *p, *q;
>     FFTComplex *cptr, *cptr1;
>     int k;
>
>     np = 1 << ln;
>
>     {
>         __m128 *r, a, b, a1, c1, c2;
>
>         r = (__m128 *)&z[0];
>         c1 = *(__m128 *)p1p1m1m1;
>         c2 = *(__m128 *)p1p1p1m1;
>         if (s->inverse)
>             c2 = *(__m128 *)p1p1m1p1;
>         else
>             c2 = *(__m128 *)p1p1p1m1;
>
>         j = (np >> 2);
>         do {
>             a = r[0];
>             b = _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 0, 3, 2));
>             a = _mm_mul_ps(a, c1);
>             /* do the pass 0 butterfly */
>             a = _mm_add_ps(a, b);
>
>             a1 = r[1];
>             b = _mm_shuffle_ps(a1, a1, _MM_SHUFFLE(1, 0, 3, 2));
>             a1 = _mm_mul_ps(a1, c1);
>             /* do the pass 0 butterfly */
>             b = _mm_add_ps(a1, b);
>
>             /* multiply third by -i */
>             b = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 3, 1, 0));
>             b = _mm_mul_ps(b, c2);
>
>             /* do the pass 1 butterfly */
>             r[0] = _mm_add_ps(a, b);
>             r[1] = _mm_sub_ps(a, b);
>             r += 2;
>         } while (--j != 0);
>     }
> ...
>
> bye bye and thanks!
> rob
>
> > On Thursday, 6 October 2005 at 11:10, Roberto Pariset wrote these wise
> > thoughts:
> > > hello everyone.
> > >
> > > please consider the following example:
> > >
> > > /* imagine long_array filled with floats here */
> > > float long_array[MANY] __attribute__((aligned(16)));
> > >
> > > i often[1] see this kind of code:
> > > __m128 *reg = (__m128 *) long_array;
> > > for(i=0; i<MANY; i+=4)
> > > {
> > > 	do_stuff();
> > > 	reg++; /* skip to next 4 floats of long_array */
> > > }
> > >
> > > while i'd expect the following:
> > > __m128 reg;
> > > for(i=0; i<MANY; i+=4)
> > > {
> > > 	reg = _mm_load_ps( &long_array[i] );
> > > 	reg = do_stuff();
> > > 	_mm_store_ps( &long_array[i], reg );
> > > }
> > >
> > > if i compile and disassemble a simple example like the one above, i see
> > > the first doesn't actually use XMMn registers, while the second does:
> > >
> > >     reg = _mm_load_ps( &long_array[i] );
> > > 400548:    48 8d 7d 80             lea    0xffffffffffffff80(%rbp),%rdi
> > > 40054c:    e8 67 00 00 00          callq  4005b8 <_mm_load_ps>
> > > 400551:    0f 29 45 e0             movaps %xmm0,0xffffffffffffffe0(%rbp)
> > >
> > >     __m128 *reg = (__m128 *) long_array;
> > > 400555:    48 8d 85 70 ff ff ff    lea    0xffffffffffffff70(%rbp),%rax
> > > 40055c:    48 89 45 f0             mov    %rax,0xfffffffffffffff0(%rbp)
> > >
> > > so, basically, i am not sure if this is an error or not, as i am just a
> > > n00b with SSE. to me, it seems that the first syntax is not taking
> > > advantage of sse register, so it'd not make things faster. i might be
> > > wrong, of course. i just wanted to point it out, and would appreciate
> > > much if i could get some explanations, as i haven't found any on the
> > > web (all the code i have found use either load/store or pointer with no
> > > apparent difference, and none explains motivation of the choice).
> > > thanks a lot,
> > > roberto
> > >
> > >
> > >
> > >
> > > [1] as in ffmpeg-0.4.9-pre1/libavcodec/i386/
> > >
> > > _______________________________________________
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel at mplayerhq.hu
> > > http://mplayerhq.hu/mailman/listinfo/ffmpeg-devel
> > >

-- 
---
What kills me, doesn't make me stronger.




