[FFmpeg-devel] [PATCH] split-radix FFT

Michael Niedermayer michaelni
Tue Jul 29 14:05:46 CEST 2008


On Tue, Jul 29, 2008 at 01:43:43PM +0200, Michael Niedermayer wrote:
> On Tue, Jul 29, 2008 at 12:22:59AM -0600, Loren Merritt wrote:
> > AOn Tue, 29 Jul 2008, Michael Niedermayer wrote:
> > > On Fri, Jul 25, 2008 at 08:14:00PM -0600, Loren Merritt wrote:
> > >
> > >> +#ifdef EMULATE_3DNOWEXT
> > >> +#define PSWAPD(s,d)\
> > >> +    "movq "#s","#d"\n"\
> > >> +    "psrlq $32,"#d"\n"\
> > >> +    "punpckldq "#s","#d"\n"
> > >
> > >> +#define PSWAPD_UNARY(s)\
> > >> +    "sub $8, %%"REG_SP"\n"\
> > >> +    "movd "#s", 4(%%"REG_SP")\n"\
> > >> +    "punpckhdq (%%"REG_SP"), "#s"\n"\
> > >> +    "add $8, %%"REG_SP"\n"
> > >
> > > Gcc failed with a "+m" ?
> > 
> > No, I just designed the 3dn1 emulation of 3dn2 for simplicity (including 
> > code locality) rather than speed. I wouldn't have written it at all 
> > except that then I wouldn't be able to delete the radix-2 init code. 
> > (I still can't delete it until someone ports split-radix to altivec, 
> > but I assume that'll happen.)
> > 
> > >> +static void fft4(FFTComplex *z)
> > >>  {
> > >> -    int ln = s->nbits;
> > >> -    long j;
> > >> -    x86_reg i;
> > >> -    long nblocks, nloops;
> > >> -    FFTComplex *p, *cptr;
> > >> +    T2(z[0], z[1], %%mm0, %%mm1);
> > >> +    LOAD(z[2], %%mm2);
> > >> +    LOAD(z[3], %%mm3);
> > >> +    T4(%%mm0, %%mm1, %%mm2, %%mm3, %%mm4, %%mm5);
> > >> +    PUNPCK(%%mm0, %%mm1, %%mm4);
> > >> +    PUNPCK(%%mm2, %%mm3, %%mm5);
> > >> +    SAVE(z[0], %%mm0);
> > >> +    SAVE(z[1], %%mm4);
> > >> +    SAVE(z[2], %%mm2);
> > >> +    SAVE(z[3], %%mm5);
> > >> +}
> > >
> > > is there any reason why seperate asm() are chained? I think a single
> > > asm block, or even nasm/yasm if you prefer would be better.
> > 
> > Because it works for me, and I don't see any alternatives that are as 
> > concise.
> > yasm, ok.
> 
> I prefer code that is easy to maintain over concise code, and code that gcc
> can silently pessimize is not easy to maintain IMHO. It easily can cost
> someone quite some time to debug why some codec is slower on some gcc
> version or compiled with different flags ...
> 
> It would of course be different if such "silent pessimization" where just
> hypothetical but it isnt, gcc is really following murphis law here, if it
> can mess up it does.
> Thats why i would strogly prefer if gcc couldnt put anything at all between
> the asm parts ...
> 
> 
> > 
> > > The way its written is almost asking for gcc to put something in between,
> > > iam especially concerned about the -fPIC case and gcc putting all the GOT
> > > "magic" in between the asms ...
> > 
> > Is gcc so stupid as to emit GOT stuff when dereferencing a pointer that's 
> > already in a register, no global variables involved?
> 
> yes, examples below with gcc 4.3.1

after re-reading your question, i realize that you didnt ask for stuff
involving globals. So my reply didnt match it, sorry ive read over
it too quickly.
The GOT stuff quoted though still is not strictly needed. Even though
for some parts thats not related to the asm() being split, so again
sorry for talking nonsense ill try to avoid writing replies when iam
tired.
The 2 leal though while they are use to access globals are not needed
at all and added by gcc between asm()

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The greatest way to live with honor in this world is to be what we pretend
to be. -- Socrates
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080729/39633a42/attachment.pgp>



More information about the ffmpeg-devel mailing list