[Ffmpeg-devel] [PATCH] Snow mmx+sse2 asm optimizations

Robert Edele yartrebo
Wed Mar 8 15:49:22 CET 2006


On Tue, 2006-03-07 at 20:52 -0500, Robert Edele wrote:
> On Tue, 2006-03-07 at 20:39 -0500, Robert Edele wrote:
> > On Wed, 2006-03-08 at 01:31 +0100, Michael Niedermayer wrote:
> > > Hi
> > > 
> > > On Tue, Mar 07, 2006 at 07:07:06PM -0500, Robert Edele wrote:
> > > > On Tue, 2006-03-07 at 15:34 -0800, Loren Merritt wrote:
> > > > > On Tue, 7 Mar 2006, Robert Edele wrote:
> > > > > > On Mon, 2006-03-06 at 02:06 +0100, Michael Niedermayer wrote:
> > > > > >> On Sun, Mar 05, 2006 at 06:09:09PM -0500, Robert Edele wrote:
> > > > > >>> +        ::
> > > > > >>> +        "m"(b0),"m"(b1),"m"(b2),"m"(b3),"m"(b4),"m"(b5),"d"(end_w2):
> > > > > >>> +        "%"REG_a"","%"REG_b"","%"REG_c"");
> > > > > >>
> > > > > > >> this code is not valid, REG_d is changed but is neither an output nor on the clobber list
> > > > > >
> > > > > > REG_d is on the input list, so GCC recognizes it as clobbered? GCC
> > > > > > > also refuses to let me put REG_d into the clobber list. I believe the
> > > > > > code is good as is?
> > > > > 
> > > > > If it's both input and clobbered, put it on the output list with "+d".
> > > > 
> > > > Putting "+d"(end_w2) or "=d"(end_w2) in the output list causes the
> > > > program to crash. 
> > > 
> > > well, you will have to debug it i fear if you want to see the code in cvs
> > > 
> > > 
> > > > The variable is not used after the asm block.
> > > 
> > > maybe not at the C level but at the asm level gcc is free to use the register
> > > and unless you mark it as changed you MUST guarantee that it didn't change
> > > these are the rules of gcc asm, you can look it up in the gcc manual
> > > page or some introduction to gcc asm (try google)
> > > 
> > > [...]
> > > 
> > I found two bugs, one of which was just as you said - a clobber list
> > issue. The other bug doesn't affect the result but could affect cache
> > performance. Both have been fixed and the code now runs with REG_d in
> > the clobber list.
> > 
> > The new and hopefully correct patch is attached.
> 
> I forgot to fix the similar REG_c bug in inner_add_yblock. This has now
> been fixed. New patch attached.
> 
> Robert Edele

The makefile on CVS has been updated and my patch no longer applies
cleanly. I've updated my patch and attached it to this e-mail.
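
For anyone reading the archive, here is a minimal sketch of the constraint
rule discussed above: a register that the asm both reads and modifies is
listed as a read-write output ("+d"), not as a plain input. This is not code
from the patch. The function name sum_countdown and its arguments are
hypothetical, and the plain C fallback is only there so the example builds on
non-x86 targets.

```c
/* Hypothetical helper, not taken from the Snow patch.
 * Sums src[0..n-1] with a loop that counts its register operand down to
 * zero, so the asm modifies the register that carries n. */
static int sum_countdown(const int *src, long n)
{
    int total = 0;

    if (n <= 0)
        return 0;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile(
        "1:                      \n\t"
        "addl -4(%2,%1,4), %0    \n\t" /* total += src[n-1]              */
        "dec  %1                 \n\t" /* n is changed inside the asm    */
        "jnz  1b                 \n\t"
        : "+r"(total),  /* read and written by the asm                    */
          "+d"(n)       /* read and written: "+d", not an input-only "d"  */
        : "r"(src)      /* only read, so a plain input is correct         */
        : "cc", "memory" /* flags change; memory is read through src      */
    );
#else
    /* Portable fallback with the same result, for non-x86 builds. */
    while (n > 0)
        total += src[--n];
#endif

    return total;
}
```

With int v[4] = {1, 2, 3, 4}, sum_countdown(v, 4) returns 10. If n were
listed only as an input "d"(n), GCC would be entitled to assume the register
still holds the original value after the asm block, which is the kind of
breakage described in the quoted replies.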

Sincerely,
Robert Edele
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snow_mmx.patch
Type: text/x-patch
Size: 93974 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20060308/f886d701/attachment.bin>


