[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)

Aurelien Jacobs aurel
Wed Feb 1 00:01:14 CET 2006


On Tue, 31 Jan 2006 23:37:04 +0100
Paweł Sikora <pluto at pld-linux.org> wrote:

> On Tuesday, 31 January 2006 21:25, matthieu castet wrote:
> > Hi Paweł,
> >
> > Paweł Sikora wrote:
> > > Hi all,
> > >
> > > I have an implementation of transpose4x4 in C which uses gcc's vector
> > > extensions. It puts less pressure on the register allocator and allows
> > > optimal code scheduling.
> > >
> > > Instantiating the attached patch, e.g. as foo(dst, src, 4, 4),
> > > gives a nice piece of code:
> > >
> > > [ x86-64 example ]
> > >
> > > foo:    movd        4(%rsi), %mm0
> > >         movd        (%rsi), %mm1
> > >         movd        8(%rsi), %mm2
> > >         movd        12(%rsi), %mm3
> > >         punpcklbw   %mm0, %mm1
> > >         punpcklbw   %mm3, %mm2
> > >         movq        %mm1, %mm0
> > >         punpckhwd   %mm2, %mm1
> > >         punpcklwd   %mm2, %mm0
> > >         movd        %mm1, 8(%rdi)
> > >         punpckhdq   %mm1, %mm1
> > >         movd        %mm0, (%rdi)
> > >         punpckhdq   %mm0, %mm0
> > >         movd        %mm1, 12(%rdi)
> > >         movd        %mm0, 4(%rdi)
> > >         ret
> > >
> > > Actually, gcc-4.1 has a good optimizer, and hardcoding asm
> > > doesn't bring an incredible performance boost, only a degradation
> > > of code scheduling.
> >
> > Could you post a benchmark comparing the two versions?
> 
> I did a simple benchmark with transpose4x4 marked with attribute noinline.
> 
> results:
> 
> orig:  iters = 1000000000, dt = 7.92 [avg]
> fixed: iters = 1000000000, dt = 7.35 [avg]
> 
> we gain: ~7.2%

That sounds interesting, but here, with gcc-4.0.2 on amd64, I get
rather different results:

orig:  iters = 1000000000, dt = 12.16
fixed: iters = 1000000000, dt = 173.86

So it seems that gcc-4.1 brings some spectacular improvements in this area,
but this code really shouldn't be enabled with gcc-4.0, where the fixed
version is about 14 times slower here.
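
A minimal sketch of how such a guard could look, assuming a hypothetical
macro name (USE_VECTOR_TRANSPOSE) and assuming the two trailing arguments of
foo(dst, src, 4, 4) are the destination and source strides. The 4.1 threshold
comes only from the two measurements in this thread, and the vector
implementation from the patch is not reproduced, so both branches call a
plain C reference here:

```c
#include <stdint.h>

/* Hypothetical switch: enable the vector-extension path only for GCC >= 4.1. */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1))
#define USE_VECTOR_TRANSPOSE 1
#else
#define USE_VECTOR_TRANSPOSE 0
#endif

/* Plain C reference: dst[j][i] = src[i][j] for a 4x4 block of bytes. */
static void transpose4x4_c(uint8_t *dst, const uint8_t *src,
                           int dst_stride, int src_stride)
{
    int i, j;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            dst[j * dst_stride + i] = src[i * src_stride + j];
}

static void transpose4x4(uint8_t *dst, const uint8_t *src,
                         int dst_stride, int src_stride)
{
#if USE_VECTOR_TRANSPOSE
    /* The vector-extension version from the patch would be called here;
     * this sketch substitutes the C reference for it. */
    transpose4x4_c(dst, src, dst_stride, src_stride);
#else
    /* gcc-4.0 and other compilers: stay on the existing code path. */
    transpose4x4_c(dst, src, dst_stride, src_stride);
#endif
}
```

Note that compilers which imitate GCC's version macros would also pass this
check, so a real build would probably want a configure-time test instead.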

Aurel
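
For readers following the quoted assembly: below is a small portable C model
of the unpack sequence, written as an illustration and not taken from the
patch. The type mm_t and the helper names are hypothetical stand-ins for the
MMX registers and instructions. Helper arguments are in (destination, source)
order, and both strides are assumed to be 4 bytes as in the quoted listing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 64-bit MMX register; b[0] is the lowest byte. */
typedef struct { uint8_t b[8]; } mm_t;

/* movd load: 4 bytes into the low half, upper half zeroed. */
static mm_t movd_load(const uint8_t *p)
{
    mm_t r = {{0}};
    memcpy(r.b, p, 4);
    return r;
}

/* movd store: write the low 4 bytes. */
static void movd_store(uint8_t *p, mm_t r)
{
    memcpy(p, r.b, 4);
}

/* punpcklbw: interleave the low 4 bytes of d and s. */
static mm_t punpcklbw(mm_t d, mm_t s)
{
    mm_t r;
    int i;
    for (i = 0; i < 4; i++) {
        r.b[2 * i]     = d.b[i];
        r.b[2 * i + 1] = s.b[i];
    }
    return r;
}

/* punpcklwd: interleave the low two 16-bit words of d and s. */
static mm_t punpcklwd(mm_t d, mm_t s)
{
    mm_t r;
    memcpy(r.b + 0, d.b + 0, 2);
    memcpy(r.b + 2, s.b + 0, 2);
    memcpy(r.b + 4, d.b + 2, 2);
    memcpy(r.b + 6, s.b + 2, 2);
    return r;
}

/* punpckhwd: interleave the high two 16-bit words of d and s. */
static mm_t punpckhwd(mm_t d, mm_t s)
{
    mm_t r;
    memcpy(r.b + 0, d.b + 4, 2);
    memcpy(r.b + 2, s.b + 4, 2);
    memcpy(r.b + 4, d.b + 6, 2);
    memcpy(r.b + 6, s.b + 6, 2);
    return r;
}

/* punpckhdq: high dword of d, then high dword of s. */
static mm_t punpckhdq(mm_t d, mm_t s)
{
    mm_t r;
    memcpy(r.b + 0, d.b + 4, 4);
    memcpy(r.b + 4, s.b + 4, 4);
    return r;
}

/* Same instruction order as the quoted listing. */
static void transpose4x4_model(uint8_t *dst, const uint8_t *src)
{
    mm_t mm0 = movd_load(src + 4);   /* row 1 */
    mm_t mm1 = movd_load(src);       /* row 0 */
    mm_t mm2 = movd_load(src + 8);   /* row 2 */
    mm_t mm3 = movd_load(src + 12);  /* row 3 */

    mm1 = punpcklbw(mm1, mm0);       /* rows 0 and 1, bytes interleaved */
    mm2 = punpcklbw(mm2, mm3);       /* rows 2 and 3, bytes interleaved */
    mm0 = mm1;
    mm1 = punpckhwd(mm1, mm2);       /* columns 2 and 3 */
    mm0 = punpcklwd(mm0, mm2);       /* columns 0 and 1 */
    movd_store(dst + 8, mm1);        /* output row 2 = column 2 */
    mm1 = punpckhdq(mm1, mm1);       /* move column 3 to the low half */
    movd_store(dst, mm0);            /* output row 0 = column 0 */
    mm0 = punpckhdq(mm0, mm0);       /* move column 1 to the low half */
    movd_store(dst + 12, mm1);       /* output row 3 = column 3 */
    movd_store(dst + 4, mm0);        /* output row 1 = column 1 */
}
```

Two byte unpacks pair up the rows, two word unpacks then yield two output
rows per register, which is why the listing needs only four loads and four
stores.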




