[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)

Aurelien Jacobs aurel
Wed Feb 1 01:08:59 CET 2006


On Wed, 1 Feb 2006 00:21:56 +0100
Paweł Sikora <pluto at pld-linux.org> wrote:

> On Wednesday, 1 February 2006 00:01, Aurelien Jacobs wrote:
> 
> > > orig:  iters = 1000000000, dt = 7.92 [avg]
> > > fixed: iters = 1000000000, dt = 7.35 [avg]
> > >
> > > we gain: ~7.2%
> >
> > That sounds interesting, but here, with gcc-4.0.2 on amd64, I get some
> > rather different results:
> >
> > orig:  iters = 1000000000, dt = 12.16
> > fixed: iters = 1000000000, dt = 173.86
> >
> > So it seems that gcc-4.1 gives some spectacular improvements in this area,
> > but this code really shouldn't be enabled with gcc-4.0.
> 
> I would like to see the asm dump of the 4.0.2 output.
> It seems to be a gcc bug.

Oh! My bad... stupid me. I just forgot -O3 when compiling!
Now here are some better results:

  orig:  iters = 1000000000, dt = 5.04
  fixed: iters = 1000000000, dt = 5.47

So the fixed version is still slower (by about 8.5%), but that's a much
more reasonable difference.

Here is the asm code that gcc-4.0.2 generates for fixed_transpose4x4:

        movslq  %ecx,%rax
        movd    (%rsi), %mm1
        movd    (%rsi,%rax), %mm3
        leal    (%rcx,%rcx), %eax
        movslq  %eax,%r8
        addl    %ecx, %eax
        punpcklbw       %mm3, %mm1
        cltq
        movd    (%rsi,%r8), %mm2
        movd    (%rsi,%rax), %mm0
        movslq  %edx,%rax
        punpcklbw       %mm0, %mm2
        movq    %mm1, %mm0
        punpcklwd       %mm2, %mm0
        punpckhwd       %mm2, %mm1
        movd    %mm0, (%rdi)
        punpckhdq       %mm0, %mm0
        movd    %mm0, (%rdi,%rax)
        leal    (%rdx,%rdx), %eax
        movslq  %eax,%rcx
        addl    %edx, %eax
        movd    %mm1, (%rdi,%rcx)
        punpckhdq       %mm1, %mm1
        cltq
        movd    %mm1, (%rdi,%rax)
        ret
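
For anyone reading the listing without the C source at hand, here is a rough scalar sketch of what this code appears to compute: a 4x4 byte transpose. The name transpose4x4_ref is made up for illustration, and the argument order (dst, src, dst_stride, src_stride) is only inferred from the registers used above (%rdi, %rsi, %edx, %ecx under the usual amd64 calling convention), so treat it as an assumption and not as the actual function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the code from the patch.
 * Reads four rows of four bytes from src (rows src_stride bytes apart)
 * and writes them as columns to dst (rows dst_stride bytes apart).
 * The MMX listing gets the same result with punpcklbw, punpcklwd,
 * punpckhwd and punpckhdq instead of a loop. */
static void transpose4x4_ref(uint8_t *dst, const uint8_t *src,
                             int dst_stride, int src_stride)
{
    assert(dst && src);
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            dst[col * dst_stride + row] = src[row * src_stride + col];
}
```

A loop like this could serve as a correctness check for both the orig and fixed variants before comparing their timings.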

Aurel




