[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)

Michael Niedermayer michaelni
Wed Feb 1 21:30:15 CET 2006


Hi

On Wed, Feb 01, 2006 at 01:56:21AM +0100, Paweł Sikora wrote:
> On Wednesday, 1 February 2006 01:39, Aurelien Jacobs wrote:
> > Paweł Sikora <pluto at pld-linux.org> wrote:
> 
> > > hmmm, the 4.1 and 4.0 fixed_transpose4x4 code is identical, but the benchmarks differ.
> > > maybe orig_transpose4x4 has a different prologue?
> >
> > seems so.
> >
> > > [ 4.1 / -O2 ]
> > > orig_transpose4x4:
> > >         leal    (%rdx,%rdx), %r9d
> > >         leal    (%rcx,%rcx), %eax
> > >         movslq  %edx,%r11
> > >         movslq  %ecx,%r8
> > >         movslq  %r9d,%r10
> > >         addl    %edx, %r9d
> > >         movslq  %eax,%rdx
> > >         addl    %ecx, %eax
> > >         movslq  %r9d,%r9
> > >         cltq
> 
> > [ 4.0 / -O2 ]
> > orig_transpose4x4:
> >         leal    (%rdx,%rdx), %r8d
> >         movslq  %edx,%r10
> >         leaq    (%rcx,%rcx,2), %rax
> >         movslq  %r8d,%r9
> >         addl    %edx, %r8d
> >         movslq  %r8d,%r8
> 
> yeah, 4.1 gives worse code, and my first benchmark can be sent
> to /dev/null. moreover, the second fix (s/int/long/) simplifies the x86-64
> prologue and gives a measurable gain.

maybe we should typedef int int64_t; on x86-64? arrays where space matters
should use the intXX_t types or similar anyway
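To illustrate the s/int/long/ fix, here is a minimal sketch assuming a plain-C byte transpose rather than the MMX code being benchmarked. The names transpose4x4_int and transpose4x4_long are hypothetical. With int strides, an x86-64 compiler has to sign-extend each stride, and products such as 2*stride, to 64 bits before it can form an address. That is what the movslq instructions in the listings above are doing. With long strides on an LP64 target, the values are already register-width.

```c
#include <stdint.h>

/* Hypothetical plain-C stand-in for orig_transpose4x4: writes the
 * transpose of a 4x4 byte block (row i of dst = column i of src).
 * With int strides, x86-64 code must sign-extend the strides (and
 * products like 2*stride) to 64 bits before using them in addresses. */
static void transpose4x4_int(uint8_t *dst, const uint8_t *src,
                             int dst_stride, int src_stride)
{
    int i, j;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            dst[i * dst_stride + j] = src[j * src_stride + i];
}

/* Same routine after s/int/long/: on LP64 targets such as x86-64
 * Linux, long is already 64 bits wide, so the stride arithmetic
 * needs no sign extension. */
static void transpose4x4_long(uint8_t *dst, const uint8_t *src,
                              long dst_stride, long src_stride)
{
    long i, j;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            dst[i * dst_stride + j] = src[j * src_stride + i];
}
```

One way to compare the two is to compile both with gcc -O2 -S on x86-64 and diff the prologues. The exact instruction sequence depends on the gcc version, as the 4.0 and 4.1 listings show.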

opinions?

benchmarks?

[...]
-- 
Michael
