[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)

Paweł Sikora pluto
Wed Feb 1 01:56:21 CET 2006


On Wednesday, 1 February 2006 01:39, Aurelien Jacobs wrote:
> Paweł Sikora <pluto at pld-linux.org> wrote:

> > hmmm, the 4.1/4.0 fixed_transpose4x4 are equal but the benchmarks differ.
> > maybe orig_transpose4x4 has different prologue?
>
> seems so.
>
> > [ 4.1 / -O2 ]
> > orig_transpose4x4:
> >         leal    (%rdx,%rdx), %r9d
> >         leal    (%rcx,%rcx), %eax
> >         movslq  %edx,%r11
> >         movslq  %ecx,%r8
> >         movslq  %r9d,%r10
> >         addl    %edx, %r9d
> >         movslq  %eax,%rdx
> >         addl    %ecx, %eax
> >         movslq  %r9d,%r9
> >         cltq

> [ 4.0 / -O2 ]
> orig_transpose4x4:
>         leal    (%rdx,%rdx), %r8d
>         movslq  %edx,%r10
>         leaq    (%rcx,%rcx,2), %rax
>         movslq  %r8d,%r9
>         addl    %edx, %r8d
>         movslq  %r8d,%r8

yeah, gcc 4.1 gives worse code for the orig_transpose4x4 prologue, so my
first benchmark can be sent to /dev/null. moreover, the second fix
(s/int/long/) simplifies the x86-64 prologue and gives a measurable gain.
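
to illustrate why s/int/long/ helps, here is a minimal sketch. it is not
the real transpose4x4; sum_rows_int and sum_rows_long are hypothetical
helpers made up for this mail, and it assumes an LP64 target (x86-64
Linux) where long is 64 bits wide. with int strides the compiler has to
sign-extend every stride multiple (the movslq lines in the listings above)
before it can add it to a 64-bit pointer; with long strides the values are
already pointer-sized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the prologue pattern: reads one byte from
 * each of four rows. The stride is a 32-bit int, so stride, 2*stride and
 * 3*stride are computed in 32 bits and must each be sign-extended to
 * 64 bits before being added to the pointer on x86-64. */
static unsigned sum_rows_int(const uint8_t *src, int stride)
{
    return src[0] + src[stride] + src[2 * stride] + src[3 * stride];
}

/* Same computation with a long stride. On an LP64 target the stride is
 * already 64 bits wide, so no sign extension is needed and the offsets
 * can be formed directly with 64-bit address arithmetic. */
static unsigned sum_rows_long(const uint8_t *src, long stride)
{
    return src[0] + src[stride] + src[2 * stride] + src[3 * stride];
}
```

both functions return the same result for any valid stride, including
negative ones; comparing the output of gcc -O2 -S for the two is a quick
way to see the difference in the generated prologue.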

thx for tests.

-- 
to_be || !to_be == 1, to_be | ~to_be == -1




