[FFmpeg-devel] [PATCH] h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.

Michael Niedermayer michaelni at gmx.at
Wed Dec 3 13:19:48 CET 2014


On Wed, Dec 03, 2014 at 09:00:39AM +0100, Reimar Döffinger wrote:
> On 03.12.2014, at 01:40, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Sat, Nov 22, 2014 at 02:09:01PM +0100, Reimar Döffinger wrote:
> >> On Mon, Nov 17, 2014 at 01:41:13PM +0100, Michael Niedermayer wrote:
> >>> On Mon, Nov 17, 2014 at 08:19:32AM +0100, Reimar Döffinger wrote:
> >>>> On 17.11.2014, at 02:37, Michael Niedermayer <michaelni at gmx.at> wrote:
> >>>>> On Sat, Nov 15, 2014 at 06:16:03PM +0100, Reimar Döffinger wrote:
> >>>>>> 11674 -> 10877 decicycles on my Phenom II.
> >>>>>> Overall speedup was unfortunately within measurement error.
> >>>>> 
> >>>>> here it's 10153 -> 10135
> >>>> 
> >>>> I suspect it also depends a bit on the compiler and how it changes the surrounding code.
> >>>> Note that I also tested with PIC actually.
> >>>> 
> >>>>> but I have a slightly odd feeling about the changes to the asm code;
> >>>>> I am not sure all assemblers will be happy about the changed
> >>>>> code
> >>>> 
> >>>> Do you mean particularly the movzbl change?
> >>> 
> >>> yes and the k stuff
> >>> 
> >>> 
> >>>> I am also unsure about that, I think there was a reason for that %k6 mess...
> >>>> But this as well as movzx seemed to work for me...
> >>> 
> >>> It works here too; I just have the feeling it might fail on some odd
> >>> assembler or platform. That's not meant to keep you from pushing this,
> >>> just that it might need to be reverted or fixed if such
> >>> problems actually occur
> >> 
> >> I pushed it.
> >> If anyone sees issues please tell me and I'll look into it!
> > 
> > I think these FATE failures are caused by it, but that's based only
> > on the other commits in the range looking unlikely:
> > 
> > http://fate.ffmpeg.org/report.cgi?time=20141122231657&slot=x86_64-darwin-clang-3.5-O3
> > http://fate.ffmpeg.org/report.cgi?time=20141122223720&slot=x86_64-darwin-clang-3.5
> 
> That's annoying, I only expected compile errors, this looks more like a compiler bug.
> Can someone run tests?
> Does just using the "m" instead of "r" constraint like on 32 bit fix it?

still aborts with:

@@ -37,7 +37,7 @@
 #if HAVE_INLINE_ASM

 #if ARCH_X86_64
-#define REG64 "r"
+#define REG64 "m"
 #else
 #define REG64 "m"
 #endif
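
For anyone following along, here is a rough sketch of what the "r" versus "m" distinction means for an operand in GCC-style inline asm. It is not the h264 CABAC code: load_byte_reg and load_byte_mem are hypothetical helpers made up for illustration, and the plain C fallback exists only so the file builds on non-x86_64 targets. With "r" the compiler hands the asm a register that already holds the pointer; with "m" the asm gets a memory reference to the variable and has to load the pointer itself.

```c
#include <stdint.h>

/* Hypothetical helper, not FFmpeg code: zero-extend table[idx] into an int.
 * "r" constraint: the compiler puts the base pointer in a register. */
static int load_byte_reg(const uint8_t *table, long idx)
{
#if defined(__GNUC__) && defined(__x86_64__)
    int out;
    __asm__ ("movzbl (%1,%2), %0"
             : "=r"(out)
             : "r"(table), "r"(idx)
             : "memory");
    return out;
#else
    return table[idx];              /* portable fallback */
#endif
}

/* Same load, but the pointer variable is passed with an "m" constraint,
 * so the asm first fetches it from memory into a scratch register. */
static int load_byte_mem(const uint8_t *table, long idx)
{
#if defined(__GNUC__) && defined(__x86_64__)
    int out;
    const uint8_t *tmp;             /* scratch register for the pointer */
    __asm__ ("mov %2, %1\n\t"
             "movzbl (%1,%3), %0"
             : "=r"(out), "=&r"(tmp) /* tmp is written before idx is read */
             : "m"(table), "r"(idx)
             : "memory");
    return out;
#else
    return table[idx];              /* portable fallback */
#endif
}
```

Both helpers should return the same value for the same inputs; the only difference is one fewer live register at asm entry in the "m" variant, at the cost of an extra load.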

ggdb does not show much that is useful:
Program received signal SIGABRT, Aborted.
0x00007fff82a31866 in ?? ()
(gdb) bt
#0  0x00007fff82a31866 in ?? ()
#1  0x00007fff8ec4735c in ?? ()
warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

#2  0x0000000000000000 in ?? ()
(gdb) disassemble $rip-32,$rip+32
Dump of assembler code from 0x7fff82a31846 to 0x7fff82a31886:
   0x00007fff82a31846:  add    %eax,(%rax)
   0x00007fff82a31848:  add    -0x77(%rcx),%cl
   0x00007fff82a3184b:  lret   $0x50f
   0x00007fff82a3184e:  jae    0x7fff82a31858
   0x00007fff82a31850:  mov    %rax,%rdi
   0x00007fff82a31853:  jmpq   0x7fff82a2e175
   0x00007fff82a31858:  retq
   0x00007fff82a31859:  nop
   0x00007fff82a3185a:  nop
   0x00007fff82a3185b:  nop
   0x00007fff82a3185c:  mov    $0x2000148,%eax
   0x00007fff82a31861:  mov    %rcx,%r10
   0x00007fff82a31864:  syscall
=> 0x00007fff82a31866:  jae    0x7fff82a31870
   0x00007fff82a31868:  mov    %rax,%rdi
   0x00007fff82a3186b:  jmpq   0x7fff82a2e175
   0x00007fff82a31870:  retq
   0x00007fff82a31871:  nop
   0x00007fff82a31872:  nop
   0x00007fff82a31873:  nop
   0x00007fff82a31874:  mov    $0x200014c,%eax
   0x00007fff82a31879:  mov    %rcx,%r10
   0x00007fff82a3187c:  syscall
   0x00007fff82a3187e:  jae    0x7fff82a31888
   0x00007fff82a31880:  mov    %rax,%rdi
   0x00007fff82a31883:  jmpq   0x7fff82a2e175
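
One thing that can be read out of the dump: the program counter sits right after a syscall whose number was loaded with "mov $0x2000148,%eax". A small sketch for splitting such a value follows, assuming the usual XNU x86_64 convention that the top 8 bits of eax carry the syscall class and the low 24 bits carry the number within that class. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: XNU on x86_64 encodes eax as (class << 24) | number. */
static unsigned syscall_class(uint32_t eax)
{
    return eax >> 24;
}

static unsigned syscall_number(uint32_t eax)
{
    return eax & 0x00FFFFFFu;
}
```

For 0x2000148 this gives class 2 and number 328 (0x148). If the syscall table is as I remember it, class 2 is the Unix/BSD class and 328 is __pthread_kill, which would fit a plain abort() raising SIGABRT rather than a fault inside the asm itself; treat that name mapping as an assumption to be checked against the XNU sources.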



-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Old school: Use the lowest level language in which you can solve the problem
            conveniently.
New school: Use the highest level language in which the latest supercomputer
            can solve the problem without the user falling asleep waiting.