[FFmpeg-devel] [PATCH] h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.
michaelni at gmx.at
Wed Dec 3 13:19:48 CET 2014
On Wed, Dec 03, 2014 at 09:00:39AM +0100, Reimar Döffinger wrote:
> On 03.12.2014, at 01:40, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Sat, Nov 22, 2014 at 02:09:01PM +0100, Reimar Döffinger wrote:
> >> On Mon, Nov 17, 2014 at 01:41:13PM +0100, Michael Niedermayer wrote:
> >>> On Mon, Nov 17, 2014 at 08:19:32AM +0100, Reimar Döffinger wrote:
> >>>> On 17.11.2014, at 02:37, Michael Niedermayer <michaelni at gmx.at> wrote:
> >>>>> On Sat, Nov 15, 2014 at 06:16:03PM +0100, Reimar Döffinger wrote:
> >>>>>> 11674 -> 10877 decicycles on my Phenom II.
> >>>>>> Overall speedup was unfortunately within measurement error.
> >>>>> Here it's 10153 -> 10135.
> >>>> I suspect it also depends a bit on the compiler and how it changes the surrounding code.
> >>>> Note that I also tested with PIC actually.
> >>>>> But I have a slightly odd feeling about the changes to the asm code;
> >>>>> I am not sure all assemblers will be happy with the changed
> >>>>> code.
> >>>> Do you mean particularly the movzbl change?
> >>> Yes, and the %k stuff.
> >>>> I am also unsure about that, I think there was a reason for that %k6 mess...
> >>>> But this as well as movzx seemed to work for me...
> >>> It works here too; I just have the feeling it might fail on some odd
> >>> assembler or platform. That's not meant to keep you from pushing this,
> >>> just that it might need to be reverted or fixed if such
> >>> problems actually occur.
> >> I pushed it.
> >> If anyone sees issues please tell me and I'll look into it!
> > I think these FATE failures are caused by it, but that's based only
> > on the other commits in the range looking unlikely:
> > http://fate.ffmpeg.org/report.cgi?time=20141122231657&slot=x86_64-darwin-clang-3.5-O3
> > http://fate.ffmpeg.org/report.cgi?time=20141122223720&slot=x86_64-darwin-clang-3.5
> That's annoying, I only expected compile errors, this looks more like a compiler bug.
> Can someone run tests?
> Does just using the "m" instead of "r" constraint like on 32 bit fix it?
It still aborts with:
@@ -37,7 +37,7 @@
-#define REG64 "r"
+#define REG64 "m"
#define REG64 "m"
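For readers not familiar with the constraint being toggled here, the following is a minimal sketch of the difference between an "r" and an "m" input operand, and of the %k operand modifier mentioned earlier in the thread. It is not the FFmpeg code: the function names load_byte_reg and load_byte_mem are hypothetical, and it assumes GCC-style extended asm in AT&T syntax on x86-64, falling back to plain C elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: zero-extending byte load with the index in a
 * register ("r" constraint). %k0 prints the 32-bit name of the register
 * chosen for operand 0 (e.g. %eax instead of %rax); writing the 32-bit
 * register clears the upper half of the 64-bit one. */
static inline uint32_t load_byte_reg(const uint8_t *table, uint64_t idx)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t out;
    __asm__ ("movzbl (%1, %2), %k0"
             : "=r"(out)
             : "r"(table), "r"(idx)
             : "memory");
    return (uint32_t)out;
#else
    return table[idx];          /* portable fallback */
#endif
}

/* Same load, but the index is handed over as a memory operand ("m"
 * constraint), so the asm has to fetch it into a register itself.
 * The output is early-clobber ("=&r") because it is written before
 * operand 1 has been read. */
static inline uint32_t load_byte_mem(const uint8_t *table, uint64_t idx)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t out;
    __asm__ ("mov %2, %0\n\t"
             "movzbl (%1, %0), %k0"
             : "=&r"(out)
             : "r"(table), "m"(idx)
             : "memory");
    return (uint32_t)out;
#else
    return table[idx];          /* portable fallback */
#endif
}
```

Both variants should return the same value; the difference is only in where the compiler has to keep idx before entering the asm, which is what the REG64 define selects.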
ggdb does not show much that is useful:
Program received signal SIGABRT, Aborted.
0x00007fff82a31866 in ?? ()
#0 0x00007fff82a31866 in ?? ()
#1 0x00007fff8ec4735c in ?? ()
warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
#2 0x0000000000000000 in ?? ()
(gdb) disassemble $rip-32,$rip+32
Dump of assembler code from 0x7fff82a31846 to 0x7fff82a31886:
0x00007fff82a31846: add %eax,(%rax)
0x00007fff82a31848: add -0x77(%rcx),%cl
0x00007fff82a3184b: lret $0x50f
0x00007fff82a3184e: jae 0x7fff82a31858
0x00007fff82a31850: mov %rax,%rdi
0x00007fff82a31853: jmpq 0x7fff82a2e175
0x00007fff82a3185c: mov $0x2000148,%eax
0x00007fff82a31861: mov %rcx,%r10
=> 0x00007fff82a31866: jae 0x7fff82a31870
0x00007fff82a31868: mov %rax,%rdi
0x00007fff82a3186b: jmpq 0x7fff82a2e175
0x00007fff82a31874: mov $0x200014c,%eax
0x00007fff82a31879: mov %rcx,%r10
0x00007fff82a3187e: jae 0x7fff82a31888
0x00007fff82a31880: mov %rax,%rdi
0x00007fff82a31883: jmpq 0x7fff82a2e175
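The constants loaded into %eax in the dump above look like Darwin x86-64 syscall numbers, where the top byte is the syscall class and the low 24 bits are the index within that class. A small sketch of the decoding, assuming that layout (a shift of 24 as in XNU's syscall_sw.h; the helper names are made up for illustration):

```c
#include <stdint.h>

/* Assumed layout of %eax for a Darwin x86-64 syscall:
 * bits 24..31 = class (2 would be the BSD/Unix class),
 * bits 0..23  = index within that class. */
#define SYSCALL_CLASS_SHIFT 24
#define SYSCALL_CLASS_MASK  (0xFFu << SYSCALL_CLASS_SHIFT)

/* Hypothetical helpers, not part of any system header. */
static inline unsigned darwin_syscall_class(uint32_t eax)
{
    return (eax & SYSCALL_CLASS_MASK) >> SYSCALL_CLASS_SHIFT;
}

static inline unsigned darwin_syscall_index(uint32_t eax)
{
    return eax & ~SYSCALL_CLASS_MASK;
}
```

With this, 0x2000148 decodes to class 2, index 328, and 0x200014c to class 2, index 332. If index 328 is __pthread_kill, as I believe it is in the XNU syscall table, then the program counter is inside the kill stub that abort() goes through, which would fit the SIGABRT but says nothing about where in the decoder the abort was triggered.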
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Old school: Use the lowest level language in which you can solve the problem
New school: Use the highest level language in which the latest supercomputer
can solve the problem without the user falling asleep waiting.