[FFmpeg-devel] [PATCH] h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.

Reimar Döffinger Reimar.Doeffinger at gmx.de
Wed Dec 3 22:39:00 CET 2014


On Wed, Dec 03, 2014 at 01:19:48PM +0100, Michael Niedermayer wrote:
> On Wed, Dec 03, 2014 at 09:00:39AM +0100, Reimar Döffinger wrote:
> > On 03.12.2014, at 01:40, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > On Sat, Nov 22, 2014 at 02:09:01PM +0100, Reimar Döffinger wrote:
> > >> On Mon, Nov 17, 2014 at 01:41:13PM +0100, Michael Niedermayer wrote:
> > >>> On Mon, Nov 17, 2014 at 08:19:32AM +0100, Reimar Döffinger wrote:
> > >>>> On 17.11.2014, at 02:37, Michael Niedermayer <michaelni at gmx.at> wrote:
> > >>>>> On Sat, Nov 15, 2014 at 06:16:03PM +0100, Reimar Döffinger wrote:
> > >>>>>> 11674 -> 10877 decicycles on my Phenom II.
> > >>>>>> Overall speedup was unfortunately within measurement error.
> > >>>>> 
> > >>>>> here it's 10153 -> 10135
> > >>>> 
> > >>>> I suspect it also depends a bit on the compiler and how it changes the surrounding code.
> > >>>> Note that I also tested with PIC actually.
> > >>>> 
> > >>>>> but I have a slightly odd feeling about the changes to the asm code,
> > >>>>> I am not sure if all assemblers will be happy about the changed
> > >>>>> code
> > >>>> 
> > >>>> Do you mean particularly the movzbl change?
> > >>> 
> > >>> yes and the k stuff
> > >>> 
> > >>> 
> > >>>> I am also unsure about that, I think there was a reason for that %k6 mess...
> > >>>> But this as well as movzx seemed to work for me...
> > >>> 
> > >>> it works here too, I just have the feeling it might fail on some odd
> > >>> assembler or platform. That's not meant to keep you from pushing this,
> > >>> just that it might need to be reverted or fixed if such
> > >>> problems actually occur
> > >> 
> > >> I pushed it.
> > >> If anyone sees issues please tell me and I'll look into it!
> > > 
> > > I think these FATE failures are caused by it, but that's based just
> > > on other commits in the range looking unlikely:
> > > 
> > > http://fate.ffmpeg.org/report.cgi?time=20141122231657&slot=x86_64-darwin-clang-3.5-O3
> > > http://fate.ffmpeg.org/report.cgi?time=20141122223720&slot=x86_64-darwin-clang-3.5
> > 
> > That's annoying, I only expected compile errors, this looks more like a compiler bug.
> > Can someone run tests?
> > Does just using the "m" instead of "r" constraint like on 32 bit fix it?
> 
> still aborts with:

Oh dear.
On re-reading the code it seems I got a bit confused about what %0 actually
points to (I somehow thought it pointed to the on-stack x86_reg).
I can't test and benchmark today, but I think this one might fix it:
--- a/libavcodec/x86/h264_i386.h
+++ b/libavcodec/x86/h264_i386.h
@@ -178,7 +178,7 @@ static int decode_significance_8x8_x86(CABACContext *c,
 
         "mov %2, %0                             \n\t"
         "mov %1, %6                             \n\t"
-        "mov %6, (%0)                           \n\t"
+        "mov %k6, (%0)                          \n\t"
 
         "test $1, %4                            \n\t"
         " jnz 5f                                \n\t"
@@ -191,7 +191,7 @@ static int decode_significance_8x8_x86(CABACContext *c,
         "cmp $63, %6                            \n\t"
         " jb 3b                                 \n\t"
         "mov %2, %0                             \n\t"
-        "mov %6, (%0)                           \n\t"
+        "mov %k6, (%0)                          \n\t"
         "5:                                     \n\t"
         "addl %8, %k0                           \n\t"
         "shr $2, %k0                            \n\t"
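
For readers unfamiliar with the %k operand modifier: in GCC/clang inline
asm on x86, %kN prints the 32-bit name of register operand N (e.g. %eax
rather than %rax). The sketch below is not the FFmpeg code. The helper
names store_full, store_low32 and neighbour_intact are hypothetical, and
it assumes the destination is a 32-bit int slot, so that a full-width
store on x86-64 would also overwrite the four bytes that follow. On other
targets a memcpy fallback only approximates the same store widths.

```c
#include <stdint.h>
#include <string.h>

/* Store `value` with a register-width move. With a 64-bit register
 * operand on x86-64 this writes 8 bytes starting at `slot`. */
static void store_full(int32_t *slot, uintptr_t value)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("mov %1, (%0)" : : "r"(slot), "r"(value) : "memory");
#else
    /* Fallback: copy sizeof(uintptr_t) bytes (4 on 32-bit targets). */
    memcpy(slot, &value, sizeof(value));
#endif
}

/* Store only the low 32 bits: %k1 selects the 32-bit register name,
 * so exactly 4 bytes are written. */
static void store_low32(int32_t *slot, uintptr_t value)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("mov %k1, (%0)" : : "r"(slot), "r"(value) : "memory");
#else
    /* Fallback: truncate to 32 bits and copy exactly 4 bytes. */
    int32_t low = (int32_t)value;
    memcpy(slot, &low, sizeof(low));
#endif
}

/* Returns 1 if the slot after the destination kept its old contents,
 * 0 if the store spilled into it. */
static int neighbour_intact(void (*store)(int32_t *, uintptr_t))
{
    int32_t slots[2] = { -1, -1 };
    store(slots, 5);
    return slots[1] == -1;
}
```

Calling neighbour_intact(store_low32) should report the neighbouring
slot as untouched, while neighbour_intact(store_full) on x86-64 should
report it as overwritten, which is the kind of silent corruption a plain
"mov %6, (%0)" could cause when %6 is a 64-bit register.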


