[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Michael Niedermayer michaelni
Fri Sep 24 23:25:16 CEST 2010


On Fri, Sep 24, 2010 at 11:22:45PM +0200, Michael Niedermayer wrote:
> On Fri, Sep 24, 2010 at 05:10:35PM -0400, Ronald S. Bultje wrote:
> > Hi,
> > 
> > On Fri, Sep 24, 2010 at 4:50 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> > > The only way to do it is to use "m" and put the entire address in the input
> > > constraint.
> > >
> > >        "movd %0, %%mm1 \n"
> > >        "por  %1, %%mm1 \n"
> > >        ::"m"(nnz[b_idx]),
> > >          "m"(nnz[b_idx+d_idx])
> > >    );
> > 
> > Ah, that works.
> > 
> > after: 887 dezicycles in lf-strength, 4194133 runs, 171 skips
> > before: 964 dezicycles in lf-strength, 4194097 runs, 207 skips
> > 
> > See attached patch, to be applied after my original first patch -
> > reattached here for convenience - that went from 116 to 96 cycles.
> > This is pretty much the performance gain that yasm gave me also, I'm
> > only 2 cycles off now.
> > 
> >                     __asm__ volatile(
> >                         "movd        (%0), %%mm0 \n"
> >                         "psubb       (%1), %%mm0 \n" // ref[b] != ref[bn]
> >                         "movq        (%2), %%mm1 \n"
> >                         "movq       8(%2), %%mm2 \n"
> >                         "psubw       (%3), %%mm1 \n"
> >                         "psubw      8(%3), %%mm2 \n"
> >                         "packsswb   %%mm2, %%mm1 \n"
> >                         "paddb      %%mm6, %%mm1 \n"
> >                         "psubusb    %%mm5, %%mm1 \n" // abs(mv[b] - mv[bn]) >= limit
> >                         "packsswb   %%mm1, %%mm1 \n"
> >                         "por        %%mm1, %%mm0 \n"
> >                         ::"r"(ref[0]+b_idx),
> >                           "r"(ref[0]+b_idx+d_idx),
> >                           "r"(mv[0]+b_idx),
> >                           "r"(mv[0]+b_idx+d_idx)
> >                     );
> > 
> > then leads to:
> > 
> > 0x000000010041eb00 <h264_loop_filter_strength_mmx2+144>:	lea    (%r10,%rbp,1),%rdx
> > 0x000000010041eb04 <h264_loop_filter_strength_mmx2+148>:	lea    (%r12,%r10,4),%rax
> > 0x000000010041eb08 <h264_loop_filter_strength_mmx2+152>:	movd   (%rdx),%mm0
> > 0x000000010041eb0b <h264_loop_filter_strength_mmx2+155>:	psubb  (%rdx,%r13,1),%mm0
> > 0x000000010041eb10 <h264_loop_filter_strength_mmx2+160>:	movq   (%rax),%mm1
> > 0x000000010041eb13 <h264_loop_filter_strength_mmx2+163>:	movq   0x8(%rax),%mm2
> > 0x000000010041eb17 <h264_loop_filter_strength_mmx2+167>:	psubw  (%rax,%r13,4),%mm1
> > 0x000000010041eb1c <h264_loop_filter_strength_mmx2+172>:	psubw  0x8(%rax,%r13,4),%mm2
> > 0x000000010041eb22 <h264_loop_filter_strength_mmx2+178>:	packsswb %mm2,%mm1
> > 0x000000010041eb25 <h264_loop_filter_strength_mmx2+181>:	paddb  %mm6,%mm1
> > 0x000000010041eb28 <h264_loop_filter_strength_mmx2+184>:	psubusb %mm5,%mm1
> > 0x000000010041eb2b <h264_loop_filter_strength_mmx2+187>:	packsswb %mm1,%mm1
> > 0x000000010041eb2e <h264_loop_filter_strength_mmx2+190>:	por    %mm1,%mm0
> > 
> > I'm still relatively unhappy about the leas all around (this might
> > have a negative performance impact on x86-32, will test that later,
> > have to go now). But it mostly works the way I want it to.
> > 
> > Michael can you review both patches?
> > 
> > Ronald
> 
> >  h264dsp_mmx.c |   42 ++++++++++++++++++++++--------------------
> >  1 file changed, 22 insertions(+), 20 deletions(-)
> > 8b974e42f4b86f653d1def64b086946940cf8201  fix-lfstrength-inline-asm-lessvars.patch
> 
> 
> [...]
> >  h264dsp_mmx.c |   54 +++++++++++++++++++++++++++++-------------------------
> >  1 file changed, 29 insertions(+), 25 deletions(-)
> > cb33468fc082b96bad9ad7899f357e4133fe7294  fix-lfstrength-inline-asm.patch
> 
> great work, both patches ok if tested

and they pass tests aka not ok ;)
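
For anyone reading this thread later, the "m" constraint approach Loren
describes above can be tried in isolation with a minimal sketch like the
one below. It is not code from either patch: or_nnz_pair() is a
hypothetical helper, it uses general-purpose registers (movl/orl) instead
of MMX so that no emms is needed, and the "memory" clobber is an extra
safety measure because each "m" operand is declared as one byte while the
instruction reads four. The non-x86 branch is a plain C fallback added
only so the example builds everywhere.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): OR the 4 bytes starting at
 * nnz[b_idx] with the 4 bytes starting at nnz[b_idx + d_idx]. */
static uint32_t or_nnz_pair(const uint8_t *nnz, int b_idx, int d_idx)
{
    uint32_t out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* With "m", the whole address expression is handed to the compiler,
     * which is then free to pick the addressing mode itself instead of
     * being forced to materialize a pointer in a register first. */
    __asm__ volatile(
        "movl %1, %0 \n"
        "orl  %2, %0 \n"
        : "=&r"(out)                 /* early clobber: written before %2 is read */
        : "m"(nnz[b_idx]),
          "m"(nnz[b_idx + d_idx])
        : "cc", "memory"
    );
#else
    /* Portable fallback (assumption: only here so the sketch compiles
     * on non-x86 targets). */
    uint32_t a, b;
    memcpy(&a, nnz + b_idx, 4);
    memcpy(&b, nnz + b_idx + d_idx, 4);
    out = a | b;
#endif
    return out;
}
```

Compiling this with -O2 and looking at the disassembly is a quick way to
check whether the compiler folds the index arithmetic into the memory
operands or still emits separate lea instructions.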

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Republics decline into democracies and democracies degenerate into
despotisms. -- Aristotle