[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Michael Niedermayer michaelni
Fri Sep 24 23:22:45 CEST 2010


On Fri, Sep 24, 2010 at 05:10:35PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Fri, Sep 24, 2010 at 4:50 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> > The only way to do it is to use "m" and put the entire address in the input
> > constraint.
> >
> >         "movd %0, %%mm1 \n"
> >         "por  %1, %%mm1 \n"
> >         ::"m"(nnz[b_idx]),
> >           "m"(nnz[b_idx+d_idx])
> >     );
> 
> Ah, that works.
> 
> after: 887 dezicycles in lf-strength, 4194133 runs, 171 skips
> before: 964 dezicycles in lf-strength, 4194097 runs, 207 skips
> 
> See attached patch, to be applied after my original first patch -
> reattached here for convenience - that went from 116 to 96 cycles.
> This is pretty much the performance gain that yasm gave me also, I'm
> only 2 cycles off now.
> 
>                     __asm__ volatile(
>                         "movd        (%0), %%mm0 \n"
>                         "psubb       (%1), %%mm0 \n" // ref[b] != ref[bn]
>                         "movq        (%2), %%mm1 \n"
>                         "movq       8(%2), %%mm2 \n"
>                         "psubw       (%3), %%mm1 \n"
>                         "psubw      8(%3), %%mm2 \n"
>                         "packsswb   %%mm2, %%mm1 \n"
>                         "paddb      %%mm6, %%mm1 \n"
>                         "psubusb    %%mm5, %%mm1 \n" // abs(mv[b] - mv[bn]) >= limit
>                         "packsswb   %%mm1, %%mm1 \n"
>                         "por        %%mm1, %%mm0 \n"
>                         ::"r"(ref[0]+b_idx),
>                           "r"(ref[0]+b_idx+d_idx),
>                           "r"(mv[0]+b_idx),
>                           "r"(mv[0]+b_idx+d_idx)
>                     );
> 
> then leads to:
> 
> 0x000000010041eb00 <h264_loop_filter_strength_mmx2+144>:	lea    (%r10,%rbp,1),%rdx
> 0x000000010041eb04 <h264_loop_filter_strength_mmx2+148>:	lea    (%r12,%r10,4),%rax
> 0x000000010041eb08 <h264_loop_filter_strength_mmx2+152>:	movd   (%rdx),%mm0
> 0x000000010041eb0b <h264_loop_filter_strength_mmx2+155>:	psubb  (%rdx,%r13,1),%mm0
> 0x000000010041eb10 <h264_loop_filter_strength_mmx2+160>:	movq   (%rax),%mm1
> 0x000000010041eb13 <h264_loop_filter_strength_mmx2+163>:	movq   0x8(%rax),%mm2
> 0x000000010041eb17 <h264_loop_filter_strength_mmx2+167>:	psubw  (%rax,%r13,4),%mm1
> 0x000000010041eb1c <h264_loop_filter_strength_mmx2+172>:	psubw  0x8(%rax,%r13,4),%mm2
> 0x000000010041eb22 <h264_loop_filter_strength_mmx2+178>:	packsswb %mm2,%mm1
> 0x000000010041eb25 <h264_loop_filter_strength_mmx2+181>:	paddb  %mm6,%mm1
> 0x000000010041eb28 <h264_loop_filter_strength_mmx2+184>:	psubusb %mm5,%mm1
> 0x000000010041eb2b <h264_loop_filter_strength_mmx2+187>:	packsswb %mm1,%mm1
> 0x000000010041eb2e <h264_loop_filter_strength_mmx2+190>:	por    %mm1,%mm0
> 
> I'm still relatively unhappy about the leas all around (this might
> have a negative performance impact on x86-32, will test that later,
> have to go now). But it mostly works the way I want it to.
> 
> Michael, can you review both patches?
> 
> Ronald

>  h264dsp_mmx.c |   42 ++++++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 20 deletions(-)
> 8b974e42f4b86f653d1def64b086946940cf8201  fix-lfstrength-inline-asm-lessvars.patch


[...]
>  h264dsp_mmx.c |   54 +++++++++++++++++++++++++++++-------------------------
>  1 file changed, 29 insertions(+), 25 deletions(-)
> cb33468fc082b96bad9ad7899f357e4133fe7294  fix-lfstrength-inline-asm.patch

great work, both patches ok if tested
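
For readers following along, here is a minimal sketch of the "m"-constraint
idea quoted above. It is not code from either patch. It swaps the MMX
movd/por pair for plain movl/orl on a general-purpose register so that it
needs no emms and can be compiled standalone; or_nnz_pair and nnz4 are
hypothetical names, and nnz is assumed to be a flat byte array. The point
is the same as in the quoted snippet: each "m" operand hands the compiler
the whole address expression, so it is free to fold b_idx and d_idx into a
base+index addressing mode rather than building pointers in extra registers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte view, so each "m" operand declares exactly the bytes read. */
typedef struct { uint8_t b[4]; } nnz4;

/* Hypothetical helper (not from the patch): OR the 4 bytes at nnz[b_idx]
 * with the 4 bytes at nnz[b_idx + d_idx]. */
static uint32_t or_nnz_pair(const uint8_t *nnz, int b_idx, int d_idx)
{
    uint32_t out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ (
        "movl %1, %0 \n"   /* load 4 bytes from the first memory operand  */
        "orl  %2, %0 \n"   /* OR in 4 bytes from the second memory operand */
        : "=&r"(out)       /* early-clobber: written before %2 is read */
        : "m"(*(const nnz4 *)(nnz + b_idx)),
          "m"(*(const nnz4 *)(nnz + b_idx + d_idx))
        : "cc"
    );
#else
    /* Portable fallback for non-x86 targets or compilers without GNU asm. */
    uint32_t a, b;
    memcpy(&a, nnz + b_idx, 4);
    memcpy(&b, nnz + b_idx + d_idx, 4);
    out = a | b;
#endif
    return out;
}
```

Compiling this with -O2 -S next to a variant that passes "r"(nnz + b_idx)
and dereferences inside the asm string is one way to see which addressing
modes a given compiler picks; the outcome depends on compiler version and
flags.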


-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus