[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Ronald S. Bultje rsbultje
Fri Sep 24 23:10:35 CEST 2010


Hi,

On Fri, Sep 24, 2010 at 4:50 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> The only way to do it is to use "m" and put the entire address in the input
> constraint.
>
>         "movd %0, %%mm1 \n"
>         "por  %1, %%mm1 \n"
>         ::"m"(nnz[b_idx]),
>           "m"(nnz[b_idx+d_idx])
>     );

Ah, that works.
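
For anyone following along, here is a minimal standalone sketch of the same constraint trick, using general-purpose registers instead of MMX so it can be compiled on its own. The helper name or_words and the uint32_t table are made up for illustration and are not taken from the patch; the non-x86 fallback is only there so the snippet builds everywhere.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: OR two 32-bit table entries.
 * Both loads are passed as "m" operands, so the compiler chooses the
 * addressing mode itself instead of being handed a precomputed pointer. */
static uint32_t or_words(const uint32_t *tab, long idx, long delta)
{
    uint32_t out;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__(
        "movl %1, %0 \n\t"          /* out  = tab[idx]          */
        "orl  %2, %0 \n\t"          /* out |= tab[idx + delta]  */
        : "=&r"(out)                /* early clobber: written before %2 is read */
        : "m"(tab[idx]),
          "m"(tab[idx + delta])
        : "cc"
    );
#else
    /* Portable fallback for non-x86 targets (assumption for illustration). */
    out = tab[idx] | tab[idx + delta];
#endif
    return out;
}
```

With "m" the compiler is free to fold the index into a base+index*scale operand; with "r" it has to materialize each address in a register first, which is typically where an extra lea comes from.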

after: 887 dezicycles in lf-strength, 4194133 runs, 171 skips
before: 964 dezicycles in lf-strength, 4194097 runs, 207 skips

See the attached patch, to be applied after my original first patch
(reattached here for convenience), which went from 116 to 96 cycles.
This is pretty much the performance gain that yasm gave me as well;
I'm only 2 cycles off now.

                    __asm__ volatile(
                        "movd        (%0), %%mm0 \n"
                        "psubb       (%1), %%mm0 \n" // ref[b] != ref[bn]
                        "movq        (%2), %%mm1 \n"
                        "movq       8(%2), %%mm2 \n"
                        "psubw       (%3), %%mm1 \n"
                        "psubw      8(%3), %%mm2 \n"
                        "packsswb   %%mm2, %%mm1 \n"
                        "paddb      %%mm6, %%mm1 \n"
                        "psubusb    %%mm5, %%mm1 \n" // abs(mv[b] - mv[bn]) >= limit
                        "packsswb   %%mm1, %%mm1 \n"
                        "por        %%mm1, %%mm0 \n"
                        ::"r"(ref[0]+b_idx),
                          "r"(ref[0]+b_idx+d_idx),
                          "r"(mv[0]+b_idx),
                          "r"(mv[0]+b_idx+d_idx)
                    );

then leads to:

0x000000010041eb00 <h264_loop_filter_strength_mmx2+144>:	lea    (%r10,%rbp,1),%rdx
0x000000010041eb04 <h264_loop_filter_strength_mmx2+148>:	lea    (%r12,%r10,4),%rax
0x000000010041eb08 <h264_loop_filter_strength_mmx2+152>:	movd   (%rdx),%mm0
0x000000010041eb0b <h264_loop_filter_strength_mmx2+155>:	psubb  (%rdx,%r13,1),%mm0
0x000000010041eb10 <h264_loop_filter_strength_mmx2+160>:	movq   (%rax),%mm1
0x000000010041eb13 <h264_loop_filter_strength_mmx2+163>:	movq   0x8(%rax),%mm2
0x000000010041eb17 <h264_loop_filter_strength_mmx2+167>:	psubw  (%rax,%r13,4),%mm1
0x000000010041eb1c <h264_loop_filter_strength_mmx2+172>:	psubw  0x8(%rax,%r13,4),%mm2
0x000000010041eb22 <h264_loop_filter_strength_mmx2+178>:	packsswb %mm2,%mm1
0x000000010041eb25 <h264_loop_filter_strength_mmx2+181>:	paddb  %mm6,%mm1
0x000000010041eb28 <h264_loop_filter_strength_mmx2+184>:	psubusb %mm5,%mm1
0x000000010041eb2b <h264_loop_filter_strength_mmx2+187>:	packsswb %mm1,%mm1
0x000000010041eb2e <h264_loop_filter_strength_mmx2+190>:	por    %mm1,%mm0

I'm still relatively unhappy about the lea instructions all around
(they might hurt performance on x86-32; I'll test that later, I have
to go now). But it mostly works the way I want it to.

Michael, can you review both patches?

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-inline-asm-lessvars.patch
Type: application/octet-stream
Size: 4202 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100924/400fb1a7/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-inline-asm.patch
Type: application/octet-stream
Size: 2946 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100924/400fb1a7/attachment-0001.obj>


