[FFmpeg-devel] [FFmpeg-cvslog] r12171 - trunk/doc/optimization.txt

Michael Niedermayer michaelni
Thu Feb 21 20:11:20 CET 2008


On Thu, Feb 21, 2008 at 08:52:17PM +0200, İsmail Dönmez wrote:
> Hi,
> 
> >Author: melanson
> >Date: Thu Feb 21 19:46:49 2008
> >New Revision: 12171
> >
> >Log:
> >minor English corrections
> >
> >
> >Modified:
> >  trunk/doc/optimization.txt
> [...]
> >  -Use asm() instead of intrinsics. Later requires a good optimizing compiler
> >  +Use asm() instead of intrinsics. The latter requires a good optimizing compiler
> >   which gcc is not.
> 
> We all know this is FUD now. I know Michael still uses gcc 2.95, but
> the world has moved on; GCC 4.3 is about to be released.
> So please either back up these claims or note that this is not true for
> recent GCCs.

I use gcc r132072 at the moment. I admit it's a few days old; do you claim
that gcc was rewritten yesterday?

Also, to back up the claim, the following change was suggested to me a few days ago:
-static inline void diff_pixels_mmx(DCTELEM *block, const uint8_t *s1, const uint8_t *s2, int stride)
+static void diff_pixels_mmx(DCTELEM *block, const uint8_t *s1, const uint8_t *s2, long stride)
 {
-    asm volatile(
-        "pxor %%mm7, %%mm7              \n\t"
-        "mov $-128, %%"REG_a"           \n\t"
-        ASMALIGN(4)
-        "1:                             \n\t"
-        "movq (%0), %%mm0               \n\t"
-        "movq (%1), %%mm2               \n\t"
-        "movq %%mm0, %%mm1              \n\t"
-        "movq %%mm2, %%mm3              \n\t"
-        "punpcklbw %%mm7, %%mm0         \n\t"
-        "punpckhbw %%mm7, %%mm1         \n\t"
-        "punpcklbw %%mm7, %%mm2         \n\t"
-        "punpckhbw %%mm7, %%mm3         \n\t"
-        "psubw %%mm2, %%mm0             \n\t"
-        "psubw %%mm3, %%mm1             \n\t"
-        "movq %%mm0, (%2, %%"REG_a")    \n\t"
-        "movq %%mm1, 8(%2, %%"REG_a")   \n\t"
-        "add %3, %0                     \n\t"
-        "add %3, %1                     \n\t"
-        "add $16, %%"REG_a"             \n\t"
-        "jnz 1b                         \n\t"
-        : "+r" (s1), "+r" (s2)
-        : "r" (block+64), "r" ((long)stride)
-        : "%"REG_a
-    );
+    long offset = -128;
+    MOVQ_ZERO(mm7);
+    do {
+        asm volatile(
+            "movq (%0), %%mm0         \n\t"
+            "movq (%1), %%mm2         \n\t"
+            "movq %%mm0, %%mm1        \n\t"
+            "movq %%mm2, %%mm3        \n\t"
+            "punpcklbw %%mm7, %%mm0   \n\t"
+            "punpckhbw %%mm7, %%mm1   \n\t"
+            "punpcklbw %%mm7, %%mm2   \n\t"
+            "punpckhbw %%mm7, %%mm3   \n\t"
+            "psubw %%mm2, %%mm0       \n\t"
+            "psubw %%mm3, %%mm1       \n\t"
+            "movq %%mm0, (%2, %4)     \n\t"
+            "movq %%mm1, 8(%2, %4)    \n\t"
+            : : "r" (s1), "r" (s2), "r" (block+64), "r" (stride), "r" (offset)
+            : "memory");
+        s1 += stride;
+        s2 += stride;
+        offset += 16;
+    } while (offset < 0);
 }

The effect that has on the generated asm is:
.L143:
        .loc 3 241 0
        leaq    (%rsi,%r8), %rdx
        leaq    (%r10,%r8), %rax
#APP
# 241 "dsputil_mmx.c" 1
        movq (%rdx), %mm0
        movq (%rax), %mm2
        movq %mm0, %mm1
        movq %mm2, %mm3
        punpcklbw %mm7, %mm0
        punpckhbw %mm7, %mm1
        punpcklbw %mm7, %mm2
        punpckhbw %mm7, %mm3
        psubw %mm2, %mm0
        psubw %mm3, %mm1
        movq %mm0, (%rdi, %r9)
        movq %mm1, 8(%rdi, %r9)

# 0 "" 2
        .loc 3 258 0
#NO_APP
        addq    %rcx, %r8
        .loc 3 259 0
        addq    $16, %r9
        jne     .L143
-------------

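As you can see, gcc injects 2 unneeded lea instructions into the innermost loop:
the asm block only accepts plain (%0) and (%1) addresses, so gcc recomputes
base + offset for both sources on every iteration instead of advancing the
two pointers in place.
And I think this is very simple asm. If you want, you can try this with some
more complex code, but I recommend that you have a few bags for vomit ready ...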
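
For readers who do not read MMX, here is a plain C sketch of the two
addressing schemes involved. It is not part of the patch and not FFmpeg code:
the function names diff_pixels_ptr and diff_pixels_idx are made up for
illustration, and DCTELEM is assumed to be a 16-bit type (which the psubw and
the 16 bytes stored per row suggest). The first variant advances both source
pointers by the stride, as the hand-written loop does with its two "add"
instructions; the second keeps the bases fixed and adds a running row offset,
which is roughly the form the two lea instructions in the listing compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 16-bit coefficient type, matching the word-sized stores. */
typedef int16_t DCTELEM;

/* Variant A: bump both source pointers by stride after each row.
 * This mirrors "add %3, %0" / "add %3, %1" in the hand-written loop. */
static void diff_pixels_ptr(DCTELEM *block, const uint8_t *s1,
                            const uint8_t *s2, ptrdiff_t stride)
{
    assert(block && s1 && s2);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[x] = (DCTELEM)(s1[x] - s2[x]);
        block += 8;       /* 8 coefficients = 16 bytes per row */
        s1    += stride;
        s2    += stride;
    }
}

/* Variant B: fixed bases plus one running offset. Each row needs two
 * address computations (base + row), analogous to the two leaq
 * instructions gcc emitted in front of the asm block. */
static void diff_pixels_idx(DCTELEM *block, const uint8_t *s1,
                            const uint8_t *s2, ptrdiff_t stride)
{
    assert(block && s1 && s2);
    ptrdiff_t row = 0;
    for (int y = 0; y < 8; y++) {
        const uint8_t *p1 = s1 + row;   /* cf. leaq (%rsi,%r8), %rdx */
        const uint8_t *p2 = s2 + row;   /* cf. leaq (%r10,%r8), %rax */
        for (int x = 0; x < 8; x++)
            block[8 * y + x] = (DCTELEM)(p1[x] - p2[x]);
        row += stride;                  /* cf. addq %rcx, %r8 */
    }
}
```

Both variants compute the same 8x8 block of differences; the only difference
is how the row addresses are formed, which is exactly where the extra
instructions in the listing above come from.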

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the
dead. -- Aristotle 