[FFmpeg-devel] [PATCH] faster vp6 decoding

Jason Garrett-Glaser darkshikari
Mon Feb 9 10:27:58 CET 2009


+    "punpcklbw %%mm7, %%mm0\n\t"                                \
+    "punpcklbw %%mm7, %%mm1\n\t"                                \
+    "punpckhbw %%mm7, %%mm3\n\t"                                \
+    "punpckhbw %%mm7, %%mm4\n\t"                                \
+    "pmullw  0(%2), %%mm0\n\t" /* src[x-8 ] * biweight [0] */   \
+    "pmullw  8(%2), %%mm1\n\t" /* src[x   ] * biweight [1] */   \
+    "pmullw  0(%2), %%mm3\n\t" /* src[x-8 ] * biweight [0] */   \
+    "pmullw  8(%2), %%mm4\n\t" /* src[x   ] * biweight [1] */   \
+    "paddw %%mm1, %%mm0\n\t"                                    \
+    "paddw %%mm4, %%mm3\n\t"                                    \

This can be done faster with pmaddubsw. It is SSSE3-only, but surely
worth a separate version, particularly if you are writing an SSE
version anyway. It works by interleaving the weights, which lets you
avoid the unpacks, use only two multiplies, and avoid the adds too, I
think. If I'm right, that brings the whole thing to quite a bit less
than half the instructions.
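
To make the arithmetic concrete, here is a rough scalar C model of the idea, not the actual patch. pmaddubsw multiplies unsigned bytes from one operand by signed bytes from the other and adds each adjacent pair into a saturated signed word. The helper names (sat16, pmaddubsw_model, blend_unpack, blend_maddubs) are made up for illustration. The sketch assumes the weights fit in a signed byte and that the pixel bytes are interleaved with each other (a single punpcklbw/punpckhbw pair) rather than unpacked against zero.

```c
#include <stdint.h>

/* Saturate to the signed 16-bit range, as pmaddubsw does on its sums. */
static int16_t sat16(int32_t v)
{
    if (v >  32767) return  32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Scalar model of one 64-bit pmaddubsw:
 * a[] = 8 unsigned bytes, b[] = 8 signed bytes, out[] = 4 signed words. */
static void pmaddubsw_model(const uint8_t a[8], const int8_t b[8],
                            int16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = sat16(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]);
}

/* Model of the quoted sequence for 4 pixels: widen to words, two pmullw,
 * one paddw. Assumes the sums stay within 16 bits (no wraparound). */
static void blend_unpack(const uint8_t p[4], const uint8_t q[4],
                         int16_t w0, int16_t w1, int16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(p[i] * w0 + q[i] * w1);
}

/* Model of the suggested sequence: interleave the two pixel sources and
 * the two weights, then do a single multiply-add. */
static void blend_maddubs(const uint8_t p[4], const uint8_t q[4],
                          int8_t w0, int8_t w1, int16_t out[4])
{
    uint8_t px[8];
    int8_t  wt[8];
    for (int i = 0; i < 4; i++) {
        px[2 * i]     = p[i];   /* what punpcklbw p, q would produce */
        px[2 * i + 1] = q[i];
        wt[2 * i]     = w0;     /* weights pre-interleaved once, up front */
        wt[2 * i + 1] = w1;
    }
    pmaddubsw_model(px, wt, out);
}
```

Calling blend_unpack and blend_maddubs on the same four pixel pairs with small weights gives identical words. If a weight can reach 128 or more it no longer fits the signed operand, and the trick needs rescaled weights or a different split.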

Dark Shikari



