[FFmpeg-devel] [RFC/PATCH] More flexible variafloat_to_int16 , WMA optimization, Vorbis

Loren Merritt lorenm
Tue Jul 15 23:01:12 CEST 2008


On Tue, 15 Jul 2008, Michael Niedermayer wrote:
> On Tue, Jul 15, 2008 at 08:58:23AM -0600, Loren Merritt wrote:
>> On Tue, 15 Jul 2008, Michael Niedermayer wrote:
> [...]
>>> It also might be worth to look at mplayer/liba52/resample_mmx.c, maybe some
>>> of that code could be reused. Especially as we do not have a MMX
>>> float_to_int16, besides the trick used could be tried with SSE2.
>>
>> I'm not very interested in optimizing for pentium2 / k6-1. I'm not sure I
>> could, anyway; that's so far removed from anything I can benchmark on.
>
> Well, maybe you are interested in a Merom-2M
> Your SSE2                           : 16009
> My ancient MMX trick ported to SSE2 : 14764

Don't forget to include the cost of add_bias, since you're returning to 
[384.0,386.0] scale.
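For reference, the bias trick both versions rely on can be sketched in scalar C. This is a rough sketch under stated assumptions: samples are nominally in [-1.0, 1.0), and the function names float_to_int16_bias and float_to_int16_interleave2 are hypothetical, not FFmpeg API. Adding 385.0f moves a sample into [384.0, 386.0), where the float exponent is fixed at 2^8 and one ulp is 2^-15, so subtracting the bit pattern of 385.0f (0x43c08000) from the float's bits leaves round(x * 32768). The clamp stands in for the saturation that packssdw performs.

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern of 385.0f: exponent 2^8, mantissa 1.50390625. */
#define BIAS_385_BITS 0x43c08000

/* Hypothetical helper: convert one sample to int16 with the bias trick.
 * Assumes x + 385.0f stays inside [256, 512) (roughly x in [-129, 127)),
 * so the exponent never changes and the subtraction cannot overflow. */
static int16_t float_to_int16_bias(float x)
{
    float biased = x + 385.0f;          /* round-to-nearest happens here */
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits); /* reinterpret, like using the xmm reg as ints */
    int32_t v = bits - BIAS_385_BITS;    /* psubd: leaves round(x * 32768) */
    if (v > INT16_MAX) v = INT16_MAX;    /* packssdw saturates */
    if (v < INT16_MIN) v = INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical two-channel interleaver, mirroring what
 * float_to_int16_interleave does for stereo output. */
static void float_to_int16_interleave2(int16_t *dst, const float *ch0,
                                       const float *ch1, long len)
{
    for (long i = 0; i < len; i++) {
        dst[2 * i]     = float_to_int16_bias(ch0[i]);
        dst[2 * i + 1] = float_to_int16_bias(ch1[i]);
    }
}
```

Note that 1.0f maps to 386.0f, i.e. 32768 before saturation, which is why the clamp (or packssdw) is needed at the top of the range.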

Merom-2M (T5470), 1024 samples, 2 channels
svn sse2 : 14751
your sse2: 13630 + bias during windowing or something
below    : 17237

@@ -2223,9 +2225,15 @@
 )
 
 FLOAT_TO_INT16_INTERLEAVE(sse2,
+    "movdqa ff_pd_0x43c08000, %%xmm7 \n"
+    "movdqa ff_ps_385, %%xmm6   \n"
     "1:                         \n"
-    "cvtps2dq  (%2,%0), %%xmm0  \n"
-    "cvtps2dq  (%3,%0), %%xmm1  \n"
+    "movdqa    (%2,%0), %%xmm0  \n"
+    "movdqa    (%3,%0), %%xmm1  \n"
+    "addps      %%xmm6, %%xmm0  \n"
+    "addps      %%xmm6, %%xmm1  \n"
+    "psubd      %%xmm7, %%xmm0  \n"
+    "psubd      %%xmm7, %%xmm1  \n"
     "packssdw   %%xmm1, %%xmm0  \n"
     "movhlps    %%xmm0, %%xmm1  \n"
     "punpcklwd  %%xmm1, %%xmm0  \n"

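For readers who prefer intrinsics, the patched loop body might be rendered with SSE2 intrinsics roughly as follows. This is a sketch, not the patch itself: it mirrors the asm step by step (addps with 385.0f, psubd with 0x43c08000, packssdw, movhlps, punpcklwd), but it uses unaligned loads and stores and a scalar tail, neither of which the patch has, and the names interleave2_sse2 and bias_convert1 are hypothetical. Without __SSE2__ it falls back to the scalar loop.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical scalar helper: same bias trick, one sample at a time. */
static int16_t bias_convert1(float x)
{
    float biased = x + 385.0f;
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits);
    int32_t v = bits - 0x43c08000;
    return (int16_t)(v > 32767 ? 32767 : v < -32768 ? -32768 : v);
}

/* Hypothetical: interleave two float channels (nominally [-1.0, 1.0))
 * into int16 stereo, 4 samples per channel per SSE2 iteration. */
static void interleave2_sse2(int16_t *dst, const float *ch0,
                             const float *ch1, long len)
{
    long i = 0;
#if defined(__SSE2__)
    const __m128  bias  = _mm_set1_ps(385.0f);        /* like ff_ps_385 */
    const __m128i magic = _mm_set1_epi32(0x43c08000); /* like ff_pd_0x43c08000 */
    for (; i + 4 <= len; i += 4) {
        /* load + addps, then treat the float bits as int32 lanes */
        __m128i a = _mm_castps_si128(_mm_add_ps(_mm_loadu_ps(ch0 + i), bias));
        __m128i b = _mm_castps_si128(_mm_add_ps(_mm_loadu_ps(ch1 + i), bias));
        a = _mm_sub_epi32(a, magic);                   /* psubd */
        b = _mm_sub_epi32(b, magic);
        __m128i packed = _mm_packs_epi32(a, b);        /* packssdw: ch0[0..3] | ch1[0..3] */
        __m128i hi = _mm_unpackhi_epi64(packed, packed); /* movhlps: ch1 words to low half */
        __m128i out = _mm_unpacklo_epi16(packed, hi);  /* punpcklwd: ch0,ch1,ch0,ch1,... */
        _mm_storeu_si128((__m128i *)(dst + 2 * i), out);
    }
#endif
    for (; i < len; i++) { /* tail, or the whole buffer without SSE2 */
        dst[2 * i]     = bias_convert1(ch0[i]);
        dst[2 * i + 1] = bias_convert1(ch1[i]);
    }
}
```

Compared with the old cvtps2dq path, this trades one conversion per register for an addps plus a psubd, which is the extra work reflected in the "below" timing.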

--Loren Merritt