[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc

Jason Garrett-Glaser darkshikari
Sat Apr 25 09:50:32 CEST 2009


>> >"cvtdq2pd -8(%3,%0), %%xmm5     \n\t"   // xmm5 = smp [i-2, i-1]
>>
>> Is it really required to constantly convert in and out of floating
>> point here? Mubench ( http://akuvian.org/src/mubench_results.txt )
>> says that this operation is horrifically slow on Athlon 64, for
>> example. Why not use integer math?
>
> I realize it's slow -- I only have an Athlon 64 X2 here to test on. But
> I either need signed 32x32 multiplication (which AFAICT SSE3 doesn't
> offer) or to implement it myself on top of what is offered.
> Conversion seemed easier, but I'll try to make integer math work next.
> FWIW, my friend with an Intel chip (not sure exact model) reports what
> sounds like slower performance relative to the C code than I get on the
> Athlon 64.

You could try emulating the 32x32 multiply with smaller multiplies:
split each operand into high and low halves, multiply the halves
separately, then shift and add the partial products.  This would
probably still be faster than float conversion.
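
A rough sketch of the idea in plain scalar C (not SSE). The function
names are made up for illustration, and it assumes only the low 32 bits
of each product are needed, which may or may not hold for the residual
code. In two's complement the low 32 bits of a signed product match
those of the unsigned product, so no separate sign handling is needed
under that assumption.

```c
#include <stdint.h>

/* Hypothetical helper: low 32 bits of a 32x32 product, built only from
 * 16x16 partial products.  With a = ah*2^16 + al and b = bh*2^16 + bl:
 *   a*b mod 2^32 = al*bl + ((ah*bl + al*bh) << 16)
 * The ah*bh term is shifted out entirely (it starts at bit 32). */
uint32_t mul32_lo_via16(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFF, ah = a >> 16;
    uint32_t bl = b & 0xFFFF, bh = b >> 16;

    uint32_t lo    = al * bl;           /* full 32-bit product of low halves */
    uint32_t cross = ah * bl + al * bh; /* only its low 16 bits survive the shift */

    return lo + (cross << 16);          /* unsigned arithmetic wraps mod 2^32 */
}

/* Hypothetical signed wrapper: reinterpret the bits, multiply, reinterpret
 * back.  The final cast assumes a two's complement target. */
int32_t mul32_signed(int32_t a, int32_t b)
{
    return (int32_t)mul32_lo_via16((uint32_t)a, (uint32_t)b);
}
```

For example, mul32_signed(-3, 7) gives -21. A SIMD version would build
the same partial products out of 16-bit multiply instructions, at the
cost of some shuffling to line up the halves.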

>
> Thanks for that link. It looks handy.

Also, you will probably find the 4th manual here useful:
http://www.agner.org/optimize/

Dark Shikari


