[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc
Bobby Bingham
uhmmmm
Fri Apr 24 23:28:35 CEST 2009
This is my first serious attempt at any sort of SIMD optimization, so
I hope the SSE gurus will be gentle in their review.
Attached are patches to move flac_encode_residual_lpc to dsputils, and
to add SSE3 and SSE4 implementations. I wrote the SSE3 version first.
Since SSE3 doesn't have signed 32x32 multiplication AFAICT, I ended up
using double-precision floats for it, and the result is code that's
slower than the C version. Unless somebody has a suggestion for how to
fix this, I'll drop the SSE3 version.
I tried an SSE4 version because it does have signed 32x32->32
multiplication, like the C version uses. Unfortunately, I don't have an
SSE4-capable processor to test it with, so I can't check its speed or
even its correctness. Benchmarks welcome.
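
For anyone who wants to check the SSE4 output without reading the patch,
here is a rough scalar sketch of the kind of loop being vectorized. It is
not the code from the attachments or from libavcodec: the name
residual_lpc_ref, the argument order, and the handling of the first
`order` warm-up samples are assumptions made for illustration. The point
is only that each tap is a signed 32x32->32 multiply, which is the
operation SSE4.1 provides as pmulld.

```c
#include <stdint.h>

/* Hypothetical scalar reference, NOT the FFmpeg source.
 * Assumes coefficients and samples are small enough that the 32-bit
 * accumulator does not overflow, and that >> on a negative value is an
 * arithmetic shift (true on the usual x86 compilers). */
static void residual_lpc_ref(int32_t *res, const int32_t *smp, int n,
                             int order, const int32_t *coefs, int shift)
{
    int i, j;

    /* The first `order` samples have no full history, so they are
     * passed through unchanged. */
    for (i = 0; i < order && i < n; i++)
        res[i] = smp[i];

    for (i = order; i < n; i++) {
        int32_t pred = 0;

        /* One signed 32x32->32 multiply per tap; coefs[0] applies to
         * the most recent sample. */
        for (j = 0; j < order; j++)
            pred += coefs[j] * smp[i - j - 1];

        /* Residual = actual sample minus the scaled prediction. */
        res[i] = smp[i] - (pred >> shift);
    }
}
```

Feeding the same samples and coefficients to this loop and to a SIMD
version and comparing the residuals element by element would be one way
to test correctness on an SSE4-capable machine.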
--
Bobby Bingham
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flac_dsputil.patch
Type: text/x-patch
Size: 4445 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090424/3ab9aaf0/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flac_sse.patch
Type: text/x-patch
Size: 14538 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090424/3ab9aaf0/attachment-0001.bin>