[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc

Fri Apr 24 23:28:35 CEST 2009

This is my first serious attempt at any sort of SIMD optimization, so
I hope the SSE gurus will be gentle in their review.

Attached are patches to move flac_encode_residual_lpc into dsputil, and
to add SSE3 and SSE4 implementations.  I wrote the SSE3 version first,
but since SSE3 doesn't have a signed 32x32 multiply AFAICT (only SSE2's
unsigned pmuludq), I ended up using double-precision floats for it, and
the resulting code is slower than the C version.  Unless somebody has a
suggestion for how to fix this, I'll drop the SSE3 version.
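For reference, here is a minimal scalar sketch of the computation being vectorized, assuming the usual FLAC LPC form res[i] = smp[i] - ((sum_j coefs[j] * smp[i-j-1]) >> shift), with the first `order` samples passed through unchanged as warm-up.  The name lpc_residual_ref and its parameter order are hypothetical, not taken from the attached patch.  The sum is allowed to wrap at 32 bits, which is why a 32x32->32 multiply is all a SIMD version needs.

```c
#include <stdint.h>

/* Hypothetical reference: res[i] = smp[i] - ((sum_j coefs[j]*smp[i-j-1]) >> shift).
 * The first `order` samples are warm-up samples and are copied as-is.
 * Products and the running sum wrap at 32 bits; this is done in uint32_t
 * to avoid signed-overflow UB while matching 32x32->32 multiplication. */
void lpc_residual_ref(int32_t *res, const int32_t *smp, int n,
                      int order, const int32_t *coefs, int shift)
{
    int i;
    for (i = 0; i < order && i < n; i++)
        res[i] = smp[i];
    for (; i < n; i++) {
        uint32_t acc = 0;
        for (int j = 0; j < order; j++)
            acc += (uint32_t)coefs[j] * (uint32_t)smp[i - j - 1];
        /* Reinterpret as signed, then shift right; negative values shift
         * arithmetically on GCC/Clang (implementation-defined in ISO C). */
        int32_t pred = (int32_t)acc >> shift;
        res[i] = (int32_t)((uint32_t)smp[i] - (uint32_t)pred);
    }
}
```

With a second-order predictor {2, -1} and shift 0, a linear ramp predicts itself exactly, so every residual after the two warm-up samples is 0.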

I then tried an SSE4 version, because SSE4.1 does have a 32x32->32
multiplication (pmulld), like the one the C version uses.
Unfortunately, I don't have an SSE4-capable processor to test it with,
so I can't check its speed or even its correctness.  Benchmarks welcome.
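Roughly how an SSE4.1 inner loop could look with intrinsics: a sketch, not the attached patch.  Each iteration produces four residuals; for every tap it loads four consecutive samples, broadcasts the coefficient, multiplies with _mm_mullo_epi32 (pmulld, which keeps the low 32 bits of each product) and accumulates, then shifts arithmetically and subtracts.  The function name is hypothetical, and it assumes GCC or Clang on x86 so that the target("sse4.1") attribute lets it build without -msse4.1.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h> /* SSE4.1: _mm_mullo_epi32 (pmulld) */

/* Hypothetical SSE4.1 LPC residual, same wraparound semantics as plain
 * 32-bit C.  Lane k of each vector handles output sample i + k. */
__attribute__((target("sse4.1")))
void lpc_residual_sse41(int32_t *res, const int32_t *smp, int n,
                        int order, const int32_t *coefs, int shift)
{
    int i;
    for (i = 0; i < order && i < n; i++)   /* warm-up samples */
        res[i] = smp[i];
    __m128i sh = _mm_cvtsi32_si128(shift);
    for (; i + 4 <= n; i += 4) {
        __m128i acc = _mm_setzero_si128();
        for (int j = 0; j < order; j++) {
            /* smp[i-j-1 .. i-j+2] feed lanes 0..3 for tap j */
            __m128i s = _mm_loadu_si128((const __m128i *)(smp + i - j - 1));
            __m128i c = _mm_set1_epi32(coefs[j]);
            acc = _mm_add_epi32(acc, _mm_mullo_epi32(c, s));
        }
        __m128i pred = _mm_sra_epi32(acc, sh);   /* arithmetic shift */
        __m128i x = _mm_loadu_si128((const __m128i *)(smp + i));
        _mm_storeu_si128((__m128i *)(res + i), _mm_sub_epi32(x, pred));
    }
    /* Scalar tail for the last n % 4 samples. */
    for (; i < n; i++) {
        uint32_t acc = 0;
        for (int j = 0; j < order; j++)
            acc += (uint32_t)coefs[j] * (uint32_t)smp[i - j - 1];
        res[i] = (int32_t)((uint32_t)smp[i] - (uint32_t)((int32_t)acc >> shift));
    }
}
#endif
```

A caller would guard it with `__builtin_cpu_supports("sse4.1")` and fall back to the C version otherwise.  Because pmulld and the C code both keep only the low 32 bits, the two outputs should match bit for bit, so a simple comparison against the C version on an SSE4 machine would cover the correctness question.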

-- 
Bobby Bingham
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flac_dsputil.patch
Type: text/x-patch
Size: 4445 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090424/3ab9aaf0/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flac_sse.patch
Type: text/x-patch
Size: 14538 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090424/3ab9aaf0/attachment-0001.bin>