[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc

Jason Garrett-Glaser darkshikari
Mon May 4 07:02:11 CEST 2009

> pmaddwd is your 16x16->32 signed multiply instruction. It will do
> just as much work as pmulld in the case where the data is limited to
> 16 bits--except at twice the speed.

One more note on this: if you know that adding the results of any two
adjacent multiplies won't overflow 32 bits, pmaddwd does twice as much
work per instruction as pmulld (eight 16-bit multiplies instead of four
32-bit ones), and as a bonus it adds each pair of products together,
finishing part of your horizontal sum.
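
To make the pairwise add concrete, here is a minimal sketch in C, not the
actual flac_encode_residual_lpc code. The names madd_pair and dot16 are
hypothetical helpers invented for illustration. It assumes 16-bit inputs
and that every partial sum fits in 32 bits; _mm_madd_epi16 is the SSE2
intrinsic for pmaddwd, and a scalar fallback is used where SSE2 is not
available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: _mm_madd_epi16 maps to pmaddwd */
#endif

/* Scalar model of one 32-bit output lane of pmaddwd: two signed
 * 16x16->32 products added together. Assumes the sum fits in 32 bits
 * (it does unless all four inputs are -32768). */
static int32_t madd_pair(int16_t a0, int16_t b0, int16_t a1, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Hypothetical helper: dot product of n 16-bit values.
 * Assumes the running total never overflows 32 bits. */
static int32_t dot16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;

#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* 8 multiplies, already reduced to 4 pairwise sums */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(va, vb));
    }
    /* Finish the horizontal sum of the 4 remaining lanes. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif

    /* Scalar tail (or the whole loop without SSE2), two at a time. */
    for (; i + 2 <= n; i += 2)
        sum += madd_pair(a[i], b[i], a[i + 1], b[i + 1]);
    if (i < n)
        sum += (int32_t)a[i] * b[i];

    return sum;
}
```

With pmulld the same loop would process only four elements per multiply
and would still need a separate add to combine neighbouring products.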

Dark Shikari
