[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions

Jason Garrett-Glaser darkshikari
Fri Jan 2 20:03:48 CET 2009


$subject

Benchmarks:

Cathedral:
idct_add16: 293 -> 282 clocks
idct_add16intra: 343 -> 257 clocks

"300" sample (contains almost no i16x16 blocks so I didn't test add16intra):
idct_add16: 518 -> 433 clocks

The larger gain on this sample is most likely due to its higher bitrate.

idct_DC was omitted from idct_add16 because the extra branching logic
turned out to make it significantly slower (the branching becomes much
more complicated, and the DC path is less likely to be taken, as *both*
4x4 DCT blocks have to be DC-only for it to work).
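
To illustrate the point about the paired check, here is a minimal C sketch
(not taken from the patch). It assumes the SSE2 routine handles two 4x4
blocks per call, as the paragraph above implies. The names choose_single,
choose_pair and the path enum are hypothetical; nnz stands for a block's
non-zero coefficient count and dc for its first coefficient.

```c
#include <stdint.h>

/* Hypothetical labels for the three ways a block (or pair) can be handled. */
enum path { PATH_SKIP, PATH_DC, PATH_FULL };

/* Per-block dispatch: the DC shortcut applies when the only
 * non-zero coefficient in the block is the DC one. */
static enum path choose_single(int nnz, int16_t dc)
{
    if (!nnz)
        return PATH_SKIP;
    return (nnz == 1 && dc) ? PATH_DC : PATH_FULL;
}

/* Paired dispatch (assumption: one call transforms two blocks).
 * The DC shortcut is only usable when *both* blocks are DC-only;
 * any other non-empty combination falls back to the full iDCT. */
static enum path choose_pair(int nnz0, int16_t dc0, int nnz1, int16_t dc1)
{
    enum path a = choose_single(nnz0, dc0);
    enum path b = choose_single(nnz1, dc1);

    if (a == PATH_SKIP && b == PATH_SKIP)
        return PATH_SKIP;
    if (a == PATH_DC && b == PATH_DC)
        return PATH_DC;
    return PATH_FULL;
}
```

With per-block dispatch the shortcut fires whenever a single block is
DC-only. With pairs it needs two DC-only blocks side by side, so it fires
less often while the test itself costs more.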

x264 iDCT code was modified to add a stride parameter, required for ffh264.
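
As a rough picture of what the stride parameter changes, here is a scalar
C sketch of the final "add residual to destination" step. It is not the
patch's assembly: the function name add_residual_4x4 and the helper
clip_uint8 are made up for illustration. The point is only that each row
of the destination is reached via a caller-supplied stride instead of a
compile-time constant.

```c
#include <stdint.h>

/* Clamp an intermediate sum to the valid 8-bit pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical helper: add a 4x4 residual (row-major, 16 values) to the
 * picture at dst, where consecutive picture rows are `stride` bytes apart. */
static void add_residual_4x4(uint8_t *dst, int stride, const int16_t res[16])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] =
                clip_uint8(dst[y * stride + x] + res[y * 4 + x]);
}
```

Pixels outside the 4x4 area are left untouched, whatever stride the caller
passes.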

x86util.asm was included from x264 in full for simplicity's sake and
ease of use for adding future x264 assembly that uses it.

Dark Shikari
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x264_idct.diff
Type: text/x-diff
Size: 13224 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090102/89b5063f/attachment.diff>


