[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter

Wed Apr 1 11:06:59 CEST 2009

Hi,

Overall 17% faster decode on my Penryn. This also includes the first
function in ffmpeg to use SSE4 instructions, which shave an entire
2 clocks off of vc1_h_loop_filter8 for me!

One thing I don't understand is why the PSIGNW_SRA_MMX macro is
necessary for correct results. I know that psignw isn't equivalent to
((a ^ b) - b), but the only difference I'm aware of is that when b is 0,
psignw sets a to 0 as well. It's probably a stupidly simple case that
I'm missing...
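
To make the comparison concrete, here is a minimal scalar sketch of one 16-bit lane of each operation. It is not code from the patch: the function names are hypothetical, and it assumes that b in ((a ^ b) - b) is a sign mask (0 or -1), such as an arithmetic shift right by 15 would produce, which is only a guess based on the macro name.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of SSSE3 psignw:
 * a is negated if b < 0, zeroed if b == 0, unchanged if b > 0.
 * (Hypothetical helper for illustration, not part of the patch.) */
static int16_t psignw_lane(int16_t a, int16_t b)
{
    if (b < 0)
        /* wraps like the instruction does: -(-32768) stays -32768
         * (assumes the usual two's complement narrowing) */
        return (int16_t)(0u - (uint16_t)a);
    if (b == 0)
        return 0;
    return a;
}

/* Scalar model of the xor/subtract emulation. ASSUMPTION: the mask is
 * all ones when b is negative and zero otherwise, i.e. what an
 * arithmetic shift right by 15 would give. */
static int16_t sign_xor_sub_lane(int16_t a, int16_t b)
{
    uint16_t mask = (b < 0) ? 0xFFFFu : 0x0000u;
    return (int16_t)(uint16_t)(((uint16_t)a ^ mask) - mask);
}
```

Under these assumptions the two helpers agree for every b != 0, including the a = -32768 wraparound, and differ only for b == 0: psignw_lane(5, 0) gives 0 while sign_xor_sub_lane(5, 0) gives 5.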

700 dezicycles in vc1_v_loop_filter4_mmx2, 1048506 runs, 70 skips
639 dezicycles in vc1_v_loop_filter4_ssse3, 1048447 runs, 129 skips

977 dezicycles in vc1_h_loop_filter4_mmx2, 2097069 runs, 83 skips
951 dezicycles in vc1_h_loop_filter4_ssse3, 2097040 runs, 112 skips

1116 dezicycles in vc1_v_loop_filter8_mmx2, 33552803 runs, 1629 skips
677 dezicycles in vc1_v_loop_filter8_ssse3, 33552817 runs, 1615 skips

1648 dezicycles in vc1_h_loop_filter8_mmx2, 33552806 runs, 1626 skips
1158 dezicycles in vc1_h_loop_filter8_ssse3, 33552878 runs, 1554 skips
1137 dezicycles in vc1_h_loop_filter8_sse4, 33553447 runs, 985 skips
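
For reading the numbers above: a dezicycle is a tenth of a CPU cycle. The sketch below only shows the arithmetic for turning these counts into cycles and percentage savings. The helper names are hypothetical and nothing here is part of the patch.

```c
/* Convert a dezicycle count (tenths of a cycle) to cycles. */
static double dezicycles_to_cycles(long dezicycles)
{
    return dezicycles / 10.0;
}

/* Percentage of time saved going from `before` to `after`
 * (both in the same unit). */
static double percent_saved(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

For example, dezicycles_to_cycles(1158 - 1137) is 2.1 cycles, which matches the "2 clocks" saved by the SSE4 version of vc1_h_loop_filter8, and percent_saved(1116, 677) is a bit over 39% for vc1_v_loop_filter8.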

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vc1-sse-lf.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090401/ab929e72/attachment.txt>