[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter

Kostya kostya.shishkov
Sun Jul 4 16:12:52 CEST 2010


On Sun, Jul 04, 2010 at 12:54:15PM +0200, Reimar Döffinger wrote:
> On Wed, Apr 01, 2009 at 05:06:59AM -0400, David Conrad wrote:
> > Overall 17% faster decode on my Penryn, including the first function
> > to use SSE4 instructions in ffmpeg! (which shave an entire 2 clocks
> > off of vc1_h_loop_filter8 for me)
> > 
> > One thing I don't understand is why the PSIGNW_SRA_MMX macro is
> > necessary for correct results. I know that psignw isn't equivalent
> > to ((a ^ b) - b), but the only difference I'm aware of is when b is
> > 0, psignw sets a to 0 as well. It's probably a stupidly simple case
> > that I'm missing...
> > 
> > 
> > 700 dezicycles in vc1_v_loop_filter4_mmx2, 1048506 runs, 70 skips.
> > 639 dezicycles in vc1_v_loop_filter4_ssse3, 1048447 runs, 129 skips
> > 
> > 977 dezicycles in vc1_h_loop_filter4_mmx2, 2097069 runs, 83 skips.
> > 951 dezicycles in vc1_h_loop_filter4_ssse3, 2097040 runs, 112 skips
> > 
> > 1116 dezicycles in vc1_v_loop_filter8_mmx2, 33552803 runs, 1629 skips
> > 677 dezicycles in vc1_v_loop_filter8_ssse3, 33552817 runs, 1615 skips
> > 
> > 1648 dezicycles in vc1_h_loop_filter8_mmx2, 33552806 runs, 1626 skips
> > 1158 dezicycles in vc1_h_loop_filter8_ssse3, 33552878 runs, 1554 skips
> > 1137 dezicycles in vc1_h_loop_filter8_sse4, 33553447 runs, 985 skips
> 
> It seems this got lost?

Probably it just slipped past me; I was visiting my homeland then.

If it still works, then OK.
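
For readers following the psignw question quoted above, here is a minimal scalar sketch in C of one 16-bit lane of each operation. It is not taken from the patch. The function names are hypothetical, and it assumes the MMX fallback first reduces b to a sign mask with an arithmetic right shift (as the name PSIGNW_SRA_MMX suggests) before applying (a ^ mask) - mask. Under that assumption the two agree for every nonzero b and differ only for b == 0, which matches the difference described in the quoted mail; it does not explain why the macro is needed in the patch.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of SSSE3 "psignw a, b":
 * negate a if b < 0, zero a if b == 0, keep a if b > 0.
 * Hypothetical helper name, for illustration only. */
static int16_t psignw_lane(int16_t a, int16_t b)
{
    if (b < 0)
        return (int16_t)(uint16_t)(0u - (uint16_t)a); /* wrapping negate */
    if (b == 0)
        return 0;
    return a;
}

/* Scalar model of the assumed MMX-style emulation:
 * mask = b >> 15 (arithmetic, like psraw), then (a ^ mask) - mask.
 * Hypothetical helper name, for illustration only. */
static int16_t sign_sra_lane(int16_t a, int16_t b)
{
    uint16_t mask = (b < 0) ? 0xFFFFu : 0u;           /* all ones if b is negative */
    return (int16_t)(uint16_t)(((uint16_t)a ^ mask) - mask);
}
```

Calling both helpers with b == 0 shows the one divergent case: psignw_lane returns 0, while sign_sra_lane returns a unchanged.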


