[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions
Sun Jul 8 15:26:06 CEST 2007
Zuxy Meng wrote:
> I did a quick test on 64-bit K8 tonight thanks to Stephan's testbed.
I did the same on an x86-64 Core 2 system.
> The result wasn't promising. In short, from fastest to slowest:
> MMX > SSE2 w/o sw pipelining > SSE2 w/ sw pipelining
I haven't tested the middle variant (SSE2 without software pipelining),
but I can confirm the rest. Using START/STOP_TIMER, the figures on a
1080p sequence are ~2800 dezicycles for MMX and ~3800 for SSE2.
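
For readers unfamiliar with how such per-call figures are obtained, here
is a minimal sketch of an averaging accumulator in the spirit of
START/STOP_TIMER. It is not FFmpeg's implementation: the names
bench_stat, bench_add and bench_decicycles are hypothetical, the caller
is assumed to supply the tick delta of each run, and the 8x outlier
threshold is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator; not part of FFmpeg. */
typedef struct {
    uint64_t sum;    /* total ticks over accepted runs */
    unsigned count;  /* number of accepted runs */
    unsigned skips;  /* number of runs rejected as outliers */
} bench_stat;

/* Record one run. Once two runs have been accepted, a run costing
 * at least 8x the running average is counted as a skip
 * (e.g. a context switch or a cold cache) instead of being averaged. */
static void bench_add(bench_stat *s, uint64_t ticks)
{
    if (s->count < 2 || ticks < 8 * s->sum / s->count) {
        s->sum += ticks;
        s->count++;
    } else {
        s->skips++;
    }
}

/* Average cost per accepted run, in tenths of a tick. */
static uint64_t bench_decicycles(const bench_stat *s)
{
    return s->count ? s->sum * 10 / s->count : 0;
}
```

Feeding it three runs of 280 ticks followed by one run of 100000 ticks
keeps the average at 2800 tenths of a tick and records one skip, so a
single disturbed run does not distort the reported figure.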
> So the conclusion is that I can't make a conclusion. Any suggestions?
Maybe have a look at the attached opannotate output (based on 4 runs)
for your software-pipelined SSE2 functions?
The 1/4 and 3/4 cases seem well pipelined, with only the output being
costly. However, if opannotate is to be believed (some of its timings
are very surprising), the 1/2 case suffers quite a lot of stalls,
probably to the point where they account for most of the execution time.