[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions
Thu Oct 18 03:37:48 CEST 2007
On Sat, Oct 13, 2007 at 01:28:22PM +0200, Christophe GISQUET wrote:
> Michael Niedermayer wrote:
> >> Agreed. However, you trade memory loads/unpacks for potentially worse
> >> code parallelism/pairing and size (there are 4 loops unrolled here). I
> >> wonder if that'll be a win. I leave that to a later patch.
> > You have unrolled the loops in the horizontal direction, which also increased
> > the code size, and instruction pairing is specific to the good old Pentium;
> > it has no relevance today.
> Figures will put this discussion to rest anyway. For
> vc1_put_ver_16b_shift2_mmx, with pmullw used instead of shift+add:
> 2979 dezicycles in ver, 524174 runs, 114 skips
> (compared to ~3300 initially)
> Now if, contrary to what your suggestion hinted at, we unroll the
> vertical loop:
> 2633 dezicycles in ver, 524208 runs, 80 skips
> Is the 2x increase in code size worth the 10% speed-up?
Of course it is, unless the codec does not get faster overall. It's possible
in principle (though I don't think that's the case here) that one function gets
faster but the increase in code size makes the codec slower overall due
to code cache issues. But again, I don't think that's the case here. A 10% speedup
is worth having: if you could make the motion compensation code from H.264 10%
faster, I am sure you would get a lot of fans ;)
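
The two choices being timed above (a multiply instead of shift+add, and
unrolling the vertical loop) can be sketched in scalar C. This is only an
illustration under stated assumptions, not the code from the patch: it assumes
the VC-1 half-pel bicubic taps (-1, 9, 9, -1), leaves out the rounding and
final shift, and every function name in it is hypothetical. It says nothing
about which variant is faster; only the measurements quoted above do.

```c
#include <stddef.h>
#include <stdint.h>

/* Tap sum with a multiply: the scalar analogue of using pmullw. */
static int tap_mul(int a, int b, int c, int d)
{
    return 9 * (b + c) - (a + d);
}

/* Same tap sum with the multiply replaced by shift+add:
 * 9*s == (s << 3) + s. s is non-negative for 8-bit pixels. */
static int tap_shift_add(int a, int b, int c, int d)
{
    int s = b + c;
    return (s << 3) + s - (a + d);
}

/* Plain vertical pass: one output row per iteration, four loads each.
 * src must have one valid row above and two below the h output rows.
 * dst is a packed w*h array of unrounded, unshifted 16-bit sums. */
static void ver_simple(int16_t *dst, const uint8_t *src,
                       ptrdiff_t stride, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            const uint8_t *p = src + y * stride + x;
            dst[y * w + x] = (int16_t)tap_mul(p[-stride], p[0],
                                              p[stride], p[2 * stride]);
        }
}

/* Vertical loop unrolled by two (h assumed even): two consecutive
 * output rows share three source rows, so five loads serve two
 * outputs. Needs one valid row above and three below the last pair's
 * first row, i.e. the same source extent as ver_simple. */
static void ver_unrolled2(int16_t *dst, const uint8_t *src,
                          ptrdiff_t stride, int w, int h)
{
    for (int y = 0; y < h; y += 2)
        for (int x = 0; x < w; x++) {
            const uint8_t *p = src + y * stride + x;
            int a = p[-stride], b = p[0], c = p[stride];
            int d = p[2 * stride], e = p[3 * stride];
            dst[y * w + x]       = (int16_t)tap_mul(a, b, c, d);
            dst[(y + 1) * w + x] = (int16_t)tap_mul(b, c, d, e);
        }
}
```

Both passes produce identical output; the unrolled one trades a larger loop
body for fewer loads per output row, which is the same size-versus-speed
trade-off discussed here.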
> All of this can be tested by checking the "#if 0" block in the
> vc1_put_ver_16b_shift2_mmx code or, globally, the VERT_PIPELINE macro.
> I also used your suggestion for the stride==offset case, both in the plain
> stride==offset version and with the pipeline (unrolled because simpler to code):
> 2162 dezicycles in norm_pipe, 262091 runs, 53 skips
> 2528 dezicycles in norm, 524200 runs, 88 skips
> This ~20% speed-up also results in a 2x size increase for the
> function. Not unrolling would, I guess, yield ~10% and 1.5x code size.
> The attached patch makes it possible to test/verify/report those figures.
I am glad it's just for test/verify/report;
one patch less to review :)
Or did you want a review?
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope