[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #4

Balatoni Denes dbalatoni
Thu Aug 23 21:01:01 CEST 2007


Hi!

So here is a new patch; I implemented your suggestions. One change is that I 
used 4 fpadd16 to do the shift left by four, because what gcc made of the 
C code didn't look all that good - or maybe I misunderstood it, I don't know. 
Anyhow, it shouldn't be much slower, as the C version also needed 1 load, 1 
store, 1 shift and 1 logical and, plus incrementing the loop variable and 
checking it (although block_last_index could have made it slightly faster). I 
hope it's ok.
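
For readers without the patch at hand, here is a rough sketch in plain C of
why four packed adds can stand in for a shift left by four: each x + x doubles
every 16-bit lane, and four doublings multiply by 16. This assumes the four
fpadd16 are applied as successive doublings of one register; padd16 and
shl4_by_adds are hypothetical names that only emulate a packed 4x16-bit add on
a 64-bit integer, they are not the VIS code from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for a packed add of four 16-bit lanes held in one
 * 64-bit value. Each lane wraps on its own; no carry crosses a lane. */
static uint64_t padd16(uint64_t a, uint64_t b)
{
    const uint64_t top = 0x8000800080008000ULL; /* top bit of every lane */

    /* Add the low 15 bits of each lane, then restore the top bits with XOR
     * so that a carry out of one lane never reaches the next one. */
    return ((a & ~top) + (b & ~top)) ^ ((a ^ b) & top);
}

/* Shift every 16-bit lane left by four using four self-additions. */
static uint64_t shl4_by_adds(uint64_t v)
{
    for (int i = 0; i < 4; i++)
        v = padd16(v, v); /* one doubling per add */
    return v;
}
```

With the lanes 0x0001, 0x0123, 0xFFFF and 0x0800 packed into one value,
shl4_by_adds gives 0x0010, 0x1230, 0xFFF0 and 0x8000, the same as shifting
each lane left by four and keeping the low 16 bits.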

On Thursday 23 August 2007 at 14:00, Michael Niedermayer wrote:
> > HDTV). Also as the idct is rather inaccurate, 
>
> ive not yet looked at how to make it more accurate :)

I am quite positive that the 2-instruction fmul is the problem. Both halves 
of the multiply do rounding, so this explains everything. And as I mentioned, 
the version that used 16x16->32 bit muls had the same good accuracy as 
simple_idct.
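
To illustrate the double rounding, here is a small sketch in plain C. It is a
simplified model and not the VIS code from the patch: it assumes the
two-instruction multiply splits one operand into a signed high byte and an
unsigned low byte and rounds each partial product to nearest before adding
them. The names floor_shift, round_shift, mul_two_step and mul_exact are made
up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 2^s), written with / and % so that it does not depend on how
 * the compiler shifts negative numbers. */
static int32_t floor_shift(int32_t v, int s)
{
    assert(s > 0 && s < 31);
    int32_t d = (int32_t)1 << s;
    int32_t q = v / d;
    if (v % d < 0)
        q--;
    return q;
}

/* Round-to-nearest shift: add half of 2^s, then floor. */
static int32_t round_shift(int32_t v, int s)
{
    return floor_shift(v + ((int32_t)1 << (s - 1)), s);
}

/* Assumed model of the two-step multiply: a is split into bytes and each
 * partial product is rounded separately before the final add. */
static int32_t mul_two_step(int16_t a, int16_t b)
{
    int32_t a_hi = floor_shift(a, 8);  /* signed upper byte */
    int32_t a_lo = a - a_hi * 256;     /* unsigned lower byte, 0..255 */

    return round_shift(a_hi * b, 8) + round_shift(a_lo * b, 16);
}

/* Reference: full 16x16->32 product, rounded once. */
static int32_t mul_exact(int16_t a, int16_t b)
{
    return round_shift((int32_t)a * b, 16);
}
```

Under this model mul_two_step(342, 384) is 3 while mul_exact(342, 384) is 2,
and mul_two_step(356, 100) is 0 while mul_exact(356, 100) is 1, so the
two-step product can be off by one in either direction. Errors like that,
accumulated over all the multiplies of an idct, would be consistent with the
accuracy difference I saw.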

> its like leaving 100euro laying at the street saying its not enough to buy
> a car ...
> [...]
> 2% overall speedup is huge ive rejected patches which would have introduced
> new features because they slowed the code down by 0.1%

Yes, ok, I did it after all (and it didn't hurt :) ). Unfortunately I can't 
benchmark properly because of many background processes, but dct-test says - 
though it seems a bit too optimistic - there is a 20% speed-up of the idct. I 
think 5-10% is more realistic and probable, but anyway there should be a 
measurable improvement. BTW I do think rejecting features because they 
slowed the code down by 0.1% is a bit harsh, but that's none of my 
business :)

> also mlib does the idct at half the speed, so i think theres more than 5% of
> gain possible

IMO the idct is not too slow right now. I also think mlib's speed comes from 
a faster, MPEG-derived algorithm, which uses half as many multiplies. So 
with the simple_idct algorithm, I don't expect major speedups.

bye
Denes

ps: it would be great if this could be committed as is, because I already 
spent far too much time on this code (definitely more than a week, in fact)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simple_idct_vis_try4.diff
Type: text/x-diff
Size: 21978 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070823/7e08557a/attachment.diff>


