[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms

Michael Niedermayer michaelni
Tue Jan 22 15:21:18 CET 2008


On Sun, Jan 20, 2008 at 12:37:21PM +0100, Christophe GISQUET wrote:
> Hi,
> 
> Michael Niedermayer wrote:
> > I think the following is safe:
> > 
> >         t1 = src[0] + src[2];
> >         t2 = src[0] - src[2];
> >         t1= 8*t1 + (t1>>1);
> >         t2= 8*t2 + (t2>>1);
> > 
> >         t3 = 11 * src[1] + 5 * src[3];
> >         t4 = 11 * src[3] - 5 * src[1];
> > 
> >         dst[0] = (t1 + t3 + 2) >> 2;
> >         dst[1] = (t2 - t4 + 2) >> 2;
> >         dst[2] = (t2 + t4 + 2) >> 2;
> >         dst[3] = (t1 - t3 + 2) >> 2;
> [...]
> 
> Ok I've implemented that. I also tried to decompose t3 and t4 as:
> t3 = 5(2s1+s3) + s1
> t4 = 5(2s3-s1) + s3
> (trading one constant load from memory and 2 multiplies for 2 shifts
> and 2 additions)
> 
> But this is slower, and in fact I can load the multiply constants in
> registers (by loading the bias from memory instead), further increasing
> the speed difference.
> 
> 1D2 ~ 1080 dezicycles
> 1D3 ~ 1120
> 
> Anyway, that's mostly for reference, as it was shown the 4x4 dct is not
> relevant speedwise and the code for transposing the zz scantables is not
> provided.
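
For reference, the two rewrites quoted above can be checked against each
other in plain C. This is only a sketch: inv4_ref() assumes the usual
17/22/10 row pass with a bias of 4 and a shift of 3, which is not spelled
out in this thread, and all function names are made up for illustration.
It also assumes >> on negative ints is an arithmetic shift.

```c
/* Assumed reference row pass (17/22/10, bias 4, shift 3); not taken
 * from this thread. */
void inv4_ref(const int src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]) + 4;
    int t2 = 17 * (src[0] - src[2]) + 4;
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];

    dst[0] = (t1 + t3) >> 3;
    dst[1] = (t2 - t4) >> 3;
    dst[2] = (t2 + t4) >> 3;
    dst[3] = (t1 - t3) >> 3;
}

/* Halved form quoted above: 8*t + (t>>1) stands in for 17*t/2. */
void inv4_half(const int src[4], int dst[4])
{
    int t1 = src[0] + src[2];
    int t2 = src[0] - src[2];
    int t3 = 11 * src[1] + 5 * src[3];
    int t4 = 11 * src[3] - 5 * src[1];

    t1 = 8 * t1 + (t1 >> 1);
    t2 = 8 * t2 + (t2 >> 1);

    dst[0] = (t1 + t3 + 2) >> 2;
    dst[1] = (t2 - t4 + 2) >> 2;
    dst[2] = (t2 + t4 + 2) >> 2;
    dst[3] = (t1 - t3 + 2) >> 2;
}

/* Same as inv4_half(), with t3/t4 decomposed as 5*(2a+b)+a. */
void inv4_decomp(const int src[4], int dst[4])
{
    int t1 = src[0] + src[2];
    int t2 = src[0] - src[2];
    int t3 = 5 * (2 * src[1] + src[3]) + src[1];
    int t4 = 5 * (2 * src[3] - src[1]) + src[3];

    t1 = 8 * t1 + (t1 >> 1);
    t2 = 8 * t2 + (t2 >> 1);

    dst[0] = (t1 + t3 + 2) >> 2;
    dst[1] = (t2 - t4 + 2) >> 2;
    dst[2] = (t2 + t4 + 2) >> 2;
    dst[3] = (t1 - t3 + 2) >> 2;
}

/* Count the inputs in [lo,hi]^4 where the three variants disagree. */
long count_mismatches(int lo, int hi)
{
    long bad = 0;
    int s[4], a[4], b[4], c[4], i;

    for (s[0] = lo; s[0] <= hi; s[0]++)
    for (s[1] = lo; s[1] <= hi; s[1]++)
    for (s[2] = lo; s[2] <= hi; s[2]++)
    for (s[3] = lo; s[3] <= hi; s[3]++) {
        inv4_ref(s, a);
        inv4_half(s, b);
        inv4_decomp(s, c);
        for (i = 0; i < 4; i++) {
            if (a[i] != b[i] || b[i] != c[i]) {
                bad++;
                break;
            }
        }
    }
    return bad;
}
```

Calling count_mismatches() over a small range, e.g. -12..12, should report
no disagreement under the assumptions above. Overflow of 16-bit MMX lanes
is a separate question that this int-based check does not cover.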

Whether the 4x4 DCT matters speedwise depends on bitrate, visual content and
encoder. Answering that question requires comparing visually diverse
material encoded at different bitrates and with all available encoders.
Also, do all profiles allow the larger DCTs?

Anyway, a 0.1% overall speedup is enough to justify an optimization; that
was the case in the past and still is.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato