[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms
Mon Jan 14 20:50:20 CET 2008
On Jan 14, 2008 1:05 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sun, Jan 13, 2008 at 05:10:30PM +0100, Christophe GISQUET wrote:
> > Each function is around 2 times faster. Functions taking the most time
> > now are top level ones or for entropy decoding.
I have a few questions, not necessarily conflicting with Michael's review.
- Why did you choose to transpose at all? Just to save time and effort?
It is usual to have separate versions of the SIMD code depending on
whether they work on rows or columns. The row and column stages are
different, and you pass the differences as parameters.
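
For illustration only (this is not code from the patch): a rough Python
sketch of one 1-D pass whose row/column differences are plain parameters.
The names transform_pass, row_pass, col_pass and transpose are hypothetical,
and the matrix, rounding and shift values are left to the caller rather than
taken from the VC-1 draft. It only shows that a column pass gives the same
result as transpose + row pass + transpose, so the transposes are a choice,
not a requirement.

```python
def transform_pass(block, n, matrix, elem_stride, line_stride, rnd, shift):
    """Apply an n-point 1-D transform to each of the n lines of a flat n*n block.

    elem_stride: distance between consecutive samples of one line
    line_stride: distance between the first samples of consecutive lines
    Output sample j of a line is sum_k matrix[k][j] * src[k], rounded and shifted.
    """
    out = list(block)
    for line in range(n):
        base = line * line_stride
        src = [block[base + k * elem_stride] for k in range(n)]
        for j in range(n):
            acc = sum(matrix[k][j] * src[k] for k in range(n))
            out[base + j * elem_stride] = (acc + rnd) >> shift
    return out


def row_pass(block, n, matrix, rnd, shift):
    # Samples of a row are adjacent; rows are n apart.
    return transform_pass(block, n, matrix, 1, n, rnd, shift)


def col_pass(block, n, matrix, rnd, shift):
    # Samples of a column are n apart; columns are adjacent.
    return transform_pass(block, n, matrix, n, 1, rnd, shift)


def transpose(block, n):
    # Transpose a flat, row-major n*n block.
    return [block[c * n + r] for r in range(n) for c in range(n)]


if __name__ == "__main__":
    toy = [[1, 1], [1, -1]]          # toy 2-point kernel, not a VC-1 matrix
    blk = [1, 2, 3, 4]
    direct = col_pass(blk, 2, toy, 0, 0)
    via_transpose = transpose(row_pass(transpose(blk, 2), 2, toy, 0, 0), 2)
    print(direct == via_transpose)
```

In SIMD code the trade-off is of course different (a row pass wants the
samples of one line in separate registers), so this says nothing about
which variant is faster.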
- Am I wrong, or do you do all the math in 16-bit signed saturating mode?
According to the VC-1 draft, in the first stage the input is in the range
[-2048;2047] and the multiply constants are in the range [-16;16]. This
gives a range of [-32768;32768] per multiply, and you can have 8 of them.
Or, with multiply constants in the range [-22;22], that gives a range of
[-45056;45056] per multiply, and you can have 4 of them.
Summed up, neither case is guaranteed to fit in a signed 16-bit value.
In the second stage the input range is doubled to [-4096;4095].
Are you sure your transforms produce the same result as their _c equivalents?
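
To make the arithmetic concrete, here is a small Python sketch of the bounds
quoted above. It is a crude worst case only: it assumes every input sits at
the maximum magnitude and every constant at the largest magnitude mentioned,
which a real bitstream may never produce. The helper names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def product_bound(max_abs_input, max_abs_coeff):
    # Largest magnitude of a single input * constant product.
    return max_abs_input * max_abs_coeff


def sum_bound(max_abs_input, max_abs_coeff, terms):
    # Loose bound on a sum of `terms` such products (all at maximum magnitude).
    return terms * product_bound(max_abs_input, max_abs_coeff)


def fits_int16(magnitude):
    # True if both +magnitude and -magnitude are representable in int16.
    return magnitude <= INT16_MAX


def saturate16(x):
    # What a saturating 16-bit add/sub would leave in the register.
    return max(INT16_MIN, min(INT16_MAX, x))


if __name__ == "__main__":
    cases = (
        ("first stage, |c| <= 16, 8 terms", 2048, 16, 8),
        ("first stage, |c| <= 22, 4 terms", 2048, 22, 4),
        ("second stage, |c| <= 16, 8 terms", 4096, 16, 8),
        ("second stage, |c| <= 22, 4 terms", 4096, 22, 4),
    )
    for label, inp, coeff, terms in cases:
        bound = sum_bound(inp, coeff, terms)
        print(label, product_bound(inp, coeff), bound, fits_int16(bound))
```

Whenever saturate16(x) != x for an intermediate value, the saturating path
and a wider C path would disagree, which is why a bit-exact comparison
against the _c functions over the full input range seems worth doing.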
- Have you seen how other IDCT optimizations work? I may be wrong, but
the VC-1 transforms look like an IDCT with quite simplified (smaller)