[FFmpeg-devel] [RFC] An improved implementation of ARMv5TE IDCT (simple_idct_armv5te.S)

Siarhei Siamashka siarhei.siamashka
Sat Sep 15 10:44:34 CEST 2007


On 15 September 2007, Michael Niedermayer wrote:

> > > i think this can be implemented in fewer instructions, something based
> > > on:
> > >
> > > v2 = v1 - W4*row[4]
> > > v1 = v1 + W4*row[4]
> > >
> > > v3 = v2 - W6*row[2]
> > > v4 = v1 - W2*row[2]
> > >
> > > v3 += W2*row[6]
> > > v4 -= W6*row[6]
> > >
> > > v2 = 2*v2 - v3
> > > v1 = 2*v1 - v4
> >
> > Took a close look at it. That really should do the job (each statement
> > mapping to one instruction), so we can save a whole 4 cycles thanks to it.
> > Though I'm a bit worried about possible overflows because of the *2
> > multiplication in the last two statements, so this code would not be
> > completely identical to the C implementation of simple_idct in some extreme
> > cases of input data. Should we assume some sane restrictions on input
> > data for regression testing?
>
> as long as the operations are normal ANSI-C two's complement style (=not
> some weird useless saturation stuff) the code is identical to yours, no
> matter how large the input values are
>
> you could say that any overflows always would be canceled by other
> overflows
>
> you can look in your favorite math book (or wikipedia) about rings and
> modular arithmetic ...

Yes, you are right, thanks. It was too late in the evening when I wrote that
post and I somehow mixed up two's complement additions, signed/unsigned
multiplications and overflows for a moment :)
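
For the record, the cancellation argument can be checked with a small C
sketch. This is only an illustration, not code from simple_idct: the function
names are made up here, the W2/W4/W6 values are assumed to match the
constants of the C simple_idct, and the "direct" form is simply the four sums
that the quoted statements expand to. Unsigned 32-bit arithmetic is used so
that the wraparound is well defined in C, which models what plain ARM
add/sub/mla instructions do.

```c
#include <stdint.h>

/* Assumed constants, modelled on the C simple_idct (illustrative only). */
static const uint32_t W2 = 21407u;
static const uint32_t W4 = 16383u;
static const uint32_t W6 = 8867u;

/* Direct form: every output is written out as the full sum of its terms.
 * All arithmetic is modulo 2^32 because the operands are uint32_t. */
static void even_part_direct(uint32_t v, const int16_t row[8], uint32_t a[4])
{
    /* Converting a negative int16_t to uint32_t yields its two's
     * complement bit pattern, so products wrap like signed ones would. */
    uint32_t r2 = (uint32_t)row[2], r4 = (uint32_t)row[4], r6 = (uint32_t)row[6];

    a[0] = v + W2 * r2 + W4 * r4 + W6 * r6;
    a[1] = v + W6 * r2 - W4 * r4 - W2 * r6;
    a[2] = v - W6 * r2 - W4 * r4 + W2 * r6;
    a[3] = v - W2 * r2 + W4 * r4 - W6 * r6;
}

/* Shortened form: the eight statements from the quoted proposal. */
static void even_part_short(uint32_t v1, const int16_t row[8], uint32_t a[4])
{
    uint32_t r2 = (uint32_t)row[2], r4 = (uint32_t)row[4], r6 = (uint32_t)row[6];
    uint32_t v2, v3, v4;

    v2 = v1 - W4 * r4;
    v1 = v1 + W4 * r4;

    v3 = v2 - W6 * r2;
    v4 = v1 - W2 * r2;

    v3 += W2 * r6;
    v4 -= W6 * r6;

    v2 = 2 * v2 - v3;   /* the doubling may wrap, but so does the rest */
    v1 = 2 * v1 - v4;

    a[0] = v1; a[1] = v2; a[2] = v3; a[3] = v4;
}

/* Returns 1 if both forms produce bit-identical results, 0 otherwise. */
int even_parts_agree(uint32_t v, const int16_t row[8])
{
    uint32_t d[4], s[4];
    int i;

    even_part_direct(v, row, d);
    even_part_short(v, row, s);
    for (i = 0; i < 4; i++)
        if (d[i] != s[i])
            return 0;
    return 1;
}
```

Calling even_parts_agree() with a start value such as 0x7FFFFFFF and
coefficients at the int16_t limits forces the intermediate values to wrap,
yet the two forms still agree, because both compute the same expression in
the ring of integers modulo 2^32.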
