[FFmpeg-devel] [RFC] An improved implementation of ARMv5TE IDCT (simple_idct_armv5te.S)

Siarhei Siamashka siarhei.siamashka
Sat Sep 15 00:06:17 CEST 2007


On 14 September 2007, Michael Niedermayer wrote:

[...]

> > +        smlabb v4, a3, v6, v1          /* v4 = v1 - W2*row[2] */
> > +        smlabb v3, a4, v6, v1          /* v3 = v1 - W6*row[2] */
> > +        smlatb v2, a4, v6, v1          /* v2 = v1 + W6*row[2] */
> > +        smlatb v1, a3, v6, v1          /* v1 = v1 + W2*row[2] */
>
> [---]
>
> > +        smlabb v4, a4, v8, v4          /* v4 -= W6*row[6] */
> > +        smlatb v3, a3, v8, v3          /* v3 += W2*row[6] */
> > +        smlabb v2, a3, v8, v2          /* v2 -= W2*row[6] */
> > +        smlatb v1, a4, v8, v1          /* v1 += W6*row[6] */
> > +        ldrd   a3, w1357idct_rows_armv5te /* a3 = W1 | (W3 << 16) */
> > +                                       /* a4 = W5 | (W7 << 16) */
>
> [---]
>
> > +        smlatb v4, a2, v7, v4          /* v4 += W4*row[4] */
> > +        smlabb v3, a2, v7, v3          /* v3 -= W4*row[4] */
> > +        smlabb v2, a2, v7, v2          /* v2 -= W4*row[4] */
> > +        smlatb v1, a2, v7, v1          /* v1 += W4*row[4] */
>
> I think this can be implemented in fewer instructions, something based on:
>
> v2 = v1 - W4*row[4]
> v1 = v1 + W4*row[4]
>
> v3 = v2 - W6*row[2]
> v4 = v1 - W2*row[2]
>
> v3 += W2*row[6]
> v4 -= W6*row[6]
>
> v2 = 2*v2 - v3
> v1 = 2*v1 - v4

I took a close look at it. That really should do the job (each statement maps
to one instruction), so it saves a whole 4 cycles. I am a bit worried, though,
about possible overflows caused by the *2 multiplication in the last two
statements: with some extreme input data this code would not be completely
identical to the C implementation of simple_idct. Should we assume some sane
restrictions on the input data for regression testing?
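
To make the comparison concrete, here is a minimal C sketch (not the patch
itself) that computes the even part of one row both ways: directly, and with
the factored sequence quoted above. The constants W2, W4, W6 and ROW_SHIFT are
assumed to match the C simple_idct, the helper names are hypothetical, and
registers are modelled as wrapping 32-bit values, which is an assumption about
how the final instructions would behave.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the constants of the C simple_idct. */
#define W2 21407
#define W4 16383
#define W6 8867
#define ROW_SHIFT 11

/* 16x16 -> 32 bit signed product, kept as a wrapping 32-bit value. */
static uint32_t prod(int w, int16_t x)
{
    return (uint32_t)(int32_t)(w * x);
}

/* Direct form: every output accumulates its own three products. */
static void even_direct(const int16_t *row, uint32_t out[4])
{
    uint32_t base = prod(W4, row[0]) + (1u << (ROW_SHIFT - 1));

    out[0] = base + prod(W2, row[2]) + prod(W4, row[4]) + prod(W6, row[6]);
    out[1] = base + prod(W6, row[2]) - prod(W4, row[4]) - prod(W2, row[6]);
    out[2] = base - prod(W6, row[2]) - prod(W4, row[4]) + prod(W2, row[6]);
    out[3] = base - prod(W2, row[2]) + prod(W4, row[4]) - prod(W6, row[6]);
}

/* Factored form: the eight statements from the quoted suggestion. */
static void even_factored(const int16_t *row, uint32_t out[4])
{
    uint32_t v1 = prod(W4, row[0]) + (1u << (ROW_SHIFT - 1));
    uint32_t v2, v3, v4;

    v2 = v1 - prod(W4, row[4]);
    v1 = v1 + prod(W4, row[4]);
    v3 = v2 - prod(W6, row[2]);
    v4 = v1 - prod(W2, row[2]);
    v3 += prod(W2, row[6]);
    v4 -= prod(W6, row[6]);
    v2 = 2u * v2 - v3;          /* the doubling under discussion */
    v1 = 2u * v1 - v4;

    out[0] = v1; out[1] = v2; out[2] = v3; out[3] = v4;
}

/* Returns 1 if both forms give the same four 32-bit values for this row. */
static int even_parts_match(const int16_t *row)
{
    uint32_t a[4], b[4];
    int i;

    even_direct(row, a);
    even_factored(row, b);
    for (i = 0; i < 4; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling even_parts_match() over the rows used by a regression test would show
whether the two forms ever diverge under this wrapping model; it says nothing
about instructions that saturate or otherwise do not wrap.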

Anyway, I will try to provide an updated revision of the patch tomorrow with
this optimization included.



