[FFmpeg-devel] [RFC] An improved implementation of ARMv5TE IDCT (simple_idct_armv5te.S)
Mon Sep 17 03:36:24 CEST 2007
On 15 September 2007, Michael Niedermayer wrote:
> > > > > > + smulbt a2, a4, v5 /* b2 = W5*row */
> > > > > > + smultt v2, a4, v5 /* b3 = W7*row */
> > > > > > + smlatt a2, v1, v6, a2 /* b2 -= W1*row */
> > > > > > + smlatt v2, a3, v7, v2 /* b3 += W3*row */
> > > > > > + smlatt a2, a4, v7, a2 /* b2 += W7*row */
> > > > > > + smlatt v2, v1, v8, v2 /* b3 -= W1*row */
> > > > > > + smlatt a2, a3, v8, a2 /* b2 += W3*row */
> > > > > > + smlabt v2, v1, v6, v2 /* b3 -= W5*row */
> > > > >
> > > > > somehow i suspect that some speed could be gained by checking row
> > > > > 3/5/7 for being zero?
> > > >
> > > > Checking values for zero and branching seems to be quite expensive
> > > > when inserted in the middle of code, multiplying is very fast. A zero
> > > > check and a conditional branch (50%/50% probability) would be only
> > > > useful on ARM9E if it lets to skip over 7 or more multiply
> > > > instructions.
> > >
> > > columns have all coefficients 0 except the first in ~90% of the cases
> > This special case is already handled. Just check the main loop in
> > 'idct_rows_armv5te' function.
> no, i talk about the column code
OK, thanks, that's another good catch.
It really does make sense for the column code, since jumping over blocks of 8
instructions provides a good performance improvement. I could not confirm your
~90% figure in my tests, though; my numbers are somewhat different, mostly
because columns are processed two at a time, so we check pairs of coefficients
rather than individual ones.
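
To illustrate the pair check described above, here is a minimal C sketch (not the actual assembly from the patch). It assumes an 8x8 block of 16-bit coefficients stored row by row, so that the coefficients of two adjacent columns in one row can be loaded as a single 32-bit word and tested for zero in one go. The names load_pair and odd_part_b0_pair are hypothetical, and the W constants are illustrative fixed-point weights:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative fixed-point weights for the odd coefficients (assumed values). */
enum { W1 = 22725, W3 = 19266, W5 = 12873, W7 = 4520 };

/* Load the coefficients of two adjacent columns (same row) as one word. */
static uint32_t load_pair(const int16_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);   /* safe unaligned 32-bit load */
    return w;
}

/*
 * Hypothetical helper: accumulate b0 = W1*r1 + W3*r3 + W5*r5 + W7*r7
 * for columns `col` and `col + 1` at once (col must be 0..6).
 * A row whose pair of coefficients is entirely zero is skipped, which
 * is the idea behind jumping over a block of multiply instructions.
 */
static void odd_part_b0_pair(const int16_t *block, int col, int32_t b0[2])
{
    static const int32_t w[4] = { W1, W3, W5, W7 };

    b0[0] = b0[1] = 0;
    for (int i = 0; i < 4; i++) {
        const int16_t *p = block + (2 * i + 1) * 8 + col;  /* rows 1, 3, 5, 7 */

        if (load_pair(p) == 0)
            continue;          /* both coefficients are zero: nothing to add */

        b0[0] += w[i] * p[0];
        b0[1] += w[i] * p[1];
    }
}
```

Note that the skip is only taken when both coefficients of the pair are zero, which is why the measured percentages are lower than they would be for individual coefficients.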
Here are the percentages of empty coefficient pairs seen during column
processing for the matrixbench_normdivx_vbrmp3.avi clip:
col_freq = 20.869%
col_freq = 52.645%
col_freq = 65.382%
col_freq = 74.502%
col_freq = 82.558%
col_freq = 89.994%
col_freq = 95.175%
col_freq = 98.087%
For higher-bitrate movies the statistics are usually 'worse', with a lower
percentage of zero coefficients.
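
For reference, statistics of this kind could be gathered with something like the C sketch below. This is only one plausible way to do it, not the code that produced the numbers above. It assumes the eight figures are per-row percentages (an interpretation not stated in this message) and that pairs are formed from horizontally adjacent coefficients; PairStats, count_zero_pairs and zero_pair_percent are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical accumulator: one counter pair per row of the 8x8 block. */
typedef struct {
    unsigned long zero_pairs[8];   /* pairs where both coefficients are 0 */
    unsigned long total_pairs[8];  /* all pairs seen in that row */
} PairStats;

/* Count empty coefficient pairs in one 8x8 block stored row by row. */
static void count_zero_pairs(const int16_t block[64], PairStats *st)
{
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col += 2) {   /* two columns at a time */
            const int16_t *p = block + row * 8 + col;

            st->total_pairs[row]++;
            if ((p[0] | p[1]) == 0)
                st->zero_pairs[row]++;
        }
    }
}

/* Percentage of empty pairs for a given row, or 0 if nothing was counted. */
static double zero_pair_percent(const PairStats *st, int row)
{
    if (st->total_pairs[row] == 0)
        return 0.0;
    return 100.0 * st->zero_pairs[row] / st->total_pairs[row];
}
```

Calling count_zero_pairs on every block just before the column pass and printing zero_pair_percent for rows 0..7 at the end of decoding would give one figure per row.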