[FFmpeg-devel] [RFC] An improved implementation of ARMv5TE IDCT (simple_idct_armv5te.S)

Siarhei Siamashka siarhei.siamashka
Fri Sep 14 19:07:44 CEST 2007


On 14 September 2007, Siarhei Siamashka wrote:

[...]

> Are you suggesting something like what I tried in one of the older
> revisions (the 'idct_row_full' and 'idct_row_partial' labels)?
> https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libavcodec/armv4l/simple_idct_armv5te.S?root=mplayer&rev=82&view=markup
>
> That's a somewhat gray area and the decision is not clear. As most
> videos contain a lot of empty rows (only row[0] is nonzero), the code
> at 'idct_row_partial' would be executed in only about 10% of cases.
> That branch saves us 16 cycles in total. On the other hand, the conditional
> branch instruction to 'idct_row_partial' will always be executed and take
> at least 1 cycle. So while saving 16 cycles in 10% of cases still outweighs
> paying 1 cycle every time, one more concern is code size. As you could see
> in my previous post with the -O2 vs. -O3 benchmarks, the mpeg4 decoder might
> actually be short on instruction cache, and increasing code size without a
> really good reason might not be a very good idea.
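
To put numbers on the estimate quoted above, here is a rough sketch in C. It
only uses the figures from the quoted text (about 10% of rows taking the
partial path, 16 cycles saved, at least 1 cycle for the always-executed
branch). The function name is made up for illustration, and instruction cache
effects are deliberately ignored, so this is an approximation and not a
measurement.

```c
/*
 * Expected net saving per row, in hundredths of a cycle.
 * Hypothetical helper, integer arithmetic only:
 *   partial_percent - share of rows that take the partial path (0..100)
 *   cycles_saved    - cycles saved each time the partial path is taken
 *   branch_cost     - cycles paid on every row for the extra branch
 * Instruction cache pressure from the larger code is NOT modelled.
 */
static int net_saving_centicycles(int partial_percent, int cycles_saved,
                                  int branch_cost)
{
    return partial_percent * cycles_saved - 100 * branch_cost;
}
```

With the quoted figures, net_saving_centicycles(10, 16, 1) gives 60, i.e. about
0.6 cycles per row. The gain disappears once the partial path is taken in
fewer than 6.25% of rows, which is why the decision is not clear-cut.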

On second thought, this might give a positive result if the zero-coefficient
check for row[4], row[5], row[6] and row[7] is duplicated at the start of
'idct_row_full', so that only one conditional jump remains in the main loop.
I'll try to benchmark it.
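
As a rough illustration of the control flow I have in mind, here is a sketch
in C. It is not the actual assembly: the enum and the function name are
hypothetical, and I am assuming that the single test left in the main loop is
the empty-row check (only row[0] nonzero), with the row[4]..row[7] check
moved to the entry of the full path.

```c
#include <stdint.h>

/* Hypothetical names for the path a row would take. */
enum row_path { ROW_DC_ONLY, ROW_PARTIAL, ROW_FULL };

static enum row_path classify_row(const int16_t row[8])
{
    /* Main loop: the only conditional jump on the hot path.
     * Empty rows (only row[0] may be nonzero) are handled right away. */
    if (!(row[1] | row[2] | row[3] | row[4] | row[5] | row[6] | row[7]))
        return ROW_DC_ONLY;

    /* Start of 'idct_row_full': the zero check for row[4]..row[7] is
     * repeated here, so rows that are not empty pay for it, but empty
     * rows never do. */
    if (!(row[4] | row[5] | row[6] | row[7]))
        return ROW_PARTIAL;

    return ROW_FULL;
}
```

The point of this layout is that the common case (empty rows) sees exactly one
test, and the extra branch cost is only paid on rows that need real work
anyway.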