[FFmpeg-devel] [RFC] An improved implementation of ARMv5TE IDCT (simple_idct_armv5te.S)
Fri Sep 14 19:07:44 CEST 2007
On 14 September 2007, Siarhei Siamashka wrote:
> Do you suggest something like what I tried in one of the older revisions
> ('idct_row_full' and 'idct_row_partial' labels)?
> That's a somewhat gray area and the decision is not clear. As most
> videos contain a lot of empty rows (only row is nonzero), the code
> at 'idct_row_partial' would be executed in only about 10% of cases.
> That path saves 16 cycles in total. On the other hand, the conditional
> branch instruction to 'idct_row_partial' will always be executed and takes
> at least 1 cycle. So while a 16-cycle saving is still more than 10 times
> that 1-cycle cost, code size is another concern. As you could see in my
> previous post with the -O2 vs. -O3 benchmarks, the mpeg4 decoder might
> actually be short on instruction cache, and increasing code size without
> a really good reason might not be a very good idea.
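The cycle trade-off in the quoted paragraph can be written as a tiny cost model. This is only a sketch under stated assumptions: the extra branch is charged a flat 1 cycle (the lower bound quoted) on every row, the partial path saves 16 cycles on the roughly 10% of rows that take it, and instruction-cache pressure and branch mispredictions are ignored. The function names are hypothetical.

```c
/* Simplified, unmeasured cost model for the 'idct_row_partial' branch.
 * Assumptions: the conditional branch costs `branch_cost` cycles on every
 * row; the partial path saves `saved` cycles on the fraction `p` of rows
 * that take it. I-cache effects and mispredictions are ignored. */

/* Expected cycles saved per row; a negative value means a net loss. */
static double expected_net_saving(double p, double saved, double branch_cost)
{
    return p * saved - branch_cost;
}

/* Fraction of rows that must take the partial path for the branch to pay off. */
static double break_even_fraction(double saved, double branch_cost)
{
    return branch_cost / saved;
}
```

With p = 0.10, saved = 16 and branch_cost = 1, the model gives roughly 0.6 cycles saved per row, and the break-even point is 1/16 = 6.25% of rows. That small margin is why the code-size concern can tip the decision the other way.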
On second thought, this may provide some positive result if the zero
coefficients check for row, row, row and row is duplicated at the
start of 'idct_row_full', keeping only one conditional jump in the main loop.
I'll try to benchmark it.
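A hypothetical C model of the proposed control flow may make the idea concrete. The real code is ARMv5TE assembly, and the exact coefficient indices were lost from this message. This sketch therefore assumes the following: the main-loop test is the "empty row" check on row[1]..row[7], and the duplicated check at the start of the full path tests row[4]..row[7]. The extra test is then paid only by rows that already go down the expensive path, not by every row.

```c
#include <stdint.h>

/* Hypothetical model only; names and coefficient indices are assumptions. */
enum row_path { ROW_DC_ONLY, ROW_PARTIAL, ROW_FULL };

/* Stand-in for the code at 'idct_row_full': it starts with the duplicated
 * zero check, so the main loop needs no separate branch for the partial case. */
static enum row_path idct_row_full_path(const int16_t row[8])
{
    if ((row[4] | row[5] | row[6] | row[7]) == 0)
        return ROW_PARTIAL;   /* upper coefficients zero: cheaper path */
    return ROW_FULL;
}

/* Main-loop dispatch: exactly one conditional test per row. */
static enum row_path select_row_path(const int16_t row[8])
{
    if ((row[1] | row[2] | row[3] | row[4] | row[5] | row[6] | row[7]) == 0)
        return ROW_DC_ONLY;   /* the common "empty row" case */
    return idct_row_full_path(row);
}
```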