[Ffmpeg-cvslog] r7790 - trunk/libavcodec/armv4l/simple_idct_armv6.S

Måns Rullgård mru
Thu Feb 1 10:17:58 CET 2007

"Guillaume POIRIER" <poirierg at gmail.com> writes:

> Hi,
> On 2/1/07, mru <subversion at mplayerhq.hu> wrote:
>> Author: mru
>> Date: Thu Feb  1 00:04:56 2007
>> New Revision: 7790
>> Modified:
>>    trunk/libavcodec/armv4l/simple_idct_armv6.S
>> Log:
>> optimize IDCT of rows with mostly zero coefficients
> Out of curiosity, does the "mostly zero coefficients" constraint need
> to be enforced to prevent overflow? In other words, does your
> optimized routine manage to be faster because you compute everything
> in, say 16bits precision when the C code would do everything in 32
> bits (that's just an example)?

No, the optimization is in skipping some calculations.  If only the DC
coefficient is non-zero we know that all the output values will be the
same, and there is no need to multiply a bunch of zeros to find out.
Similarly, if the last four coefficients are zero we can skip the
calculations involving them.  These cases are frequent enough that
checking for them pays off.
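
As a rough illustration of the idea (not the actual assembly), the check
could be sketched in C as below. The names classify_row, idct_row_dc_only
and the dc_shift parameter are hypothetical, and the scaling applied to the
DC value is an assumption made for the sake of the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of one row of 8 coefficients. */
enum row_kind { ROW_DC_ONLY, ROW_HALF, ROW_FULL };

/* Decide which shortcut, if any, applies to this row. */
static enum row_kind classify_row(const int16_t row[8])
{
    int low  = row[1] | row[2] | row[3];
    int high = row[4] | row[5] | row[6] | row[7];

    if (high == 0)
        return low == 0 ? ROW_DC_ONLY : ROW_HALF;
    return ROW_FULL;
}

/* DC-only shortcut: every output is the same scaled DC value.
 * The scale factor (1 << dc_shift) is an assumption of this sketch;
 * the result is truncated to 16 bits. */
static void idct_row_dc_only(int16_t row[8], int dc_shift)
{
    assert(dc_shift >= 0 && dc_shift <= 8);
    int16_t v = (int16_t)(row[0] * (1 << dc_shift));

    for (int i = 0; i < 8; i++)
        row[i] = v;
}
```

A caller would run classify_row first and fall back to the full row
transform only for ROW_FULL; for ROW_HALF, the terms involving the last
four coefficients can be left out.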

> I'm just curious, because I don't know anything about ARM.

The instructions I've used most in the IDCT compute a += b*c +- d*e,
where b, c, d and e are 16-bit values stored pair-wise in 32-bit
registers, and a is a 32-bit result.  There are also instructions for
2*16-bit and 4*8-bit parallel add, subtract and SAD.
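
For those who do not read ARM assembly, the effect of these dual
multiply-accumulate instructions can be modelled in plain C roughly as
follows. This is only a sketch: the helper names are made up, the choice of
which halfword holds which operand is an assumption, and overflow behaviour
is not modelled. The ARMv6 instructions SMLAD and SMLSD are of this form.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (hypothetical helper). */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Extract the signed low and high halfwords.  The conversion to int16_t
 * relies on the usual two's complement behaviour of common compilers. */
static int16_t lo16(uint32_t x) { return (int16_t)(uint16_t)(x & 0xffff); }
static int16_t hi16(uint32_t x) { return (int16_t)(uint16_t)(x >> 16); }

/* a + b*c + d*e, with b and d packed in x, c and e packed in y. */
static int32_t dual_mac_add(int32_t a, uint32_t x, uint32_t y)
{
    return a + lo16(x) * lo16(y) + hi16(x) * hi16(y);
}

/* a + b*c - d*e, same operand layout as above. */
static int32_t dual_mac_sub(int32_t a, uint32_t x, uint32_t y)
{
    return a + lo16(x) * lo16(y) - hi16(x) * hi16(y);
}
```

With a = 10, x = pack16(2, 3) and y = pack16(4, 5), dual_mac_add gives
10 + 2*4 + 3*5 = 33 and dual_mac_sub gives 10 + 2*4 - 3*5 = 3. On the real
hardware each of these is a single instruction.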

Måns Rullgård
mru at inprovide.com
