[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM

Måns Rullgård mans at mansr.com
Sat Jun 28 13:09:30 CEST 2008


Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> On Saturday 28 June 2008, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>> > Do we have someone who has an ARM CPU and can look into the above issue?
>>
>> I know exactly why it's different.  In simple_idct.c, the column
>> transform contains these lines:
>>
>>         /* XXX: I did that only to give same values as previous code */
>>         a0 = W4 * (col[8*0] + ((1<<(COL_SHIFT-1))/W4));
>>
>> It's simpler to code that as a0 = W4 * col[0] + (1 << (COL_SHIFT-1)).
>> Thinking about it, it only takes one more instruction on NEON, and
>> I've fixed that in my tree.  With a little luck, the extra instruction
>> can be dual-issued with something else.
>
> This part does not have any extra overhead in my fine-tuned version
> of ARMv5TE IDCT:
>
>   ldr    v1, xxx         /* v1 = (((1<<(COL_SHIFT-1))/W4)*W4) */
>   [some unrelated instructions to hide load latency]
>   smlatt v2, a2, v4, v1  /* A0t = W4 * (col_t[0] + ((1<<(COL_SHIFT-1))/W4)) */
>
> There is no reason why ARMv6 or NEON should have overhead either. So getting
> bit-identical results to C simple_idct is possible without sacrificing
> performance.

((1<<(COL_SHIFT-1))/W4)*W4 doesn't fit in 16 bits, so that method
can't easily be used when everything else is in 16-bit vectors.
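
A minimal C sketch of the arithmetic behind this, assuming W4 = 16383 and
COL_SHIFT = 20 (values recalled from simple_idct.c, not quoted anywhere in
this thread). The function names are hypothetical and only illustrate the
two rounding forms and the 16-bit range check:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: these values are recalled from simple_idct.c, not from this thread. */
#define W4        16383
#define COL_SHIFT 20

/* Rounding bias the C code effectively adds: W4 * ((1 << (COL_SHIFT-1)) / W4). */
int32_t bias_c(void)
{
    return W4 * ((1 << (COL_SHIFT - 1)) / W4);
}

/* a0 exactly as written in the C column transform. */
int32_t a0_c(int16_t col0)
{
    return W4 * (col0 + ((1 << (COL_SHIFT - 1)) / W4));
}

/* a0 in the simpler form: plain power-of-two rounding bias. */
int32_t a0_simple(int16_t col0)
{
    return W4 * col0 + (1 << (COL_SHIFT - 1));
}

/* Nonzero if v can be held in a signed 16-bit vector lane. */
int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Checks the two claims under the assumed constants. */
void check_claims(void)
{
    /* The precomputed bias is far outside the signed 16-bit range. */
    assert(!fits_int16(bias_c()));
    /* The two forms differ by a constant, independent of the input. */
    assert(a0_simple(0) - a0_c(0) == a0_simple(1000) - a0_c(1000));
}
```

Under those assumed constants the integer division truncates, so the two
forms differ by a small constant before the final shift. That constant is
enough to flip the rounding of some outputs, which is where the mismatch
against the C IDCT comes from.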

Is your armv5te idct a total rewrite of what's in svn, or can the
changes be broken into sensible steps?  If the latter, please send a
patch series.

>> > Ideally it would be the authors who claimed the code to be identical to
>> > the C code ...
>>
>> I wrote the ARMv6 version, but I never made any such claim.  In fact,
>> I believe I mentioned at the time that there was a slight difference.
>>
>> > If we have no one then we will likely have to disable these IDCTs. I do
>> > not want to create files that turn green and pink unless they are played
>> > on an ARM CPU ...
>>
>> I think the ARM CPUs where these apply will be used mostly for
>> playback, not encoding, and on those machines every cycle counts.
>
> Yes, that was one of the reasons why I did not strongly insist on disabling
> j_rev_dct_ARM at that time (people could get severe performance regressions
> and complain about it) :)
>
> In any case, the ARMv6 IDCT still needs heavy optimization; it is not very
> fast (on its target devices with ARM11 CPUs, of course).

Well, it's considerably faster than the C IDCT, but I'm not denying it
could be improved.  Are you talking about sparse data handling, or
something else?  Did you have any patches?

Remind me again what ARM devices you have.  I recently got my hands on
a Cortex-A8 (TI OMAP3530), and it's more fun to play around with than
the Nokia tablets.

-- 
Måns Rullgård
mans at mansr.com



