[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM
Sat Jun 28 09:31:42 CEST 2008
On Saturday 28 June 2008, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> > Do we have someone who has an ARM CPU and can look into the above issue?
> I know exactly why it's different. In simple_idct.c, the column
> transform contains these lines:
> /* XXX: I did that only to give same values as previous code */
> a0 = W4 * (col[8*0] + ((1<<(COL_SHIFT-1))/W4));
> It's simpler to code that as a0 = W4 * col + (1 << (COL_SHIFT-1)).
> Thinking about it, it only takes one more instruction on NEON, and
> I've fixed that in my tree. With a little luck, the extra instruction
> can be dual-issued with something else.
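To make the difference between the two rounding forms concrete, here is a minimal C sketch. It assumes W4 = 16383 and COL_SHIFT = 20, which I believe are the values in simple_idct.c but which are not quoted in this thread; the function names are made up for illustration. Because (1<<(COL_SHIFT-1)) is not a multiple of W4, the integer division truncates and the two forms end up with slightly different rounding constants.

```c
#include <stdint.h>

/* Assumed constants (not quoted in this thread). */
#define W4        16383
#define COL_SHIFT 20

/* Rounding as written in the C column transform:
 * the bias is added to the input before the multiply. */
int32_t a0_reference(int16_t col0)
{
    return W4 * (col0 + ((1 << (COL_SHIFT - 1)) / W4));
}

/* The simpler form: multiply first, then add the exact half-unit bias. */
int32_t a0_simplified(int16_t col0)
{
    return W4 * col0 + (1 << (COL_SHIFT - 1));
}

/* Difference between the two effective rounding constants.
 * It does not depend on the input value. */
int32_t a0_bias_difference(void)
{
    return (1 << (COL_SHIFT - 1)) - ((1 << (COL_SHIFT - 1)) / W4) * W4;
}
```

With these assumed constants the two biases differ by 32, so the final output only changes when the sum lands within 32 of a rounding boundary. For a DC-only column with col0 = 16353, for instance, a0_reference(col0) >> COL_SHIFT gives 255 while a0_simplified(col0) >> COL_SHIFT gives 256.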
This part does not have any extra overhead in my fine-tuned version
of the ARMv5TE IDCT:
ldr v1, xxx /* v1 = (((1<<(COL_SHIFT-1))/W4)*W4) */
[some unrelated instructions to hide load latency]
smlatt v2, a2, v4, v1 /* A0t = W4 * (col_t + ((1<<(COL_SHIFT-1))/W4)) */
There is no reason why ARMv6 or NEON should have overhead either. So getting
bit-identical results to C simple_idct is possible without sacrificing
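The reason the multiply-accumulate costs nothing extra is that W4 * (col + k) equals W4 * col + k * W4, so the bias k * W4 can be loaded once as a constant and used as the accumulator input. A rough C model of this follows. It again assumes W4 = 16383 and COL_SHIFT = 20, smlatt_model is only a hypothetical model of the instruction's arithmetic (accumulator plus the product of the signed top halfwords), and the operand packing is a guess at what the registers in the snippet above hold.

```c
#include <stdint.h>

/* Assumed constants (not quoted in this thread). */
#define W4        16383
#define COL_SHIFT 20

/* Precomputed accumulator constant, as in the "ldr v1" comment above. */
#define ROUND_BIAS ((int32_t)(((1 << (COL_SHIFT - 1)) / W4) * W4))

/* Sign-extend the top 16 bits of a 32-bit word. */
static int32_t top_half(uint32_t x)
{
    int32_t t = (int32_t)(x >> 16);
    return t >= 0x8000 ? t - 0x10000 : t;
}

/* Hypothetical model of smlatt: acc + top(x) * top(y). */
int32_t smlatt_model(uint32_t x, uint32_t y, int32_t acc)
{
    return acc + top_half(x) * top_half(y);
}

/* Rounding as written in the C column transform. */
int32_t a0_reference(int16_t col)
{
    return W4 * (col + ((1 << (COL_SHIFT - 1)) / W4));
}

/* Same value from one multiply-accumulate: the coefficient sits in the
 * top half of one word, W4 in the top half of the other (assumed layout). */
int32_t a0_via_mac(int16_t col)
{
    uint32_t packed_col = (uint32_t)(uint16_t)col << 16;
    uint32_t packed_w4  = (uint32_t)W4 << 16;
    return smlatt_model(packed_col, packed_w4, ROUND_BIAS);
}
```

Under these assumptions a0_via_mac(col) matches a0_reference(col) for every 16-bit input, which is what bit-identical output requires for this term.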
> > Ideally would be the authors who claimed the code to be identical to the
> > C code ...
> I wrote the ARMv6 version, but I never made any such claim. In fact,
> I believe I mentioned at the time that there was a slight difference.
> > If we have no one then we will likely have to disable these IDCTs. I do
> > not want to create files that turn green and pink unless they are played
> > on an ARM CPU ...
> I think the ARM CPUs where these apply will be used mostly for
> playback, not encoding, and on those machines every cycle counts.
Yes, that was one of the reasons why I did not strongly insist on disabling
j_rev_dct_ARM at that time (people could get severe performance regressions
and complain about it) :)
In any case, the ARMv6 IDCT still needs heavy optimization; it is not very fast
(on its target devices with ARM11 CPUs, of course).