[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM

Måns Rullgård mans
Sat Jun 28 03:02:48 CEST 2008


Michael Niedermayer <michaelni at gmx.at> writes:

> On Fri, Jun 27, 2008 at 08:49:08PM +0100, M?ns Rullg?rd wrote:
>> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>> 
>> > On Friday 27 June 2008, matthieu castet wrote:
>> >> Benoit Fouet wrote:
>> >> > Michael Niedermayer wrote:
>> >> >> On Thu, Jun 26, 2008 at 10:45:20PM +0200, matthieu castet wrote:
>> >> >
>> >> > Matthieu, could you please resend the complete patch ?
>> >>
>> >> I attach them as 2 separate patch.
>> >> fix_dct-test-arm.diff : allow to build dct-test on arm
>> >> arm-icdt-test.diff : add some arm idct (there are other version that
>> >> could be added, but I have no hardware to test).
>> >
>> >> +#ifdef ARCH_ARMV4L
>> >> + ?//{"IDCT_ARM", ? ? ? ?1, j_rev_dct_ARM, ? ? ?idct, MMX_PERM},
>> >> /* should > be MMX_PERM perm, but produce strange result */ 
>> >
>> > Yes, 'j_rev_dct_ARM' is broken, it produces quite noticeable
>> > artefacts on decoding video. I think I already reported this issue
>> > here long ago. But as it is armv4 only, I was not interested in
>> > investigating this problem further...
>> 
>> It is indeed in pretty poor shape.  It's pretty fast though, if that's
>> any excuse.
>> 
>> For completeness, here's the output of dct-test -i including all the
>> ARM IDCT variants on a 500MHz Cortex-A8.  It is interesting to note
>> that the ones targeting older ARM cores are significantly slower than
>> plain C on the Cortex.
>
> dct-test is not a good benchmark, for serious benchmarking real data
> should be used. Many idcts contain checks for zero elements ...
> You can also see this difference between dct-test -i 0 and -i 1

Yes, I know all that.  Nevertheless, dct-test is quicker as a simple
correctness test.

>>    -7    15    -3     8    16    19     5    19 
>>   -23     1   -15    10    26     4   -12    -7 
>>     2   -22    -5    -5    -4    10     4    12 
>>     4    -4   -15    -5    -2    -5     8    -6 
>>    10    17    -7    11     7    11    -1    -2 
>>   -17     1   -25    -6     1    -2    19    12 
>>    -3     7     3    21     3    -8    12   -24 
>>    21    -4    13     1    -5    10     8    -5 
>> IDCT SIMPLE-C: err_inf=1 err2=0.00667969 syserr=0.00130000 maxout=266 blockSumErr=64
>> IDCT SIMPLE-C: 452.7 kdct/s
>> 
>>    -6    15    -2     9    16    20     5    21 
>>   -23     1   -14    10    26     4   -11    -7 
>>     2   -21    -5    -5    -4    10     4    12 
>>     5    -4   -14    -2    -2    -4     9    -5 
>>    10    17    -7    12     7    11    -1    -2 
>>   -17     2   -23    -6     2    -2    19    12 
>>    -2     8     5    22     3    -8    12   -24 
>>    23    -2    13     1    -5    11     8    -3 
>> IDCT SIMPLE-ARM: err_inf=1 err2=0.00667500 syserr=0.00130000 maxout=266 blockSumErr=64
>> IDCT SIMPLE-ARM: 369.6 kdct/s
> [...]
>>    -6    15    -2     9    16    20     5    21 
>>   -23     1   -14    10    26     4   -11    -7 
>>     2   -21    -5    -5    -4    10     4    12 
>>     5    -4   -14    -2    -2    -4     9    -5 
>>    10    17    -7    12     7    11    -1    -2 
>>   -17     2   -23    -6     2    -2    19    12 
>>    -2     8     5    22     3    -8    12   -24 
>>    23    -2    13     1    -5    11     8    -3 
>> IDCT SIMPLE-ARMV6: err_inf=1 err2=0.00667500 syserr=0.00130000 maxout=266 blockSumErr=64
>> IDCT SIMPLE-ARMV6: 512.4 kdct/s
>> 
>>    -6    15    -2     9    16    20     5    21 
>>   -23     1   -14    10    26     4   -11    -7 
>>     2   -21    -5    -5    -4    10     4    12 
>>     5    -4   -14    -2    -2    -4     9    -5 
>>    10    17    -7    12     7    11    -1    -2 
>>   -17     2   -23    -6     2    -2    19    12 
>>    -2     8     5    22     3    -8    12   -24 
>>    23    -2    13     1    -5    11     8    -3 
>> IDCT SIMPLE-NEON: err_inf=1 err2=0.00667500 syserr=0.00130000 maxout=266 blockSumErr=64
>> IDCT SIMPLE-NEON: 783.9 kdct/s
>
> These IDCTs are all buggy, they differ from the C simple idct.
> As has been discussed on the ML already i belive it was 
> (skal, me and alexander).
> We should try hard to avoid introducing a new IDCT with different "error
> landscape".

The difference is much smaller than some of the other IDCTs, but yes,
I agree with the general principle.  That said, the "wrong" version
actually is closer to the reference...

> Do we have someone who has a arm cpu and can look into the above issue?

I know exactly why it's different.  In simple_idct.c, the column
transform contains these lines:

        /* XXX: I did that only to give same values as previous code */
        a0 = W4 * (col[8*0] + ((1<<(COL_SHIFT-1))/W4));

It's simpler to code that as a0 = W4 * col[0] + (1 << (COL_SHIFT-1)).
Thinking about it, it only takes one more instruction on NEON, and
I've fixed that in my tree.  With a little luck, the extra instruction
can be dual-issued with something else.

> Ideally would be the authors who claimed the code to be identical to the
> C code ...

I wrote the ARMv6 version, but I never made any such claim.  In fact,
I believe I mentioned at the time that there was a slight difference.

> If we have noone then we will likely have to disable these IDCTs. I do not
> want to create files that turn green and pink unless they are played on
> an ARM cpu ...

I don't think the ARM CPUs where these apply will be used mostly for
playback, not encoding, and on those machines every cycle counts.

-- 
M?ns Rullg?rd
mans at mansr.com




More information about the ffmpeg-devel mailing list