[FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests
Ben Avison
bavison at riscosopen.org
Tue Mar 29 23:22:48 EEST 2022
On 29/03/2022 14:13, Martin Storsjö wrote:
> On Fri, 25 Mar 2022, Ben Avison wrote:
>
>> Disable ff_add_pixels_clamped_arm, which was found to fail the test.
>
> I had a look at this function, and I see that the overflow checks are using
>
> tst r6, #0x100
>
> to see whether the addition overflowed (either above or below). However,
> if block[] was e.g. 0x200, it's possible to overflow without setting
> this bit at all.
Yes, thinking about it, that test is only valid if the signed 16-bit
value from block[] lies in the range -0x100..+0x100 inclusive, otherwise
there exists at least one unsigned 8-bit value which should have clamped
but won't.
> Secondly, the clamping seems to be done with
>
> movne r6, r5, lsr #24
>
> However that should use asr, not lsr, I think, to get proper clamping in
> both ends?
r5 is the NOTted version, so all that's doing is selecting 0x000000FF if
there was positive overflow, and 0x00000000 if there was negative
overflow. Given that bit 8 and above need to be zero to facilitate
repacking the 8-bit samples, that's the right thing to do.
> Thirdly - the added test also occasionally fails for the other existing
> functions (armv6, neon) and the newly added aarch64 neon version. If you
> have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition
> will overflow, as there's no operation that both widens and clamps at
> the same time.
So it does. I obviously just didn't hit those cases in my test runs!
I can't easily test all codecs that use this function, but I just tried
instrumenting the VC-1 case and it doesn't appear to actually use this
particular function, so I'm none the wiser!
Should I just limit the 16-bit values to +/-0x100 and re-enable the
armv4 fast path then?
Ben
More information about the ffmpeg-devel
mailing list