[FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

Tue Mar 29 23:22:48 EEST 2022

On 29/03/2022 14:13, Martin Storsjö wrote:
> On Fri, 25 Mar 2022, Ben Avison wrote:
> 
>> Disable ff_add_pixels_clamped_arm, which was found to fail the test. 
> 
> I had a look at this function, and I see that the overflow checks are using
> 
>          tst             r6,  #0x100
> 
> to see whether the addition overflowed (either above or below). However, 
> if block[] was e.g. 0x200, it's possible to overflow without setting 
> this bit at all.

Yes, thinking about it, that test is only valid if the signed 16-bit 
value from block[] lies in the range -0x100..+0x100 inclusive, otherwise 
there exists at least one unsigned 8-bit value which should have clamped 
but won't.

> Secondly, the clamping seems to be done with
> 
>          movne           r6,  r5,  lsr #24
> 
> However that should use asr, not lsr, I think, to get proper clamping in 
> both ends?

r5 is the NOTted version, so all that's doing is selecting 0x000000FF if 
there was positive overflow, and 0x00000000 if there was negative 
overflow. Given that bit 8 and above need to be zero to facilitate 
repacking the 8-bit samples, that's the right thing to do.

> Thirdly - the added test also occasionally fails for the other existing 
> functions (armv6, neon) and the newly added aarch64 neon version. If you 
> have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition 
> will overflow, as there's no operation that both widens and clamps at 
> the same time.

So it does. I obviously just didn't hit those cases in my test runs!

I can't easily test all codecs that use this function, but I just tried 
instrumenting the VC-1 case and it doesn't appear to actually use this 
particular function, so I'm none the wiser!

Should I just limit the 16-bit values to +/-0x100 and re-enable the 
armv4 fast path then?

Ben