[FFmpeg-devel] [PATCH v2 1/2] checkasm/rv40dsp: cover more cases

Ronald S. Bultje rsbultje at gmail.com
Thu Dec 5 14:38:21 EET 2024


Hi,

Christophe asked me to chime in.

On Wed, Dec 4, 2024 at 4:14 AM <uk7b-at-foxmail.com at ffmpeg.org> wrote:

> --- a/tests/checkasm/rv40dsp.c
> +++ b/tests/checkasm/rv40dsp.c
> @@ -27,7 +27,7 @@
>  #define randomize_buffers()                  \
>      do {                                     \
>          for (int i = 0; i < 16*18*2; i++)    \
> -            src[i] = rnd() & 0x3;            \
> +            src[i] = rnd() & 0xff;           \
>      } while (0)
>
>  static void check_chroma_mc(void)
>

This is correct.


> @@ -47,8 +47,8 @@ static void check_chroma_mc(void)
>  #define CHECK_CHROMA_MC(name)
>                          \
>          do {
>                         \
>              if (check_func(h.name## <http://h.name#%23>
> _pixels_tab[size], #name "_mc%d", 1 << (3 - size))) {         \
> -                for (int x = 0; x < 2; x++) {
>                          \
> -                    for (int y = 0; y < 2; y++) {
>                          \
> +                for (int x = 0; x < 8; x++) {
>                          \
> +                    for (int y = 0; y < 8; y++) {
>                          \
>                          memcpy(dst0, src, 16 * 18);
>                          \
>                          memcpy(dst1, src, 16 * 18);
>                          \
>                          call_ref(dst0, src, 16, 16, x, y);
>                         \
> --
> 2.47.1
>

This is theoretically correct, but it's very inefficient. If you look at
the x86 mc8 mmx, for example, you'll notice that there's 3 codepaths: one
for x==0&&y==0 (no subpel), one for 1D filtering (mx!=0 ^ my!=0) and one
for 2D filtering. It can also be implemented branchless (I believe that's
what mc4 does). But it's highly unusual to have special codepaths for mx=3
vs. mx=4, for example.

So the optimal way to test this is to keep the original loop, but add the
following:
for (int x = 0, mx = 0; x < 2; x++, mx = 1 + (rnd() % 7)) {
    for (int y = 0, my = 0; y < 2; y++, my = 1 + (rnd() % 7)) {
        [..]
        cal_ref(dst0, src, 16, 16, mx, my);
        [..]
    }
}

This limits the number of tests to 4, while still covering all cases over
multiple checkasm runs, which is the correct way to test this.

Ronald


More information about the ffmpeg-devel mailing list