[FFmpeg-trac] #9149(swscale:new): bugs in Altivec code

Fri Mar 12 04:55:52 EET 2021

#9149: bugs in Altivec code
-------------------------------------+-------------------------------------
             Reporter:               |                     Type:  defect
  ilyakurdyukov                      |
               Status:  new          |                 Priority:  normal
            Component:  swscale      |                  Version:
                                     |  unspecified
             Keywords:               |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
 Is the Altivec code still in use?

 I am making a SIMD optimization patch for the Elbrus architecture (e2k).
 It's easier to port from existing code with intrinsic functions, so I'm
 rewriting the Altivec code to Elbrus.

 When doing this, I noticed three places in swscale that are definitely
 wrong in Altivec/VSX (compared to the C reference code). I can't check
 that for sure because I don't have Altivec hardware.

 I check swscale with:

 {{{
 $ libswscale/tests/swscale -cpuflags 0 2> /dev/null > ../swscale_ref.log
 $ libswscale/tests/swscale 2> /dev/null > ../swscale.log
 $ diff -u swscale_ref.log swscale.log > swscale.diff
 }}}

 It's pretty slow. After completing the tests, "swscale.diff" shows the
 problems.
 Some yuv2rgb conversions are more accurate, it's okay. But check the lines
 with big SSD values, this is what is not working correctly.

 == libswscale/ppc/swscale_vsx.c:2110

 Here's should be added "dstFormat != AV_PIX_FMT_P010LE && dstFormat !=
 AV_PIX_FMT_P010BE", because this code doesn't handle these formats.

 {{{
     if (!(c->flags & (SWS_BITEXACT | SWS_FULL_CHR_H_INT)) &&
 !c->needAlpha) {
         switch (c->dstBpc) {
         case 8:
             c->yuv2plane1 = yuv2plane1_8_vsx;
             break;
 }}}

 == libswscale/ppc/swscale_vsx.c:1060

 In yuv2rgb_full_1_vsx_template() there is a code:

 {{{
     const vec_s16 mul8 = vec_splat_s16(8);
 ...
             vu32_l = vec_mule(vu, mul8);
             vu32_r = vec_mulo(vu, mul8);
             vv32_l = vec_mule(vv, mul8);
             vv32_r = vec_mulo(vv, mul8);
 }}}

 But should be:

 {{{
     const vec_s16 mul2 = vec_splat_s16(2);
 ...
             tmp32 = vec_mule(vu, mul2);
             tmp32_2 = vec_mulo(vu, mul2);
             vu32_l = vec_mergeh(tmp32, tmp32_2);
             vu32_r = vec_mergel(tmp32, tmp32_2);
             tmp32 = vec_mule(vv, mul2);
             tmp32_2 = vec_mulo(vv, mul2);
             vv32_l = vec_mergeh(tmp32, tmp32_2);
             vv32_r = vec_mergel(tmp32, tmp32_2);
 }}}

 Actually, this code may be written a lot better, by multiplying coef
 multipliers before the for loop. And using vec_unpack instead of
 vec_mul/vec_merge.

 == libswscale/ppc/swscale_vsx.c:1513

 In yuv2422_2_vsx_template() there is a wrong macro SETUP, that multiplies
 by (4096 - alpha) and (4096 - alpha), should be (4096 - alpha) and
 (alpha).

 You can see yuv2rgb_full_2_vsx_template() for the correct code.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/9149>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker