[FFmpeg-trac] #9149(swscale:new): bugs in Altivec code
FFmpeg
trac at avcodec.org
Fri Mar 12 04:55:52 EET 2021
#9149: bugs in Altivec code
-------------------------------------+-------------------------------------
Reporter: | Type: defect
ilyakurdyukov |
Status: new | Priority: normal
Component: swscale | Version:
| unspecified
Keywords: | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-------------------------------------+-------------------------------------
Is the Altivec code still in use?
I am making a SIMD optimization patch for the Elbrus architecture (e2k).
It's easier to port from existing code with intrinsic functions, so I'm
rewriting the Altivec code to Elbrus.
When doing this, I noticed three places in swscale that are definitely
wrong in Altivec/VSX (compared to the C reference code). I can't check
that for sure because I don't have Altivec hardware.
I check swscale with:
{{{
$ libswscale/tests/swscale -cpuflags 0 2> /dev/null > ../swscale_ref.log
$ libswscale/tests/swscale 2> /dev/null > ../swscale.log
$ diff -u swscale_ref.log swscale.log > swscale.diff
}}}
It's pretty slow. After completing the tests, "swscale.diff" shows the
problems.
Some yuv2rgb conversions are more accurate, it's okay. But check the lines
with big SSD values, this is what is not working correctly.
== libswscale/ppc/swscale_vsx.c:2110
Here's should be added "dstFormat != AV_PIX_FMT_P010LE && dstFormat !=
AV_PIX_FMT_P010BE", because this code doesn't handle these formats.
{{{
if (!(c->flags & (SWS_BITEXACT | SWS_FULL_CHR_H_INT)) &&
!c->needAlpha) {
switch (c->dstBpc) {
case 8:
c->yuv2plane1 = yuv2plane1_8_vsx;
break;
}}}
== libswscale/ppc/swscale_vsx.c:1060
In yuv2rgb_full_1_vsx_template() there is a code:
{{{
const vec_s16 mul8 = vec_splat_s16(8);
...
vu32_l = vec_mule(vu, mul8);
vu32_r = vec_mulo(vu, mul8);
vv32_l = vec_mule(vv, mul8);
vv32_r = vec_mulo(vv, mul8);
}}}
But should be:
{{{
const vec_s16 mul2 = vec_splat_s16(2);
...
tmp32 = vec_mule(vu, mul2);
tmp32_2 = vec_mulo(vu, mul2);
vu32_l = vec_mergeh(tmp32, tmp32_2);
vu32_r = vec_mergel(tmp32, tmp32_2);
tmp32 = vec_mule(vv, mul2);
tmp32_2 = vec_mulo(vv, mul2);
vv32_l = vec_mergeh(tmp32, tmp32_2);
vv32_r = vec_mergel(tmp32, tmp32_2);
}}}
Actually, this code may be written a lot better, by multiplying coef
multipliers before the for loop. And using vec_unpack instead of
vec_mul/vec_merge.
== libswscale/ppc/swscale_vsx.c:1513
In yuv2422_2_vsx_template() there is a wrong macro SETUP, that multiplies
by (4096 - alpha) and (4096 - alpha), should be (4096 - alpha) and
(alpha).
You can see yuv2rgb_full_2_vsx_template() for the correct code.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/9149>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker
More information about the FFmpeg-trac
mailing list