[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.
Michael Niedermayer
michael at niedermayer.cc
Thu Jan 14 02:11:44 EET 2021
On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote:
> ---
> Fixes a bug where if there is no offset and a tail which is not processed by the
> sse3/avx2 version the dither is modified
> Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
> to yuv2yuvX.asm to reduce code duplication and so that it may be used
> to process the tail from the larger cardinal simd versions.
> src argument of yuv2yuvX_* is now srcOffset, so that tails and offsets
> are accounted for correctly.
> Changes input size in checkasm so that this corner case is tested.
>
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c | 130 ++++++++++++----------------
> libswscale/x86/swscale_template.c | 82 ------------------
> libswscale/x86/yuv2yuvX.asm | 136 ++++++++++++++++++++++++++++++
> tests/checkasm/sw_scale.c | 100 ++++++++++++++++++++++
> 5 files changed, 291 insertions(+), 158 deletions(-)
> create mode 100644 libswscale/x86/yuv2yuvX.asm
This seems to be crashing again, unless I messed up testing:
(gdb) disassemble $rip-32,$rip+32
Dump of assembler code from 0x555555572f02 to 0x555555572f42:
0x0000555555572f02 <ff_yuv2yuvX_avx2+162>: int $0x71
0x0000555555572f04 <ff_yuv2yuvX_avx2+164>: out %al,$0x3
0x0000555555572f06 <ff_yuv2yuvX_avx2+166>: vpsraw $0x3,%ymm1,%ymm1
0x0000555555572f0b <ff_yuv2yuvX_avx2+171>: vpackuswb %ymm4,%ymm3,%ymm3
0x0000555555572f0f <ff_yuv2yuvX_avx2+175>: vpackuswb %ymm1,%ymm6,%ymm6
0x0000555555572f13 <ff_yuv2yuvX_avx2+179>: mov (%rdi),%rdx
0x0000555555572f16 <ff_yuv2yuvX_avx2+182>: vpermq $0xd8,%ymm3,%ymm3
0x0000555555572f1c <ff_yuv2yuvX_avx2+188>: vpermq $0xd8,%ymm6,%ymm6
=> 0x0000555555572f22 <ff_yuv2yuvX_avx2+194>: vmovdqa %ymm3,(%rcx,%rax,1)
0x0000555555572f27 <ff_yuv2yuvX_avx2+199>: vmovdqa %ymm6,0x20(%rcx,%rax,1)
0x0000555555572f2d <ff_yuv2yuvX_avx2+205>: add $0x40,%rax
0x0000555555572f31 <ff_yuv2yuvX_avx2+209>: mov %rdi,%rsi
0x0000555555572f34 <ff_yuv2yuvX_avx2+212>: cmp %r8,%rax
0x0000555555572f37 <ff_yuv2yuvX_avx2+215>: jb 0x555555572eae <ff_yuv2yuvX_avx2+78>
0x0000555555572f3d <ff_yuv2yuvX_avx2+221>: vzeroupper
0x0000555555572f40 <ff_yuv2yuvX_avx2+224>: retq
0x0000555555572f41 <ff_yuv2yuvX_avx2+225>: nopw %cs:0x0(%rax,%rax,1)
rax 0x0 0
rbx 0x30 48
rcx 0x55555583f470 93824995292272
rdx 0x55555585e500 93824995419392
#0 0x0000555555572f22 in ff_yuv2yuvX_avx2 ()
#1 0x00005555555724ee in yuv2yuvX_avx2 ()
#2 0x000055555556b4f6 in chr_planar_vscale ()
#3 0x0000555555566d41 in swscale ()
#4 0x0000555555568284 in sws_scale ()
[...]
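One possible reading of the dump above, offered as an assumption and not something confirmed in this thread: the faulting instruction is an aligned 256-bit store (vmovdqa with a ymm register), which requires a 32-byte-aligned memory operand. With rax = 0 and rcx = 0x55555583f470, the destination is 16-byte aligned but not 32-byte aligned. The small C sketch below only checks that arithmetic; is_aligned() and store_address() are hypothetical helper names and are not part of the patch or of libswscale.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): returns 1 when addr is a
 * multiple of align. align must be a power of two. */
int is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Effective address of the faulting store, (%rcx,%rax,1):
 * base register plus index register, scale 1. */
uint64_t store_address(uint64_t rcx, uint64_t rax)
{
    return rcx + rax;
}
```

Called with the register values from the dump, store_address(0x55555583f470, 0) gives an address that passes is_aligned(addr, 16) and fails is_aligned(addr, 32). If this reading is right, the destination buffer handed to the avx2 version would need 32-byte alignment, or the store would have to be an unaligned one.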
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
What does censorship reveal? It reveals fear. -- Julian Assange