[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.
Alan Kelly
alankelly at google.com
Thu Jan 14 10:28:28 EET 2021
Apologies for this. When I added mmx to the yasm file, I added a macro for
the stores that selects mova for mmx and movdqu for the others. However,
"if cpuflag(mmx)" evaluates to true for all architectures, so I replaced it
with "if notcpuflag(sse3)".
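
To illustrate why the first condition misfires, here is a rough C model of
the two checks. It assumes the CPU flags are cumulative (each instruction
set level also carries the bits of the levels below it), which is how
x86inc.asm defines them as far as I can tell. The bit values and function
names are hypothetical and only serve this sketch; they are not the real
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this illustration. Each level
 * includes the bits of every level below it (cumulative flags). */
enum {
    FLAG_MMX  = 1 << 0,
    FLAG_SSE3 = (1 << 1) | FLAG_MMX,
    FLAG_AVX2 = (1 << 2) | FLAG_SSE3
};

/* Model of cpuflag(x): true when every bit of `wanted` is set in `current`. */
static bool cpuflag(unsigned current, unsigned wanted)
{
    return (current & wanted) == wanted;
}

/* Model of notcpuflag(x): the negation of cpuflag(x). */
static bool notcpuflag(unsigned current, unsigned wanted)
{
    return !cpuflag(current, wanted);
}

/* Old condition: selects the aligned store (mova) whenever mmx is present,
 * which is the case for every level in this model. */
static bool aligned_store_old(unsigned current)
{
    return cpuflag(current, FLAG_MMX);
}

/* New condition: selects the aligned store only below sse3. */
static bool aligned_store_new(unsigned current)
{
    return notcpuflag(current, FLAG_SSE3);
}

/* Walks through the three levels and checks which store each picks. */
static void check_store_selection(void)
{
    assert(aligned_store_old(FLAG_MMX));
    assert(aligned_store_old(FLAG_SSE3));   /* unintended: aligned store */
    assert(aligned_store_old(FLAG_AVX2));   /* unintended: aligned store */

    assert(aligned_store_new(FLAG_MMX));
    assert(!aligned_store_new(FLAG_SSE3));  /* unaligned store, as intended */
    assert(!aligned_store_new(FLAG_AVX2));  /* unaligned store, as intended */
}
```

Under this model the old condition picks the aligned store for avx2 as well,
which matches the vmovdqa visible in the disassembly quoted below.
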
The alignment in the checkasm test has been changed from 32 to 8 so that
the test catches problems with alignment.
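
As a small sketch of what the lower alignment exposes: an aligned 256-bit
store (vmovdqa with a ymm register) needs a 32-byte aligned address. The
helper below is hypothetical and not part of the patch. It checks the store
address from the quoted crash (rcx + rax, with rax = 0), which is 16-byte
aligned but not 32-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when addr is a multiple of alignment.
 * alignment must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}

/* Destination of the faulting store in the quoted gdb output:
 * rcx = 0x55555583f470, rax = 0. */
static const uint64_t crash_store_addr = UINT64_C(0x55555583f470);

/* Checks the alignment of the crash address and of an address that is
 * offset by 8 from a 32-byte boundary, as an 8-aligned buffer may be. */
static void check_alignment_cases(void)
{
    assert(is_aligned(crash_store_addr, 16));
    assert(!is_aligned(crash_store_addr, 32)); /* too weak for ymm vmovdqa */

    uint64_t base = UINT64_C(0x1000);          /* arbitrary 32-aligned base */
    assert(is_aligned(base + 8, 8));
    assert(!is_aligned(base + 8, 16));
    assert(!is_aligned(base + 8, 32));
}
```

A buffer that is only guaranteed 8-byte alignment can land on addresses like
base + 8 above, so any aligned store wider than 8 bytes will fault in the
test instead of passing by luck.
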
On Thu, Jan 14, 2021 at 1:11 AM Michael Niedermayer <michael at niedermayer.cc>
wrote:
> On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote:
> > ---
> > Fixes a bug where if there is no offset and a tail which is not
> > processed by the sse3/avx2 version the dither is modified
> > Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
> > to yuv2yuvX.asm to reduce code duplication and so that it may be used
> > to process the tail from the larger cardinal simd versions.
> > src argument of yuv2yuvX_* is now srcOffset, so that tails and offsets
> > are accounted for correctly.
> > Changes input size in checkasm so that this corner case is tested.
> >
> > libswscale/x86/Makefile | 1 +
> > libswscale/x86/swscale.c | 130 ++++++++++++----------------
> > libswscale/x86/swscale_template.c | 82 ------------------
> > libswscale/x86/yuv2yuvX.asm | 136 ++++++++++++++++++++++++++++++
> > tests/checkasm/sw_scale.c | 100 ++++++++++++++++++++++
> > 5 files changed, 291 insertions(+), 158 deletions(-)
> > create mode 100644 libswscale/x86/yuv2yuvX.asm
>
> This seems to be crashing again unless i messed up testing
>
> (gdb) disassemble $rip-32,$rip+32
> Dump of assembler code from 0x555555572f02 to 0x555555572f42:
> 0x0000555555572f02 <ff_yuv2yuvX_avx2+162>: int $0x71
> 0x0000555555572f04 <ff_yuv2yuvX_avx2+164>: out %al,$0x3
> 0x0000555555572f06 <ff_yuv2yuvX_avx2+166>: vpsraw $0x3,%ymm1,%ymm1
> 0x0000555555572f0b <ff_yuv2yuvX_avx2+171>: vpackuswb %ymm4,%ymm3,%ymm3
> 0x0000555555572f0f <ff_yuv2yuvX_avx2+175>: vpackuswb %ymm1,%ymm6,%ymm6
> 0x0000555555572f13 <ff_yuv2yuvX_avx2+179>: mov (%rdi),%rdx
> 0x0000555555572f16 <ff_yuv2yuvX_avx2+182>: vpermq $0xd8,%ymm3,%ymm3
> 0x0000555555572f1c <ff_yuv2yuvX_avx2+188>: vpermq $0xd8,%ymm6,%ymm6
> => 0x0000555555572f22 <ff_yuv2yuvX_avx2+194>: vmovdqa %ymm3,(%rcx,%rax,1)
> 0x0000555555572f27 <ff_yuv2yuvX_avx2+199>: vmovdqa %ymm6,0x20(%rcx,%rax,1)
> 0x0000555555572f2d <ff_yuv2yuvX_avx2+205>: add $0x40,%rax
> 0x0000555555572f31 <ff_yuv2yuvX_avx2+209>: mov %rdi,%rsi
> 0x0000555555572f34 <ff_yuv2yuvX_avx2+212>: cmp %r8,%rax
> 0x0000555555572f37 <ff_yuv2yuvX_avx2+215>: jb 0x555555572eae <ff_yuv2yuvX_avx2+78>
> 0x0000555555572f3d <ff_yuv2yuvX_avx2+221>: vzeroupper
> 0x0000555555572f40 <ff_yuv2yuvX_avx2+224>: retq
> 0x0000555555572f41 <ff_yuv2yuvX_avx2+225>: nopw %cs:0x0(%rax,%rax,1)
>
> rax 0x0 0
> rbx 0x30 48
> rcx 0x55555583f470 93824995292272
> rdx 0x55555585e500 93824995419392
>
> #0 0x0000555555572f22 in ff_yuv2yuvX_avx2 ()
> #1 0x00005555555724ee in yuv2yuvX_avx2 ()
> #2 0x000055555556b4f6 in chr_planar_vscale ()
> #3 0x0000555555566d41 in swscale ()
> #4 0x0000555555568284 in sws_scale ()
>
>
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> What does censorship reveal? It reveals fear. -- Julian Assange