[FFmpeg-devel] [PATCH 3/3] avfilter/vf_framerate: add SIMD functions for frame blending
Henrik Gramner
henrik at gramner.com
Sun Jan 14 12:31:46 EET 2018
On Sat, Jan 13, 2018 at 10:57 PM, Marton Balint <cus at passwd.hu> wrote:
> + .loop:
> + movu m0, [src1q + xq]
> + movu m1, [src2q + xq]
> + punpckl%1%2 m5, m0, m2 ; 0e0f0g0h
> + punpckh%1%2 m0, m2 ; 0a0b0c0d
> + punpckl%1%2 m6, m1, m2 ; 0E0F0G0H
> + punpckh%1%2 m1, m2 ; 0A0B0C0D
> + pmull%2 m0, m3
> + pmull%2 m5, m3
> + pmull%2 m1, m4
> + pmull%2 m6, m4
> + padd%2 m0, m7
> + padd%2 m5, m7
> + padd%2 m0, m1
> + padd%2 m5, m6
pmaddubsw should work here for the 8-bit case. pmaddwd might work for
the 16-bit case depending on how many bits are actually used.
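To illustrate the 8-bit suggestion, here is a minimal scalar sketch in C of what one 16-bit lane of pmaddubsw computes, compared against the unpack + pmullw + paddw arithmetic in the quoted loop. It rests on assumptions that are not taken from the patch: the two factors are positive, fit in a signed byte and sum to 128, so the shift is 7 and the rounding term is 64. All function names are hypothetical. Under those assumptions the largest possible sum is 255 * 128 = 32640, so the signed saturation in pmaddubsw never triggers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit output lane of pmaddubsw: two unsigned bytes
 * (a0, a1) are multiplied by two signed bytes (b0, b1) and the products
 * are added with signed 16-bit saturation. */
static int16_t maddubs_lane(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Reference blend: widen, multiply each source by its factor, add the
 * rounding term, shift. ASSUMPTION: shift = 7 and half = 64. */
static uint8_t blend_ref(uint8_t s1, uint8_t s2, unsigned f1, unsigned f2)
{
    return (uint8_t)((s1 * f1 + s2 * f2 + 64u) >> 7);
}

/* Same blend via the pmaddubsw model. In SIMD the sources would be
 * interleaved bytewise (s1, s2, s1, s2, ...) and multiplied by a register
 * holding the byte pair (f1, f2) repeated.
 * ASSUMPTION: f1 > 0, f2 > 0 and f1 + f2 == 128. */
static uint8_t blend_maddubs(uint8_t s1, uint8_t s2, int8_t f1, int8_t f2)
{
    assert(f1 > 0 && f2 > 0 && f1 + f2 == 128);
    int sum = maddubs_lane(s1, s2, f1, f2); /* at most 32640, no saturation */
    return (uint8_t)((sum + 64) >> 7);
}

/* Exhaustively compare both versions over all samples and factor pairs. */
static long count_mismatches(void)
{
    long bad = 0;
    for (int f1 = 1; f1 < 128; f1++)
        for (int s1 = 0; s1 < 256; s1++)
            for (int s2 = 0; s2 < 256; s2++)
                if (blend_maddubs(s1, s2, f1, 128 - f1) !=
                    blend_ref(s1, s2, f1, 128 - f1))
                    bad++;
    return bad;
}
```

If the factors use more than 7 bits, they no longer fit in the signed-byte operand of pmaddubsw and this sketch does not apply as written.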
> + pinsrw xm3, r8m, 0 ; factor1
> + pinsrw xm4, r9m, 0 ; factor2
> + pinsrw xm7, r10m, 0 ; half
> + SPLATW m3, xm3
> + SPLATW m4, xm4
> + SPLATW m7, xm7
Use vpbroadcast* from memory on AVX2; otherwise use movd instead of
pxor+pinsrw (movd zero-extends into the register, so the pxor is
unnecessary).
> + pxor m3, m3
> + pxor m4, m4
> + pxor m7, m7
> + pinsrw xm3, r8m, 0 ; factor1
> + pinsrw xm4, r9m, 0 ; factor2
> + pinsrw xm7, r10m, 0 ; half
> + XSPLATD 3
> + XSPLATD 4
> + XSPLATD 7
Ditto.
> + neg word r11m ; shift = -shift
> + add word r11m, 16 ; shift += 16
> + pxor m2, m2
> + pinsrw xm2, r11m, 0 ; 16 - shift
> + pslld m3, xm2
> + pslld m4, xm2
> + pslld m7, xm2
You probably want to use a temporary register instead of doing slow
load-modify-store instructions.
Doing this in SIMD might be an option as well, e.g. load data directly
into vector regs from the stack, shift, then splat.