[FFmpeg-devel] [PATCH 1/2] x86/vf_blend: add sse and ssse3 extremity functions

Ivan Kalvachev ikalvachev at gmail.com
Wed Jun 28 02:19:37 EEST 2017


On 6/27/17, James Almer <jamrial at gmail.com> wrote:
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
>  libavfilter/x86/vf_blend.asm    | 25 +++++++++++++++++++++++++
>  libavfilter/x86/vf_blend_init.c |  4 ++++
>  tests/checkasm/vf_blend.c       |  1 +
>  3 files changed, 30 insertions(+)
>
> diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
> index 33b1ad1496..25f6f5affc 100644
> --- a/libavfilter/x86/vf_blend.asm
> +++ b/libavfilter/x86/vf_blend.asm
> @@ -286,6 +286,31 @@ BLEND_INIT difference, 3
>      jl .loop
>  BLEND_END
>
> +BLEND_INIT extremity, 8
> +    pxor       m2, m2
> +    mova       m4, [pw_255]
> +.nextrow:
> +    mov        xq, widthq
> +
> +    .loop:
> +        movu            m0, [topq + xq]
> +        movu            m1, [bottomq + xq]
> +        punpckhbw       m5, m0, m2
> +        punpcklbw       m0, m2
> +        punpckhbw       m6, m1, m2
> +        punpcklbw       m1, m2
> +        psubw           m3, m4, m0
> +        psubw           m7, m4, m5
> +        psubw           m3, m1
> +        psubw           m7, m6
> +        ABS1            m3, m1
> +        ABS1            m7, m6

Minor nitpick.

There is an ABS2 macro that takes 4 parameters and performs
two interleaved ABS1 operations, which is (hopefully) faster on sse2.
It should generate exactly the same code on ssse3.
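
Something like this in place of the two ABS1 lines. It is an untested
sketch, and it assumes ABS2's argument order is (a, b, tmp0, tmp1) as
I remember it from libavutil/x86/x86util.asm, so please check the
macro definition before using it:

        ; m3, m7 = values to take the absolute value of
        ; m1, m6 = scratch registers (the same ones the ABS1 calls used)
        ABS2            m3, m7, m1, m6

If I recall correctly, on ssse3 this expands to two pabsw instructions
and the scratch registers are left untouched, while on sse2 the
emulation steps for the two registers are interleaved instead of
running back to back.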

