[FFmpeg-devel] [RFC] DXVA2 decoding and FFmpeg

Michael Niedermayer michaelni at gmx.at
Tue Jun 16 14:03:22 CEST 2015


On Tue, Jun 16, 2015 at 10:35:52AM +0200, Stefano Sabatini wrote:
> On date Tuesday 2015-06-16 10:20:31 +0200, wm4 encoded:
> > On Mon, 15 Jun 2015 17:55:35 +0200
> > Stefano Sabatini <stefasab at gmail.com> wrote:
> > 
> > > On date Monday 2015-06-15 11:56:13 +0200, Stefano Sabatini encoded:
> > > [...]
> > > > From 3a75ef1e86360cd6f30b8e550307404d0d1c1dba Mon Sep 17 00:00:00 2001
> > > > From: Stefano Sabatini <stefasab at gmail.com>
> > > > Date: Mon, 15 Jun 2015 11:02:50 +0200
> > > > Subject: [PATCH] lavu/mem: add av_memcpynt() function with x86 optimizations
> > > > 
> > > > Assembly based on code from vlc dxva2.c, commit 62107e56 by Laurent Aimar
> > > > <fenrir at videolan.org>.
> > > > 
> > > > TODO: bump minor, update APIchanges
> > > > ---
> > > >  libavutil/mem.c          |  9 +++++
> > > >  libavutil/mem.h          | 14 ++++++++
> > > >  libavutil/mem_internal.h | 26 +++++++++++++++
> > > >  libavutil/x86/Makefile   |  1 +
> > > >  libavutil/x86/mem.c      | 85 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  5 files changed, 135 insertions(+)
> > > >  create mode 100644 libavutil/mem_internal.h
> > > >  create mode 100644 libavutil/x86/mem.c
> > > > 
> > > > diff --git a/libavutil/mem.c b/libavutil/mem.c
> > > > index da291fb..0e1eb01 100644
> > > > --- a/libavutil/mem.c
> > > > +++ b/libavutil/mem.c
> > > > @@ -42,6 +42,7 @@
> > > >  #include "dynarray.h"
> > > >  #include "intreadwrite.h"
> > > >  #include "mem.h"
> > > > +#include "mem_internal.h"
> > > >  
> > > >  #ifdef MALLOC_PREFIX
> > > >  
> > > > @@ -515,3 +516,11 @@ void av_fast_malloc(void *ptr, unsigned int *size, size_t min_size)
> > > >      ff_fast_malloc(ptr, size, min_size, 0);
> > > >  }
> > > >  
> > > > +void av_memcpynt(void *dst, const void *src, size_t size, int cpu_flags)
> > > > +{
> > > > +#if ARCH_X86
> > > > +    ff_memcpynt_x86(dst, src, size, cpu_flags);
> > > > +#else
> > > > +    memcpy(dst, src, size, cpu_flags);
> > > > +#endif
> > > > +}
> > > 
> > > Alternatively, what about something like:
> > > 
> > > av_memcpynt_fn av_memcpynt_get_fn(void);
> > > 
> > > modeled after av_pixelutils_get_sad_fn()? This would skip the need for
> > > a wrapper calling the right function.
> > 
> 
> > I don't see much value in this, unless determining the right function
> > causes too much overhead.
> 
> I see two advantages, 1. no branch and function call when the function
> is called, 2. the cpu_flags must not be passed around, so it's somehow
> safer.
> 
> I have no strong preference though, updated (untested patch) in
> attachment.
> -- 
> FFmpeg = Fierce and Forgiving Merciless Powered Extroverse Gargoyle

>  mem.c          |    9 +++++
>  mem.h          |   13 +++++++
>  mem_internal.h |   26 +++++++++++++++
>  x86/Makefile   |    1 
>  x86/mem.c      |   98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 147 insertions(+)
> f536b25834e0927b8cab5c996042aae697b8d773  0003-lavu-mem-add-av_memcpynt_get_fn.patch
> From c005ff5405dd48e6b0fed24ed94947f69bfe2783 Mon Sep 17 00:00:00 2001
> From: Stefano Sabatini <stefasab at gmail.com>
> Date: Mon, 15 Jun 2015 11:02:50 +0200
> Subject: [PATCH] lavu/mem: add av_memcpynt_get_fn()
> 
> Assembly based on code from vlc dxva2.c, commit 62107e56 by Laurent Aimar
> <fenrir at videolan.org>.
> 
> TODO: remove use of inline assembly, bump minor, update APIchanges
> ---
>  libavutil/mem.c          |  9 +++++
>  libavutil/mem.h          | 13 +++++++
>  libavutil/mem_internal.h | 26 +++++++++++++
>  libavutil/x86/Makefile   |  1 +
>  libavutil/x86/mem.c      | 98 ++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 147 insertions(+)
>  create mode 100644 libavutil/mem_internal.h
>  create mode 100644 libavutil/x86/mem.c
> 
> diff --git a/libavutil/mem.c b/libavutil/mem.c
> index da291fb..325bfc9 100644
> --- a/libavutil/mem.c
> +++ b/libavutil/mem.c
> @@ -42,6 +42,7 @@
>  #include "dynarray.h"
>  #include "intreadwrite.h"
>  #include "mem.h"
> +#include "mem_internal.h"
>  
>  #ifdef MALLOC_PREFIX
>  
> @@ -515,3 +516,11 @@ void av_fast_malloc(void *ptr, unsigned int *size, size_t min_size)
>      ff_fast_malloc(ptr, size, min_size, 0);
>  }
>  
> +av_memcpynt_fn av_memcpynt_get_fn(void)
> +{
> +#if ARCH_X86
> +    return ff_memcpynt_get_fn_x86();
> +#else
> +    return memcpy;
> +#endif
> +}
> diff --git a/libavutil/mem.h b/libavutil/mem.h
> index 2a1e36d..d9f1b7a 100644
> --- a/libavutil/mem.h
> +++ b/libavutil/mem.h
> @@ -382,6 +382,19 @@ void *av_fast_realloc(void *ptr, unsigned int *size, size_t min_size);
>   */
>  void av_fast_malloc(void *ptr, unsigned int *size, size_t min_size);
>  
> +typedef void* (*av_memcpynt_fn)(void *dst, const void *src, size_t size);
> +
> +/**
> + * Return possibly optimized function to copy size bytes from from src
> + * to dst, using non-temporal copy.
> + *
> + * The returned function works as memcpy, but adopts non-temporal
> + * instructios when available. This can lead to better performances
> + * when transferring data from source to destination is expensive, for
> + * example when reading from GPU memory.
> + */
> +av_memcpynt_fn av_memcpynt_get_fn(void);
> +
>  /**
>   * @}
>   */
> diff --git a/libavutil/mem_internal.h b/libavutil/mem_internal.h
> new file mode 100644
> index 0000000..de61cba
> --- /dev/null
> +++ b/libavutil/mem_internal.h
> @@ -0,0 +1,26 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVUTIL_MEM_INTERNAL_H
> +#define AVUTIL_MEM_INTERNAL_H
> +
> +#include "mem.h"
> +
> +av_memcpynt_fn ff_memcpynt_get_fn_x86(void);
> +
> +#endif /* AVUTIL_MEM_INTERNAL_H */
> diff --git a/libavutil/x86/Makefile b/libavutil/x86/Makefile
> index a719c00..171c351 100644
> --- a/libavutil/x86/Makefile
> +++ b/libavutil/x86/Makefile
> @@ -2,6 +2,7 @@ OBJS += x86/cpu.o                                                       \
>          x86/float_dsp_init.o                                            \
>          x86/imgutils.o                                                  \
>          x86/lls_init.o                                                  \
> +        x86/mem.o                                                       \
>  
>  OBJS-$(CONFIG_PIXELUTILS) += x86/pixelutils_init.o                      \
>  
> diff --git a/libavutil/x86/mem.c b/libavutil/x86/mem.c
> new file mode 100644
> index 0000000..6326c90
> --- /dev/null
> +++ b/libavutil/x86/mem.c
> @@ -0,0 +1,98 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <inttypes.h>
> +#include "config.h"
> +#include "libavutil/avassert.h"
> +#include "libavutil/mem_internal.h"
> +
> +#if HAVE_SSE2
> +/* Copy 16/64 bytes from srcp to dstp loading data with the SSE>=2 instruction
> + * load and storing data with the SSE>=2 instruction store.
> + */
> +#define COPY16(dstp, srcp, load, store) \
> +    __asm__ volatile (                  \
> +        load "  0(%[src]), %%xmm1\n"    \
> +        store " %%xmm1,    0(%[dst])\n" \
> +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1")
> +
> +#define COPY64(dstp, srcp, load, store) \
> +    __asm__ volatile (                  \
> +        load "  0(%[src]), %%xmm1\n"    \
> +        load " 16(%[src]), %%xmm2\n"    \
> +        load " 32(%[src]), %%xmm3\n"    \
> +        load " 48(%[src]), %%xmm4\n"    \
> +        store " %%xmm1,    0(%[dst])\n" \
> +        store " %%xmm2,   16(%[dst])\n" \
> +        store " %%xmm3,   32(%[dst])\n" \
> +        store " %%xmm4,   48(%[dst])\n" \
> +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1", "xmm2", "xmm3", "xmm4")
> +#endif
> +
> +#define COPY_LINE(dstp, srcp, size, load)                               \
> +    const unsigned unaligned = (-(uintptr_t)srcp) & 0x0f;               \
> +    unsigned x = unaligned;                                             \
> +                                                                        \
> +    av_assert0(((intptr_t)dstp & 0x0f) == 0);                           \
> +                                                                        \
> +    __asm__ volatile ("mfence");                                        \
> +    if (!unaligned) {                                                   \
> +        for (; x+63 < size; x += 64)                                    \
> +            COPY64(&dstp[x], &srcp[x], load, "movdqa");                 \
> +    } else {                                                            \
> +        COPY16(dst, src, "movdqu", "movdqa");                           \
> +        for (; x+63 < size; x += 64)                                    \
> +            COPY64(&dstp[x], &srcp[x], load, "movdqu");                 \

to use SSE registers in inline asm operands or clobber list you need
to build with -msse (which probably is default on on x86-64)

files build with -msse will result in undefined behavior if anything
in them is executed on a pre SSE cpu, as these allow gcc to put
SSE instructions directly in the code where it likes

The way out of this "design" is not to tell gcc that it passes
a string with SSE code to the assembler
that is not to use SSE registers in operands and not to put them
on the clobber list unless gcc actually is in SSE mode and can use and
need them there.
see XMM_CLOBBERS*

you could also use yasm of course if you prefer

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Old school: Use the lowest level language in which you can solve the problem
            conveniently.
New school: Use the highest level language in which the latest supercomputer
            can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150616/feefc820/attachment.asc>


More information about the ffmpeg-devel mailing list