[FFmpeg-devel] [PATCH] WIP AVX support

Måns Rullgård mans
Thu Feb 3 11:20:45 CET 2011


Jason Garrett-Glaser <jason at x264.com> writes:

> Need someone who does configure/etc stuff to finish this.
> ---
>  configure           |    2 ++
>  libavutil/cpu.h     |    1 +
>  libavutil/x86/cpu.c |   15 +++++++++++++++
>  3 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/configure b/configure
> index 46f4e44..c6ceaa8 100755
> --- a/configure
> +++ b/configure
> @@ -223,6 +223,7 @@ Advanced options (experts only):
>    --disable-mmx2           disable MMX2 optimizations
>    --disable-sse            disable SSE optimizations
>    --disable-ssse3          disable SSSE3 optimizations
> +  --disable-avx            disable AVX optimizations
>    --disable-armv5te        disable armv5te optimizations
>    --disable-armv6          disable armv6 optimizations
>    --disable-armv6t2        disable armv6t2 optimizations
> @@ -1181,6 +1182,7 @@ mmx_deps="x86"
>  mmx2_deps="mmx"
>  sse_deps="mmx"
>  ssse3_deps="sse"
> +avx_deps="ssse3"
>
>  aligned_stack_if_any="ppc x86"
>  fast_64bit_if_any="alpha ia64 mips64 parisc64 ppc64 sparc64 x86_64"

I suppose we'll need to test for AVX support in gcc/yasm.  Do
something similar to the ssse3 test.

> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> index 71cc265..2ddde26 100644
> --- a/libavutil/cpu.h
> +++ b/libavutil/cpu.h
> @@ -36,6 +36,7 @@
>  #define AV_CPU_FLAG_SSSE3        0x0080 ///< Conroe SSSE3 functions
>  #define AV_CPU_FLAG_SSE4         0x0100 ///< Penryn SSE4.1 functions
>  #define AV_CPU_FLAG_SSE42        0x0200 ///< Nehalem SSE4.2 functions
> +#define AV_CPU_FLAG_AVX          0x0400 ///< Sandy Bridge AVX functions
>  #define AV_CPU_FLAG_IWMMXT       0x0100 ///< XScale IWMMXT
>  #define AV_CPU_FLAG_ALTIVEC      0x0001 ///< standard

BTW, shouldn't that be s/functions/instructions/?

> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> index 4b6cb0d..a34e18f 100644
> --- a/libavutil/x86/cpu.c
> +++ b/libavutil/x86/cpu.c
> @@ -35,6 +35,12 @@
>             "=c" (ecx), "=d" (edx)\
>           : "0" (index));
>
> +#define xgetbv(index,eax,ebx,ecx,edx)\
> +    __asm__ volatile\
> +        ("xgetbv\n\t"\
> +         : "=a" (eax), (edx)\
> +         : "c" (index));

This should be something like

  __asm__ ("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));

The volatile should not be needed since all effects of the block are
expressed in the constraints.  You should also make the macro
arguments match how you're calling it below.

>  /* Function to test if multimedia instructions are supported...  */
>  int ff_get_cpu_flags_x86(void)
>  {
> @@ -95,6 +101,15 @@ int ff_get_cpu_flags_x86(void)
>              rval |= AV_CPU_FLAG_SSE42;
>  #endif
>                    ;
> +#if HAVE_AVX
> +        /* Check OXSAVE and AVX bits */
> +        if ((ecx&0x18000000) == 0x18000000) {
> +            /* Check for OS support */
> +            xgetbv(0, eax, edx);
> +            if ((eax&0x6) == 0x6)
> +                cpu |= AV_CPU_FLAG_AVX;
> +        }
> +#endif
>      }

I can't comment on the correctness of this.

-- 
M?ns Rullg?rd
mans at mansr.com



More information about the ffmpeg-devel mailing list