[FFmpeg-devel] [PATCH] WIP AVX support
Måns Rullgård <mans at mansr.com>
Thu Feb 3 11:20:45 CET 2011
Jason Garrett-Glaser <jason at x264.com> writes:
> Need someone who does configure/etc stuff to finish this.
> ---
> configure | 2 ++
> libavutil/cpu.h | 1 +
> libavutil/x86/cpu.c | 15 +++++++++++++++
> 3 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/configure b/configure
> index 46f4e44..c6ceaa8 100755
> --- a/configure
> +++ b/configure
> @@ -223,6 +223,7 @@ Advanced options (experts only):
> --disable-mmx2 disable MMX2 optimizations
> --disable-sse disable SSE optimizations
> --disable-ssse3 disable SSSE3 optimizations
> + --disable-avx disable AVX optimizations
> --disable-armv5te disable armv5te optimizations
> --disable-armv6 disable armv6 optimizations
> --disable-armv6t2 disable armv6t2 optimizations
> @@ -1181,6 +1182,7 @@ mmx_deps="x86"
> mmx2_deps="mmx"
> sse_deps="mmx"
> ssse3_deps="sse"
> +avx_deps="ssse3"
>
> aligned_stack_if_any="ppc x86"
> fast_64bit_if_any="alpha ia64 mips64 parisc64 ppc64 sparc64 x86_64"
I suppose we'll need to test for AVX support in gcc/yasm. Do
something similar to the ssse3 test.
> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> index 71cc265..2ddde26 100644
> --- a/libavutil/cpu.h
> +++ b/libavutil/cpu.h
> @@ -36,6 +36,7 @@
> #define AV_CPU_FLAG_SSSE3 0x0080 ///< Conroe SSSE3 functions
> #define AV_CPU_FLAG_SSE4 0x0100 ///< Penryn SSE4.1 functions
> #define AV_CPU_FLAG_SSE42 0x0200 ///< Nehalem SSE4.2 functions
> +#define AV_CPU_FLAG_AVX 0x0400 ///< Sandy Bridge AVX functions
> #define AV_CPU_FLAG_IWMMXT 0x0100 ///< XScale IWMMXT
> #define AV_CPU_FLAG_ALTIVEC 0x0001 ///< standard
BTW, shouldn't that be s/functions/instructions/?
> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> index 4b6cb0d..a34e18f 100644
> --- a/libavutil/x86/cpu.c
> +++ b/libavutil/x86/cpu.c
> @@ -35,6 +35,12 @@
> "=c" (ecx), "=d" (edx)\
> : "0" (index));
>
> +#define xgetbv(index,eax,ebx,ecx,edx)\
> + __asm__ volatile\
> + ("xgetbv\n\t"\
> + : "=a" (eax), (edx)\
> + : "c" (index));
This should be something like

    __asm__ ("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));

The volatile should not be needed since all effects of the block are
expressed in the constraints. You should also make the macro
arguments match how you're calling it below: the macro is declared
with five parameters (index, eax, ebx, ecx, edx), but the call passes
only three (0, eax, edx).
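To make the suggestion concrete, here is a minimal standalone sketch of the three-argument macro together with the OSXSAVE/AVX check from the patch. It is not the code that would go into libavutil: the function name os_supports_avx() is hypothetical, and it assumes GCC or clang with <cpuid.h> and an assembler that knows the xgetbv mnemonic. On non-x86 targets it simply reports no AVX.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Three-argument form: index is passed in ECX, the result comes back
 * in EDX:EAX. No volatile, as all effects are in the constraints. */
#define xgetbv(index, eax, edx) \
    __asm__ ("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index))

/* Hypothetical helper: returns 1 if the CPU reports AVX and the OS
 * has enabled saving of the XMM and YMM state, 0 otherwise. */
static int os_supports_avx(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* CPUID.1:ECX bit 27 = OSXSAVE, bit 28 = AVX. xgetbv must not be
     * executed unless OSXSAVE is set, or it raises an invalid-opcode
     * exception. */
    if ((ecx & 0x18000000) != 0x18000000)
        return 0;

    xgetbv(0, eax, edx);

    /* XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (YMM) state. */
    return (eax & 0x6) == 0x6;
}
#else
/* Non-x86 build: there is no AVX to detect. */
static int os_supports_avx(void)
{
    return 0;
}
#endif
```

A caller would use the return value to decide whether to set AV_CPU_FLAG_AVX, in the same place the patch does its check.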
> /* Function to test if multimedia instructions are supported... */
> int ff_get_cpu_flags_x86(void)
> {
> @@ -95,6 +101,15 @@ int ff_get_cpu_flags_x86(void)
> rval |= AV_CPU_FLAG_SSE42;
> #endif
> ;
> +#if HAVE_AVX
> + /* Check OXSAVE and AVX bits */
> + if ((ecx&0x18000000) == 0x18000000) {
> + /* Check for OS support */
> + xgetbv(0, eax, edx);
> + if ((eax&0x6) == 0x6)
> + cpu |= AV_CPU_FLAG_AVX;
> + }
> +#endif
> }
I can't comment on the correctness of this.
--
Måns Rullgård
mans at mansr.com