[FFmpeg-devel] Patch: Inline asm fixes for Intel compiler on Windows

Matt Oliver protogonoi at gmail.com
Wed Mar 19 01:51:45 CET 2014


On 19 March 2014 10:45, Michael Niedermayer <michaelni at gmx.at> wrote:

> On Tue, Mar 18, 2014 at 04:52:55PM +1100, Matt Oliver wrote:
> > > i think its better not to have unneeded code in there, so if its
> > > needed it should stay, if not it should be removed
> >
> >
> > Removed.
> >
> > These constraints are there because of limitions/shortcommings of
> > > intels asm() support on windows.
> > > IMO we should keep such workarounds to the minimum possible
> >
> >
> > Fair enough, I updated the patches accordingly. These are basically the
> > minimum workarounds possible. The main changes occurred in
> yuv2rgb_template
> > where all references were added to the YUV2RGB_OPERANDS macro.  This
> added
> > 5 constraints (although in many situations only 1 is actually used). The
> > one thing ill point out is that these additional named constraints are
> > variables that are not actually from the YUV2RGB macro (which is why they
> > weren't initially put in there) so this may make the code a little less
> > readable for other devs as all these variables are used by the other
> > functions that use the YUV2RGB macro not from within the macro itself.
> >
> > Also I had to add an #if to define RGB_PACK24_B_OPERANDS as these
> variables
> > are only defined if COMPILE_TEMPLATE_MMXEXT so having them all the time
> > will result in errors. This again may be a little unclear for other
> > developers as the variables in question (mask1101 etc.) havnt even been
> > declared at this point in code. And of course the variables are not
> > relevant to YUV2RGB macro and are only used in a couple of the later
> > functions as they are only relevant to the RGB_PACK24_B macro and the
> > functions that use it. Trying to separate the RGB_PACK24_B variables cant
> > be simply done, in fact the simplest way to separate them is pretty much
> > the way the original patch was done. So its up to you which one you
> prefer.
>
> >  configure               |    3 +++
> >  libavcodec/x86/mlpdsp.c |    4 ++--
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> > b89e0bced3342badd213b5dccc454a829cc37658
>  0002-2-6-Check-for-nonlocal-inline-asm-labels.patch
> > From 5674f153f6e45cf487907bbe67b7109f5d7a83c0 Mon Sep 17 00:00:00 2001
> > From: Matt Oliver <protogonoi at gmail.com>
> > Date: Tue, 18 Mar 2014 15:29:14 +1100
> > Subject: [PATCH 2/6] [2/6] Check for nonlocal inline asm labels.
>
> applied
>
>
> [...]
>
> >  configure                          |    3 +++
> >  libavcodec/x86/cabac.h             |    3 ++-
> >  libavcodec/x86/cavsdsp.c           |    2 ++
> >  libavcodec/x86/dsputil_mmx.c       |    1 +
> >  libavcodec/x86/h264_i386.h         |    2 ++
> >  libavcodec/x86/idct_sse2_xvid.c    |    2 +-
> >  libavcodec/x86/lpc.c               |    3 +++
> >  libavcodec/x86/motion_est.c        |    7 ++++---
> >  libavcodec/x86/simple_idct.c       |    1 +
> >  libavcodec/x86/vc1dsp_mmx.c        |    6 ++++++
> >  libavutil/x86/asm.h                |   36
> +++++++++++++++++++++++++++++++++++-
> >  libpostproc/postprocess_template.c |    7 +++++++
> >  libswresample/x86/resample_mmx.h   |    2 ++
> >  libswscale/x86/rgb2rgb_template.c  |   11 +++++++++++
> >  libswscale/x86/swscale_template.c  |   12 ++++++++++++
> >  libswscale/x86/yuv2rgb_template.c  |    9 +++++++++
> >  16 files changed, 101 insertions(+), 6 deletions(-)
> > 74d49e1ba28eff2023cc9738d5db65796c179c39
>  0003-3-6-Check-for-inline-asm-direct-symbol-reference.patch
> > From 1f3868943e761d1826f4a56e6ec9951273d47d40 Mon Sep 17 00:00:00 2001
> > From: Matt Oliver <protogonoi at gmail.com>
> > Date: Tue, 18 Mar 2014 15:53:26 +1100
> > Subject: [PATCH 3/6] [3/6] Check for inline asm direct symbol reference.
>
> applied most of it
>
>
> [...]
> >  cabac.h      |    9 +++++++--
> >  h264_i386.h  |    4 ++--
> >  vp56_arith.h |    8 +++++++-
> >  3 files changed, 16 insertions(+), 5 deletions(-)
> > 389c1c1629a1f08bb7b8b4d2b85ad6335ecf9202
>  0004-4-6-Fix-for-broken-register-allocation-issues-with-I.patch
> > From 8450aea3c8bbfabea218d9b1980eebe4bda24643 Mon Sep 17 00:00:00 2001
> > From: Matt Oliver <protogonoi at gmail.com>
> > Date: Tue, 18 Mar 2014 15:54:27 +1100
> > Subject: [PATCH 4/6] [4/6] Fix for broken register allocation issues with
> >  Intel.
> >
> > ---
> >  libavcodec/x86/cabac.h      | 9 +++++++--
> >  libavcodec/x86/h264_i386.h  | 4 ++--
> >  libavcodec/x86/vp56_arith.h | 8 +++++++-
> >  3 files changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
> > index 0a68b7b..efcb192 100644
> > --- a/libavcodec/x86/cabac.h
> > +++ b/libavcodec/x86/cabac.h
> > @@ -33,6 +33,11 @@
> >  #else
> >  #       define BROKEN_COMPILER 0
> >  #endif
> > +#if   (defined(__INTEL_COMPILER) && defined(_MSC_VER))
> > +#       define BROKEN_REGISTER_ALLOCATION 1
> > +#else
> > +#       define BROKEN_REGISTER_ALLOCATION 0
> > +#endif
>
> i assume this is because the code gets miscompiled ?
> did you report the miscompilation to intel ?
>
> [...]
>
> >  mpegvideoenc_template.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 2b12c0ec5941fd82fe2c38c5646db8c6583ad574
>  0005-5-6-Fixed-64bit-conformance-with-mvzbl.patch
> > From 754b9dc487260dedfae4be38c88dc60734561f67 Mon Sep 17 00:00:00 2001
> > From: Matt Oliver <protogonoi at gmail.com>
> > Date: Sun, 9 Feb 2014 17:10:12 +1100
> > Subject: [PATCH 5/6] [5/6] Fixed 64bit conformance with mvzbl.
>
> this has aready been applied
>
> also feel free to send a patch that adds yourself to MAINTAINERs
> for all these intel compiler workarounds
>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Old school: Use the lowest level language in which you can solve the
> problem
>             conveniently.
> New school: Use the highest level language in which the latest
> supercomputer
>             can solve the problem without the user falling asleep waiting.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>
thanks for that.

i assume this is because the code gets miscompiled ?
> did you report the miscompilation to intel ?


Yeh the syntax of all the instructions is correct and ive tested the
function independently and it is correct with respect to what icl
technically supports but once you actually use it within the context of
FFmpeg the compiler throws an internal error as it is not capable of
correctly allocating enough registers. Now that were moving ahead with
getting the icl patches included I will direct intel to the code and the
error. However I dont expect it be fixed for a compiler version or 2.

applied most of it


Ill see if I can find another way to get icl to compile while still using
MANGLE but if memory serves I tried basically every combination I could
think off and couldnt get it to compile any other way. Are there any issues
with fixing it the way I did?


More information about the ffmpeg-devel mailing list