[FFmpeg-devel] Patch: Inline asm fixes for Intel compiler on Windows

Sun Apr 6 17:43:52 CEST 2014

On Sun, Apr 06, 2014 at 03:29:39PM +0200, Michael Niedermayer wrote:
> On Sun, Apr 06, 2014 at 12:21:54AM +1100, Matt Oliver wrote:
> > > If you do not initialize "bit", making it an output-only paramenter,
> > > you should use =&q instead of +q as constraint.
> > > However neither is correct as-is, setae sets only a single byte,
> > > thus not initializing it will result in uninitialized data,
> > > IOW the code as is is broken.
> > > If you want to do that kind of change, you would have to use uint8_t
> > > instead as type.
> > 
> > 
> > Interestingly both compilers I tested with didnt show any errors and had
> > the upper bits zeroed. However as the function returns a 32b int and any
> > code that calls this function would also assume 32b then a xor to set to 0
> > initially is still probably the way to go. Using uint8 would save that
> > instruction but would add one later to clear the upper bytes should it ever
> > be used for anything larger than 1B. So fixed to add back the set to 0.
> 
> >  vp56_arith.h |   13 +++++--------
> >  1 file changed, 5 insertions(+), 8 deletions(-)
> > 7a259454956d08f2c3e8375715c253a3327780d4  Remove-leal-op-to-fix-icl-inline-asm.patch
> > From 995c066d7b48d2a6603bb46649badba74e757dd5 Mon Sep 17 00:00:00 2001
> > From: Matt Oliver <protogonoi at gmail.com>
> > Date: Sun, 6 Apr 2014 00:11:21 +1100
> > Subject: [PATCH] Remove leal op to fix icl inline asm.
> > 
> > ---
> >  libavcodec/x86/vp56_arith.h | 13 +++++--------
> >  1 file changed, 5 insertions(+), 8 deletions(-)
> > 
> > diff --git a/libavcodec/x86/vp56_arith.h b/libavcodec/x86/vp56_arith.h
> > index e71dbf8..6cc97dd 100644
> > --- a/libavcodec/x86/vp56_arith.h
> > +++ b/libavcodec/x86/vp56_arith.h
> > @@ -29,24 +29,21 @@
> >  static av_always_inline int vp56_rac_get_prob(VP56RangeCoder *c, uint8_t prob)
> >  {
> >      unsigned int code_word = vp56_rac_renorm(c);
> > -    unsigned int high = c->high;
> > -    unsigned int low = 1 + (((high - 1) * prob) >> 8);
> > +    unsigned int low = 1 + (((c->high - 1) * prob) >> 8);
> >      unsigned int low_shift = low << 16;
> >      int bit = 0;
> > +    c->code_word = code_word;
> >  
> >      __asm__(
> >          "subl  %4, %1      \n\t"
> >          "subl  %3, %2      \n\t"
> > -        "leal (%2, %3), %3 \n\t"
> >          "setae %b0         \n\t"
> >          "cmovb %4, %1      \n\t"
> > -        "cmovb %3, %2      \n\t"
> > -        : "+q"(bit), "+r"(high), "+r"(code_word), "+r"(low_shift)
> > -        : "r"(low)
> > +        "cmovb %5, %2      \n\t"
> > +        : "+q"(bit), "+r"(c->high), "+r"(c->code_word)
> > +        : "r"(low_shift), "r"(low), "r"(code_word)
> >      );
> >  
> > -    c->high      = high;
> > -    c->code_word = code_word;
> 
> this patch breaks fate-vp6a-skip_alpha, fate-vp6f and fate-vp6a
> possibly also others, didnt test with -k

I think all the in-out parameters must be +& instead of just +.
Note that this was incorrect even before, but it probably
shows up only now since now it is trivial for the compiler
to figure out that c->code_word and code_word are the same thing,
and without early clobber it can place them onto the same register.
This of course invalidates all previous benchmarks...