[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Ronald S. Bultje rsbultje at gmail.com
Wed Oct 14 02:58:02 CEST 2015


Hi,

On Tue, Oct 13, 2015 at 8:09 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
wrote:

> On Tue, Oct 13, 2015 at 2:45 AM, Clément Bœsch <u at pkh.me> wrote:
> > On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
> >> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> wrote:
> >> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos <cehoyos at ag.or.at>
> wrote:
> >> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
> >> >>
> >> >>> Bench from libavfilter/astats on a 15 min clip.
> >> >>
> >> >> I believe that your test would indicate that the
> >> >> old variant is faster or that no result can be
> >> >> given which is what my tests show.
> >>
> >> Also, how you can possibly believe that the old variant is faster is
> >> beyond me given the astonishing amount of work by Intel, Red Hat, and
> >> others to create the absolutely best performing libc.
> >>
> >> Just have a look at
> >>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281
> ,
> >> it gives an idea of the extreme lengths they go to.
> >>
> >
> >
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD
> >
> > [/tmp]☭ cat a.c
> > #include <math.h>
> > #include <stdlib.h>
> >
> > #define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
> >
> > double f1d(double x) { return fabs(x); }
> > double f2d(double x) { return FFABS(x); }
> >
> > int f1i(int x) { return abs(x); }
> > int f2i(int x) { return FFABS(x); }
> > [/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o
> >
> > a.o:     file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <f1d>:
> >    0:   f2 0f 10 0d 00 00 00    movsd  xmm1,QWORD PTR [rip+0x0]        #
> 8 <f1d+0x8>
> >    7:   00
> >    8:   66 0f 54 c1             andpd  xmm0,xmm1
> >    c:   c3                      ret
> >    d:   0f 1f 00                nop    DWORD PTR [rax]
> >
> > 0000000000000010 <f2d>:
> >   10:   66 0f 2e 05 00 00 00    ucomisd xmm0,QWORD PTR [rip+0x0]
> # 18 <f2d+0x8>
> >   17:   00
> >   18:   72 06                   jb     20 <f2d+0x10>
> >   1a:   f3 c3                   repz ret
> >   1c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]
> >   20:   f2 0f 10 0d 00 00 00    movsd  xmm1,QWORD PTR [rip+0x0]        #
> 28 <f2d+0x18>
> >   27:   00
> >   28:   66 0f 57 c1             xorpd  xmm0,xmm1
> >   2c:   c3                      ret
> >   2d:   0f 1f 00                nop    DWORD PTR [rax]
> >
> > 0000000000000030 <f1i>:
> >   30:   89 fa                   mov    edx,edi
> >   32:   89 f8                   mov    eax,edi
> >   34:   c1 fa 1f                sar    edx,0x1f
> >   37:   31 d0                   xor    eax,edx
> >   39:   29 d0                   sub    eax,edx
> >   3b:   c3                      ret
> >   3c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]
> >
> > 0000000000000040 <f2i>:
> >   40:   89 fa                   mov    edx,edi
> >   42:   89 f8                   mov    eax,edi
> >   44:   c1 fa 1f                sar    edx,0x1f
> >   47:   31 d0                   xor    eax,edx
> >   49:   29 d0                   sub    eax,edx
> >   4b:   c3                      ret
> > [/tmp]☭
> >
> > So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is
> > essentially identical to FFABS().
> >
> > I have similar results with clang (3.7.0).
> >
> > Conclusion: using fabs() looks better with at least recent versions of
> clang
> > and GCC on x86-64 (but may introduce slight behaviour changes?)
> >
> > To be more rigorous, it would be interesting to compare on different
> arch &
> > compilers, but changing FFABS() with fabs() sounds OK to me.
>
> I noticed that is being applied piecemeal, and some of it has been
> pushed. Does that mean I am free to push (with the reduced commit
> message) as well?


You'll notice that Paul did it for the filters he maintains. I'm fine with
you doing this for any code I maintain (no further review required). You
can find maintainers for each piece of code in git log or MAINTAINERS. It
sounds like Paul is fine with this also. I think the general case, it'd be
nice to figure out why Carl's results are slightly different from yours (or
maybe it's noise?). If we can resolve that, I don't think there's any
further outstanding objections, right?

Also, is a single push preferred, or one for each
> file (like the way it is being done)?


Whatever you prefer, really. Small commits have the advantage that they're
easier to bisect/revert if something breaks, but it's obviously a lot more
work to create N small patches than 1 big patch. So both approaches have
advantages and disadvantages and we just do whatever works best for each of
us. (Not very consistent, I admit.)

Ronald


More information about the ffmpeg-devel mailing list