[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
Ganesh Ajjanagadde
gajjanag at mit.edu
Wed Oct 14 03:13:45 CEST 2015
On Tue, Oct 13, 2015 at 8:58 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Tue, Oct 13, 2015 at 8:09 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> wrote:
>
>> On Tue, Oct 13, 2015 at 2:45 AM, Clément Bœsch <u at pkh.me> wrote:
>> > On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
>> >> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>> wrote:
>> >> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos <cehoyos at ag.or.at>
>> wrote:
>> >> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>> >> >>
>> >> >>> Bench from libavfilter/astats on a 15 min clip.
>> >> >>
>> >> >> I believe that your test would indicate that the
>> >> >> old variant is faster or that no result can be
>> >> >> given which is what my tests show.
>> >>
>> >> Also, how you can possibly believe that the old variant is faster is
>> >> beyond me given the astonishing amount of work by Intel, Red Hat, and
>> >> others to create the absolutely best performing libc.
>> >>
>> >> Just have a look at
>> >>
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281
>> ,
>> >> it gives an idea of the extreme lengths they go to.
>> >>
>> >
>> >
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD
>> >
>> > [/tmp]☭ cat a.c
>> > #include <math.h>
>> > #include <stdlib.h>
>> >
>> > #define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
>> >
>> > double f1d(double x) { return fabs(x); }
>> > double f2d(double x) { return FFABS(x); }
>> >
>> > int f1i(int x) { return abs(x); }
>> > int f2i(int x) { return FFABS(x); }
>> > [/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o
>> >
>> > a.o: file format elf64-x86-64
>> >
>> >
>> > Disassembly of section .text:
>> >
>> > 0000000000000000 <f1d>:
>> > 0: f2 0f 10 0d 00 00 00 movsd xmm1,QWORD PTR [rip+0x0] #
>> 8 <f1d+0x8>
>> > 7: 00
>> > 8: 66 0f 54 c1 andpd xmm0,xmm1
>> > c: c3 ret
>> > d: 0f 1f 00 nop DWORD PTR [rax]
>> >
>> > 0000000000000010 <f2d>:
>> > 10: 66 0f 2e 05 00 00 00 ucomisd xmm0,QWORD PTR [rip+0x0]
>> # 18 <f2d+0x8>
>> > 17: 00
>> > 18: 72 06 jb 20 <f2d+0x10>
>> > 1a: f3 c3 repz ret
>> > 1c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
>> > 20: f2 0f 10 0d 00 00 00 movsd xmm1,QWORD PTR [rip+0x0] #
>> 28 <f2d+0x18>
>> > 27: 00
>> > 28: 66 0f 57 c1 xorpd xmm0,xmm1
>> > 2c: c3 ret
>> > 2d: 0f 1f 00 nop DWORD PTR [rax]
>> >
>> > 0000000000000030 <f1i>:
>> > 30: 89 fa mov edx,edi
>> > 32: 89 f8 mov eax,edi
>> > 34: c1 fa 1f sar edx,0x1f
>> > 37: 31 d0 xor eax,edx
>> > 39: 29 d0 sub eax,edx
>> > 3b: c3 ret
>> > 3c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
>> >
>> > 0000000000000040 <f2i>:
>> > 40: 89 fa mov edx,edi
>> > 42: 89 f8 mov eax,edi
>> > 44: c1 fa 1f sar edx,0x1f
>> > 47: 31 d0 xor eax,edx
>> > 49: 29 d0 sub eax,edx
>> > 4b: c3 ret
>> > [/tmp]☭
>> >
>> > So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is
>> > essentially identical to FFABS().
>> >
>> > I have similar results with clang (3.7.0).
>> >
>> > Conclusion: using fabs() looks better with at least recent versions of
>> clang
>> > and GCC on x86-64 (but may introduce slight behaviour changes?)
>> >
>> > To be more rigorous, it would be interesting to compare on different
>> arch &
>> > compilers, but changing FFABS() with fabs() sounds OK to me.
>>
>> I noticed that is being applied piecemeal, and some of it has been
>> pushed. Does that mean I am free to push (with the reduced commit
>> message) as well?
>
>
> You'll notice that Paul did it for the filters he maintains. I'm fine with
> you doing this for any code I maintain (no further review required). You
> can find maintainers for each piece of code in git log or MAINTAINERS. It
> sounds like Paul is fine with this also. I think the general case, it'd be
> nice to figure out why Carl's results are slightly different from yours (or
> maybe it's noise?). If we can resolve that, I don't think there's any
> further outstanding objections, right?
No other outstanding objections - the only serious concern is
availability (which is a non-issue since we were already using fabs,
fabsf sporadically). Carl's issues should be either noise, or a
bad/slow libc fabs implementation. Hence I requested him for his
config.
I will give respective maintainers a week for slowly adding this
stuff. To reiterate, I have not touched avcodec as it is mostly
integer math anyway - if someone could point out some "hotspots" in
avcodec with this issue, that would be great.
>
> Also, is a single push preferred, or one for each
>> file (like the way it is being done)?
>
>
> Whatever you prefer, really. Small commits have the advantage that they're
> easier to bisect/revert if something breaks, but it's obviously a lot more
> work to create N small patches than 1 big patch. So both approaches have
> advantages and disadvantages and we just do whatever works best for each of
> us. (Not very consistent, I admit.)
I think giving a week will let me do a single patch "remaining
floating point abs...".
>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
More information about the ffmpeg-devel
mailing list