[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Ganesh Ajjanagadde gajjanag at mit.edu
Mon Oct 12 13:41:49 CEST 2015


Hi all,

Once again, gmail is misbehaving with my patches. This one is a ~50
line diff, so there should be no reason at all for it. I am trying to
switch to my university's mail smtp for this send-email as I am tired
of this nonsense (note: this is after allowing gmail access to "less
secure" apps; needed to run smtp send-email).

Unfortunately, it is not yet working. Here is what I am doing for now:
copy pasting the patch (which may or may not be tab/whitespace
mangled) so that you can reply inline. I am also attaching it as a
patch so that you have a "clean" copy. Since I now have exercised
write privileges, I can push the patch myself after review, or else
one of you can do so, but please apply the attachment rather than the
inline copy in the body.

-----------------------------------------------------------------------------------------------------------

From 1f042b72a91fa6e8d190fd8a1d4e06843f0b3951 Mon Sep 17 00:00:00 2001
From: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
Date: Mon, 12 Oct 2015 01:30:22 -0400
Subject: [PATCH] avfilter,swresample,swscale: use fabs, fabsf instead of FFABS

It is well known that fabs and fabsf are at least as fast and usually
faster than the FFABS macro, at least on the gcc+glibc combination.
For instance, see the reference:
http://patchwork.sourceware.org/patch/6735/.
This was a patch to glibc in order to remove their usages. Given their
general performance obsession (more than FFmpeg in many cases), they
have ensured that fabs and fabsf never perform worse than FFABS.
I have tested on x86-64 Haswell with GCC 5.2 - even with no strict IEEE
mode enabled, and just the standard -O3 optimizations, there is a
performance benefit.

More broadly speaking, since all of our platforms support the full set
of C standard abs functions (even llabs), I see no reason to use FFABS
at all, given the general caveats surrounding macros. I would remove it
in all cases myself, but at least this change should be uncontroversial,
since it offers a performance boost on common platforms.
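
For illustration only (not part of the patch), here is a minimal sketch of
two of the macro caveats alluded to above. It assumes FFABS is defined as
((a) >= 0 ? (a) : (-(a))), as in libavutil/common.h; next_sample and the
helper function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <math.h>

/* Assumed definition, as in libavutil/common.h. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

static int calls; /* how many times next_sample() has run */

/* Hypothetical sample source with a visible side effect. */
static double next_sample(void)
{
    calls++;
    return -1.5;
}

/* The macro expands its argument twice when the value is negative. */
static int ffabs_evaluations(void)
{
    double r;
    calls = 0;
    r = FFABS(next_sample());
    assert(r == 1.5);
    return calls;
}

/* The function evaluates its argument exactly once. */
static int fabs_evaluations(void)
{
    double r;
    calls = 0;
    r = fabs(next_sample());
    assert(r == 1.5);
    return calls;
}

/* -0.0 >= 0 is true, so the macro hands back -0.0 with its sign intact. */
static int ffabs_negative_zero_signbit(void)
{
    volatile double z = -0.0;
    return signbit(FFABS(z)) != 0;
}

/* fabs clears the sign bit, so the result is +0.0. */
static int fabs_negative_zero_signbit(void)
{
    volatile double z = -0.0;
    return signbit(fabs(z)) != 0;
}
```

None of the call sites touched by this patch pass an argument with side
effects, so the double evaluation is a general hazard rather than a bug
being fixed here.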

I highly doubt any competent libc implementation can do worse than the
macro, but I lack the hardware/software to test such things outside the
GNU/Linux environment to confirm this. Worst case, we might need a fabs,
fabsf wrapper (av_fabs, av_fabsf?). On the other hand, any build
environment where fabs, fabsf regress relative to the macro is broken
from a performance standpoint, since the macro plus inline attributes
amount to only a few lines of code.
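
A rough sketch of what such a fallback wrapper could look like, should it
ever be needed. This is not part of the patch and nothing like it exists in
libavutil: the names av_fabs and av_fabsf are only the ones floated above,
and HAVE_FAST_FABS is a made-up configure-style switch used for
illustration.

```c
#include <assert.h>
#include <math.h>

/* HYPOTHETICAL switch; not a real FFmpeg configure symbol. */
#ifndef HAVE_FAST_FABS
#define HAVE_FAST_FABS 1
#endif

/* Hypothetical wrapper: libc fabs where it is known to be fast,
 * otherwise the same comparison the FFABS macro performs. */
static inline double av_fabs(double x)
{
#if HAVE_FAST_FABS
    return fabs(x);
#else
    return x >= 0 ? x : -x;
#endif
}

/* Single-precision counterpart, so float callers avoid a round trip
 * through double. */
static inline float av_fabsf(float x)
{
#if HAVE_FAST_FABS
    return fabsf(x);
#else
    return x >= 0 ? x : -x;
#endif
}

/* Small self-check for the sketch. */
static void check_av_fabs(void)
{
    assert(av_fabs(-2.5) == 2.5);
    assert(av_fabsf(-0.25f) == 0.25f);
}
```

Because the wrappers are inline functions rather than macros, the argument
is evaluated once on either path.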

Please note that avcodec is not handled by this patch, as it is huge and
most things there are integer arithmetic (beyond the scope of this patch).
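
For reference, a small sketch (not part of the patch) of how the C standard
functions map onto operand types, which is the choice each hunk below makes
between fabs and fabsf. The wrapper names are hypothetical and only make the
mapping explicit; the integer functions are shown for completeness, since
integer uses of FFABS are left alone here.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical wrappers, named only to make the type mapping explicit. */
static float     abs_float(float x)     { return fabsf(x); } /* stays in float */
static double    abs_double(double x)   { return fabs(x);  }
static int       abs_int(int x)         { return abs(x);   } /* undefined for INT_MIN */
static long long abs_int64(long long x) { return llabs(x); } /* undefined for LLONG_MIN */

/* Small self-check for the sketch. */
static void check_mapping(void)
{
    assert(abs_float(-0.5f) == 0.5f);
    assert(abs_double(-2.25) == 2.25);
    assert(abs_int(-7) == 7);
    assert(abs_int64(-5000000000LL) == 5000000000LL);
}
```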

Tested with FATE.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
---
 libavfilter/af_agate.c             | 6 +++---
 libavfilter/af_astats.c            | 6 +++---
 libavfilter/af_sidechaincompress.c | 6 +++---
 libavfilter/af_stereotools.c       | 2 +-
 libavfilter/avf_avectorscope.c     | 4 ++--
 libavfilter/avf_showcqt.c          | 2 +-
 libavfilter/avf_showfreqs.c        | 4 ++--
 libavfilter/f_ebur128.c            | 6 +++---
 libavfilter/vf_blend.c             | 4 ++--
 libavfilter/vf_dctdnoiz.c          | 4 ++--
 libavfilter/vf_framerate.c         | 4 ++--
 libavfilter/vf_hqdn3d.c            | 2 +-
 libavformat/mux.c                  | 2 +-
 libswresample/swresample-test.c    | 4 ++--
 14 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/libavfilter/af_agate.c b/libavfilter/af_agate.c
index f9ae5da..b56f32e 100644
--- a/libavfilter/af_agate.c
+++ b/libavfilter/af_agate.c
@@ -176,17 +176,17 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
     dst = (double *)out->data[0];

     for (n = 0; n < in->nb_samples; n++, src += inlink->channels, dst += inlink->channels) {
-        double abs_sample = FFABS(src[0]), gain = 1.0;
+        double abs_sample = fabs(src[0]), gain = 1.0;

         for (c = 0; c < inlink->channels; c++)
             dst[c] = src[c] * level_in;

         if (s->link == 1) {
             for (c = 1; c < inlink->channels; c++)
-                abs_sample = FFMAX(FFABS(src[c]), abs_sample);
+                abs_sample = FFMAX(fabs(src[c]), abs_sample);
         } else {
             for (c = 1; c < inlink->channels; c++)
-                abs_sample += FFABS(src[c]);
+                abs_sample += fabs(src[c]);

             abs_sample /= inlink->channels;
         }
diff --git a/libavfilter/af_astats.c b/libavfilter/af_astats.c
index f385d2e..b3b8f28 100644
--- a/libavfilter/af_astats.c
+++ b/libavfilter/af_astats.c
@@ -163,9 +163,9 @@ static inline void update_stat(AudioStatsContext *s, ChannelStats *p, double d)
     p->sigma_x += d;
     p->sigma_x2 += d * d;
     p->avg_sigma_x2 = p->avg_sigma_x2 * s->mult + (1.0 - s->mult) * d * d;
-    p->min_diff = FFMIN(p->min_diff == -1 ? DBL_MAX : p->min_diff, FFABS(d - (p->min_diff == -1 ? DBL_MAX : p->last)));
-    p->max_diff = FFMAX(p->max_diff, FFABS(d - (p->max_diff == -1 ? d : p->last)));
-    p->diff1_sum += FFABS(d - p->last);
+    p->min_diff = FFMIN(p->min_diff == -1 ? DBL_MAX : p->min_diff, fabs(d - (p->min_diff == -1 ? DBL_MAX : p->last)));
+    p->max_diff = FFMAX(p->max_diff, fabs(d - (p->max_diff == -1 ? d : p->last)));
+    p->diff1_sum += fabs(d - p->last);
     p->last = d;
     p->mask |= llrint(d * (UINT64_C(1) << 63));

diff --git a/libavfilter/af_sidechaincompress.c b/libavfilter/af_sidechaincompress.c
index 0ec01e2..29b3753 100644
--- a/libavfilter/af_sidechaincompress.c
+++ b/libavfilter/af_sidechaincompress.c
@@ -154,14 +154,14 @@ static int filter_frame(AVFilterLink *link, AVFrame *frame)
     for (i = 0; i < nb_samples; i++) {
         double abs_sample, gain = 1.0;

-        abs_sample = FFABS(scsrc[0]);
+        abs_sample = fabs(scsrc[0]);

         if (s->link == 1) {
             for (c = 1; c < sclink->channels; c++)
-                abs_sample = FFMAX(FFABS(scsrc[c]), abs_sample);
+                abs_sample = FFMAX(fabs(scsrc[c]), abs_sample);
         } else {
             for (c = 1; c < sclink->channels; c++)
-                abs_sample += FFABS(scsrc[c]);
+                abs_sample += fabs(scsrc[c]);

             abs_sample /= sclink->channels;
         }
diff --git a/libavfilter/af_stereotools.c b/libavfilter/af_stereotools.c
index e19ada4..a22efb0 100644
--- a/libavfilter/af_stereotools.c
+++ b/libavfilter/af_stereotools.c
@@ -146,7 +146,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
     double *buffer = s->buffer;
     AVFrame *out;
     double *dst;
-    int nbuf = inlink->sample_rate * (FFABS(delay) / 1000.);
+    int nbuf = inlink->sample_rate * (fabs(delay) / 1000.);
     int n;

     nbuf -= nbuf % 2;
diff --git a/libavfilter/avf_avectorscope.c b/libavfilter/avf_avectorscope.c
index 30985f3..38dd97e 100644
--- a/libavfilter/avf_avectorscope.c
+++ b/libavfilter/avf_avectorscope.c
@@ -220,7 +220,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *insamples)
                 cx = sx * sqrtf(1 - 0.5*sy*sy);
                 cy = sy * sqrtf(1 - 0.5*sx*sx);
                 x = hw + hw * FFSIGN(cx + cy) * (cx - cy) * .7;
-                y = s->h - s->h * FFABS(cx + cy) * .7;
+                y = s->h - s->h * fabsf(cx + cy) * .7;
             }

             draw_dot(s, x, y);
@@ -244,7 +244,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *insamples)
                 cx = sx * sqrtf(1 - 0.5 * sy * sy);
                 cy = sy * sqrtf(1 - 0.5 * sx * sx);
                 x = hw + hw * FFSIGN(cx + cy) * (cx - cy) * .7;
-                y = s->h - s->h * FFABS(cx + cy) * .7;
+                y = s->h - s->h * fabsf(cx + cy) * .7;
             }

             draw_dot(s, x, y);
diff --git a/libavfilter/avf_showcqt.c b/libavfilter/avf_showcqt.c
index ce42cd6..2bd772e 100644
--- a/libavfilter/avf_showcqt.c
+++ b/libavfilter/avf_showcqt.c
@@ -371,7 +371,7 @@ static int config_output(AVFilterLink *outlink)
             tlength = s->timeclamp;
         }

-        volume = FFABS(av_expr_eval(volume_expr, expr_vars_val, NULL));
+        volume = fabs(av_expr_eval(volume_expr, expr_vars_val, NULL));
         if (isnan(volume)) {
             av_log(ctx, AV_LOG_WARNING, "at freq %g: volume is nan, setting it to 0\n", freq);
             volume = VOLUME_MIN;
diff --git a/libavfilter/avf_showfreqs.c b/libavfilter/avf_showfreqs.c
index 0f2ae22..a3665ef 100644
--- a/libavfilter/avf_showfreqs.c
+++ b/libavfilter/avf_showfreqs.c
@@ -163,7 +163,7 @@ static void generate_window_func(float *lut, int N, int win_func, float *overlap
         break;
     case WFUNC_BARTLETT:
         for (n = 0; n < N; n++)
-            lut[n] = 1.-FFABS((n-(N-1)/2.)/((N-1)/2.));
+            lut[n] = 1.-fabs((n-(N-1)/2.)/((N-1)/2.));
         *overlap = 0.5;
         break;
     case WFUNC_HANNING:
@@ -207,7 +207,7 @@ static void generate_window_func(float *lut, int N, int win_func, float *overlap
         break;
     case WFUNC_BHANN:
         for (n = 0; n < N; n++)
-            lut[n] = 0.62-0.48*FFABS(n/(double)(N-1)-.5)-0.38*cos(2*M_PI*n/(N-1));
+            lut[n] = 0.62-0.48*fabs(n/(double)(N-1)-.5)-0.38*cos(2*M_PI*n/(N-1));
         *overlap = 0.5;
         break;
     case WFUNC_SINE:
diff --git a/libavfilter/f_ebur128.c b/libavfilter/f_ebur128.c
index b9ea11b..9e115fc 100644
--- a/libavfilter/f_ebur128.c
+++ b/libavfilter/f_ebur128.c
@@ -558,9 +558,9 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *insamples)
             ebur128->true_peaks_per_frame[ch] = 0.0;
         for (idx_insample = 0; idx_insample < ret; idx_insample++) {
             for (ch = 0; ch < nb_channels; ch++) {
-                ebur128->true_peaks[ch] = FFMAX(ebur128->true_peaks[ch], FFABS(*swr_samples));
+                ebur128->true_peaks[ch] = FFMAX(ebur128->true_peaks[ch], fabs(*swr_samples));
                 ebur128->true_peaks_per_frame[ch] = FFMAX(ebur128->true_peaks_per_frame[ch],
-                                                          FFABS(*swr_samples));
+                                                          fabs(*swr_samples));
                 swr_samples++;
             }
         }
@@ -586,7 +586,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *insamples)
             double bin;

             if (ebur128->peak_mode & PEAK_MODE_SAMPLES_PEAKS)
-                ebur128->sample_peaks[ch] = FFMAX(ebur128->sample_peaks[ch], FFABS(*samples));
+                ebur128->sample_peaks[ch] = FFMAX(ebur128->sample_peaks[ch], fabs(*samples));

             ebur128->x[ch * 3] = *samples++; // set X[i]

diff --git a/libavfilter/vf_blend.c b/libavfilter/vf_blend.c
index 7b5e51b..f2c4b84 100644
--- a/libavfilter/vf_blend.c
+++ b/libavfilter/vf_blend.c
@@ -241,7 +241,7 @@ DEFINE_BLEND8(lighten,    FFMAX(A, B))
 DEFINE_BLEND8(divide,     av_clip_uint8(((float)A / ((float)B) * 255)))
 DEFINE_BLEND8(dodge,      DODGE(A, B))
 DEFINE_BLEND8(burn,       BURN(A, B))
-DEFINE_BLEND8(softlight,  (A > 127) ? B + (255 - B) * (A - 127.5) / 127.5 * (0.5 - FFABS(B - 127.5) / 255): B - B * ((127.5 - A) / 127.5) * (0.5 - FFABS(B - 127.5)/255))
+DEFINE_BLEND8(softlight,  (A > 127) ? B + (255 - B) * (A - 127.5) / 127.5 * (0.5 - fabs(B - 127.5) / 255): B - B * ((127.5 - A) / 127.5) * (0.5 - fabs(B - 127.5)/255))
 DEFINE_BLEND8(exclusion,  A + B - 2 * A * B / 255)
 DEFINE_BLEND8(pinlight,   (B < 128) ? FFMIN(A, 2 * B) : FFMAX(A, 2 * (B - 128)))
 DEFINE_BLEND8(phoenix,    FFMIN(A, B) - FFMAX(A, B) + 255)
@@ -280,7 +280,7 @@ DEFINE_BLEND16(lighten,    FFMAX(A, B))
 DEFINE_BLEND16(divide,     av_clip_uint16(((float)A / ((float)B) * 65535)))
 DEFINE_BLEND16(dodge,      DODGE(A, B))
 DEFINE_BLEND16(burn,       BURN(A, B))
-DEFINE_BLEND16(softlight,  (A > 32767) ? B + (65535 - B) * (A - 32767.5) / 32767.5 * (0.5 - FFABS(B - 32767.5) / 65535): B - B * ((32767.5 - A) / 32767.5) * (0.5 - FFABS(B - 32767.5)/65535))
+DEFINE_BLEND16(softlight,  (A > 32767) ? B + (65535 - B) * (A - 32767.5) / 32767.5 * (0.5 - fabs(B - 32767.5) / 65535): B - B * ((32767.5 - A) / 32767.5) * (0.5 - fabs(B - 32767.5)/65535))
 DEFINE_BLEND16(exclusion,  A + B - 2 * A * B / 65535)
 DEFINE_BLEND16(pinlight,   (B < 32768) ? FFMIN(A, 2 * B) : FFMAX(A, 2 * (B - 32768)))
 DEFINE_BLEND16(phoenix,    FFMIN(A, B) - FFMAX(A, B) + 65535)
diff --git a/libavfilter/vf_dctdnoiz.c b/libavfilter/vf_dctdnoiz.c
index 37306bb..6957f19 100644
--- a/libavfilter/vf_dctdnoiz.c
+++ b/libavfilter/vf_dctdnoiz.c
@@ -367,10 +367,10 @@ static av_always_inline void filter_freq_##bsize(const float *src, int src_lines
         float *b = &tmp_block2[i];                                             \
         /* frequency filtering */                                              \
         if (expr) {                                                            \
-            var_values[VAR_C] = FFABS(*b);                                     \
+            var_values[VAR_C] = fabsf(*b);                                     \
             *b *= av_expr_eval(expr, var_values, NULL);                        \
         } else {                                                               \
-            if (FFABS(*b) < sigma_th)                                          \
+            if (fabsf(*b) < sigma_th)                                          \
                 *b = 0;                                                        \
         }                                                                      \
     }                                                                          \
diff --git a/libavfilter/vf_framerate.c b/libavfilter/vf_framerate.c
index e8fba28..237a487 100644
--- a/libavfilter/vf_framerate.c
+++ b/libavfilter/vf_framerate.c
@@ -223,7 +223,7 @@ static int blend_frames16(AVFilterContext *ctx, float interpolate,
     }
     // decide if the shot-change detection allows us to blend two frames
     if (interpolate_scene_score < s->scene_score && copy_src2) {
-        uint16_t src2_factor = FFABS(interpolate) * (1 << (s->bitdepth - 8));
+        uint16_t src2_factor = fabsf(interpolate) * (1 << (s->bitdepth - 8));
         uint16_t src1_factor = s->max - src2_factor;
         const int half = s->max / 2;
         const int uv = (s->max + 1) * half;
@@ -287,7 +287,7 @@ static int blend_frames8(AVFilterContext *ctx, float interpolate,
     }
     // decide if the shot-change detection allows us to blend two frames
     if (interpolate_scene_score < s->scene_score && copy_src2) {
-        uint16_t src2_factor = FFABS(interpolate);
+        uint16_t src2_factor = fabsf(interpolate);
         uint16_t src1_factor = 256 - src2_factor;
         int plane, line, pixel;

diff --git a/libavfilter/vf_hqdn3d.c b/libavfilter/vf_hqdn3d.c
index 6c76c5c..5b367ff 100644
--- a/libavfilter/vf_hqdn3d.c
+++ b/libavfilter/vf_hqdn3d.c
@@ -182,7 +182,7 @@ static int16_t *precalc_coefs(double dist25, int depth)

     for (i = -256<<LUT_BITS; i < 256<<LUT_BITS; i++) {
         double f = ((i<<(9-LUT_BITS)) + (1<<(8-LUT_BITS)) - 1) / 512.0; // midpoint of the bin
-        simil = FFMAX(0, 1.0 - FFABS(f) / 255.0);
+        simil = FFMAX(0, 1.0 - fabs(f) / 255.0);
         C = pow(simil, gamma) * 256.0 * f;
         ct[(256<<LUT_BITS)+i] = lrint(C);
     }
diff --git a/libavformat/mux.c b/libavformat/mux.c
index c9ef490..c60cbda 100644
--- a/libavformat/mux.c
+++ b/libavformat/mux.c
@@ -317,7 +317,7 @@ FF_ENABLE_DEPRECATION_WARNINGS
                 goto fail;
             }
             if (av_cmp_q(st->sample_aspect_ratio, codec->sample_aspect_ratio)
-                && FFABS(av_q2d(st->sample_aspect_ratio) - av_q2d(codec->sample_aspect_ratio)) > 0.004*av_q2d(st->sample_aspect_ratio)
+                && fabs(av_q2d(st->sample_aspect_ratio) - av_q2d(codec->sample_aspect_ratio)) > 0.004*av_q2d(st->sample_aspect_ratio)
             ) {
                 if (st->sample_aspect_ratio.num != 0 &&
                     st->sample_aspect_ratio.den != 0 &&
diff --git a/libswresample/swresample-test.c b/libswresample/swresample-test.c
index 47c54a1..9caa750 100644
--- a/libswresample/swresample-test.c
+++ b/libswresample/swresample-test.c
@@ -374,7 +374,7 @@ int main(int argc, char **argv){
                 sum_aa+= a*a;
                 sum_bb+= b*b;
                 sum_ab+= a*b;
-                maxdiff= FFMAX(maxdiff, FFABS(a-b));
+                maxdiff= FFMAX(maxdiff, fabs(a-b));
             }
             sse= sum_aa + sum_bb - 2*sum_ab;
             if(sse < 0 && sse > -0.00001) sse=0; //fix rounding error
@@ -404,7 +404,7 @@ int main(int argc, char **argv){
                     sum_aa+= a*a;
                     sum_bb+= b*b;
                     sum_ab+= a*b;
-                    maxdiff= FFMAX(maxdiff, FFABS(a-b));
+                    maxdiff= FFMAX(maxdiff, fabs(a-b));
                 }
                 sse= sum_aa + sum_bb - 2*sum_ab;
                 if(sse < 0 && sse > -0.00001) sse=0; //fix rounding error
-- 
2.6.1


Regards,
Ganesh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-avfilter-swresample-swscale-use-fabs-fabsf-instead-o.patch
Type: text/x-diff
Size: 16157 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151012/230333a8/attachment.patch>

