[FFmpeg-cvslog] avcodec/aac_tablegen: speed up table initialization

Ganesh Ajjanagadde git at videolan.org
Fri Nov 27 12:40:27 CET 2015


ffmpeg | branch: master | Ganesh Ajjanagadde <gajjanagadde at gmail.com> | Thu Nov 26 13:50:57 2015 -0500| [96786a12f6df26990bbe7c0ca4592b3731724469] | committer: Ganesh Ajjanagadde

avcodec/aac_tablegen: speed up table initialization

This speeds up aac_tablegen to a ludicrous degree (~97%), i.e. to the point
where it can be argued that runtime initialization can always be done instead of
hard-coded tables. The only cost is essentially a trivial increase in
the stack size.

Even if one does not care about this, the patch also improves accuracy
as detailed below.

Performance:
Benchmark obtained by looping 10^4 times over ff_aac_tableinit.

Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
1295292 decicycles in ff_aac_tableinit,     512 runs,      0 skips
1275981 decicycles in ff_aac_tableinit,    1024 runs,      0 skips
1272932 decicycles in ff_aac_tableinit,    2048 runs,      0 skips
1262164 decicycles in ff_aac_tableinit,    4096 runs,      0 skips
1256720 decicycles in ff_aac_tableinit,    8192 runs,      0 skips

new:
21112 decicycles in ff_aac_tableinit,     511 runs,      1 skips
21269 decicycles in ff_aac_tableinit,    1023 runs,      1 skips
21352 decicycles in ff_aac_tableinit,    2043 runs,      5 skips
21386 decicycles in ff_aac_tableinit,    4080 runs,     16 skips
21299 decicycles in ff_aac_tableinit,    8173 runs,     19 skips

Accuracy:
The previous code lost accuracy needlessly because pow was called twice
in succession: the second call took the first call's result, already
rounded to float, as its input. As an illustration of this:
ff_aac_pow34sf_tab[3]
old : 0.000000000007598092294225
new : 0.000000000007598091426864
real: 0.000000000007598091778545

truncated to float
old : 0.000000000007598092294225
new : 0.000000000007598091426864
real: 0.000000000007598091426864

showing that the old value was not correctly rounded. This affects a
large number of elements of the array.
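
The rounding claim can be checked from the three values quoted above alone.
The following is a minimal sketch, not part of the patch: the names
real_value, old_value, new_value, abs_diff, is_correctly_rounded and
check_reported_values are hypothetical, and real_value is only the 24-digit
decimal printed above, which is assumed to be precise enough to decide the
rounding.

```c
#include <assert.h>

/* Values quoted in the commit message for ff_aac_pow34sf_tab[3]. */
static const double real_value = 7.598091778545e-12;  /* "real", as printed */
static const float  old_value  = 7.598092294225e-12f; /* chained pow() */
static const float  new_value  = 7.598091426864e-12f; /* lookup table */

/* Absolute difference, written out so the sketch does not need libm. */
static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* 1 if candidate is the float obtained by rounding real_value to float. */
static int is_correctly_rounded(float candidate)
{
    return candidate == (float)real_value;
}

/* Aborts if the quoted values do not support the claim in the message. */
static void check_reported_values(void)
{
    /* The new entry is the nearest float to the real value ... */
    assert(is_correctly_rounded(new_value));
    /* ... the old entry is a different float that lies further away. */
    assert(!is_correctly_rounded(old_value));
    assert(abs_diff(new_value, real_value) < abs_diff(old_value, real_value));
}
```

Calling check_reported_values() from a main() is enough to run the check.
The two candidates are adjacent floats, so the old entry is off by one unit
in the last place.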

Patch tested with FATE.

Reviewed-by: Rostislav Pehlivanov <atomnuker at gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=96786a12f6df26990bbe7c0ca4592b3731724469
---

 libavcodec/aac_tablegen.h |   41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/libavcodec/aac_tablegen.h b/libavcodec/aac_tablegen.h
index 8b223f9..85e189d 100644
--- a/libavcodec/aac_tablegen.h
+++ b/libavcodec/aac_tablegen.h
@@ -35,9 +35,46 @@ float ff_aac_pow34sf_tab[428];
 av_cold void ff_aac_tableinit(void)
 {
     int i;
+
+    /* 2^(i/16) for 0 <= i <= 15 */
+    const float exp2_lut[] = {
+        1.00000000000000000000,
+        1.04427378242741384032,
+        1.09050773266525765921,
+        1.13878863475669165370,
+        1.18920711500272106672,
+        1.24185781207348404859,
+        1.29683955465100966593,
+        1.35425554693689272830,
+        1.41421356237309504880,
+        1.47682614593949931139,
+        1.54221082540794082361,
+        1.61049033194925430818,
+        1.68179283050742908606,
+        1.75625216037329948311,
+        1.83400808640934246349,
+        1.91520656139714729387,
+    };
+    float t1 = 8.8817841970012523233890533447265625e-16; // 2^(-50)
+    float t2 = 3.63797880709171295166015625e-12; // 2^(-38)
+    int t1_inc_cur, t2_inc_cur;
+    int t1_inc_prev = 0;
+    int t2_inc_prev = 8;
+
     for (i = 0; i < 428; i++) {
-        ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
-        ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);
+        t1_inc_cur = 4 * (i % 4);
+        t2_inc_cur = (8 + 3*i) % 16;
+        if (t1_inc_cur < t1_inc_prev)
+            t1 *= 2;
+        if (t2_inc_cur < t2_inc_prev)
+            t2 *= 2;
+        // A much more efficient and accurate way of doing:
+        // ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
+        // ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);
+        ff_aac_pow2sf_tab[i] = t1 * exp2_lut[t1_inc_cur];
+        ff_aac_pow34sf_tab[i] = t2 * exp2_lut[t2_inc_cur];
+        t1_inc_prev = t1_inc_cur;
+        t2_inc_prev = t2_inc_cur;
     }
 }
 #endif /* CONFIG_HARDCODED_TABLES */
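
The loop in the patch relies on splitting each exponent into an integer part
and a multiple of 1/16. Taking POW_SF2_ZERO to be 200 (an assumption inferred
from the 2^(-50) starting value of t1), (i - 200)/4 equals -50 + i/4 and
3*(i - 200)/16 equals -38 + (8 + 3*i)/16. The fractional part selects an
exp2_lut entry, and the power-of-two factor doubles whenever that index wraps
around. Below is a stand-alone sketch of the same loop for experimenting
outside FFmpeg. The sketch_ names are hypothetical, and the tables are local
arrays rather than FFmpeg's globals.

```c
#include <assert.h>

#define SKETCH_TAB_SIZE 428

static float sketch_pow2sf[SKETCH_TAB_SIZE];
static float sketch_pow34sf[SKETCH_TAB_SIZE];

/* Fills both tables with the incremental scheme used in the patch. */
static void sketch_tableinit(void)
{
    /* 2^(k/16) for 0 <= k <= 15, same constants as in the patch */
    static const float exp2_lut[16] = {
        1.00000000000000000000f, 1.04427378242741384032f,
        1.09050773266525765921f, 1.13878863475669165370f,
        1.18920711500272106672f, 1.24185781207348404859f,
        1.29683955465100966593f, 1.35425554693689272830f,
        1.41421356237309504880f, 1.47682614593949931139f,
        1.54221082540794082361f, 1.61049033194925430818f,
        1.68179283050742908606f, 1.75625216037329948311f,
        1.83400808640934246349f, 1.91520656139714729387f,
    };
    float t1 = 8.8817841970012523233890533447265625e-16f; /* 2^(-50) */
    float t2 = 3.63797880709171295166015625e-12f;         /* 2^(-38) */
    int t1_inc_prev = 0, t2_inc_prev = 8;
    int i;

    for (i = 0; i < SKETCH_TAB_SIZE; i++) {
        int t1_inc_cur = 4 * (i % 4);       /* fractional exponent in 1/16ths */
        int t2_inc_cur = (8 + 3 * i) % 16;
        assert(t1_inc_cur < 16 && t2_inc_cur < 16);
        if (t1_inc_cur < t1_inc_prev)       /* index wrapped: next power of 2 */
            t1 *= 2;
        if (t2_inc_cur < t2_inc_prev)
            t2 *= 2;
        sketch_pow2sf[i]  = t1 * exp2_lut[t1_inc_cur];
        sketch_pow34sf[i] = t2 * exp2_lut[t2_inc_cur];
        t1_inc_prev = t1_inc_cur;
        t2_inc_prev = t2_inc_cur;
    }
}

/* 1 if the tables show the exact power-of-two periodicity of the scheme. */
static int sketch_check_periodicity(void)
{
    int i;
    for (i = 0; i + 4 < SKETCH_TAB_SIZE; i++)
        if (sketch_pow2sf[i + 4] != 2 * sketch_pow2sf[i])
            return 0;
    for (i = 0; i + 16 < SKETCH_TAB_SIZE; i++)
        if (sketch_pow34sf[i + 16] != 8 * sketch_pow34sf[i])
            return 0;
    return 1;
}
```

Because t1 and t2 only ever hold exact powers of two, each product is a
lookup-table entry with its exponent shifted, so no rounding error builds up
across iterations. After sketch_tableinit(), index 200 holds exactly 1.0f in
both tables under the assumption above.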


