[FFmpeg-devel] [PATCHv2] lavc/cbrt_tablegen: speed up tablegen
Michael Niedermayer
michael at niedermayer.cc
Fri Jan 8 01:48:05 CET 2016
On Mon, Jan 04, 2016 at 06:33:59PM -0800, Ganesh Ajjanagadde wrote:
> This exploits an approach based on the sieve of Eratosthenes, a popular
> method for generating prime numbers.
>
> Tables are identical to previous ones.
>
> Tested with FATE with/without --enable-hardcoded-tables.
>
> Sample benchmark (Haswell, GNU/Linux+gcc):
> prev:
> 7860100 decicycles in cbrt_tableinit, 1 runs, 0 skips
> 7777490 decicycles in cbrt_tableinit, 2 runs, 0 skips
> [...]
> 7582339 decicycles in cbrt_tableinit, 256 runs, 0 skips
> 7563556 decicycles in cbrt_tableinit, 512 runs, 0 skips
>
> new:
> 2099480 decicycles in cbrt_tableinit, 1 runs, 0 skips
> 2044470 decicycles in cbrt_tableinit, 2 runs, 0 skips
> [...]
> 1796544 decicycles in cbrt_tableinit, 256 runs, 0 skips
> 1791631 decicycles in cbrt_tableinit, 512 runs, 0 skips
>
> Both small and large run counts are given. Since this is called only once,
> the small run counts may give a better picture; they are fairly consistent,
> and the timings trend steadily downward as the run count grows until they
> stabilize at a new value.
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> ---
> libavcodec/aacdec_fixed.c | 4 +--
> libavcodec/aacdec_template.c | 2 +-
> libavcodec/cbrt_tablegen.h | 53 ++++++++++++++++++++++++++-----------
> libavcodec/cbrt_tablegen_template.c | 12 ++++++++-
> 4 files changed, 51 insertions(+), 20 deletions(-)
>
> diff --git a/libavcodec/aacdec_fixed.c b/libavcodec/aacdec_fixed.c
> index 396a874..f7b882b 100644
> --- a/libavcodec/aacdec_fixed.c
> +++ b/libavcodec/aacdec_fixed.c
> @@ -155,9 +155,9 @@ static void vector_pow43(int *coefs, int len)
> for (i=0; i<len; i++) {
> coef = coefs[i];
> if (coef < 0)
> - coef = -(int)cbrt_tab[-coef];
> + coef = -(int)cbrt_tab[-coef].i;
> else
> - coef = (int)cbrt_tab[coef];
> + coef = (int)cbrt_tab[coef].i;
> coefs[i] = coef;
> }
> }
> diff --git a/libavcodec/aacdec_template.c b/libavcodec/aacdec_template.c
> index d819958..1380510 100644
> --- a/libavcodec/aacdec_template.c
> +++ b/libavcodec/aacdec_template.c
> @@ -1791,7 +1791,7 @@ static int decode_spectrum_and_dequant(AACContext *ac, INTFLOAT coef[1024],
> v = -v;
> *icf++ = v;
> #else
> - *icf++ = cbrt_tab[n] | (bits & 1U<<31);
> + *icf++ = cbrt_tab[n].i | (bits & 1U<<31);
> #endif /* USE_FIXED */
> bits <<= 1;
> } else {
> diff --git a/libavcodec/cbrt_tablegen.h b/libavcodec/cbrt_tablegen.h
> index 59b5a1d..e3d6634 100644
> --- a/libavcodec/cbrt_tablegen.h
> +++ b/libavcodec/cbrt_tablegen.h
> @@ -26,14 +26,13 @@
> #include <stdint.h>
> #include <math.h>
> #include "libavutil/attributes.h"
> +#include "libavutil/intfloat.h"
> #include "libavcodec/aac_defines.h"
>
> -#if USE_FIXED
> -#define CBRT(x) lrint((x).f * 8192)
> -#else
> -#define CBRT(x) x.i
> -#endif
> -
> +union ff_int32float64 {
> + uint32_t i;
> + double f;
> +};
> #if CONFIG_HARDCODED_TABLES
> #if USE_FIXED
> #define cbrt_tableinit_fixed()
> @@ -43,20 +42,42 @@
> #include "libavcodec/cbrt_tables.h"
> #endif
> #else
> -static uint32_t cbrt_tab[1 << 13];
> +static union ff_int32float64 cbrt_tab[1 << 13];
this doubles the amount of CPU cache needed at runtime to hold
the same number of elements
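To put numbers on this, here is a minimal sketch that computes both table sizes. It assumes a common ABI where `uint32_t` is 4 bytes and `double` is 8 bytes. The union mirrors the one added in the patch; `CBRT_TAB_ENTRIES`, `old_table_bytes`, `new_table_bytes` and `report_table_sizes` are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same layout as the union introduced by the patch. */
union ff_int32float64 {
    uint32_t i;
    double f;
};

#define CBRT_TAB_ENTRIES (1 << 13) /* hypothetical name; number of cbrt_tab entries */

/* Bytes needed by the table before the patch (uint32_t entries). */
size_t old_table_bytes(void)
{
    return CBRT_TAB_ENTRIES * sizeof(uint32_t);
}

/* Bytes needed by the table after the patch (union entries). A union is
 * at least as large as its largest member, here the double. */
size_t new_table_bytes(void)
{
    return CBRT_TAB_ENTRIES * sizeof(union ff_int32float64);
}

/* Print both sizes so they can be compared directly. */
void report_table_sizes(void)
{
    assert(sizeof(union ff_int32float64) >= sizeof(double));
    printf("uint32_t table: %zu bytes\n", old_table_bytes());
    printf("union table:    %zu bytes\n", new_table_bytes());
}
```

On x86-64, where `double` is 8 bytes, this prints 32768 and 65536 bytes: 32 KiB versus 64 KiB for the same 8192 values.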
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
it is not once nor twice but times without number that the same ideas make
their appearance in the world. -- Aristotle