[FFmpeg-devel] [PATCH] lavu/lfg: add 64 bit random number generator
dserpell at gmail.com
Tue Mar 15 21:10:57 CET 2016
El Mon, Mar 14, 2016 at 08:42:32PM -0400, Ganesh Ajjanagadde escribio:
> On Sun, Mar 13, 2016 at 11:08 PM, Michael Niedermayer
> <michael at niedermayer.cc> wrote:
> > On Sun, Mar 13, 2016 at 07:12:50PM -0400, Ganesh Ajjanagadde wrote:
> >> This is based on the relatively well known xorshift128+ of Sebastiano
> >> Vigna (https://en.wikipedia.org/wiki/Xorshift) that performs very well on the
> >> BigCrush suite, is very efficient, and is thus used by a number of
> >> clients: http://xorshift.di.unimi.it/ (see Introduction).
> >> Moreover, the implementation is in the public domain:
> >> http://xorshift.di.unimi.it/xorshift128plus.c.
> >> Concretely, it is nearly as fast as av_lfg_get (which only returns 32 bits),
> >> and has a much smaller cache (128 bits). Thus, the timings should also
> >> be more stable.
> >> This is needed because av_lfg_get<<32 | av_lfg_get is far slower, and
> >> likely less random as measured by BigCrush - most LFG's perform
> >> quite poorly on the BigCrush suite:
> >> http://www6.tw.freebsd.org/distfiles/testu01.pdf.
> >> In particular, FFmpeg's seems to be Ran055 in the paper, see pg31.
> >> Sample benchmark (Haswell, GCC + -march=native):
> >> 23200 decicycles in 624 calls of av_lfg_get, 1 runs, 0 skips
> >> 23040 decicycles in 624 calls of av_lfg_get, 2 runs, 0 skips
> >> 22810 decicycles in 624 calls of av_lfg_get, 4 runs, 0 skips
> >> [...]
> >> 19884 decicycles in 624 calls of av_lfg_get, 65532 runs, 4 skips
> >> 19880 decicycles in 624 calls of av_lfg_get, 131067 runs, 5 skips
> >> 19884 decicycles in 624 calls of av_lfg_get, 262136 runs, 8 skips
> >> 19877 decicycles in 624 calls of av_lfg_get, 524276 runs, 12 skips
> >> 25380 decicycles in 624 calls of av_rand64_get, 1 runs, 0 skips
> >> 28560 decicycles in 624 calls of av_rand64_get, 2 runs, 0 skips
> >> 30112 decicycles in 624 calls of av_rand64_get, 4 runs, 0 skips
> >> [...]
> >> 22627 decicycles in 624 calls of av_rand64_get, 65536 runs, 0 skips
> >> 22625 decicycles in 624 calls of av_rand64_get, 131072 runs, 0 skips
> >> 22625 decicycles in 624 calls of av_rand64_get, 262143 runs, 1 skips
> >> 22624 decicycles in 624 calls of av_rand64_get, 524286 runs, 2 skips
> >> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
> >> ---
> >> libavutil/lfg.c | 33 ++++++++++++++++++++++++++++++++-
> >> libavutil/lfg.h | 26 ++++++++++++++++++++++++++
> >> 2 files changed, 58 insertions(+), 1 deletion(-)
> > why do you add this code to lfg.* (Lagged Fibonacci Generator)?
> > its not a lfg
> I just wanted to lay out the implementation, it was easier for me to
> test and compare that way. Also, don't really know if it should be
> public or private. I really felt that all random number stuff should
> go into e.g lavu/rand.h, right now it is quite scattered across
> random_seed.c, lfg.h (which is not great from a discoverability
> Anyway, if you are flexible about where it could go, I will resubmit
> as lavu/rand.h, lavu/rand.c containing this 64 bit gen. Is this fine
> with you?
> Some day or the other this can be cleaned up into e.g a single header,
> but that maybe best handled later.
> > also the LFG could be trivially extended/changed to 64bit if one wants
> > only needs uint64_t being used
> I can believe the extendability, but note that I do not know about the
> quality of such generators. The linked TestU01 paper did not seem too
> happy with the additive lagged Fibonacci generators, and instead
> favored multiplicative ones like av_mlfg_get.
> > also theres av_mlfg_get() which passes all tests, though slower of
> > course
> Are you sure it passes all of BigCrush? What test suite are you referring to?
> > and does xorshift128+ really pass all tests ?
> This was a claim on wikipedia that is I believe incorrect, Vigna
> himself makes no such absolute claims. All he claims is (from
> /* This is the fastest generator passing BigCrush without
> systematic failures, but due to the relatively short period it is
> acceptable only for applications with a mild amount of parallelism;
> otherwise, use a xorshift1024* generator. */
> As such, he does not rule out all possible failures.
> In fact, his page http://xorshift.di.unimi.it/#quality suggest certain failures.
> I therefore did not mention explicit absolute claims in the message either.
> > what if the bits are reversed so that the least significant and most
> > significant are exchanged ?
> > the text seems unclear about that
> Not sure exactly which text you are referring to, I did not look at
> this in too much depth because it is claimed on the page that Chrome,
> Firefox, and others use it - I doubt they would have hunted down and
> used this if this was bad.
> > anyway, iam fine with the addition of xorshift128plus but please
> > put it in a different file
> > above questions are more due to curiousity and not a objection
> Thanks, unfortunately I am not a PRNG expert and have not dug too
> deeply into this.
I normally use the PRNG from Bob Jenkins,
In my tests, this generator is extremely fast and passes all of TestU01.
Note that it you want a really fast implementation, you need to generate
a table of numbers (256 numbers is a good amount) and then return the
numbers from the table on subsequent calls until depleted.
And if you want an SSE2 version, there is one available inside pixman
sources (MIT license):
(probably needs conversion to ASM to include inside ffmpeg).
More information about the ffmpeg-devel