[FFmpeg-devel] [PATCH] lavu/lfg: add 64 bit random number generator
gajjanag at gmail.com
Tue Mar 15 23:19:14 CET 2016
On Tue, Mar 15, 2016 at 12:10 PM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Mon, Mar 14, 2016 at 08:42:32PM -0400, Ganesh Ajjanagadde wrote:
>> On Sun, Mar 13, 2016 at 11:08 PM, Michael Niedermayer
>> <michael at niedermayer.cc> wrote:
>> > On Sun, Mar 13, 2016 at 07:12:50PM -0400, Ganesh Ajjanagadde wrote:
>> >> This is based on the relatively well known xorshift128+ of Sebastiano
>> >> Vigna (https://en.wikipedia.org/wiki/Xorshift) that performs very well on the
>> >> BigCrush suite, is very efficient, and is thus used by a number of
>> >> clients: http://xorshift.di.unimi.it/ (see Introduction).
>> >> Moreover, the implementation is in the public domain:
>> >> http://xorshift.di.unimi.it/xorshift128plus.c.
>> >> Concretely, it is nearly as fast as av_lfg_get (which only returns 32 bits),
>> >> and has a much smaller cache (128 bits). Thus, the timings should also
>> >> be more stable.
>> >> This is needed because av_lfg_get<<32 | av_lfg_get is far slower, and
>> >> likely less random as measured by BigCrush - most LFG's perform
>> >> quite poorly on the BigCrush suite:
>> >> http://www6.tw.freebsd.org/distfiles/testu01.pdf.
>> >> In particular, FFmpeg's seems to be Ran055 in the paper, see pg31.
>> >> Sample benchmark (Haswell, GCC + -march=native):
>> >> 23200 decicycles in 624 calls of av_lfg_get, 1 runs, 0 skips
>> >> 23040 decicycles in 624 calls of av_lfg_get, 2 runs, 0 skips
>> >> 22810 decicycles in 624 calls of av_lfg_get, 4 runs, 0 skips
>> >> [...]
>> >> 19884 decicycles in 624 calls of av_lfg_get, 65532 runs, 4 skips
>> >> 19880 decicycles in 624 calls of av_lfg_get, 131067 runs, 5 skips
>> >> 19884 decicycles in 624 calls of av_lfg_get, 262136 runs, 8 skips
>> >> 19877 decicycles in 624 calls of av_lfg_get, 524276 runs, 12 skips
>> >> 25380 decicycles in 624 calls of av_rand64_get, 1 runs, 0 skips
>> >> 28560 decicycles in 624 calls of av_rand64_get, 2 runs, 0 skips
>> >> 30112 decicycles in 624 calls of av_rand64_get, 4 runs, 0 skips
>> >> [...]
>> >> 22627 decicycles in 624 calls of av_rand64_get, 65536 runs, 0 skips
>> >> 22625 decicycles in 624 calls of av_rand64_get, 131072 runs, 0 skips
>> >> 22625 decicycles in 624 calls of av_rand64_get, 262143 runs, 1 skips
>> >> 22624 decicycles in 624 calls of av_rand64_get, 524286 runs, 2 skips
>> >> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
>> >> ---
>> >> libavutil/lfg.c | 33 ++++++++++++++++++++++++++++++++-
>> >> libavutil/lfg.h | 26 ++++++++++++++++++++++++++
>> >> 2 files changed, 58 insertions(+), 1 deletion(-)
>> > why do you add this code to lfg.* (Lagged Fibonacci Generator)?
>> > its not a lfg
>> I just wanted to lay out the implementation, it was easier for me to
>> test and compare that way. Also, don't really know if it should be
>> public or private. I really felt that all random number stuff should
>> go into e.g lavu/rand.h, right now it is quite scattered across
>> random_seed.c, lfg.h (which is not great from a discoverability
>> Anyway, if you are flexible about where it could go, I will resubmit
>> as lavu/rand.h, lavu/rand.c containing this 64 bit gen. Is this fine
>> with you?
>> Some day or the other this can be cleaned up into e.g a single header,
>> but that maybe best handled later.
>> > also the LFG could be trivially extended/changed to 64bit if one wants
>> > only needs uint64_t being used
>> I can believe the extendability, but note that I do not know about the
>> quality of such generators. The linked TestU01 paper did not seem too
>> happy with the additive lagged Fibonacci generators, and instead
>> favored multiplicative ones like av_mlfg_get.
>> > also theres av_mlfg_get() which passes all tests, though slower of
>> > course
>> Are you sure it passes all of BigCrush? What test suite are you referring to?
> it lists several multiplicative LFGs they all pass all tests
> See "Table I. Results of test batteries applied to well-known RNGs"
> unless i misunderstand
> of course all generators of this class are expected to have defects
> in the least significant bits if one searches hard enough.
> I assumed that these tests dont search hard enough
> also every PRNG fails a test otherwise it wouldnt be a PRNG but a
> RNG ...
> that said, i would be somewhat surprised if the more significant side
> of the mlfg values fail many tests that arent designed specifically
> for mlfgs
>> > and does xorshift128+ really pass all tests ?
>> This was a claim on wikipedia that is I believe incorrect,
> wikipedia ... sigh
>> himself makes no such absolute claims. All he claims is (from
>> /* This is the fastest generator passing BigCrush without
>> systematic failures, but due to the relatively short period it is
>> acceptable only for applications with a mild amount of parallelism;
>> otherwise, use a xorshift1024* generator. */
>> As such, he does not rule out all possible failures.
>> In fact, his page http://xorshift.di.unimi.it/#quality suggest certain failures.
>> I therefore did not mention explicit absolute claims in the message either.
> I would have been very surprised if this basic generator didnt have
> defects in stronger tests.
> the LSBs should fail basic linearity tests
> IIUC/IIRC marsaglia suggested to use the xorshift with a multipliction
> as non linear step. If thats done id have little doubt in the generator
> but i didnt read the xorshift papers so i might be half wrong on some
> things ...
> with just an addition as non linear step i have a odd feeling.
> This is all no doubt purely academic and xorshift128+ is likely
> more then good enough for us. It just feels a bit "weak" to have a
> purely linear generator that is scrabled by a simple addition.
> also iam thinking a bit here in relation to a LFG which too uses
> just an addition and is weak too.
> i think what iam trying to say is that xorshift128plus isnt faster
> than the LFG and while its maybe better its not much better
It is definitely better than the current av_lfg_get() << 32 |
av_lfg_get in the context of Gaussian number generation. I noticed
this while running the Anderson Darling tests (patch 3). Of course
this does not mean a whole lot, ultimately there will always exist
some number of samples for which a Gaussian test will fail.
Nevertheless, if we want good, fast Gaussian samples, I would like
this in. But see also the thread, atomnuker has other ideas, Derek is
not too happy, etc.
> about the performance comparission.
> LFGs should be optimizeable in SIMD, is that worth it i dont know
Doubt it, if one needs that level of performance, I doubt once is
terribly interested in quality at that point and can simply use the
suggested fast alternatives.
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> Breaking DRM is a little like attempting to break through a door even
> though the window is wide open and the only thing in the house is a bunch
> of things you dont want and which you would get tomorrow for free anyway
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
More information about the ffmpeg-devel