[FFmpeg-devel] [PATCH] lavu/lfg: add 64 bit random number generator

Ganesh Ajjanagadde gajjanag at gmail.com
Tue Mar 15 23:19:14 CET 2016


On Tue, Mar 15, 2016 at 12:10 PM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Mon, Mar 14, 2016 at 08:42:32PM -0400, Ganesh Ajjanagadde wrote:
>> On Sun, Mar 13, 2016 at 11:08 PM, Michael Niedermayer
>> <michael at niedermayer.cc> wrote:
>> > On Sun, Mar 13, 2016 at 07:12:50PM -0400, Ganesh Ajjanagadde wrote:
>> >> This is based on the relatively well known xorshift128+ of Sebastiano
>> >> Vigna (https://en.wikipedia.org/wiki/Xorshift) that performs very well on the
>> >> BigCrush suite, is very efficient, and is thus used by a number of
>> >> clients: http://xorshift.di.unimi.it/ (see Introduction).
>> >>
>> >> Moreover, the implementation is in the public domain:
>> >> http://xorshift.di.unimi.it/xorshift128plus.c.
>> >>
>> >> Concretely, it is nearly as fast as av_lfg_get (which only returns 32 bits),
>> >> and has a much smaller cache (128 bits). Thus, the timings should also
>> >> be more stable.
>> >>
>> >> This is needed because av_lfg_get<<32 | av_lfg_get is far slower, and
>> >> likely less random as measured by BigCrush - most LFG's perform
>> >> quite poorly on the BigCrush suite:
>> >> http://www6.tw.freebsd.org/distfiles/testu01.pdf.
>> >> In particular, FFmpeg's seems to be Ran055 in the paper, see pg31.
>> >>
>> >> Sample benchmark (Haswell, GCC + -march=native):
>> >>   23200 decicycles in 624 calls of av_lfg_get,       1 runs,      0 skips
>> >>   23040 decicycles in 624 calls of av_lfg_get,       2 runs,      0 skips
>> >>   22810 decicycles in 624 calls of av_lfg_get,       4 runs,      0 skips
>> >> [...]
>> >>   19884 decicycles in 624 calls of av_lfg_get,   65532 runs,      4 skips
>> >>   19880 decicycles in 624 calls of av_lfg_get,  131067 runs,      5 skips
>> >>   19884 decicycles in 624 calls of av_lfg_get,  262136 runs,      8 skips
>> >>   19877 decicycles in 624 calls of av_lfg_get,  524276 runs,     12 skips
>> >>
>> >>   25380 decicycles in 624 calls of av_rand64_get,       1 runs,      0 skips
>> >>   28560 decicycles in 624 calls of av_rand64_get,       2 runs,      0 skips
>> >>   30112 decicycles in 624 calls of av_rand64_get,       4 runs,      0 skips
>> >> [...]
>> >>   22627 decicycles in 624 calls of av_rand64_get,   65536 runs,      0 skips
>> >>   22625 decicycles in 624 calls of av_rand64_get,  131072 runs,      0 skips
>> >>   22625 decicycles in 624 calls of av_rand64_get,  262143 runs,      1 skips
>> >>   22624 decicycles in 624 calls of av_rand64_get,  524286 runs,      2 skips
>> >>
>> >> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
>> >> ---
>> >>  libavutil/lfg.c | 33 ++++++++++++++++++++++++++++++++-
>> >>  libavutil/lfg.h | 26 ++++++++++++++++++++++++++
>> >>  2 files changed, 58 insertions(+), 1 deletion(-)
>> >
>> > why do you add this code to lfg.* (Lagged Fibonacci Generator)?
>> > its not a lfg
>>
>> I just wanted to lay out the implementation, it was easier for me to
>> test and compare that way. Also, don't really know if it should be
>> public or private. I really felt that all random number stuff should
>> go into e.g lavu/rand.h, right now it is quite scattered across
>> random_seed.c, lfg.h (which is not great from a discoverability
>> standpoint).
>>
>> Anyway, if you are flexible about where it could go, I will resubmit
>> as lavu/rand.h, lavu/rand.c containing this 64 bit gen. Is this fine
>> with you?
>>
>> Some day or the other this can be cleaned up into e.g a single header,
>> but that maybe best handled later.
>>
>> >
>> > also the LFG could be trivially extended/changed to 64bit if one wants
>> > only needs uint64_t being used
>>
>> I can believe the extendability, but note that I do not know about the
>> quality of such generators. The linked TestU01 paper did not seem too
>> happy with the additive lagged Fibonacci generators, and instead
>> favored multiplicative ones like av_mlfg_get.
>>
>> >
>> > also theres av_mlfg_get() which passes all tests, though slower of
>> > course
>>
>> Are you sure it passes all of BigCrush? What test suite are you referring to?
>
> http://www6.tw.freebsd.org/distfiles/testu01.pdf
> it lists several multiplicative LFGs they all pass all tests
> See "Table I. Results of test batteries applied to well-known RNGs"
> unless i misunderstand
>
> of course all generators of this class are expected to have defects
> in the least significant bits if one searches hard enough.
> I assumed that these tests dont search hard enough
> also every PRNG fails a test otherwise it wouldnt be a PRNG but a
> RNG ...
>
> that said, i would be somewhat surprised if the more significant side
> of the mlfg values fail many tests that arent designed specifically
> for  mlfgs
>
>
>>
>> >
>> > and does xorshift128+ really pass all tests ?
>>
>> This was a claim on wikipedia that is I believe incorrect,
>
> wikipedia ... sigh
>
>
>> Vigna
>> himself makes no such absolute claims. All he claims is (from
>> http://xorshift.di.unimi.it/xorshift128plus.c):
>> "
>>
>> /* This is the fastest generator passing BigCrush without
>>    systematic failures, but due to the relatively short period it is
>>    acceptable only for applications with a mild amount of parallelism;
>>    otherwise, use a xorshift1024* generator. */
>> "
>> As such, he does not rule out all possible failures.
>> In fact, his page http://xorshift.di.unimi.it/#quality suggest certain failures.
>>
>>  I therefore did not mention explicit absolute claims in the message either.
>
> I would have been very surprised if this basic generator didnt have
> defects in stronger tests.
> the LSBs should fail basic linearity tests
>
> IIUC/IIRC marsaglia suggested to use the xorshift with a multipliction
> as non linear step. If thats done id have little doubt in the generator
> but i didnt read the xorshift papers so i might be half wrong on some
> things ...
>
> with just an addition as non linear step i have a odd feeling.
> This is all no doubt purely academic and xorshift128+ is likely
> more then good enough for us. It just feels a bit "weak" to have a
> purely linear generator that is scrabled by a simple addition.
> also iam thinking a bit here in relation to a LFG which too uses
> just an addition and is weak too.
>
> i think what iam trying to say is that xorshift128plus isnt faster
> than the LFG and while its maybe better its not much better

It is definitely better than the current av_lfg_get() << 32 |
av_lfg_get in the context of Gaussian number generation. I noticed
this while running the Anderson Darling tests (patch 3). Of course
this does not mean a whole lot, ultimately there will always exist
some number of samples for which a Gaussian test will fail.
Nevertheless, if we want good, fast Gaussian samples, I would like
this in. But see also the thread, atomnuker has other ideas, Derek is
not too happy, etc.

>
> about the performance comparission.
> LFGs should be optimizeable in SIMD, is that worth it i dont know

Doubt it, if one needs that level of performance, I doubt once is
terribly interested in quality at that point and can simply use the
suggested fast alternatives.

>
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Breaking DRM is a little like attempting to break through a door even
> though the window is wide open and the only thing in the house is a bunch
> of things you dont want and which you would get tomorrow for free anyway
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>


More information about the ffmpeg-devel mailing list