RandStream generators & dieharder validation suite?

Hi,
Has anyone investigated the various MATLAB RandStream generator algorithms (see table in middle of page) using the dieharder validation suite? I'm seeking to generate large numbers (in the millions) of exceptionally uniformly random values & would appreciate seeing the detailed tabular output that dieharder provides.
I'm not super-handy with Visual Studio, but can pull off the basics. I would like to build the dieharder project myself, but as far as I can tell it is targeted at Linux. Is this correct? Has anyone successfully modified the project for Visual Studio? If so, could you please share general results on how the MATLAB random number generators fare?
I understand that my question relates in large part to a 3rd party product (Visual Studio) & thus isn't eligible for help from MATLAB support. Additionally, perhaps only a small fraction of MATLAB users are experienced with Linux, Windows, & Visual Studio.
Obviously, there are other external venues where I can pose this same question. However, given the extensive MATLAB support for random number generation & numerous discussion threads about RNG, I thought I'd give my question a shot here first, hoping to find an interested & expert cross-platform compiler user.
Knowing that offering something tangible in return for help often gets faster results, I'm up for negotiating (via private message) a fair PayPal fee if s/o can provide step-by-step instructions for how to get the latest dieharder distribution to build in VS 2010. Of course, freely-offered help is always appreciated, but not expected in this case.
Due to the complexity of the issue & potential for a confusing thread, I would like to ask please that you make sure to verify a successful build at your end before posting any super-lengthy Answers.
However I get this answered, either here or via some external site, & whether by a helpful altruist or a needy graduate student, I will make sure all relevant information is posted here: a complete step-by-step solution & my test results for 2-3 chosen RandStream generators.
Thanks, Brad

3 Comments

Jan on 13 Jan 2013
The nature of a public forum is to share solutions with the community. A discussion is part of finding a solution, so I would not see it as clutter. The offer of a fee does not encourage me to answer, and I do not think this should happen more often.
Hi Jan,
Thanks for your comments. I'm sorry to have offended your sense of what is an appropriate posting on the MATLAB Answers site. I have gone back & extensively edited my Question, to hopefully make you feel better about seeing it on the site, as well as to clarify that all results of general interest will be posted.
Regarding your criticism of my offer of fee-for-service: as far as I can tell, this site does not prohibit offering remuneration for assistance. I don't think I have been crass about it & I have tried to show sensitivity in this regard.
Brad
Jan on 14 Jan 2013
Dear Brad, without doubt you were very clear about your intention to share the results. You have been neither offensive nor rude. Offering a fee is polite, legal and fair. I do not want to discourage you from paying anybody who helps you solve your problem. Therefore I have no reason to criticize the content or tone of your question.
I have received too many personal messages from cheaters who offered a few dollars for solving their homework. In contrast, your question obviously has a serious background. But the public appearance of money can have a bad influence on a forum that lives on voluntary contributions. That is why I wrote that I personally do not want this to happen more often, not that it should not happen at all.
In another MATLAB forum there is a specific category for paid programming or assistance jobs. Unfortunately, about 20% of the threads in this category must be deleted because they violate the forum policies.
I hope my opinion is clearer now.


Accepted Answer

Jan on 13 Jan 2013
Edited: Jan on 13 Jan 2013

0 votes

Asking your favorite search engine reveals some useful instructions on the net, e.g.:
Reading the instructions in the 2nd link is important: while compiling and running DIEHARDER is more or less easy, interpreting the results is very hard science. As long as all pseudo-random number generators are deterministic, tests like DIEHARD and DIEHARDER can check the entropy level only.
If you need good random numbers, true random numbers are strongly recommended:
The underlying service at www.random.org is limited (see its quota). Getting "millions" of numbers might therefore either take some time (days!) or cost money. Another idea is to stretch the true random numbers by using them as frequently changing seeds for your pseudo-RNG. In that case, though, testing the results with DIEHARDER is a good idea again.
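The reseeding idea can be sketched roughly as follows. This is a minimal Python illustration rather than MATLAB code. `os.urandom` is only a stand-in for a true random source such as random.org (it returns operating-system entropy, which is an assumption made here for convenience), and the names `reseeded_stream` and `reseed_every` are hypothetical.

```python
import os
import random


def reseeded_stream(n_values, reseed_every=1000, entropy=os.urandom):
    """Return n_values floats in [0, 1) from a Mersenne Twister that is
    reseeded from an external entropy source every `reseed_every` draws.

    `entropy` is any callable taking a byte count and returning that many
    bytes; os.urandom is a placeholder for a true random source.
    """
    rng = random.Random()
    values = []
    for i in range(n_values):
        if i % reseed_every == 0:
            # Use 16 fresh bytes (128 bits) as the new seed.
            rng.seed(int.from_bytes(entropy(16), "big"))
        values.append(rng.random())
    return values


sample = reseeded_stream(5000, reseed_every=1000)
```

A shorter `reseed_every` consumes the scarce true random numbers faster, so the interval is a trade-off against the quota mentioned above. The resulting stream should still be checked with a test suite.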
Building true-RNG hardware at home is not very hard: one idea is to let a USB camera record a lava lamp and take differences between subsequent images to obtain random bits caused by noise. In a further step you can even omit the lava lamp: use a camera that produces noisier output for darker images and stick a black sheet of paper in front of the lens. Much more detailed instructions can again be found by searching the internet.
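As a rough sketch of the image-difference idea, assuming each frame is already available as a flat list of integer pixel intensities: keep the least significant bit of each pixel difference and pack the bits into bytes. The function names and the tiny hard-coded frames are hypothetical, and capturing frames from a real camera is not shown.

```python
def bits_from_frames(frame_a, frame_b):
    """Return one bit per pixel: the parity (lowest bit) of the difference
    between corresponding pixels of two consecutive frames."""
    return [(a - b) & 1 for a, b in zip(frame_a, frame_b)]


def pack_bits(bits):
    """Pack bits into bytes, most significant bit first.
    Any incomplete trailing group of fewer than 8 bits is dropped."""
    packed = bytearray()
    usable = len(bits) - len(bits) % 8
    for i in range(0, usable, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


# Two made-up 8-pixel "frames" standing in for consecutive camera images.
frame_a = [12, 15, 11, 14, 13, 12, 16, 10]
frame_b = [13, 15, 12, 12, 13, 14, 15, 10]
raw = pack_bits(bits_from_frames(frame_a, frame_b))
```

Bits taken from real sensor noise are likely to be biased or correlated, so output like this would still need to be tested before use.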

3 Comments

Hi Jan,
Thank you for your effort in researching & posting basic information on my query, which I'm sure may be helpful to others here. Regarding the first two links you provided ("useful instructions"), I'm sorry but I'm not interested in using cygwin; & Prof. Brown's PDF does not refer to Visual Studio, nor indeed to the build process at all (as it's documented elsewhere on his web page).
I'm familiar with random.org & the general issue of physical-vs-pseudo random numbers, but your point about measurement of entropy is well-taken, thanks.
Regretfully, I can't accept your Answer. I'm still holding out for VS support, as originally requested ;)
Jan on 14 Jan 2013
As you found out already, migrating the DIEHARDER suite to MSVC is not trivial. Installing Cygwin or even Linux would be easier, and that route has already been tested. The same applies to DIEHARD and TestU01. So of course this answer cannot be accepted, but perhaps it motivates you to keep alternatives in mind.
After lengthy & convincing discussion with Jan (see comments below), I realized that he's right: porting the dieharder project to MSVC would be a very poor use of time, especially given the limited & occasional dieharder use I imagine for myself.
Jan's Answer is to learn the basics of Ubuntu Linux & then run the dieharder binary directly.
I will create test data in Windows & copy to a USB flash drive for access under Ubuntu.
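Creating such a test file might look roughly like the sketch below. It is written in Python rather than MATLAB, so it exercises Python's built-in Mersenne Twister and not RandStream itself. The header layout (`type`, `count`, `numbit`) reflects my understanding of dieharder's ASCII file input format and should be checked against the dieharder documentation. The function name and file name are hypothetical.

```python
import os
import random
import tempfile


def write_dieharder_ascii(path, count, seed=1):
    """Write `count` unsigned 32-bit integers, one per line, preceded by a
    header in the layout dieharder is assumed to expect for ASCII input."""
    rng = random.Random(seed)  # Python's generator is a Mersenne Twister
    # newline="\n" forces Unix line endings even when run on Windows.
    with open(path, "w", newline="\n") as f:
        f.write("#" + "=" * 66 + "\n")
        f.write(f"# generator mt19937  seed = {seed}\n")
        f.write("#" + "=" * 66 + "\n")
        f.write("type: d\n")
        f.write(f"count: {count}\n")
        f.write("numbit: 32\n")
        for _ in range(count):
            f.write(f"{rng.getrandbits(32):10d}\n")
    return path


demo_path = write_dieharder_ascii(
    os.path.join(tempfile.gettempdir(), "mt_sample.txt"), 1000
)
```

Under Ubuntu, a file like this should be usable with something like `dieharder -a -g 202 -f mt_sample.txt`, where generator 202 is dieharder's ASCII file input. The 1000 values here only demonstrate the format. A meaningful run needs far more data, otherwise dieharder has to reuse the file repeatedly.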


More Answers (2)

Peter Perkins on 14 Jan 2013

0 votes

Brad, if your goal is to run Dieharder, I can't help.
If your goal is to verify that the generators in MATLAB pass stringent tests of randomness, then you'll find that L'Ecuyer and Simard published a paper a few years back that includes results for their TestU01 suite on a wide variety of generators, including mt19937ar, mrg32k3a, and mlfg6331_64, the recommended current generators in MATLAB.
Hope this helps.

6 Comments

Hi Peter,
Thank you very much for the reference to L'Ecuyer & Simard ("L & S"), very appreciated! It should be mentioned that there's a $15 charge by ACM for access to their article, unless one is a member, which I'm not. In any case, the free PDF offered by the authors is quite interesting & worth taking a look at.
You may have noted in the free PDF that the authors critique the original diehard suite, which Brown stated was a precursor & inspiration for dieharder (the suite I'm interested in). Weighing out L & S vs. dieharder may be a challenge for me. I note that you work at TMW & are presumably unable to get further involved on a person-to-person basis.
Ideally, I'd like to find s/o of high technical skill to (a) assist in getting dieharder & perhaps this other suite you mention ("TestU01") by L & S to compile in MSVC 2010, & (b) hopefully discuss some of the statistical results, at least to some superficial degree, which I can then summarize here in this thread.
As Jan pointed out in his comment, the philosophy, semantics & interpretation of both of these suite tests may present far greater difficulty to the user than the basic mechanics of getting them to run. You're very kind, Peter, to point me to a summary-level presentation on the relative merits of the RandStream generators, & thank you again for that. Time permitting, I really still would like to get one of the suites running "on my set-up", just to experience it all for myself..
Brad
Jan on 14 Jan 2013
I see, that you have a strong scientific interest in testing the RNGs. Although others have investigated the algorithms already, further research is always a challenge, because there have been enough bugs in the implementations.
But why do you prefer MSVC 2010 for this? The test suites were developed under Linux, which is cheap, fast to install and easy to use. Migrating to another OS and compiler is a source of errors that could be avoided. I'd expect the program to be faster when compiled by MSVC than by MinGW, and if processing time matters, the Intel compilers would be even better. Nevertheless, the time required for porting and testing the source code will surely exceed the runtime.
Hi Jan, thank you for your insightful question. I'm embarrassed to say, I've been insisting on MSVC 2010 because that's the only compiler I know how to use! Speaking honestly, I've always hated the compiler-driven coding process, & have been working strictly in MATLAB for several years with tremendous productivity gains.
The only reason I even have MSVC is to prepare for my anticipated purchase of MATLAB Coder before long, which I'll use to create optimized code that I can call from MATLAB.
But maybe you're right, if the learning curve isn't too awful for cygwin, I guess I could give it a shot. OTOH, wasting 5-10 hours figuring out a compiler that I will likely never use again would be a poor choice, compared to paying a smart grad student to spend 2-3 hours preparing me a turnkey solution.
By necessity, I have to be extremely careful with my time commitments. Linux is not at all an option for this reason. While it's highly interesting viewed "from the outside" (e.g. reading about it on Wikipedia, out of curiosity), actually getting into real Linux usage would be a "dangerous" & unnecessary distraction for me. I am able to & "have to" accomplish quite a lot every single day with my Windows-based toolset. I generally work 7 days per week as it is!
These are the trade-offs as I understand them..
Brad
Jan on 15 Jan 2013
Thanks for your answer, Brad. I can reconsider the arguments.
You can try this: boot the machine from a Linux live CD, e.g. Ubuntu. Ask a student how to install the DIEHARDER package. As far as I can see, you get pre-compiled binaries directly; otherwise compilation is straightforward. Run DIEHARDER. I estimate this will cost you 10 minutes (that is, 20 in real life).
Migrating DIEHARDER successfully and reliably to MSVC will cost at least one week, which means two weeks in real life. The compilers have a lot of tiny but evil differences, in contrast to MATLAB running on different platforms. Therefore I do not think it is only a matter of creating an MSVC project from a makefile: the code itself will require modifications, followed by exhaustive testing and validation. Because any software above a certain size contains bugs, any change might reveal some. It is then questionable whether fixing a bug causes other problems far away from the affected lines of code. Investigating this exhaustively is extremely time-consuming, and this is, in my opinion, the reason why there is no description on the net of how to compile DIEHARDER in MSVC. Linux booted from a live CD or installed in a virtual machine is much cheaper (with respect to your time commitment) and more reliable.
On the other hand, I do not want to encourage you to work yourself to death. Perhaps your decision to port it to MSVC has a much deeper sense precisely because it takes weeks, not in spite of that.
Hi Jan,
Thanks for your considered estimates & very constructive recommendations. Based on your specific step-by-step instructions for doing this on Linux, I will now accept your answer as a "better alternative" to what I asked for, because the logic of what you're saying is finally so completely obvious, even I can see it ;)
Thanks also for your concern about my state of mind & work-life balance. Please don't worry, things are going pretty well for me. Even though I do work 7 days per week, it's generally on my own schedule. I take frequent breaks & spend a good amount of time enjoying life. I do greatly enjoy coding & analysis though & feel very fortunate that I've found a great niche for myself where I can be productive & happy.
As far as this particular mini-project, I'm finally "connecting the dots" & understanding how easy this will be for me : simply find s/o who's comfortable & competent in Linux & just provide him/her the instructions & the data to test. Wow, brilliant! Thanks, Jan :)
I'll report back with results in the next several weeks, as it all comes together..
Brad
Jan,
I forgot to ask: would you mind please editing your original Answer or submitting a new one, with your rationale that a better idea is to run the dieharder distribution as-is under Ubuntu..? This way, readers can see straightaway what the discussion conclusion was. Alternatively, or in addition, do you think I should add an "Update" section to my Question, briefly summarizing our discussion?
Thanks, Brad


Jan Pospisil on 25 Feb 2013

0 votes

One of my students tested the generators in his bachelor thesis. He used the generators in MATLAB as well as the true random generator /dev/urandom on Linux systems, then exported the numbers for dieharder and ran all the dieharder tests. If you are interested, I can send you the PDF.

Asked: 13 Jan 2013
