Cryptographically-secure pseudorandom number generator seed - encryption

Do we need to seed a CSPRNG with a truly random number? Examples and documentation use truly random numbers, but no justification is given (that I can find).
If we were to seed one with a pseudorandom number, I don't see what the difference would be compared to a truly random seed. If someone finds either of the seeds, the encryption fails anyway.

You are correct that knowing either seed breaks the scheme, but the idea is that it's easier to recover a seed that was itself generated pseudorandomly than one that is truly random. This is especially true if many seeds are generated in quick succession (and the seed does change over time, which it usually does).

Essentially, determining the seed is sufficient to determine the entire output of a pseudorandom generator.
As a result, you want a seed that isn't predictable or determinable.
Pseudorandom output is (under some circumstances as described two paragraphs ago) determinable or predictable.
Beyond that, it is a trade-off. You've already decided to use pseudorandom numbers instead of real randomness, so it is probably an acceptable trade-off in your mind.
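As a rough illustration of that trade-off, here is a small R sketch. Two assumptions on my part: R's default generator is not a CSPRNG, and /dev/urandom exists (i.e. a Unix-like system); the point is only to contrast a guessable seed with one drawn from OS entropy.
# Predictable seed: anyone who knows roughly when this ran can brute-force it.
weak_seed <- as.integer(Sys.time())
# Unpredictable seed: read 4 bytes from the OS entropy pool instead.
con <- file("/dev/urandom", "rb")
strong_seed <- readBin(con, what = "integer", n = 1)
close(con)
set.seed(strong_seed)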

Related

Effect of setting seeds on an algorithm

I am writing R code where I use the set.seed() function at the start of the program to generate the data, then use that data in a function, plot the function, and finally use optim to get the minima. The issue is that the graph of the function changes if I change the seed value, and sometimes it doesn't even produce a concave graph but an exponential one.
I am not able to understand why this is happening and how I can fix it. If anyone can provide me with any reference to read on this subject or any suggestions as to what can be done, that would be great.
Thanks in advance
set.seed() configures the random number generator to start from that seed. The details may be a bit more complicated, depending on the precise implementation, but the effect is always the same: the sequence of numbers will be identical.
This is useful in a number of applications where you want some randomness, but you want to get the same result if you re-run the code. Say for example you need to randomly sample your data, but since you are debugging, it's useful if you get the same sample so that the bugs don't disappear on you.
Also if you want other people to replicate the results, you simply pick some random number as the seed and tell them that you used that seed. Anything in the algorithm based on random numbers will behave the same because you are both using the same sequence of numbers.
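For instance, a minimal R illustration of that reproducibility:
set.seed(123)
rnorm(3)  # -0.5604756 -0.2301775 1.5587083, the same on every run
set.seed(123)
rnorm(3)  # identical to the three numbers above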
For your graph problem you need to share some code so that people understand what you are doing; it's very hard to guess what went wrong. At first glance it seems that your algorithm is very strongly influenced by the random numbers (usually not a good sign).
Put simply: if you set a seed and then draw a random number, the number will always be the same. If you do not set a seed, the number will be different every time you draw one. The seed permits you to replicate your experiment.

How correlated are i.i.d. normal numbers in Julia?

While doing numerical simulations in Julia, I noticed a pattern in my data when I use normally distributed random numbers.
I have an ensemble of random matrices. In order to make my calculations reproducible, I call the srand function per realization. That is, each time I use the function randn(n,n) I initialize the generator with srand(j), where j is the number of the realization.
I would like to know how the normal numbers are generated, and whether, by doing this, I introduce accidental correlations.
Ideally, not at all. If you have any counterexamples, please file them as bugs on the Julia issue tracker. Julia uses the state-of-the-art Mersenne Twister library, dSFMT. This library is very fast and has long been considered best practice for pseudo-random number generation. It has, however, recently come to my attention that there may be subtle statistical issues with PRNGs like MT in general – in particular with using small, consecutive seed values. To mitigate this, if you're really worried about potential correlations, you could do something like this:
julia> using SHA
julia> srand(reinterpret(UInt32,sha256(string(1))))
MersenneTwister(UInt32[0x73b2866b,0xe1fc34ff,0x4e806b9d,0x573f5aff,0xeaa4ad47,0x491d2fa2,0xdd521ec0,0x4b5b87b7],Base.dSFMT.DSFMT_state(Int32[660235548,1072895699,-1083634456,1073365654,-576407846,1073066249,1877594582,1072764549,-1511149919,1073191776 … -710638738,1073480641,-1040936331,1072742443,103117571,389938639,-499807753,414063872,382,0]),[1.5382,1.36616,1.06752,1.17428,1.93809,1.63529,1.74182,1.30015,1.54163,1.05408 … 1.67649,1.66725,1.62193,1.26964,1.37521,1.42057,1.79071,1.17269,1.37336,1.99576],382)
julia> srand(reinterpret(UInt32,sha256(string(2))))
MersenneTwister(UInt32[0x3a5e73d4,0xee165e26,0x71593fe0,0x035d9b8b,0xd8079c01,0x901fc5b6,0x6e663ada,0x35ab13ec],Base.dSFMT.DSFMT_state(Int32[-1908998566,1072999344,-843508968,1073279250,-1560550261,1073676797,1247353488,1073400397,1888738837,1073180516 … -450365168,1073182597,1421589101,1073360711,670806122,388309585,890220451,386049800,382,0]),[1.5382,1.36616,1.06752,1.17428,1.93809,1.63529,1.74182,1.30015,1.54163,1.05408 … 1.67649,1.66725,1.62193,1.26964,1.37521,1.42057,1.79071,1.17269,1.37336,1.99576],382)
In other words, hash a string representation of a small integer seed value using a strong cryptographic hash like SHA2-256, and use the resulting hash data to seed the Mersenne Twister state. Ottoboni, Rivest & Stark suggest using a strong cryptographic hash for each random number generation, but that's going to be a massive slowdown (on current hardware) and is probably overkill unless you have an application that is really very sensitive to imperfect statistical randomness.
I should perhaps point out that Julia's behavior here is no worse than that of other languages, some of which use far worse random number generators by default due to backwards-compatibility considerations. This is a very recent research result (not yet published, even). The technique I've suggested could be used to mitigate this issue in other languages as well.
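For example, here is a hypothetical R analogue of the same trick. It assumes the digest package is available; any strong hash-to-integer mapping would do, and this sketch folds only the first 28 bits of the hash into an R integer.
library(digest)
# Hash a small integer seed with SHA-256 and use part of the hash as the seed.
hash_seed <- function(j) {
  hex <- digest(as.character(j), algo = "sha256", serialize = FALSE)
  strtoi(substr(hex, 1, 7), base = 16L)  # first 28 bits of the hash as an integer
}
set.seed(hash_seed(1))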

High entropy random data creating functions?

Are there functions which produce "infinite" amounts of high entropy data? Moreover, do functions exist which produce the same random data (sequentially) time after time?
I kind of know that they exist, but do they have a specific name?
Use case examples:
Using the function to generate 100 bits of random data (great!), while maintaining a high value of entropy.
Using the same function to generate 10000 bits of random data (where the first 100 bits generated are the same as the 100 bits generated before), while still maintaining a high value of entropy.
Further, how would I go about building these functions myself?
You are most likely looking for Pseudo-Random Number Generators.
They are initialized by a seed, thus taking in a finite amount of entropy.
Good generators have decent entropy coming out, provided you judge it only from the output (i.e. you ignore the seed and the algorithm used to generate the numbers; otherwise the entropy is obviously 0).
Most PRNG algorithms produce sequences that pass any of several tests of uniform distribution. It is an open question, and one central to the theory and practice of cryptography, whether there is any way to distinguish the output of a high-quality PRNG from a truly random sequence without knowing the algorithm(s) used and the state with which it was initialized.
All PRNGs have a period, after which the generated sequence repeats.
The period of a PRNG is defined as the maximum, over all starting states, of the length of the repetition-free prefix of the sequence. The period is bounded by the number of possible states, usually measured in bits. However, since the length of the period potentially doubles with each bit of state added, it is easy to build PRNGs with periods long enough for many practical applications.
Thus, to have two sequences of different lengths where one is the prefix of the other, you just have to run a PRNG with the same seed both times.
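A minimal R sketch of that prefix property, using coin flips as stand-ins for "bits":
set.seed(7); first  <- sample(0:1, 100,   replace = TRUE)  # 100 "bits"
set.seed(7); longer <- sample(0:1, 10000, replace = TRUE)  # 10000 "bits"
identical(first, longer[1:100])  # TRUE: the short run is a prefix of the long one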
Building one yourself would be pretty tricky, but a rather good and simple one is the Mersenne Twister, which dates back only to 1998 and was defined in a paper by Matsumoto and Nishimura [1].
A trivial example would be a linear congruential generator; a toy R version is sketched after the reference below.
[1] Matsumoto, M.; Nishimura, T. (1998). "Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator". ACM Transactions on Modeling and Computer Simulation 8 (1): 3–30. doi:10.1145/272991.272995.
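Here is that toy linear congruential generator in R. The constants are the classic Numerical Recipes parameters; this is for illustration only and is nowhere near cryptographic quality.
lcg <- function(seed, n, a = 1664525, c = 1013904223, m = 2^32) {
  out <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (a * state + c) %% m  # the entire generator is this one line
    out[i] <- state / m            # scale to [0, 1)
  }
  out
}
lcg(42, 5)  # same seed, same five numbers, every time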

How does the set.seed() function influence randomness in R?

Today I first met the set.seed function in R.
It's useful at times, and I understand how to use it. But I have a small problem: how do I choose a really good number as the first parameter of this function?
From that question I get another: how does the first parameter of set.seed() influence randomness in R? Maybe if I understand the latter, I will have the answer to the first.
Thanks a lot.
In a nutshell:
By setting set.seed() you specify the starting-point for all "pseudo random number generators" that create the random numbers in R. See ?set.seed
As computers are very deterministic, there is no such thing as a real "random number".
Computers always have to use an algorithm to generate so-called "pseudo random numbers".
These generators/algorithms work (very often) iterative so the next number is influenced by its predecessor. set.seed() defines the initial predecessor and thereby makes pseudo random numbers reproducible. Which number you choose is irrelevant in most cases.
(see here: http://en.wikipedia.org/wiki/Pseudorandom_number_generator)
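"Which number you choose is irrelevant" can be checked directly: different seeds give different but statistically equivalent streams. A quick R illustration:
set.seed(1); x <- rnorm(1e5)
set.seed(2); y <- rnorm(1e5)
c(mean(x), mean(y))  # both approximately 0
c(sd(x), sd(y))      # both approximately 1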

Has your pseudo-random number generator (PRNG) ever not been random enough?

Have you ever written simulations or randomized algorithms where you've run into trouble because of the quality of the (pseudo)-random numbers you used?
What was happening?
How did you detect / realize your PRNG was the problem?
Was switching PRNGs enough to fix the problem, or did you have to switch to a source of true randomness?
I'm trying to figure out what types of applications require one to worry about the quality of their source of randomness and how one realizes when this becomes a problem.
The dated random number generator RANDU was infamous in the seventies for producing "bad" random numbers. My PhD supervisor mentioned that it affected his PhD and he had to rerun simulations. A Google search for "RANDU linear congruential generator" brings up other examples.
When I run simulations on multiple machines, I've sometimes been tempted to generate "random" seeds rather than use a proper parallel random number generator – for example, generating the seed from the current time in seconds. This has caused me enough problems that I now avoid it at all costs.
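A hypothetical R sketch of that pitfall: two "independent" workers seeded from the wall clock within the same second end up with identical streams.
seed_a <- as.integer(Sys.time())  # worker 1 starts...
seed_b <- as.integer(Sys.time())  # ...worker 2 starts within the same second
set.seed(seed_a); a <- runif(3)
set.seed(seed_b); b <- runif(3)
identical(a, b)  # TRUE whenever the two seeds collide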
This is mainly due to my particular interests, but other than parallel computing, the thought of creating my own random number generator would never cross my mind. Calling a well tested random number function is trivial in most languages.
It is good practice to run your PRNG against the Diehard test suite. Very good and fast PRNGs exist nowadays (see the work of Marsaglia); see Numerical Recipes, 3rd edition, for a good introduction.
