Turning a random number generating loop into one equation?

Here's some pseudocode (written as runnable Python for concreteness):

import random

count = 0
for item in items:
    if random.random() < 1/20:  # 1/20 chance to add one to count
        count += 1
This is more or less my current code, but there could be hundreds of thousands of items in that list, so it gets inefficient fast. (Isn't this called O(n) or something?)
Is there a way to compress this into one equation?

Let's look at the properties of the random variable you've described. Quoting Wikipedia:
The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
Let N be the number of items in the list, and let C be the random variable representing the count you're obtaining from your pseudocode. Then C follows a binomial distribution with n = N trials and p = 1/20.
The remaining problem is how to efficiently sample a random variable with that probability distribution. There are a number of libraries that allow you to draw samples from random variables with a specified PDF. I've never had to implement it myself, so I don't know the details exactly, but many of these libraries are open source, so you can refer to the implementation yourself.
Here's how you would calculate count with the numpy library in Python:
import numpy as np

n, p = 10, 0.05  # 10 trials, probability of success is 0.05
count = np.random.binomial(n, p)  # draw a single sample

Apparently the OP was asking for a more efficient way to generate random numbers with the same distribution this loop produces. I thought the question was how to do the exact same operation as the loop, but as a one-liner (and preferably without a temporary list that exists just to be iterated over).
If you sample a random number generator n times, it's going to have at best O(n) run time, regardless of how the code looks.
In some interpreted languages, using more compact syntax might make a noticeable difference in the constant factors of run time. Other things can affect the run time, like whether you store all the random values and then process them, or process them on the fly with no temporary storage.
None of this will allow you to avoid having your run time scale up linearly with n.
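For what it's worth, the loop does collapse to a one-liner in Python without any temporary list (a generator expression is consumed lazily), though the run time is still O(n); the names here are illustrative:

import random

n, p = 100_000, 1/20  # e.g. 100,000 items, 1/20 chance each
count = sum(1 for _ in range(n) if random.random() < p)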

Related

What if the FD steps varied w.r.t. output/input?

I am using the finite difference scheme to find gradients.
Let's say I have 2 outputs (y1, y2) and 1 input (x) in a single component, and I know in advance that the sensitivity of y1 with respect to x is not the same as the sensitivity of y2 to x. Thus I could potentially have two different steps for those, as in:
self.declare_partials(of='y1', wrt='x', method='fd', step=0.01, form='central')
self.declare_partials(of='y2', wrt='x', method='fd', step=0.05, form='central')
There is nothing that stops me from doing this (algorithmically), but it is not clear what OpenMDAO's gradient calculation would actually do in this case.
Does it exchange information between the cases with different steps by looking at the step ratios, or does it simply treat them independently, thereby doubling the computational time?
I just tested this, and it does the finite difference twice with the two different step sizes, saving only the requested outputs for each step. I don't think we could do anything with the ratios as you suggested: the reason for using different step sizes to resolve individual outputs is that you don't trust the accuracy of the outputs at the smaller (or larger) step size.
This is a fair question about the effect of the API. In typical FD applications you would get only 1 function call per design variable for forward and backward difference and 2 function calls for central difference.
However in this case, you have asked for two different step sizes for two different outputs, both with central difference. So here, you'll end up with 4 function calls to compute all the derivatives. dy1_dx will be computed using the step size of .01 and dy2_dx will be computed with a step size of .05.
There is no crosstalk between the two different FD calls, and you do end up with more function calls than you would have if you just specified a single step size via:
self.declare_partials(of='*', wrt='x', method='fd', step=0.05, form='central')
If the cost is something you can bear, and you get improved accuracy, then you could use this method to get different step sizes for different outputs.
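To make the bookkeeping concrete, here is a plain-Python sketch, outside OpenMDAO, of what the two central-difference passes amount to; y1_func, y2_func, and x0 are hypothetical stand-ins for the component's outputs and input:

# hypothetical component outputs, purely for illustration
def y1_func(x):
    return x ** 2

def y2_func(x):
    return 10.0 * x ** 3

def central_diff(f, x, h):
    # central difference: two function evaluations per derivative
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 3.0
# one independent FD pass per requested step size: 4 evaluations in total
dy1_dx = central_diff(y1_func, x0, 0.01)  # step 0.01 for y1
dy2_dx = central_diff(y2_func, x0, 0.05)  # step 0.05 for y2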

How to get the seed of the current state of the random generator in order to place it in the set.seed() function

I have to repeat a statistical procedure based on pseudorandom numbers many times (about 100,000 iterations); the procedure is written in pure R. After each iteration I would like to get the current state of the random generator (getting a seed would be proper, I suppose). From each iteration I collect only part of the output, because the whole thing is too large to store (I keep the value of the optimized goal function and a few other statistics). After inspecting the total output (which is 100,000 entries long) I would like to pick the best solution and rerun the procedure that produced it; for this I need to set the random generator to the state corresponding to the chosen solution. There is set.seed, but getting the seed is not straightforward; there is .Random.seed, but how could it help with the above problem?
Call set.seed(x) at the beginning of each iteration. Make sure you record which seed was used for each iteration, so that you can reuse it later. For example:
for (seed in seeds) {
  set.seed(seed)
  cat(sprintf('using seed = %d\n', seed))
  do_your_stuff(...)
}
In a comment you asked:
how does one choose seeds in a proper manner - shouldn't they be some "random" prime numbers rather than a simple series of integers (if we're talking about a vector containing seeds)?
I'm not sure it matters whether seeds is simply a sequence (like 1:100) or a set of random prime numbers. As far as I know, any seed number X is just as good as any other Y. But if that's important to you, you can grab a list of prime numbers from somewhere (for example here) and use sample to randomize them, for example:
seeds <- sample(c(7, 17, 19, 23, 1019, 1021))

Generate random small numbers with a target average

I need to write a function that returns one of the numbers (-2, -1, 0, 1, 2) randomly, but I need the average of the output to be a specific number (say, 1.2).
I saw similar questions, but all the answers seem to rely on the target range being wide enough.
Is there a way to do this (without saving state) with this small selection of possible outputs?
UPDATE: I want to use this function for (randomized) testing, as a stub for an expensive function which I don't want to run. The consumer of this function runs it a couple of hundred times and takes an average. I've been using a simple randint function, but the average is always very close to 0, which is not realistic.
Point is, I just need something simple that won't always average to 0. I don't really care what the actual average is. I may have asked the question wrong.
Do you really mean to require that specific value to be the average, or rather the expected value? In other words, if the generated sequence happened to contain an extraordinary number of small values in its initial part, should the rest of the sequence attempt to compensate in order to get the overall average right? I assume not; I assume you want all your samples to be computed independently (after all, you said you don't want any state), in which case you can only control the expected value.
If you assign a probability p(i) to each of your possible choices i, then the expected value will be the sum of these values, weighted by their probabilities:
EV = (-2)Ā·p(-2) + (-1)Ā·p(-1) + 1Ā·p(1) + 2Ā·p(2) = 1.2
As additional constraints, you have to require that each of these probabilities is non-negative and that the four above add up to at most 1, with the remainder taken up by the fifth probability, p(0).
There are many possible assignments that satisfy these requirements, and any one of them will do what you asked for. Which of them are reasonable for your application depends on what that application does.
You can use a PRNG which generates variables uniformly distributed in the range [0,1), and then map these to the cases you described by taking the cumulative sums of the probabilities as cut points.
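Here is a minimal Python sketch of that mapping. The particular probability assignment below is just one of many that satisfy the constraints (non-negative, summing to 1, expected value 1.2); it is an illustrative choice, not the only one:

import random

values = [-2, -1, 0, 1, 2]
probs  = [0.02, 0.03, 0.12, 0.39, 0.44]  # non-negative, sum to 1, EV = 1.2

def sample():
    u = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    for v, p in zip(values, probs):
        cumulative += p
        if u < cumulative:  # cumulative sums act as cut points
            return v
    return values[-1]  # guard against floating-point round-off

# sanity check: the empirical average over many draws should be close to 1.2
n = 100_000
print(sum(sample() for _ in range(n)) / n)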

How to find number of items dropped based on individual probabilities?

My goal is to independently calculate the number of items an enemy would drop after it is killed. For example, say there are 50 potions, each with a 50% chance of being dropped; I'd like to randomly return a number from 0 to 50, based on independent trials.
Currently, this is the code I'm using:
int droppedItems(int n, float probability) {
    int count = 0;
    for (int x = 1; x <= n; ++x) {
        if (random() <= probability) {  // one Bernoulli trial per item
            ++count;
        }
    }
    return count;
}
Where probability is a number from 0.0 to 1.0, random() returns 0.0 to 1.0, and n is the maximum number of items to be dropped. This is C++ code; however, I'm actually using Visual Basic 6, so there are no libraries to help with this.
This code works flawlessly. However, I'd like to optimize this so that if n happens to be 999999, it doesn't take forever (which it currently does).
Use the binomial distribution. Wiki - Binomial Distribution
Ideally, use the libraries for whatever language this pseudocode will be written in. There's no sense in reinventing the wheel unless of course you are trying to learn how to invent a wheel.
Specifically, you'll want something that will let you generate random values given a binomial distribution with a probability of success in any given trial and a number of trials.
EDIT:
I went ahead and did this (in python, since that's where I live these days). It relies on the very nice numpy library (hooray, abstraction!):
>>> import numpy
>>> numpy.random.binomial(99999, 0.5)
49853
>>> numpy.random.binomial(99999, 0.5)
50077
And, using timeit.Timer to check execution time:
# timing it across 10,000 iterations for 99,999 items per iteration
>>> timeit.Timer(stmt="numpy.random.binomial(99999, 0.5)", setup="import numpy").timeit(10000)
0.00927[... seconds]
EDIT 2:
As it turns out, there isn't a simple way to implement a random number generator based on the binomial distribution.
There is an algorithm you can implement without library support that will generate random variables from the binomial distribution. You can view it here as a PDF.
My guess is that given what you want to use it for (having monsters drop loot in a game), implementing the algorithm is not worth your time. There's room for fudge factor here!
I would change your code like this (note: this is not a binomial distribution):

Use your current code for small values, say n up to 100.
For n greater than 100, calculate the value of count for 100 using your current algorithm and then multiply the result by n/100 (see the sketch below).
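Here is a minimal Python sketch of that scaling idea (the helper names are mine; note again that the result is an approximation, not a true binomial draw):

import random

def dropped_items(n, probability):
    # the original loop: one Bernoulli trial per item
    return sum(1 for _ in range(n) if random.random() <= probability)

def dropped_items_fast(n, probability):
    # fudge-factor approximation: simulate 100 trials and scale up
    if n <= 100:
        return dropped_items(n, probability)
    return dropped_items(100, probability) * n // 100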
Again, if you really want to figure out how to implement the BTPE algorithm yourself, you can; I think the method above wins the trade-off between implementation effort and getting "close enough".
As @IamChuckB pointed out already, the key phrase is binomial distribution. When the number of Bernoulli trials (the number of items in your example) is large and the success probability is small, a good approximation is the Poisson distribution with mean nĀ·p, which is much simpler to calculate and draw numbers from (an exact algorithm is spelled out in the linked Wikipedia article).
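For instance, with numpy (illustrative numbers; the Poisson approximation is reasonable when p is small and nĀ·p is moderate):

import numpy as np

n, p = 999999, 0.001
exact = np.random.binomial(n, p)    # exact binomial draw
approx = np.random.poisson(n * p)   # Poisson approximation with mean n*p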

Quantifying the non-randomness of a specialized random generator?

I just read this interesting question about a random number generator that never generates the same value three consecutive times. This clearly makes the random number generator different from a standard uniform random number generator, but I'm not sure how to quantitatively describe how this generator differs from a generator that didn't have this property.
Suppose that you handed me two random number generators, R and S, where R is a true random number generator and S is a true random number generator that has been modified to never produce the same value three consecutive times. If you didn't tell me which one was R or S, the only way I can think of to detect this would be to run the generators until one of them produced the same value three consecutive times.
My question is - is there a better algorithm for telling the two generators apart? Does the restriction of not producing the same number three times somehow affect the observable behavior of the generator in a way other than preventing three of the same value from coming up in a row?
As a consequence of Rice's Theorem, there is no way to tell which is which.
Proof: Let L be the output of the normal RNG. Let L' be L with every sequence that contains a run of three or more identical values removed. Some TMs recognize L', but some do not. Therefore, by Rice's theorem, determining whether a TM accepts L' is not decidable.
As others have noted, you may be able to make an assertion like "it has run for N steps without repeating a value three times in a row", but you can never make the leap to "it will never repeat a value three times in a row". More precisely, there exists at least one machine for which you can't determine whether or not it meets this criterion.
Caveat: if you had a truly random generator (e.g. nuclear decay), it is possible that Rice's theorem would not apply. My intuition is that the theorem still holds for these machines, but I've never heard it discussed.
EDIT: a secondary proof. Suppose P(X) determines with high probability whether or not X accepts L'. We can construct an (infinite number of) programs F like:
F(x): if x(F), then don't accept L'
      else, accept L'
P cannot determine the behavior of F(P). Moreover, say P correctly predicts the behavior of G. We can construct:
F'(x): if x(F'), then don't accept L'
       else, run G(x)
So for every good case, there must exist at least one bad case.
If S is defined by rejecting from R, then a sequence produced by S will be a subsequence of the sequence produced by R. For example, taking a simple random variable X with equal probability of being 1 or 0, you would have:
R = 0 1 1 0 0 0 1 0 1
S = 0 1 1 0 0 1 0 1
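For concreteness, here is a minimal Python sketch of that rejection construction for binary values (the function names are mine, purely for illustration):

import random

def gen_R(n):
    # unconstrained generator: independent fair bits
    return [random.randrange(2) for _ in range(n)]

def gen_S(n):
    # same generator, but reject any draw that would create a run of three
    out = []
    while len(out) < n:
        v = random.randrange(2)
        if len(out) >= 2 and out[-1] == out[-2] == v:
            continue
        out.append(v)
    return out

print(gen_S(20))  # never contains three identical values in a row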
The only real way to differentiate these two is to look for streaks. If you are generating binary numbers, then streaks are incredibly common (so much so that one can almost always differentiate between a random 100-digit sequence and one that a student writes down trying to look random). If the numbers are taken from [0, 1] uniformly, then streaks are far less common.
It's an easy exercise in probability to calculate the chance of three consecutive numbers being equal once you know the distribution, or even better, the expected number of numbers needed until the probability of three consecutive equal numbers is greater than p for your favourite choice of p.
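As a quick illustration: if the generator draws uniformly from k possible values, the chance that a given window of three consecutive draws is all-equal is (1/k)^2 (the second and third draws must each match the first), i.e. 1/4 for binary output but only 1/10,000 when drawing from 100 values, which is why streak-hunting takes far longer in the latter case.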
Since you defined that they differ only with respect to that specific property, there is no better algorithm to distinguish the two.
If you look at triples of random values, generator S will of course produce every other triple slightly more often than R, to compensate for the missing triples (X, X, X). But to get a statistically significant result from that, you'd need much more data than it would take to simply wait for the first occurrence of the same value three times in a row.
You could probably use ENT (http://fourmilab.ch/random/).
