Normal distribution curve and random numbers in Lua - math

So say I have a range of numbers 1-1000. math.random(1,1000) would give me an equal chance of getting each number. Instead of this, I want to make a distribution curve so that the chance of getting 1 equals the chance of getting 1000 but the chance of getting 500 for example is much more common. How would I go about making this?

function norm1000()
  local x
  repeat
    -- Box-Muller-style transform: an approximately normal value centred
    -- on 500, rounded up; values outside 1..1000 are rejected and redrawn.
    x = math.ceil(math.log(1/math.random())^.5 * math.cos(math.pi*math.random()) * 150 + 500)
  until x >= 1 and x <= 1000
  return x
end
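For comparison, here is a minimal R sketch of the same idea (my own, not part of the Lua answer above): draw from a normal centred on 500 and reject anything outside 1..1000.

norm1000 <- function() {
  repeat {
    x <- round(rnorm(1, mean = 500, sd = 150))
    if (x >= 1 && x <= 1000) return(x)
  }
}
hist(replicate(1e4, norm1000()), breaks = 50)   # bell-shaped, peaked near 500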

Related

Decimal precision problems with runif

I'm running into issues when simulating low-probability events with runif in R, and I'm wondering how to solve this.
Consider the following example of an experiment where we simulate TRUE values with probability 5e-10 in a sample of size 1e9 and check whether any of these samples got the value TRUE. The experiment is repeated 10 times:
set.seed(123)
probability <- 0.0000000005
n_samples <- 1000000000
n_tries <- 10
for (i in 1:n_tries) {
  print(any(runif(n = n_samples, min = 0, max = 1) < probability))
}
The code above runs relatively fast, and nearly half of the experiment replicates return TRUE, as expected.
However, as soon as the probability becomes 5e-11 (probability <- 0.00000000005), that expectation fails and no TRUE values are returned, even if the number of replicates is increased (I used n_tries <- 100 twice with no luck; the whole process took 1 h to run).
This means runif is not returning values with as much precision as 11 decimal places. This was unexpected, as R, to my understanding, works with up to about 16 decimal digits of precision, and we may need to simulate processes with probabilities that small (around 15 decimal places).
Is this why runif fails to provide the expected output? Are there any alternatives/solutions to this problem?
Thank you
EDIT: I have run a test to check whether this problem could be related to boundary bias (a reduced density of probability near the extreme values 0 and 1). To do so, a constant (e.g. k <- 0.5) is added to the result of runif and compared against the value of probability plus that same constant. However, that does not seem to fix the issue.
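One likely explanation and a workaround, sketched here rather than taken from the original thread: R's ?Random help page notes that most of the supplied uniform generators take at most 2^32 distinct values, so runif cannot meaningfully resolve thresholds far below 2^-32 ≈ 2.3e-10. Simulating the count of rare events directly, for example with rbinom, sidesteps comparing a uniform draw against such a tiny probability:

set.seed(123)
probability <- 5e-11
n_samples <- 1e9
n_tries <- 10

# Number of TRUE events in each replicate, drawn directly from the
# binomial distribution instead of thresholding 1e9 uniform draws.
events <- rbinom(n_tries, size = n_samples, prob = probability)
print(events > 0)   # TRUE whenever at least one rare event occurred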

Generate N random integers that are sampled from a uniform distribution and sum to M in R [duplicate]

In some code I want to choose n random numbers in [0,1) which sum to 1.
I do so by choosing the numbers independently in [0,1) and normalizing them by dividing each one by the total sum:
numbers = [random() for i in range(n)]
numbers = [n/sum(numbers) for n in numbers]
My "problem" is, that the distribution I get out is quite skew. Choosing a million numbers not a single one gets over 1/2. By some effort I've calculated the pdf, and it's not nice.
Here is the weird looking pdf I get for 5 variables:
Do you have an idea for a nice algorithm to choose the numbers, that result in a more uniform or simple distribution?
You are looking to partition the distance from 0 to 1.
Choose n - 1 numbers from 0 to 1, sort them, and take the distances between consecutive values (including the endpoints 0 and 1).
This partitions the space from 0 to 1, which should yield the occasional large value that you aren't getting.
Even so, for large values of n you can generally expect your maximum value to decrease as well, just not as quickly as with your method.
You might be interested in the Dirichlet distribution, which is used to generate quantities that sum to 1 if you're looking for probabilities. There's also a section on how to generate them using gamma distributions here.
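A minimal R sketch of this partition approach (my own; it is equivalent to drawing from a symmetric Dirichlet(1, ..., 1)):

# Partition [0, 1] with n - 1 sorted uniform cut points; the n gaps
# between 0, the cuts, and 1 are nonnegative and sum to exactly 1.
partition_unit_interval <- function(n) {
  cuts <- sort(runif(n - 1))
  diff(c(0, cuts, 1))
}

x <- partition_unit_interval(5)
x
sum(x)   # 1, up to floating-point rounding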
Another way to get n random numbers which sum up to 1:
import random

def create_norm_arr(n, remaining=1.0):
    random_numbers = []
    for _ in range(n - 1):
        r = random.random()  # get a random number in [0, 1)
        r = r * remaining
        remaining -= r
        random_numbers.append(r)
    random_numbers.append(remaining)
    return random_numbers

random_numbers = create_norm_arr(5)
print(random_numbers)
print(sum(random_numbers))
This makes large values more likely (the earlier entries tend to be larger than the later ones).

Estimate the chance of rolling at least one 1 in n rolls of a fair six-sided die

Similar to the de Méré problem.
I want to write a Monte Carlo simulation to estimate the probability of rolling at least one 1 in n rolls of a fair six-sided die, using m simulation runs.
My code:
m<-5000
n<-3
x<-replicate(m, sample(1:6,n,TRUE)==1)
p<-sum(x)/m
p is the estimated probability. Here I get the value 0.4822.
My questions:
1) Is there another way to do this without using sum?
2) I suspect the code is wrong, as the probability seems too high.
Although the question as stated is a little unclear, the code suggests you want to estimate the chance of obtaining at least one outcome of "1" among n independent dice and that you aim to estimate this by simulating the experiment m times.
Program simulations from the inside out. Begin with a single iteration. You started well, but to be perfectly clear let's redo it using a highly suggestive syntax. Try this:
1 %in% sample(1:6,n,TRUE)
This uses sample to realize the results of n independent fair dice and checks whether the outcome 1 appears among any of them.
Once you are satisfied that this emulates your experiment (run it a bunch of times), then indeed replicate will perform the simulation:
x <- replicate(m, 1 %in% sample(1:6,n,TRUE))
That produces m results. Each will be TRUE (interpreted as 1) in an iteration where a 1 appeared and FALSE (interpreted as 0) otherwise. Consequently, the proportion of iterations in which a 1 appeared can be obtained as
mean(x)
This empirical frequency is a good estimate of the theoretical probability.
As a check, note that 1 will not appear on a single die with a probability of 1-1/6 = 5/6 and therefore--because the n dice are independent--will not appear on any of them with a probability of (5/6)^n. Consequently the chance a 1 will appear must be 1 - (5/6)^n. Let us output those two values: the simulation mean and theoretical result. We might also include a Z score, which is a measure of how far away from the theoretical result the mean is. Typically, Z scores between -2 and 2 aren't significant evidence of any discrepancy.
Here's the full code. Although there are faster ways to write it, this is very fast already and is about as clear as one could make it.
m <- 5000 # Number of simulation iterations
n <- 3 # Number of dice per iteration
set.seed(17) # For reproducible results
x <- replicate(m, 1 %in% sample(1:6,n,TRUE))
# Compare to a theoretical result.
theory <- 1-(5/6)^n
avg <- mean(x)
Z <- (avg - theory) / sd(x) * sqrt(length(x))
c(Mean=signif(avg, 5), Theoretical=signif(theory, 5), Z.score=signif(Z, 3))
The output is
Mean Theoretical Z.score
0.4132 0.4213 -1.1600
Notice that neither result is anywhere near n/6, which would be 1/2 = 0.500.
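As an aside on the "faster ways" alluded to above, here is one vectorized variant (my sketch, not part of the original answer) that draws all the dice in a single call:

# Draw all m*n dice at once, arrange them as m rows of n dice each,
# and check each row for the presence of a 1.
m <- 5000
n <- 3
set.seed(17)
rolls <- matrix(sample(1:6, m * n, replace = TRUE), nrow = m)
mean(rowSums(rolls == 1) > 0)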

How do I get started with this?

So I have been stuck on this problem for a long time.
I was thinking I should first create the two functions, like this:
n = runif(10000)
sum = 0
estimator1_fun = function(n){
  for(i in 1:10000){
    sum = sum + ((n/i) * runif(1))
  }
  return(sum)
}
and do the same for the other function, then use the MSE formula? Am I even approaching this correctly? I tried formatting it, but found that using an image would be better.
Assuming U(0,Theta_0) is the uniform distribution from 0 to Theta_0, and that Theta_0 is a fixed constant, I would proceed as follows:
1. Define Theta_0. Give it a fixed value.
2. Write the function that gives a random sample from that distribution.
- In R, that is runif(N, 0, Theta_0).
- Its arguments could be Theta_0 and N.
3. Sample it a few thousand (or however many) times into a vector X.
4. Calculate the two estimates.
5. Repeat steps 3 and 4 for more samples.
6. Plot the two estimates against the number of samples and see whether they approach Theta_0.
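A minimal R sketch of these steps. The two estimators used here, 2 * mean(X) and max(X), are placeholders of my own choosing, since the actual estimators are only given in the image that didn't make it into the question; substitute whichever ones your problem specifies.

theta0 <- 10                        # Step 1: fix Theta_0
sample_sizes <- seq(100, 10000, by = 100)

est1 <- function(x) 2 * mean(x)     # placeholder estimator 1
est2 <- function(x) max(x)          # placeholder estimator 2

results <- sapply(sample_sizes, function(N) {
  X <- runif(N, 0, theta0)          # Steps 2-3: sample U(0, Theta_0)
  c(est1 = est1(X), est2 = est2(X)) # Step 4: compute both estimates
})

# Step 6: plot both estimates against the number of samples
matplot(sample_sizes, t(results), type = "l", lty = 1,
        xlab = "Number of samples", ylab = "Estimate")
abline(h = theta0, lty = 2)         # the target value Theta_0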

Combining two normal random variables

Suppose I have the following 2 random variables:
X, where mean = 6 and stdev = 3.5
Y, where mean = -42 and stdev = 5
I would like to create a new random variable Z based on the first two, knowing that X happens 90% of the time and Y happens 10% of the time.
It is easy to calculate the mean for Z: 0.9 * 6 + 0.1 * (-42) = 1.2
But is it possible to generate random values for Z in a single function?
Of course, I could do something along these lines:
if (randIntBetween(1,10) > 1)
    GenerateRandomNormalValue(6, 3.5);
else
    GenerateRandomNormalValue(-42, 5);
But I would really like to have a single function that would act as a probability density function for such a random variable (Z) that is not necessarily normal.
Sorry for the crappy pseudo-code.
Thanks for your help!
Edit: here is one concrete question: let's say we add up the results of 5 consecutive values from Z. What would be the probability of ending up with a number higher than 10?
"But I would really like to have a single function that would act as a probability density function for such a random variable (Z) that is not necessarily normal."
Okay, if you want the density, here it is:
rho = 0.9 * density_of_x + 0.1 * density_of_y
But you cannot sample from this density unless you 1) compute its CDF (cumbersome, but not infeasible) and 2) invert it (you will need a numerical solver for this). Or you can do rejection sampling (or variants, e.g. importance sampling). This is costly and cumbersome to get right.
So you should go for the "if" statement (i.e. call the generator 3 times), unless you have a very strong reason not to (using quasi-random sequences, for instance).
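A minimal R sketch of the recommended "if" approach (my own, not from this answer), drawing the whole mixture in one vectorized call:

# Sample Z by first choosing the component (90% X, 10% Y),
# then drawing from the corresponding normal distribution.
rmix <- function(n) {
  pick <- runif(n) < 0.9
  ifelse(pick, rnorm(n, mean = 6, sd = 3.5), rnorm(n, mean = -42, sd = 5))
}

z <- rmix(1e5)
mean(z)   # should be close to 0.9*6 + 0.1*(-42) = 1.2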
If a random variable is denoted x=(mean,stdev) then the following algebra applies
number * x = ( number*mean, number*stdev )
x1 + x2 = ( mean1+mean2, sqrt(stdev1^2+stdev2^2) )
so for the case of X = (mx,sx), Y= (my,sy) the linear combination is
Z = w1*X + w2*Y = (w1*mx,w1*sx) + (w2*my,w2*sy) =
( w1*mx+w2*my, sqrt( (w1*sx)^2+(w2*sy)^2 ) ) =
( 1.2, 3.19 )
Link: Normal Distribution; look for the Miscellaneous section, item 1.
PS. Sorry for the weird notation. The new standard deviation is calculated with something similar to the Pythagorean theorem: it is the square root of the sum of squares.
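A quick numerical check of that arithmetic, under this answer's weighted-sum reading of Z:

w1 <- 0.9; w2 <- 0.1
c(mean  = w1 * 6 + w2 * (-42),              # 1.2
  stdev = sqrt((w1 * 3.5)^2 + (w2 * 5)^2))  # about 3.19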
This is the form of the distribution:
ListPlot[BinCounts[Table[If[RandomReal[] < .9,
RandomReal[NormalDistribution[6, 3.5]],
RandomReal[NormalDistribution[-42, 5]]], {1000000}], {-60, 20, .1}],
PlotRange -> Full, DataRange -> {-60, 20}]
It is NOT Normal, as you are not adding Normal variables but just choosing one or the other with a certain probability.
Edit
This is the curve for adding five vars with this distribution:
The upper and lower peaks represent taking one of the distributions alone, and the middle peak accounts for the mixing.
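To put a number on the asker's follow-up (the probability that five summed values of Z exceed 10), here is a quick Monte Carlo sketch of my own in R, under the 90/10 mixture above:

set.seed(1)
n_reps <- 1e6
draw_z <- function(n) {
  pick <- runif(n) < 0.9
  ifelse(pick, rnorm(n, 6, 3.5), rnorm(n, -42, 5))
}
# Each row holds five independent draws of Z; sum them and count
# how often the total exceeds 10.
sums <- rowSums(matrix(draw_z(5 * n_reps), ncol = 5))
mean(sums > 10)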
The most straightforward and most generally applicable solution is to simulate the problem:
Run the piecewise function you have 1,000,000 times (just a high number), build a histogram of the results by splitting them into bins, and divide the count in each bin by your N (1,000,000 in my example). This will leave you with an approximation of the PDF of Z at each bin.
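A minimal R sketch of this simulation approach (my own; hist with freq = FALSE handles the bin-count normalisation described above):

n <- 1e6
pick <- runif(n) < 0.9
z <- ifelse(pick, rnorm(n, 6, 3.5), rnorm(n, -42, 5))
# freq = FALSE rescales the bin counts so the bars integrate to 1,
# giving an empirical approximation of the PDF of Z.
hist(z, breaks = 200, freq = FALSE, main = "Approximate PDF of Z", xlab = "z")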
Lots of unknowns here, but essentially you just wish to add the two (or more) probability density functions together.
For any given probability density function you could generate a random number with that density by calculating the area under the probability curve (the integral) and then generating a random number between 0 and that area. Then move along the curve until the accumulated area equals your random number, and use that point as your value.
This process can then be generalized to any function (or sum of two or more functions).
Elaboration:
Suppose you have a distribution function f(x) defined on the range 0 to 1. You can generate a random number based on that distribution by calculating the integral of f(x) from 0 to 1, giving you the area under the curve; let's call it A.
Now generate a random number between 0 and A; let's call that number r. You then need to find a value t such that the integral of f(x) from 0 to t is equal to r. That t is your random number.
This process can be used for any probability density function f(x). Including the sum of two (or more) probability density functions.
I'm not sure what your functions look like, so I'm not sure whether you can find analytic solutions for all of this, but in the worst case you could use numerical techniques to approximate the result.
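A rough numerical R sketch of this inverse-CDF idea, applied to the 90/10 mixture density from the question (a grid approximation of my own, not a polished implementation):

dens <- function(x) 0.9 * dnorm(x, 6, 3.5) + 0.1 * dnorm(x, -42, 5)

grid <- seq(-70, 30, length.out = 10000)
step <- diff(grid)[1]
cdf  <- cumsum(dens(grid)) * step   # crude numerical CDF on the grid
cdf  <- cdf / max(cdf)              # normalise so it reaches exactly 1

r <- runif(1)                       # uniform draw in [0, 1]
t <- grid[which.max(cdf >= r)]      # first grid point whose CDF >= r
t                                   # one (approximate) draw from Z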
