Generate random numbers with distribution e^x in R

Is there any way to generate random numbers from the distribution e^x, with x defined over a certain range, in R?

Your question is unclear, so I'm going to assume that you'd like to treat e^x as a density. The function e^x on the range (0, 0.5) is not a density, because densities are required to have an area of 1. However, over any specified finite range it can be scaled by its area to turn it into a density. Integrating e^x over (0, 0.5) gives e^0.5 - 1 ≈ 0.6487212707001282, so e^x / 0.6487212707001282 is a valid density (to within roundoff error).
We then integrate the density from 0 to x, for 0 ≤ x ≤ 0.5, to derive the cumulative distribution function: F(x) = (e^x - 1) / 0.6487212707001282.
We can now use inverse transform sampling to generate values from this distribution. Set the CDF equal to U, a Uniform(0,1) random number, and solve for x:
x = ln(1 + 0.6487212707001282 * U)
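Here is a minimal R sketch of that inversion (the function name and the default (0, 0.5) range are just the assumptions above):
# Inverse transform sampling for the density e^x scaled to area 1 on (a, b).
sample_exp_density <- function(n, a = 0, b = 0.5) {
  area <- exp(b) - exp(a)    # normalizing constant; e^0.5 - 1 for (0, 0.5)
  u <- runif(n)              # Uniform(0,1) draws
  log(exp(a) + area * u)     # invert the CDF
}
x <- sample_exp_density(10000)
hist(x, freq = FALSE)        # histogram should follow the e^x shape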

Related

How to sample from an upside-down bell curve

I can generate numbers with uniform distribution by using the code below:
runif(1,min=10,max=20)
How can I sample randomly generated numbers that fall more frequently closer to the minimum and maximum boundaries? (Aka an "upside-down bell curve")
Well, a bell curve is usually Gaussian, meaning it doesn't have a min and a max. You could try the Beta distribution and map it to the desired interval, along the lines of:
min <- 1
max <- 20
# Beta(0.5, 0.5) piles mass near 0 and 1; scale and shift it to [min, max]
q <- min + (max - min) * rbeta(10000, 0.5, 0.5)
As @Gregor-reinstateMonica noted, the Beta distribution is bounded on both ends, [0, 1], so it can easily be mapped into any bounded interval just by scaling and shifting. It has two parameters, and it is symmetric if those parameters are equal. Parameters above 1 make it a kind of bell distribution, but parameters below 1 turn it into an inverse bell, which is what you're looking for. You could play with them: put in different values instead of 0.5 and see how it goes. Parameters equal to 1 make it uniform.
Sampling from a beta distribution is a good idea. Another way is to sample a number of uniform numbers and then take the minimum or maximum of them.
According to the theory of order statistics, the cumulative distribution function for the maximum is F(x)^n where F is the cdf from which the sample is taken and n is the number of samples, and the cdf for the minimum is 1 - (1 - F(x))^n. For a uniform distribution, the cdf is a straight line from 0 to 1, i.e., F(x) = x, and therefore the cdf of the maximum is x^n and the cdf of the minimum is 1 - (1 - x)^n. As n increases, these become more and more curved, with most of the mass close to the ends.
A web search for "order statistics" will turn up some resources.
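For instance, here is a small R sketch of that idea (flipping a coin between the min and the max is my addition, so that both boundaries get mass):
n_samples <- 10000
n <- 4                           # uniforms per draw; larger n = more curvature
draws <- matrix(runif(n_samples * n, min = 10, max = 20), ncol = n)
use_max <- runif(n_samples) < 0.5
x <- ifelse(use_max, apply(draws, 1, max), apply(draws, 1, min))
hist(x)                          # mass piles up near 10 and 20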
If you don't care about decimal places, a hacky way would be to generate a large sample of normally distributed data points using rnorm(), count the number of times each rounded value appears (n), and then subtract n from the maximum count (max(n)) to get inverse counts.
You can then use the inverse count to make a new vector (that you can sample from), i.e.:
library(tidyverse)
x <- rnorm(100000, 100, 15)
x_tib <- round(x) %>%
  tibble(x = .) %>%
  count(x) %>%
  mutate(new_n = max(n) - n)
new_x <- rep(x_tib$x, x_tib$new_n)
qplot(new_x, binwidth = 1)
An "upside-down bell curve" compared to the normal distribution can be sampled using the following algorithm. I write it in pseudocode because I'm not familiar with R. Notice that this sampler samples in a truncated interval (here, the interval [x0, x1]) because it's not possible for an upside-down bell curve extended to infinity to integrate to 1 (which is one of the requirements for a probability density).
In the pseudocode, RNDU01() is a uniform(0, 1) random number.
x0pdf = 1 - exp(-(x0*x0))
x1pdf = 1 - exp(-(x1*x1))
ymax = max(x0pdf, x1pdf)
while true
    # Choose a random x-coordinate
    x = RNDU01() * (x1 - x0) + x0
    # Choose a random y-coordinate
    y = RNDU01() * ymax
    # Return x if y falls within the PDF
    if y < 1 - exp(-(x*x)): return x
end
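Since the question is about R, here is a vectorized translation of that rejection sampler (a sketch; the interval endpoints below are arbitrary example values):
# Rejection sampler for the unnormalized density 1 - exp(-x^2) on [x0, x1].
sample_inverse_bell <- function(n, x0 = -3, x1 = 3) {
  pdf <- function(x) 1 - exp(-x^2)
  ymax <- max(pdf(x0), pdf(x1))   # the curve peaks at the interval endpoints
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, x0, x1)         # candidate x-coordinates
    y <- runif(n, 0, ymax)        # candidate y-coordinates
    out <- c(out, x[y < pdf(x)])  # keep candidates that fall under the curve
  }
  out[seq_len(n)]
}
hist(sample_inverse_bell(10000), breaks = 50)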

Points uniformly distributed on unit disk (2D)

I am trying to generate 10,000 points from the uniform distribution on the unit disk and plot these points.
The method I am using has three steps. The first step is generating the magnitude x of the point; x has CDF F(x) = x^2, with min(x) = 0 and max(x) = 1. The second step involves generating a 2-dimensional vector (which I will call y) from the multivariate normal distribution with mu being the zero vector and sigma being the 2x2 identity matrix, i.e., MVN(0, I). Last, I normalize the vector y to have length x. I have tried to code the solution in R but I do not think my answer is correct. I would really appreciate it if I could be pointed in the right direction.
u = runif(10000)
x = u^2
y = mvrnorm(10000, mu=rep(0,2), Sigma=diag(2))
y_norm = (x*y)/sqrt(sum(y^2))
plot(y_norm, asp = 1)
I used the MASS package for mvrnorm. Also I have included the plot that I ended up with:
You need to compute the length of each of the rows in your y matrix; you are currently taking the square root of the sum of all the numbers in y, which just scales your multivariate normal by a constant. Also, you need x to be sqrt(u) rather than u^2. This code normalizes each row by its length, uses sqrt(u) scaling, and looks nice and uniform:
plot(sqrt(u)*y/sqrt(y[,1]^2+y[,2]^2))
There are better ways of making uniform points on a disc, unless this is just an exercise to do it this way...
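One standard direct construction, sketched here as an example of those "better ways": sample a uniform angle and take the radius as the square root of a uniform.
n <- 10000
r <- sqrt(runif(n))            # sqrt corrects for area growing like r^2
theta <- runif(n, 0, 2 * pi)   # uniform angle
plot(r * cos(theta), r * sin(theta), asp = 1, pch = ".")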

How to sample N points between 0 and R if they are exponentially distributed?

The density of my points x ∈ [0,R] is exponential: ρ(x)~e^x
How can I sample N points from there?
Taking your request at face value: if you want a density function that grows exponentially for x ∈ [0, R], the cumulative distribution function turns out to be (exp(x) - 1) / (exp(R) - 1). To generate from this via inversion, set the CDF equal to a Uniform(0,1) and solve for x. The inversion turns out to be:
ln(1 + (exp(R) - 1) * U)
where U represents a call to the Uniform(0,1) PRNG.
If what you actually want is a truncated form of what most probability folks know as the exponential distribution, we need to determine an upper bound for the random number corresponding to your truncation point R. In that case, the inversion is:
-ln(1 - [1 - exp(-lambda * R)] * U) / lambda
As before, U represents a call to the Uniform(0,1) PRNG. This will generate exponentials at rate lambda, truncated at a max of R.
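A short R sketch of both inversions (R_max and the rate lambda = 1 below are example values):
R_max <- 2
# (1) Density proportional to e^x on [0, R_max]:
rexp_growing <- function(n, R) log(1 + (exp(R) - 1) * runif(n))
# (2) Ordinary exponential(lambda) truncated at R_max:
rexp_truncated <- function(n, R, lambda) {
  -log(1 - (1 - exp(-lambda * R)) * runif(n)) / lambda
}
hist(rexp_growing(10000, R_max), breaks = 50)       # mass piles up near R_max
hist(rexp_truncated(10000, R_max, 1), breaks = 50)  # mass piles up near 0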
Use inverse sampling: generate uniformly distributed values and map them through the inverse of the CDF of your distribution.

Combining two normal random variables

Suppose I have the following 2 random variables:
X where mean = 6 and stdev = 3.5
Y where mean = -42 and stdev = 5
I would like to create a new random variable Z based on the first two, knowing that X happens 90% of the time and Y happens 10% of the time.
It is easy to calculate the mean for Z: 0.9 * 6 + 0.1 * (-42) = 1.2
But is it possible to generate random values for Z in a single function?
Of course, I could do something along those lines :
if (randIntBetween(1, 10) > 1)
    GenerateRandomNormalValue(6, 3.5);
else
    GenerateRandomNormalValue(-42, 5);
But I would really like to have a single function that would act as a probability density function for such a random variable (Z), which is not necessarily normal.
sorry for the crappy pseudo-code
Thanks for your help!
Edit : here would be one concrete interrogation :
Let's say we add the results of 5 consecutive values from Z. What would be the probability of ending up with a number higher than 10?
"But I would really like to have a single function that would act as a probability density function for such a random variable (Z) that is not necessarily normal."
Okay, if you want the density, here it is:
rho = 0.9 * density_of_x + 0.1 * density_of_y
But you cannot sample from this density unless you 1) compute its CDF (cumbersome, but not infeasible) and 2) invert it (you will need a numerical solver for this). Or you can do rejection sampling (or variants, e.g. importance sampling). This is costly, and cumbersome to get right.
So you should go for the "if" statement (i.e. call the generator 3 times), unless you have a very strong reason not to (using quasi-random sequences, for instance).
If a random variable is denoted x = (mean, stdev), then the following algebra applies
number * x = ( number*mean, number*stdev )
x1 + x2 = ( mean1+mean2, sqrt(stdev1^2+stdev2^2) )
so for the case of X = (mx,sx), Y= (my,sy) the linear combination is
Z = w1*X + w2*Y = (w1*mx,w1*sx) + (w2*my,w2*sy) =
( w1*mx+w2*my, sqrt( (w1*sx)^2+(w2*sy)^2 ) ) =
( 1.2, 3.19 )
Link: Normal Distribution; look for the Miscellaneous section, item 1.
PS. Sorry for the weird notation. The new standard deviation is calculated by something similar to the Pythagorean theorem: it is the square root of the sum of squares.
This is the form of the distribution:
ListPlot[BinCounts[Table[If[RandomReal[] < .9,
    RandomReal[NormalDistribution[6, 3.5]],
    RandomReal[NormalDistribution[-42, 5]]], {1000000}], {-60, 20, .1}],
  PlotRange -> Full, DataRange -> {-60, 20}]
It is NOT Normal, as you are not adding Normal variables, but just choosing one or the other with certain probability.
Edit
This is the curve for adding five vars with this distribution:
The upper and lower peaks represent taking one of the distributions alone, and the middle peak accounts for the mixing.
The most straightforward and generically applicable solution is to simulate the problem:
Run the piecewise function you have 1,000,000 times (just a high number), generate a histogram of the results by splitting them into bins, and divide the count for each bin by N (1,000,000 in my example). This will leave you with an approximation of the PDF of Z at every given bin.
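A minimal R sketch of that simulation (the sampler rZ below is just the question's "if" logic, vectorized):
# Draw from X = N(6, 3.5) 90% of the time and Y = N(-42, 5) 10% of the time.
rZ <- function(n) {
  from_x <- runif(n) < 0.9
  ifelse(from_x, rnorm(n, 6, 3.5), rnorm(n, -42, 5))
}
z <- rZ(1000000)
hist(z, breaks = 200, freq = FALSE)  # approximates the PDF of Z
The same simulation answers the edit's question about 5 consecutive values:
sums <- colSums(matrix(rZ(5 * 100000), nrow = 5))
mean(sums > 10)  # estimated P(sum of 5 draws from Z > 10)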
Lots of unknowns here, but essentially you just wish to add the two (or more) probability functions to one another.
For any given probability function you could calculate a random number with that density by calculating the area under the probability curve (the integral) and then generating a random number between 0 and that area. Then move along the curve until the area is equal to your random number and use that as your value.
This process can then be generalized to any function (or sum of two or more functions).
Elaboration:
If you have a distribution function f(x) defined on the range 0 to 1, you can calculate a random number based on the distribution by computing the integral of f(x) from 0 to 1, giving you the area under the curve; let's call it A.
Now, generate a random number between 0 and A; call that number r. You then need to find a value t such that the integral of f(x) from 0 to t equals r. That t is your random number.
This process can be used for any probability density function f(x). Including the sum of two (or more) probability density functions.
I'm not sure what your functions look like, so I'm not sure whether you can calculate analytic solutions for all this, but in the worst case you could use numeric techniques to approximate it.
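A numeric sketch of that worst case (the function and parameter names are mine): accumulate the area on a grid, then look up where the running area crosses r.
# Numeric inversion of an arbitrary (possibly unnormalized) density f.
sample_by_numeric_inversion <- function(n, f, lower, upper, grid_n = 10000) {
  xs <- seq(lower, upper, length.out = grid_n)
  cdf <- cumsum(f(xs))          # running (unnormalized) area under the curve
  cdf <- cdf / cdf[grid_n]      # normalize so the total area is 1
  r <- runif(n)
  xs[findInterval(r, cdf) + 1]  # move along the curve until the area equals r
}
# Example: the 90/10 mixture density from this thread.
f <- function(x) 0.9 * dnorm(x, 6, 3.5) + 0.1 * dnorm(x, -42, 5)
hist(sample_by_numeric_inversion(10000, f, -60, 20), breaks = 100)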

Possible infinite loop on math equation?

I have the following problem, and am having trouble understanding part of the equation:
Monte Carlo methods to estimate an integral basically take a lot of random samples and determine a weighted average. For example, the integral of f(x) can be estimated from N independent random samples x_r by
int_{x1}^{x2} f(x) dx ≈ ((x2 - x1) / N) * sum over the N samples of f(x_r)   [equation image: http://www.goftam.com/images/area.gif]
for a uniform probability distribution of x_r in the range [x1, x2]. Since each function evaluation f(x_r) is independent, it is easy to distribute this work over a set of processes.
What I don't understand is what f(x_r) is supposed to do. Does it feed back into the same equation? Wouldn't that be an infinite loop?
It should say f(x_i).
f() is the function we are trying to integrate via the numerical Monte Carlo method, which estimates an integral (and its error) by evaluating the integrand at randomly chosen points from the integration region.
Your goal is to compute the integral of f from x1 to x2. For example, you may wish to compute the integral of sin(x) from 0 to pi.
Using Monte Carlo integration, you can approximate this by sampling random points in the interval [x1, x2] and evaluating f at those points. Perhaps you'd like to call this MonteCarloIntegrate(f, x1, x2).
So no, MonteCarloIntegrate does not "feed back" into itself. It calls a function f, the function you are trying to numerically integrate, e.g. sin.
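A minimal R sketch of such a MonteCarloIntegrate (the name comes from this answer; the body is just the width-times-average-height estimate described below):
# Estimate the integral of f over [x1, x2] from n random sample points.
MonteCarloIntegrate <- function(f, x1, x2, n = 100000) {
  x <- runif(n, x1, x2)    # random points in [x1, x2]
  (x2 - x1) * mean(f(x))   # width times average height
}
MonteCarloIntegrate(sin, 0, pi)  # should be close to the exact value, 2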
Replace f(x_r) by f(x_r_i) (read: f evaluated at x sub r sub i). The x_r_i are chosen uniformly at random from the interval [x_1, x_2].
The point is this: the area under f on [x_1, x_2] is equal to (x_2 - x_1) times the average of f on the interval [x_1, x_2]. That is
A = (x_2 - x_1) * [(1 / (x_2 - x_1)) * int_{x_1}^{x_2} f(x) dx]
The portion in square brackets is the average of f on [x_1, x_2] which we will denote avg(f). How can we estimate the average of f? By sampling it at N random points and taking the average value of f evaluated at those random points. To wit:
avg(f) ~ (1 / N) * sum_{i=1}^{N} f(x_r_i)
where x_r_1, x_r_2, ..., x_r_N are points chosen uniformly at random from [x_1, x_2].
Then
A = (x_2 - x_1) * avg(f) ~ (x_2 - x_1) * (1 / N) * sum_{i=1}^{N} f(x_r_i).
Here is another way to think about this equation: the area under f on the interval [x_1, x_2] is the same as the area of a rectangle with length (x_2 - x_1) and height equal to the average height of f. The average height of f is approximately
(1 / N) * sum_{i=1}^{N} f(x_r_i)
which is the value that we produced previously.
Whether it's x_i or x_r is irrelevant; it's the random number that we're feeding into the function f().
I'm more likely to write the function (aside from formatting) as follows:
(x2-x1) * sum(f(xi))/N
That way, we can see that we're taking the average of N samples of f(x) to get an average height of the function, then multiplying by the width (x2-x1).
Because, after all, integration is just calculating the area under the curve. (Nice pictures at http://hyperphysics.phy-astr.gsu.edu/Hbase/integ.html#c4.)
x_r is a random value from the integral's range.
Substituting Random(x_1, x_2) for x_r would give an equivalent equation.
