I am asked to implement an algorithm to simulate from a Poisson(lambda) distribution using simulation from an exponential distribution.
I was given the following density:
P(X = k) = P(X1 + · · · + Xk ≤ 1 < X1 + · · · + Xk+1), for k = 1, 2, . . . .
P(X = k) is the Poisson probability with rate lambda, and the Xi are exponentially distributed.
I wrote code to simulate the exponential distribution, but have no idea how to simulate a Poisson. Could anybody help me with this? Thanks a million.
My code:
n <- c(1:k)
u <- runif(k)
x <- -log(1 - u) / lambda
I'm working on the assumption you (or your instructor) want to do this from first principles rather than just calling the builtin Poisson generator. The algorithm is pretty straightforward. You count how many exponentials you can generate with the specified rate until their sum exceeds 1.
My R is rusty and this sounds like homework anyway, so I'll express it as pseudo-code:
count <- 0
sum <- 0
repeat {
    generate x ~ exp(lambda)
    sum <- sum + x
    if sum > 1
        break
    else
        count <- count + 1
}
The value of count after you break from the loop is your Poisson outcome for this trial. If you wrap this as a function, return count rather than breaking from the loop.
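A minimal R sketch of the pseudo-code above, wrapped as a function. The name rpois_exp is hypothetical, and rexp() is used for the exponential draws purely for brevity; the asker's own -log(1-u)/lambda inversion could be substituted.

```r
# Hypothetical helper: one Poisson(lambda) draw, obtained by counting how
# many exponential inter-arrival times fit inside the unit interval.
rpois_exp <- function(lambda) {
  count <- 0
  total <- 0
  repeat {
    total <- total + rexp(1, rate = lambda)  # next inter-arrival time
    if (total > 1) return(count)             # unit interval exceeded
    count <- count + 1
  }
}

set.seed(1)
draws <- replicate(10000, rpois_exp(4))
mean(draws)  # should be close to lambda = 4
var(draws)   # for a Poisson, the variance should also be close to lambda
```

Returning from inside the loop replaces the break, as suggested in the answer.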
You can improve this computationally in a couple of ways. The first is to notice that the 1-U term for generating the exponentials has a uniform distribution, and can be replaced by just U. The more significant improvement is obtained by writing the evaluation as maximize i s.t. SUM(-log(Ui) / rate) <= 1, so SUM(log(Ui)) >= -rate.
Now exponentiate both sides and simplify to get
PRODUCT(Ui) >= Exp(-rate).
The right-hand side of this is constant, and can be pre-calculated, reducing the amount of work from k+1 log evaluations and additions to one exponentiation and k+1 multiplications:
count <- 0
product <- 1
threshold <- exp(-lambda)
repeat {
    generate u ~ Uniform(0,1)
    product <- product * u
    if product < threshold
        break
    else
        count <- count + 1
}
Assuming you do the U for 1-U substitution for both implementations, they are algebraically equal and will yield identical answers to within the precision of floating point arithmetic for a given set of U's.
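To illustrate the equivalence claim, here is a rough R sketch that feeds the same uniforms to both versions and checks how often they agree. The function names and the choice of 200 uniforms per trial are assumptions made for this illustration only.

```r
# Hypothetical helpers: each consumes a pre-drawn vector of uniforms u,
# so both versions can be run on exactly the same inputs.
pois_from_sum <- function(u, lambda) {
  total <- 0
  count <- 0
  for (ui in u) {
    total <- total - log(ui) / lambda    # add one exponential variate
    if (total > 1) return(count)
    count <- count + 1
  }
  stop("ran out of uniforms; supply a longer vector")
}

pois_from_product <- function(u, lambda) {
  threshold <- exp(-lambda)              # computed once, outside the loop
  prod_u <- 1
  count <- 0
  for (ui in u) {
    prod_u <- prod_u * ui
    if (prod_u < threshold) return(count)
    count <- count + 1
  }
  stop("ran out of uniforms; supply a longer vector")
}

set.seed(2)
lambda <- 5
agree <- replicate(2000, {
  u <- runif(200)
  pois_from_sum(u, lambda) == pois_from_product(u, lambda)
})
mean(agree)  # fraction of trials in which the two versions match
```

A disagreement could only arise when the running sum lands within rounding error of 1, which should be extremely rare.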
You can use rpois to generate Poisson variates as per above suggestion. However, my understanding of the question is that you wish to do so from first principles rather than using built-in functions. To do this, you need to use the property of the Poisson arrivals stating that the inter-arrival times are exponentially distributed. Therefore we proceed as follows:
Step 1: Generate a (large) sample from the exponential distribution and create a vector of cumulative sums. The k-th entry of this vector is the waiting time to the k-th Poisson arrival.
Step 2: Count how many arrivals fall within a unit time interval.
Step 3: Repeat steps 1 and 2 many times and gather the results into a vector.
This will be your sample from the Poisson distribution with the correct rate parameter.
The code:
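lambda <- 20  # for example
out <- sapply(1:100000, function(i) {
  u <- runif(100)            # 100 draws is ample for lambda = 20; use more for larger lambda
  x <- -log(1 - u) / lambda  # exponential inter-arrival times
  y <- cumsum(x)             # arrival times
  sum(y <= 1)                # number of arrivals in the unit interval
})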
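Then you can check the result against the built-in function via the Kolmogorov-Smirnov test. The test is designed for continuous distributions, so with discrete Poisson counts treat it as a rough check (R may warn about ties):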
ks.test(out, rpois(100000, lambda))
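As a complementary check that does not rely on a continuity assumption, one could compare the observed relative frequencies with the theoretical Poisson probabilities from dpois(). The sketch below regenerates a smaller sample so that it is self-contained; the sample size, the seed, and the use of rexp() in place of the inversion step are assumptions made for illustration.

```r
set.seed(42)
lambda <- 20
n_rep <- 20000

# Same idea as above: count arrivals in the unit interval.
out <- replicate(n_rep, {
  x <- rexp(100, rate = lambda)  # exponential inter-arrival times
  sum(cumsum(x) <= 1)
})

# Observed vs. theoretical relative frequencies for 0, 1, ..., max(out).
k <- 0:max(out)
observed <- tabulate(out + 1, nbins = length(k)) / n_rep
expected <- dpois(k, lambda)
max_abs_diff <- max(abs(observed - expected))
max_abs_diff  # should be small if the sampler is correct
```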
I am trying to replicate one part of the simulation results in a working paper, which says the authors 'generate random values from the right-skewed "inverse log-normal" distribution with an expected median value of w_median = 0.85, with an additional condition, 0 <= w <= 1,' which obviously means the random values lie between 0 and 1. I am using R, and there are functions for log-normal distributions such as dlnorm, plnorm, qlnorm, and rlnorm, so it is straightforward to generate random values from a log-normal distribution, for example:
rand_val <- rlnorm(1000, meanlog=log(0.85))
hist(rand_val, breaks=100)
median(rand_val) # 0.8856299
min(rand_val) # 0.04660691
max(rand_val) # 23.33998
But I have no idea about how to generate the random values from "inverse log-normal" distribution. There was essentially the same question raised before (Inverse of the lognormal distribution), and they suggested using qlnorm function, but I am not sure how that function works for generating random values from inverse log-normal distribution, especially with additional conditions of mine as mentioned: 1) expected median value = 0.85; 2) random values are within 0 to 1. Thanks in advance!
I am trying to generate random numbers from a custom distribution. I already found this question:
Simulate from an (arbitrary) continuous probability distribution
but unfortunately it does not help me, since the approach suggested there requires a formula for the distribution function. My distribution is a combination of multiple uniform distributions; basically, the density function looks like a histogram. An example would be:
f(x) = {
0 for x < 1
0.5 for 1 <= x < 2
0.25 for 2 <= x < 4
0 for 4 <= x
}
You just need inverse CDF method:
samplef <- function (n) {
  x <- runif(n)
  ifelse(x < 0.5, 2 * x + 1, 4 * x)
}
Compute the CDF yourself to verify that:
F(x) = 0                  for x < 1
       0.5 * x - 0.5      for 1 <= x < 2
       0.25 * x           for 2 <= x < 4
       1                  for x >= 4
so that its inverse is:
invF(x) = 2 * x + 1       for 0 < x < 0.5
          4 * x           for 0.5 <= x < 1
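A quick numerical check of this CDF and its inverse might look like the following sketch. The function names cdf and inv_cdf are hypothetical, chosen to avoid masking R's built-in F.

```r
# Piecewise CDF and its inverse for the histogram-like density in the question.
cdf <- function(x) {
  ifelse(x < 1, 0,
  ifelse(x < 2, 0.5 * x - 0.5,
  ifelse(x < 4, 0.25 * x, 1)))
}
inv_cdf <- function(p) ifelse(p < 0.5, 2 * p + 1, 4 * p)

# Round trip: cdf(inv_cdf(p)) should reproduce p.
p <- seq(0.01, 0.99, by = 0.01)
round_trip_error <- max(abs(cdf(inv_cdf(p)) - p))

# Sampling by inversion: about half of the draws should fall in [1, 2).
set.seed(3)
x <- inv_cdf(runif(100000))
mean(x < 2)
```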
You can combine various efficient methods for sampling from discrete distributions with a continuous uniform.
That is, simulate from the integer part Y=[X] of your variable, which has a discrete distribution with probability equal to the probability of being in each interval (such as via the table method - a.k.a. the alias method), and then simply add a random uniform on [0,1): X = Y+U.
In your example, you have Y taking the values 1,2,3 with probability 0.5,0.25 and 0.25 (which is equivalent to sampling 1,1,2,3 with equal probability) and then add a random uniform.
If your "histogram" is really large this can be a very fast approach.
In R you could do a simple (if not especially efficient) version of this via
sample(c(1,1,2,3),1)+runif(1)
or
sample(c(1,1,2,3),n,replace=TRUE)+runif(n)
and more generally you could use the probability weights argument in sample.
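A sketch of that more general version, using the prob argument of sample() and allowing bins of unequal width. The function name r_hist and its arguments are hypothetical.

```r
# Hypothetical helper: sample from a histogram-shaped density given the bin
# breaks and the density height within each bin.
r_hist <- function(n, breaks, heights) {
  widths <- diff(breaks)
  probs  <- heights * widths                 # probability mass of each bin
  bin    <- sample(length(heights), n, replace = TRUE, prob = probs)
  breaks[bin] + runif(n) * widths[bin]       # uniform position within the bin
}

set.seed(7)
x <- r_hist(100000, breaks = c(1, 2, 4), heights = c(0.5, 0.25))
mean(x < 2)  # should be near 0.5, the mass of the first bin
range(x)     # should lie within [1, 4]
```

Note that sample() rescales prob internally, so the heights do not have to be normalised beforehand.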
If you need more speed than this gets you (and for some applications you might, especially with big histograms and really big sample sizes), you can speed up the discrete part quite a bit using approaches mentioned at the link, and programming the workhorse part of that function in a lower level language (in C, say).
That said, even just using the above code with a considerably "bigger" histogram -- dozens to hundreds of bins -- this approach seems - even on my fairly nondescript laptop - to be able to generate a million random values in well under a second, so for many applications this will be fine.
If the custom distribution from which you are drawing a random number is defined by empirical observations, then you can also just draw from an empirical distribution using the remp() function from the fishmethods package.
https://rdrr.io/cran/fishmethods/man/remp.html
I have the pdf of a distribution. This distribution is not a standard distribution and no functions exist in R to sample from it. How do I sample from this pdf using R?
This is more of a statistics question, as it requires sampling, but in general, you can take this approach to the problem:
1. Find a distribution f whose pdf, when multiplied by a suitable constant k, is always greater than or equal to the pdf of the distribution in question, g.
2. For each sample, do the following steps:
3. Sample a random number x from the distribution f.
4. Calculate C = g(x)/(k*f(x)). This should be equal to or less than 1.
5. Draw a random number u from a uniform distribution U(0,1). If C < u, then go back to step 3. Otherwise keep x as the number and continue sampling if desired.
This process is known as rejection sampling, and is often used in random number generators that are not uniform.
The normal distribution and the uniform distribution are some of the more common distributions to sample from, but you can do other ones. Generally you want the shapes of k*f(x) and g(x) to be very close, so you don't have to reject a lot of samples.
Here's an example implementation:
#n is sample size
#g is pdf you want to sample from
#rf is sampling function for f
#df is density function for f
#k is multiplicative constant, chosen so that k*df(x) >= g(x) everywhere
#... is any necessary parameters for f
function.sample <- function(n,g,rf,df,k,...){
  results = numeric(n)
  counter = 0
  while(counter < n){
    x = rf(1,...)
    x.pdf = df(x,...)
    #accept x with probability g(x)/(k*f(x))
    if (runif(1) <= g(x) / (k * x.pdf)){
      results[counter+1] = x
      counter = counter + 1
    }
  }
  results
}
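As a usage illustration of rejection sampling, here is a self-contained sketch with an assumed target density g(x) = 6x(1-x) on [0,1], a Uniform(0,1) proposal, and k = 1.5 (the maximum of g). The function name reject_sample and all of these choices are assumptions made for the example.

```r
# Hypothetical stand-alone rejection sampler.
# g: target pdf, rf/df: proposal sampler and density, k: envelope constant
# chosen so that k * df(x) >= g(x) everywhere.
reject_sample <- function(n, g, rf, df, k) {
  results <- numeric(n)
  accepted <- 0
  while (accepted < n) {
    x <- rf(1)
    if (runif(1) <= g(x) / (k * df(x))) {  # accept with prob g / (k * f)
      accepted <- accepted + 1
      results[accepted] <- x
    }
  }
  results
}

set.seed(123)
g <- function(x) 6 * x * (1 - x)  # peaks at x = 0.5 with height 1.5
x <- reject_sample(5000, g, rf = runif, df = dunif, k = 1.5)
mean(x)  # the target is symmetric about 0.5
var(x)   # the target has variance 1/20
```

With k = 1.5 roughly two out of three proposals are accepted, since the acceptance rate of rejection sampling is 1/k when g is a normalised density.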
There are other methods to do random sampling, but this is usually the easiest, and it works well for most functions (unless their PDF is hard to calculate but their CDF isn't).
Suppose I have the following 2 random variables:
X where mean = 6 and stdev = 3.5
Y where mean = -42 and stdev = 5
I would like to create a new random variable Z based on the first two and knowing that : X happens 90% of the time and Y happens 10% of the time.
It is easy to calculate the mean for Z : 0.9 * 6 + 0.1 * -42 = 1.2
But is it possible to generate random values for Z in a single function?
Of course, I could do something along those lines :
if (randIntBetween(1,10) > 1)
GenerateRandomNormalValue(6, 3.5);
else
GenerateRandomNormalValue(-42, 5);
But I would really like to have a single function that would act as a probability density function for such a random variable (Z) that is not necessarily normal.
Sorry for the crappy pseudo-code.
Thanks for your help!
Edit: here is one concrete question:
Let's say we add up 5 consecutive values from Z. What would be the probability of ending up with a number higher than 10?
"But I would really like to have a single function that would act as a probability density function for such a random variable (Z) that is not necessarily normal."
Okay, if you want the density, here it is:
rho = 0.9 * density_of_x + 0.1 * density_of_y
But you cannot sample from this density if you don't 1) compute its CDF (cumbersome, but not infeasible) 2) invert it (you will need a numerical solver for this). Or you can do rejection sampling (or variants, eg. importance sampling). This is costly, and cumbersome to get right.
So you should go for the "if" statement (ie. call the generator 3 times), except if you have a very strong reason not to (using quasi-random sequences for instance).
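A vectorised R sketch of the "if" approach, together with the mixture density given above. The names r_mix and d_mix are hypothetical, and X and Y are assumed to be normal, as in the question's pseudo-code.

```r
# Hypothetical helpers for the 90/10 mixture of N(6, 3.5) and N(-42, 5).
r_mix <- function(n) {
  from_x <- runif(n) < 0.9                    # TRUE with probability 0.9
  ifelse(from_x, rnorm(n, 6, 3.5), rnorm(n, -42, 5))
}
d_mix <- function(z) 0.9 * dnorm(z, 6, 3.5) + 0.1 * dnorm(z, -42, 5)

set.seed(2024)
z <- r_mix(200000)
mean(z)        # should be close to 0.9 * 6 + 0.1 * (-42) = 1.2
sd(z)          # much larger than either component's stdev
mean(z < -20)  # roughly the 10% of draws that come from Y
```

The large standard deviation comes from the gap between the two component means, not from the spread within either component.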
If a normally distributed random variable is denoted x=(mean,stdev), then the following algebra applies to scaling and to sums of independent variables
number * x = ( number*mean, number*stdev )
x1 + x2 = ( mean1+mean2, sqrt(stdev1^2+stdev2^2) )
so for the case of X = (mx,sx), Y= (my,sy) the linear combination is
Z = w1*X + w2*Y = (w1*mx,w1*sx) + (w2*my,w2*sy) =
( w1*mx+w2*my, sqrt( (w1*sx)^2+(w2*sy)^2 ) ) =
( 1.2, 3.19 )
link: Normal Distribution look for Miscellaneous section, item 1.
PS. Sorry for the weird notation. The new standard deviation is calculated by something similar to the Pythagorean theorem: it is the square root of the sum of squares.
This is the form of the distribution:
ListPlot[BinCounts[Table[If[RandomReal[] < .9,
RandomReal[NormalDistribution[6, 3.5]],
RandomReal[NormalDistribution[-42, 5]]], {1000000}], {-60, 20, .1}],
PlotRange -> Full, DataRange -> {-60, 20}]
It is NOT Normal, as you are not adding Normal variables, but just choosing one or the other with certain probability.
Edit
This is the curve for adding five vars with this distribution:
The upper and lower peaks represent taking one of the distributions alone, and the middle peak accounts for the mixing.
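The question about the sum of five values can be approached in R along the following lines. This is a sketch assuming the five draws are independent and that X and Y are normal; the simulation is cross-checked against a calculation that conditions on how many of the five draws come from Y.

```r
set.seed(99)
n <- 200000

# Mixture sampler: N(6, 3.5) with probability 0.9, else N(-42, 5).
r_mix <- function(n) ifelse(runif(n) < 0.9, rnorm(n, 6, 3.5), rnorm(n, -42, 5))

# Simulation: each row of the matrix holds five independent draws of Z.
s5 <- rowSums(matrix(r_mix(5 * n), ncol = 5))
p_sim <- mean(s5 > 10)

# Conditioning: j = number of draws from Y, with j ~ Binomial(5, 0.1).
# Given j, the sum is normal with the mean and stdev below.
j <- 0:5
mu <- (5 - j) * 6 + j * (-42)
sigma <- sqrt((5 - j) * 3.5^2 + j * 5^2)
p_exact <- sum(dbinom(j, 5, 0.1) * pnorm(10, mu, sigma, lower.tail = FALSE))

c(simulated = p_sim, exact = p_exact)
```

Almost all of the probability comes from the case where none of the five draws is from Y, since a single draw from Y usually pulls the sum well below 10.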
The most straightforward and generically applicable solution is to simulate the problem:
Run the piecewise function you have 1,000,000 times (just a high number), generate a histogram of the results by splitting them into bins, and divide the count for each bin by your N (1,000,000 in my example) and by the bin width. This will leave you with an approximation for the PDF of Z at every given bin.
Lots of unknowns here, but essentially you just wish to add the two (or more) probability functions to one another.
For any given probability function you could calculate a random number with that density by calculating the area under the probability curve (the integral) and then generating a random number between 0 and that area. Then move along the curve until the area is equal to your random number and use that as your value.
This process can then be generalized to any function (or sum of two or more functions).
Elaboration:
Suppose you have a density function f(x) defined for x from 0 to 1. You could calculate a random number based on the distribution by first calculating the integral of f(x) from 0 to 1, giving you the area under the curve; let's call it A.
Now, you generate a random number between 0 and A, let's call that number, r. Now you need to find a value t, such that the integral of f(x) from 0 to t is equal to r. t is your random number.
This process can be used for any probability density function f(x). Including the sum of two (or more) probability density functions.
I'm not sure what your functions look like, so I'm not sure whether you can calculate analytic solutions for all this, but in the worst case you could use numeric techniques to approximate the effect.
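A numerical version of this procedure can be sketched in R with integrate() for the area and uniroot() for the search along the curve. The function name draw_by_area and the example curve f(x) = x^2 on [0,1] (which is deliberately not normalised, so A = 1/3) are assumptions for illustration.

```r
# Hypothetical helper: one draw from the density proportional to f on
# [lower, upper], found by matching a random area under the curve.
draw_by_area <- function(f, lower, upper) {
  area_to <- function(t) {
    if (t <= lower) return(0)
    integrate(f, lower, t)$value
  }
  A <- area_to(upper)          # total area under the curve
  r <- runif(1, 0, A)          # random area between 0 and A
  # Find t such that the area from lower to t equals r.
  uniroot(function(t) area_to(t) - r,
          lower = lower, upper = upper, tol = 1e-9)$root
}

set.seed(11)
f <- function(x) x^2
draws <- replicate(2000, draw_by_area(f, 0, 1))
mean(draws)  # the normalised density 3x^2 has mean 3/4
```

This is slow compared with a closed-form inverse, since every draw requires several numerical integrations, but it works for any curve that integrate() can handle.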