Generate Poisson process using R

I want to generate a process where at every step a Poisson random variable is realised; this realisation should be saved, and then the next Poisson random variable should be realised and added to the sum of all previous realisations. Furthermore, at every step there should be a chance that the process stops. Hope that makes sense to you guys... Any thought is appreciated!

More compactly, pick a single geometrically distributed random number for the total number of steps achieved before stopping, then use cumsum to sum that many Poisson deviates:
stopping.prob <- 0.3 ## for example
lambda <- 3.5 ## for example
n <- rgeom(1,stopping.prob)+1 ## constant probability per step of stopping; rgeom counts the steps completed before the first stop
cumsum(rpois(n,lambda))
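For instance, the snippet could be wrapped in a small helper and run reproducibly (a sketch only; the function name sim_poisson_path is my own):
sim_poisson_path <- function(stopping.prob, lambda) {
  n <- rgeom(1, stopping.prob) + 1   ## steps completed before the first stop, plus one
  cumsum(rpois(n, lambda))           ## running sum of the Poisson realisations
}
set.seed(1)   ## for reproducibility
sim_poisson_path(0.3, 3.5)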

You are very vague on the parameters of your simulation but how's this?
Lambda for random Poisson number.
lambda <- 5
This is the threshold value at which the loop exits.
th <- 0.999
Create a vector of length 1000.
bin <- numeric(1000)
Run the darn thing. It basically rolls a "dice" (values generated are between 0 and 1). If the value is below th, it draws a random Poisson number. If the value is at or above th, the loop stops.
for (i in 1:length(bin)) {
  if (runif(1) < th) {
    bin[i] <- rpois(1, lambda = lambda)
  } else {
    stop("didn't meet criterion, exiting")
  }
}
Remove the trailing zeros left over if the loop stopped early (note that this also drops any genuine zero draws).
bin <- bin[bin != 0]
You can use cumsum to cumulatively sum values.
cumsum(bin)
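If you would rather get the result back from a function instead of halting with an error, the pieces above could be wrapped up roughly like this (a sketch only; run_poisson_steps is my own name, and break replaces stop so the draws collected so far are returned):
run_poisson_steps <- function(lambda = 5, th = 0.999, max_steps = 1000) {
  bin <- numeric(0)
  for (i in 1:max_steps) {
    if (runif(1) >= th) break            # stop with probability 1 - th at each step
    bin <- c(bin, rpois(1, lambda = lambda))
  }
  cumsum(bin)                            # cumulative sum of the realisations
}
run_poisson_steps()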

Related

Create Simulation in R

I have the following problem:
A casualty insurance company has 1000 policyholders, each of whom will
independently present a claim in the next month with probability 5%.
Assume that the amounts of the claims made are independent exponential
random variables with mean 800 dollars.
Does anyone know how to create a simulation in R to estimate the probability
that the sum of those claims exceeds 50,000 dollars?
This sounds like a homework assignment, so it's probably best to consult with your teacher(s) if you're unsure about how to approach this. Bearing that in mind, here's how I'd go about simulating this:
First, create a function that generates values from an exponential distribution and sums those values, based on the values you give in your problem description.
get_sum_claims <- function(n_policies, prob_claim, mean_claim) {
  sum(rexp(n = n_policies*prob_claim, rate = 1/mean_claim))
}
Next, make this function return the sum of all claims lots of times, and store the results. The line with map_dbl does this, essentially instructing R to return 100000 simulated sums of claims from the get_sum_claims function.
library(tidyverse)
claim_sums <- map_dbl(1:100000, ~ get_sum_claims(1000, 0.05, 800))
Finally, we can calculate the probability that the sum of claims is greater than 50000 by using the code below:
sum(claim_sums > 50000)/length(claim_sums)
This gives a fairly reliable estimate of ~ 0.046 as the probability that the sum of claims exceeds 50000 in a given month.
I'm a bit inexperienced with R, but here is my solution.
First, construct a function which simulates a single trial. To do so, one needs to determine how many claims are filed, n. I hope it is clear that n ~ Binomial(1000, 0.05). Note that you cannot simply assume n = 1000 * 0.05 = 50: doing so would understate the variance, which results in a lower estimated probability. I can explain why this is the case if needed. Then, generate and sum n values drawn from an exponential distribution with mean 800.
simulate_total_claims <- function(){
  claim_amounts <- rexp(rbinom(n = 1, size = 1000, prob = 0.05), rate = 1/800)
  total <- sum(claim_amounts)
  return(total)
}
Now, all that needs to be done is run the above function a lot and determine the proportion of runs which have values greater than 50000.
library(purrr)   # rerun() comes from purrr
totals <- rerun(.n = 100000, simulate_total_claims())
estimated_prob <- mean(unlist(totals) > 50000)
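Note that rerun() has been soft-deprecated in recent purrr versions; base R's replicate does the same job here (a sketch under that substitution):
totals <- replicate(100000, simulate_total_claims())
estimated_prob <- mean(totals > 50000)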

Is there a R function to find the maximize run within a given interval for a coin simulation?

Hope you are doing well. I was asked to design and conduct a simulation study to estimate the probability that the observed maximum run length (a run is a sequence of consecutive heads or tails) for a fair coin flipping experiment is in the interval [9, 11] for a sample size of n = 1000. This is my attempt so far:
result <- replicate(10000, {                                      # replicate 10000 times
  experiment <- sample(c("T","H"), size = 1000, replace = TRUE)   # 1000 flips
  expe_run <- rle(experiment)                                     # find the runs
  expe_val <- expe_run$values                                     # values of the runs
  expe_length <- expe_run$lengths                                 # lengths of the runs
  as <- list(expe_length, expe_val)                               # make a list for the sapply function
  max_run <- sapply(as, FUN = max)                                # apply max to both
  head_run <- expe_length[which(expe_val == 'H')]                 # the head runs
  tail_run <- expe_length[which(expe_val == 'T')]                 # the tail runs
  max_run
})
probability <-table(result)/10000 #probability for run
probability
The problem is I don't know how to finish the question, which is to estimate the probability that the observed maximum run length for a fair coin flipping experiment is in the interval [9, 11], even though I have the table of every possible probability. Can you please help me out? Thank you
Are you looking for the probability, across the 10,000 trials, that the maximum value in expe_length is contained in the interval [9,11]?
If so, something like:
result <- replicate(10000, {
  ...
  expe_length <- expe_run$lengths   # lengths of the runs
  max(expe_length) %in% 9:11
})
should give you result as a vector of TRUE/FALSE values indicating whether that trial had a maximum run length in that interval.
Afterwards,
sum(result) / length(result)
will give you the proportion you're after.
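Putting the pieces together, the whole estimate could look roughly like this (a sketch of the combined approach, with set.seed added only for reproducibility):
set.seed(42)
result <- replicate(10000, {
  experiment <- sample(c("T", "H"), size = 1000, replace = TRUE)   # 1000 flips
  max(rle(experiment)$lengths) %in% 9:11                           # longest run within [9, 11]?
})
sum(result) / length(result)                                       # estimated probability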

How to simulate from poisson distribution using simulations from exponential distribution

I am asked to implement an algorithm to simulate from a Poisson(lambda) distribution using simulation from an exponential distribution.
I was given the following relation:
P(X = k) = P(X_1 + ... + X_k <= 1 < X_1 + ... + X_{k+1}), for k = 1, 2, ...
Here P(X = k) is the Poisson(lambda) probability mass function, and the X_i are exponential random variables.
I wrote code to simulate the exponential distribution, but have no clue how to simulate a poisson. Could anybody help me about this? Thanks million.
My code:
k <- 10                  ## for example: number of exponential draws
lambda <- 2              ## for example: the rate parameter
n <- c(1:k)
u <- runif(k)
x <- -log(1-u)/lambda    ## Exp(lambda) draws via the inverse CDF
I'm working on the assumption you (or your instructor) want to do this from first principles rather than just calling the builtin Poisson generator. The algorithm is pretty straightforward. You count how many exponentials you can generate with the specified rate until their sum exceeds 1.
My R is rusty and this sounds like a homework anyway, so I'll express it as pseudo-code:
count <- 0
sum <- 0
repeat {
  generate x ~ exp(lambda)
  sum <- sum + x
  if sum > 1
    break
  else
    count <- count + 1
}
The value of count after you break from the loop is your Poisson outcome for this trial. If you wrap this as a function, return count rather than breaking from the loop.
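In R, a direct translation of that pseudo-code might look like this (a sketch only; rpois_one is my own name for it):
rpois_one <- function(lambda) {
  count <- 0
  total <- 0
  repeat {
    x <- rexp(1, rate = lambda)   # one exponential inter-arrival time
    total <- total + x
    if (total > 1) break
    count <- count + 1
  }
  count
}
rpois_one(3)   # one Poisson(3) variate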
You can improve this computationally in a couple of ways. The first is to notice that the 1-U term for generating the exponentials has a uniform distribution, and can be replaced by just U. The more significant improvement is obtained by writing the evaluation as maximize i s.t. SUM(-log(Ui) / rate) <= 1, so SUM(log(Ui)) >= -rate.
Now exponentiate both sides and simplify to get
PRODUCT(Ui) >= Exp(-rate).
The right-hand side of this is constant, and can be pre-calculated, reducing the amount of work from k+1 log evaluations and additions to one exponentiation and k+1 multiplications:
count <- 0
product <- 1
threshold <- Exp(-lambda)
repeat {
  generate u ~ Uniform(0,1)
  product <- product * u
  if product < threshold
    break
  else
    count <- count + 1
}
Assuming you do the U for 1-U substitution for both implementations, they are algebraically equal and will yield identical answers to within the precision of floating point arithmetic for a given set of U's.
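An R sketch of the product-of-uniforms version might look like this (again my own naming; the threshold is pre-computed once):
rpois_one_fast <- function(lambda) {
  threshold <- exp(-lambda)       # constant right-hand side, pre-calculated
  count <- 0
  product <- 1
  repeat {
    product <- product * runif(1)
    if (product < threshold) break
    count <- count + 1
  }
  count
}
table(replicate(10000, rpois_one_fast(3)))   # compare against dpois(0:12, 3) * 10000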
You can use rpois to generate Poisson variates directly. However, my understanding of the question is that you wish to do so from first principles rather than using built-in functions. To do this, you need to use the property of Poisson arrivals that the inter-arrival times are exponentially distributed. Therefore we proceed as follows:
Step 1: Generate a (large) sample from the exponential distribution and create a vector of its cumulative sums. The k-th entry of this vector is the waiting time until the k-th Poisson arrival.
Step 2: Count how many arrivals fall in a unit time interval.
Step 3: Repeat steps 1 and 2 many times and gather the results into a vector.
This will be your sample from the Poisson distribution with the correct rate parameter.
The code:
lambda <- 20   ## for example
out <- sapply(1:100000, function(i){
  u <- runif(100)              ## 100 draws are more than enough for lambda = 20
  x <- -log(1-u)/lambda        ## exponential inter-arrival times
  y <- cumsum(x)               ## arrival times
  length(which(y <= 1))        ## number of arrivals within the unit interval
})
Then you can test the validity vs the built-in function via the Kolmogorov-Smirnov test:
ks.test(out, rpois(100000, lambda))

How do I get started with this?

I have been stuck on this problem for a long time.
I think I should first create the two functions, like this:
n <- runif(10000)
estimator1_fun <- function(n){
  sum <- 0
  for (i in 1:10000) {
    sum <- sum + ((n/i)*runif(1))
  }
  return(sum)
}
and do the same for the other function, and use the mse formula? Am I even approaching this correctly? I tried formatting it, but found that using an image would be better.
Assuming U(0,Theta_0) is the uniform distribution from 0 to Theta_0, and that Theta_0 is a fixed constant, I would proceed as follows:
1. Define Theta_0. Give it a fixed value.
2. Write the function that gives a random number from that distribution.
- The sampling call is runif(N, 0, Theta_0).
- Arguments could be Theta_0 and N.
3. Sample it a few thousand (or whatever) times into a vector X.
4. Calculate the two estimates.
5. Repeat steps 3 & 4 for more samples.
6. Plot the two estimates against the number of samples and see whether they approach Theta_0.
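As a rough sketch of those steps: the two estimators are not shown here (they were in the question's image), so as placeholders I use two common estimators of Theta_0 for a U(0, Theta_0) sample, the method-of-moments estimator 2*mean(x) and the sample maximum max(x); swap in whichever estimators your problem actually specifies.
theta0 <- 5                        # step 1: fix Theta_0
N <- 1000                          # sample size per replicate
estimates <- replicate(2000, {     # steps 3-5: repeat the sampling and estimation
  x <- runif(N, 0, theta0)         # steps 2-3: sample from U(0, Theta_0)
  c(est1 = 2 * mean(x),            # placeholder estimator (assumption)
    est2 = max(x))                 # placeholder estimator (assumption)
})
rowMeans(estimates)                # step 6: compare both against theta0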

Create specified correlation between variables (by permutation) in R

I am looking for a way to create a specified correlation between 2 variables, regardless of their distribution, given that the ordering is allowed to change. The motivation has to do with Bayesian statistics.
Imagine variable a which holds 100 random normal numbers, while
variable b holds the numbers 1...100.
There are 100 factorial possible permutations of variable b, and among them correlations with variable a ranging from roughly -0.95 to 0.95 will usually exist.
I wrote a little script in R to try to find the correlation in an iterative way.
It iterates through all the indices, checking whether the current correlation is lower or higher than the sought correlation.
If the correlation is too low, it switches the number belonging to the current index with the number belonging to a random lower index.
If the correlation is too high, it switches the number belonging to the current index with the number belonging to a random higher index.
It then checks whether the new correlation is better than the old one, and keeps the one closest to the wanted correlation.
It keeps going over all the indices in order (from 1 to 100); after every iteration it checks whether the correlation is within the wanted correlation +/- the tolerance and, if so, returns the permuted variable.
Usually the specified correlation is found within around 2000 iterations at a tolerance of 0.0005.
Index in the picture represents iterations.
My question is how to do this permutation in a smarter way, such that the specified correlation is found more quickly.
This builds on flodel's idea of proposing several candidates at each iteration. Here it actually tests all candidates; while this is fine for my variables of length 100, sampling a subset of candidates would be preferable for larger cases.
AnnealCor <- function(x, y, corpop, tol) {
  while (abs(cor(x, y) - corpop) > tol) {
    for (i in 1:length(y)) {
      numbers <- 1:length(y)
      correlation <- 1:length(y)
      for (j in numbers) {
        switcher <- y
        switcher[c(i, j)] <- y[c(j, i)]
        correlation[j] <- cor(x, switcher)
      }
      tokeep <- which(abs(correlation - corpop) == min(abs(correlation - corpop)))[1]
      y[c(i, tokeep)] <- y[c(tokeep, i)]
      if (abs(cor(x, y) - corpop) < tol) { break }
    }
  }
  return(y)
}
Benchmark time based on 100 repetitions has a median of 200 milliseconds.
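For example, it could be called like this (a small usage sketch with made-up inputs matching the description above):
set.seed(123)
a <- rnorm(100)    # variable a: 100 random normal numbers
b <- 1:100         # variable b: the numbers 1...100
b_perm <- AnnealCor(a, b, corpop = 0.5, tol = 0.0005)
cor(a, b_perm)     # within 0.0005 of 0.5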
