I have a stochastic SIR model with the following variables: time T, contact rate beta, recovery rate gamma, number of susceptibles n and number of infectious m. I would like to find a way to make beta vary with time.
I would like to have (at least) two different values of beta that change depending on the time t. My thought is to have beta == beta1 for t <= T/2 and then beta == beta2 for t > T/2, which means that I have one value of beta for the first half of the time and another value for the second half. Can this be implemented in any way? The following is my code for the model:
rSIR <- function(T, beta, gamma, n, m) {
t <- 0
x <- n # Susceptibles
y <- m # Infectious
# Possible events
eventLevels <- c("S->I","I->R")
# Initialize result
events <- data.frame(t=t,x=x,y=y,event=NA)
# Loop
while (t < T & (y>0)) {
# Draw
wait <- rexp(2,c("S->I"=beta*x*y,"I->R"=gamma*y))
# Which event occurs first
i <- which.min(wait)
# Advance time
t <- t+wait[i]
# Update population
if (eventLevels[i] == "S->I") {x <- x-1; y <- y+1}
if (eventLevels[i] == "I->R") {y <- y-1}
# Store results
events <- rbind(events,c(t,x,y,i))
}
events$event<- factor(eventLevels[events$event], levels=eventLevels)
return(events)
}
I have tried to do this by adding two if statements in the while loop where I repeat the code, one for beta1 which I define straight after the while loop is created, and another for beta2 where the if statements are the ones I mentioned in the beginning, i.e. if t <= T/2 and if t > T/2 but this does not seem to work. So I'm wondering if there is a more convenient way to specify this, maybe a statement at the beginning of the loop that directly specifies what the beta should be depending on where in time T we are?
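One way to express "a statement at the beginning of the loop" is sketched below. It is only an illustration, not a tested answer from this thread: the function name rSIR2 and the arguments beta1 and beta2 are hypothetical. It assumes beta is piecewise constant with a single switch at T/2. Because exponential waiting times are memoryless, a draw that would cross T/2 is discarded, the clock is moved to T/2, and new waiting times are drawn with beta2.

```r
# Sketch: SIR simulator whose contact rate switches from beta1 to beta2 at T/2.
# rSIR2, beta1 and beta2 are hypothetical names, not part of the original code.
rSIR2 <- function(T, beta1, beta2, gamma, n, m) {
  t <- 0
  x <- n  # Susceptibles
  y <- m  # Infectious
  eventLevels <- c("S->I", "I->R")
  # Collect rows in a list and bind them once at the end
  rows <- list(data.frame(t = t, x = x, y = y, event = NA_integer_))
  while (t < T && y > 0) {
    # Choose the contact rate that applies at the current time
    beta <- if (t < T / 2) beta1 else beta2
    wait <- rexp(2, c(beta * x * y, gamma * y))
    i <- which.min(wait)
    if (t < T / 2 && t + wait[i] > T / 2) {
      # The draw crosses the switch point: move to T/2 without an event
      # and redraw with beta2 on the next pass (memorylessness)
      t <- T / 2
      next
    }
    t <- t + wait[i]
    if (i == 1) { x <- x - 1; y <- y + 1 } else { y <- y - 1 }
    rows[[length(rows) + 1]] <- data.frame(t = t, x = x, y = y, event = i)
  }
  events <- do.call(rbind, rows)
  events$event <- factor(eventLevels[events$event], levels = eventLevels)
  events
}

set.seed(1)
ev <- rSIR2(T = 10, beta1 = 0.01, beta2 = 0.001, gamma = 0.1, n = 50, m = 5)
```

More than two values of beta would follow the same pattern, with one switch point per change.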
I am simulating a basic Galton-Watson process (GWP) using a geometric distribution. I'm using this to find the probability of extinction for each generation. My question is, how do I find the generation at which the probability of extinction is equal to 1?
For example, I can create a function for the GWP like so:
# Galton-Watson Process for geometric distribution
GWP <- function(n, p) {
Sn <- c(1, rep(0, n))
for (i in 2:(n + 1)) {
Sn[i] <- sum(rgeom(Sn[i - 1], p))
}
return(Sn)
}
where, n is the number of generations.
Then, if I set the geometric distribution parameter p = 0.25... then to calculate the probability of extinction for, say, generation 10, I just do this:
N <- 10 # Number of elements in the initial population.
GWn <- replicate(N, GWP(10, 0.25)[10])
probExtinction <- sum(GWn==0)/N
probExtinction
This will give me the probability of extinction for generation 10... to find the probability of extinction for each generation I have to change the index value (to the corresponding generation number) when creating GWn... But what I'm trying to do is find at which generation will the probability of extinction = 1.
Any suggestions as to how I might go about solving this problem?
I can tell you how you would do this problem in principle, but I'm going to suggest that you may run into some difficulties (if you already know everything I'm about to say, just take it as advice to the next reader ...)
theoretically, the Galton-Watson process extinction probability never goes exactly to 1 (unless prob==1, or in the infinite-time limit)
of course, for any given replicate and random-number seed you can compute the first time point (if any) at which all of your lineages have gone extinct. This will be highly variable across runs, depending on the random-number seed ...
the distribution of extinction times is extremely skewed; lineages that don't go extinct immediately will last a loooong time ...
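The first point can be checked without simulation. The sketch below is an illustration added here, with the hypothetical helper name extinctionProb. It assumes a single founder and the offspring distribution produced by rgeom (number of failures before the first success), whose probability generating function is G(s) = p / (1 - (1 - p) s). The probability of being extinct by generation n then satisfies q_n = G(q_{n-1}) with q_0 = 0.

```r
# Exact probability of extinction by each generation, starting from one
# individual, for rgeom-style offspring: G(s) = p / (1 - (1 - p) * s).
# extinctionProb is a hypothetical helper name used only for this sketch.
extinctionProb <- function(ngen, p) {
  q <- numeric(ngen + 1)   # q[1] corresponds to generation 0
  for (i in seq_len(ngen)) {
    q[i + 1] <- p / (1 - (1 - p) * q[i])
  }
  q
}

q_half    <- extinctionProb(100, 0.5)    # critical case, mean offspring 1
q_quarter <- extinctionProb(100, 0.25)   # supercritical case, mean offspring 3
```

For p = 0.5 the recursion gives q_n = n/(n+1), which approaches 1 but never reaches it. For p = 0.25 it levels off at 1/3, so most of the probability mass belongs to lineages that never go extinct.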
I modified your GWP function in two ways to make it more efficient: (1) stop the simulation when the lineage goes extinct; (2) replace the sum of geometric deviates with a single negative binomial deviate (see here)
GWP <- function(n, p) {
Sn <- c(1, rep(0, n))
for (i in 2:(n + 1)) {
Sn[i] <- rnbinom(1, size=Sn[i - 1], prob=p)
if (Sn[i]==0) break ## extinct, bail out
}
return(Sn)
}
The basic strategy now is: (1) run the simulations for a while, keeping the entire trajectory; (2) compute the extinction probability in every generation; (3) find the first generation at which the estimated extinction probability equals 1.
set.seed(101)
N <- 10 # Number of replicate lineages (each starts from one individual)
maxgen <- 100
GWn <- replicate(N, GWP(maxgen, 0.5), simplify="array")
probExtinction <- rowSums(GWn==0)/N
which(probExtinction==1)[1]
(Subtract 1 from the last result if you want to start indexing from generation 0.) In this case the answer is NA, because 1 of the 10 lineages manages to stay alive (and indeed gets very large, so it will probably persist almost forever).
plot(0:maxgen, probExtinction, type="s") ## plot extinction probability
matplot(1+GWn,type="l",lty=1,col=1,log="y") ## plot lineage sizes (log(1+x) scale)
## demonstration that (sum(rgeom(n,...)) is equiv to rnbinom(1,size=n,...)
nmax <- 70
plot(prop.table(table(replicate(10000, sum(rgeom(10, prob=0.3))))),
xlim=c(0,nmax))
points(0:nmax,dnbinom(0:nmax, size=10, prob=0.3), col=2,pch=16)
Please help with the following question.
The experiment involved mice; feeding them two diets: high-fat diet and normal diet (control group). The data below contains the weights of all female mice (population) that received the normal diet. The data can be downloaded from GitHub running the following command lines in R:
library(downloader)
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/femaleControlsPopulation.csv"
filename <- basename(url)
download(url, destfile = filename)
x <- unlist(read.csv(filename))
Here x represents the weights of the entire population.
So, the question is:
Set the seed at 1, then using a for-loop take a random sample of 5 mice 1,000 (one thousand) times. Save the averages.
What proportion of these 1,000 averages are more than 1 gram away from the average x?
Below is what I have tried using the ‘sum’ & ‘mean()’ function:
set.seed(1)
n <- 1000
sample1 <- vector("numeric", n)
for (i in 1: n) {
sample1[i] <- mean (sample (x, 5))
}
sum(sample1 > mean(x) / n)
mean(sample1 > mean(x)+1)
So this step is where I need the help…because I am not sure how to deal with ‘1 gram away from average of x’ statement in the question.
Thank you in advance for your help.
Looks like homework, so I'll give some hints:
In your second code block, the last two statements seem off.
n <- 1000
sample1 <- vector("numeric", n)
for (i in 1: n) {
sample1[i] <- mean (sample (x, 5))
}
sum(sample1 > mean(x) / n) #<- why dividing by n here?
mean(sample1 > mean(x)+1) #<- what are you trying to do here?
Why are you dividing the mean of the overall sample by n?
The call to mean does seem to make sense.
I don't think you need the second statement, mean(sample1 > mean(x)+1) to get your answer.
You need an inequality in the sum() statement that will be TRUE for every value that is outside the range of mean(x) - 1 to mean(x) + 1. Or, the number less than mean(x) -1 plus the number greater than mean(x) + 1.
Does that help?
The loop part is correct. For "What proportion of these 1,000 averages are more than 1 gram away from the average x?", take the absolute difference from the population mean before comparing: sum(abs(sample1 - mean(x)) > 1)/n
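To see that the absolute-value form and the "two tails" form describe the same proportion, here is a small sketch on simulated weights. The vector pop is made up for illustration and is not the mouse data, so the resulting number is not the answer to the exercise.

```r
set.seed(1)
# Made-up population of weights, NOT the mouse data from the question
pop <- rnorm(200, mean = 24, sd = 3.5)

n <- 1000
avgs <- vector("numeric", n)
for (i in 1:n) {
  avgs[i] <- mean(sample(pop, 5))
}

# One inequality on the absolute distance from the population mean ...
prop_abs <- mean(abs(avgs - mean(pop)) > 1)
# ... or the lower tail and the upper tail counted separately
prop_tails <- (sum(avgs < mean(pop) - 1) + sum(avgs > mean(pop) + 1)) / n
```

Taking mean() of a logical vector gives the proportion of TRUE values, so it is equivalent to sum(...)/n.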
Consider the Poisson distribution x1, x2, ... ~ pois(1) with lambda=1.
I want to write a function that receives a number as input (consider it a) and gives us as output the smallest n (minimum n) which is true for sum(xi)>=a, i=1:n.
I think using a while loop could be fine for this situation (but I am not sure that it's the best way). Perhaps it can be done using other loops like for loop. I do not know how to handle this situation containing a Poisson distribution in R?
A while loop is simple enough to code. It accumulates the values of rpois in s for sum while s < a, counting the iterations in n.
minpois <- function(a){
n <- 0L
s <- 0L
while(s < a) {
n <- n + 1L
s <- s + rpois(1L, lambda = 1)
}
n
}
set.seed(2020)
minpois(10)
#[1] 12
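If a is large, calling rpois once per iteration can be slow. A possible alternative, sketched here under the assumption that drawing a few extra values is acceptable, is to draw the Poisson values in blocks and locate the crossing with cumsum. The name minpois_vec and the chunk argument are hypothetical.

```r
# Sketch: block-wise version of the same search.
# minpois_vec and chunk are hypothetical names used for illustration.
minpois_vec <- function(a, chunk = max(10L, 2L * ceiling(a))) {
  if (a <= 0) return(0L)   # the empty sum already satisfies the condition
  n <- 0L                  # draws consumed in earlier blocks
  s <- 0                   # running sum carried over from earlier blocks
  repeat {
    cs <- s + cumsum(rpois(chunk, lambda = 1))
    hit <- which(cs >= a)
    if (length(hit) > 0L) return(n + hit[1])
    n <- n + chunk
    s <- cs[chunk]
  }
}

set.seed(2020)
res <- minpois_vec(10)
```

Since the mean of each draw is 1, a block of about 2a values almost always contains the crossing, so the repeat loop rarely runs more than once.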
I'm trying to simulate a compound Poisson process in R. The process is defined by $\sum_{j=1}^{N_t} Y_j$, where $Y_j$ is a sequence of i.i.d. $N(0,1)$ values and $N_t$ is a Poisson process with parameter $1$. I'm trying to simulate this in R without luck. I have an algorithm to compute this as follows:
Simulate the compound Poisson process from 0 to T:
Initiate: $ k = 0 $
Repeat while $\sum_{i=1}^k T_i < T$
Set $k = k+1$
Simulate $T_k \sim exp(\lambda)$ (in my case $\lambda = 1$)
Simulate $Y_k \sim N(0,1)$ (This is just a special case, I would like to be able to change this to any distribution)
The trajectory is given by $X_t = \sum_{j=1}^{N_t} Y_j$, where $N_t = \sup\{k : \sum_{i=1}^k T_i \leq t\}$.
Can someone help me to simulate this in r so that I can plot the process? I have tried, but can't get it done.
Use cumsum for the cumulative sums that determine the times N_t as well as the X_t. This illustrative code specifies the number of times to simulate, n, simulates the times in n.t and the values in x, and (to display what it has done) plots the trajectory.
n <- 1e2
n.t <- cumsum(rexp(n))
x <- c(0,cumsum(rnorm(n)))
plot(stepfun(n.t, x), xlab="t", ylab="X")
This algorithm, since it relies on low-level optimized functions, is fast: the six-year-old system I tested it on will generate over three million (time, value) pairs per second.
That's usually good enough for simulation, but it doesn't quite satisfy the problem, which asks to generate a simulation out to time T. We can leverage the preceding code, but the solution is a little trickier. It computes a reasonable upper limit on how many times will occur in the Poisson process before time T. It generates the inter-arrival times. This is wrapped in a loop that will repeat the procedure in the (rare) event the time T is not actually reached.
The additional complexity doesn't change the asymptotic calculation time.
T <- 1e2 # Specify the end time
T.max <- 0 # Last time encountered
n.t <- numeric(0) # Inter-arrival times
while (T.max < T) {
#
# Estimate how many random values to generate before exceeding T.
#
T.remaining <- T - T.max
n <- ceiling(T.remaining + 3*sqrt(T.remaining))
#
# Continue the Poisson process.
#
n.new <- rexp(n)
n.t <- c(n.t, n.new)
T.max <- T.max + sum(n.new)
}
#
# Sum the inter-arrival times and cut them off after time T.
#
n.t <- cumsum(n.t)
n.t <- n.t[n.t <= T]
#
# Generate the iid random values and accumulate their sums.
#
x <- c(0,cumsum(rnorm(length(n.t))))
#
# Display the result.
#
plot(stepfun(n.t, x), xlab="t", ylab="X", sub=paste("n =", length(n.t)))
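A variant that avoids the retry loop uses a standard property of the Poisson process: given the number of events in [0, T], which is Poisson with mean lambda*T, the event times are distributed as sorted uniform values on [0, T]. The sketch below is an illustration, not part of the answer above. The function name sim_cpp and its arguments are hypothetical, and rjump stands for any function that returns n random jump sizes, which makes it easy to swap the normal distribution for another one.

```r
# Sketch: compound Poisson path on [0, T] via the conditional-uniform property.
# sim_cpp, lambda and rjump are hypothetical names used for illustration.
sim_cpp <- function(T, lambda = 1, rjump = rnorm) {
  n <- rpois(1, lambda * T)            # number of jumps in [0, T]
  times <- sort(runif(n, 0, T))        # jump times
  values <- c(0, cumsum(rjump(n)))     # process value before and after each jump
  list(times = times, values = values)
}

set.seed(17)
path <- sim_cpp(100)
# Exponential jump sizes instead of normal ones
path_exp <- sim_cpp(100, lambda = 1, rjump = function(n) rexp(n, rate = 2))

# stepfun() needs at least one jump time, so guard against an empty path
if (interactive() && length(path$times) > 0) {
  plot(stepfun(path$times, path$values), xlab = "t", ylab = "X")
}
```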
I'm brand new to R and trying to implement a simple model (which I will extend later) that deals with corporate bond defaults.
For starters, I'm using only two clients.
Parameters:
- two clients (which I name "A" and "B")
- a cash flow of $10,000 will be received from each client if they do not default within 10 years
- pulling together concepts using standard normal random variables, dependent uniform random variables and Gaussian copulas
- run some number of simulations
- store the sum of Client A cash flow plus Client B cash flow and store in a vector named "result"
- finally, take the average of the result vector
My code is:
# define variables
nSim <- 5 # of simulations
rho <- 0.3 # rho
lambda <- 0.01 # default intensity
T <- 10 # time to default
for (i in 1:nSim){
# Step 1: generate 2 independent standard normal random variables
z1 <- rnorm(1, mean=0, sd=1)
z2 <- rnorm(1, mean=0, sd=1)
# Step 2: map the normals into correlated normals
# by Cholesky decomposition of the correlation matrix
# w1 = z1
# w2 = rho(z1)+sqrt(1-(rho^2))*z2
w1 <- z1
w2 <- rho*z1 - sqrt(1-(rho^2))*z2
# Step 3: using the correlated normals, generate two dependent uniform variables
u <- runif(1, min=0, max=1)
v <- runif(1, min=0, max=1)
# Step 4: using the dependent uniforms, generate two dependent exponentials
tau.A <- (-1/lambda)*log(u)
tau.B <- (-1/lambda)*log(v)
payout.A <- if (tau.A > 10) {10000} else {0}
payout.B <- if (tau.B > 10) {10000} else {0}
result[i] = (payout.A[i] + payout.B[i])
}
# calculate expected value of portfolio
mean(result)
When I run this code, I'm getting an error of "NA" and can't figure out why (again, I'm brand new to R). I don't think each of the simulation values is being stored in the results vector, but don't know how to diagnose the problem.
Thanks in advance to anyone who can help!
--Sarah
Everything works until the result[i] = (payout.A[i] + payout.B[i]) line. The problem is that you never create result.
Before your for loop, add the line:
result <- vector('numeric', length = nSim)
This will create a vector of 0s with a length of nSim. In R it is best to preallocate the space instead of dynamically growing a vector using c().
No, the problem is the presence of the [i] subscripts on the right-hand side of the result[i] = (payout.A[i] + payout.B[i]) line.
The [i] subscript is fine for result but not for the two payout variables, because each of them is a single value that is regenerated on every pass through the loop, so payout.A[i] is NA for i > 1. Simply remove those subscripts, giving the line:
result[i] <- (payout.A + payout.B)
This will solve your issue. If you wish to keep each payout in its own vector then you need to assign it as such, but it seems that you don't.
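Separately from the indexing problem, steps 3 and 4 of the posted code draw u and v with runif, so the correlated normals w1 and w2 are never used. A minimal sketch of what the comments in the question seem to describe is given below, assuming the intent is a Gaussian copula in which the dependent uniforms are obtained as pnorm(w1) and pnorm(w2). It is vectorized over simulations, so neither the loop nor the preallocation is needed. The value nSim = 10000 is chosen here only for illustration.

```r
set.seed(1)
nSim   <- 10000   # number of simulations (illustrative value)
rho    <- 0.3     # correlation of the latent normals
lambda <- 0.01    # default intensity
T      <- 10      # horizon in years

# Step 1: independent standard normals
z1 <- rnorm(nSim)
z2 <- rnorm(nSim)
# Step 2: correlated normals via the Cholesky factor of the correlation matrix
w1 <- z1
w2 <- rho * z1 + sqrt(1 - rho^2) * z2
# Step 3: dependent uniforms (assumed Gaussian copula: apply the normal CDF)
u <- pnorm(w1)
v <- pnorm(w2)
# Step 4: dependent exponential default times by inversion
tau.A <- -log(u) / lambda
tau.B <- -log(v) / lambda
# Payout of 10,000 per client that survives past T
result <- 10000 * (tau.A > T) + 10000 * (tau.B > T)
mean(result)
```

Each default time is marginally exponential with rate lambda, so the expected portfolio value is 2 * 10000 * exp(-lambda * T) whatever the value of rho. The correlation affects the joint distribution of the two payouts, not their mean.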