I've been simulating a population dynamics model, and added in some environmental stochasticity by making the value of one of the parameters time-varying.
(To be more specific, I made thermal performance curves that relate the temperature of the system to the growth rates of two species within the system. I then randomly sampled some temperatures to create a vector of those temperatures and their corresponding growth rates. I then set up the simulation such that the value of the growth rate parameters could change with the temperature of the system over time.)
Now, I want to add some demographic stochasticity to my system using the Gillespie algorithm, and specifically, the GillespieSSA package in R. I've run into trouble trying to integrate my existing environmentally stochastic implementation with the arguments the ssa functions takes.
This is my environmentally stochastic implementation:
ri <- QuadEqn_1(temp = temp_sequence); rj <- QuadEqn_2(temp = temp_sequence) # Where QuadEqn_1 and _2 are the thermal performance curves that give the growth rate, when given the temperature of the system, "temp_sequence", which is a result of a random draw not shown here
k <- 0.001; p <- 1 ; o <- 1000
parms <- list(ri = ri, rj = rj, k = k, p = p, o = o)
Antia_3sp_Model <- function(t,y,p1){
tt <- floor(t) + 1
Pi <- y[1]; Pj <- y[2]; I <- y[3]
ri <- p1$ri[tt]; rj <- p1$rj[tt]; k <- p1$k; p <- p1$p; o <- p1$o # This is the line that allows the values of the growth rate parameters to change with time, tt
dPi = ri*Pi - k*Pi*I # The model
dPj = rj*Pj - k*Pj*I
dI = p*I*(Pi/(Pi + o) + Pj/(Pj + o))
list(c(dPi,dPj,dI))
}
N0 <- c(Pi = 1, Pj = 0, I = 1) # Initial values of the state variables
TT <- round(seq(0.1, 50, 0.1), 1)
eventdat <- data.frame(var = "Pj", time = 1, value = 1, method = "rep") # Allows one species to be introduced at different time points
results <- lsoda(N0, TT, Antia_3sp_Model, p = parms, events=list(data=eventdat), verbose = TRUE)
Using the GillespieSSA package requires a propensity vector:
a <- c("ri*Pi",
"k*Pi*I",
"rj*Pj",
"k*Pj*I",
"p*I*(Pi/(Pi + o)",
"p*I*(Pj/(Pj + o)")
And a state change matrix:
nu <- matrix(c(+1, -1, 0, 0, 0, 0,
0, 0, +1, -1, 0, 0,
0, 0, 0, 0, +1, +1), nrow = 3, byrow = TRUE)
And is implemented using the ssa function, which should look something like this:
TestOutput <- ssa(x0, a, nu, parms1, TT, method, simName...)
So, I guess my question is: given that I pass GillespieSSA the model through the propensity vector and the state change matrix, how can I include a time-varying parameter, as I have in my environmentally stochastic implementation?
I'm pretty lost on this one, so any suggestions would be greatly appreciated :)
Related
I am trying to simulate fBm with its integral representation. I know that there are faster methods out there, but i would like to play around with the kernel function inside the integral.
enter link description here
My appraoch was to simulate the stochastic integral cummulatively.
set.seed(100)
dt = 0.01
T = seq(-50,10,0.1)
n=length(T)
m=length(T)
Gamma = matrix(0, n, m)
H=0.9
exponent <- function(a, pow) (abs(a)^pow)*sign(a)
for(j in 1:length(T)){
zeile = numeric(length(T)) #resetting our path for each j
y = sqrt(dt)*rnorm(n=1, mean = 0, sd=1) #normal distributred r.V.
zeile[1] = (max(exponent(T[j] - T[1],H-0.5),0) - max(exponent(-T[1],H-0.5),0))*y #first entry of one path
for(i in 1:(length(T)-1)){
y1 = sqrt(dt)*rnorm(n=1, mean = 0, sd=1)
y2 = sqrt(dt)*rnorm(n=1, mean = 0, sd=1)
zeile[i+1] = zeile[i] + max(exponent(T[j] - T[i],H-0.5),0)*y1 - max(exponent(-T[i],H-0.5),0)*y2
}
Gamma[j,] = zeile
}
normalV = rnorm(length(T),mean =0,sd=1)
path = Gamma%*%normalV
plot(T, path, type="l")
T is the interval over which we will plot. The first for loop is there to go over each time point and fix it in the second for loop in which we fill up one row of the matrix. After we have our matrix, I thought I have to multiply it by a N(0,1) vektor in order to get one path of fBm. Clearly I do not.
I'm trying to recreate something similar to an image in modern actuarial risk theory using R: https://www.academia.edu/37238799/Modern_Actuarial_Risk_Theory (page 89)
Click here for image
In my case, the drops are of size based on an exponential distribution with parameter 1/2000 and they are spaced apart with Poisson inter arrival times which means they are distributed exponentially with a rate parameter of 0.25 (in my model)
The value of U is given by an initial surplus plus a premium income (c) per unit time (for an amount of time determined by the inter arrival distribution) minus a claim amount which would be random from the exponential distribution mentioned above.
I have a feeling a loop will need to be used and this is what I have so far:
lambda <- 0.25
EX <- 2000
theta <- 0.5
c <- lambda*EX*(1+theta)
x <- rexp(1, 1/2000)
s <- function(t1){for(t1 in 1:10){v <- c(rep(rexp(t1,1/2000)))
print(sum(v))}}
u <- function(t){10000+c*t}
plot(u, xlab = "t", xlim = c(-1,10), ylim = c(0,20000))
abline(v=0)
for(t1 in 1:10){v <- c(rep(rexp(t1,1/2000)))
print(sum(v))}
The end goal is to run this simulation say 10,000 times over a 10 year span and use it as a visible representation as the rate of ruin for an insurance company.
Any help appreciated.
I think you're looking for something like this, all wrapped up in a neat function which by default draws the plot, but if wanted simply returns "ruin" or "safe" so you can run it in simulation:
simulate_ruin <- function(lambda = 0.25, EX = 2000,
theta = 0.5, initial_amount = 10000,
max_time = 10, draw = TRUE) {
income_per_year <- lambda * EX * (1 + theta)
# Simulate a Poisson process. Include the initial time 0,
# and replicate every other time point so we have values "before" and
# "after" each drop
times <- c(0, rep(cumsum(rexp(1000, lambda)), each = 2))
times <- c(times[times < max_time], max_time)
# This would be our income if there were no drops (a straight line)
total_without_drops <- initial_amount + (income_per_year * times)
# Now simulate some drops.
drop_size <- rexp((length(times) - 1) / 2, 1/2000)
# Starting from times[3], we apply our cumulative drops every second index:
payout_total <- rep(c(0, cumsum(drop_size)), each = 2)
total <- total_without_drops - payout_total
if(draw) {
plot(times, total, type = "l", ylim = c(-1000, 20000))
abline(h = 0, lty = 2)
} else {
if(any(total < 0))
return("ruin")
else
return("safe")
}
}
So we can call it once for a simulation:
simulate_ruin()
And again for a different simulation
simulate_ruin()
And table the results of 10,000 simulations to find the rate of ruin, which turns out to be around 3%
table(sapply(1:10000, function(x) simulate_ruin(draw = FALSE)))
#>
#> ruin safe
#> 305 9695
Created on 2022-04-06 by the reprex package (v2.0.1)
I am trying to establish a method of estimating infectious disease parameters by comparing real epidemic curves with simulations of a stochastic SIR model. To construct the stochastic SIR model, I am using the deSolve package and instead of using fixed parameter values I would like to draw the parameter value used in the equations at each time point from a Poisson distribution centered on the original parameter values.
Using the parameter beta as an example, beta represents the average number of transmission events per capita and is the product of the average number of contacts and the probability that transmission occurs upon contact. Realistically, there is variation in the number of contacts a person will have and since transmission is also a probabilistic event there is variation surrounding this too.
So even if the average transmission rate were to be 2.4 (for example), an individual can go on to infect 0, 1, 2 or 3 ... etc. people with varying probabilities.
I have tried to incorporate this into my code below using the rpois function and reassigning the parameters used in the equations to the outputs of the rpois.
I have run my code with the same initial values and parameters multiple times and all the curves are different indicating that SOMETHING "stochastic" is going on, but I am unsure whether the code is sampling using the rpois at each time point or just once at the beginning. I have only started coding very recently so do not have much experience.
I would be grateful if anyone more experienced than myself could verify what my code is ACTUALLY doing and whether it is sampling using rpois at each time point or not. If not I would be grateful for any suggestions for achieving this. Perhaps a loop is needed?
library('deSolve')
library('reshape2')
library('ggplot2')
#MODEL INPUTS
initial_state_values <- c(S = 10000,
I = 1,
R = 0)
#PARAMETERS
parameters <- c(beta = 2.4,
gamma = 0.1)
#POISSON MODELLING OF PARAMETERS
#BETA
beta_p <- rpois(1, parameters[1])
#GAMMA
infectious_period_p <- rpois(1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
#TIMESTEPS
times <- seq(from = 0, to = 50,by = 1)
#SIR MODEL FUNCTION
sir_model <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
N <- S + I + R
lambda <- beta_p * I/N
dS <- -lambda * S
dI <- lambda*S - gamma_p*I
dR <- gamma_p*I
return(list(c(dS, dI, dR)))
})
}
output<- as.data.frame(ode(y= initial_state_values,
times = times,
func = sir_model,
parms = parameters))
The code given in the question runs the model with constant parameters over time. Here an example with parameters varying over time. However, this setting assumes that for a given time step, the parameters are equal for all indidividuals of the population. If you want to have individual variability, one can either use a matrix formulation for different sub-populations or use an individual model instead.
Model with fluctuating population parameters:
library('deSolve')
initial_state_values <- c(S = 10000,
I = 1,
R = 0)
parameters <- c(beta = 2.4, gamma = 0.1)
times <- seq(from = 0, to = 50, by = 1) # note time step = 1!
# +1 to add one for time = zero
beta_p <- rpois(max(times) + 1, parameters[1])
infectious_period_p <- rpois(max(times) + 1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
sir_model <- function(time, state, parameters) {
# cat(time, "\n") # show time steps for debugging
with(as.list(c(state, parameters)), {
# this overwrites the parms passed via parameters
beta <- beta_p[floor(time) + 1]
gamma <- gamma_p[floor(time) + 1]
N <- S + I + R
lambda <- beta * I/N
dS <- -lambda * S
dI <- lambda * S - gamma * I
dR <- gamma * I
list(c(dS, dI, dR))
})
}
output <- ode(y = initial_state_values,
times = times,
func = sir_model,
parms = parameters)
plot(output)
Here another, slightly more generalized version. It is added as a second answer, to keep the original version compact and simple. The new version differs with respect to the following:
generalized, so that it can work with fixed parameters and stochastic forcing
pass parameters as list
run a basic Monte-Carlo simulation
library('deSolve')
sir_model <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
# this overwrites the parms passed via parameters
if (time_dependent) {
beta <- beta_p[floor(time) + 1]
gamma <- gamma_p[floor(time) + 1]
}
N <- S + I + R
lambda <- beta * I/N
dS <- -lambda * S
dI <- lambda * S - gamma * I
dR <- gamma * I
list(c(dS, dI, dR))
})
}
initial_state_values <- c(S = 10000, I = 1, R = 0)
times <- seq(from = 0, to = 50, by = 1) # note time step = 1!
## (1) standard simulation with constant parameters
parameters <- c(beta = 2.4, gamma = 0.1)
out0 <- ode(y= initial_state_values,
times = times,
func = sir_model,
parms = c(parameters, time_dependent = FALSE))
plot(out0)
## (2) single simulation with time varying parameters
beta_p <- rpois(max(times) + 1, parameters[1])
infectious_period_p <- rpois(times + 1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
## here we need pass the vectorized parameters globally
## for simplicity, it can also be done as list
out1 <- ode(y = initial_state_values, times = times,
func = sir_model, parms = c(time_dependent = TRUE))
plot(out0, out1)
## (3) a sample of simulations
monte_carlo <- function(i) {
#parameters <- c(beta = 2.4, gamma = 0.1)
beta_p <- rpois(max(times) + 1, parameters[1])
infectious_period_p <- rpois(max(times) + 1, 1 / (parameters[2]))
gamma_p <- 1/infectious_period_p
ode(y = initial_state_values, times = times,
func = sir_model, parms = list(beta_p = beta_p,
gamma_p = gamma_p,
time_dependent = TRUE))
}
## run 10 simulations
out_mc <- lapply(1:10, monte_carlo)
plot(out0, out_mc, mfrow=c(1, 3))
I would like to understand how the R kknn package calculates weights, distances, and class probabilities for binary classification problems. In the R code below, there are three observations in the training sample and one observation in the holdout sample. The two predictor variables are height and weight. With Euclidean distance, the distances for each observation in the training sample are then:
sqrt((6-8)^2 + (4-5)^2) = 2.24
sqrt((6-3)^2 + (4-7)^2) = 4.24
sqrt((6-7)^2 + (4-3)^2) = 1.41.
With k=3 and with equal weights, I get a probability for the holdout as:
(1/3 * 1) + (1/3 * 0) + (1/3 * 1) = 0.67.
With k=2 and with equal weights, I get a probability for the holdout as:
(1/2 * 1) + (1/2 * 1) = 1.00.
I would like to understand how the R kknn package makes these same calculations with the "triangular," "gaussian," and "inverse" weights (and more generally).
library(kknn)
training <- data.frame(class = c(1, 0, 1), height = c(8, 3, 7), weight = c(5, 7, 3))
holdouts <- data.frame(class = 1, height = 6, weight = 4)
triangular_kernel <- kknn(class ~., training, holdouts, distance = 2, kernel = "triangular", k = 3)
triangular_kernel[["fitted.values"]]
triangular_kernel[["W"]]
triangular_kernel[["D"]]
gaussian_kernel <- kknn(class ~., training, holdouts, distance = 2, kernel = "gaussian", k = 3)
gaussian_kernel[["fitted.values"]]
gaussian_kernel[["W"]]
gaussian_kernel[["D"]]
inverse_kernel <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 3)
inverse_kernel[["fitted.values"]]
inverse_kernel[["W"]]
inverse_kernel[["D"]]
Calling kknn::kknn prints the source code for the kknn function in the console. With it, one can go through the function line by line to see what it does.
Distance
kknn calls a compiled C code dmEuclid. To obtain its source code, we follow this guide, writing the following code in R:
untar(download.packages(pkgs = "kknn", destdir = ".", type = "source")[,2])
and then open the src directory of kknn_1.3.1.tar in your working directory (getwd()) to find and open dm.C using any text editor. Scroll about halfway to find dmEuclid. To test the exact outputs of dmEuclid, you could install the build tools, and open a C++ file in Rstudio by selecting it in the dropdown menu, and run the code with different inputs.
Following the function outputs, in your case the dmtmp$dm results in
3.779645e-01 1.133893e+00 1.000000e+150 3.685210e-156
Per your specification k, the first 3 values are chosen as distance D.
This is manually converted to maxdist = 1e-06 by the package author, as the max distance is smaller than that in your case.
Weights
The kknn function uses the following section to allocate a weight scheme, per your defined kernel.
W <- D/maxdist
W <- pmin(W, 1 - (1e-06))
W <- pmax(W, 1e-06)
at this point your W values are larger than 1, and so W is then coerced to approximately 1.
if (kernel == "inv"
W <- 1/W
if (kernel == "triangular")
W <- 1 - W
if (kernel == "gaussian") {
alpha = 1/(2 * (k + 1))
qua = abs(qnorm(alpha))
W = W * qua
W = dnorm(W, sd = 1)
}
the explanation for which can be found in the paper linked by gowerc.
W is then converted to matrix W <- matrix(W, p, k) with 1 row (p=1), 3 columns (k=3)
Fitted value
p = 1 in your case is 1, k=3, cl = c(1,0,1).
C <- matrix(dmtmp$cl, nrow = p, ncol = k + 1)
C <- C[, 1:k] + 1
CL <- matrix(cl[C], nrow = p, ncol = k)
W <- matrix(W, p, k)
fit <- rowSums(W * CL)/pmax(rowSums(W), 1e-06)
I am trying to simulate the transmission of viruses in a population using the function ode from the deSolve package. The basic of my model is a SIR model and I posted a much simpler demo of my model here, which consists of only three states S(susceptible), I(infectious) and R(recovered). Each state is represented by a m*n matrix in my code, since I have m age groups and n subpopulations in my population.
The problem is: during the simulation period, there will be several vaccination activities that transfer people in state S to state I. Each vaccination activity is characterized by a begin date, an end date, its coverage rate and duration. What I want to do is once the time t falls into the interval of begin date and end date of one vaccination activity, the code calculates the effective vaccination rate (also a m*n matrix, based on coverage rate and duration) and times it with S (m*n matrix), to get a matrix of people transited to state I. Right now, I am using if() to decide if time t is between a begin date and a end date:
#initialize the matrix of effective vaccination rate
irrate_matrix = matrix(data = rep(0, m*n), nrow = m, ncol = n)
for (i in 1:length(tbegin)){
if (t>=tbegin[i] & t<=tend[i]){
for (j in 1:n){
irrate_matrix[, j] = -log(1-covir[(j-1)*length(tbegin)+i])/duration[i]
}
}
}
Here, irrate_matrix is the m*n effective vaccination rate matrix, m = 2 is the number of age groups, n = 2 is the number of subpopulations, tbegin = c(5, 20, 35) is the begin date of 3 vaccination activities, tend = c(8, 23, 38) is the end date of 3 vaccination activities, covir = c(0.35, 0.25, 0.25, 0.225, 0.18, 0.13) is the coverage rate of each vaccination for each subpopulation (e.g., covir[1] = 0.35 is the coverage rate of the first vaccination for subpopulation1, while covir[4] = 0.225 is the coverage rate of the first vaccination for subpopulation2) and duration = c(4, 4, 4) is the duration of each vaccination (in days).
After calculating irrate_matrix, I take it into derivatives and therefore I have:
dS = as.matrix(b*N) - as.matrix(irrate_matrix*S) - as.matrix(mu*S)
dI = as.matrix(irrate_matrix*S) - as.matrix(gammaS*I) - as.matrix(mu*I)
dR = as.matrix(gammaS*I) - as.matrix(mu*R)
I want to do a simulation from day 0 to day 50, by 1-day step, thus:
times = seq(0, 50, 1)
The current issue with my code is: every time the time t comes to a time point close to a tbegin[i] or tend[i], the simulation becomes much slower since it iterates at this time point for much more rounds than at any other time point. For example, once the time t comes to tbegin[1] = 5, the model iterates at time point 5 for many rounds. I attached screenshots from printing out those iterations (screenshot1 and screenshot2). I find this is why my bigger model takes a very long running time now.
I have tried using the "events" function of deSolve mentioned by tpetzoldt in this question stackoverflow: change the value of a parameter as a function of time. However, I found it's inconvenient for me to change a matrix of parameters and change it every time there is a vaccination activity.
I am looking for solutions regarding:
How to change my irrate_matrix to non-zero matrix when there is a vaccination activity and let it be zero matrix when there is no vaccination? (it has to be calculated for each vaccination)
At the same time, how to make the code run faster by avoiding iterating at any tbegin[i] or tend[i] for many rounds? (I think I should not use if() but I do not know what I should do with my case)
If I need to use "forcing" or "events" function, could you please also tell me how to have multiple "forcing"/"events" in the model? Right now, I have had an "events" used in my bigger model to introduce a virus to the population, as:
virusevents = data.frame(var = "I1", time = 2, value = 1, method = "add")
Any good idea is welcome and directly providing some codes is much appreciated! Thank you in advance!
For reference, I post the whole demo here:
library(deSolve)
##################################
###(1) define the sir function####
##################################
sir_basic <- function (t, x, params)
{ # retrieve initial states
S = matrix(data = x[(0*m*n+1):(1*m*n)], nrow = m, ncol = n)
I = matrix(data = x[(1*m*n+1):(2*m*n)], nrow = m, ncol = n)
R = matrix(data = x[(2*m*n+1):(3*m*n)], nrow = m, ncol = n)
with(as.list(params), {
N = as.matrix(S + I + R)
# print out current iteration
print(paste0("Total population at time ", t, " is ", sum(N)))
# calculate irrate_matrix by checking time t
irrate_matrix = matrix(data = rep(0, m*n), nrow = m, ncol = n)
for (i in 1:length(tbegin)){
if (t>=tbegin[i] & t<=tend[i]){
for (j in 1:n){
irrate_matrix[, j] = -log(1-covir[(j-1)*length(tbegin)+i])/duration[i]
}
}
}
# derivatives
dS = as.matrix(b*N) - as.matrix(irrate_matrix*S) - as.matrix(mu*S)
dI = as.matrix(irrate_matrix*S) - as.matrix(gammaS*I) - as.matrix(mu*I)
dR = as.matrix(gammaS*I) - as.matrix(mu*R)
derivatives <- c(dS, dI, dR)
list(derivatives)
})
}
##################################
###(2) characterize parameters####
##################################
m = 2 # the number of age groups
n = 2 # the number of sub-populations
tbegin = c(5, 20, 35) # begin dates
tend = c(8, 23, 38) # end dates
duration = c(4, 4, 4) # duration
covir = c(0.35, 0.25, 0.25, 0.225, 0.18, 0.13) # coverage rates
b = 0.0006 # daily birth rate
mu = 0.0006 # daily death rate
gammaS = 0.05 # transition rate from I to R
parameters = c(m = m, n = n,
tbegin = tbegin, tend = tend, duration = duration, covir = covir,
b = b, mu = mu, gammaS = gammaS)
##################################
#######(3) initial states ########
##################################
inits = c(
S = c(20000, 40000, 10000, 20000),
I = rep(0, m*n),
R = rep(0, m*n)
)
##################################
#######(4) run simulations########
##################################
times = seq(0, 50, 1)
traj <- ode(func = sir_basic,
y = inits,
parms = parameters,
times = times)
plot(traj)
Element wise operations are the same for matrices and vectors, so the as.matrix conversions are redundant, as no true matrix multiplication is used. Same with the rep: the zero is recycled anyway.
In effect, CPU time reduces already to 50%. In contrast, use of an external forcing with approxTime instead of the inner if and for made the model slower (not shown).
Simplified code
sir_basic2 <- function (t, x, params)
{ # retrieve initial states
S = x[(0*m*n+1):(1*m*n)]
I = x[(1*m*n+1):(2*m*n)]
R = x[(2*m*n+1):(3*m*n)]
with(as.list(params), {
N = S + I + R
# print out current iteration
#print(paste0("Total population at time ", t, " is ", sum(N)))
# calculate irrate_matrix by checking time t
irrate_matrix = matrix(data = 0, nrow = m, ncol = n)
for (i in 1:length(tbegin)){
if (t >= tbegin[i] & t <= tend[i]){
for (j in 1:n){
irrate_matrix[, j] = -log(1-covir[(j-1) * length(tbegin)+i])/duration[i]
}
}
}
# derivatives
dS = b*N - irrate_matrix*S - mu*S
dI = irrate_matrix*S - gammaS*I - mu*I
dR = gammaS*I - mu*R
list(c(dS, dI, dR))
})
}
Benchmark
Each model version is run 10 times. Model sir_basic is the original implementation, where print line was disabled for a fair comparison.
system.time(
for(i in 1:10)
traj <- ode(func = sir_basic,
y = inits,
parms = parameters,
times = times)
)
system.time(
for(i in 1:10)
traj2 <- ode(func = sir_basic2,
y = inits,
parms = parameters,
times = times)
)
plot(traj, traj2)
summary(traj - traj2)
I observed another considerable speedup, when I use method="adams" instead of the default lsoda solver, but this may differ for your full model.