I am developing an agent-based model to simulate the spread of infectious diseases in heterogeneous landscapes composed of habitat polygons (or clumps of connected cells). To simplify the model, I consider a habitat grid (or raster) containing the polygon ID of each cell. In addition, I have epidemiological parameters associated with each polygon ID. At each time step, the parameter values change in the polygon. Thus, the data frame landscape (see below) is updated at each time step. Here is an example at t = 0:
landscape <- data.frame(polygon_ID = seq(1, 10, by = 1),
beta = sample(c(100, 200, 400, 600), 10, replace = TRUE),
gamma = sample(c(25, 26, 27, 28), 10, replace = TRUE))
To study the disease dynamics, I also am developing a compartmental model based on a system of ordinary differential equations (ODEs). Here is an example to represent the system of ODEs:
solve_sir_model <- function (times, parameters) {
sir_model <- function (times, states, parameters) {
with(as.list(c(states, parameters)), {
dSdt <- -beta*S*I
dIdt <- beta*S*I-gamma*I
dRdt <- gamma*I
dNdt <- dSdt + dIdt + dRdt
return(list(c(dSdt, dIdt, dRdt, dNdt)))
})
}
states <- c(S = 99, I = 1, R = 0, N = 100)
return(ode(y = states, times = times, func = sir_model, parms = parameters))
}
require(deSolve)
output <- as.data.frame(solve_sir_model(times = seq(0, 5, by = 1), parameters = c(beta = 400, gamma = 28)))
At each time step, is it possible to apply the system of ODEs to each habitat polygon (thus each row) in the data frame landscape? I am using lsoda as an ODE solver. Do I need to use another solver to apply the ODEs at each time step?
EDIT
It seems that the method iteration in the function ode can be useful in my case:
Method "iteration" is special in that here the function func should
return the new value of the state variables rather than the rate of
change. This can be used for individual based models, for difference
equations, or in those cases where the integration is performed within
func).
I have tested the method but I don't understand why it doesn't work with a single time step:
solve_sir_model <- function (times, parameters) {
sir_model <- function (times, states, parameters) {
with(as.list(c(states, parameters)), {
dSdt <- -beta*S*I
dIdt <- beta*S*I-gamma*I
dRdt <- gamma*I
dNdt <- dSdt + dIdt + dRdt
return(list(c(dSdt, dIdt, dRdt, dNdt)))
})
}
states <- c(S = 99, I = 1, R = 0, N = 100)
return(ode(y = states, times = times, func = sir_model, parms = parameters, method = "iteration"))
}
require(deSolve)
output <- as.data.frame(solve_sir_model(times = 1, parameters = c(beta = 400, gamma = 28)))
Error in iteration(y, times, func, parms, ...) :
times should be equally spaced In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Something like this.
# Library *not* require
library(deSolve)
# Parameters: number of polygons, beta, & gamma
n.polys <- 10
beta <- sample(c(100, 200, 400, 600), n.polys, replace = TRUE)
gamma <- sample(c(25, 26, 27, 28), n.polys, replace = TRUE)
# Model defintion
sir_model <- function (t, Y, pars) {
# Break up state variable into parts
S <- Y[1:n.polys]
I <- Y[(n.polys+1):(2 * n.polys)]
R <- Y[(2 * n.polys+1):(3 * n.polys)]
# Calculate rate of change
dSdt <- -beta * S * I
dIdt <- beta * S * I - gamma * I
dRdt <- gamma * I
# Return rates of change after concatenating them
return(list(c(dSdt, dIdt, dRdt)))
}
# Initial conditions
Y.ini <- c(S = rep(99, n.polys), I = rep(1, n.polys), R = rep(0, n.polys))
# Solve the model
ode(y = Y.ini, times = 0:5, func = sir_model, parms = NULL)
Related
I am using the deSolve package to solve a differential equation describing predator-prey dynamics. As an example, below is a simple L-V predator-prey model. I would like some of the parameters in the model to vary through time. I can vary state variable (e.g. prey density) no problem using the event argument in the ode function.
But I cannot use the event argument to alter parameters.
Here is the simple L-V model with no events added (works fine)
# Lotka-Volterra Predator-Prey model from ?deSolve::ode
# define model
LVmod <- function(Time, State, Pars) {
with(as.list(c(State, Pars)), {
Ingestion <- rIng * Prey * Predator
GrowthPrey <- rGrow * Prey * (1 - Prey/K)
MortPredator <- rMort * Predator
dPrey <- GrowthPrey - Ingestion
dPredator <- Ingestion * assEff - MortPredator
return(list(c(dPrey, dPredator)))
})
}
# parameters
pars <- c(rIng = 0.2, # rate of ingestion
rGrow = 1.0, # growth rate of prey
rMort = 0.2 , # mortality rate of predator
assEff = 0.5, # assimilation efficiency
K = 10) # carrying capacity
# initial densities (state variables)
yini <- c(Prey = 1, Predator = 2)
# time steps
times <- seq(0, 200, by = 1)
# run model
out <- ode(yini, times, LVmod, pars)
## plot
plot(out)
Here is the L-V model with state variable Prey multiplied by some rnorm()every 5 timesteps (works fine).
# add prey every 5 timesteps using events
add_prey <- function(t, var, parms){
with(as.list(var),{
Prey <- Prey * rnorm(1, 1, 0.05)
return(c(Prey, Predator))
})
}
# run ode - works fine
out <- ode(y = yini,
times = times,
func = LVmod,
parms = pars,
method = "ode45",
events = list(func = add_prey, time = seq(0, 200, by = 5)))
plot(out)
Here is my attempt to increase K every 5 timesteps (does not work)
# vary K through time
add_k <- function(t, var, parms){
with(as.list(var),{
K <- K + 2
return(c(Prey, Predator))
})
}
# run ode
out <- ode(y = yini,
times = times,
func = LVmod,
parms = pars,
method = "ode45",
events = list(func = add_k, time = seq(0, 200, by = 5)))
Which produces this error:
Error in eval(substitute(expr), data, enclos = parent.frame()) :
object 'K' not found
Based on the error K is not being passed to add_k, in add_k the line with(as.list(var) is obviously only accessing the variables Prey and Predator. In the ode and event helpfiles I can only find information regarding altering state variables (Prey and Predator in this case), and no information about altering other parameters. I am new to ODEs, so maybe I am missing something obvious. Any advice would be much appreciated.
Events are used to modify state variables. This means that at each event time, the ode solver is stopped, then states are modified and then the solver is restarted. In case of time-dependent *parameters, this can be done much easier without events. We call this a "forcing function" (or forcing data).
A modified version of the original code is shown below. Here a few explanations:
approxfun is a function that returns a function. We name it K_t that interpolates between the data timesteps t and the parameter values K. The argument rule = 2 is important. It describes how interpolation is to take place outside the range of t, where 2means that the closest value is returned, because some solvers overshoot the end of the simulation and then interpolate back. The method can be constant or linear, whatever is more appropriate.
The interpolation function of the variable model parameter K_t can be added to the function as an optional argument at the end. It is also possible to define it globally or to include it in parmsif it is defined as a list, not a vector.
The LVmod function checks if this additional parameter exists and if yes overwrites the default of K.
At the end of the LVmodfunction we return the actual value of K as an optional (auxiliary) variable, so that it is included in the model output.
library(deSolve)
LVmod <- function(Time, State, Pars, K_t) {
with(as.list(c(State, Pars)), {
if (!is.null(K_t)) K <- K_t(Time)
Ingestion <- rIng * Prey * Predator
GrowthPrey <- rGrow * Prey * (1 - Prey/K)
MortPredator <- rMort * Predator
dPrey <- GrowthPrey - Ingestion
dPredator <- Ingestion * assEff - MortPredator
list(c(dPrey, dPredator), K = K)
})
}
pars <- c(rIng = 0.2, # rate of ingestion
rGrow = 1.0, # growth rate of prey
rMort = 0.2 , # mortality rate of predator
assEff = 0.5, # assimilation efficiency
K = 10) # carrying capacity
yini <- c(Prey = 1, Predator = 2)
times <- seq(0, 200, by = 1)
# run model with constant parameter K
out <- ode(yini, times, LVmod, pars, K_t=NULL)
## make K a function of t with given time steps
t <- seq(0, 200, by = 5)
K <- cumsum(c(10, rep(2, length(t) - 1)))
K_t <- approxfun(t, K, method = "constant", rule = 2)
out2 <- ode(yini, times, LVmod, pars, K_t=K_t)
plot(out, out2, mfrow=c(1, 3))
I am working with the R programming language. I am trying to learn about different optimization algorithms such as the "Genetic Algorithm" (e.g. https://cran.r-project.org/web/packages/GA/vignettes/GA.html, https://cran.r-project.org/web/packages/GA/index.html) vs. the "Evolutionary Grammar Algorithm" (e.g. https://cran.r-project.org/web/packages/gramEvol/gramEvol.pdf , https://www.jstatsoft.org/article/view/v071i01).
For instance, I can use the Genetic Algorithm to optimize the following function ("Rastrigin") :
#PART 1: Optimize Rastrigin Function with the Genetic Algorithm
library(GA)
#define function
Rastrigin <- function(x1, x2)
{
20 + x1^2 + x2^2 - 10*(cos(2*pi*x1) + cos(2*pi*x2))
}
#plot
x1 <- x2 <- seq(-5.12, 5.12, by = 0.1)
f <- outer(x1, x2, Rastrigin)
persp3D(x1, x2, f, theta = 50, phi = 20, col.palette = bl2gr.colors)
#run optimization algorithm and plot results
GA <- ga(type = "real-valued",
fitness = function(x) -Rastrigin(x[1], x[2]),
lower = c(-5.12, -5.12), upper = c(5.12, 5.12),
popSize = 50, maxiter = 1000, run = 100)
plot(GA)
summary(GA)
Fitness function value = -2.502466e-07
Solution =
x1 x2
[1,] 3.341508e-05 1.203355e-05
Problem: However, I am running into errors when I try to use the "Evolutionary Grammar Algorithm" to optimize the same function:
#PART 2: Optimize Rastrigin Function with the Evolutionary Grammar Algorithm:
library(gramEvol)
ruleDef <- list(expr = gsrule("<der.expr><op><der.expr>"),
der.expr = grule(func(var), var),
func = grule(log, exp, sin, cos),
op = gsrule("+", "-", "*"),
var = grule(x1, x2, n),
n = grule(1, 2, 3, 4))
# Creating the grammar object
grammarDef <- CreateGrammar(ruleDef)
#redine the same function in a format acceptable to the "gramEvol" library
Rastrigin <- function(expr) {
# expr: a string containing a symbolic expression
# returns: Symbolic regression Error
x1 <- c(5.12, 5.12)
x2 <- c(5.12, 5.12)
result <- eval(as.expression(expr))
err <- 20 + x1^2 + x2^2 - 10*(cos(2*pi*x1) + cos(2*pi*x2))
return(err)
}
#run optimization (problem)
ge <- GrammaticalEvolution(grammarDef, evalFunc, terminationCost = 0.001)
# print results (problem)
print(ge, sequence = TRUE)
The last two lines of code return the following errors:
Error in EvolutionStrategy.int(genomeLen = chromosomeLen, codonMin = 0, :
Invalid cost function return value (NA or NaN).
In addition: Warning messages:
1: In result - X :
longer object length is not a multiple of shorter object length
2: In result - X :
longer object length is not a multiple of shorter object length
#etc
Question: Can someone please show me how to fix this problem? What is result - X ? What exactly is longer object length is not a multiple of shorter object length referring to?
Thanks!
Source: https://rdrr.io/cran/gramEvol/man/GrammaticalEvolution.html
I am trying to establish a method of estimating infectious disease parameters by comparing real epidemic curves with simulations of a stochastic SIR model. To construct the stochastic SIR model, I am using the deSolve package and instead of using fixed parameter values I would like to draw the parameter value used in the equations at each time point from a Poisson distribution centered on the original parameter values.
Using the parameter beta as an example, beta represents the average number of transmission events per capita and is the product of the average number of contacts and the probability that transmission occurs upon contact. Realistically, there is variation in the number of contacts a person will have and since transmission is also a probabilistic event there is variation surrounding this too.
So even if the average transmission rate were to be 2.4 (for example), an individual can go on to infect 0, 1, 2 or 3 ... etc. people with varying probabilities.
I have tried to incorporate this into my code below using the rpois function and reassigning the parameters used in the equations to the outputs of the rpois.
I have run my code with the same initial values and parameters multiple times and all the curves are different indicating that SOMETHING "stochastic" is going on, but I am unsure whether the code is sampling using the rpois at each time point or just once at the beginning. I have only started coding very recently so do not have much experience.
I would be grateful if anyone more experienced than myself could verify what my code is ACTUALLY doing and whether it is sampling using rpois at each time point or not. If not I would be grateful for any suggestions for achieving this. Perhaps a loop is needed?
library('deSolve')
library('reshape2')
library('ggplot2')
#MODEL INPUTS
initial_state_values <- c(S = 10000,
I = 1,
R = 0)
#PARAMETERS
parameters <- c(beta = 2.4,
gamma = 0.1)
#POISSON MODELLING OF PARAMETERS
#BETA
beta_p <- rpois(1, parameters[1])
#GAMMA
infectious_period_p <- rpois(1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
#TIMESTEPS
times <- seq(from = 0, to = 50,by = 1)
#SIR MODEL FUNCTION
sir_model <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
N <- S + I + R
lambda <- beta_p * I/N
dS <- -lambda * S
dI <- lambda*S - gamma_p*I
dR <- gamma_p*I
return(list(c(dS, dI, dR)))
})
}
output<- as.data.frame(ode(y= initial_state_values,
times = times,
func = sir_model,
parms = parameters))
The code given in the question runs the model with constant parameters over time. Here an example with parameters varying over time. However, this setting assumes that for a given time step, the parameters are equal for all indidividuals of the population. If you want to have individual variability, one can either use a matrix formulation for different sub-populations or use an individual model instead.
Model with fluctuating population parameters:
library('deSolve')
initial_state_values <- c(S = 10000,
I = 1,
R = 0)
parameters <- c(beta = 2.4, gamma = 0.1)
times <- seq(from = 0, to = 50, by = 1) # note time step = 1!
# +1 to add one for time = zero
beta_p <- rpois(max(times) + 1, parameters[1])
infectious_period_p <- rpois(max(times) + 1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
sir_model <- function(time, state, parameters) {
# cat(time, "\n") # show time steps for debugging
with(as.list(c(state, parameters)), {
# this overwrites the parms passed via parameters
beta <- beta_p[floor(time) + 1]
gamma <- gamma_p[floor(time) + 1]
N <- S + I + R
lambda <- beta * I/N
dS <- -lambda * S
dI <- lambda * S - gamma * I
dR <- gamma * I
list(c(dS, dI, dR))
})
}
output <- ode(y = initial_state_values,
times = times,
func = sir_model,
parms = parameters)
plot(output)
Here another, slightly more generalized version. It is added as a second answer, to keep the original version compact and simple. The new version differs with respect to the following:
generalized, so that it can work with fixed parameters and stochastic forcing
pass parameters as list
run a basic Monte-Carlo simulation
library('deSolve')
sir_model <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
# this overwrites the parms passed via parameters
if (time_dependent) {
beta <- beta_p[floor(time) + 1]
gamma <- gamma_p[floor(time) + 1]
}
N <- S + I + R
lambda <- beta * I/N
dS <- -lambda * S
dI <- lambda * S - gamma * I
dR <- gamma * I
list(c(dS, dI, dR))
})
}
initial_state_values <- c(S = 10000, I = 1, R = 0)
times <- seq(from = 0, to = 50, by = 1) # note time step = 1!
## (1) standard simulation with constant parameters
parameters <- c(beta = 2.4, gamma = 0.1)
out0 <- ode(y= initial_state_values,
times = times,
func = sir_model,
parms = c(parameters, time_dependent = FALSE))
plot(out0)
## (2) single simulation with time varying parameters
beta_p <- rpois(max(times) + 1, parameters[1])
infectious_period_p <- rpois(times + 1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
## here we need pass the vectorized parameters globally
## for simplicity, it can also be done as list
out1 <- ode(y = initial_state_values, times = times,
func = sir_model, parms = c(time_dependent = TRUE))
plot(out0, out1)
## (3) a sample of simulations
monte_carlo <- function(i) {
#parameters <- c(beta = 2.4, gamma = 0.1)
beta_p <- rpois(max(times) + 1, parameters[1])
infectious_period_p <- rpois(max(times) + 1, 1 / (parameters[2]))
gamma_p <- 1/infectious_period_p
ode(y = initial_state_values, times = times,
func = sir_model, parms = list(beta_p = beta_p,
gamma_p = gamma_p,
time_dependent = TRUE))
}
## run 10 simulations
out_mc <- lapply(1:10, monte_carlo)
plot(out0, out_mc, mfrow=c(1, 3))
I am trying to simulate the transmission of viruses in a population using the function ode from the deSolve package. The basic of my model is a SIR model and I posted a much simpler demo of my model here, which consists of only three states S(susceptible), I(infectious) and R(recovered). Each state is represented by a m*n matrix in my code, since I have m age groups and n subpopulations in my population.
The problem is: during the simulation period, there will be several vaccination activities that transfer people in state S to state I. Each vaccination activity is characterized by a begin date, an end date, its coverage rate and duration. What I want to do is once the time t falls into the interval of begin date and end date of one vaccination activity, the code calculates the effective vaccination rate (also a m*n matrix, based on coverage rate and duration) and times it with S (m*n matrix), to get a matrix of people transited to state I. Right now, I am using if() to decide if time t is between a begin date and a end date:
#initialize the matrix of effective vaccination rate
irrate_matrix = matrix(data = rep(0, m*n), nrow = m, ncol = n)
for (i in 1:length(tbegin)){
if (t>=tbegin[i] & t<=tend[i]){
for (j in 1:n){
irrate_matrix[, j] = -log(1-covir[(j-1)*length(tbegin)+i])/duration[i]
}
}
}
Here, irrate_matrix is the m*n effective vaccination rate matrix, m = 2 is the number of age groups, n = 2 is the number of subpopulations, tbegin = c(5, 20, 35) is the begin date of 3 vaccination activities, tend = c(8, 23, 38) is the end date of 3 vaccination activities, covir = c(0.35, 0.25, 0.25, 0.225, 0.18, 0.13) is the coverage rate of each vaccination for each subpopulation (e.g., covir[1] = 0.35 is the coverage rate of the first vaccination for subpopulation1, while covir[4] = 0.225 is the coverage rate of the first vaccination for subpopulation2) and duration = c(4, 4, 4) is the duration of each vaccination (in days).
After calculating irrate_matrix, I take it into derivatives and therefore I have:
dS = as.matrix(b*N) - as.matrix(irrate_matrix*S) - as.matrix(mu*S)
dI = as.matrix(irrate_matrix*S) - as.matrix(gammaS*I) - as.matrix(mu*I)
dR = as.matrix(gammaS*I) - as.matrix(mu*R)
I want to do a simulation from day 0 to day 50, by 1-day step, thus:
times = seq(0, 50, 1)
The current issue with my code is: every time the time t comes to a time point close to a tbegin[i] or tend[i], the simulation becomes much slower since it iterates at this time point for much more rounds than at any other time point. For example, once the time t comes to tbegin[1] = 5, the model iterates at time point 5 for many rounds. I attached screenshots from printing out those iterations (screenshot1 and screenshot2). I find this is why my bigger model takes a very long running time now.
I have tried using the "events" function of deSolve mentioned by tpetzoldt in this question stackoverflow: change the value of a parameter as a function of time. However, I found it's inconvenient for me to change a matrix of parameters and change it every time there is a vaccination activity.
I am looking for solutions regarding:
How to change my irrate_matrix to non-zero matrix when there is a vaccination activity and let it be zero matrix when there is no vaccination? (it has to be calculated for each vaccination)
At the same time, how to make the code run faster by avoiding iterating at any tbegin[i] or tend[i] for many rounds? (I think I should not use if() but I do not know what I should do with my case)
If I need to use "forcing" or "events" function, could you please also tell me how to have multiple "forcing"/"events" in the model? Right now, I have had an "events" used in my bigger model to introduce a virus to the population, as:
virusevents = data.frame(var = "I1", time = 2, value = 1, method = "add")
Any good idea is welcome and directly providing some codes is much appreciated! Thank you in advance!
For reference, I post the whole demo here:
library(deSolve)
##################################
###(1) define the sir function####
##################################
sir_basic <- function (t, x, params)
{ # retrieve initial states
S = matrix(data = x[(0*m*n+1):(1*m*n)], nrow = m, ncol = n)
I = matrix(data = x[(1*m*n+1):(2*m*n)], nrow = m, ncol = n)
R = matrix(data = x[(2*m*n+1):(3*m*n)], nrow = m, ncol = n)
with(as.list(params), {
N = as.matrix(S + I + R)
# print out current iteration
print(paste0("Total population at time ", t, " is ", sum(N)))
# calculate irrate_matrix by checking time t
irrate_matrix = matrix(data = rep(0, m*n), nrow = m, ncol = n)
for (i in 1:length(tbegin)){
if (t>=tbegin[i] & t<=tend[i]){
for (j in 1:n){
irrate_matrix[, j] = -log(1-covir[(j-1)*length(tbegin)+i])/duration[i]
}
}
}
# derivatives
dS = as.matrix(b*N) - as.matrix(irrate_matrix*S) - as.matrix(mu*S)
dI = as.matrix(irrate_matrix*S) - as.matrix(gammaS*I) - as.matrix(mu*I)
dR = as.matrix(gammaS*I) - as.matrix(mu*R)
derivatives <- c(dS, dI, dR)
list(derivatives)
})
}
##################################
###(2) characterize parameters####
##################################
m = 2 # the number of age groups
n = 2 # the number of sub-populations
tbegin = c(5, 20, 35) # begin dates
tend = c(8, 23, 38) # end dates
duration = c(4, 4, 4) # duration
covir = c(0.35, 0.25, 0.25, 0.225, 0.18, 0.13) # coverage rates
b = 0.0006 # daily birth rate
mu = 0.0006 # daily death rate
gammaS = 0.05 # transition rate from I to R
parameters = c(m = m, n = n,
tbegin = tbegin, tend = tend, duration = duration, covir = covir,
b = b, mu = mu, gammaS = gammaS)
##################################
#######(3) initial states ########
##################################
inits = c(
S = c(20000, 40000, 10000, 20000),
I = rep(0, m*n),
R = rep(0, m*n)
)
##################################
#######(4) run simulations########
##################################
times = seq(0, 50, 1)
traj <- ode(func = sir_basic,
y = inits,
parms = parameters,
times = times)
plot(traj)
Element wise operations are the same for matrices and vectors, so the as.matrix conversions are redundant, as no true matrix multiplication is used. Same with the rep: the zero is recycled anyway.
In effect, CPU time reduces already to 50%. In contrast, use of an external forcing with approxTime instead of the inner if and for made the model slower (not shown).
Simplified code
sir_basic2 <- function (t, x, params)
{ # retrieve initial states
S = x[(0*m*n+1):(1*m*n)]
I = x[(1*m*n+1):(2*m*n)]
R = x[(2*m*n+1):(3*m*n)]
with(as.list(params), {
N = S + I + R
# print out current iteration
#print(paste0("Total population at time ", t, " is ", sum(N)))
# calculate irrate_matrix by checking time t
irrate_matrix = matrix(data = 0, nrow = m, ncol = n)
for (i in 1:length(tbegin)){
if (t >= tbegin[i] & t <= tend[i]){
for (j in 1:n){
irrate_matrix[, j] = -log(1-covir[(j-1) * length(tbegin)+i])/duration[i]
}
}
}
# derivatives
dS = b*N - irrate_matrix*S - mu*S
dI = irrate_matrix*S - gammaS*I - mu*I
dR = gammaS*I - mu*R
list(c(dS, dI, dR))
})
}
Benchmark
Each model version is run 10 times. Model sir_basic is the original implementation, where print line was disabled for a fair comparison.
system.time(
for(i in 1:10)
traj <- ode(func = sir_basic,
y = inits,
parms = parameters,
times = times)
)
system.time(
for(i in 1:10)
traj2 <- ode(func = sir_basic2,
y = inits,
parms = parameters,
times = times)
)
plot(traj, traj2)
summary(traj - traj2)
I observed another considerable speedup, when I use method="adams" instead of the default lsoda solver, but this may differ for your full model.
I'm trying to run a MLE for an infectious disease compartmental transmission model (SEIR, in my case SSEIR) with the mle2 command, trying to fit a curve of predicted number of weekly deaths to that of observed weekly deaths similar to this:
plot of predicted vs observed weekly deaths.
However, the parameter estimates seem to always be on the (sensible) boundaries I provide and SEs, z-values, p-values are NA.
I set up the SEIR model and then solve it with the ode solver. Using that model output and the observed data, I calculate a negative log likelihood, which I then submit to the mle2 function.
When I first set it up, there were multiple errors that stopped the script from running, but now that those are resolved, I cannot seem to find the root of why the fitting doesn't work.
I am certain that the boundaries I set for the parameter estimation are sensible. The parameters are transition rates between compartments and are therefore defined as (for example) delta = 1/duration of infectiousness, so there are very real biological boundaries on what the parameters can be.
I am aware that I am trying to fit a lot of parameters with not that much data, but the same problem persists when I try only fitting one, so that cannot be the root of it.
library(deSolve)
library(bbmle)
#data
gdta <- c(0, 36.2708172419082, 1.57129615346629, 28.1146409459558, 147.701669719614, 311.876708482584, 512.401145459178, 563.798275104372, 470.731269976821, 292.716043742125, 153.604156195608, 125.760068922451, 198.755685044427, 143.847282793854, 69.2693867232681, 42.2093135487066, 17.0200426587424)
#build seir function
seir <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
dS0 <- - beta0 * S0 * (I/N)
dS1 <- - beta1 * S1 * (I/N)
dE <- beta0 * S0 * (I/N) + beta1 * S1 * (I/N) - delta * E
dI <- delta * E - gamma * I
dR <- gamma * I
return(list(c(dS0, dS1, dE, dI, dR)))
})
}
# build function to run seir, include ode solver
run_seir <- function(time, state, beta0, beta1, delta, gamma, sigma, N, startInf) {
parameters <- c(beta0, beta1, delta, gamma)
names(parameters) <- c("beta0", "beta1", "delta", "gamma")
init <- c(S0 = (N - startInf)*(sigma) ,
S1 = (N - startInf) * (1-sigma),
E = 0,
I = startInf,
R = 0)
state_est <- as.data.frame(ode(y = init, times = times, func = seir, parms = parameters))
return(state_est)
}
times <- seq(0, 16, by = 1) #sequence
states <- c("S0", "S1", "E", "I", "R")
# run the run_seir function to see if it works
run_seir(time = times, state= states, beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)), delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7, N = 1114100, startInf = 100)
#build calc likelihood function
calc_likelihood <- function(times, state, beta0, beta1, delta, gamma, sigma, N, startInf, CFR) {
model.output <- run_seir(time, state, beta0, beta1, delta, gamma, sigma, N, startInf)
LL <- sum(dpois(round(as.numeric(gdta)), (model.output$I)/(1/delta)*CFR, log = TRUE))
print(LL)
return(LL)
}
# run calc_likelihood function
calc_likelihood(time = times, state = states, beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)), delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7, N = 1114100, startInf = 100, CFR = 0.02)
#MLE
#parameters that are supposed to be fixed
fixed.pars <- c(N=1114100, startInf=100, CFR = 0.02)
#parameters that mle2 is supposed to estimate
free.pars <- c(beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)),
delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7)
#lower bound
lower_v <- c(beta0 = 0, beta1 = 0, delta = 0, gamma = 0, sigma = 0)
#upper bound
upper_v <- c(beta0 = 15, beta1 = 15, delta = 15, gamma = 15, sigma = 1)
#sigma = 1, this is not a typo
#mle function - need to use L-BFGS-B since we need to include boundaries
test2 <- mle2(calc_likelihood, start = as.list(free.pars), fixed = as.list(fixed.pars),method = "L-BFGS-B", lower = lower_v, upper = upper_v)
summary(test2)
After I run mle2, I get a warning saying:
Warning message:
In mle2(calc_likelihood, start = as.list(free.pars), fixed = as.list(fixed.pars), :
some parameters are on the boundary: variance-covariance calculations based on Hessian may be unreliable
and if I look at summary(test2):
Warning message:
In sqrt(diag(object#vcov)) : NaNs produced
Based on the research I've done so far, I understand that the second error might be due to the estimates being on the boundaries, so my question really is how to address the first one.
If I run mle2 with only lower boundaries, I get parameter estimates in the millions, which cannot be correct.
I am fairly certain that my model specification for the SEIR is correct, but after staring at this code and trying to resolve this issue for a week, I'm open to any input on how to make the fitting work.
Thanks,
JJ