I am trying to estimate parameters for an ODE model in R using nls (and later nlme). My testing code gives me error messages.
library(deSolve)
seed=2423
dat2<-data.frame(days=(runif(20)+1)*10, X1=runif(20), X2=runif(20))
dat2$y<-0.4*exp(dat2$X1)+0.6*exp(dat2$X2)+rnorm(20, sd=0.3)
# example intentionally simple; I would usually solve it analytically
#***************************************************
#*Model definition
#***************************************************
decomp <- function(t, state, parameters){
  with(as.list(c(state, parameters)), {
    dX1 <- a1*X1
    dX2 <- a2*X2 #+a1*X1
    list(c(dX1, dX2))
  })
}
# Testing the code to demonstrate that it works
parameters<-c(a1=0.05,a2=0.05)
state<-c(X1=0.5, X2=0.5)
times<-seq(0,100, by=1)
out<-ode(y=state, times=times, func=decomp, parms=parameters)
out[100,]
#*****************************
#* Wrapper function to be passed to nls or nlme
#**************************************
calcdecom <- function(a1, a2, t, x1, x2)
{
  state <- c(X1 = x1, X2 = x2)
  times <- c(0, t)
  parameters <- c(a1 = a1, a2 = a2)
  out <- ode(y = state, times = times, func = decomp, parms = parameters)
  return(as.numeric(out[2, 2] + out[2, 3]))
}
# ******************** test
calcdecom(0.1,0.1,5,0.3,0.3)
test<-nls(y~calcdecom(a1, a2, days, X1, X2 ),
start=list(a1=0.02, a2=0.4), data=dat2)
My error message for the nls function is:
Error in lsoda(y, times, func, parms, ...) :
illegal input detected before taking any integration steps - see written message
Here is a possible approach. One remaining question is whether the data should really come from independent cases (simulations or measurements) rather than from a time series. If the former is intended, then ode must be called for each case separately. This can be done in a for-loop or with an apply-type function (a sketch with mapply is shown below, after the wrapper-function test). Note also the corrected set.seed call, the added plot calls, and the reduced standard deviation in the data-generating process. It seems to me that the ODE model and the data-generating process do not match yet, so the fitted parameters differ. It would be nice if the OP could post a corrected version by editing the question.
library("deSolve")
## use set.seed to make example reproducible
set.seed(2423)
## simulated data
dat2 <- data.frame(
days = (runif(20) + 1) + 10,
X1 = runif(20),
X2 = runif(20)
)
## reduced error for testing
dat2$y <- 0.4 * exp(dat2$X1) + 0.6 * exp(dat2$X2) + rnorm(20, sd = 0.1)
plot(dat2)
## two-pool model: dX1/dt = a1*X1, dX2/dt = a2*X2 + a1*X1
decomp <- function(t, state, parameters){
  with(as.list(c(state, parameters)), {
    dX1 <- a1 * X1
    dX2 <- a2 * X2 + a1 * X1
    list(c(dX1, dX2))
  })
}
## test the ode model to demonstrate that it works
parameters <- c(a1 = 0.05, a2 = 0.05)
state <- c(X1 = 0.5, X2 = 0.5)
times <- seq(0, 100, by = 1)
out <- ode(y = state, times = times, func = decomp, parms = parameters)
plot(out)
out[100,]
## Wrapper function to be passed to nls or nlme
calcdecom <- function(a1, a2, t, x1, x2) {
  ret <- numeric(length(t))
  parameters <- c(a1 = a1, a2 = a2)
  for (i in seq_along(t)) {   # one ode() call per case
    times <- c(0, t[i])
    state <- c(X1 = x1[i], X2 = x2[i])
    out <- ode(y = state, times = times, func = decomp, parms = parameters)
    ret[i] <- out[2, 2] + out[2, 3]
  }
  return(ret)
}
## test wrapper function
calcdecom(0.1, 0.1, 5, 0.3, 0.3)
## with time and state as vectors
calcdecom(0.1, 0.1, 1:10, dat2$X1, dat2$X2)
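As mentioned above, the for-loop can also be replaced by an apply-type construct. A minimal sketch using mapply, equivalent to the loop version (calcdecom2 is just an illustrative name):
calcdecom2 <- function(a1, a2, t, x1, x2) {
  parameters <- c(a1 = a1, a2 = a2)
  one_case <- function(ti, x1i, x2i) {
    out <- ode(y = c(X1 = x1i, X2 = x2i), times = c(0, ti),
               func = decomp, parms = parameters)
    out[2, 2] + out[2, 3]
  }
  mapply(one_case, t, x1, x2)
}
calcdecom2(0.1, 0.1, dat2$days, dat2$X1, dat2$X2)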
test <- nls(y ~ calcdecom(a1, a2, days, X1, X2 ),
start = list(a1 = 0.02, a2 = 0.4), data = dat2)
summary(test)
Instead of nls one may also consider modFit from package FME, which has more flexibility for this kind of model. Details are found in the package vignettes and the following JSS paper: https://doi.org/10.18637/jss.v033.i03
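A minimal sketch of how modFit could be wired up for this example, assuming the decomp and calcdecom definitions above; the residual-based cost function and the object names (cost, fit_fme) are illustrative:
library(FME)
cost <- function(p) {
  ## modFit minimises the sum of squared residuals returned here
  dat2$y - calcdecom(p["a1"], p["a2"], dat2$days, dat2$X1, dat2$X2)
}
fit_fme <- modFit(f = cost, p = c(a1 = 0.02, a2 = 0.4))
summary(fit_fme)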
I am trying to establish a method of estimating infectious disease parameters by comparing real epidemic curves with simulations of a stochastic SIR model. To construct the stochastic SIR model, I am using the deSolve package and instead of using fixed parameter values I would like to draw the parameter value used in the equations at each time point from a Poisson distribution centered on the original parameter values.
Using the parameter beta as an example, beta represents the average number of transmission events per capita and is the product of the average number of contacts and the probability that transmission occurs upon contact. Realistically, there is variation in the number of contacts a person will have and since transmission is also a probabilistic event there is variation surrounding this too.
So even if the average transmission rate were to be 2.4 (for example), an individual can go on to infect 0, 1, 2 or 3 ... etc. people with varying probabilities.
I have tried to incorporate this into my code below using the rpois function and reassigning the parameters used in the equations to the outputs of the rpois.
I have run my code with the same initial values and parameters multiple times and all the curves are different indicating that SOMETHING "stochastic" is going on, but I am unsure whether the code is sampling using the rpois at each time point or just once at the beginning. I have only started coding very recently so do not have much experience.
I would be grateful if anyone more experienced than myself could verify what my code is ACTUALLY doing and whether it is sampling using rpois at each time point or not. If not I would be grateful for any suggestions for achieving this. Perhaps a loop is needed?
library('deSolve')
library('reshape2')
library('ggplot2')
#MODEL INPUTS
initial_state_values <- c(S = 10000,
I = 1,
R = 0)
#PARAMETERS
parameters <- c(beta = 2.4,
gamma = 0.1)
#POISSON MODELLING OF PARAMETERS
#BETA
beta_p <- rpois(1, parameters[1])
#GAMMA
infectious_period_p <- rpois(1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
#TIMESTEPS
times <- seq(from = 0, to = 50,by = 1)
#SIR MODEL FUNCTION
sir_model <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    N <- S + I + R
    lambda <- beta_p * I/N
    dS <- -lambda * S
    dI <- lambda * S - gamma_p * I
    dR <- gamma_p * I
    return(list(c(dS, dI, dR)))
  })
}
output<- as.data.frame(ode(y= initial_state_values,
times = times,
func = sir_model,
parms = parameters))
The code given in the question runs the model with constant parameters over time. Here is an example with parameters varying over time. However, this setting assumes that, for a given time step, the parameters are equal for all individuals of the population. If you want individual variability, you can either use a matrix formulation for different sub-populations or use an individual-based model instead.
Model with fluctuating population parameters:
library('deSolve')
initial_state_values <- c(S = 10000,
I = 1,
R = 0)
parameters <- c(beta = 2.4, gamma = 0.1)
times <- seq(from = 0, to = 50, by = 1) # note time step = 1!
# +1 to add one for time = zero
beta_p <- rpois(max(times) + 1, parameters[1])
infectious_period_p <- rpois(max(times) + 1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
sir_model <- function(time, state, parameters) {
  # cat(time, "\n") # show time steps for debugging
  with(as.list(c(state, parameters)), {
    # this overwrites the parms passed via parameters
    beta <- beta_p[floor(time) + 1]
    gamma <- gamma_p[floor(time) + 1]
    N <- S + I + R
    lambda <- beta * I/N
    dS <- -lambda * S
    dI <- lambda * S - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}
output <- ode(y = initial_state_values,
times = times,
func = sir_model,
parms = parameters)
plot(output)
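To convince yourself that the forcing really changes between runs, draw a fresh set of beta_p and gamma_p and compare the two trajectories (a quick check, assuming the code above has been run; output2 is an illustrative name):
beta_p  <- rpois(max(times) + 1, parameters[1])
gamma_p <- 1 / rpois(max(times) + 1, 1 / parameters[2])
output2 <- ode(y = initial_state_values, times = times,
               func = sir_model, parms = parameters)
plot(output, output2)  # the trajectories differ because the forcing differs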
Here is another, slightly more generalized version. It is added as a second answer to keep the original version compact and simple. The new version differs in the following respects:
generalized, so that it can work with fixed parameters and stochastic forcing
pass parameters as list
run a basic Monte-Carlo simulation
library('deSolve')
sir_model <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    # this overwrites the parms passed via parameters
    if (time_dependent) {
      beta <- beta_p[floor(time) + 1]
      gamma <- gamma_p[floor(time) + 1]
    }
    N <- S + I + R
    lambda <- beta * I/N
    dS <- -lambda * S
    dI <- lambda * S - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}
initial_state_values <- c(S = 10000, I = 1, R = 0)
times <- seq(from = 0, to = 50, by = 1) # note time step = 1!
## (1) standard simulation with constant parameters
parameters <- c(beta = 2.4, gamma = 0.1)
out0 <- ode(y= initial_state_values,
times = times,
func = sir_model,
parms = c(parameters, time_dependent = FALSE))
plot(out0)
## (2) single simulation with time varying parameters
beta_p <- rpois(max(times) + 1, parameters[1])
infectious_period_p <- rpois(max(times) + 1, 1/(parameters[2]))
gamma_p <- 1/infectious_period_p
## here we pass the vectorized parameters globally for simplicity;
## they can also be passed as a list (see the Monte-Carlo example below)
out1 <- ode(y = initial_state_values, times = times,
func = sir_model, parms = c(time_dependent = TRUE))
plot(out0, out1)
## (3) a sample of simulations
monte_carlo <- function(i) {
  #parameters <- c(beta = 2.4, gamma = 0.1)
  beta_p <- rpois(max(times) + 1, parameters[1])
  infectious_period_p <- rpois(max(times) + 1, 1 / (parameters[2]))
  gamma_p <- 1/infectious_period_p
  ode(y = initial_state_values, times = times,
      func = sir_model, parms = list(beta_p = beta_p,
                                     gamma_p = gamma_p,
                                     time_dependent = TRUE))
}
## run 10 simulations
out_mc <- lapply(1:10, monte_carlo)
plot(out0, out_mc, mfrow=c(1, 3))
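If a summary of the Monte-Carlo sample is needed, for example the spread and mean of the infected compartment, the list of deSolve objects can be post-processed directly (a sketch; I_runs is an illustrative name):
I_runs <- sapply(out_mc, function(o) o[, "I"])   # matrix: time steps x runs
matplot(times, I_runs, type = "l", lty = 1, col = "grey",
        xlab = "time", ylab = "I")
lines(times, rowMeans(I_runs), lwd = 2)          # mean trajectory over the runs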
So let's say that we have an arbitrary system of ODEs in R which we want to solve, for example an SIR model:
dS <- -beta * I * S
dI <- beta * I * S - gamma * I
dR <- gamma * I
I want beta and gamma to be time-varying parameters, for example
beta_vector <- seq(0.05, 1, by=0.05)
gamma_vector <- seq(0.05, 1, by=0.05)
User @Ben Bolker gave me the advice to use beta <- beta_vector[ceiling(time)] inside the gradient function:
sir_1 <- function(beta, gamma, S0, I0, R0, times) {
  require(deSolve) # for the "ode" function
  # the differential equations:
  sir_equations <- function(time, variables, parameters) {
    beta <- beta_vector[ceiling(time)]
    gamma <- gamma_vector[ceiling(time)]
    with(as.list(c(variables, parameters)), {
      dS <- -beta * I * S
      dI <- beta * I * S - gamma * I
      dR <- gamma * I
      return(list(c(dS, dI, dR)))
    })
  }
  # the parameters values:
  parameters_values <- c(beta = beta, gamma = gamma)
  # the initial values of variables:
  initial_values <- c(S = S0, I = I0, R = R0)
  # solving
  out <- ode(initial_values, times, sir_equations, parameters_values)
  # returning the output:
  as.data.frame(out)
}
sir_1(beta = beta, gamma = gamma, S0 = 99999, I0 = 1, R0 = 0, times = seq(0, 19))
When I execute it, it gives me the following error:
Error in checkFunc(Func2, times, y, rho) :
The number of derivatives returned by func() (1) must equal the length of the initial
conditions vector (3)
The problem must lie somewhere here:
parameters_values <- c(beta=beta, gamma = gamma)
I have tried to change parameters_values to a matrix with two rows (beta in the first, gamma in the second) or two columns, but it did not work. What do I have to do in order to make this work?
Your code had several issues. One is that time starts at zero while ceiling-based indexing needs to start at one, and there was also some confusion with parameter names. In the following, I show one (of several) possible approaches that uses approxfun instead of ceiling. This is more robust, even if ceiling also has some advantages. The parameters are then functions that are passed to ode as a list. An even simpler approach would be to use global variables.
One additional consideration is whether the time-dependent gamma and beta should be interpolated linearly or stepwise. The approxfun function allows both; below I use linear interpolation (a stepwise variant is sketched at the end).
require(deSolve) # for the "ode" function
beta_vector <- seq(0.05, 1, by=0.05)
gamma_vector <- seq(0.05, 1, by=0.05)
sir_1 <- function(f_beta, f_gamma, S0, I0, R0, times) {
  # the differential equations
  sir_equations <- function(time, variables, parameters) {
    beta <- f_beta(time)
    gamma <- f_gamma(time)
    with(as.list(variables), {
      dS <- -beta * I * S
      dI <- beta * I * S - gamma * I
      dR <- gamma * I
      # include beta and gamma as auxiliary variables for debugging
      return(list(c(dS, dI, dR), beta = beta, gamma = gamma))
    })
  }
  # the time-dependent parameter functions
  parameters_values <- list(
    f_beta = f_beta,
    f_gamma = f_gamma
  )
  # the initial values of variables
  initial_values <- c(S = S0, I = I0, R = R0)
  # solving; return the deSolve object as is (not a data.frame) to make plotting easier
  out <- ode(initial_values, times, sir_equations, parameters_values)
}
times <- seq(0, 19)
# approxfun is a function that returns a function
f_gamma <- approxfun(x=times, y=seq(0.05, 1, by=0.05), rule=2)
f_beta <- approxfun(x=times, y=seq(0.05, 1, by=0.05), rule=2)
# check how the approxfun functions work
f_beta(5)
out <- sir_1(f_beta=f_beta, f_gamma=f_gamma, S0 = 99999, I0 = 1, R0 = 0, times = times)
# plot method of class "deSolve", plots states and auxiliary variables
plot(out)
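If the stepwise behaviour mentioned above is wanted instead of linear interpolation, the forcing functions can be created with method = "constant" (a small sketch; the _step names are illustrative):
f_beta_step  <- approxfun(x = times, y = beta_vector,  method = "constant", rule = 2)
f_gamma_step <- approxfun(x = times, y = gamma_vector, method = "constant", rule = 2)
out_step <- sir_1(f_beta = f_beta_step, f_gamma = f_gamma_step,
                  S0 = 99999, I0 = 1, R0 = 0, times = times)
plot(out, out_step)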
I am trying to fit this very simple 4-species linear Lotka-Volterra competition model to observed data, but for some reason when I try the optim() function something with regard to deSolve seems to fail.
# Data
data <- data.frame(Cod = c(0.1966126, 0.1989563, 0.2567677, 0.3158896, 0.4225435, 0.7219856,
1.0570824, 0.7266830, 0.6286763, 0.6389475),
Herring = c(1.988372, 2.788014, 3.397138, 2.557245, 2.627013, 3.045617,
3.161002, 3.531306, 3.432021, 3.617174),
Sprat = c(2.030273, 3.480469, 3.009277, 1.895996, 2.457520, 1.991211, 2.350098,
2.118164, 1.693359, 1.869141),
Flounder = c(0.4758220, 0.4425532, 0.4185687, 0.4967118, 0.7102515, 0.5733075,
0.7404255, 0.5996132, 0.6235977, 0.7187621))
# Model formulation
LLV <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    db1.dt = b1*(r1+a11*b1+a12*b2+a13*b3+a14*b4)
    db2.dt = b2*(r2+a22*b2+a21*b1+a23*b3+a24*b4)
    db3.dt = b3*(r3+a33*b3+a31*b1+a32*b2+a34*b4)
    db4.dt = b4*(r4+a44*b4+a41*b1+a42*b2+a43*b3)
    list(c(db1.dt, db2.dt, db3.dt, db4.dt))
  })
}
# Model input and simulation
# Model input
params <- c(r1 = -0.342085, r2 = 0.6855681, r3 = 2.757769, r4 = 0.9744113,
a11 = -1.05973762, a12 = 0.09577309, a13 = -0.01915480, a14 = 1.36098939,
a21 = 0.17533326, a22 = -0.32247342, a23 = 0.03111628, a24 = 0.30212711,
a31 = 0.5303516, a32 = -0.4869761, a33 = -0.3194882, a34 = -1.5089027,
a41 = 0.004418133, a42 = 0.163716414, a43 = -0.237873378, a44 = -1.519158802)
ini <- c(b1 = data[1,1], b2 = data[1,2], b3 = data[1,3], b4 = data[1,4])
tmax <- 10
t <- seq(1,tmax,0.1)
# Results and first parameter guess is more or less okay
results <- deSolve::ode(y = ini, times = t, func = LLV, parms = params)
matplot(data, pch = 1)
matplot(x = results[,1], y = results[,-1], type = "l", add = TRUE)
Here I proceed to write a function that minimises the residual sum of squares which, when passed to optim() with the above initial parameter guess, should produce what I am looking for.
min.RSS <- function(data, params) {
  output <- deSolve::ode(y = ini, times = t, func = LLV, parms = params)
  predictions <- exp(output[,-1])
  observations <- data
  return(sum((predictions - observations)^2))
}
result <- optim(par = params, fn = min.RSS, data = data)
fit <- deSolve::ode(y = ini, times = t, func = LLV, parms = result$par)
matplot(x = fit[,1], y = fit[,-1], type = "l", lwd = 3, add = TRUE)
Any idea on how to solve this problem will be very much appreciated.
You got a better fit, but you should be very careful with this problem. I went a little crazy and used the (in-development) fitode package to tackle this problem. I fitted the model and got a much better fit, and also tried fitting with 100 randomly varying starting points around my best fit. Your residual sum of squares was 1.19; fitode got to 0.29 on the first try, and the best of 100 fits was RSS = 0.16. However: these fits are highly unstable. This plot shows the fits to the data and predictions 5 time steps into the future for (1) your fit (dashed lines); (2) the initial fitode fit (dotted line); (3) the 100 other fitode fits (the ones within 0.05 RSS of the best fit are solid, the ones worse than that are drawn very lightly).
You can see that the out-of-sample predictions are mostly crazy. Your fit is actually more stable than some of the better fits - it gets to time step 13 before the entire community crashes - but the bottom line is that a good fit to the data in this case in no way guarantees a sensible answer. It looks like a single one of the 100 fits reaches the end of the prediction time series without collapsing (which seems like a reasonably sensible "common sense" prediction based on the observed time series).
In order to fit these data reliably, you either need a model with many fewer parameters, or external information supplied in the form of priors or regularization - some way to penalize fits that imply 'wiggly' deterministic trajectories, or interaction parameters/growth rates that are unreasonable.
## remotes::install_github("parksw3/fitode")
library(fitode)
## data with tags for fitode
data2 <- setNames(data,paste0(names(data),"_obs"))
data2 <- data.frame(times=seq(nrow(data2)),data2)
## parameter names: not defined in the original post; assumed here to be the
## names of the question's 'params' vector
pars <- names(params)
## Model formulation (for fitode)
LV_model <- odemodel(
  name = "4-species LV",
  model = list(
    Cod ~ Cod*(r1+a11*Cod+a12*Herring+a13*Sprat+a14*Flounder),
    Herring ~ Herring*(r2+a22*Herring+a21*Cod+a23*Sprat+a24*Flounder),
    Sprat ~ Sprat*(r3+a33*Sprat+a31*Cod+a32*Herring+a34*Flounder),
    Flounder ~ Flounder*(r4+a44*Flounder+a41*Cod+a42*Herring+a43*Sprat)
  ),
  observation = list(
    Cod_obs ~ ols(mean = Cod),
    Herring_obs ~ ols(mean = Herring),
    Sprat_obs ~ ols(mean = Sprat),
    Flounder_obs ~ ols(mean = Flounder)
  ),
  initial = list(
    Cod ~ data2$Cod_obs[1],
    Herring ~ data2$Herring_obs[1],
    Sprat ~ data2$Sprat_obs[1],
    Flounder ~ data2$Flounder_obs[1]
  ),
  link = setNames(rep("identity", length(pars)), pars),
  par = pars
)
## plot results
plotres <- function(p, ODEint = "rk", lty = 1,
                    dt = 0.1,
                    tvec = seq(1, 10, by = dt), ...) {
  par(las = 1, bty = "l")
  res <- deSolve::ode(ini, tvec, LLV, p, method = ODEint)
  matplot(res[,1], res[,-1], type = "l", lty = lty, ...)
  return(invisible(res[,-1]))
}
f1 <- fitode(
LV_model,
data=data2,
start=params,
control=list(maxit=1e5,trace=1000)
)
## fitode with multistart
ranfit <- function(n, fit, range = 0.5) {
  ## perturb the original starting values by up to +/- 'range' (relative)
  rpars <- params*runif(length(params), 1 - range, 1 + range)
  newfit <- try(update(fit, start = rpars))
  return(newfit)
}
library(parallel)  ## for makeCluster, clusterExport, parLapply, ...
cl <- makeCluster(10)
clusterSetRNGStream(cl = cl, 101)
clusterExport(cl, c("params","LV_model","data2"))
clusterEvalQ(cl,invisible(library(fitode)))
system.time(
multifit <- parLapply(cl, 1:100, ranfit, fit = f1)
)
saveRDS(multifit,file="SO65440448_multifit.rds")
ivec <- seq_along(multifit)
ivec <- ivec[sapply(multifit,function(x) !inherits(x,"try-error"))]
coef <- pred <- vector("list", length=length(ivec))
ll <- conv <- rep(NA,length(ivec))
for (i in seq_along(ivec)) {
  nf <- multifit[[ivec[i]]]
  coef[[i]] <- coef(nf)
  pp <- predict(nf, times = 1:10)
  pred[[i]] <- cbind(times = pp[[1]][,1],
                     do.call(cbind, lapply(pp, "[", -1)))
  ll[i] <- logLik(nf)
  conv[i] <- nf@mle2@details$convergence
}
par(las = 1, bty = "l")
matplot(pred[[1]][,1], pred[[1]][,-1],
        type = "n", lty = 1, ylim = c(0, 6),
        xlab = "time", ylab = "density")
lthresh <- 0.05
for (i in 1:length(pred)) {
  good <- ll[i] > (max(ll) - lthresh)
  alpha <- if (good) 0.8 else 0.1
  lwd <- if (good) 2 else 1
  matlines(pred[[i]][,1], pred[[i]][,-1], lty = 1,
           col = adjustcolor(palette()[1:4], alpha.f = alpha),
           lwd = lwd)
}
matpoints(data2[,1],data2[,-1],pch=16,cex=3)
plotres(optimres$par,add=TRUE, lwd=3,lty=2,dt=1)
plotres(coef(f1),add=TRUE, lwd=3,lty=3,dt=1)
To those interested: I have managed to get a solution that involves changing the ODE integration method. Here is the working optimiser:
# Optimising parameter fit
LVmse <- function(parms) {
  out <- as.matrix(deSolve::ode(ini, 1:10, LLV, parms, method = "rk")[,-1])
  RSS <- sum((as.matrix(data) - out)^2, na.rm = TRUE)  # residual sum of squares against the observations
  return(RSS)
}
optimres <- optim(par = params, fn = LVmse)
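The optimised parameters can then be inspected graphically in the same way as the initial guess, re-using the plotting idea from the question (a sketch):
fit <- deSolve::ode(y = ini, times = seq(1, 10, 0.1), func = LLV,
                    parms = optimres$par, method = "rk")
matplot(data, pch = 1)
matplot(x = fit[,1], y = fit[,-1], type = "l", lwd = 3, add = TRUE)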
I am using 'KFAS' package from R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = Z_t * x_t + \eps_t (measurement)
x_t = T_t * x_{t-1} + R_t * \eta_t (transition),
with \eps_t ~ N(0,H_t) and \eta_t ~ N(0,Q_t).
So, I want to estimate the variances H_t and Q_t, but also T_t, the AR(1) coefficient. My code is as follows:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
fit <- fitSSM(ss_model, inits = c(0,0.6,0), method = 'L-BFGS-B')
But it returns: "Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)),: System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07"
The NA definitions for the variances work well, as documented in the package's paper. However, it seems this cannot be done for the AR coefficients. Does anyone know how I can do this?
Note that I am aware of the SSMarima function, which eases the definition of the transition equation as ARIMA models. Although I am able to estimate the AR(1) coef. and Q_t this way, I still cannot estimate the \eps_t variance (H_t). Moreover, I am migrating my Kalman filter codes from EViews to R, so I need to learn SSMcustom for other models that are more complicated.
Thanks!
It seems that you are missing something in your example, as your error message comes from the function fitSSM. If you want to use fitSSM for estimating general state space models, you need to provide your own model-updating function. The default behaviour can only handle NAs in the covariance matrices H and Q; the main goal of fitSSM is just to get started with simple stuff. For complex models and/or large data, I would recommend writing your own objective function (with the help of the logLik method) and using your favourite numerical optimization routine manually for maximum performance. Something like this:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
objf <- function(pars, model, estimate = TRUE) {
  model$H[1] <- pars[1]
  model$T[1] <- pars[2]
  model$Q[1] <- pars[3]
  if (estimate) {
    -logLik(model)
  } else {
    model
  }
}
opt <- optim(c(1, 0.5, 1), objf, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100), model = ss_model)
ss_model_opt <- objf(opt$par, ss_model, estimate = FALSE)
Same with fitSSM:
updatefn <- function(pars, model) {
  model$H[1] <- pars[1]
  model$T[1] <- pars[2]
  model$Q[1] <- pars[3]
  model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
identical(ss_model_opt, fit$model)
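Either way, once the optimised parameters are plugged back into the model, the usual KFAS machinery applies, for example Kalman filtering and smoothing (a short sketch):
opt$par                   # estimated H, T (the AR(1) coefficient) and Q
kfs <- KFS(ss_model_opt)  # run the Kalman filter and smoother on the fitted model
ts.plot(cbind(meas, kfs$alphahat), col = 1:2)  # data and smoothed state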
I'm working on a binomial mixture model using OpenBUGS and R package R2OpenBUGS. I've successfully built simpler models, but once I add another level for imperfect detection, I consistently receive the error variable X is not defined in model or in data set. I've tried a number of different things, including changing the structure of my data and entering my data directly into OpenBUGS. I'm posting this in the hope that someone else has experience with this error, and perhaps knows why OpenBUGS is not recognizing variable X even though it is clearly defined as far as I can tell.
I've also gotten the error expected the collection operator c error pos 8 - this is not an error I've been getting previously, but I am similarly stumped.
Both the model and the data-simulation function come from Kery's Introduction to WinBUGS for Ecologists (2010). I will note that the data set here is in lieu of my own data, which is similar.
I am including the function to build the dataset as well as the model. Apologies for the length.
# Simulate data: 180 sites, 3 sampling rounds, 3 levels of the factor 'trt',
# and continuous covariate 'X'
data.fn <- function(nsite = 180, nrep = 3, xmin = -1, xmax = 1,
                    alpha.vec = c(0.01, 0.2, 0.4, 1.1, 0.01, 0.2),
                    beta0 = 1, beta1 = -1, ntrt = 3){
  y <- array(dim = c(nsite, nrep))   # Array for counts
  X <- sort(runif(n = nsite, min = xmin, max = xmax))   # covariate values, sorted
  # Relationship expected abundance - covariate
  x2 <- rep(1:ntrt, rep(60, ntrt))   # Indicator for population
  trt <- factor(x2, labels = c("CT", "CM", "CC"))
  Xmat <- model.matrix(~ trt*X)
  lin.pred <- Xmat[,] %*% alpha.vec  # Value of lin.predictor
  lam <- exp(lin.pred)
  # Add Poisson noise: draw N from Poisson(lambda)
  N <- rpois(n = nsite, lambda = lam)
  table(N)             # Distribution of abundances across sites
  sum(N > 0) / nsite   # Empirical occupancy
  totalN <- sum(N) ; totalN
  # Observation process
  # Relationship detection prob - covariate
  p <- plogis(beta0 + beta1 * X)
  # Make a 'census' (i.e., go out and count things)
  for (i in 1:nrep){
    y[,i] <- rbinom(n = nsite, size = N, prob = p)
  }
  # Return stuff
  return(list(nsite = nsite, nrep = nrep, ntrt = ntrt, X = X, alpha.vec = alpha.vec,
              beta0 = beta0, beta1 = beta1, lam = lam, N = N, totalN = totalN,
              p = p, y = y, trt = trt))
}
data <- data.fn()
And here is the model:
sink("nmix1.txt")
cat("
model {
# Priors
for (i in 1:3){ # 3 treatment levels (factor)
alpha0[i] ~ dnorm(0, 0.01)
alpha1[i] ~ dnorm(0, 0.01)
}
beta0 ~ dnorm(0, 0.01)
beta1 ~ dnorm(0, 0.01)
# Likelihood
for (i in 1:180) { # 180 sites
C[i] ~ dpois(lambda[i])
log(lambda[i]) <- log.lambda[i]
log.lambda[i] <- alpha0[trt[i]] + alpha1[trt[i]]*X[i]
for (j in 1:3){ # each site sampled 3 times
y[i,j] ~ dbin(p[i,j], C[i])
lp[i,j] <- beta0 + beta1*X[i]
p[i,j] <- exp(lp[i,j])/(1+exp(lp[i,j]))
}
}
# Derived quantities
}
",fill=TRUE)
sink()
# Bundle data
trt <- data$trt
y <- data$y
X <- data$X
ntrt <- 3
# Standardise covariates
s.X <- (X - mean(X))/sd(X)
win.data <- list(C = y, trt = as.numeric(trt), X = s.X)
# Inits function
inits <- function(){ list(alpha0 = rnorm(ntrt, 0, 2),
alpha1 = rnorm(ntrt, 0, 2),
beta0 = rnorm(1,0,2), beta1 = rnorm(1,0,2))}
# Parameters to estimate
parameters <- c("alpha0", "alpha1", "beta0", "beta1")
# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3
# Start Markov chains
out <- bugs(data = win.data, inits, parameters, "nmix1.txt", n.thin=nt,
n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
Note: This answer has gone through a major revision, after I noticed another problem with the code.
If I understand your model correctly, you are mixing up the y and N from the simulated data and what is passed as C to Bugs. You are passing the y variable (a matrix) to the C variable in the Bugs model, but there it is accessed as a vector. From what I can see, C represents the number of "trials" in your binomial draw (the actual abundances), i.e. N in your data set. The variable y (a matrix) is called the same thing in both the simulated data and in the Bugs model.
This is a reformulation of your model, as I understand it, and this runs ok:
sink("nmix1.txt")
cat("
model {
# Priors
for (i in 1:3){ # 3 treatment levels (factor)
alpha0[i] ~ dnorm(0, 0.01)
alpha1[i] ~ dnorm(0, 0.01)
}
beta0 ~ dnorm(0, 0.01)
beta1 ~ dnorm(0, 0.01)
# Likelihood
for (i in 1:180) { # 180 sites
C[i] ~ dpois(lambda[i])
log(lambda[i]) <- log.lambda[i]
log.lambda[i] <- alpha0[trt[i]] + alpha1[trt[i]]*X[i]
for (j in 1:3){ # each site sampled 3 times
y[i,j] ~ dbin(p[i,j], C[i])
lp[i,j] <- beta0 + beta1*X[i]
p[i,j] <- exp(lp[i,j])/(1+exp(lp[i,j]))
}
}
# Derived quantities
}
",fill=TRUE)
sink()
# Bundle data
trt <- data$trt
y <- data$y
X <- data$X
N<- data$N
ntrt <- 3
# Standardise covariates
s.X <- (X - mean(X))/sd(X)
win.data <- list(y = y, trt = as.numeric(trt), X = s.X, C= N)
# Inits function
inits <- function(){ list(alpha0 = rnorm(ntrt, 0, 2),
alpha1 = rnorm(ntrt, 0, 2),
beta0 = rnorm(1,0,2), beta1 = rnorm(1,0,2))}
# Parameters to estimate
parameters <- c("alpha0", "alpha1", "beta0", "beta1")
# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3
# Start Markov chains
out <- bugs(data = win.data, inits, parameters, "nmix1.txt", n.thin=nt,
n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
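After the run, the returned bugs object can be inspected with the usual print and plot methods (a sketch):
print(out)   # posterior summaries and Rhat for the monitored parameters
plot(out)    # graphical summary of the chains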
Overall, the results from this model look ok, but there are long autocorrelation lags for beta0 and beta1. The estimate of beta1 also seems a bit off (about -0.4), so you might want to recheck the Bugs model specification, so that it matches the simulation model (i.e. that you are fitting the correct statistical model). At the moment, I'm not sure that it does, but I don't have time to check further right now.
I got the same message trying to pass a factor to OpenBUGS, like so:
Ndata <- list(yrs = N$yrs, site = N$site, ... )
The variable "site" was not passed by the "bugs" function. It simply was not in list passed
to OpenBUGS
I solved the problem by passing site as numeric:
Ndata <- list(yrs = N$yrs, site = as.numeric(N$site), ... )
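Note that as.numeric() on a factor returns the underlying integer level codes (in the order of the levels), not the original labels, which is usually what BUGS needs for indexing. A tiny illustration:
f <- factor(c("CT", "CM", "CC"))
levels(f)      # "CC" "CM" "CT" (alphabetical by default)
as.numeric(f)  # 3 2 1 -- the integer codes BUGS will see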