Exponentiation of vector in JAGS model, Bayesian analysis in R

I have the following JAGS model for use in a Bayesian analysis in R. I am trying to estimate the posterior distribution of my variable R; all other variables are supposed to be deterministic nodes. The variables s_A, z_A, z_W, and d are vectors, while tau_s is a data.frame. TTD_aquifer and O2s_all are therefore expected to be a vector for each i.
model {
  for (i in 1:N){
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- sum(O2s_all)/2
    tau_s_bar[i] <- (s_A[i]*z_A[i])/R[i]*log(z_A[i]/(z_A[i]-z_W[i]))
    TTD_aquifer <- t((d[i]*sqrt(tau_s_bar[i]))/sqrt(4*3.14*d[i]*t(tau_s[,i]^3))*exp(-1*((d[i]*tau_s_bar[i])/(4*t(tau_s[,i])))*
                     (1-t(tau_s[,i])/tau_s_bar[i])^2))
    O2s_all <- t(O2_o[i]-k_o[i]*t(tau_s[,i]))*TTD_aquifer
    # prior on R
    R[i] ~ dlnorm(-2, 1/(0.6)^2)
  }
  # prior on tau and sigma
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
When I run this in jags.model() I get the following error:

RUNTIME ERROR:
Invalid vector argument to exp

So it looks like I cannot pass a vector to exp() in JAGS the way you can in R. The equations for TTD_aquifer and O2s_all run fine in R for a deterministic example. How should I write my equation for TTD_aquifer in JAGS to avoid the exp issue?

In JAGS, inverse link functions like exp only take scalar arguments, so you need to loop over the elements of a vector rather than passing the whole vector at once. You could change your model to the following in order to use exp. Note that you will need to include an object in your data list that gives the number of rows of the tau_s data frame (K below). Since I do not know what your model is doing, I have not checked whether your parentheses are in the correct locations across all of your divisions and multiplications.
model {
  for (i in 1:N){
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- sum(O2s_all[,i])/2
    tau_s_bar[i] <- (s_A[i]*z_A[i])/R[i]*log(z_A[i]/(z_A[i]-z_W[i]))
    for(j in 1:K){ # K = nrow(tau_s); each element is now a scalar, so the t() calls are no longer needed
      TTD_aquifer[j,i] <- (d[i]*sqrt(tau_s_bar[i]))/
                          sqrt(4*3.14*d[i]*tau_s[j,i]^3)*
                          exp(-1*((d[i]*tau_s_bar[i])/(4*tau_s[j,i]))*
                          (1-tau_s[j,i]/tau_s_bar[i])^2)
      O2s_all[j,i] <- (O2_o[i]-k_o[i]*tau_s[j,i])*TTD_aquifer[j,i]
    } # close K loop
    # prior on R
    R[i] ~ dlnorm(-2, 1/(0.6)^2)
  }
  # prior on tau and sigma
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
As TTD_aquifer and O2s_all should be a vector for each i, they become two-dimensional matrices of the same size as tau_s at each step of an MCMC chain. If you have a big dataset (i.e., large N and K) and are running this model for many iterations, tracking those derived parameters will take up considerable memory. That is something to keep in mind if you are running this on a computer without sufficient RAM; thinning the chain is one way to reduce the burden of tracking them.
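For reference, a minimal sketch of the corresponding jags.model() call, assuming the model above is saved as model.txt and the vectors named in the question already exist in R:

library(rjags)
# K tells the model how many rows tau_s has; tau_s is passed as a numeric
# matrix, since JAGS does not accept data frames
jags_data <- list(y = y, N = length(y), K = nrow(tau_s),
                  s_A = s_A, z_A = z_A, z_W = z_W, d = d,
                  O2_o = O2_o, k_o = k_o, tau_s = as.matrix(tau_s))
jm <- jags.model("model.txt", data = jags_data, n.chains = 3)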

Related

How to specify zero-inflated negative binomial model in JAGS

I'm currently working on constructing a zero-inflated negative binomial model in JAGS to model yearly change in abundance using count data and am currently a bit lost on how best to specify the model. I've included an example of the base model I'm using below. The main issue I'm struggling with is that in the model output I'm getting poor convergence (high Rhat values, low Neff values) and the 95% credible intervals are huge. I realize that without seeing/running the actual data there's probably not much anyone can help with but I thought I'd at least try and see if there are any obvious errors in the way I have the basic model specified. I also tried fitting a variety of other model types (regular negative binomial, Poisson, and zero-inflated Poisson) but decided to go with the ZINB since it had the lowest DIC scores of all the models and also makes the most intuitive sense to me, given my data structure.
library(R2jags)
# Create example dataframe
years <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
sites <- c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3)
months <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
# Count data
day1 <- floor(runif(18,0,7))
day2 <- floor(runif(18,0,7))
day3 <- floor(runif(18,0,7))
day4 <- floor(runif(18,0,7))
day5 <- floor(runif(18,0,7))
df <- as.data.frame(cbind(years, sites, months, day1, day2, day3, day4, day5))
# Put count data into array
y <- array(NA,dim=c(2,3,3,5))
for(m in 1:2){
  for(k in 1:3){
    sel.rows <- df$years == m & df$months == k
    y[m,k,,] <- as.matrix(df)[sel.rows, 4:8]
  }
}
# JAGS model
sink("model1.txt")
cat("
model {
# PRIORS
for(m in 1:2){
r[m] ~ dunif(0,50)
}
t.int ~ dlogis(0,1)
b.int ~ dlogis(0,1)
p.det ~ dunif(0,1)
# LIKELIHOOD
# ECOLOGICAL SUBMODEL FOR TRUE ABUNDANCE
for (m in 1:2) {
zero[m] ~ dbern(pi[m])
pi[m] <- ilogit(mu.binary[m])
mu.binary[m] <- t.int
for (k in 1:3) {
for (i in 1:3) {
N[m,k,i] ~ dnegbin(p[m,k,i], r)
p[m,k,i] <- r[m] / (r[m] + (1 - zero[m]) * lambda.count[m,k,i]) - 1e-10 * zero[m]
lambda.count[m,k,i] <- exp(mu.count[m,k,i])
log(mu.count[m,k,i]) <- b.int
# OBSERVATIONAL SUBMODEL FOR DETECTION
for (j in 1:5) {
y[m,k,i,j] ~ dbin(p.det, N[m,k,i])
}#j
}#i
}#k
}#m
}#END", fill=TRUE)
sink()
win.data <- list(y = y)
Nst <- apply(y,c(1,2,3),max)+1
inits <- function()list(N = Nst)
params <- c("N")
nc <- 3
nt <- 1
ni <- 50000
nb <- 5000
out <- jags(win.data, inits, params, "model1.txt",
            n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
            working.directory = getwd())
print(out)
Tried fitting a ZINB model in JAGS using the code specified above but am having issues with model convergence.
The way that I have tended to specify zero-inflated models is to model the data as Poisson distributed, with a mean that is either zero (if that individual is part of the zero-inflated group) or distributed according to a gamma distribution otherwise. Something like:
Obs[i] ~ dpois(lambda[i] * is_zero[i])
is_zero[i] ~ dbern(zero_prob)
lambda[i] ~ dgamma(k, k/mean)
Something similar to this was first used in this paper: https://www.researchgate.net/publication/5231190_The_distribution_of_the_pathogenic_nematode_Nematodirus_battus_in_lambs_is_zero-inflated
These models usually converge OK, although the performance is not as good as for simpler models of course. You also need to make sure to supply initial values for is_zero so that the model starts with all individuals with positive counts in the appropriate group.
In your case, you have multiple timepoints, so you need to decide whether the zero-inflation is fixed over timepoints (i.e. an individual cannot switch into or out of the zero-inflated group over time), or whether each observation is completely independent with respect to zero-inflation status. You also need to decide whether you want covariates of year/month/site to affect the mean count (i.e. the gamma part) or the probability of a positive count (i.e. the zero-inflation part). For the former, you need to index mean (in my formulation) by i and then use a GLM-like formula (probably using a log link) to relate it to the appropriate covariates. For the latter, you need to index zero_prob by i and then use a GLM-like formula (probably using a logit link) to relate it to the appropriate covariates. It is also possible to do both, but if you try to use the same covariates in both parts then you can expect convergence problems!
It would arguably be better to replace the separate Poisson-Gamma distributions with a single Negative Binomial distribution using the 'ecology parameterisation' with mean and k. This is not currently implemented in JAGS, but I will add it for the next update.
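A minimal self-contained sketch of this formulation with covariates, assuming hypothetical predictors x1 (zero-inflation part) and x2 (count part); all names below are illustrative rather than taken from the question:

model {
  for (i in 1:N) {
    Obs[i] ~ dpois(lambda[i] * is_zero[i])  # mean collapses to zero when is_zero[i] == 0
    is_zero[i] ~ dbern(zero_prob[i])        # despite the name, is_zero[i] == 1 is the positive-count group
    logit(zero_prob[i]) <- a0 + a1 * x1[i]  # zero-inflation part (logit link)
    lambda[i] ~ dgamma(k, k / mu[i])        # gamma mixing of the Poisson mean -> negative binomial counts
    log(mu[i]) <- b0 + b1 * x2[i]           # mean-count part (log link)
  }
  a0 ~ dnorm(0, 0.01)
  a1 ~ dnorm(0, 0.01)
  b0 ~ dnorm(0, 0.01)
  b1 ~ dnorm(0, 0.01)
  k ~ dgamma(0.01, 0.01)
}

In R, initial values such as inits <- function() list(is_zero = as.numeric(Obs > 0)) start every individual with a positive count in the positive-count group, as recommended above. Note that x1 and x2 are deliberately different covariates here, per the warning about using the same covariates in both parts.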

Predicting new values in JAGS (mixed model)

I asked a similar question a while ago on how to get model predictions in JAGS for mixed models. Here's my original question.
This time, I'm trying to get predictions for the same model but using new data and not the original that was used to fit the model.
model<-"model {
# Priors
mu_int~dnorm(0, 0.0001)
sigma_int~dunif(0, 100)
tau_int <- 1/(sigma_int*sigma_int)
for (j in 1:(M)){
alpha[j] ~ dnorm(mu_int, tau_int)
}
beta~dnorm(0, 0.01)
sigma_res~dunif(0, 100)
tau_res <- 1/(sigma_res*sigma_res)
# Likelihood
for (i in 1:n) {
mu[i] <- alpha[Mat[i]]+beta*Temp[i] # Expectation
D47[i]~dnorm(mu[i], tau_res) # The actual (random) responses
}
for(i in 1:(n)){
D47_pred[i] <- dnorm(mu[i], tau_res)
}
}"
I know this can be done using the posterior distributions of the resulting parameters, but I'm wondering if it could also be implemented inside JAGS.
Thank you!
It absolutely could be done inside JAGS. If you wanted predictions for new values of Temp for some of the same observations in Mat, you would just have to append them to the existing data with a corresponding D47 value of NA.
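A minimal sketch of that approach, assuming hypothetical new covariate values Temp_new with corresponding group indices Mat_new (both names are illustrative):

# Append new rows whose D47 is NA; JAGS treats the missing responses as
# unobserved nodes and samples them from the posterior predictive.
Temp_all <- c(Temp, Temp_new)
Mat_all  <- c(Mat, Mat_new)
D47_all  <- c(D47, rep(NA, length(Temp_new)))
jags_data <- list(D47 = D47_all, Temp = Temp_all, Mat = Mat_all,
                  n = length(D47_all), M = max(Mat_all))

The sampled values of D47[i] (or D47_pred[i]) for the NA rows then give posterior predictions for the new data.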

In JAGS, how does the stochastic node work?

...and what does the ~ sign mean, compared to R, in y[i] ~ dnorm(m[i], tau) vs y[i] <- dnorm(n, m[i], tau)?
Consider the two lines of code:
for(i in 1:length(y)) {
  y[i] ~ dnorm(m[i], tau)             # JAGS (stochastic node)
  m[i] <- alpha + beta*(x[i] - x_bar)
  ...
}
y[i] <- dnorm(n, m[i], tau)           # R
In JAGS, what are the n values, since they are not specified inside the dnorm function (dnorm(m[i], tau))?
For each i, does the dnorm function calculate the density value for each y value with respect to the mean m[i], which has a linear relationship determined by the deterministic node, and tau (the precision)?
In short, I want to know what n values will be used by dnorm or by any other density function (dgamma or dbeta).
In this specific instance y is your response variable, m is your linear predictor, and tau is the precision (the inverse of the variance). Using ~ makes the relationship stochastic. Looking at the JAGS user manual...
"Relations can be of two types. A stochastic relation (~) defines a stochastic node, representing a random variable in the model. A deterministic relation (<-) defines a deterministic node, the value of which is determined exactly by the values of its parents. The equals sign (=) can be used for a deterministic relation in place of the left arrow (<-)."
So, in other words, you are assuming that the values in y are drawn from a normal distribution parameterised by m and tau.
While dnorm in R calculates the density, JAGS calculates the log density (as per the user manual). Effectively, this stochastic relationship allows you to use y and x to estimate alpha, beta, and tau, and you use dnorm in this case by making a distributional assumption about the data generating process.
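For intuition, the R-scale equivalent of what JAGS computes for a single stochastic node (y_i, m_i, and tau here are hypothetical scalar values):

# JAGS parameterises dnorm by precision (tau) and works with the log density;
# the R equivalent of the contribution of the node y[i] ~ dnorm(m[i], tau) is:
log_dens <- dnorm(y_i, mean = m_i, sd = 1/sqrt(tau), log = TRUE)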
Of course, as this is a Bayesian analysis, you'll need priors for your parameters. You can also deterministically calculate the standard deviation instead of precision. A full model would look something like...
model{
  # likelihood
  for(i in 1:length(y)) {
    y[i] ~ dnorm(m[i], tau)
    m[i] <- alpha + beta*(x[i] - x_bar)
  }
  # priors
  tau ~ dgamma(0.001, 0.001)
  sd <- 1/sqrt(tau)
  alpha ~ dnorm(0, 0.001)
  beta ~ dnorm(0, 0.001)
}
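A hypothetical rjags call to fit this model, assuming vectors x and y exist in R and the model text above is stored in model_string:

library(rjags)
jm <- jags.model(textConnection(model_string),
                 data = list(y = y, x = x, x_bar = mean(x)),
                 n.chains = 3)
samples <- coda.samples(jm, variable.names = c("alpha", "beta", "sd"),
                        n.iter = 10000)
summary(samples)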

Outcome prediction using JAGS from R

[Code is updated and does not correspond to error messages anymore]
I am trying to understand how JAGS predicts outcome values (for a mixed Markov model). I've trained the model on a dataset which includes the outcome m and covariates x1, x2 and x3.
Predicting the outcome without fixing parameter values works in R, but the output seems completely random:
preds <- run.jags("model.txt",
                  data=list(x1=x1, x2=x2, x3=x3, m=m,
                            statealpha=rep(1,times=M), M=M, T=T, N=N),
                  monitor=c("m_pred"),
                  n.chains=1, inits=NA, sample=1)
Compiling rjags model...
Calling the simulation using the rjags method...
Note: the model did not require adaptation
Burning in the model for 4000 iterations...
|**************************************************| 100%
Running the model for 1 iterations...
Simulation complete
Finished running the simulation
However, as soon as I try to fix parameters (i.e. use model estimates to predict the outcome m), I get errors:
preds <- run.jags("model.txt",
                  data=list(x1=x1, x2=x2, x3=x3,
                            statealpha=rep(1,times=M), M=M, T=T, N=N, beta1=beta1),
                  monitor=c("m"),
                  n.chains=1, inits=NA, sample=1)
Compiling rjags model...
Error: The following error occured when compiling and adapting the model using rjags:
Error in rjags::jags.model(model, data = dataenv, n.chains = length(runjags.object$end.state), :
RUNTIME ERROR:
Compilation error on line 39.
beta1[2,1] is a logical node and cannot be observed
beta1 in this case is a 2x2 matrix of coefficient estimates.
How is JAGS predicting m in the first example (no fixed parameters)? Is it just completely randomly choosing m?
How can I include earlier acquired model estimates to simulate new outcome values?
The model is:
model{
  for (i in 1:N) {
    for (t in 1:T) {
      m[t,i] ~ dcat(ps[i,t,])
    }
    for (state in 1:M) {
      ps[i,1,state] <- probs1[state]
      for (t in 2:T) {
        ps[i,t,state] <- probs[m[(t-1),i], state, i, t]
      }
      for (prev in 1:M) {
        for (t in 1:T) {
          probs[prev,state,i,t] <- odds[prev,state,i,t]/totalodds[prev,i,t]
          odds[prev,state,i,t] <- exp(alpha[prev,state,i] +
                                      beta1[prev,state]*x1[t,i] +
                                      beta2[prev,state]*x2[t,i] +
                                      beta3[prev,state]*x3[t,i])
        }
      }
      alpha[state,state,i] <- 0
      for (t in 1:T) {
        totalodds[state,i,t] <- odds[state,1,i,t] + odds[state,2,i,t]
      }
    }
    alpha[1,2,i] <- raneffs[i,1]
    alpha[2,1,i] <- raneffs[i,2]
    raneffs[i,1:2] ~ dmnorm(alpha.means[1:2], alpha.prec[1:2,1:2])
  }
  for (state in 1:M) {
    beta1[state,state] <- 0
    beta2[state,state] <- 0
    beta3[state,state] <- 0
  }
  beta1[1,2] <- rcoeff[1]
  beta1[2,1] <- rcoeff[2]
  beta2[1,2] <- rcoeff[3]
  beta2[2,1] <- rcoeff[4]
  beta3[1,2] <- rcoeff[5]
  beta3[2,1] <- rcoeff[6]
  alpha.Sigma[1:2,1:2] <- inverse(alpha.prec[1:2,1:2])
  probs1[1:M] ~ ddirich(statealpha[1:M])
  for (par in 1:6) {
    alpha.means[par] ~ dt(T.constant.mu, T.constant.tau, T.constant.k)
    rcoeff[par] ~ dt(T.mu, T.tau, T.k)
  }
  T.constant.mu <- 0
  T.mu <- 0
  T.constant.tau <- 1/T.constant.scale.squared
  T.tau <- 1/T.scale.squared
  T.constant.scale.squared <- T.constant.scale*T.constant.scale
  T.scale.squared <- T.scale*T.scale
  T.scale <- 2.5
  T.constant.scale <- 10
  T.constant.k <- 1
  T.k <- 1
  alpha.prec[1:2,1:2] ~ dwish(Om[1:2,1:2], 2)
  Om[1,1] <- 1
  Om[1,2] <- 0
  Om[2,1] <- 0
  Om[2,2] <- 1

  ## Prediction
  for (i in 1:N) {
    m_pred[1,i] <- m[1,i]
    for (t in 2:T) {
      m_pred[t,i] ~ dcat(ps_pred[i,t,])
    }
    for (state in 1:M) {
      ps_pred[i,1,state] <- probs1[state]
      for (t in 2:T) {
        ps_pred[i,t,state] <- probs_pred[m_pred[(t-1),i], state, i, t]
      }
      for (prev in 1:M) {
        for (t in 1:T) {
          probs_pred[prev,state,i,t] <- odds_pred[prev,state,i,t]/totalodds_pred[prev,i,t]
          odds_pred[prev,state,i,t] <- exp(alpha[prev,state,i] +
                                           beta1[prev,state]*x1[t,i] +
                                           beta2[prev,state]*x2[t,i] +
                                           beta3[prev,state]*x3[t,i])
        }
      }
      for (t in 1:T) {
        totalodds_pred[state,i,t] <- odds_pred[state,1,i,t] + odds_pred[state,2,i,t]
      }
    }
  } # note: a closing brace for the model block was missing in the original listing
}
TL;DR: I think you're just missing a likelihood.
Your model is complex, so perhaps I'm missing something, but as far as I can tell there is no likelihood. You are supplying the predictors x1, x2, and x3 as data, but you aren't giving any observed m. So in what sense can JAGS be "fitting" the model?
To answer your questions:
Yes, it appears that m is drawn as random from a categorical distribution conditioned on the rest of the model. Since there are no m supplied as data, none of the parameter distributions have cause for update, so your result for m is no different than you'd get if you just did random draws from all the priors and propagated them through the model in R or whatever.
Though it still wouldn't constitute fitting the model in any sense, you would be free to supply values for beta1 if they weren't already defined completely in the model. JAGS is complaining because currently beta1[i] = rcoeff[i] ~ dt(T.mu, T.tau, T.k), and the parameters to the T distribution are all fixed. If any of (T.mu, T.tau, T.k) were instead given priors (identifying them as random), then beta1 could be supplied as data and JAGS would treat rcoeff[i] ~ dt(T.mu, T.tau, T.k) as a likelihood. But in the model's current form, as far as JAGS is concerned if you supply beta1 as data, that's in conflict with the fixed definition already in the model.
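For instance, a hypothetical modification along those lines (the hyperpriors shown are illustrative, not part of the original model); you would then supply the stochastic rcoeff, rather than the logical beta1, as data:

# Give the t-distribution parameters priors instead of fixed values, so that
# the dt() relation can act as a likelihood when rcoeff is observed:
T.mu ~ dnorm(0, 0.001)
T.tau ~ dgamma(0.001, 0.001)
for (par in 1:6) {
  rcoeff[par] ~ dt(T.mu, T.tau, T.k)
}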
I'm stretching here, but my guess is that if you're using JAGS you have fit (or would like to fit) the model in JAGS too. It's a common pattern to include both an observed response and a desired predicted response in a JAGS model, e.g. something like this:
model {
  b ~ dnorm(0, 1) # prior on b
  for(i in 1:N) {
    y[i] ~ dnorm(b * x[i], 1) # likelihood of y | b (and fixed precision = 1 for the example)
  }
  for(i in 1:N_pred) {
    pred_y[i] ~ dnorm(b * pred_x[i], 1) # prediction
  }
}
In this example model, x, y, and pred_x are supplied as data, the unknown parameter b is to be estimated, and we desire the posterior predictions pred_y at each value of pred_x. JAGS knows that the distribution in the first for loop is a likelihood, because y is supplied as data. Posterior samples of b will be constrained by this likelihood. The second for loop looks similar, but since pred_y is not supplied as data, it can do nothing to constrain b. Instead, JAGS knows to simply draw pred_y samples conditioned on b and the supplied pred_x. The values of pred_x are commonly defined to be the same as observed x, giving a predictive interval for each observed data point, or as a regular sequence of values along the x axis to generate a smooth predictive interval.
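A hypothetical runjags call for that example, assuming x and y exist in R and the model above is saved as pred_model.txt; here pred_x is a regular grid for a smooth predictive interval:

library(runjags)
pred_x <- seq(min(x), max(x), length.out = 50)  # regular grid along the x axis
fit <- run.jags("pred_model.txt",
                data = list(x = x, y = y, N = length(y),
                            pred_x = pred_x, N_pred = length(pred_x)),
                monitor = c("b", "pred_y"),
                n.chains = 2)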

Constraining Bayesian multinomial logistic in R via JAGS

I am learning how to fit Bayesian multinomial logistic models in R. This is my first attempt at using JAGS via rjags. The code below illustrates with an MWE what I am trying to do:
## simulate data
set.seed(123)
n <- 2000
rr <- rmultinom(n, 3, c(.1,.3,.2))
r2 <- as.numeric(rr==1)
r3 <- as.numeric(rr==2)
r4 <- as.numeric(rr==3)
abt <- rbinom(n,1,.1); smk <- rbinom(n,1,.3)
age <- rnorm(n); bmi <- rnorm(n)
## load programs
library("rjags")
## model
NMMmodel.string <- "
model{
for (i in 1:N){
## outcome levels 2, 3, and 4
r2[i] ~ dbern(pi2[i])
r3[i] ~ dbern(pi3[i])
r4[i] ~ dbern(pi4[i])
## linear predictors
logit(pi2[i]) <- g[1]+g[2]*age[i]+g[3]*abt[i]+g[4]*smk[i]
logit(pi3[i]) <- g[5]+g[6]*bmi[i]+g[7]*age[i]+g[8]*smk[i]
logit(pi4[i]) <- g[9]+g[10]*age[i]+g[11]*smk[i]+g[12]*bmi[i]
## probability that outcome is level 1
pi1[i] <- 1-pi2[i]-pi3[i]-pi4[i]
}
for (j in 1:12) {
g[j] ~ dnorm(0, 0.01)
}
}
"
NMMmodel.spec<-textConnection(NMMmodel.string)
## fit model w JAGS
jags <- jags.model(NMMmodel.spec,
                   data = list('r2'=r2,'r3'=r3,'r4'=r4,
                               'abt'=abt,'smk'=smk,
                               'age'=age,'bmi'=bmi,'N'=n),
                   n.chains=4,
                   n.adapt=100)
Here are two questions, in order of decreasing importance:
Question 1: I would like to put a constraint on the estimated parameters indexed by g[1] to g[12] such that pi1 lies between some arbitrary lower and upper bounds: say, a=0.25 and b=0.75. One way is to use rejection sampling, where rjags will reject all samples that return pi1 less than a or greater than b. How can I do this?
Question 2: What, exactly, is this program doing? For example, if this program implements a Gibbs sampler, is there a way to code it up without resorting to JAGS, Stan, or BUGS? Something like the first set of code on this website?
