I am trying to obtain a posterior predictive distribution for specified values of x from a simple linear regression in Jags. I could get the regression itself to work by adapting this example (from https://biometry.github.io/APES//LectureNotes/StatsCafe/Linear_models_jags.html) to my own data. I have supplied a smal chunk of this data here so that the code works here also.
library(rjags)
library(R2jags)
#create data
dw=c(-15.2,-13.0,-10.0,-9.8,-8.5,-8.5,-7.7,-7.5,-7.2,-6.1,-6.1,-6.1,-5.5,-5.0,-5.0,-5.0,-4.5,-4.0,-2.0,-1.0,1.3)
phos=c(11.8,12.9,15.0,14.4,17.3,16.1,20.8,16.6,16.2,18.2,18.8,19.2,15.6,17.0,18.9,22.1,18.9,22.8,21.6,20.5,21.1)
#convert to list
jagsdwphos=list(dw=dw,phos=phos,N=length(phos))
#write model function for linear regression
lm1_jags <- function(){
# Likelihood:
for (i in 1:N){
phos[i] ~ dnorm(mu[i], tau) # tau is precision (1 / variance)
mu[i] <- intercept + slope * dw[i]
}
# Priors:
intercept ~ dnorm(0, 0.01)
slope ~ dnorm(0, 0.01)
sigma ~ dunif(0, 100) # standard deviation
tau <- 1 / (sigma * sigma)
}
#specifiy paramters of MCMC sampler, choose posteriors to be reported and run the jags model
#set initial values for MCMC
init_values <- function(){
list(intercept = rnorm(1), slope = rnorm(1), sigma = runif(1))
}
#choose paramters to report on
params <- c("intercept", "slope", "sigma")
#run model in jags
lm_dwphos <- jags(data = jagsdwphos, inits = init_values, parameters.to.save = params, model.file = lm1_jags,
n.chains = 3, n.iter = 12000, n.burnin = 2000, n.thin = 10, DIC = F)
In addition to this regression, I would like to have an output of the posterior predictive distributions of particular phos values, but I cannot get it to work with this simple example I have written. I found a tutorial here https://doingbayesiandataanalysis.blogspot.com/2015/10/posterior-predicted-distribution-for.html and tried to implement it like this:
#create data
dw=c(-15.2,-13.0,-10.0,-9.8,-8.5,-8.5,-7.7,-7.5,-7.2,-6.1,-6.1,-6.1,-5.5,-5.0,-5.0,-5.0,-4.5,-4.0,-2.0,-1.0,1.3)
phos=c(11.8,12.9,15.0,14.4,17.3,16.1,20.8,16.6,16.2,18.2,18.8,19.2,15.6,17.0,18.9,22.1,18.9,22.8,21.6,20.5,21.1)
#specifiy phos values to use for posterior predictive distribution
phosprobe=c(14,18,22)
#convert to list
jagsdwphos=list(dw=dw,phos=phos,N=length(phos),xP=phosprobe)
#write model function for linear regression
lm1_jags <- function(){
# Likelihood:
for (i in 1:N){
phos[i] ~ dnorm(mu[i], tau) # tau is precision (1 / variance)
mu[i] <- intercept + slope * dw[i]
}
# Priors:
intercept ~ dnorm(0, 0.01) # intercept
slope ~ dnorm(0, 0.01) # slope
sigma ~ dunif(0, 100) # standard deviation
tau <- 1 / (sigma * sigma) # sigma^2 doesn't work in JAGS
nu <- nuMinusOne+1
nuMinusOne ~ dexp(1/29.0)
#prediction
for(i in 1:3){
yP ~ dt(intercept+slope*xP[i],tau,nu)
}
}
#specifiy paramters of MCMC sampler, choose posteriors to be reported and run the jags model
#set initial values for MCMC
init_values <- function(){
list(intercept = rnorm(1), slope = rnorm(1), sigma = runif(1))
}
#choose paramters to report on
params <- c("intercept", "slope", "sigma","xP","yP")
#run model in jags
lm_dwphos <- jags(data = jagsdwphos, inits = init_values, parameters.to.save = params, model.file = lm1_jags,
n.chains = 3, n.iter = 12000, n.burnin = 2000, n.thin = 10, DIC = F)
But I get the following error message:
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, : RUNTIME ERROR: Compilation error on line 14. Attempt to redefine node yP[1]
I confess I don't quite understand exactly how the prediction was implemented in that example I used and could not find an explanation on what exactly nu is or where those numbers come from. So I presume that is where I made some mistake adapting to my example, but it was the only tutorial in Jags that I could find that gives the whole distribution of y values for the probed x instead of just the mean.
I would appreciate any help or explanation.
Thanks!
This error occurs because you are not indexing yP. You have written this loop like this:
#prediction
for(i in 1:3){
yP ~ dt(intercept+slope*xP[i],tau,nu)
}
As i moves from 1 to 3 the element yP is being written over. You need to index it like you have done with xP.
#prediction
for(i in 1:3){
yP[i] ~ dt(intercept+slope*xP[i],tau,nu)
}
Related
I am trying to fit a logistic regression model in JAGS, but I have data in the form of (# success y, # attempts n), rather than a binary variable. In R, one can fit a model to data such as these by using glm(y/n ~ ) with the "weights" argument, but I am not sure how to fit this in JAGS.
Here is a simple example that I hope addresses what I am trying to ask. Note that I am using the rjags package. Thanks for any help!
y <- rbinom(10, 500, 0.2)
n <- sample(500:600, 10)
p <- y/n
x <- sample(0:100, 10) # some covariate
data <- data.frame(y, n, p, x)
model <- "model{
# Specify likelihood
for(i in 1:10){
y[i] ~ dbin(p[i], n[i])
logit(p[i]) <- b0 + b1*x
}
# Specify priors
b0 ~ dnorm(0, 0.0001)
b1 ~ dnorm(0, 0.0001)
}"
You don't need to compute p in your data set at all. Just let it be a logical node in your model. I prefer the R2jags interface, which allows you to specify a BUGS model in the form of an R function ...
jagsdata <- data.frame(y=rbinom(10, 500, 0.2),
n=sample(500:600, 10),
x=sample(0:100, 10))
model <- function() {
## Specify likelihood
for(i in 1:10){
y[i] ~ dbin(p[i], n[i])
logit(p[i]) <- b0 + b1*x[i]
}
## Specify priors
b0 ~ dnorm(0, 0.0001)
b1 ~ dnorm(0, 0.0001)
}
Now run it:
library("R2jags")
jags(model.file=model,data=jagsdata,
parameters.to.save=c("b0","b1"))
I have a beta-binomial model like this
where $B$ is the beta function.
I want to estimate the parameters $\theta_1,\theta_2,\ldots,\theta_5$.
I used a Maximum likelihood method:
BBlikelihood = function(theta){
k = theta[1]/(1+exp(theta[2]*x+theta[3]))+theta[4]
mu = exp(k)/(1+exp(k))
if(theta[5]<0) theta[5]=0
return(-sum(lchoose(n,y)+
lbeta(y+mu*theta[5],n-y+(1-mu)*theta[5]) -
lbeta(mu*theta[5],(1-mu)*theta[5])))
}
theta.init = c(8,100,-3,-6,1)
BBtheta = optim(theta.init,BBlikelihood,control=list(maxit=1000))
BBtheta$par
Then, I used MCMC as below:
HastingsBBlikelihood = function(theta,theta.prime){
k = theta[1]/(1+exp(theta[2]*x+theta[3]))+theta[4]
mu = exp(k)/(1+exp(k))
k.prime = theta.prime[1]/(1+exp(theta.prime[2]*x+theta.prime[3]))+theta.prime[4]
mu.prime = exp(k.prime)/(1+exp(k.prime))
if(theta.prime[5]<0)
return(-Inf)
sum( ( lbeta(y+mu.prime*theta.prime[5],n-y+(1-mu.prime)*theta.prime[5])
- lbeta(mu.prime*theta.prime[5],(1-mu.prime)*theta.prime[5]))
- ( lbeta(y+mu*theta[5],n-y+(1-mu)*theta[5])
lbeta(mu*theta[5],(1-mu)*theta[5])))
}
SigmaBB = list(0.1*diag(5))
Sigma.i = 1
thetaBB.mcmc = matrix(c(8,100,-3,-6,1),1,5)
for(i in 2:25000){
if((i %% 1000)==0){
Sigma.i <- Sigma.i + 1;
SigmaBB[[Sigma.i]] <- 2.38^2*var(thetaBB.mcmc)/5
}
thetaBB.mcmc = rbind(thetaBB.mcmc,
mvrnorm(n=1,
thetaBB.mcmc[i-1,],
Sigma=SigmaBB[[Sigma.i]]))
if(log(runif(1))>HastingsBBlikelihood(thetaBB.mcmc[i-1,],thetaBB.mcmc[i,]))
thetaBB.mcmc[i,] <- thetaBB.mcmc[i-1,]
}
the MCMC estimations agree with maximum likelihood estimations.
Finally, I used jags to estimate the parameters as the following:
cat("model {
# Parameters
theta[1] ~ dunif(-1.e10, 1.e10)
theta[2] ~ dunif(-1.e10, 1.e10)
theta[3] ~ dunif(-1.e10, 1.e10)
theta[4] ~ dunif(-1.e10, 1.e10)
theta[5] ~ dunif(0, 1.e10)
# Observations
for (i in 1:TT){
k[i] <- theta[1]/(1+exp(theta[2]*x[i]+theta[3]))+theta[4]
mu[i] <- exp(k[i])/(1+exp(k[i]))
p[i] ~ dbeta(mu[i]*theta[5],(1-mu[i])*theta[5])
y[i] ~ dbin(p[i],n[i])
}
}",
file="BetaBinom.bug")
library("rjags")
TT<-length(y)
jags <- jags.model('BetaBinom.bug', data = list('x'=x, 'y'=y, 'n'=n , TT=TT ))
res <- coda.samples(jags,c('theta'),1000)
res.median = apply(res[[1]],2,median)
res.median
res.mean = apply(res[[1]],2,mean)
res.mean
res.sd = apply(res[[1]],2,sd)
res.sd
The estimations in this way are very strange and not as the previous values.
I wonder what is wrong in the jags function!?
Thanks for any comment or suggestion in advance.
https://ehc.ac/p/mcmc-jags/discussion/610037/thread/dc35eac1/#4823
Using JAGS I am trying to estimate a model including a unit-specific time trend.
However, the problem is that I don't know how to model this and so far I have been unable to find a solution.
As an example, consider we have the following data:
rain<-rnorm(200) # Explanatory variable
n1<-rnorm(200) # Some noise
gdp<-rain+n1 # Outcome variable
ccode<-rep(1:10,20) # Unit codes
year<-rep(1:20,10) # Years
Using normal linear regression, we would estimate the model as:
m1<-lm(gdp~rain+factor(ccode)*year)
Where factor(ccode)*year is the unit-specific time trend. Now I want to estimate the model using JAGS. So I create parameters for the indexing:
N<-200
J<-max(ccode)
T<-max(year)
And estimate the model,
library(R2jags)
library(rjags)
set.seed(42); runif(1)
dat<-list(gdp=gdp,
rain=rain,
ccode=ccode,
year=year,
N=N,J=J,T=T)
parameters<-c("b1","b0")
model.file <- "~/model.txt"
system.time(m1<-jags(data=dat,inits=NULL,parameters.to.save=parameters,
model.file=model.file,
n.chains=4,n.iter=500,n.burnin=125,n.thin=2))
with the following model file, and this is where the error is at the moment:
# Simple model
model {
# For N observations
for(i in 1:N) {
gdp[i] ~ dnorm(yhat[i], tau)
yhat[i] <- b1*rain[i] + b0[ccode[i]*year[i]]
}
for(t in 1:T) {
for(j in 1:J) {
b0[t,j] ~ dnorm(0, .01)
}
}
# Priors
b1 ~ dnorm(0, .01)
# Hyperpriors
tau <- pow(sd, -2)
sd ~ dunif(0,20)
}
I am fairly sure that the way in which I define b0 and the indexing is incorrect given the error I get when using the code: Compilation error on line 7. Dimension mismatch taking subset of b0.
However, I don't know how to solve this so I wondered whether someone here has suggestions about this?
Your lm example can also be written:
m1 <- lm(gdp ~ -1 + rain + factor(ccode) + factor(ccode):year)
The equivalent JAGS model would be:
M <- function() {
for(i in 1:N) {
gdp[i] ~ dnorm(yhat[i], sd^-2)
yhat[i] <- b0[ccode[i]] + b1*rain[i] + b2[ccode[i]]*year[i]
}
b1 ~ dnorm(0, 0.001)
for (j in 1:J) {
b0[j] ~ dnorm(0, 0.001)
b2[j] ~ dnorm(0, 0.001)
}
sd ~ dunif(0, 100)
}
parameters<-c('b0', 'b1', 'b2')
mj <- jags(dat, NULL, parameters, M, 3)
Comparing coefficients:
par(mfrow=c(1, 2), mar=c(5, 5, 1, 1))
plot(mj$BUGSoutput$summary[grep('^b0', row.names(mj$BUGSoutput$summary)), '50%'],
coef(m1)[grep('^factor\\(ccode\\)\\d+$', names(coef(m1)))],
xlab='JAGS estimate', ylab='lm estimate', pch=20, las=1,
main='b0')
abline(0, 1)
plot(mj$BUGSoutput$summary[grep('^b2', row.names(mj$BUGSoutput$summary)), '50%'],
coef(m1)[grep('^factor\\(ccode\\)\\d+:', names(coef(m1)))],
xlab='JAGS estimate', ylab='lm estimate', pch=20, las=1,
main='b2')
abline(0, 1)
I have a specific problem in Running R2OpenBUGS in R. But it runs perfectly well in OpenBUGS.
I want to understand what the problem is.
Here is my code:
model volatility;
const n=180;
{
# likelihood: joint distribution of ys
for (t in 1:n) { yisigma2[t] <- 1/exp(theta[t]);
y[t] ~ dnorm(0,yisigma2[t]);
}
prior distributions
mu ~ dnorm(0,0.1);
phistar ~ dbeta(20,1.5);
itau2 ~ dgamma(2.5,0.025);
beta <- exp(mu/2);
phi <- 2*phistar-1;
tau <- sqrt(1/itau2);
theta0 ~ dnorm(mu,itau2);
thmean[1] <- mu + phi*(theta0-mu);
theta[1] ~ dnorm(thmean[1],itau2);
for (t in 2:n)
{
thmean[t] <- mu + phi*(theta[t-1]-mu);
theta[t] ~ dnorm(thmean[t],itau2);
}
}
R Code:
svm.sim <- bugs(data, inits, model.file = "C:/Documents and Settings/code.txt",
parameters = c("mu", "phi", "tau"),
n.chains = 1, n.iter = 1000, codaPkg = TRUE,debug=TRUE)
and here is the error shown in log of OpenBUGS when I run it from R:
model is syntactically correct
data loaded
model compiled
this component of node is not stochastic phi error pos 25
unable to generate initial values for node of type UpdaterNormal.StdUpdater
model must be initialized before updating
model must be initialized before monitors used
model must be initialized before monitors used
model must be initialized before monitors used
model must be initialized before monitors used
model must be initialized before DIC can be monitored
model must be initialized before updating
model must be initialized before monitors used
DIC monitor not set
Any help will be greatly appreciated
Thank You and Regards
Dinakar
I ran your code with a slight modification (see below) and it seemed to work fine. I did specify the value n as data, and let OpenBUGS generate initial values itself. Is it an issue with how you specified your inits?
sink("stack_model.txt")
cat("
model{
# likelihood: joint distribution of ys
for (t in 1:n)
{
yisigma2[t] <- 1/exp(theta[t])
y[t] ~ dnorm(0,yisigma2[t])
}
mu ~ dnorm(0,0.1)
phistar ~ dbeta(20,1.5)
itau2 ~ dgamma(2.5,0.025)
beta <- exp(mu/2)
phi <- 2*phistar-1
tau <- sqrt(1/itau2)
theta0 ~ dnorm(mu,itau2)
thmean[1] <- mu + phi*(theta0-mu)
theta[1] ~ dnorm(thmean[1],itau2)
for (t in 2:n)
{
thmean[t] <- mu + phi*(theta[t-1]-mu)
theta[t] ~ dnorm(thmean[t],itau2)
}
}",fill=TRUE)
sink()
win.data<-list(n=180)
svm.sim <- bugs(win.data, inits=NULL, model.file = "stack_model.txt", parameters = c("mu", "phi", "tau"), n.chains = 2, n.iter = 1000, DIC=F, debug=TRUE, codaPkg =TRUE)
I'm working on a binomial mixture model using OpenBUGS and R package R2OpenBUGS. I've successfully built simpler models, but once I add another level for imperfect detection, I consistently receive the error variable X is not defined in model or in data set. I've tried a number of different things, including changing the structure of my data and entering my data directly into OpenBUGS. I'm posting this in the hope that someone else has experience with this error, and perhaps knows why OpenBUGS is not recognizing variable X even though it is clearly defined as far as I can tell.
I've also gotten the error expected the collection operator c error pos 8 - this is not an error I've been getting previously, but I am similarly stumped.
Both the model and the data-simulation function come from Kery's Introduction to WinBUGS for Ecologists (2010). I will note that the data set here is in lieu of my own data, which is similar.
I am including the function to build the dataset as well as the model. Apologies for the length.
# Simulate data: 200 sites, 3 sampling rounds, 3 factors of the level 'trt',
# and continuous covariate 'X'
data.fn <- function(nsite = 180, nrep = 3, xmin = -1, xmax = 1, alpha.vec = c(0.01,0.2,0.4,1.1,0.01,0.2), beta0 = 1, beta1 = -1, ntrt = 3){
y <- array(dim = c(nsite, nrep)) # Array for counts
X <- sort(runif(n = nsite, min = xmin, max = xmax)) # covariate values, sorted
# Relationship expected abundance - covariate
x2 <- rep(1:ntrt, rep(60, ntrt)) # Indicator for population
trt <- factor(x2, labels = c("CT", "CM", "CC"))
Xmat <- model.matrix(~ trt*X)
lin.pred <- Xmat[,] %*% alpha.vec # Value of lin.predictor
lam <- exp(lin.pred)
# Add Poisson noise: draw N from Poisson(lambda)
N <- rpois(n = nsite, lambda = lam)
table(N) # Distribution of abundances across sites
sum(N > 0) / nsite # Empirical occupancy
totalN <- sum(N) ; totalN
# Observation process
# Relationship detection prob - covariate
p <- plogis(beta0 + beta1 * X)
# Make a 'census' (i.e., go out and count things)
for (i in 1:nrep){
y[,i] <- rbinom(n = nsite, size = N, prob = p)
}
# Return stuff
return(list(nsite = nsite, nrep = nrep, ntrt = ntrt, X = X, alpha.vec = alpha.vec, beta0 = beta0, beta1 = beta1, lam = lam, N = N, totalN = totalN, p = p, y = y, trt = trt))
}
data <- data.fn()
And here is the model:
sink("nmix1.txt")
cat("
model {
# Priors
for (i in 1:3){ # 3 treatment levels (factor)
alpha0[i] ~ dnorm(0, 0.01)
alpha1[i] ~ dnorm(0, 0.01)
}
beta0 ~ dnorm(0, 0.01)
beta1 ~ dnorm(0, 0.01)
# Likelihood
for (i in 1:180) { # 180 sites
C[i] ~ dpois(lambda[i])
log(lambda[i]) <- log.lambda[i]
log.lambda[i] <- alpha0[trt[i]] + alpha1[trt[i]]*X[i]
for (j in 1:3){ # each site sampled 3 times
y[i,j] ~ dbin(p[i,j], C[i])
lp[i,j] <- beta0 + beta1*X[i]
p[i,j] <- exp(lp[i,j])/(1+exp(lp[i,j]))
}
}
# Derived quantities
}
",fill=TRUE)
sink()
# Bundle data
trt <- data$trt
y <- data$y
X <- data$X
ntrt <- 3
# Standardise covariates
s.X <- (X - mean(X))/sd(X)
win.data <- list(C = y, trt = as.numeric(trt), X = s.X)
# Inits function
inits <- function(){ list(alpha0 = rnorm(ntrt, 0, 2),
alpha1 = rnorm(ntrt, 0, 2),
beta0 = rnorm(1,0,2), beta1 = rnorm(1,0,2))}
# Parameters to estimate
parameters <- c("alpha0", "alpha1", "beta0", "beta1")
# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3
# Start Markov chains
out <- bugs(data = win.data, inits, parameters, "nmix1.txt", n.thin=nt,
n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
Note: This answer has gone through a major revision, after I noticed another problem with the code.
If I understand your model correctly, you are mixing up the y and N from the simulated data, and what is passed as C to Bugs. You are passing the y variable (a matrix) to the C variable in the Bugs model, but this is accessed as a vector. From what I can see C is representing the number of "trials" in your binomial draw (actual abundances), i.e. N in your data set. The variable y (a matrix) is called the same thing in both the simulated data and in the Bugs model.
This is a reformulation of your model, as I understand it, and this runs ok:
sink("nmix1.txt")
cat("
model {
# Priors
for (i in 1:3){ # 3 treatment levels (factor)
alpha0[i] ~ dnorm(0, 0.01)
alpha1[i] ~ dnorm(0, 0.01)
}
beta0 ~ dnorm(0, 0.01)
beta1 ~ dnorm(0, 0.01)
# Likelihood
for (i in 1:180) { # 180 sites
C[i] ~ dpois(lambda[i])
log(lambda[i]) <- log.lambda[i]
log.lambda[i] <- alpha0[trt[i]] + alpha1[trt[i]]*X[i]
for (j in 1:3){ # each site sampled 3 times
y[i,j] ~ dbin(p[i,j], C[i])
lp[i,j] <- beta0 + beta1*X[i]
p[i,j] <- exp(lp[i,j])/(1+exp(lp[i,j]))
}
}
# Derived quantities
}
",fill=TRUE)
sink()
# Bundle data
trt <- data$trt
y <- data$y
X <- data$X
N<- data$N
ntrt <- 3
# Standardise covariates
s.X <- (X - mean(X))/sd(X)
win.data <- list(y = y, trt = as.numeric(trt), X = s.X, C= N)
# Inits function
inits <- function(){ list(alpha0 = rnorm(ntrt, 0, 2),
alpha1 = rnorm(ntrt, 0, 2),
beta0 = rnorm(1,0,2), beta1 = rnorm(1,0,2))}
# Parameters to estimate
parameters <- c("alpha0", "alpha1", "beta0", "beta1")
# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3
# Start Markov chains
out <- bugs(data = win.data, inits, parameters, "nmix1.txt", n.thin=nt,
n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
Overall, the results from this model looks ok, but there are long autocorrelation lags for beta0 and beta1. The estimate of beta1 also seems a bit off(~= -0.4), so you might want to recheck the Bugs model specification, so that it is matching the simulation model (i.e. that you are fitting the correct statistical model). At the moment, I'm not sure that it does, but I don't have the time to check further right now.
I got the same message trying to pass a factor to OpenBUGS. Like so,
Ndata <- list(yrs=N$yrs, site=N$site), ... )
The variable "site" was not passed by the "bugs" function. It simply was not in list passed
to OpenBUGS
I solved the problem by passing site as numeric,
Ndata <- list(yrs=N$yrs, site=as.numeric(N$site)), ... )