Wondering if any of you know why JAGS would tell me there was a dimension mismatch with my initial values here.
I am attempting to fit a spatially explicit capture-recapture model in which I estimate each fish's location (x,y) at each time step. There are M=64 individuals and T=21 time steps. The locations are estimated in the array s: the model loops over i in 1:M and t in 1:T, drawing each coordinate (x and y) from a normal distribution, so s has dimension (64, 2, 21).
My initial values for this array are plausible locations within suitable habitat, supplied as an array with dimensions 64 x 2 x 21.
Yet JAGS gives me the error Error in setParameters(init.values[[i]], i) : RUNTIME ERROR: Dimension mismatch in values supplied for s. If I simply don't initialize it, I get the same error but for the state matrix z, with dimensions 64 x 21. If I also don't supply values for z, I get the error Error in node y[1,1,2] Node inconsistent with parents, where y is my array of observations with dimensions 64 x 7 x 21 (the second dimension is the number of detection gates).
Any help is very appreciated. See below for full code.
Cross-posted at SourceForge, with the initial values for s attached. Not quite sure how to post it here.
sink("mod.txt")
cat("
model{
#######################################
# lamda0 = baseline detection rate, gate dependent and stage dependent
# sigma2 = scale parameter for decline in detection prob, time dependent
# phi = survival, stage dependent
# s = activity center
# M=64 individuals, T=21 time steps
# z = true state, alive (1) or dead (0)
#######################################
for(i in 1:M){
for (j in 1:ngates){
logit(lamda0[i,j])<- beta[group[i]]+ gamma[j]
}
}
for(j in 1:ngates){
gamma[j] ~ dnorm(0,0.001)
lam0.g1[j]<- 1/(1+exp(-gamma[j]))
lam0.g2[j]<- 1/(1+exp(-gamma[j]-beta[2]))
}
beta[1]<-0
beta[2] ~dnorm(0,0.001)T(-10,10)
for (t in 1:T){
sigma2[t] ~ dgamma(0.1,0.1)
}
tauv ~ dunif(0,40)
tau<- 1/(tauv*tauv)
phi[1] ~dunif(0,1)
phi[2] ~dunif(0,1)
for(i in 1:M){
for(t in 1:(first[i]-1)){
s[i,1,t]<-0 #before first detection, not in system
s[i,2,t]<-0 #before first detection, not in system
z[i,t]<-0
}
for(t in (last[i]+1):T){
s[i,1,t]<-0
s[i,2,t]<-0
z[i,t]<-0
}
#First period, locs and states
z[i,first[i]] ~ dbern(1) #know fish is alive
s[i,1,first[i]] ~ dunif(xl,xu) #possible x,y coords
s[i,2,first[i]] ~ dunif(yl,yu)
xdex[i,first[i]]<- trunc(s[i,1,first[i]]+1)
ydex[i,first[i]]<- trunc(s[i,2,first[i]]+1)
pOK[i,first[i]] <- habmat[xdex[i,first[i]],ydex[i,first[i]]] # habitat check
OK[i,first[i]] ~ dbern(pOK[i,first[i]]) # OK[i] = 1, the ones trick
for(j in 1:ngates){
#First period, detection
d[i,j,first[i]]<-sqrt(pow((s[i,1,first[i]]-gate.locs[j,1]),2) + pow((s[i,2,first[i]]-gate.locs[j,2]),2)) #estimate distance to gate (euclid)
d2[i,j,first[i]]<-pow(d[i,j,first[i]],2)
lam_g[i,j,first[i]]<-lamda0[i,j]*exp(-d2[i,j,first[i]]/(2*sigma2[first[i]]))
y[i,j,first[i]] ~ dpois(lam_g[i,j,first[i]]) # number of captures/period/gate
}
#Subsequent periods
for(t in (first[i]+1):last[i]){
s[i,1,t] ~ dnorm(s[i,1,(t-1)],tau)T(xl, xu)
s[i,2,t] ~ dnorm(s[i,2,(t-1)],tau)T(yl, yu)
xdex[i,t]<- trunc(s[i,1,t]+1)
ydex[i,t]<- trunc(s[i,2,t]+1)
pOK[i,t] <- habmat[xdex[i,t],ydex[i,t]] # habitat check
OK[i,t] ~ dbern(pOK[i,t]) # OK[i] = 1, the ones trick
for(j in 1:ngates){
d[i,j,t]<-sqrt(pow((s[i,1,t]-gate.locs[j,1]),2) + pow((s[i,2,t]-gate.locs[j,2]),2)) #estimate distance to gate (euclid)
d2[i,j,t]<-pow(d[i,j,t],2)
lam_g[i,j,t]<-lamda0[i,j]*exp(-d2[i,j,t]/(2*sigma2[t]))
y[i,j,t] ~ dpois(lam_g[i,j,t])
}
phiUP[i,t]<-z[i,t-1]*phi[group[i]] #estimate 3-day survival rate
z[i,t] ~ dbern(phiUP[i,t])
}
}
}#model
", fill=TRUE)
sink()
OK = matrix(1, nrow=M, ncol=T)
dat<-list(y=y, first=first, habmat=habmat, group=group,
xl=xl,xu=xu,yl=yl,yu=yu,
last=last, OK = OK, M=M, T=T,
ngates=ngates,gate.locs=gate.locs)
z<-matrix(NA,M,T)
for(i in 1:M){
for(t in first[i]:last[i]){
z[i,t]<-1
}
}
s<-readRDS("s_inits.Rda")
inits<-function() {list(phi=runif(2,0,1), sigma2=runif(T,0,0.5), tauv=runif(1,0,30), s=s, z=z)}
init1<-inits()
init2<-inits()
init3<-inits()
jag.inits<-list(init1,init2,init3)
params<-c("phiUP","tauv","sigma2","s","z","beta","gamma","phi")
Related
Thanks for the help! I've got a latent variable occupancy model with p (detection) and Psi (occupancy) covariates that I set up using matrices rather than vectors for convenience. It is based on the JAGS model from Tobler et al. (2019), but I have converted it to NIMBLE and added detection covariates.
Most sites have 3 visits per year in each of three years, so I have a response variable array of:
y[1:n_sites , 1:n_species , 1:3 (years), 1:3 (surveys per year)]
I have a similar array of detection covariates:
Xobs[1:n_sites , 1:n_p.covariates , 1:3 (years), 1:3 (surveys per year)]
And occupancy covariates (same for each visit in a given year):
Xocc[1:n_sites , 1:n_Psi.covariates , 1:3 (years)]
The model runs fine when data are included for all sites, years, and surveys. However, some sites (e.g. 5%) were only surveyed twice in a given year, and a few (3%) were surveyed in only one or two (rather than all three) years. This results in a 'ragged array', where some quantities (e.g. z, the estimated true presence, for a site that wasn't surveyed in a given year) don't need to be estimated.
When I try to run the model with the missing surveys filled with either a) zeros or b) NAs, I get several problems:
some parameters fail to initialize (this isn't necessarily a problem though), and more importantly...
some chains for z (estimated true presence) for a given site and year get stuck on 1 (this seems to occur for site_year combinations that have fewer than 3 surveys).
So my main question is: how should I be dealing with those missing values? The solution I tried (in the code below) is to index year and survey numbers for each site. For years I do this with a years-per-site vector (yrs), and for surveys per year with a matrix k[1:n_sites, 1:n_years] that shows how many visits there were.
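Schematically (with made-up numbers):
# e.g. site 7 was surveyed in 2 of the 3 years, with 3 visits then 2:
# yrs[7] = 2 ; k[7, ] = c(3, 2, NA)
# so loops written as 'for (yr in 1:yrs[j])' and 'for (kv in 1:k[j,yr])'
# never touch the site/year/survey combinations that don't exist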
[side note: this model doesn't yet deal with the non-independence of sites between years; I will deal with that once I get the basic model working but am open to suggestions]
The code below makes a reproducible example (albeit perhaps not in the most efficient way). Run as-is, it consists of 'full' input data (no missing years or surveys) and runs fine (as far as I can tell). If you then comment out these lines...
##############################################################
#now to make sure model runs properly with all arrays full (no missing sites):
##############################################################
k[]=3
yrsV[]=3
...and un-comment the lines that follow them (setting options for what to use to replace missing X and Y values, NA vs. zero), you will have a dataset that resembles our true data (it runs, but gives problems in the MCMC output).
Thank you!
library(nimble)
library(mcmcplots)
library(abind)
###########################
#create detection data
###########################
#n species
n.sp=10
#mean p per species
p.sp=(1:10)/12;p.sp
#mean Psi per species
psi.intercept=0.4 #all species
#n sites
n.sites=100
#n years each site surveyed
yrs.site=sample(c(3,3,3,2,2,1),n.sites,replace = TRUE);yrs.site
#n surveys in each year at each site
surv.yr.site=data.frame(matrix(0,n.sites,max(yrs.site)))
for (i in 1:max(yrs.site)) {
names(surv.yr.site)[i]=paste0("yr_",i)
surv.yr.site[,i]=sample(c(3,3,3,2,2,1),n.sites,replace = TRUE)
for (j in 1:n.sites) {
if (yrs.site[j]<i) {
surv.yr.site[j,i]=NA
}
}
};surv.yr.site
#make true presence/abs by year
z1=z2=z3=matrix(0,n.sites,n.sp)
for (i in 1:n.sp) {
z1[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
z2[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
z3[,i]=rbinom(n=n.sites,size=1,prob=psi.intercept)
};head(z1)
#make detections by year
y1=y2=y3=array(0,c(n.sites,n.sp,max(as.matrix(surv.yr.site),na.rm = TRUE)));dim(y1)
for (i in 1:max(as.matrix(surv.yr.site),na.rm = TRUE)) {
for (j in 1:n.sp) {
y1[,j,i]=z1[,j]*rbinom(n.sites,1,p.sp[j])
y2[,j,i]=z2[,j]*rbinom(n.sites,1,p.sp[j])
y3[,j,i]=z3[,j]*rbinom(n.sites,1,p.sp[j])
}
};head(y1)
#merge into y array y[sites,species,years,surveys]
y=array(0,c(n.sites,n.sp,max(yrs.site),max(as.matrix(surv.yr.site),
na.rm = TRUE)));dim(y)
for (i in 1:dim(y)[3]) {
for (j in 1:dim(y)[4]) {
y[,,i,j]=get(paste0("y",i))[,,j]
}
};head(y)
#make four p covariates for each visit (with no simulated effects)
Xobs=array(0,c(n.sites,4,max(yrs.site),max(as.matrix(surv.yr.site),
na.rm = TRUE)));dim(Xobs)
Xobs[]=rnorm(length(Xobs))
head(Xobs)
#make four Psi covariates for each year (with no simulated effect)
Xocc=array(0,c(n.sites,4,max(yrs.site)));dim(Xocc)
Xocc[]=rnorm(length(Xocc))
head(Xocc)
#generate constants
#n species
n=n.sp;n
#J sites
J=n.sites;J
#nYr years
yrsV=yrs.site;yrsV
#surveys per year per site
k=surv.yr.site
#Vocc psi covars
Vocc=ncol(Xocc);Vocc
#Vobs p covars
Vobs=ncol(Xobs);Vobs
#set n latent variables (1/4 to 1/2 n)
nlv=5
#
#############################
#nimble model
#############################
code <- nimbleCode({
#model modified from Tobler et al. 2019 by Ryan C. Burner ryan.c.burner#gmail.com
#original paper https://doi.org/10.1002/ecy.2754
#modified to include detection covariates and derived parameters
# community hyper-priors for occupancy covariates
for(m in 1:Vocc) { #n occ covars
mu.u.b[m] ~ T(dnorm(0, 0.01),-10,10)
tau.u.b[m] <- pow(sigma.u.b[m], -2)
sigma.u.b[m] ~ dunif(0, 10)
}
# community hyper-priors for detection covariates
for(m in 1:Vobs) { #n detect covars
mu.v.b[m] ~ T(dnorm(0, 0.01),-10,10)
tau.v.b[m] <- pow(sigma.v.b[m], -2)
sigma.v.b[m] ~ dunif(0, 10)
}
# Latent variables
for(j in 1:J) { #n sites
for(l in 1:nlv){ # n lv
LV[j,l] ~ dnorm(0,1)
}
}
# Latent variable coefficients with constraints
# diagonal elements positive, upper diagonal 0
for(l in 1:(nlv-1)){ # n lv
for(l2 in (l+1):nlv){ # n lv
lv.coef[l,l2] <- 0
}
}
# Sign constraints on diagonal elements
for(l in 1:nlv) { # n lv
lv.coef[l,l] ~ dunif(0,1)
}
# lower diagonal free
for(l in 2:nlv){ # n lv
for(l2 in 1:(l-1)){
lv.coef[l,l2] ~ dunif(-1,1)
}
}
# other elements free
for(i in (nlv+1):n) { # n lv
for(l in 1:nlv){ # n lv
lv.coef[i,l] ~ dunif(-1,1)
}
}
# loop over all SPECIES
for (i in 1:(n)) { #n species
# random [beta] effect priors by species for occupancy
for(m in 1:Vocc) { #n occ covars
u.b[i, m] ~ T(dnorm(mu.u.b[m], tau.u.b[m]),-10,10)
} #prior, i species by m occ covars
# random [beta] effect priors by species for detection
for(m in 1:Vobs) { #n obs covars
v.b[i, m] ~ T(dnorm(mu.v.b[m], tau.v.b[m]),-10,10)
} #prior, i species by m det covars
# loop over all SITES
for (j in 1:J) { #n sites
for (yr in 1:(yrs[j])) {
eta[j,i,yr] <- inprod(u.b[i,1:Vocc], Xocc[j, ,yr]) +
inprod(lv.coef[i,1:nlv],LV[j,1:nlv])
}
# draw from normal, constrain variance to 1
# j sites x i species
#for each year
for (yr in 1:(yrs[j])) {
u[j,i,yr] ~ dnorm(eta[j,i,yr],1/(1-sum(lv.coef[i,1:nlv]^2)))
z[j,i,yr] <- step(u[j,i,yr])
}
# loop over all VISITS (SURVEYS) in all YEARS
for (yr in 1:(yrs[j])) {
for (kv in 1:(k[j,yr])) {
# logistic model for detection
logit(p[j,i,yr,kv]) <- inprod(v.b[i,1:Vobs], Xobs[j, ,yr,kv])
#p sites x species x years x visits
#estimated p x site x species x year x survey
mu.p[j,i,yr,kv] <- p[j,i,yr,kv]*z[j,i,yr]
y[j,i,yr,kv] ~ dbern(mu.p[j,i,yr,kv])
} # i species, j sites, yr years, kv visits
}
}
}
###Derived parameters###
#proportion of sites (j) occupied in 1+ yrs by each species (i) #could divide by pool/year
for (i in 1:(n)) { #n species
for (j in 1:(J)) { #J sites
site.occ.ever.bySp[j,i] <- max(z[j,i,1:(yrs[j])]) #indexing trouble here??
}
prop.occ.ever.bySp[i] <- sum(site.occ.ever.bySp[1:J,i])/J # j x i
}
})
occ.inits = function() { #function is used so that it is repeated per chain
#for u.b., I use estimates from a glm. If NA, fill with random
u.b<-t(sapply(seq_len(ncol(y[,,,1])),
function(x) {xx=unname(coef(glm(((apply(y, c(1,2),
max,na.rm=TRUE)>0)*1)[, x] ~ Xocc[, -1,1],
family=binomial(link=logit),
na.action = 'na.omit')))
#random values replace errors
xx[is.na(xx)==TRUE]<-rnorm(length(xx[is.na(xx)==TRUE]))
return(xx)}))
colnames(u.b)<-colnames(Xocc)
rownames(u.b)<-colnames(y)
#for really high values, bring in line with priors
u.b[abs(u.b)>10]=runif(length(u.b[abs(u.b)>10]),0,5)*sign(u.b[abs(u.b)>10])
lv.coef=matrix(runif(nlv*n,-0.2,0.2),n,nlv)
diag(lv.coef)=abs(diag(lv.coef))
lv.coef[upper.tri(lv.coef)]=0
v.b=matrix(rnorm(Vobs*n),c(n,Vobs))
colnames(v.b)=colnames(Xobs)
rownames(v.b)<-colnames(y)
u=apply(y, c(1,2,3), max,na.rm=TRUE)-runif(J*n*max(yrsV),0.1,0.8)
#replace Inf/-Inf with zero (not sure if it matters)
u[is.infinite(u)]=runif(length(u[is.infinite(u)]),-0.3,-0.1)
list(
mu.u.b=rnorm(Vocc,0, 0.01),
sigma.u.b=runif(Vocc, 0, 10),
mu.v.b=rnorm(Vobs,0, 0.01),
sigma.v.b=runif(Vobs, 0, 10),
u.b=u.b,
v.b=v.b,
u=u,
lv.coef=lv.coef,
LV=matrix(rnorm(nlv*J),J,nlv)
)
}
#indicate which parameters to monitor
params <- c('u.b','v.b','mu.u.b','tau.u.b',
'prop.occ.ever.bySp', #derived parameter included in the model
'mu.v.b','tau.v.b','z')
##############################################################
#now to make sure model runs properly with all arrays full (no missing sites):
##############################################################
k[]=3
yrsV[]=3
#####################################################
#alternatively, what happens if we remove the surveys that didn't happen
#(need to rerun all code above, skipping previous two lines)
#####################################################
# #choose values to pad X covars and Y responses with
# # X covars:
# #padX=NA
# padX=0
# # Y responses:
# padY=NA
# #padY=0
#
# #fill in missing surveys with pad value
# for (i in 1:max(yrs.site)) {
# for (j in 1:nrow(y)) {
# if (yrsV[j]<i) {
# y[j,,i,][]=padY
# Xobs[j,,i,][]=padX
# Xocc[j,,i][]=padX
# } else {
# for (l in 1:max(as.matrix(surv.yr.site),na.rm = TRUE)) {
# if (k[j,i]<l) {
# y[j,,i,l][]=padY
# Xobs[j,,i,l][]=padX
# }
# }
# }
# }
# }
#
# k[is.na(k)==TRUE]=padVal
##get the x and y data##
# DATA DESCRIPTION
# n is number of species
# J is number of sites
# y is detections
# k is number of surveys in each of 3 years
# yrsV is number of years surveyed for each point
# nlv is number of latent variables
# Vocc and Vobs are numbers of occ (Psi) and obs (p) covariates
# Xocc and Xobs are matrices of covariates (incl. intercept)
occ.data = list(n=n, J=J, k=k, y=y,
Xocc=Xocc,Xobs=Xobs,Vocc=Vocc,Vobs=Vobs,nlv=nlv,yrs=yrsV)#
names(occ.data)
data <- occ.data[c(4:6)];names(data)
constants=occ.data[c(1:3,7:length(occ.data))];names(constants)
#set the initial values
inits <- occ.inits()
#inspect inits
i=1
head(inits[[i]]);names(inits)[i];i=i+1
######################################
###to run the model in Nimble...
######################################
# 1) create the model object
glmmModel <- nimbleModel(code = code, data = data, constants = constants,
inits = inits, check = getNimbleOption("checkModel"))
glmmModel$initializeInfo()
# 2) compile model
CglmmModel <- compileNimble(glmmModel,showCompilerOutput = TRUE)
# 3) configure mcmc
fglmmMCMC <- configureMCMC(glmmModel)
# 4) add monitors for parameters
fglmmMCMC$addMonitors(params)
# 5) build mcmc
BglmmMCMC<- buildMCMC(fglmmMCMC)
# 6) compile model
CglmmMCMC <- compileNimble(BglmmMCMC,showCompilerOutput = TRUE)
#setup mcmc
ni <- 800
nb <- 150
nt <- 10
nc <- 2
#run mcmc
s=Sys.time();s
samples <- runMCMC(CglmmMCMC, niter = ni, nburnin = nb, nchains = nc, thin = nt,
setSeed = 3,
summary = FALSE, samplesAsCodaMCMC = TRUE)
ss=Sys.time()-s #elapsed run time
print(ss)
beepr::beep(5)
#check for z (occupancy) parameters where chains are stuck on 1:
problem=which(colMeans(samples$chain1[,grep("z",colnames(samples$chain1))])==1)
length(problem)
#view a subset (can run this line repeatedly)
#z parameters are z[site,species,year]
#can run this line repeatedly to see random sample
samples$chain1[1:5,grep("z",colnames(samples$chain1))[sample(problem,10)]]
#to check number of years (columns) and surveys (values) in a given year:
k
#to check y (observation) input values for a given site enter number here:
#y is y[sites,species,years,surveys]
#e.g.....
y[26,10,1,]
#check for z (occupancy) parameters where chains are stuck on 0:
problem2=which(colMeans(samples$chain1[,grep("z",colnames(samples$chain1))])==0)
length(problem2)
#view a subset (can run this line repeatedly)
#z parameters are z[site,species,year]
#can run this line repeatedly to see random sample
samples$chain1[1:5,grep("z",colnames(samples$chain1))[sample(problem2,10)]]
#to check number of years (columns) and surveys (values) in a given year:
k
#to check y (observation) input values for a given site enter number here:
#y is y[sites,species,years,surveys]
#e.g.....
y[26,10,1,]
#get summaries of the chains
xx=summary(samples)
head(xx)
#here I get an error that there are NAs (from unused parameters due to ragged array??)
#get plots of chains
mcmcplot(samples)
beepr::beep(5)
#why does this error occur on so many (or all) of the z's
#(and some other parameters with my real data)
#z[1, 1, 1]. Error in bw.SJ(x, method = "ste"): sample is too sparse to find TD
Good morning. I need the community's help to understand some problems that occurred while writing this model.
I aim to model cause-of-death proportions using as predictors "log_GDP" (gross domestic product on the log scale) and "log_h" (hospital beds per 1,000 people on the log scale):
y: 3 columns that are observed proportions of deaths over the years.
x1: "log_GDP" (Gross domestic product in log scale)
x2: "log_h" (hospital beds per 1,000 people in log scale)
As you can see from the estimation result in the last plot, I got a high noise level. When I worked with just one covariate, i.e. log_GDP, I obtained smooth results.
Here are the simulated data:
library(reshape2)
library(tidyverse)
library(ggplot2)
library(runjags)
CIRC <- c(0.3685287, 0.3675516, 0.3567829, 0.3517274, 0.3448940, 0.3391031, 0.3320184, 0.3268640,
0.3227445, 0.3156360, 0.3138515,0.3084506, 0.3053657, 0.3061224, 0.3051044)
NEOP <- c(0.3602199, 0.3567355, 0.3599409, 0.3591258, 0.3544591, 0.3566269, 0.3510974, 0.3536156,
0.3532980, 0.3460948, 0.3476183, 0.3475634, 0.3426035, 0.3352433, 0.3266048)
OTHER <-c(0.2712514, 0.2757129, 0.2832762, 0.2891468, 0.3006468, 0.3042701, 0.3168842, 0.3195204,
0.3239575, 0.3382691, 0.3385302, 0.3439860, 0.3520308, 0.3586342, 0.3682908)
log_h <- c(1.280934, 1.249902, 1.244155, 1.220830, 1.202972, 1.181727, 1.163151, 1.156881, 1.144223,
1.141033, 1.124930, 1.115142, 1.088562, 1.075002, 1.061257)
log_GDP <- c(29.89597, 29.95853, 29.99016, 30.02312, 30.06973, 30.13358, 30.19878, 30.25675, 30.30184,
30.31974, 30.30164, 30.33854, 30.37460, 30.41585, 30.45150)
D <- data.frame(CIRC=CIRC, NEOP=NEOP, OTHER=OTHER,
log_h=log_h, log_GDP=log_GDP)
cause.y <- as.matrix((data.frame(D[,1],D[,2],D[,3])))
cause.y <- cause.y/rowSums(cause.y)
mat.x<- D$log_GDP
mat.x2 <- D$log_h
n <- 15
The JAGS model:
dirlichet.model = "
model {
#setup priors for each species
for(j in 1:N.spp){
m0[j] ~ dnorm(0, 1.0E-3) #intercept prior
m1[j] ~ dnorm(0, 1.0E-3) # mat.x prior
m2[j] ~ dnorm(0, 1.0E-3)
}
#implement dirlichet
for(i in 1:N){
y[i,1:N.spp] ~ ddirch(a0[i,1:N.spp])
for(j in 1:N.spp){
log(a0[i,j]) <- m0[j] + m1[j] * mat.x[i]+ m2[j] * mat.x2[i] # m0 = intercept; m1= coeff log_GDP; m2= coeff log_h
}
}} #close model loop.
"
jags.data <- list(y = cause.y,mat.x= mat.x,mat.x2= mat.x2, N = nrow(cause.y), N.spp = ncol(cause.y))
jags.out <- run.jags(dirlichet.model,
data=jags.data,
adapt = 5000,
burnin = 5000,
sample = 10000,
n.chains=3,
monitor=c('m0','m1','m2'))
out <- summary(jags.out)
head(out)
Gather the coefficients and estimate the proportions:
coeff <- out[c(1,2,3,4,5,6,7,8,9),4]
coef1 <- out[c(1,4,7),4] #coefficients (intercept, log_GDP slope, log_h slope) for cause 1
coef2 <- out[c(2,5,8),4] #coefficients for cause 2
coef3 <- out[c(3,6,9),4] #coefficients for cause 3
pred <- as.matrix(cbind(exp(coef1[1]+coef1[2]*mat.x+coef1[3]*mat.x2),
exp(coef2[1]+coef2[2]*mat.x+coef2[3]*mat.x2),
exp(coef3[1]+coef3[2]*mat.x+coef3[3]*mat.x2)))
pred <- pred / rowSums(pred)
Combine predicted and observed values into one data frame:
Obs <- data.frame(Circ=cause.y[,1],
Neop=cause.y[,2],
Other=cause.y[,3],
log_GDP=mat.x,
log_h=mat.x2)
Obs$model <- "Obs"
Pred <- data.frame(Circ=pred[,1],
Neop=pred[,2],
Other=pred[,3],
log_GDP=mat.x,
log_h=mat.x2)
Pred$model <- "Pred"
tot60<-as.data.frame(rbind(Obs,Pred))
tot <- melt(tot60,id=c("log_GDP","log_h","model"))
tot$variable <- as.factor(tot$variable)
Plot
tot %>%filter(model=="Obs") %>% ggplot(aes(log_GDP,value))+geom_point()+
geom_line(data = tot %>%
filter(model=="Pred"))+facet_wrap(.~variable,scales = "free")
The reason for the non-smoothness is that you are calculating Pr(y=m|X) = f(x1, x2); that is, the predicted probability is a function of both x1 and x2. You are then plotting Pr(y=m|X) as a function of a single x variable, log GDP. That result will almost certainly not be smooth. The log_GDP and log_h variables are highly negatively correlated, which is why the result is not much more variable than it is.
In my run of the model, the average coefficient for log_GDP is actually positive for NEOP and Other, suggesting that the result you see in the plot is quite misleading. If you were to plot these in two dimensions, you would see that the result is again smooth.
mx1 <- seq(min(mat.x), max(mat.x), length=25)
mx2 <- seq(min(mat.x2), max(mat.x2), length=25)
eg <- expand.grid(mx1 = mx1, mx2 = mx2)
pred <- as.matrix(cbind(exp(coef1[1]+coef1[2]*eg$mx1 + coef1[3]*eg$mx2),
exp(coef2[1]+coef2[2]*eg$mx1 + coef2[3]*eg$mx2),
exp(coef3[1]+coef3[2]*eg$mx1 + coef3[3]*eg$mx2)))
pred <- pred / rowSums(pred)
Pred <- data.frame(Circ=pred[,1],
Neop=pred[,2],
Other=pred[,3],
log_GDP=eg$mx1,
log_h=eg$mx2)
lattice::wireframe(Neop ~ log_GDP + log_h, data=Pred, drape=TRUE)
A couple of other things to watch out for.
Usually in hierarchical Bayesian models, the coefficients would themselves be drawn from distributions with estimated hyperparameters. This enables shrinkage of the coefficients toward the global mean, which is a hallmark of hierarchical models.
Not sure if this is what your data really look like or not, but the correlation between the two independent variables is going to make it difficult for the model to converge. You could try using a multivariate normal distribution for the coefficients - that might help.
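To sketch that multivariate normal idea (untested; the Wishart hyperprior with scale matrix R = diag(3) is my assumption, one conventional choice among several), the coefficient block could become:
dirlichet.model.mvn = "
model {
for(j in 1:N.spp){
b[j,1:3] ~ dmnorm(mu.b[1:3], Omega[1:3,1:3]) # intercept, log_GDP, log_h
}
for(k in 1:3){ mu.b[k] ~ dnorm(0, 1.0E-3) }
Omega[1:3,1:3] ~ dwish(R[1:3,1:3], 4) # pass R = diag(3) in the data list
for(i in 1:N){
y[i,1:N.spp] ~ ddirch(a0[i,1:N.spp])
for(j in 1:N.spp){
log(a0[i,j]) <- b[j,1] + b[j,2]*mat.x[i] + b[j,3]*mat.x2[i]
}
}} #close model loop.
"
# and extend the data list: jags.data$R <- diag(3)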
I'm a beginner with OpenBUGS, which I use through the R2OpenBUGS R package. I'm trying to set up a state space model for identifying a lognormal signal in very noisy data. After many trials and errors, I managed to get this code, but I still get the following error message: "empty slot not allowed in variable name error pos 664", which I don't understand. Does anyone know what is wrong with the code?
Notation:
alt = measured altitude
true_alt = what I try to assess
nbird = number of individuals
nobs = number of observations (this number is not the same for every bird)
nstate = 'flight state', which is the way the birds behave (nstate = 3 because there are 3 different behaviours)
I try to determine the lognormal distribution of true_alt for each state.
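For context, the data are shaped roughly like this (made-up numbers, just to show the indexing the model expects):
# hypothetical shapes, for illustration only:
# nbird = 5 ; nstate = 3 ; nobs = c(120, 80, 200, 95, 150)
# alt[i,j] : measured altitude for observation i of bird j (i in 1:nobs[j])
# state[i,j] : flight state (1, 2, or 3) for that observation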
model <- function(){
## MODEL SPECIFICATION
for(j in 1:nbird){
for(i in 1:nobs[j]){
alt[i,j] ~ dnorm(true_alt[i,j], tau.obs)
log(true_alt[i,j]) <- log_true_alt[i,j]
log_true_alt[i,j] ~ dnorm(mean.alt[i,j], tau[state[i,j]])
mean.alt[i,j] <- alt1[state[i,j]] + ind.re[j]
}
}
for(i in 1:nstate){ tau[i] <- 1/(sig[i]) }
# Random Effects:
tau.re <- 1/sig.re
for(j in 1:nbird) { ind.re[j] ~ dnorm(0, tau.re) }
## PRIORS
for(i in 1:nstate) {
alt1[i] ~ dnorm(0, 0.01)
sig[i] ~ dunif(0, 200)
}
sig.re ~ dunif(0, 200)
state ~ dunif(1,3)
## POSTERIOR PREDICTIVE DISTRIBUTIONS FOR EACH STATE
for(s in 1:nstate){
log_alt_pred[s] ~ dnorm(alt1[s], tau[s])
log(alt_pred[s]) <- log_alt_pred[s]
}
}
Thank you!!!
It could be because in your priors you're trying to set a distribution for "alt1[i]" but in your model you've used "alt[i,j]".
I am puzzled by a simple question in R JAGS. I have for example, 10 parameters: d[1], d[2], ..., d[10]. It is intuitive from the data that they should be increasing. So I want to put a constraint on them.
Here is what I tried to do, but it gives an error message saying "Node inconsistent with parents":
model{
...
for (j in 1:10){
d.star[j]~dnorm(0,0.0001)
}
d=sort(d.star)
}
Then I tried this:
d[1]~dnorm(0,0.0001)
for (j in 2:10){
d[j]~dnorm(0,0.0001)I(d[j-1],)
}
This worked, but I don't know if this is the correct way to do it. Could you share your thoughts?
Thanks!
If you are ever uncertain about something like this, it is best to just simulate some data to determine if the model structure you suggest works (spoiler alert: it does).
Here is the model that I used:
cat('model{
d[1] ~ dnorm(0, 0.0001) # intercept
d[2] ~ dnorm(0, 0.0001)
for(j in 3:11){
d[j] ~ dnorm(0, 0.0001) I(d[j-1],)
}
for(i in 1:200){
y[i] ~ dnorm(mu[i], tau)
mu[i] <- inprod(d, x[i,])
}
tau ~ dgamma(0.01,0.01)
}',
file = "model_example.R")```
And here are the data I simulated to use with this model.
library(runjags)
library(mcmcplots)
# intercept with sorted betas
set.seed(161)
betas <- c(1,sort(runif(10, -5,5)))
# make covariates, 1 for intercept
x <- cbind(1,matrix(rnorm(2000), nrow = 200, ncol = 10))
# deterministic part of model
y_det <- x %*% betas
# add noise
y <- rnorm(length(y_det), y_det, 1)
data_list <- list(y = as.numeric(y), x = x)
# fit the model
mout <- run.jags('model_example.R',monitor = c("d", "tau"), data = data_list)
Following this, we can plot out the estimates and overlay the true parameter values
caterplot(mout, "d", reorder = FALSE)
points(rev(c(1:11)) ~ betas, pch = 18,cex = 0.9)
The black points are the true parameter values, the blue points and lines are the estimates. Looks like this set up does fine so long as there are enough data to estimate all of those parameters.
It looks like there is a syntax error in the first implementation. Just try:
model{
...
for (j in 1:10){
d.star[j]~dnorm(0,0.0001)
}
d[1:10] <- sort(d.star) # notice d is indexed.
}
and compare the results with those of the second implementation. According to the documentation, these are both correct, but it is advised to use the function sort.
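For completeness, here is a sketch of that sort() version, written to run against the simulated data from the answer above (untested, so treat it as illustrative):
cat('model{
d[1] ~ dnorm(0, 0.0001) # intercept, unconstrained
for(j in 1:10){
d.star[j] ~ dnorm(0, 0.0001)
}
d[2:11] <- sort(d.star) # ordered slopes
for(i in 1:200){
y[i] ~ dnorm(mu[i], tau)
mu[i] <- inprod(d, x[i,])
}
tau ~ dgamma(0.01,0.01)
}',
file = "model_sort.R")
mout2 <- run.jags('model_sort.R', monitor = c("d", "tau"), data = data_list)
caterplot(mout2, "d", reorder = FALSE)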
I'm using rjags to calculate species abundance using an N-mixture model and count data. To capture overdispersion in my data, I used hyperpriors. But I get an
"Error in node S[1,1,2] Invalid parent values"
My guess is that I have a problem in the dimensions of my priors, OR one of the priors is negative, null, or NA, which stops the calculation of node S.
Any idea how to solve this problem? Is it possible to initialize S?
model {
## PRIORS ##
lambda[1] ~ dunif(0, 500)
lambda[2] ~ dunif(0, 500)
p[1] ~ dunif(0, 1)
p[2] ~ dunif(0, 1)
# overdispersion
muepsomega1 ~ dnorm(0,0.0001)
sigepsomega1 ~ dunif(0,100)
iomega1 ~ dnorm(0,0.0001)
tauepsomega1 <- 1/(sigepsomega1*sigepsomega1)
omega2 ~ dunif(0, 1)
## LIKELIHOOD ##
# Create a loop across all j sites
for(j in 1:nSites) {
# overdispersion on omega1
omega1[j] <- iomega1 + epsomega1[j]
epsomega1[j] ~ dnorm(muepsomega1,tauepsomega1)
N[1,j,1] ~ dpois(lambda[1])
N[2,j,1] ~ dpois(lambda[2])
for (i in 1:3) {
S[i,j,1] ~ dnegbin(2, 1)
} # end loop i
for(t in 2:nYears) {
# Estimate survivorship (between year survival)
S[1,j,t] ~ dnegbin(omega1[j], N[1,j,t-1])
S[2,j,t] ~ dnegbin(omega2, N[2,j,t-1])
N[1,j,t] <- S[1,j,t]
N[2,j,t] <- S[2,j,t]
} # end loop t in 2:years
# Loop across sampling replicates to estimate detection
for (t in 1:nYears){
for(k in 1:nReps){
n[1,j,k,t] ~ dnegbin(p[1], N[1,j,t])
n[2,j,k,t] ~ dnegbin(p[2], N[2,j,t])
} # end loop k nreps
} # end loop t in 1:nYears
} # end loop j sites
} # end model
This is how I call the model:
#Model file
modFile = "model/2016-07-12/model_Nmix.R"
inits <- function(){
list('lambda' = c(1,1), 'p' = c(1,1), 'omega2' = 1, 'iomega1' = 1, 'muepsomega1' = 1, 'sigepsomega1' = 1, 'epsomega1' = c(1,1,1,1,1,1,1)) } # epsomega1 has length nSites = 7
# Compile the model
require(rjags)
abundance.out <- jags.model(file=modFile, data=data,n.chains = 3, n.adapt = 3000)
Let epsomega1 and iomega1 come from distributions that don't have any probability density over negative values. Gamma, uniform, log-normal, or truncated normal distributions are candidates, and your choice should depend on what you think the most correct model specification actually is.
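For example (a sketch, not a tested drop-in): since omega1[j] is used as the probability parameter of dnegbin, it has to stay in (0, 1), so one option that enforces both bounds while keeping the random-effect structure is a logit link:
# in the model, replace 'omega1[j] <- iomega1 + epsomega1[j]' with:
logit(omega1[j]) <- iomega1 + epsomega1[j]
# iomega1 and epsomega1[j] can then keep their normal priors, since any
# real value on the logit scale maps to a valid probability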