Hierarchical Dirichlet regression (JAGS): overfitting

Good morning. I need the community's help to understand some problems that came up while writing this model.
I am modeling the proportions of deaths by cause, using as predictors "log_GDP" (gross domestic product on the log scale) and "log_h" (hospital beds per 1,000 people on the log scale):
y: 3 columns containing the observed proportions of deaths over the years.
x1: "log_GDP" (gross domestic product on the log scale)
x2: "log_h" (hospital beds per 1,000 people on the log scale)
As you can see from the fitted values in the final plot, the predictions are very noisy. When I used only one covariate (log_GDP), I obtained smooth results.
Below are the simulated data, followed by the model specification.
Simulated data:
library(reshape2)
library(tidyverse)
library(ggplot2)
library(runjags)
CIRC <- c(0.3685287, 0.3675516, 0.3567829, 0.3517274, 0.3448940, 0.3391031, 0.3320184, 0.3268640,
0.3227445, 0.3156360, 0.3138515,0.3084506, 0.3053657, 0.3061224, 0.3051044)
NEOP <- c(0.3602199, 0.3567355, 0.3599409, 0.3591258, 0.3544591, 0.3566269, 0.3510974, 0.3536156,
0.3532980, 0.3460948, 0.3476183, 0.3475634, 0.3426035, 0.3352433, 0.3266048)
OTHER <-c(0.2712514, 0.2757129, 0.2832762, 0.2891468, 0.3006468, 0.3042701, 0.3168842, 0.3195204,
0.3239575, 0.3382691, 0.3385302, 0.3439860, 0.3520308, 0.3586342, 0.3682908)
log_h <- c(1.280934, 1.249902, 1.244155, 1.220830, 1.202972, 1.181727, 1.163151, 1.156881, 1.144223,
1.141033, 1.124930, 1.115142, 1.088562, 1.075002, 1.061257)
log_GDP <- c(29.89597, 29.95853, 29.99016, 30.02312, 30.06973, 30.13358, 30.19878, 30.25675, 30.30184,
30.31974, 30.30164, 30.33854, 30.37460, 30.41585, 30.45150)
D <- data.frame(CIRC=CIRC, NEOP=NEOP, OTHER=OTHER,
log_h=log_h, log_GDP=log_GDP)
cause.y <- as.matrix((data.frame(D[,1],D[,2],D[,3])))
cause.y <- cause.y/rowSums(cause.y)
mat.x<- D$log_GDP
mat.x2 <- D$log_h
n <- 15
JAGS model:
dirlichet.model = "
model {
# set up priors for each cause of death
for(j in 1:N.spp){
m0[j] ~ dnorm(0, 1.0E-3) # intercept prior
m1[j] ~ dnorm(0, 1.0E-3) # log_GDP (mat.x) slope prior
m2[j] ~ dnorm(0, 1.0E-3) # log_h (mat.x2) slope prior
}
# Dirichlet likelihood with a log link on the concentration parameters
for(i in 1:N){
y[i,1:N.spp] ~ ddirch(a0[i,1:N.spp])
for(j in 1:N.spp){
log(a0[i,j]) <- m0[j] + m1[j] * mat.x[i] + m2[j] * mat.x2[i] # m0 = intercept; m1 = log_GDP coefficient; m2 = log_h coefficient
}
}} # close model block
"
jags.data <- list(y = cause.y,mat.x= mat.x,mat.x2= mat.x2, N = nrow(cause.y), N.spp = ncol(cause.y))
jags.out <- run.jags(dirlichet.model,
data=jags.data,
adapt = 5000,
burnin = 5000,
sample = 10000,
n.chains=3,
monitor=c('m0','m1','m2'))
out <- summary(jags.out)
head(out)
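Picking rows by position (for example `out[c(1,4,7),4]`) depends on the order in which runjags lists the monitored nodes. A small sketch that selects them by name instead, assuming the summary matrix has row names such as `m0[1]` and a `Mean` column (as `summary()` on a runjags object usually provides). `coef_matrix()` is a hypothetical helper, and the mock matrix only illustrates the shape:

```r
# Hypothetical helper: build a (cause x coefficient) matrix from a
# runjags-style summary matrix, matching rows by name rather than position.
coef_matrix <- function(out, stat = "Mean", params = c("m0", "m1", "m2")) {
  sapply(params, function(p) {
    # rows named like "m0[1]", "m0[2]", ...
    rows <- grep(paste0("^", p, "\\["), rownames(out), value = TRUE)
    idx <- as.integer(sub(".*\\[([0-9]+)\\]$", "\\1", rows))
    unname(out[rows[order(idx)], stat])
  })
}

# Mock summary with made-up values, only to show the layout
mock <- matrix(1:9, ncol = 1, dimnames = list(
  c("m0[1]", "m0[2]", "m0[3]", "m1[1]", "m1[2]", "m1[3]", "m2[1]", "m2[2]", "m2[3]"),
  "Mean"))
cm <- coef_matrix(mock)
cm  # row j = cause j; columns = intercept (m0), log_GDP slope (m1), log_h slope (m2)
```

With the real fit, `coef_matrix(out)[1, ]` should hold the same three values as `coef1`.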
Gather the coefficients and estimate the proportions:
coeff <- out[c(1,2,3,4,5,6,7,8,9),4]
coef1 <- out[c(1,4,7),4] # intercept and two slopes for cause 1 (CIRC)
coef2 <- out[c(2,5,8),4] # intercept and two slopes for cause 2 (NEOP)
coef3 <- out[c(3,6,9),4] # intercept and two slopes for cause 3 (OTHER)
pred <- as.matrix(cbind(exp(coef1[1]+coef1[2]*mat.x+coef1[3]*mat.x2),
exp(coef2[1]+coef2[2]*mat.x+coef2[3]*mat.x2),
exp(coef3[1]+coef3[2]*mat.x+coef3[3]*mat.x2)))
pred <- pred / rowSums(pred)
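The three `exp(...)` columns divided by their row sums are a softmax of the linear predictors, i.e. the Dirichlet mean alpha_j / sum(alpha) evaluated at the posterior-mean coefficients. A vectorized sketch with a hypothetical helper `dirichlet_mean()`; it subtracts each row's maximum before exponentiating, which cancels in the ratio but guards against overflow when the linear predictor is large (log_GDP is around 30 here). The coefficient values below are made up for illustration:

```r
# Hypothetical helper: expected Dirichlet proportions alpha_j / sum(alpha),
# with log(alpha[i, j]) = B[j, 1] + B[j, 2] * x1[i] + B[j, 3] * x2[i].
dirichlet_mean <- function(B, x1, x2) {
  X <- cbind(1, x1, x2)              # design matrix: intercept, log_GDP, log_h
  eta <- X %*% t(B)                  # n x J matrix of log(alpha)
  eta <- eta - apply(eta, 1, max)    # subtract row maximum; cancels in the ratio
  a <- exp(eta)
  a / rowSums(a)
}

# Illustrative coefficients: row j = c(m0[j], m1[j], m2[j])
B <- rbind(c(0.5, 0.10, -0.2), c(0.2, 0.05, 0.1), c(-0.1, 0.02, 0.3))
p <- dirichlet_mean(B, x1 = c(29.9, 30.2, 30.45), x2 = c(1.28, 1.16, 1.06))
rowSums(p)  # each row sums to 1
```

With the question's objects, the equivalent call would be `dirichlet_mean(rbind(coef1, coef2, coef3), mat.x, mat.x2)`.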
Data frame of predicted and observed values:
Obs <- data.frame(Circ=cause.y[,1],
Neop=cause.y[,2],
Other=cause.y[,3],
log_GDP=mat.x,
log_h=mat.x2)
Obs$model <- "Obs"
Pred <- data.frame(Circ=pred[,1],
Neop=pred[,2],
Other=pred[,3],
log_GDP=mat.x,
log_h=mat.x2)
Pred$model <- "Pred"
tot60<-as.data.frame(rbind(Obs,Pred))
tot <- melt(tot60,id=c("log_GDP","log_h","model"))
tot$variable <- as.factor(tot$variable)
Plot
tot %>%filter(model=="Obs") %>% ggplot(aes(log_GDP,value))+geom_point()+
geom_line(data = tot %>%
filter(model=="Pred"))+facet_wrap(.~variable,scales = "free")

The lack of smoothness arises because you compute Pr(y=m|X) = f(x1, x2); that is, the predicted proportion is a function of both x1 and x2. You then plot Pr(y=m|X) against a single variable, the log of GDP, and that projection will almost certainly not be smooth. log_GDP and log_h are strongly negatively correlated, which is why the result is not much more variable than it is.
In my run of the model, the average coefficient for log_GDP is actually positive for NEOP and Other, which suggests that the plot is quite misleading. If you plot the predictions in two dimensions, you will see that the surface is, again, smooth.
mx1 <- seq(min(mat.x), max(mat.x), length=25)
mx2 <- seq(min(mat.x2), max(mat.x2), length=25)
eg <- expand.grid(mx1 = mx1, mx2 = mx2)
pred <- as.matrix(cbind(exp(coef1[1]+coef1[2]*eg$mx1 + coef1[3]*eg$mx2),
exp(coef2[1]+coef2[2]*eg$mx1 + coef2[3]*eg$mx2),
exp(coef3[1]+coef3[2]*eg$mx1 + coef3[3]*eg$mx2)))
pred <- pred / rowSums(pred)
Pred <- data.frame(Circ=pred[,1],
Neop=pred[,2],
Other=pred[,3],
log_GDP=eg$mx1,
log_h=eg$mx2)
lattice::wireframe(Neop ~ log_GDP + log_h, data=Pred, drape=TRUE)
A couple of other things to watch out for:
Usually in hierarchical Bayesian models, the coefficients would themselves be given prior distributions with their own hyperparameters. This enables shrinkage of the coefficients toward the global mean, which is a hallmark of hierarchical models.
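As a sketch of what this could look like for the model in the question (my own illustration, not code from either post): each type of coefficient gets a common normal prior whose mean and standard deviation are themselves estimated. The hyperprior choices, in particular the `dunif(0, 10)` bound on the hyper-SDs, are assumptions, and with only three causes of death the hyper-SDs will be weakly identified:

```r
# Dirichlet regression with hierarchical priors on the coefficients (sketch).
# k = 1: intercepts, k = 2: log_GDP slopes, k = 3: log_h slopes.
dirichlet.hier.model <- "
model {
  for (k in 1:3) {
    mu.b[k] ~ dnorm(0, 1.0E-3)     # hypermean for coefficient type k
    sd.b[k] ~ dunif(0, 10)         # hyper-SD (upper bound is an assumption)
    tau.b[k] <- pow(sd.b[k], -2)   # JAGS dnorm uses precision
  }
  for (j in 1:N.spp) {
    m0[j] ~ dnorm(mu.b[1], tau.b[1])
    m1[j] ~ dnorm(mu.b[2], tau.b[2])
    m2[j] ~ dnorm(mu.b[3], tau.b[3])
  }
  for (i in 1:N) {
    y[i, 1:N.spp] ~ ddirch(a0[i, 1:N.spp])
    for (j in 1:N.spp) {
      log(a0[i, j]) <- m0[j] + m1[j] * mat.x[i] + m2[j] * mat.x2[i]
    }
  }
}
"
```

It would be run like the original, e.g. `run.jags(dirichlet.hier.model, data = jags.data, monitor = c("m0", "m1", "m2", "mu.b", "sd.b"), ...)`.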
I'm not sure whether this is what your data really look like, but the correlation between the two independent variables is going to make it difficult for the model to converge. You could try a multivariate normal distribution for the coefficients; that might help.
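The collinearity can be checked directly on the covariates from the question. A sketch computing the correlation and the variance inflation factor (with exactly two predictors, VIF = 1 / (1 - r^2) for both); by my calculation the correlation is close to -0.99. It also builds standardized copies of the covariates: that does not change their correlation, but centering log_GDP (which sits around 30) usually removes much of the posterior correlation between each intercept and its slope, which can help the sampler. The standardization is my suggestion, not part of the answer:

```r
# Covariates from the question
log_h <- c(1.280934, 1.249902, 1.244155, 1.220830, 1.202972, 1.181727, 1.163151, 1.156881,
           1.144223, 1.141033, 1.124930, 1.115142, 1.088562, 1.075002, 1.061257)
log_GDP <- c(29.89597, 29.95853, 29.99016, 30.02312, 30.06973, 30.13358, 30.19878, 30.25675,
             30.30184, 30.31974, 30.30164, 30.33854, 30.37460, 30.41585, 30.45150)

r <- cor(log_GDP, log_h)
vif <- 1 / (1 - r^2)   # with two predictors, both share this VIF
c(correlation = r, VIF = vif)

# Centered and scaled copies; coefficients are then per standard deviation,
# and predictions must use the same centering and scaling.
z_GDP <- as.vector(scale(log_GDP))
z_h   <- as.vector(scale(log_h))
# jags.data <- list(y = cause.y, mat.x = z_GDP, mat.x2 = z_h,
#                   N = nrow(cause.y), N.spp = ncol(cause.y))
```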

Related

OpenBugs error invalid integer value keeps coming up

I am trying to create a Bayesian latent class model in R with OpenBUGS for 2 populations using 2 diagnostic tests. I have written the equations for most parameters, but I also have inconclusive results. I therefore want to expand the model to include the probability of an inconclusive result on test 1 in an infected and in an uninfected individual, and then the same for test 2. I have been trying to write the equations for these probabilities (p1[1] to p1[4] and p2[1] to p2[4] in the model below), but I am struggling and don't think I am quite there yet, as the model won't run. Could you help me with these equations? Every time I run it, I get the error "invalid integer value for x1[1]".
#Bayesian LCA model creation
library(R2OpenBUGS)
setwd("mywd")
model3=paste0("model {
#multinomial model for the data
x1[1:8] ~ dmulti(p1[1:8], n1)
x2[1:8] ~ dmulti(p2[1:8], n2)
#Observed prevalence
#Pop 1 with 2 tests and unknown prevalence
p1[1] <- prev1*Se1*Se2*(1-IncT1Inf)+(1-prev1)*(1-Sp1)*(1-Sp2)*IncT1NonInf
p1[2] <- prev1*Se1*(1-Se2)*(1-IncT1NonInf)+(1-prev1)*(1-Sp1)*Sp2*IncT1Inf
p1[3] <- prev1*Se1*Se2*(1-IncT2Inf)+(1-prev1)*(1-Sp1)*(1-Sp2)*IncT2NonInf
p1[4] <- prev1*(1-Se1)*Se2*(1-IncT2NonInf)+(1-prev1)*Sp1*(1-Sp2)*IncT2Inf
p1[5] <- prev1*Se1*Se2+(1-prev1)*(1-Sp1)*(1-Sp2)
p1[6] <- prev1*Se1*(1-Se2)+(1-prev1)*(1-Sp1)*Sp2
p1[7] <- prev1*(1-Se1)*Se2+(1-prev1)*Sp1*(1-Sp2)
p1[8] <- prev1*(1-Se1)*(1-Se2)+(1-prev1)*Sp1*Sp2
#Pop 2 with 2 tests (same tests) and unknown prevalence
p2[1] <- prev2*Se1*Se2*(1-IncT1Inf)+(1-prev2)*(1-Sp1)*(1-Sp2)*IncT1NonInf
p2[2] <- prev2*Se1*(1-Se2)*(1-IncT1NonInf)+(1-prev2)*(1-Sp1)*Sp2*IncT1Inf
p2[3] <- prev2*Se1*Se2*(1-IncT2Inf)+(1-prev2)*(1-Sp1)*(1-Sp2)*IncT2NonInf
p2[4] <- prev2*(1-Se1)*Se2*(1-IncT2NonInf)+(1-prev2)*Sp1*(1-Sp2)*IncT2Inf
p2[5] <- prev2*Se1*Se2+(1-prev2)*(1-Sp1)*(1-Sp2)
p2[6] <- prev2*Se1*(1-Se2)+(1-prev2)*(1-Sp1)*Sp2
p2[7] <- prev2*(1-Se1)*Se2+(1-prev2)*Sp1*(1-Sp2)
p2[8] <- prev2*(1-Se1)*(1-Se2)+(1-prev2)*Sp1*Sp2
#Priors taken using median values from total lit search and 95th percentile provided in lit
prev1 ~ dbeta(8.89,60.26)
prev2 ~ dbeta(4.35,76.63)
Se1 ~ dbeta(14.59,0.86)
Sp1 ~ dbeta(14.95,0.86)
Se2 ~ dbeta(78.55,15.96)
Sp2 ~ dbeta(1.71,0.38)
IncT1Inf ~ dbeta(1,1)
IncT1NonInf ~ dbeta(1,1)
IncT2Inf ~ dbeta(1,1)
IncT2NonInf ~ dbeta(1,1)
}")
#write to temporary text file
write.table(model3, file="model3.txt", quote=FALSE, sep="", row.names=FALSE, col.names=FALSE)
#Data
#Sanctuary 1
n1=19
x1<-matrix(c(3,3,1,0,1,0,0,6,5),byrow=T,ncol=3,dimnames=list(c("TST+", "TST-",
"TSTInc"),c("QFT+", "QFT-", "QFTInc")))
as.numeric(x1)
x1 <- as.numeric(x1)
#Sanctuary 2
n2=12
x2<-matrix(c(0,0,0,4,8,0,0,0,0),byrow=T,ncol=3,dimnames=list(c("TST+", "TST-",
"TSTInc"),c("QFT+", "QFT-", "QFTInc")))
as.numeric(x2)
x2 <- as.numeric(x2)
#set data inputs to BUGS
dat <- list(x1=x1,n1=sum(x1),x2=x2,n2=sum(x2))
dat
#Set parameters desired to monitor
paras <- c("Se1","Sp1","Se2","Sp2","prev1","prev2","IncT1Inf","IncT1NonInf", "IncT2Inf",
"IncT2NonInf")
#Initial values for 3 chains, using median and mean values from the initial prior calculation
inits<-list(list(Se1=0.964, Sp1=0.965, Se2=0.834, Sp2=0.969, prev1=0.125,
prev2=0.050,IncT1Inf=0.100,IncT1NonInf=0.250,IncT2Inf=0.150,IncT2NonInf=0.250), list(Se1=0.965,
Sp1=0.965, Se2=0.834, Sp2=0.969, prev1=0.125, prev2=0.05,
IncT1Inf=0.200,IncT1NonInf=0.150,IncT2Inf=0.050,IncT2NonInf=0.200), list(Se1=0.960,
Sp1=0.965,Se2=0.858,Sp2=0.937,IncT1Inf=0.100,IncT1NonInf=0.250,IncT2Inf=0.150,IncT2NonInf=0.250)
)
#run model in R2OpenBUGS
niterations=12000
#Running with initial 12000 iterations, burning first 1000 and thinning every 10
bug.out <- bugs(dat, inits, paras, model.file="model3.txt", n.iter=niterations, n.burnin=1000,
n.thin=10, n.chains=3, saveExec=F, restart=F, debug=T, DIC=T, digits=6, codaPkg=F,
working.directory="mywd", clearWD=F, useWINE=F, WINE=NULL,
newWINE=F, WINEPATH=NULL, bugs.seed=1, summary.only=FALSE, over.relax = F)
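Two mismatches in the code above may be worth checking before the equations themselves: the data x1 and x2 each have 9 cells (a 3 x 3 table flattened by as.numeric()), while the model declares x1[1:8] and p1[1:8]; and p1[5] to p1[8] already sum to 1 on their own, so adding p1[1] to p1[4] makes the eight probabilities sum to more than 1. One possible parameterization, sketched in R under my own assumptions (test 1 = TST and test 2 = QFT; the tests are conditionally independent given infection status; a result is inconclusive with probability IncT*, and otherwise positive or negative according to Se/Sp), gives one probability per cell of the 3 x 3 table, in the same order as as.numeric(x1). `cell_probs()` is a hypothetical helper, and the inputs are the chain-1 initial values:

```r
# Hypothetical helper: 9 cell probabilities for two tests with
# positive / negative / inconclusive outcomes.
cell_probs <- function(prev, Se1, Sp1, Se2, Sp2,
                       IncT1Inf, IncT1NonInf, IncT2Inf, IncT2NonInf) {
  # Outcome probabilities (+, -, inconclusive) for each test and infection status
  t1_inf    <- c((1 - IncT1Inf) * Se1, (1 - IncT1Inf) * (1 - Se1), IncT1Inf)
  t1_noninf <- c((1 - IncT1NonInf) * (1 - Sp1), (1 - IncT1NonInf) * Sp1, IncT1NonInf)
  t2_inf    <- c((1 - IncT2Inf) * Se2, (1 - IncT2Inf) * (1 - Se2), IncT2Inf)
  t2_noninf <- c((1 - IncT2NonInf) * (1 - Sp2), (1 - IncT2NonInf) * Sp2, IncT2NonInf)
  # Rows = test 1 outcome, columns = test 2 outcome (conditional independence)
  p <- prev * outer(t1_inf, t2_inf) + (1 - prev) * outer(t1_noninf, t2_noninf)
  as.vector(p)  # column-major, same order as as.numeric(x1)
}

p <- cell_probs(prev = 0.125, Se1 = 0.964, Sp1 = 0.965, Se2 = 0.834, Sp2 = 0.969,
                IncT1Inf = 0.1, IncT1NonInf = 0.25, IncT2Inf = 0.15, IncT2NonInf = 0.25)
length(p)  # 9 cells
sum(p)     # 1, up to floating point
```

Each line of cell_probs() can be written as BUGS node definitions for p1[1] to p1[9], with x1[1:9] ~ dmulti(p1[1:9], n1).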

Standard errors for random-effect predictions in lme4

I have an experiment with plants of different growth habits (growth_type), genotypes nested within growth types (ge), and blocks also nested within growth types (block). The objective is to test the influence of growth type and genotype on plant performance. Here are sample data and a reproducible example.
data1 <- read.csv(text = "
growth_type,ge,block,performance
dwarf,A,1,128.32
dwarf,A,2,97.01
dwarf,A,3,91.05
dwarf,B,1,108.51
dwarf,B,2,121.11
dwarf,B,3,84.15
dwarf,C,1,132.55
dwarf,C,2,129.45
dwarf,C,3,122.33
tall,D,1,79.68
tall,D,2,122.5
tall,D,3,143.42
tall,E,1,149.29
tall,E,2,162.13
tall,E,3,135.42
tall,F,1,90.45
tall,F,2,127.4
tall,F,3,78.99")
These are the libraries I used:
library(dplyr)
library(lme4)
library(lsmeans)
The first step was fitting a model:
model.fit <-
lmer(performance ~ growth_type + (1 | block:growth_type) + (1 | ge:growth_type),
data = data1)
From this model, I can extract the fixed effect of growth type using lsmeans:
fixed.effect.estimates <- lsmeans::lsmeans(model.fit, "growth_type")
and this is the output:
What I need to obtain is the same output for the random effect. I am able to get the prediction interval, but I cannot get the standard error. This is what I tried:
# RANDOM EFFECT ESTIMATES
data1$pred.performance <-
predict(model.fit,
newdata = data1,
re.form= ~(1 | ge:growth_type))
pred.ge <- data1 %>%
distinct(ge, growth_type, pred.performance)
And this is what I've obtained. So far so good.
Then I used the bootMer function to build the prediction interval using bootstrapping.
mySumm <- function(.) {
predict(., newdata=pred.ge, re.form= ~(1 | ge:growth_type))
}
####Collapse bootstrap into median, 95% PI
sumBoot <- function(merBoot) {
return(
data.frame(fit = apply(merBoot$t, 2, function(x) as.numeric(quantile(x, probs=.5, na.rm=TRUE))),
lwr = apply(merBoot$t, 2, function(x) as.numeric(quantile(x, probs=.025, na.rm=TRUE))),
upr = apply(merBoot$t, 2, function(x) as.numeric(quantile(x, probs=.975, na.rm=TRUE)))
)
)
}
##lme4::bootMer() method 1
PI.boot1.time <- system.time(
boot1 <- lme4::bootMer(model.fit, mySumm, nsim=250, use.u=TRUE, type="parametric")
)
PI.boot1 <- sumBoot(boot1)
cbind(pred.ge, PI.boot1)
This is what I obtained:
In summary, my questions are:
1. How can I get standard errors for the random effects, as I did for the fixed-effect components?
2. Why do the random-effect estimates from predict() differ from those from lme4::bootMer()?
Sorry for the long explanation.
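One common way to attach an uncertainty measure to each genotype effect is the conditional standard deviation of the conditional modes, which lme4 computes when `ranef()` is called with `condVar = TRUE`; `as.data.frame()` on the result exposes it as a `condsd` column. A sketch using the question's data, with the caveat that these SDs describe the genotype deviations given the estimated variance components and do not include uncertainty in the fixed effects, so they are not a full standard error for the `predict()` values:

```r
library(lme4)

# Data from the question, rebuilt compactly (block treated as a factor)
data1 <- data.frame(
  growth_type = rep(c("dwarf", "tall"), each = 9),
  ge = rep(c("A", "B", "C", "D", "E", "F"), each = 3),
  block = factor(rep(1:3, times = 6)),
  performance = c(128.32, 97.01, 91.05, 108.51, 121.11, 84.15, 132.55, 129.45, 122.33,
                  79.68, 122.5, 143.42, 149.29, 162.13, 135.42, 90.45, 127.4, 78.99)
)

model.fit <- lmer(performance ~ growth_type + (1 | block:growth_type) + (1 | ge:growth_type),
                  data = data1)

# Conditional modes (condval) and conditional SDs (condsd) for each grouping factor
re <- as.data.frame(ranef(model.fit, condVar = TRUE))
re_ge <- subset(re, grpvar == "ge:growth_type")
re_ge[, c("grp", "condval", "condsd")]
```

The condval column is a deviation from the growth-type mean; adding the corresponding fixed-effect estimate gives the genotype-level prediction.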

Adjusting ODE model output using a Rogan-Gladen estimator in R

I have made an ODE model in R using the package deSolve. Currently the output of the model gives me the "observed" prevalence of a disease (i.e. the prevalence not accounting for diagnostic imperfection).
However, I want to adjust the model to output the "true" prevalence, using a simple adjustment formula called the Rogan-Gladen estimator (http://influentialpoints.com/Training/estimating_true_prevalence.htm):
True prevalence = (apparent prevalence + specificity - 1) / (specificity + sensitivity - 1)
As you will see in the code below, I have attempted to adjust only one of the differential equations (diggP).
Running the model without adjustment gives an expected output (a proportion between 0 and 1). However, attempting to adjust the model using the RG-estimator gives a spurious output (a proportion less than 0).
Any advice on what might be going wrong here would be very much appreciated.
# Load required packages
library(tidyverse)
library(broom)
library(deSolve)
# Set time (age) for function
time = 1:80
# Defining exponential decay of lambda over age
y1 = 0.003 + (0.15 - 0.003) * exp(-0.05 * time) %>% jitter(10)
df <- data.frame(t = time, y = y1)
fit <- nls(y ~ SSasymp(time, yf, y0, log_alpha), data = df)
fit
# Values of lambda over ages 1-80 years
data <- as.matrix(0.003 + (0.15 - 0.003) * exp(-0.05 * time))
lambda<-as.vector(data[,1])
t<-as.vector(seq(1, 80, by=1))
foi<-cbind(t, lambda)
foi[,1]
# Making lambda varying by time useable in the ODE model
input <- approxfun(x = foi[,1], y = foi[,2], method = "constant", rule = 2)
# Model
ab <- function(time, state, parms) {
with(as.list(c(state, parms)), {
# lambda, changing by time
import<-input(time)
# Derivatives
# RG estimator:
#True prevalence = (apparent prev + (sp-1)) / (sp + (se-1))
diggP<- (((import * iggN) - iggR * iggP) + (sp_igg-1)) / (sp_igg + (se_igg-1))
diggN<- (-import*iggN) + iggR*iggP
dtgerpP<- (0.5*import)*tgerpN -tgerpR*tgerpP
dtgerpN<- (0.5*-import)*tgerpN + tgerpR*tgerpP
# Return results
return(list(c(diggP, diggN, dtgerpP, dtgerpN)))
})
}
# Initial values
yini <- c(iggP=0, iggN=1,
tgerpP=0, tgerpN=1)
# Parameters
pars <- c(iggR = 0, tgerpR = (1/8)/12,
se_igg = 0.95, sp_igg = 0.92)
# Solve model
results<- ode(y=yini, times=time, func=ab, parms = pars)
# Plot results
plot(results, xlab="Time (years)", ylab="Proportion")
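In the model above the Rogan-Gladen formula is applied to the derivative diggP rather than to a prevalence, and only diggP is changed, so iggP + iggN is no longer conserved. With iggR = 0, diggP is negative whenever import * iggN < 1 - sp_igg = 0.08; import decays from about 0.15 toward 0.003 and falls below 0.08 by about age 13, after which iggP keeps decreasing and eventually goes below 0. One way to keep the adjustment, assuming iggP is meant as the apparent prevalence as the question describes, is to solve the unadjusted model and apply the estimator to the output afterwards. A sketch with a hypothetical helper `rogan_gladen()`; note that the estimator itself is negative whenever the apparent prevalence is below 1 - specificity, so truncation to [0, 1] is a common convention:

```r
# Rogan-Gladen estimator: true prevalence from apparent prevalence (ap),
# sensitivity (se) and specificity (sp). Applied to prevalences, not derivatives.
rogan_gladen <- function(ap, se, sp, truncate = TRUE) {
  tp <- (ap + sp - 1) / (se + sp - 1)
  # Estimates outside [0, 1] are commonly truncated to the nearest bound
  if (truncate) tp <- pmin(pmax(tp, 0), 1)
  tp
}

rogan_gladen(0.30, se = 0.95, sp = 0.92)                    # (0.30 - 0.08) / 0.87
rogan_gladen(0.05, se = 0.95, sp = 0.92, truncate = FALSE)  # negative, since 0.05 < 1 - sp
rogan_gladen(0.05, se = 0.95, sp = 0.92)                    # truncated to 0

# Post-processing the question's deSolve output (with the unadjusted diggP):
# true_iggP <- rogan_gladen(results[, "iggP"], se = pars[["se_igg"]], sp = pars[["sp_igg"]])
```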

Approach for comparing linear, nonlinear, and differently parameterized nonlinear models

I am looking for an approach to compare linear models, nonlinear models, and nonlinear models with different parameterizations. For example:
#Packages
library(nls2)
library(minpack.lm)
# Data set: Diameter as a function of Feature and Age
Feature<-sort(rep(c("A","B"),22))
Age<-c(60,72,88,96,27,
36,48,60,72,88,96,27,36,48,60,72,
88,96,27,36,48,60,27,27,36,48,60,
72,88,96,27,36,48,60,72,88,96,27,
36,48,60,72,88,96)
Diameter<-c(13.9,16.2,
19.1,19.3,4.7,6.7,9.6,11.2,13.1,15.3,
15.4,5.4,7,9.9,11.7,13.4,16.1,16.2,
5.9,8.3,12.3,14.5,2.3,5.2,6.2,8.6,9.3,
11.3,15.1,15.5,5,7,7.9,8.4,10.5,14,14,
4.1,4.9,6,6.7,7.7,8,8.2)
d<-dados <- data.frame(Feature,Age,Diameter)
str(d)
I will create three different models: two nonlinear models with specific parameterizations and one linear model. In my example, I suppose that all the coefficients of each model are significant (not considering the real results).
# Model 1 non-linear
e1<- Diameter ~ a1 * Age^a2
# Levenberg-Marquardt algorithm
m1 <- nlsLM(e1, data = d,
start = list(a1 = 0.1, a2 = 10),
control = nls.control(maxiter = 1000))
# Model 2 linear
m2<-lm(Diameter ~ Age, data=d)
# Model 3 another non-linear
e2<- Diameter ~ a1^(-Age/a2)
m3 <- nls2(e2, data = d, alg = "brute-force",
start = data.frame(a1 = c(-1, 1), a2 = c(-1, 1)),
control = nls.control(maxiter = 1000))
Now my idea is to find the "better" model despite the different nature of each model, so I try a proportional measure: I compare the residual sum of squares of each model with the total sum of squares of the data set. Comparing models 1 and 2 this way gives:
## Sum-of-squares approach (like a pseudo-R2)
#Model 1
SQEm1<-summary(m1)$sigma^2*summary(m1)$df[2] # residual sum of squares of model 1
SQTm1<-var(d$Diameter)*(length(d$Diameter)-1) # total sum of squares of the data set
R1<-1-SQEm1/SQTm1
R1
#Model 2
SQEm2<-summary(m2)$sigma^2*summary(m2)$df[2] # residual sum of squares of model 2
R2<-1-SQEm2/SQTm1
R2
In my limited opinion, model 1 is "better" than model 2. My question is: does this approach sound correct? Is there any way to compare models of these types?
Thanks in advance!
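Because all three models assume additive Gaussian errors on the same response (Diameter), one commonly used alternative to a pseudo-R2 is an information criterion: `logLik()` has methods for both `lm` and `nls` fits, so `AIC()` can be compared across them. A base-R sketch for the power model and the linear model; the starting values a1 = 0.3, a2 = 0.9 are my own rough guesses from the data, not the original a2 = 10, which makes Age^a2 of order 10^14 already at Age = 27 and may explain the very large deviance(m1) reported later:

```r
# Data from the question (Feature is not used by these models)
Age <- c(60,72,88,96,27, 36,48,60,72,88,96,27,36,48,60,72,
         88,96,27,36,48,60,27,27,36,48,60, 72,88,96,27,36,48,60,72,88,96,27,
         36,48,60,72,88,96)
Diameter <- c(13.9,16.2, 19.1,19.3,4.7,6.7,9.6,11.2,13.1,15.3,
              15.4,5.4,7,9.9,11.7,13.4,16.1,16.2, 5.9,8.3,12.3,14.5,2.3,5.2,6.2,8.6,9.3,
              11.3,15.1,15.5,5,7,7.9,8.4,10.5,14,14, 4.1,4.9,6,6.7,7.7,8,8.2)
d <- data.frame(Age, Diameter)

# Power model (model 1) with hand-picked starting values, and the linear model (model 2)
m_pow <- nls(Diameter ~ a1 * Age^a2, data = d, start = list(a1 = 0.3, a2 = 0.9))
m_lin <- lm(Diameter ~ Age, data = d)

# Lower AIC is preferred; both use the Gaussian log-likelihood of Diameter
AIC(m_pow, m_lin)
```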
#First cross-validation approach ------------------------------------------
#Cross-validation model 1
set.seed(123) # for reproducibility
n <- nrow(d)
frac <- 0.8
ix <- sample(n, frac * n) # indexes of in sample rows
e1<- Diameter ~ a1 * Age^a2
# Levenberg-Marquardt algorithm
m1 <- nlsLM(e1, data = d,
start = list(a1 = 0.1, a2 = 10),
control = nls.control(maxiter = 1000), subset = ix)# in sample model
BOD.out <- d[-ix, ] # out of sample data
pred <- predict(m1, newdata = BOD.out)
act <- BOD.out$Diameter
RSS1 <- sum( (pred - act)^2 )
RSS1
#[1] 56435894734
#Cross-validation model 2
m2<-lm(Diameter ~ Age, data=d, subset = ix)# in sample model
BOD.out2 <- d[-ix, ] # out of sample data
pred <- predict(m2, newdata = BOD.out2)
act <- BOD.out2$Diameter
RSS2 <- sum( (pred - act)^2 )
RSS2
#[1] 19.11031
# Sum of squares approach -----------------------------------------------
deviance(m1)
#[1] 238314429037
deviance(m2)
#[1] 257.8223
Based on the comments from gfgm and G. Grothendieck: RSS2 is lower than RSS1, and deviance(m2) is lower than deviance(m1), so model 2 is better than model 1.
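A single 80/20 split leaves only about 9 test rows, so the RSS comparison can change with the seed. A k-fold version, in which every row is held out exactly once, is less sensitive to that. A base-R sketch with 5 folds; the fold count and the hand-picked nls starting values (a1 = 0.3, a2 = 0.9) are my assumptions:

```r
# Data as in the question (Age and Diameter only)
Age <- c(60,72,88,96,27, 36,48,60,72,88,96,27,36,48,60,72,
         88,96,27,36,48,60,27,27,36,48,60, 72,88,96,27,36,48,60,72,88,96,27,
         36,48,60,72,88,96)
Diameter <- c(13.9,16.2, 19.1,19.3,4.7,6.7,9.6,11.2,13.1,15.3,
              15.4,5.4,7,9.9,11.7,13.4,16.1,16.2, 5.9,8.3,12.3,14.5,2.3,5.2,6.2,8.6,9.3,
              11.3,15.1,15.5,5,7,7.9,8.4,10.5,14,14, 4.1,4.9,6,6.7,7.7,8,8.2)
d <- data.frame(Age, Diameter)

set.seed(123)
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(d)))  # random fold label per row
sse <- c(power = 0, linear = 0)
for (f in seq_len(k)) {
  train <- d[fold != f, ]
  test  <- d[fold == f, ]
  m_pow <- nls(Diameter ~ a1 * Age^a2, data = train, start = list(a1 = 0.3, a2 = 0.9))
  m_lin <- lm(Diameter ~ Age, data = train)
  # Accumulate out-of-fold squared prediction errors
  sse["power"]  <- sse["power"]  + sum((test$Diameter - predict(m_pow, newdata = test))^2)
  sse["linear"] <- sse["linear"] + sum((test$Diameter - predict(m_lin, newdata = test))^2)
}
sse / nrow(d)  # cross-validated mean squared prediction error per model
```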

Two models in one Winbugs script

I am conducting a Bayesian analysis using WinBUGS from R. I need to combine two WinBUGS scripts into one; however, I am receiving an error message (variable x2 is not defined in model or in data set). Here is the WinBUGS code:
model{
# Model’s likelihood
for (i in 1:n) {
tto[i] ~ dnorm( mu[i], tau ) # stochastic component
b[i] ~ dnorm(0.0, tau2)
# link and linear predictor
mu[i] <- 1 - (beta.concern2*concern2[i] + beta.concern3*concern3[i] + b[i])
}
for (i in 1:1002) {
# Linear regression on logit
logit(p[i]) <- beta.concern2*x2[i,1] + beta.concern2*x2[i,2]
# Likelihood function for each data point
y2[i] ~ dbern(p[i])
}
s2<-1/tau
s <-sqrt(s2)
a2<-1/tau2
a <-sqrt(a2)
}
where x2 is a 1002 x 2 matrix and y is a vector.
This is the R code defining the data:
combined.data <- list(n=n,tto=tto,concern2=concern2,
concern3=concern3,y2=y2, x2=x2)
Does anyone know what is wrong?
I'm going to be making quite a few assumptions here...
Perhaps you could add a diagram illustrating the relationships between the variables, and which are deterministic vs stochastic. I find this helpful when making models in BUGS. Also, it would be helpful to have the dimensions of all your data, the meaning of n and perhaps some context or detail on what you're modelling and the nodes in which you're interested.
I'm guessing that y is a binary (0,1) vector of length 1002, and has corresponding values for x2[,1] and x2[,2] (herein x1, x2) and concern2, concern3 (herein c2, c3) and tto i.e.
nrow(x2) == 1002
Here are some sample data with nrow == 10 to work with:
y <- sample(x=c(0,1), size=10, replace=TRUE, prob=c(0.5,0.5))
x2 <- matrix(rnorm(20), nrow=10, ncol=2)
c2 <- rnorm(10)
c3 <- rnorm(10)
tto <- rnorm(10)
It appears that you're trying to determine the values of beta.concern2 (herein b2) for both values of x2 in the logit. Not sure why you'd want to fit it with the same parameter for two different predictors. In case this is a typo I'm giving b2 and b3 as parameters instead. I hope you'll be able to adapt this to your needs.
The product of these values of b2, b3 (stochastic) and c2, c3 (given) are used to generate a variable mu, which also has an error term. (I'm presuming b[i] (herein b1[i]) is a normally distributed error term.)
Then tto is a normally distributed variable which depends on the value of mu, and itself has an error term. I have set the precision of the error terms as being equal in both cases.
So for such a model:
require(rjags)
### The data
dataList <- list(
x1 = x2[,1],
x2 = x2[,2],
y = y,
c2 = c2,
c3 = c3,
tto = tto,
nRowX = nrow(x2)
)
### make sure logistic model can be fitted
f1 <- stats::glm(dataList$y ~ dataList$x1 + dataList$x2 -1, family=binomial(logit))
show(f1)
### set some approximate initial values
b1Init <- 0.1 # arbitrary
b2Init <- f1$coef[1] # the glm has no intercept (-1), so x1 is the first coefficient
b3Init <- f1$coef[2]
initsList <- list(
b1 = b1Init,
b2 = b2Init,
b3 = b3Init)
### Model: varying parameters (b2, b3) per observation; 2x error terms
modelstring <- "
model {
for(i in 1:nRowX){
tto[i] ~ dnorm(mu[i], prec)
mu[i] <- 1 - (b1 + b2*c2[i] + b3*c3[i])
y[i] ~ dbern(L[i]) # L for logit
L[i] <- 1/(1+exp(- ( b2*x1[i] + b3*x2[i]) ))
}
b1 ~ dnorm(0, prec) # precision
prec <- 1/pow(SD, 2) # precision = 1/variance, with SD the standard deviation
SD <- 0.5
b2 ~ dnorm(0, 1.4) # arbitrary
b3 ~ dnorm(0, 1.4)
}
"
writeLines(modelstring,con="model.txt")
parameters <- c("b1","b2","b3") # to monitor
adaptSteps <- 1e4 # "tune in" samplers
burnInSteps <- 2e4 # "burn in" samplers
nChains <- 3
numSavedSteps <-2e3
thinSteps <- 1 # Steps to "thin" (1=keep every step).
nPerChain <- ceiling(( numSavedSteps * thinSteps ) / nChains) # Steps per chain
if (exists("jagsModel")) rm(jagsModel) # in case already present
jagsModel <- rjags::jags.model(
"model.txt", data=dataList,
inits=initsList, n.chains=nChains,
n.adapt=adaptSteps)
stats::update(jagsModel, n.iter=burnInSteps)
### MCMC chain
MCMC1 <- as.matrix(rjags::coda.samples(
jagsModel, variable.names=parameters,
n.iter=nPerChain, thin=thinSteps))
### Extract chain values
b2Sample <- as.vector(MCMC1[,grep("b2",colnames(MCMC1))])
