Bayesian Weighted Proportional-Odds Cumulative Logit model in JAGS

I am creating multilevel/mixed ordinal models in JAGS code, which are run through the "runjags" package interface in R.
The dataset that I am analysing also includes a set of survey weights. These are important to include in order to make the data representative of the surveyed population.
For a GLM this can be achieved with the following code, which seeks to reduce the variance of extreme responses, bringing values closer to the expected distribution (a "bias-variance tradeoff"):
y[i] ~ dnorm(fitted[i], (precision/weights[i]))
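For context, here is a minimal sketch of how that weighted-precision line sits inside a complete normal-likelihood JAGS model (beta0, beta1 and x are placeholder names, not variables from my actual model):
model{
  for(i in 1:N){
    ## precision is scaled (divided) by the survey weight, as in the line above
    y[i] ~ dnorm(fitted[i], precision / weights[i])
    fitted[i] <- beta0 + beta1 * x[i]
  }
  ## vague priors (placeholders)
  beta0 ~ dnorm(0, 10^-6)
  beta1 ~ dnorm(0, 10^-6)
  precision ~ dgamma(0.001, 0.001)
}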
I would like to know the most appropriate way to include survey weights in a Proportional-Odds Cumulative Logit model.
Please see the JAGS code beneath for an example of the unweighted models that I have constructed so far. A working application for the displayed code and/or reference to a relevant citation would be most appreciated.
polr_model <- "model{

  ## N            = Number of observations
  ## ConditionCat = y-variable measured as a 5-point Likert scale
  ## Categories   = Number of y-variable categories (5)
  ## variable1    = numeric x-variable
  ## variable2    = numeric x-variable
  ## Geography    = level-2 hierarchical structure in which observations are nested
  ## weights      = survey weights (NOT CURRENTLY USED)

  for(i in 1:N){
    ## Categorical likelihood of the outcome:
    ConditionCat[i] ~ dcat(prob[i,1:Categories])
    ## The odds are the same across observation categories (proportional-odds assumption).
    ## This is the same as any logistic regression except that the intercept is not included here:
    odds[i] <- b1 * variable1[i] + b2 * variable2[i] + randomeffect[Geography[i]]
    ## Each observation category is associated with a cumulative probability called theta:
    for(k in 1:(Categories-1)){
      ## Each category has its own intercept - note the minus sign here, NOT plus:
      logit(theta[i,k]) <- intercept[k] - odds[i]
    }
    ## The cumulative probability of the highest category is equal to 1:
    theta[i,Categories] <- 1
    ## Then calculate the non-cumulative probabilities from theta:
    prob[i,1] <- theta[i,1]
    for(k in 2:(Categories-1)){
      prob[i,k] <- theta[i,k] - theta[i,(k-1)]
    }
    prob[i,Categories] <- 1 - theta[i,(Categories-1)]
  }

  ## Prior distributions for the numeric fixed parameters to be estimated:
  b1 ~ dnorm(0, 10^-6)
  b2 ~ dnorm(0, 10^-6)
  ## The Categories-1 independent intercepts:
  for(k in 1:(Categories-1)){
    intercept[k] ~ dnorm(0, 10^-6)
  }
  ## Prior distributions for the 'random' (hierarchical/nested) parameters to be estimated:
  for(iterator in 1:Nest1){
    randomeffect[iterator] ~ dnorm(0, Geography_precision)
  }
  Geography_precision ~ dgamma(0.001, 0.001)

  #data# N, Nest1, Categories, ConditionCat, variable1, variable2, Geography
  #monitor# intercept, b1, b2, randomeffect
  #inits# Geography_precision, intercept, b1, b2
  #modules# glm on, lecuyer on
}"

Related

How to specify zero-inflated negative binomial model in JAGS

I'm currently working on constructing a zero-inflated negative binomial model in JAGS to model yearly change in abundance using count data and am currently a bit lost on how best to specify the model. I've included an example of the base model I'm using below.

The main issue I'm struggling with is that in the model output I'm getting poor convergence (high Rhat values, low Neff values) and the 95% credible intervals are huge. I realize that without seeing/running the actual data there's probably not much anyone can help with but I thought I'd at least try and see if there are any obvious errors in the way I have the basic model specified.

I also tried fitting a variety of other model types (regular negative binomial, Poisson, and zero-inflated Poisson) but decided to go with the ZINB since it had the lowest DIC scores of all the models and also makes the most intuitive sense to me, given my data structure.
library(R2jags)
# Create example dataframe
years <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
sites <- c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3)
months <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
# Count data
day1 <- floor(runif(18,0,7))
day2 <- floor(runif(18,0,7))
day3 <- floor(runif(18,0,7))
day4 <- floor(runif(18,0,7))
day5 <- floor(runif(18,0,7))
df <- as.data.frame(cbind(years, sites, months, day1, day2, day3, day4, day5))
# Put count data into array
y <- array(NA,dim=c(2,3,3,5))
for(m in 1:2){
  for(k in 1:3){
    sel.rows <- df$years == m & df$months == k
    y[m,k,,] <- as.matrix(df)[sel.rows, 4:8]
  }
}
# JAGS model
sink("model1.txt")
cat("
model {
  # PRIORS
  for(m in 1:2){
    r[m] ~ dunif(0,50)
  }
  t.int ~ dlogis(0,1)
  b.int ~ dlogis(0,1)
  p.det ~ dunif(0,1)
  # LIKELIHOOD
  # ECOLOGICAL SUBMODEL FOR TRUE ABUNDANCE
  for (m in 1:2) {
    zero[m] ~ dbern(pi[m])
    pi[m] <- ilogit(mu.binary[m])
    mu.binary[m] <- t.int
    for (k in 1:3) {
      for (i in 1:3) {
        N[m,k,i] ~ dnegbin(p[m,k,i], r)
        p[m,k,i] <- r[m] / (r[m] + (1 - zero[m]) * lambda.count[m,k,i]) - 1e-10 * zero[m]
        lambda.count[m,k,i] <- exp(mu.count[m,k,i])
        log(mu.count[m,k,i]) <- b.int
        # OBSERVATIONAL SUBMODEL FOR DETECTION
        for (j in 1:5) {
          y[m,k,i,j] ~ dbin(p.det, N[m,k,i])
        }#j
      }#i
    }#k
  }#m
}#END", fill=TRUE)
sink()
win.data <- list(y = y)
Nst <- apply(y,c(1,2,3),max)+1
inits <- function()list(N = Nst)
params <- c("N")
nc <- 3
nt <- 1
ni <- 50000
nb <- 5000
out <- jags(win.data, inits, params, "model1.txt",
n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
working.directory = getwd())
print(out)
Tried fitting a ZINB model in JAGS using the code specified above but am having issues with model convergence.
The way that I have tended to specify zero-inflated models is to model the data as Poisson distributed, with a mean that is either zero (if that individual is part of the zero-inflated group) or distributed according to a gamma distribution otherwise. Something like:
Obs[i] ~ dpois(lambda[i] * is_zero[i])  # the mean is forced to 0 for the zero-inflated group
is_zero[i] ~ dbern(zero_prob)           # indicator: 1 = Poisson (count) group, 0 = zero-inflated group
lambda[i] ~ dgamma(k, k/mean)           # gamma-distributed mean, i.e. a Poisson-gamma (negative binomial) count part
Something similar to this was first used in this paper: https://www.researchgate.net/publication/5231190_The_distribution_of_the_pathogenic_nematode_Nematodirus_battus_in_lambs_is_zero-inflated
These models usually converge OK, although the performance is not as good as for simpler models of course. You also need to make sure to supply initial values for is_zero so that the model starts with all individuals with positive counts in the appropriate group.
In your case, you have multiple timepoints, so you need to decide if the zero-inflation is fixed over time points (i.e. an individual cannot switch to or from zero-inflated group over time), or if each observation is completely independent with respect to zero-inflation status. You also need to decide if you want to have co-variates of year/month/site affecting the mean count (i.e. the gamma part) or the probability of a positive count (i.e. the zero-inflation part). For the former, you need to index mean (in my formulation) by i and then use a GLM-like formula (probably using log link) to relate this to the appropriate covariates. For the latter, you need to index zero_prob by i and then use a GLM-like formula (probably using logit link) to relate this to the appropriate covariates. It is also possible to do both, but if you try to use the same covariates in both parts then you can expect convergence problems!
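For illustration, here is a minimal sketch of the first option (covariates on the mean count) in the formulation above; N_obs, year[], site[], b0, b_year and b_site are hypothetical names, priors are omitted, and the zero-inflation part could be given a logit-scale linear predictor for an observation-indexed zero_prob in exactly the same way:
for (i in 1:N_obs) {
  Obs[i] ~ dpois(lambda[i] * is_zero[i])
  is_zero[i] ~ dbern(zero_prob)
  ## mu[i] plays the role of 'mean' above, now indexed per observation
  lambda[i] ~ dgamma(k, k / mu[i])
  ## GLM-like formula on the log scale relating the mean count to covariates
  log(mu[i]) <- b0 + b_year * year[i] + b_site[site[i]]
}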
It would arguably be better to replace the separate Poisson-Gamma distributions with a single Negative Binomial distribution using the 'ecology parameterisation' with mean and k. This is not currently implemented in JAGS, but I will add it for the next update.

Find the parameter estimates for each random term in a binomial GLMM (lme4)?

Does anyone know how to extract the parameter estimates of a random term when using the (1 | …) syntax in a glmer model (including the SE, t ratio and p value)? I'm only able to access the average variance and std. deviation with the summary function.
Some background: I used cohort and period random terms (both factorized), where period = each survey year, and cohort = 8 birth cohorts. My empty model looks like this:
glmer(pid ~ age + age2 + (1 | cohort) + (1 | period))
There's a bit of a conceptual problem with what you are doing. The random effects do not have the same standing in statistical theory as the fixed effects. You are not really supposed to be making inferences on their estimates, since you don't have random sampling from their overall population. Hence you need to make some untested assumptions about their distribution. That said, there are apparently times when you might want to do it, but you need to take care that you are not making unsupportable claims. See: https://stats.stackexchange.com/questions/392314/interpretation-of-fixed-effect-coefficients-from-glms-and-glmms
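If what you are after are the conditional modes (BLUPs) themselves, lme4's ranef() can return them together with their conditional standard deviations (though not t ratios or p values). A brief sketch, assuming `fit` is a fitted glmer model like the one above:
library(lme4)
## `fit` is a hypothetical fitted glmer object
re <- ranef(fit, condVar = TRUE)   # conditional modes for each cohort/period level
re_df <- as.data.frame(re)         # columns: grpvar, term, grp, condval (estimate), condsd (conditional SD)
re_df$lower <- re_df$condval - 1.96 * re_df$condsd
re_df$upper <- re_df$condval + 1.96 * re_df$condsd
head(re_df)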
Dimitris Rizopoulos then responded to a request for the possibility of getting "an average" of the random effects conditional on the fixed effects (rather, the flipped version of the usual mixed-models inference). He offered a function in his GLMMadaptive package:
https://drizopoulos.github.io/GLMMadaptive/articles/Methods_MixMod.html#marginalized-coefficients
This is his example:
install.packages("GLMMadaptive"); library(GLMMadaptive)
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
#We continue by fitting the mixed effects logistic regression for y assuming random intercepts and random slopes for the random-effects part.
fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
family = binomial())
... and then the call to his marginal_coefs function:
marginal_coefs(fm, std_errors=TRUE)
Estimate Std.Err z-value p-value
(Intercept) -1.6025 0.2906 -5.5154 < 1e-04
sexfemale -1.0975 0.3676 -2.9859 0.0028277
time 0.1766 0.0337 5.2346 < 1e-04
sexfemale:time 0.0508 0.0366 1.3864 0.1656167

Implement null distribution for gbm interaction strength

I am trying to determine which interactions in a gbm model are significant using the method described in Friedman and Popescu 2008 https://projecteuclid.org/euclid.aoas/1223908046. My gbm is a classification model with 9 different classes.
I'm struggling with how to translate Section 8.3 into code to run in R.
I think the overall process is to:
1. Train a version of the model with max.depth = 1
2. Simulate response data from this model
3. Train a new model on this data with max.depth the same as the real model
4. Get interaction strength for this model
5. Repeat steps 1-4 to create a null distribution of interaction strengths
The part that I am finding most confusing is implementing equations 48 and 49. (You will have to look at the linked article since I can't reproduce them here)
This is what I think I understand but please correct me if I'm wrong:
y_i is a new vector of the response that we will use to train a new model which will provide the null distribution of interaction statistics.
F_A(x_i) is the prediction from a version of the gbm model trained with max.depth = 1
b_i is a probability between 0 and 1 based on the prediction from the additive model F_A(x_i)
Questions
What is subscript i? Is it the number of iterations in the bootstrap?
How is each artificial data set different from the others?
Are we subbing the Pr(b_i = 1) into equation 48?
How can this be done with multinomial classification?
How would one implement this in R? Preferably using the gbm package.
Any ideas or references are welcome!
Overall, the process is an elegant way to neutralise the interaction effects in y by permuting/re-distributing the extra contribution that comes from modelling the interactions. That extra contribution can be captured by the margins between a full and an additive model.
What is subscript i? Is it the number of iterations in the bootstrap?
It is the index of samples. There are N samples in each iteration.
How is each artificial data set different from the others?
The predictors X are the same across data sets. The response values Ỹ are different due to random permutation of margins in equation 47 and random realisation (for categorical outcomes only) in equation 48.
Are we subbing the Pr(b_i = 1) into equation 48?
Yes, if the outcome Y is binary.
How can this be done with multinomial classification?
One way is to randomly permute the margins in the log-odds of each category, followed by random realisation according to the probabilities from the additive model.
How would one implement this in R? Preferably using the gbm package.
I tried to implement it in line with your overall process.
First, simulate a training data set {X1, X2, Y} of size N = 200, where Y has three categories (Y1, Y2, Y3) realised from probabilities determined by X1 and X2. The interaction part X1*X2 goes into Y1, while the additive parts go into Y2 and Y3.
set.seed(1)
N <- 200
X1 <- rnorm(N) # 2 predictors
X2 <- rnorm(N)
#log-odds for 3 categories
Y1 <- 2*X1*X2 + rnorm(N, sd=1/10) # interaction
Y2 <- X1^2 + rnorm(N, sd=1/10) #additive
Y3 <- X2^2 + rnorm(N, sd=1/10) #additive
Y <- rep(NA, N) # Multinomial outcome with 3 categories
for (i in 1:N)
{
prob <- 1 / (1 + exp(-c(Y1[i],Y2[i],Y3[i]))) #logistic regression
Y[i] <- which.max(rmultinom(1, 10000, prob=prob)) #realisation from prob
}
Y <- factor(Y)
levels(Y) <- c('Y1','Y2','Y3')
table(Y)
#Y1 Y2 Y3
#38 75 87
dat = data.frame(Y, X1, X2)
head(dat)
# Y X1 X2
# 2 -0.6264538 0.4094018
# 3 0.1836433 1.6888733
# 3 -0.8356286 1.5865884
# 2 1.5952808 -0.3309078
# 3 0.3295078 -2.2852355
# 3 -0.8204684 2.4976616
Train a full model and an additive model, with max.depth = 2 and 1 respectively.
library(gbm)
n.trees <- 100
F_full <- gbm(Y ~ ., data=dat, distribution='multinomial', n.trees=n.trees, cv.folds=3,
interaction.depth=2) # consider interactions
F_additive <- gbm(Y ~ ., data=dat, distribution='multinomial', n.trees=n.trees, cv.folds=3,
interaction.depth=1) # ignore interactions
#use improved prediction as interaction strength
interaction_strength_original <- min(F_additive$cv.error) - min(F_full$cv.error)
> 0.1937891
Simulate response data from this model.
#randomly permute margins (residuals) of log-odds to remove any interaction effects
margin <- predict(F_full, n.trees=gbm.perf(F_full, plot.it=FALSE), type='link')[,,1] -
predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link')[,,1]
margin <- apply(margin, 2, sample) #independent permutation for each category (Y1, Y2, Y3)
Y_art <- rep(NA, N) #response values of an artificial dataset
for (i in 1:N)
{
prob <- predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link',
newdata=dat[i,])
prob <- prob + margin[i,] # equation (47)
prob <- 1 / (1 + exp(-prob))
Y_art[i] <- which.max(rmultinom(1, 1000, prob=prob)) #Similar to random realisation in equation (49)
}
Y_art <- factor(Y_art)
levels(Y_art) = c('Y1','Y2','Y3')
table(Y_art)
#Y1 Y2 Y3
#21 88 91
Train a new model on this artificial data with the same max.depth (2) as the real model.
F_full_art = gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
data=data.frame(Y_art, X1, X2),
interaction.depth=2)
F_additive_art = gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
data=data.frame(Y_art, X1, X2),
interaction.depth=1)
Get interaction strength for this model
interaction_strength_art = min(F_additive_art$cv.error) - min(F_full_art$cv.error)
> 0.01323959 # much smaller than interaction_strength_original in step 1.
Repeat steps 2-4 to create a null distribution of interaction strengths. As expected, the interaction effects are much lower (-0.0527 to 0.0421) in the neutralised data sets than in the original training data set (0.1938).
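The loop below calls step_2_to_4(), which is not defined in the answer as shown; presumably it simply wraps steps 2-4 above. A minimal sketch under that assumption, reusing F_full, F_additive, dat, X1, X2, N and n.trees from the code above:
step_2_to_4 <- function()
{
  ## Step 2: permute the margins and simulate an artificial response
  margin <- predict(F_full, n.trees=gbm.perf(F_full, plot.it=FALSE), type='link')[,,1] -
    predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link')[,,1]
  margin <- apply(margin, 2, sample)
  Y_art <- rep(NA, N)
  for (i in 1:N)
  {
    prob <- predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link',
                    newdata=dat[i,])
    prob <- 1 / (1 + exp(-(prob + margin[i,])))
    Y_art[i] <- which.max(rmultinom(1, 1000, prob=prob))
  }
  Y_art <- factor(Y_art)
  ## Step 3: refit the full and additive models on the artificial data
  F_full_art <- gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
                    data=data.frame(Y_art, X1, X2), interaction.depth=2)
  F_additive_art <- gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
                        data=data.frame(Y_art, X1, X2), interaction.depth=1)
  ## Step 4: return the interaction strength for this artificial data set
  min(F_additive_art$cv.error) - min(F_full_art$cv.error)
}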
interaction_strength_art <- NULL
for (j in 1:10)
{
print(j)
interaction_strength_art <- c(interaction_strength_art, step_2_to_4())
}
summary(interaction_strength_art)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.052648 -0.019415 0.001124 -0.004310 0.012759 0.042058
interaction_strength_original
> 0.1937891

Population-level prediction from bam {mgcv}

Using bam, I made a logistic mixed model with the following form:
PresAbs ~ s(Var 1) + s(Var 2) + ... + s(Var n) + s(RandomVar, bs = "re")
The RandomVar is a factor and I am not interested in the predictions for each of its levels. How can I obtain population-level predictions, comparable to predict.lme?
One way is to just exclude the random-effect spline from the predictions.
Using the example from ?gam.models
library("mgcv")
dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth
## Now add some random effects to the simulation. Response is
## grouped into one of 20 groups by `fac' and each groups has a
## random effect added....
fac <- as.factor(sample(1:20,400,replace=TRUE))
dat$X <- model.matrix(~fac-1)
b <- rnorm(20)*.5
dat$y <- dat$y + dat$X%*%b
m1 <- gam(y ~ s(fac,bs="re")+s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="ML")
we want to exclude the term s(fac) as it is written in the output from
summary(m1)
For the observed data, population effects are
predict(m1, exclude = 's(fac)')
but you can supply newdata to generate predictions for other combinations of the covariates.
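For instance, a small sketch with arbitrary covariate values (an assumption for illustration, not from the original answer); note that predict.gam will still want a value for fac in newdata, but any existing level works because the s(fac) contribution is excluded:
nd <- data.frame(x0 = 0.5, x1 = 0.5, x2 = 0.25, x3 = 0.75,
                 fac = factor(1, levels = levels(fac)))
predict(m1, newdata = nd, exclude = "s(fac)")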

Regression with weights: Fewer standardized residuals than observations

I modelled a multiple regression based on the Mincer wage equation, and I added a weighting factor to make it representative of the whole population.
But when I add the weights argument to my model, R calculates fewer standardized residuals than I have observations.
Here's my model:
lm(log(earings) ~ Gender + Age + Age^2 + Education, weights= phrf)
So I have problems analysing the residuals, because when I try to plot rstandard() against the fitted values, R reports that the variable lengths in rstandard() differ.
This problem only occurs with rstandard() and rstudent(); when I plot the ordinary resid() against the fitted values there is no problem.
And when I leave out the weights argument, there are no problems either.
In the help file for rstudent():
Note that cases with weights == 0 are dropped from all these functions, but that if a linear model has been fitted with na.action = na.exclude, suitable values are filled in for the cases excluded during fitting.
A simple example to demonstrate:
set.seed(123)
x <- 1:100
y <- x + rnorm(100)
w <- runif(100)
w[44] <- 0
fit <- lm(y ~ x, weights=w)
length(fitted(fit))
length(rstudent(fit))
Gives:
> length(fitted(fit))
[1] 100
> length(rstudent(fit))
[1] 99
And this makes sense: an observation with weight 0 effectively contributes no information to the fit, so its standardized or studentized residual is not meaningful, and it is dropped.
Since you are effectively deleting those observations, you can subset the call to lm with subset=w!=0 or you can use that flag for the fitted values:
plot(fitted(fit)[w!=0], rstudent(fit))
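Or, using the subset approach with the same toy data, the two lengths agree again:
fit2 <- lm(y ~ x, weights = w, subset = w != 0)
length(fitted(fit2))   # 99
length(rstudent(fit2)) # 99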
