Logistic regression when response is a proportion (using JAGS) - r

I am trying to fit a logistic regression model in JAGS, but I have data in the form of (# success y, # attempts n), rather than a binary variable. In R, one can fit a model to data such as these by using glm(y/n ~ ) with the "weights" argument, but I am not sure how to fit this in JAGS.
Here is a simple example that I hope addresses what I am trying to ask. Note that I am using the rjags package. Thanks for any help!
y <- rbinom(10, 500, 0.2)
n <- sample(500:600, 10)
p <- y/n
x <- sample(0:100, 10) # some covariate
data <- data.frame(y, n, p, x)
model <- "model{
# Specify likelihood
for(i in 1:10){
y[i] ~ dbin(p[i], n[i])
logit(p[i]) <- b0 + b1*x
}
# Specify priors
b0 ~ dnorm(0, 0.0001)
b1 ~ dnorm(0, 0.0001)
}"

You don't need to compute p in your data set at all. Just let it be a logical node in your model. I prefer the R2jags interface, which allows you to specify a BUGS model in the form of an R function ...
jagsdata <- data.frame(y=rbinom(10, 500, 0.2),
n=sample(500:600, 10),
x=sample(0:100, 10))
model <- function() {
## Specify likelihood
for(i in 1:10){
y[i] ~ dbin(p[i], n[i])
logit(p[i]) <- b0 + b1*x[i]
}
## Specify priors
b0 ~ dnorm(0, 0.0001)
b1 ~ dnorm(0, 0.0001)
}
Now run it:
library("R2jags")
jags(model.file=model,data=jagsdata,
parameters.to.save=c("b0","b1"))

Related

Jags: Attempt to redefine node error, mixed effect regression

I want to perform a mixed effect regression in rjags, with a random slope and intercept. I define the following toy dataset:
library(ggplot2)
library(data.table)
global_slope <- 1
global_int <- 1
Npoints_per_group <- 50
N_groups <- 10
pentes <- rnorm(N_groups,-1,.5)
centers_x <- seq(0,10,length = N_groups)
center_y <- global_slope*centers_x + global_int
group_spread <- 2
group_names <- sample(LETTERS,N_groups)
df <- lapply(1:N_groups,function(i){
x <- seq(centers_x[i]-group_spread/2,centers_x[i]+group_spread/2,length = Npoints_per_group)
y <- pentes[i]*(x- centers_x[i])+center_y[i]+rnorm(Npoints_per_group)
data.table(x = x,y = y,ID = group_names[i])
}) %>% rbindlist()
ggplot(df,aes(x,y,color = as.factor(ID)))+
geom_point()
This is a typical situation of Simpson paradox: an overall increasing trend when you have a decreasing trend within each group (given by the ID variable).
I define the following model:
library(rjags)
model_code_simpson <-
" model
{
# first level
for (i in 1:n) {
y[i] ~ dnorm(alpha[i] + beta[i] * x[i], tau)
alpha[i] = alpha[group[i]] # random intercept
beta[i] = beta[group[i]] # random slope
}
# second level
for(j in 1:J){
alpha[j] ~ dnorm(mu.alpha, tau.alpha)
beta[j] ~ dnorm(mu.beta, tau.beta)
}
# Priors
mu.alpha ~ dnorm(0,0.001)
mu.beta ~ dnorm(0,0.001)
sigma ~ dunif(0,10)
sigma.alpha ~ dunif(0,10)
sigma.beta ~ dunif(0,10)
# Derived quantities
tau <- pow(sigma,-2)
tau.alpha <- pow(sigma.alpha,-2)
tau.beta <- pow(sigma.beta,-2)
}
"
# Choose the parameters to watch
model_parameters <- c("mu.alpha","tau.alpha","tau.beta","tau")
# define numeric grouping variable
df[,ID2 := .GRP,by = ID]
model_data <- list(n = nrow(df),
y = df$y,
x = df$x,
group = df$ID2,
J = df[,uniqueN(ID)])
model <- jags.model(textConnection(model_code_simpson),
data = model_data,
n.chains = 2)
I get the following error:
Compiling model graph
Resolving undeclared variables
Allocating nodes
Deleting model
Error in jags.model(textConnection(model_code_simpson), data = model_data, :
RUNTIME ERROR:
Compilation error on line 8.
Attempt to redefine node beta[1]
I do not understand what is happening, and related questions did not help me much.
You defined beta twice. First, beta is a vector of length n when you are looping through the data. Second, beta is a vector of length J when you are creating the random effects. This "redefining" is causing this issue, but it is an easy fix. You just need to remove that first instance of beta in your model and it will compile (i.e., just move your nested indexing inside of dnorm() and you are good to go).
model_code_simpson <-
" model
{
# first level
for (i in 1:n) {
y[i] ~ dnorm(
alpha[group[i]] + beta[group[i]] * x[i],
tau
)
}
# second level
for(j in 1:J){
alpha[j] ~ dnorm(mu.alpha, tau.alpha)
beta[j] ~ dnorm(mu.beta, tau.beta)
}
# Priors
mu.alpha ~ dnorm(0,0.001)
mu.beta ~ dnorm(0,0.001)
sigma ~ dunif(0,10)
sigma.alpha ~ dunif(0,10)
sigma.beta ~ dunif(0,10)
# Derived quantities
tau <- pow(sigma,-2)
tau.alpha <- pow(sigma.alpha,-2)
tau.beta <- pow(sigma.beta,-2)
}
"

How to specify nested model

I am using runjags to model some hierarchical data. I can model one level of the hierarchy but I do not know how to extend it to more levels. I am trying to do this using method 3 from page 24 of "Bayesian Hierarchical Modelling using WinBUGS", by Nicky Best et al which uses a nested loop (as opposed to nested indexing).
For one level I can model
filestring <-
"model{
for(j in 1:Ninner){
for(i in 1:N){
y[j,i] ~ dnorm(beta + alpha[j], py)
}
alpha[j] ~ dnorm(0, taua)
}
beta ~ dnorm(0, 0.001)
taua ~ dgamma(0.01, 0.01)
py ~ dgamma(0.01, 0.1)
}"
INITS <- list(list(.RNG.seed=1, .RNG.name="base::Wichmann-Hill"),
list(.RNG.seed=2, .RNG.name="base::Wichmann-Hill"))
results <- run.jags(filestring, monitor=c("py", "beta", "alpha"), data=jags_data, sample=1e3,
n.chains=2, inits=INITS, summarise=FALSE)
I then tried to add another level using
for(k in 1:Nouter){
for(j in 1:Ninner){
for(i in 1:N){
y[j,i] ~ dnorm(beta + alpha_in[j] + alpha_out[k], py)
} } }
but receive the error
Compilation error on line 5.
Attempt to redefine node y[1,1]
How do I extend this to model another level of which the first one is nested? Thank you.
Below is some reproducible data which shows the structure of the data. I wish to estimate random estimates for both outer_grp and the inner_grp.
library(data.table)
library(runjags)
set.seed(12345)
dat <- data.table(outer_grp=rep(1:5, each=10), inner_grp=rep(1:10, each=5), y=rnorm(50), x=rnorm(50), time=1:5)
wdat = dcast(dat, inner_grp + outer_grp ~ time, value.var=c("y", "x"))
jags_data = c(setNames(
lapply(split.default(wdat, substr(names(wdat), 1, 1)),as.matrix),
c("inner_grp", "outer_grp","x", "y")),
N=5, Nouter=5, Ninner=10)
EDIT
Perhaps it is enough to model like??
filestring <-
"model{
for(j in 1:Ninner){
for(i in 1:N){
y[j,i] ~ dnorm(beta + alpha_in[j] + alpha_out[outer_grp[j]], py)
}
}
for(i in 1:Ninner){ alpha_in[i] ~ dnorm(0, taua) }
for(i in 1:Nouter){ alpha_out[i] ~ dnorm(0, taub) }
beta ~ dnorm(0, 0.001)
taua ~ dgamma(0.01, 0.01)
taub ~ dgamma(0.01, 0.01)
py ~ dgamma(0.01, 0.1)
}"
It is possible to add the outer group intercept by using nested indexing while still using the loop format. I'll use the Pastes dataset from lme4 for comparison.
filestring <-
"model{
for(j in 1:Ninner){
for(i in 1:N){
y[j,i] ~ dnorm(beta + alpha_in[j] + alpha_out[batch[j]], py)
}
}
for(i in 1:Ninner){ alpha_in[i] ~ dnorm(0, taua) }
for(i in 1:Nouter){ alpha_out[i] ~ dnorm(0, taub) }
beta ~ dnorm(0, 0.001)
taua <- 1/(sa*sa)
sa ~ dunif(0,100)
taub <- 1/(sb*sb)
sb ~dunif(0,100)
py ~ dgamma(0.001, 0.001)
}"
INITS <- list(list(.RNG.seed=1, .RNG.name="base::Wichmann-Hill"),
list(.RNG.seed=2, .RNG.name="base::Wichmann-Hill"))
results <- run.jags(filestring, monitor=c("py", "beta", "alpha_in", "alpha_out", "sa", "sb"),
data=jags_data, burnin=1e4, sample=1e4, n.chains=2,
inits=INITS, summarise=0)
summary(results, vars=c("py", "beta", "sa", "sb"))
Compare to lme4
fm1 <- lmer(strength ~ (1|batch) + (1|sample), Pastes)
print(summary(fm1), corr=FALSE)
Data used
library(lme4); library(data.table); library(runjags)
data(Pastes); setDT(Pastes)
Pastes[,time := sequence(.N), by=sample]
# Change format to match question
wdat = dcast(Pastes, batch + sample ~ time, value.var="strength")
jags_data = list(y=as.matrix(wdat[,3:4]), batch=wdat$batch, N=2, Ninner=length(unique(wdat$sample)), Nouter=length(unique(wdat$batch)))

JAGS and R: Obtain posterior predictive distribution for specific x

I am trying to obtain a posterior predictive distribution for specified values of x from a simple linear regression in Jags. I could get the regression itself to work by adapting this example (from https://biometry.github.io/APES//LectureNotes/StatsCafe/Linear_models_jags.html) to my own data. I have supplied a smal chunk of this data here so that the code works here also.
library(rjags)
library(R2jags)
#create data
dw=c(-15.2,-13.0,-10.0,-9.8,-8.5,-8.5,-7.7,-7.5,-7.2,-6.1,-6.1,-6.1,-5.5,-5.0,-5.0,-5.0,-4.5,-4.0,-2.0,-1.0,1.3)
phos=c(11.8,12.9,15.0,14.4,17.3,16.1,20.8,16.6,16.2,18.2,18.8,19.2,15.6,17.0,18.9,22.1,18.9,22.8,21.6,20.5,21.1)
#convert to list
jagsdwphos=list(dw=dw,phos=phos,N=length(phos))
#write model function for linear regression
lm1_jags <- function(){
# Likelihood:
for (i in 1:N){
phos[i] ~ dnorm(mu[i], tau) # tau is precision (1 / variance)
mu[i] <- intercept + slope * dw[i]
}
# Priors:
intercept ~ dnorm(0, 0.01)
slope ~ dnorm(0, 0.01)
sigma ~ dunif(0, 100) # standard deviation
tau <- 1 / (sigma * sigma)
}
#specifiy paramters of MCMC sampler, choose posteriors to be reported and run the jags model
#set initial values for MCMC
init_values <- function(){
list(intercept = rnorm(1), slope = rnorm(1), sigma = runif(1))
}
#choose paramters to report on
params <- c("intercept", "slope", "sigma")
#run model in jags
lm_dwphos <- jags(data = jagsdwphos, inits = init_values, parameters.to.save = params, model.file = lm1_jags,
n.chains = 3, n.iter = 12000, n.burnin = 2000, n.thin = 10, DIC = F)
In addition to this regression, I would like to have an output of the posterior predictive distributions of particular phos values, but I cannot get it to work with this simple example I have written. I found a tutorial here https://doingbayesiandataanalysis.blogspot.com/2015/10/posterior-predicted-distribution-for.html and tried to implement it like this:
#create data
dw=c(-15.2,-13.0,-10.0,-9.8,-8.5,-8.5,-7.7,-7.5,-7.2,-6.1,-6.1,-6.1,-5.5,-5.0,-5.0,-5.0,-4.5,-4.0,-2.0,-1.0,1.3)
phos=c(11.8,12.9,15.0,14.4,17.3,16.1,20.8,16.6,16.2,18.2,18.8,19.2,15.6,17.0,18.9,22.1,18.9,22.8,21.6,20.5,21.1)
#specifiy phos values to use for posterior predictive distribution
phosprobe=c(14,18,22)
#convert to list
jagsdwphos=list(dw=dw,phos=phos,N=length(phos),xP=phosprobe)
#write model function for linear regression
lm1_jags <- function(){
# Likelihood:
for (i in 1:N){
phos[i] ~ dnorm(mu[i], tau) # tau is precision (1 / variance)
mu[i] <- intercept + slope * dw[i]
}
# Priors:
intercept ~ dnorm(0, 0.01) # intercept
slope ~ dnorm(0, 0.01) # slope
sigma ~ dunif(0, 100) # standard deviation
tau <- 1 / (sigma * sigma) # sigma^2 doesn't work in JAGS
nu <- nuMinusOne+1
nuMinusOne ~ dexp(1/29.0)
#prediction
for(i in 1:3){
yP ~ dt(intercept+slope*xP[i],tau,nu)
}
}
#specifiy paramters of MCMC sampler, choose posteriors to be reported and run the jags model
#set initial values for MCMC
init_values <- function(){
list(intercept = rnorm(1), slope = rnorm(1), sigma = runif(1))
}
#choose paramters to report on
params <- c("intercept", "slope", "sigma","xP","yP")
#run model in jags
lm_dwphos <- jags(data = jagsdwphos, inits = init_values, parameters.to.save = params, model.file = lm1_jags,
n.chains = 3, n.iter = 12000, n.burnin = 2000, n.thin = 10, DIC = F)
But I get the following error message:
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, : RUNTIME ERROR: Compilation error on line 14. Attempt to redefine node yP[1]
I confess I don't quite understand exactly how the prediction was implemented in that example I used and could not find an explanation on what exactly nu is or where those numbers come from. So I presume that is where I made some mistake adapting to my example, but it was the only tutorial in Jags that I could find that gives the whole distribution of y values for the probed x instead of just the mean.
I would appreciate any help or explanation.
Thanks!
This error occurs because you are not indexing yP. You have written this loop like this:
#prediction
for(i in 1:3){
yP ~ dt(intercept+slope*xP[i],tau,nu)
}
As i moves from 1 to 3 the element yP is being written over. You need to index it like you have done with xP.
#prediction
for(i in 1:3){
yP[i] ~ dt(intercept+slope*xP[i],tau,nu)
}

Constrain order of parameters in R JAGS

I am puzzled by a simple question in R JAGS. I have for example, 10 parameters: d[1], d[2], ..., d[10]. It is intuitive from the data that they should be increasing. So I want to put a constraint on them.
Here is what I tried to do but it give error messages saying "Node inconsistent with parents":
model{
...
for (j in 1:10){
d.star[j]~dnorm(0,0.0001)
}
d=sort(d.star)
}
Then I tried this:
d[1]~dnorm(0,0.0001)
for (j in 2:10){
d[j]~dnorm(0,0.0001)I(d[j-1],)
}
This worked, but I don't know if this is the correct way to do it. Could you share your thoughts?
Thanks!
If you are ever uncertain about something like this, it is best to just simulate some data to determine if the model structure you suggest works (spoiler alert: it does).
Here is the model that I used:
cat('model{
d[1] ~ dnorm(0, 0.0001) # intercept
d[2] ~ dnorm(0, 0.0001)
for(j in 3:11){
d[j] ~ dnorm(0, 0.0001) I(d[j-1],)
}
for(i in 1:200){
y[i] ~ dnorm(mu[i], tau)
mu[i] <- inprod(d, x[i,])
}
tau ~ dgamma(0.01,0.01)
}',
file = "model_example.R")```
And here are the data I simulated to use with this model.
library(run.jags)
library(mcmcplots)
# intercept with sorted betas
set.seed(161)
betas <- c(1,sort(runif(10, -5,5)))
# make covariates, 1 for intercept
x <- cbind(1,matrix(rnorm(2000), nrow = 200, ncol = 10))
# deterministic part of model
y_det <- x %*% betas
# add noise
y <- rnorm(length(y_det), y_det, 1)
data_list <- list(y = as.numeric(y), x = x)
# fit the model
mout <- run.jags('model_example.R',monitor = c("d", "tau"), data = data_list)
Following this, we can plot out the estimates and overlay the true parameter values
caterplot(mout, "d", reorder = FALSE)
points(rev(c(1:11)) ~ betas, pch = 18,cex = 0.9)
The black points are the true parameter values, the blue points and lines are the estimates. Looks like this set up does fine so long as there are enough data to estimate all of those parameters.
It looks like there is an syntax error in the first implementation. Just try:
model{
...
for (j in 1:10){
d.star[j]~dnorm(0,0.0001)
}
d[1:10] <- sort(d.star) # notice d is indexed.
}
and compare the results with those of the second implementation. According to the documentation, these are both correct, but it is advised to use the function sort.

JAGS: unit-specific time trends

Using JAGS I am trying to estimate a model including a unit-specific time trend.
However, the problem is that I don't know how to model this and so far I have been unable to find a solution.
As an example, consider we have the following data:
rain<-rnorm(200) # Explanatory variable
n1<-rnorm(200) # Some noise
gdp<-rain+n1 # Outcome variable
ccode<-rep(1:10,20) # Unit codes
year<-rep(1:20,10) # Years
Using normal linear regression, we would estimate the model as:
m1<-lm(gdp~rain+factor(ccode)*year)
Where factor(ccode)*year is the unit-specific time trend. Now I want to estimate the model using JAGS. So I create parameters for the indexing:
N<-200
J<-max(ccode)
T<-max(year)
And estimate the model,
library(R2jags)
library(rjags)
set.seed(42); runif(1)
dat<-list(gdp=gdp,
rain=rain,
ccode=ccode,
year=year,
N=N,J=J,T=T)
parameters<-c("b1","b0")
model.file <- "~/model.txt"
system.time(m1<-jags(data=dat,inits=NULL,parameters.to.save=parameters,
model.file=model.file,
n.chains=4,n.iter=500,n.burnin=125,n.thin=2))
with the following model file, and this is where the error is at the moment:
# Simple model
model {
# For N observations
for(i in 1:N) {
gdp[i] ~ dnorm(yhat[i], tau)
yhat[i] <- b1*rain[i] + b0[ccode[i]*year[i]]
}
for(t in 1:T) {
for(j in 1:J) {
b0[t,j] ~ dnorm(0, .01)
}
}
# Priors
b1 ~ dnorm(0, .01)
# Hyperpriors
tau <- pow(sd, -2)
sd ~ dunif(0,20)
}
I am fairly sure that the way in which I define b0 and the indexing is incorrect given the error I get when using the code: Compilation error on line 7. Dimension mismatch taking subset of b0.
However, I don't know how to solve this so I wondered whether someone here has suggestions about this?
Your lm example can also be written:
m1 <- lm(gdp ~ -1 + rain + factor(ccode) + factor(ccode):year)
The equivalent JAGS model would be:
M <- function() {
for(i in 1:N) {
gdp[i] ~ dnorm(yhat[i], sd^-2)
yhat[i] <- b0[ccode[i]] + b1*rain[i] + b2[ccode[i]]*year[i]
}
b1 ~ dnorm(0, 0.001)
for (j in 1:J) {
b0[j] ~ dnorm(0, 0.001)
b2[j] ~ dnorm(0, 0.001)
}
sd ~ dunif(0, 100)
}
parameters<-c('b0', 'b1', 'b2')
mj <- jags(dat, NULL, parameters, M, 3)
Comparing coefficients:
par(mfrow=c(1, 2), mar=c(5, 5, 1, 1))
plot(mj$BUGSoutput$summary[grep('^b0', row.names(mj$BUGSoutput$summary)), '50%'],
coef(m1)[grep('^factor\\(ccode\\)\\d+$', names(coef(m1)))],
xlab='JAGS estimate', ylab='lm estimate', pch=20, las=1,
main='b0')
abline(0, 1)
plot(mj$BUGSoutput$summary[grep('^b2', row.names(mj$BUGSoutput$summary)), '50%'],
coef(m1)[grep('^factor\\(ccode\\)\\d+:', names(coef(m1)))],
xlab='JAGS estimate', ylab='lm estimate', pch=20, las=1,
main='b2')
abline(0, 1)

Resources