Latent variable estimation with binary and continuous indicators in JAGS - r

I am using R2jags to estimate latent factor using both binary and continuous variables. I was wondering wether jags command from R2jags can handle both binary and continuous variables at the same time. I tried to find articles explain the types of variables jags can handle, but I was not able to find any.
If jags can account both binary and continuous in the same model, what is the correct way to specify the types of variables when creating a model?
Thanks for your help in advance.

Here's an example. First, I'll make some data:
library(R2jags)
set.seed(207)
z <- rnorm(250)
y_num <- apply(matrix(rnorm(750,0, .25), ncol=3), 2, \(x)x+z)
y_bin <- apply(matrix(rnorm(750,0, .25), ncol=3), 2, \(x)plogis(x+z))
y_bin <- apply(y_bin, 2, \(x)rbinom(length(x), 1, x))
y_num <- scale(y_num)
Next, we put the data in a list to feed to JAGS:
datl <- list(
N = nrow(y_num),
y_num = y_num,
y_bin = y_bin,
a0=c(0,0),
A0 = diag(2)*.1
)
Next, we can write the model. Here, we are using a Bernoulli distribution to fit the binary variables and a normal distribution to fit the continuous variables (Note, the code below will write the model code to a file in your working directory):
jags.mod <- "model{
for(i in 1:N){
for(j in 1:3){
y_num[i,j] ~ dnorm(mu[i,j], tau[j])
mu[i,j] <- b[j]*zeta[i]
}
for(j in 1:3){
y_bin[i,j] ~ dbern(p[i,j])
logit(p[i,j]) <- a[1,j] + a[2,j]*zeta[i]
}
zeta[i] ~ dnorm(0,1)
}
b[1] <- 1
b[2] ~ dnorm(0,.1)T(0, )
b[3] ~ dnorm(0,.1)T(0, )
for(j in 1:3){
a[1:2, j] ~ dmnorm(a0[1:2], A0[1:2,1:2])
tau[j] ~ dt(0,.1, 1)T(0,)
}
}"
cat(jags.mod, file="jags_model.txt")
Now we can run the model:
jgs <- jags(datl,
inits=NULL,
parameters.to.save = c("a", "b", "tau", "zeta"),
model.file = "jags_model.txt")
#> module glm loaded
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 1500
#> Unobserved stochastic nodes: 258
#> Total graph size: 4776
#>
#> Initializing model
Finally, since we made the data, we know the true latent variable, so we can evaluate the similarity between the estimated latent variable and the true latent variable:
zeta_hat <- jgs$BUGSoutput$summary[,1]
zeta_hat <- zeta_hat[grep("zeta", names(zeta_hat))]
# correlate estimated latent variable (zeta_hat) with true latent variable (z).
cor(zeta_hat, z)
#> [1] 0.99146
Created on 2022-10-19 by the reprex package (v2.0.1)

Related

"non-conforming parameters in function :" in simple linear regression using JAGS

I am super new to JAGS and Bayesian statistics, and have simply been trying to follow the Chapter 22 on Bayesian statistics in Crawley's 2nd Edition R Book. I copy the code down exactly as it appears in the book for the simple linear model: growth = a + b *tannin, where there are 9 rows of two continuous variables: growth and tannins. The data and packages are this:
install.packages("R2jags")
library(R2jags)
growth <- c(12,10,8,11,6,7,2,3,3)
tannin <- c(0,1,2,3,4,5,6,7,8)
N <- c(1,2,3,4,5,6,7,8,9)
bay.df <- data.frame(growth,tannin,N)
The ASCII file looks like this:
model{
for(i in 1:N) {
growth[i] ~ dnorm(mu[i],tau)
mu[i] <- a+b*tannin[i]
}
a ~ dnorm(0.0, 1.0E-4)
b ~ dnorm(0.0, 1.0E-4)
sigma <- 1.0/sqrt(tau)
tau ~ dgamma(1.0E-3, 1.0E-3)
}
But then, when I use this code:
> practicemodel <- jags(data=data.jags,parameters.to.save = c("a","b","tau"),
+ n.iter=100000, model.file="regression.bugs.txt", n.chains=3)
I get an error message that says:
module glm loaded
Compiling model graph
Resolving undeclared variables
Deleting model
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, :
RUNTIME ERROR:
Non-conforming parameters in function :
The problem has been solved!
Basically the change is from N <- (1,2...) to N <- 9, but there is one other solution as well, where no N is specified in the beginning. You can specify N inside the data.jags function as the number of rows in the data frame; data.jags = list(growth=bay.df$growth, tannin=bay.df$tannin, N=nrow(bay.df)).
Here is the new code:
# Make the data frame
growth <- c(12,10,8,11,6,7,2,3,3)
tannin <- c(0,1,2,3,4,5,6,7,8)
# CHANGED : This is for the JAGS code to know there are 9 rows of data
N <- 9 code
bay.df <- data.frame(growth,tannin)
library(R2jags)
# Now, write the Bugs model and save it in a text file
sink("regression.bugs.txt") #tell R to put the following into this file
cat("
model{
for(i in 1:N) {
growth[i] ~ dnorm(mu[i],tau)
mu[i] <- a+b*tannin[i]
}
a ~ dnorm(0.0, 1.0E-4)
b ~ dnorm(0.0, 1.0E-4)
sigma <- 1.0/sqrt(tau)
tau ~ dgamma(1.0E-3, 1.0E-3)
}
", fill=TRUE)
sink() #tells R to stop putting things into this file.
#tell jags the names of the variables containing the data
data.jags <- list("growth","tannin","N")
# run the JAGS function to produce the function:
practicemodel <- jags(data=data.jags,parameters.to.save = c("a","b","tau"),
n.iter=100000, model.file="regression.bugs.txt", n.chains=3)
# inspect the model output. Important to note that the output will
# be different every time because there's a stochastic element to the model
practicemodel
# plots the information nicely, can visualize the error
# margin for each parameter and deviance
plot(practicemodel)
Thanks for the help! I hope this helps others.

How to deal with "Non-conforming parameters with inprod function" in JAGS model

I am trying to model the variance in overall species richness with the habitat covariates of a camera trapping station using R2jags. However, I keep getting the error:
"Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, :
RUNTIME ERROR:
Non-conforming parameters in function inprod"
I used a very similar function in my previous JAGS model (to find the species richness) so I am not sure why it is not working now...
I have already tried formatting the covariates within the inprod function in different ways, as a data frame and a matrix, to no avail.
Variable specification:
J=length(ustations) #number of camera stations
NSite=Global.Model$BUGSoutput$sims.list$Nsite
NS=apply(NSite,2,function(x)c(mean(x)))
###What I think is causing the problem:
COV <- data.frame(as.numeric(station.cov$NDVI), as.numeric(station.cov$TRI), as.numeric(station.cov$dist2edge), as.numeric(station.cov$dogs), as.numeric(station.cov$Leopard_captures))
###but I have also tried:
COV <- cbind(station.cov$NDVI, station.cov$TRI, station.cov$dist2edge, station.cov$dogs, station.cov$Leopard_captures)
JAGS model:
sink("Variance_model.txt")
cat("model {
# Priors
Y ~ dnorm(0,0.001) #Mean richness
X ~ dnorm(0,0.001) #Mean variance
for (a in 1:length(COV)){
U[a] ~ dnorm(0,0.001)} #Variance covariates
# Likelihood
for (i in 1:J) {
mu[i] <- Y #Hyper-parameter for station-specific all richness
NS[i] ~ dnorm(mu[i], tau[i]) #Likelihood
tau[i] <- (1/sigma2[i])
log(sigma2[i]) <- X + inprod(U,COV[i,])
}
}
", fill=TRUE)
sink()
var.data <- list(NS = NS,
COV = COV,
J=J)
Bundle data:
# Inits function
var.inits <- function(){list(
Y =rnorm(1),
X =rnorm(1),
U =rnorm(length(COV)))}
# Parameters to estimate
var.params <- c("Y","X","U")
# MCMC settings
nc <- 3
ni <-20000
nb <- 10000
nthin <- 10
Start Gibbs sampler:
jags(data=var.data,
inits=var.inits,
parameters.to.save=var.params,
model.file="Variance_model.txt",
n.chains=nc,n.iter=ni,n.burnin=nb,n.thin=nthin)
Ultimately, I get the error:
Compiling model graph
Resolving undeclared variables
Allocating nodes
Deleting model
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, :
RUNTIME ERROR:
Non-conforming parameters in function inprod
In the end, I would like to calculate the mean and 95% credible interval (BCI) estimates of the habitat covariates hypothesized to influence the variance in station-specific (point-level) species richness.
Any help would be greatly appreciated!
It looks like you are using length to generate the priors for U. In JAGS this function will return the number of elements in a node array. In this case, that would be the number of rows ins COV multiplied by the number of columns.
Instead, I would supply a scalar to your data list that you supply to jags.model.
var.data <- list(NS = NS,
COV = COV,
J=J,
ncov = ncol(COV)
)
Following this, you can modify your JAGS code where you are generating your priors for U. The model would then become:
sink("Variance_model.txt")
cat("model {
# Priors
Y ~ dnorm(0,0.001) #Mean richness
X ~ dnorm(0,0.001) #Mean variance
for (a in 1:ncov){ # THIS IS THE ONLY LINE OF CODE THAT I MODIFIED
U[a] ~ dnorm(0,0.001)} #Variance covariates
# Likelihood
for (i in 1:J) {
mu[i] <- Y #Hyper-parameter for station-specific all richness
NS[i] ~ dnorm(mu[i], tau[i]) #Likelihood
tau[i] <- (1/sigma2[i])
log(sigma2[i]) <- X + inprod(U,COV[i,])
}
}
", fill=TRUE)
sink()

Cox model, coxph(), control treatment with no event, seed germination

I am performing survival analysis and I´m not sure if I am doing it correctly. My dataset is a result of a seed germination experiment. The main variable of interest is the "treat" one (categorical with 3 levels). In my script I am trying to figure out if there is a difference in between treatments, which one is the best, and at what extent, by comparing the PH coeff percentages. Could anyone help me with some of the problems that I'm dealing with?
1) Do I need to declare my variables as.factor() to use them? Or integer is interpreted equally?
2) If proportionality of hazards assumption (PH) is violated, what should I do with my data to proceed to a cox model building? I've intensely researched but haven't been able to understand the programming to add covariate*time interaction
or stratification to my model.
3) How to include frailty terms to cox model and detect random effect (e.g. plate in which seeds were germinated, categorical variable with 4 levels, representing repetition).
4) I also wasn't able to interpret the print(summary(cox.fra)).*
*see below
See below my two whole scripts with comments.
SCRIPT 1
rd01 <- read.table("sa_kb01.txt", header = T) # raw dataset, seed
survival
rd01
str(rd01)
rd01$begin <- as.factor(rd01$begin) # integers to factors
rd01$spp <- as.factor(rd01$spp)
rd01$cit <- as.factor(rd01$cit)
rd01$treat <- as.factor(rd01$treat)
rd01$plate <- as.factor(rd01$plate)
str(rd01)
summary(rd01)
names(rd01) # headers
### Survival analysis
# install.packages("survival")
library(survival)
library (survminer)
?survfit
?survfit.formula
?survfit.coxph
?ggsurvplot
## Fit Kaplan-Meier survivor function
km.fit <- survfit(Surv(day, status) ~ treat, data= rd01, type="kaplan-meier")
km.fit
print(summary(km.fit))
plot(km.fit, conf.int= T, fun = "event", mark.time = c(140), pch = c("S", "W", "A"), col = c("darkred","darkblue","darkgreen"), lty = c("solid","dotted","longdash"),lwd = 1.5, xlab = "time [days]", ylab = "germination probability [%]")
print(summary(km.fit))
## Comparison of Survivor Functions
# Log-rank tests
?survdiff
# Log-rank or Mantel-Haenszel test in "rho = 0" OR
# Peto & Peto modification of the Gehan-Wilcoxon test in "rho = 1"
# ... Assess all groups for heterogeneity
lrmh.123 <- survdiff(Surv(day,status) ~ treat, data= rd01, rho= 0)
print(lrmh.123) # If p<0.05 there are difference between all groups!
# ... Comparing groups pairwise
lrmh.120 <- survdiff(Surv(day,status) ~ treat, data= rd01, subset= {treat!=3}, rho= 0)
lrmh.103 <- survdiff(Surv(day,status) ~ treat, data= rd01, subset= {treat!=2}, rho= 0)
lrmh.023 <- survdiff(Surv(day,status) ~ treat, data= rd01, subset= {treat!=1}, rho= 0)
print(lrmh.120)
print(lrmh.103)
print(lrmh.023) # If p<0.05 there are difference pairwised groups!
## Checking Proportional Hazard (PH) assumption
# Define function mlogmlog() to calculate -log(-log(S(t)))
mlogmlog <- function(y){-log(-log(y))}
# Use estimated Kaplan-Meier survivor functions
km.fit
# ... to plot -log(-log(S(t))) versus log(t)
plot(km.fit, fun= mlogmlog, log="x", mark.time= c(140), pch = c("S", "W", "A"), col = c("darkred","darkblue","darkgreen"), lty = c("solid","dotted","longdash"), lwd = 1.5, xlab="time [days]", ylab= "-log(-log(S(t)))") # If lines do not cross, PH assumption is plausible!
# Interpretarion: http://www.sthda.com/english/wiki/cox-model-assumptions#testing-proportional-hazards-assumption
## Checking for multicollinearity
# install.packages("HH")
library(HH)
# Fit a generalized linear model predicting days from treatment
?glm
mc.glm <- glm(day ~ treat, data=rd01)
print(mc.glm) # doesn't need interpretation, only used to create object to VIF function
# Check for multicollinearity among covariates throught variance inflation factor (VIF)
?vif
mc.vif <- vif(mc.glm)
print(mc.vif) # VIF can determine what proportion of the variation in each covariate
# is explained by the other covariates:
# VIF > 10, serious multicollinearity; VIF = 5, evidence of multicollinearity;
# VIF < 1, no evidence of multicollinearity
## Adding covariates to the Cox model
# Create a Cox model
cox.mod <- coxph(Surv(day, status) ~ treat, data= rd01)
print(summary(cox.mod))
# Interpretation: http://www.sthda.com/english/wiki/cox-proportional-hazards-model
# Double check for PH assumption now with Cox model built
dc.ph <- cox.zph(cox.mod)
dc.ph
ggcoxzph(dc.ph) # if global and individual p-vale > 0.05, PH assumption is plausible!
## Including random effects
?frailty
# Adding plate variable as frailty term
cox.fra <- coxph(Surv(day, status) ~ treat + frailty(plate), data= rd01)
print(summary(cox.fra)) # if global and individual p-vale < 0.05,
# maintain frailty term while adding covariates 1 at a time in cox model!`
SCRIPT 2 - the same, but different dataset, control treat1 with no event!
rd01 <- read.table("sa_hal01.txt", header = T) # raw dataset, seed survival
rd01
str(rd01)
rd01$begin <- as.factor(rd01$begin) # integers to factors
rd01$spp <- as.factor(rd01$spp)
rd01$cit <- as.factor(rd01$cit)
rd01$treat <- as.factor(rd01$treat)
rd01$plate <- as.factor(rd01$plate)
str(rd01)
summary(rd01)
names(rd01) # headers
### Survival analysis
# install.packages("survival")
library(survival)
library (survminer)
?survfit
?survfit.formula
?survfit.coxph
?ggsurvplot
## Fit Kaplan-Meier survivor function
km.fit <- survfit(Surv(day, status) ~ treat, data= rd01, type="kaplan-meier")
km.fit
print(summary(km.fit))
plot(km.fit, conf.int= T, fun = "event", mark.time = c(140), pch = c("S", "W", "A"), col = c("darkred","darkblue","darkgreen"), lty = c("solid","dotted","longdash"),lwd = 1.5, xlab = "time [days]", ylab = "germination probability [%]")
print(summary(km.fit))
## Comparison of Survivor Functions
# Log-rank tests
?survdiff
# Log-rank or Mantel-Haenszel test in "rho = 0" OR
# Peto & Peto modification of the Gehan-Wilcoxon test in "rho = 1"
# ... Assess all groups for heterogeneity
lrmh.123 <- survdiff(Surv(day,status) ~ treat, data= rd01, rho= 0)
print(lrmh.123) # If p<0.05 there are difference between all groups!
# ... Comparing groups pairwise
lrmh.120 <- survdiff(Surv(day,status) ~ treat, data= rd01, subset= {treat!=3}, rho= 0)
lrmh.103 <- survdiff(Surv(day,status) ~ treat, data= rd01, subset= {treat!=2}, rho= 0)
lrmh.023 <- survdiff(Surv(day,status) ~ treat, data= rd01, subset= {treat!=1}, rho= 0)
print(lrmh.120)
print(lrmh.103)
print(lrmh.023) # If p<0.05 there are difference pairwised groups!
## Checking Proportional Hazard (PH) assumption
# Define function mlogmlog() to calculate -log(-log(S(t)))
mlogmlog <- function(y){-log(-log(y))}
# Use estimated Kaplan-Meier survivor functions
km.fit
# ... to plot -log(-log(S(t))) versus log(t)
plot(km.fit, fun= mlogmlog, log="x", mark.time= c(140), pch = c("S", "W", "A"), col = c("darkred","darkblue","darkgreen"), lty = c("solid","dotted","longdash"), lwd = 1.5, xlab="time [days]", ylab= "- log(-log(S(t)))") # If lines do not cross, PH assumption is plausible!
# Interpretarion: http://www.sthda.com/english/wiki/cox-model- assumptions#testing-proportional-hazards-assumption
## Checking for multicollinearity
# install.packages("HH")
library(HH)
# Fit a generalized linear model predicting days from treatment
?glm
mc.glm <- glm(day ~ treat, data=rd01)
print(mc.glm) # doesn't need interpretation, only used to create object to VIF function
# Check for multicollinearity among covariates throught variance inflation factor (VIF)
?vif
mc.vif <- vif(mc.glm)
print(mc.vif) # VIF can determine what proportion of the variation in each covariate
# is explained by the other covariates:
# VIF > 10, serious multicollinearity; VIF = 5, evidence of multicollinearity;
# VIF < 1, no evidence of multicollinearity
## Adding covariates to the Cox model
# Create a Cox model
cox.mod <- coxph(Surv(day, status) ~ treat, data= rd01)
print(summary(cox.mod))
# Interpretation: http://www.sthda.com/english/wiki/cox-proportional-hazards-model
# Double check for PH assumption now with Cox model built
dc.ph <- cox.zph(cox.mod)
dc.ph
ggcoxzph(dc.ph) # if global and individual p-vale > 0.05, PH assumption is plausible!
## Including random effects
?frailty
# Adding plate variable as frailty term
cox.fra <- coxph(Surv(day, status) ~ treat + frailty(plate), data= rd01)
print(summary(cox.fra)) # if global and individual p-vale < 0.05,
# maintain frailty term while adding covariates 1 at a time in cox model!
There seems to be a statistically significant difference and treat3 differs from the other groups in both scripts. In script 1 PH is violated and I don´t now what to do. Apart from that, Cox model in script 1 seems to work fine and the interpretation of hazard ratios are OK, but in script 2, no idea how to interpret or solve that (there was no event in control treat1).
1) Do I need to declare my variables as.factor() to use them? Or integer is interpreted equally?
I think in your case as.factor is correct. You can use integers if you have continuous numeric variables - for example if you would have time seeds have been stored before the experiment, you could use as.numeric for time variable.
2) If PH is violated, what should I do with my data to proceed to a cox model building? I've intensely researched but haven't been able to understand the programming to add covariate x time interaction or stratification to my model.
Cox regression, aka Cox proportional hazards model, is based on the assumption of proportional hazards. If that assumption is violated, you won´t get reliable results. You probably could try some data transformations to see if it would help. Or if it is violated in some subexperiment/group, you could just leave it out.

bivariate Probit/logit R : how to find ALL coefficients and marginal effects with the "zeligverse" package

I am running a bivariate logit model in R with the zeligverse package.I want to calculate the impact of my independant variables on P(Y1=1), P(Y2=1), P(Y1=1,Y2=0), P(Y1=1,Y2=1), P(Y1=0,Y2=1), P(Y1=0,Y2=0), P(Y1=1|Y2=0) and all the other conditional probabilities (Y1 and Y2 are my dependant variables. They both equal 0 or 1). I also want all the marginal effects associated with these probabilities for each independant variable.
Do you know how to find those in this package (or in another package if it works better)?
Not sure this is what you are looking for (feel free to mark me down if not). Zelig packages do seem to be a right choice for your specific question.
library(Zelig)
## Let X_i be independent variable
## Assume you are working with a univariate target variable Y where Y \in {0, 1}
set.seed(123)
m <- 100
df <- data.frame(
Y = rbinom(m, 1, 0.5),
X1 = rbinom(m, 1, 0.95),
X2 = rbinom(m, 1, 0.95)
)
## Fit model once:
fit <- zelig(
Y ~ .,
model = "logit",
data = df,
cite = FALSE
)
summary(fit)
## Let's focus on the binomial predictor 2
x.out1 <- setx(fit, X2=1)
## Run estimation based on a posterior distribution:
postFit <- Zelig::sim(fit, x=x.out1)
summary(postFit)
# plot(postFit)

Structural Equation Model with Linear Dependency (Lavaan)

I want to estimate a structural equation model using lavaan in R with a categorical mediator. A wrinkle is that three of the exogenous variables are linearly dependent. However, this shouldn't be a problem since I'm using the categorical mediator to achieve identification a la Judea Pearl's front-door criterion. That is, mathematically each particular equation is identified (see the R code below).
With lavaan in R I can obtain estimates when the mediator is numeric, but not when it is categorical. With a categorical mediator I obtain the following error:
Error in lav_samplestats_step1(Y = Data, ov.names = ov.names, ov.types = ov.types,
: lavaan ERROR: linear regression failed for y; X may not be of full rank in group 1
Any advice on how to obtain estimates with a categorical mediator using lavaan?
Code:
# simulating the dataset
set.seed(1234) # seed for replication
x1 <- rep(seq(1:4), 100) # variable 1
x2 <- rep(1:4, each=100) # variable 2
x3 <- x2 - x1 + 4 # linear dependence
m <- sample(0:1, size = 400, replace = TRUE) # mediator
df <- data.frame(cbind(x1,x2,x3,m)) # dataframe
df$y <- 6.5 + x1*(0.5) + x2*(0.2) + m*(-0.4) + x3*(-1) + rnorm(400, 0, 1) # outcome
# structural equation model using pearl's front-door criterion
sem.formula <- 'y ~ 1 + x1 + x2 + m
m ~ 1 + x3'
# continuous mediator: works!
fit <- lavaan::sem(sem.formula, data=df, estimator="WLSMV",
se="none", control=list(iter.max=500))
# categorical mediator: doesn't work
fit <- lavaan::sem(sem.formula, data=df, estimator="WLSMV",
se="none", control=list(iter.max=500),
ordered = "m")

Resources