Producing effects plots within the GLMMadaptive package - r

I am getting an inscrutable error message while trying to run effects plots on objects created using the GLMMadaptive::mixed_model() and effects::predictorEffect() functions.
Here is an example problem created from toy binary longitudinal data, created using code supplied with one of the vignettes included with the GLMMadaptive package.
# Load the relevant packages
library(GLMMadaptive)
library(effects)
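set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# Now we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times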
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
Now when we fit a longitudinal logistic regression using the mixed_model function...
# fit the mixed effects logistic regression for y assuming random intercepts and random slopes for the random-effects part.
fm <- mixed_model(fixed = y ~ sex * time,
random = ~ time | id,
data = DF,
family = binomial())
...and try to create an effects plot using the effects::predictorEffect() function...
plot(predictorEffect("time", fm), type = "link")
...we get the following error
Error in mod.matrix %*% scoef : non-conformable arguments
Has anyone encountered this problem before and if so, found a way to solve it?
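The error message says that the model matrix built for the plot and the coefficient vector extracted from the model have incompatible dimensions. Independently of the cause, the effect of time on the link scale can be computed by hand from the fixed-effects coefficients. The following is only a base-R sketch: it uses the simulation's true betas as a stand-in for the fitted fixed effects (with a fitted model one would presumably use fixef(fm) instead, which is an assumption about the workflow), and grid and X_grid are hypothetical names.

```r
# Stand-in for the estimated fixed effects (assumption: replace with fixef(fm))
betas <- c("(Intercept)" = -2.13, sexfemale = -0.25, time = 0.24,
           "sexfemale:time" = -0.05)

# Grid of time values for each sex (0 to 15 matches t_max in the simulation)
grid <- expand.grid(time = seq(0, 15, length.out = 16),
                    sex = factor(c("male", "female"), levels = c("male", "female")))

# Design matrix with the same columns, in the same order, as the coefficients
X_grid <- model.matrix(~ sex * time, data = grid)
stopifnot(ncol(X_grid) == length(betas))  # %*% is non-conformable when this fails

# Linear predictor (link scale) and its inverse logit (probability scale)
grid$eta  <- as.vector(X_grid %*% betas)
grid$prob <- plogis(grid$eta)

# One line per sex on the link scale
plot(eta ~ time, data = subset(grid, sex == "male"), type = "l",
     ylim = range(grid$eta), ylab = "linear predictor")
lines(eta ~ time, data = subset(grid, sex == "female"), lty = 2)
legend("topleft", legend = levels(grid$sex), lty = 1:2)
```

The GLMMadaptive vignette on methods for MixMod objects also describes an effectPlotData() helper that takes the fitted model and a grid like the one above, which may be the more convenient route when confidence bands are needed.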

Related

Find the parameter estimates for each random term in a binomial GLMM (lme4)?

Does anyone know how to extract the parameter estimates of random term when using the (1 | …) syntax in a glmer model (including se, t ratio and p value)? I’m only able to access the average variance and std. deviance with the summary function.
Some background: I used cohort and period random terms (both factorized), where period = each survey year and cohort = 8 birth cohorts. My empty model looks like this:
glmer(pid ~ age + age2 + (1 | cohort) + (1| period)
There's a bit of a conceptual problem with what you are doing. The random effects do not have the same standing in statistical theory as the fixed effects. You are not really supposed to be making inferences on their estimates, since you don't have a random sample from their overall population. Hence you need to make some untested assumptions about their distribution. That said, there are apparently times when you might want to do it, but take care not to make unsupportable claims. See: https://stats.stackexchange.com/questions/392314/interpretation-of-fixed-effect-coefficients-from-glms-and-glmms .
Dimitris Rizopoulos then responded to a request for a way of getting "an average" of the random effects conditional on the fixed effects (rather, the flipped version of mixed-model inference). He offered a function in his GLMMadaptive package:
https://drizopoulos.github.io/GLMMadaptive/articles/Methods_MixMod.html#marginalized-coefficients
This is his example:
install.packages("GLMMadaptive"); library(GLMMadaptive)
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
# We continue by fitting the mixed effects logistic regression for y, assuming random intercepts and random slopes for the random-effects part.
fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
family = binomial())
... and then the call to his marginal_coefs function:
marginal_coefs(fm, std_errors=TRUE)
               Estimate Std.Err z-value   p-value
(Intercept)     -1.6025  0.2906 -5.5154   < 1e-04
sexfemale       -1.0975  0.3676 -2.9859 0.0028277
time             0.1766  0.0337  5.2346   < 1e-04
sexfemale:time   0.0508  0.0366  1.3864 0.1656167
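To see why marginal coefficients differ from subject-specific ones, it can help to average the subject-specific probability over the random-effects distribution by hand. The following is a rough base-R sketch for a logistic model with a random intercept only, using the simulation values above (intercept -2.13, random-intercept variance 0.48) as assumed inputs. It illustrates the idea and is not how marginal_coefs() is implemented.

```r
beta0 <- -2.13   # subject-specific (conditional) intercept, taken from the simulation
D11   <- 0.48    # variance of the random intercepts, taken from the simulation

# Probability for a "typical" subject, i.e. random intercept b = 0
p_conditional <- plogis(beta0)

# Population-averaged probability: integrate plogis(beta0 + b) over b ~ N(0, D11)
p_marginal <- integrate(
  function(b) plogis(beta0 + b) * dnorm(b, mean = 0, sd = sqrt(D11)),
  lower = -Inf, upper = Inf
)$value

# Intercept on the logit scale that reproduces the population-averaged probability
beta0_marginal <- qlogis(p_marginal)

c(conditional = beta0, marginal = beta0_marginal)
```

The marginal intercept is pulled toward zero relative to the conditional one, which is the usual attenuation seen when moving from subject-specific to population-averaged coefficients in a logistic mixed model.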

bivariate Probit/logit R : how to find ALL coefficients and marginal effects with the "zeligverse" package

I am running a bivariate logit model in R with the zeligverse package. I want to calculate the impact of my independent variables on P(Y1=1), P(Y2=1), P(Y1=1,Y2=0), P(Y1=1,Y2=1), P(Y1=0,Y2=1), P(Y1=0,Y2=0), P(Y1=1|Y2=0) and all the other conditional probabilities (Y1 and Y2 are my dependent variables; both equal 0 or 1). I also want all the marginal effects associated with these probabilities for each independent variable.
Do you know how to find those in this package (or in another package if it works better)?
Not sure this is what you are looking for (feel free to mark me down if not). Zelig packages do seem to be a right choice for your specific question.
library(Zelig)
## Let X_i be independent variable
## Assume you are working with a univariate target variable Y where Y \in {0, 1}
set.seed(123)
m <- 100
df <- data.frame(
Y = rbinom(m, 1, 0.5),
X1 = rbinom(m, 1, 0.95),
X2 = rbinom(m, 1, 0.95)
)
## Fit model once:
fit <- zelig(
Y ~ .,
model = "logit",
data = df,
cite = FALSE
)
summary(fit)
## Let's focus on the binomial predictor 2
x.out1 <- setx(fit, X2=1)
## Run estimation based on a posterior distribution:
postFit <- Zelig::sim(fit, x=x.out1)
summary(postFit)
# plot(postFit)

Plotting interaction effects in Bayesian models (using rstanarm)

I'm trying to show how the effect of one variable changes with the values of another variable in a Bayesian linear model fitted with rstanarm. I am able to fit the model and take draws from the posterior to look at the estimates for each parameter, but it's not clear how to plot the effect of one variable in the interaction as the other changes, along with the associated uncertainty (i.e. a marginal effects plot). Below is my attempt:
library(rstanarm)
# Set Seed
set.seed(1)
# Generate fake data
w1 <- rbeta(n = 50, shape1 = 2, shape2 = 1.5)
w2 <- rbeta(n = 50, shape1 = 3, shape2 = 2.5)
dat <- data.frame(y = log(w1 / (1-w1)),
x = log(w2 / (1-w2)),
z = seq(1:50))
# Fit linear regression without an intercept:
m1 <- rstanarm::stan_glm(y ~ 0 + x*z,
data = dat,
family = gaussian(),
algorithm = "sampling",
chains = 4,
seed = 123
)
# Create data sets with low values and high values of one of the predictors
dat_lowx <- dat
dat_lowx$x <- 0
dat_highx <- dat
dat_highx$x <- 5
out_low <- rstanarm::posterior_predict(object = m1,
newdata = dat_lowx)
out_high <- rstanarm::posterior_predict(object = m1,
newdata = dat_highx)
# Calculate differences in posterior predictions
mfx <- out_high - out_low
# Somehow get the coefficients for the other predictor?
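In this (linear, Gaussian, identity link, no intercept) case,
mu = beta_x * x + beta_z * z + beta_xz * x * z
   = (beta_x + beta_xz * z) * x + beta_z * z
   = (beta_z + beta_xz * x) * z + beta_x * x
so the marginal effect of x is d mu / d x = beta_x + beta_xz * z, and the marginal effect of z is d mu / d z = beta_z + beta_xz * x. To plot the marginal effect of x or z, you just need an appropriate range of the other variable and the posterior distribution of the coefficients, which you can obtain via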
post <- as.data.frame(m1)
Then
dmu_dx <- post[ , 1] + post[ , 3] %*% t(sort(dat$z))
dmu_dz <- post[ , 2] + post[ , 3] %*% t(sort(dat$x))
You can then estimate a marginal effect at each observed value in your data with something like the code below, which calculates the effect of x on mu at each observed z and the effect of z on mu at each observed x.
colnames(dmu_dx) <- sort(dat$z) # columns of dmu_dx follow the sorted z values
colnames(dmu_dz) <- round(sort(dat$x), digits = 1) # columns of dmu_dz follow the sorted x values
bayesplot::mcmc_intervals(dmu_dz)
bayesplot::mcmc_intervals(dmu_dx)
Note that the column names are simply the observations in this case.
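If bayesplot is not available, the same marginal-effect matrix can be summarised with base R. The sketch below is illustrative only: post here is a hypothetical stand-in built from made-up normal draws rather than the real output of as.data.frame(m1), and z_grid and summ are names introduced for the example.

```r
set.seed(1)

# Stand-in for as.data.frame(m1): fake posterior draws (assumption, not real output)
n_draws <- 4000
post <- data.frame(x     = rnorm(n_draws, 0.10, 0.05),
                   z     = rnorm(n_draws, 0.00, 0.01),
                   `x:z` = rnorm(n_draws, 0.02, 0.005),
                   check.names = FALSE)

z_grid <- 1:50

# d mu / d x = beta_x + beta_xz * z: one row per draw, one column per z value
dmu_dx <- post[["x"]] + outer(post[["x:z"]], z_grid)

# Posterior median and 90% interval of the marginal effect at each z
summ <- apply(dmu_dx, 2, quantile, probs = c(0.05, 0.50, 0.95))

plot(z_grid, summ["50%", ], type = "l", ylim = range(summ),
     xlab = "z", ylab = "marginal effect of x")
lines(z_grid, summ["5%", ], lty = 2)
lines(z_grid, summ["95%", ], lty = 2)
```

With a real fit, replacing the fake post with as.data.frame(m1) should give the corresponding curve for the model above, assuming the coefficient columns are named x, z and x:z.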
You could also use either the ggeffects-package, especially for marginal effects; or the sjPlot-package for marginal effects and other plot types (for marginal effects, sjPlot simply wraps the functions from ggeffects).
To plot marginal effects of interactions, use sjPlot::plot_model() with type = "int". Use mdrt.values to define which values to plot for continuous moderator variables, and use ppd to let prediction be based on either the posterior distribution of the linear predictor or draws from posterior predictive distribution.
library(sjPlot)
plot_model(m1, type = "int", terms = c("x", "z"), mdrt.values = "meansd")
plot_model(m1, type = "int", terms = c("x", "z"), mdrt.values = "meansd", ppd = TRUE)
or to plot marginal effects at other specific values, use type = "pred" and specify the values in the terms-argument:
plot_model(m1, type = "pred", terms = c("x", "z [10, 20, 30, 40]"))
# same as:
library(ggeffects)
dat <- ggpredict(m1, terms = c("x", "z [10, 20, 30, 40]"))
plot(dat)
There are more options, and also different ways of customizing the plot appearance. See related help files and package vignettes.

Structural Equation Model with Linear Dependency (Lavaan)

I want to estimate a structural equation model using lavaan in R with a categorical mediator. A wrinkle is that three of the exogenous variables are linearly dependent. However, this shouldn't be a problem since I'm using the categorical mediator to achieve identification a la Judea Pearl's front-door criterion. That is, mathematically each particular equation is identified (see the R code below).
With lavaan in R I can obtain estimates when the mediator is numeric, but not when it is categorical. With a categorical mediator I obtain the following error:
Error in lav_samplestats_step1(Y = Data, ov.names = ov.names, ov.types = ov.types,
: lavaan ERROR: linear regression failed for y; X may not be of full rank in group 1
Any advice on how to obtain estimates with a categorical mediator using lavaan?
Code:
# simulating the dataset
set.seed(1234) # seed for replication
x1 <- rep(seq(1:4), 100) # variable 1
x2 <- rep(1:4, each=100) # variable 2
x3 <- x2 - x1 + 4 # linear dependence
m <- sample(0:1, size = 400, replace = TRUE) # mediator
df <- data.frame(cbind(x1,x2,x3,m)) # dataframe
df$y <- 6.5 + x1*(0.5) + x2*(0.2) + m*(-0.4) + x3*(-1) + rnorm(400, 0, 1) # outcome
# structural equation model using pearl's front-door criterion
sem.formula <- 'y ~ 1 + x1 + x2 + m
m ~ 1 + x3'
# continuous mediator: works!
fit <- lavaan::sem(sem.formula, data=df, estimator="WLSMV",
se="none", control=list(iter.max=500))
# categorical mediator: doesn't work
fit <- lavaan::sem(sem.formula, data=df, estimator="WLSMV",
se="none", control=list(iter.max=500),
ordered = "m")
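One plausible reading of the error (an assumption, not confirmed from the lavaan source) is that, once a variable is declared ordered, a preliminary step regresses each endogenous variable on all exogenous variables at once, so x1, x2 and x3 end up in the same design matrix. The base-R check below shows that such a matrix is rank deficient for the simulated data; X is a name introduced for the example.

```r
# Same exogenous variables as in the simulation above
x1 <- rep(1:4, times = 100)
x2 <- rep(1:4, each = 100)
x3 <- x2 - x1 + 4            # exact linear combination of x1, x2 and the intercept

# Design matrix for a regression on an intercept plus all three variables
X <- cbind(intercept = 1, x1, x2, x3)

qr(X)$rank                   # numerical rank of the design matrix
ncol(X)                      # number of columns

# Dropping one of the dependent columns restores full column rank
qr(X[, c("intercept", "x1", "x2")])$rank
```

The rank is one less than the number of columns, which is the condition the error message describes, whatever the exact internal step that triggers it.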

How to simulate quantities of interest using arm or rstanarm packages in R?

I would like to know how to simulate quantities of interest out of a regression model estimated using either the arm or the rstanarm packages in R. I am a newbie in Bayesian methods and R and have been using the Zelig package for some time. I asked a similar question before, but I would like to know if it is possible to simulate those quantities using the posterior distribution estimated by those packages.
In Zelig you can set the values you want for the independent values and it calculates the results for the outcome variable (expected value, probability, etc). An example:
# Creating a dataset:
set.seed(10)
x <- rnorm(100,20,10)
z <- rnorm(100,10,5)
e <- rnorm(100,0,1)
y <- 2*x+3*z+e
df <- data.frame(x,z,e,y)
# Loading Zelig
require(Zelig)
# Model
m1.zelig <- zelig(y ~ x + z, model="ls", data=df)
summary(m1.zelig)
# Simulating z = 10
s1 <- setx(m1.zelig, z = 10)
simulation <- sim(m1.zelig, x = s1)
summary(simulation)
So Zelig keeps x at its mean (20.56), and simulates the quantity of interest with z = 10. In this case, y is approximately 71.
The same model using arm:
# Model
require(arm)
m1.arm <- bayesglm(y ~ x + z, data=df)
summary(m1.arm)
And using rstanarm:
# Model
require(rstanarm)
m1.stan <- stan_glm(y ~ x + z, data = df) # the released rstanarm has no stanlm(); stan_lm() exists but also needs a prior
print(m1.stan)
Is there any way to simulate z = 10 and x equals to its mean with the posterior distribution estimated by those two packages and get the expected value of y? Thank you very much!
In the case of bayesglm, you could do
sims <- arm::sim(m1.arm, n = 1000)
y_sim <- rnorm(n = 1000, mean = sims@coef %*% t(as.matrix(s1)), sd = sims@sigma)
mean(y_sim)
For the (unreleased) rstanarm, it would be similar
sims <- as.matrix(m1.stan)
y_sim <- rnorm(n = nrow(sims), mean = sims[,1:(ncol(sims)-1)] %*% t(as.matrix(s1)),
sd = sims[,ncol(sims)])
mean(y_sim)
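The same idea can be written out without arm or rstanarm, which makes the covariate profile explicit. The following is a base-R sketch under the usual normal-approximation assumptions for a linear model (draw sigma from its scaled inverse chi-square distribution, then the coefficients given sigma). It is not the arm or Zelig source code, and x0, beta_sim and ev_sim are names introduced for the example.

```r
set.seed(10)
x <- rnorm(100, 20, 10)
z <- rnorm(100, 10, 5)
y <- 2 * x + 3 * z + rnorm(100, 0, 1)
df <- data.frame(x, z, y)

fit <- lm(y ~ x + z, data = df)

# Covariate profile: intercept, x at its mean, z = 10
x0 <- c(1, mean(df$x), 10)

n_sims     <- 1000
beta_hat   <- coef(fit)
V_unscaled <- summary(fit)$cov.unscaled   # (X'X)^{-1}
s_hat      <- summary(fit)$sigma
dof        <- fit$df.residual

# Draw sigma, then the coefficients given sigma
sigma_sim <- s_hat * sqrt(dof / rchisq(n_sims, dof))
L <- t(chol(V_unscaled))                  # lower Cholesky factor
Z <- matrix(rnorm(3 * n_sims), nrow = 3)
beta_sim <- t(beta_hat + (L %*% Z) * rep(sigma_sim, each = 3))  # n_sims x 3

# Expected value and one predictive draw per simulation
ev_sim <- as.vector(beta_sim %*% x0)
y_sim  <- rnorm(n_sims, mean = ev_sim, sd = sigma_sim)

mean(ev_sim)
quantile(ev_sim, c(0.025, 0.975))
```

Here ev_sim plays the role of Zelig's expected value and y_sim the role of its predicted value. With draws from as.matrix(m1.stan), the coefficient columns could be used in place of beta_sim in the same way.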
In general for Stan, you could pass s1 as a row_vector and utilize it in a generated quantities block of a .stan file like
generated quantities {
real y_sim;
y_sim = normal_rng(s1 * beta, sigma); // current Stan uses = for assignment; the old <- operator has been removed
}
in which case the posterior distribution of y_sim would appear when you print the posterior summary.
