multiple imputation, lmer, and pooling ggeffects objects - r

I computed linear mixed effects models using lme4::lmer() on data that I multiply imputed using the mice package. On these lmer objects, I want to apply ggeffects::ggeffect() to get marginal effects that I can then plot for mean, +1sd and -1sd.
The pool_predictions() function seems perfectly suited and works well for lm objects; for lmer objects, however, ggeffect() does not work. ggpredict() works for some reason, but I want marginal, not conditional, effects.
Here's a minimal reproducible example that I adapted from the pool_predictions() reference (the mixed model doesn't make sense, it's just to create an example):
if (!require("pacman")) install.packages("pacman")
pacman::p_load(mice,stats,lme4,ggeffects)
data("nhanes2")
#First, the working example from the pool_predictions() reference, using an lm object and ggpredict():
imp <- mice(nhanes2, printFlag = FALSE)
predictions1 <- lapply(1:5, function(i) {
m1 <- lm(bmi ~ age + hyp + chl, data = complete(imp, action = i))
ggpredict(m1, "age")
})
pool_predictions(predictions1)
#Now the same example, but using ggeffect() on the lm object, which also works:
predictions2 <- lapply(1:5, function(i) {
m2 <- lm(bmi ~ age + hyp + chl, data = complete(imp, action = i))
ggeffect(m2, "age")
})
pool_predictions(predictions2)
#It also seems to work for lmer objects, at least when using ggpredict():
predictions3 <- lapply(1:5, function(i) {
m3 <- lmer(bmi ~ age + chl + (1|hyp), data = complete(imp, action = i))
ggpredict(m3, "age")
})
pool_predictions(predictions3)
#But when I use ggeffect() instead of ggpredict(), this doesn't work anymore for lmer objects.
predictions4 <- lapply(1:5, function(i) {
m4 <- lmer(bmi ~ age + chl + (1|hyp), data = complete(imp, action = i))
ggeffect(m4, "age")
})
pool_predictions(predictions4)
Does anyone have an idea why this happens, or any tips on how I can get the pooled marginal effects for my lmer object?
Thanks a lot!
Antje

I think this may be due to the way data is retrieved from the environment, which fails for ggeffect() (which is based on the effects package). You could try ggemmeans() instead, which should give you the same results as ggeffect().
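For example, here is a minimal sketch that swaps ggemmeans() into the failing lmer loop from the question (assuming the imputed object imp created above is still available; ggemmeans() relies on the emmeans package being installed):
# Sketch: same pooling loop as predictions4 above, but with ggemmeans() instead of ggeffect().
# ggemmeans() computes marginal means via the emmeans package, which must be installed.
predictions5 <- lapply(1:5, function(i) {
  m5 <- lmer(bmi ~ age + chl + (1 | hyp), data = complete(imp, action = i))
  ggemmeans(m5, "age")
})
pool_predictions(predictions5)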

Related

Getting an interaction plot from a pooled lme model with mids object

Preface - I really hope this makes sense!
I ran a linear mixed-effects model using an imputed dataset (FYI, the data is a mids object imputed with mice). The model has a three-way interaction between three continuous variables. I am now trying to plot the interaction with the interactions::interact_plot function. However, I get an error when I run the plot code, which I believe is because the model came from a mids object and not a data frame. Does anyone know how to address this error, or is there a better way to get the plot I'm after?
Thanks very much in advance!
MIDmod1 <- with(data = df.mids, exp = lmer(GC ~ Age + Sex + Edu + Stress*Time*HLI + (1|ID)))
summary(pool(MIDmod1))
interact_plot(
  model = MIDmod1,
  pred = Time,
  modx = Stress,
  mod2 = HLI,
  data = df.mids,
  interval = TRUE,
  y.label = 'Global cognition composite score',
  modx.labels = c('Low Baseline Stress (-1SD)', 'Moderate Baseline Stress (Mean)', 'High Baseline Stress (+1SD)'),
  mod2.labels = c('Low HLI (-1SD)', 'Moderate HLI (Mean)', 'High HLI (+1SD)'),
  legend.main = '') + ylim(-2, 2)
Error:
Error in rep(1, times = nrow(data)) : invalid 'times' argument
Note - I also get an error if I don't include the data argument (optional argument for this function).
Error in formula.default(object, env = baseenv()) : invalid formula
BTW - I am able to generate the plot when the model comes from a plain data frame.
Sorry, but it won't be that easy. A multiple imputation object will definitely require special treatment, and none of the many R packages that can plot interactions are likely to work out of the box.
Here’s a minimal example, adapted from the multiple imputation vignette of the marginaleffects package. (Disclaimer: I am the author.)
library(mice)
library(lme4)
library(ggplot2)
library(marginaleffects)
# insert missing data in an existing dataset and impute
iris_miss <- iris
iris_miss$Sepal.Width[sample(1:nrow(iris), 20)] <- NA
iris_mice <- mice(iris_miss, m = 20, printFlag = FALSE, .Random.seed = 1024)
iris_mice <- complete(iris_mice, "all")
# fit a model on 1 imputed dataset and use the `plot_predictions()` function
# with the `draw=FALSE` argument to extract the data that we want to plot
fit <- function(dat) {
  mod <- lmer(Sepal.Width ~ Petal.Width * Petal.Length + (1 | Species), data = dat)
  out <- plot_predictions(mod, condition = list("Petal.Width", "Petal.Length" = "threenum"), draw = FALSE)
  # `mice` requires a unique row identifier called "term"
  out$term <- out$rowid
  class(out) <- c("custom", class(out))
  return(out)
}
# `tidy.custom()` is needed by `mice` to combine datasets, but the output of fit()
# already has the right structure and column names, so it simply returns its input
tidy.custom <- function(x, ...) return(x)
# Fit on each imputation
mod_mice <- lapply(iris_mice, fit)
# Pool
mod_pool <- pool(mod_mice)$pooled
# Merge back some of the covariates
datplot <- data.frame(mod_pool, mod_mice[[1]][, c("Petal.Width", "Petal.Length")])
# Plot
ggplot(datplot, aes(Petal.Width, estimate, color = Petal.Length)) +
  geom_line() +
  theme_minimal()
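If you also want pooled uncertainty bands (the equivalent of interval = TRUE in interact_plot()), the pooled object carries what you need: mice::pool() returns, per term, the total variance t (within plus between imputations) and the adjusted degrees of freedom df. A rough sketch building a ribbon from those columns, assuming the datplot object created above:
# Sketch: pooled confidence bands from the columns of pool()$pooled.
# `t` is the total variance and `df` the adjusted degrees of freedom per term.
datplot$se <- sqrt(datplot$t)
datplot$conf.low <- datplot$estimate - qt(0.975, datplot$df) * datplot$se
datplot$conf.high <- datplot$estimate + qt(0.975, datplot$df) * datplot$se
ggplot(datplot, aes(Petal.Width, estimate, color = Petal.Length, fill = Petal.Length)) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2, color = NA) +
  geom_line() +
  theme_minimal()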

How to repeat univariate regression and extract P values?

I am using lapply to run several glm regressions of one dependent variable on one independent variable at a time, but I'm not sure how to extract all the p-values at once.
There are 200 features in my dataset, but the code below only gives me the p-value for feature #1. How can I get a matrix of p-values for all 200 features?
valName <- as.data.frame(colnames(repeatData))
featureName <- valName[3, ]
lapply(featureName,
       function(var) {
         formula <- as.formula(paste("outcome ~", var))
         fit.logist <- glm(formula, data = repeatData, family = binomial)
         summary(fit.logist)
         Pvalue <- coef(summary(fit.logist))[, 'Pr(>|z|)']
       })
I simplified your code a little bit: (1) I used reformulate() (not really different, just prettier); (2) I returned only the p-value for the focal variable, not the intercept p-value. (If you leave out the 2, you'll get a 2-row matrix with the intercept and focal-variable p-values.)
My example uses the built-in mtcars data set, with an added (fake) binomial response.
repeatData <- data.frame(outcome=rbinom(nrow(mtcars), size=1, prob=0.5), mtcars)
ff <- function(var) {
  formula <- reformulate(var, response = "outcome")
  fit.logist <- glm(formula, data = repeatData, family = binomial)
  coef(summary(fit.logist))[2, 'Pr(>|z|)']
}
## skip first column (response variable).
sapply(names(repeatData)[-1], ff)
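If you do want the intercept p-values as well (the 2-row matrix mentioned above), here is a sketch with a hypothetical ff_both() helper on the same fake data:
## Sketch: keep both rows of the coefficient table; sapply() binds the results into a
## 2-row matrix (intercept p-values in row 1, focal-variable p-values in row 2;
## the row names are taken from the first fit).
ff_both <- function(var) {
  fit.logist <- glm(reformulate(var, response = "outcome"),
                    data = repeatData, family = binomial)
  coef(summary(fit.logist))[, 'Pr(>|z|)']
}
sapply(names(repeatData)[-1], ff_both)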

How can I extract one specific coefficient from multiple lavaan models?

I wrote a function to run the same lavaan model on five different datasets at once, and I get five separate outputs. However, I would like to extract one specific estimate from each of these models, because I am using them in a meta-analysis (and I have many more models).
Here is my code for running the model:
df_list <- list('Y1' = emo_dyn_1, 'Y2' = emo_dyn_2, 'Y3' = emo_dyn_3, 'Y4' = emo_dyn_4, 'Y5' = emo_dyn_5)
model <- 'DepB ~ isdNA + imeanNA + sex + age'
fun <- function(emo_dyn) {
  fit <- sem(model,
             data = emo_dyn,
             estimator = "MLR",
             missing = "ml.x")
  summ <- summary(fit, standardized = TRUE)
  list(fit = fit, summary = summ)
}
results <- lapply(df_list,fun)
names(results) <- names(df_list)
results
And this is how I extract the coefficient: it basically turns the standardized solution into a data frame, and then I pull the specific value out of it. I'm not sure that's the best option. It's the standardized estimate of a specific path, and right now it's all copy and paste. I'm sure this can be done more easily, but I don't know how to write the loop.
emo_dyn_1_est <- standardizedSolution(results$Y1$fit) # Standardised coefficients
emo_dyn_1_est_1 <- emo_dyn_1_est[1, 4]
emo_dyn_1_est_1
emo_dyn_2_est <- standardizedSolution(results$Y2$fit) # Standardised coefficients
emo_dyn_2_est_2 <- emo_dyn_2_est[1, 4]
emo_dyn_2_est_2
emo_dyn_3_est <- standardizedSolution(results$Y3$fit) # Standardised coefficients
emo_dyn_3_est_3 <- emo_dyn_3_est[1, 4]
emo_dyn_3_est_3
emo_dyn_4_est <- standardizedSolution(results$Y4$fit) # Standardised coefficients
emo_dyn_4_est_4 <- emo_dyn_4_est[1, 4]
emo_dyn_4_est_4
emo_dyn_5_est <- standardizedSolution(results$Y5$fit) # Standardised coefficients
emo_dyn_5_est_5 <- emo_dyn_5_est[1, 4]
emo_dyn_5_est_5
lavaan has the parameterEstimates() function, so you can do something like:
df_list <- list('Y1' = emo_dyn_1, 'Y2' = emo_dyn_2, 'Y3' = emo_dyn_3, 'Y4' = emo_dyn_4, 'Y5' = emo_dyn_5)
model <- 'DepB ~ isdNA + imeanNA + sex + age'
fun <- function(emo_dyn) {
  fit <- sem(model,
             data = emo_dyn,
             estimator = "MLR",
             missing = "ml.x")
  fit
}
results <- lapply(df_list, fun)
names(results) <- names(df_list)
## Get a specific parameter
get_param <- function(fit, coef_pos) {
  param <- parameterEstimates(fit, standardized = TRUE)[coef_pos, "std.lv"]
  param
}
lapply(results, get_param, coef_pos = 1)
I made one change: in your lapply to get the results I only kept the model fit. If you want all the summaries you can just do lapply(results, summary). The get_param function assumes that you know the position in the results table of the parameter you want.
If you want to keep your existing lapply for the results then something like this would work:
results_fit_only <- lapply(results, "[[", "fit")
lapply(results_fit_only, get_param, coef_pos = 1)
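If relying on the row position feels fragile, you could also filter parameterEstimates() by the lhs/op/rhs columns that identify each path. A sketch, assuming the path of interest is DepB ~ isdNA from the model above (std.all is the fully standardized estimate, which matches est.std from standardizedSolution()):
## Sketch: select the parameter by name rather than by position.
get_param_by_name <- function(fit, lhs = "DepB", op = "~", rhs = "isdNA") {
  pe <- parameterEstimates(fit, standardized = TRUE)
  pe[pe$lhs == lhs & pe$op == op & pe$rhs == rhs, "std.all"]
}
sapply(results, get_param_by_name)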

Overlay 2 allEffects graphs

I have the following models:
require(effects)
require(nlme)  # lme() comes from nlme
fit <- lme(x ~ y, data, random = ~1|item)
plot(allEffects(fit))
fit2 <- lme(x ~ y, data2, random = ~1|item)
plot(allEffects(fit2))
How can I plot fit and fit2 overlaid? I have tried par(new=TRUE), but it does not work. The graphs plot fine individually.
I'm not sure there's a very nice way to do this. I usually extract the information from the effects structure and plot it with ggplot (lattice would be possible too).
Here's an example:
library(effects)
library(nlme)
library(plyr) ## utilities
Fit a model to the first and second half of one of the standard example data sets:
fm1 <- lme(distance ~ age, random = ~1|Subject,
           data = Orthodont[1:54, ])
fm2 <- update(fm1, data = Orthodont[55:108, ])
a1 <- allEffects(fm1)
a2 <- allEffects(fm2)
Extract the information from the efflist object. This is the part that isn't completely general ... the hard part is getting out the predictor variable.
as.data.frame.efflist <- function(x) {
  ldply(x,
        function(z) {
          r <- with(z, data.frame(fit,
                                  var = variables[[1]]$levels,
                                  lower, upper))
          return(plyr::rename(r, setNames(z$variables[[1]]$name, "var")))
        })
}
For convenience, use ldply to put the results of both models together:
comb <- ldply(list(fm1=a1,fm2=a2),as.data.frame,.id="model")
Now plot:
library(ggplot2); theme_set(theme_bw())
ggplot(comb, aes(age, fit,
                 ymin = lower, ymax = upper,
                 colour = model, fill = model)) +
  geom_line() +
  geom_ribbon(alpha = 0.2, colour = NA) +
  geom_rug(sides = "b")
The rug plot component is a little silly here.
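As a possible shortcut, recent versions of the effects package also provide an as.data.frame() method for single effect objects, which may let you skip the custom efflist conversion when you only need one predictor; a sketch under that assumption:
## Sketch: as.data.frame() on a single Effect object should return the predictor plus
## fit, se, lower, and upper columns, which can be stacked and plotted as above.
d1 <- as.data.frame(Effect("age", fm1))
d2 <- as.data.frame(Effect("age", fm2))
comb2 <- rbind(cbind(model = "fm1", d1), cbind(model = "fm2", d2))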

predict method for felm from lfe package

Does anyone have a nice clean way to get predict behavior for felm models?
library(lfe)
model1 <- lm(data = iris, Sepal.Length ~ Sepal.Width + Species)
predict(model1, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works
model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Does not work
UPDATE (2020-04-02): The answer from Grant below using the new package fixest provides a more parsimonious solution.
As a workaround, you could combine felm, getfe, and demeanlist as follows:
library(lfe)
lm.model <- lm(data=demeanlist(iris[, 1:2], list(iris$Species)), Sepal.Length ~ Sepal.Width)
fe <- getfe(felm(data = iris, Sepal.Length ~ Sepal.Width | Species))
predict(lm.model, newdata = data.frame(Sepal.Width = 3)) + fe$effect[fe$idx=="virginica"]
The idea is that you use demeanlist to center the variables, then lm to estimate the coefficient on Sepal.Width using the centered variables, giving you an lm object over which you can run predict. Then run felm+getfe to get the conditional mean for the fixed effect, and add that to the output of predict.
Late to the party, but the new fixest package has a predict method. It supports high-dimensional fixed effects (and clustering, etc.) using a syntax very similar to lfe. Somewhat remarkably, it is also considerably faster than lfe in the benchmark cases I've tested.
library(fixest)
model_feols <- feols(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model_feols, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works
This might not be the answer you are looking for, but it seems that the author of the lfe package did not add functionality for making predictions on external data from a fitted felm model. The primary focus seems to be the analysis of the group fixed effects. However, it is interesting to note that the package documentation mentions the following:
The object has some resemblance to an 'lm' object, and some postprocessing methods designed for lm may happen to work. It may however be necessary to coerce the object to succeed with this.
Hence, it might be possible to coerce the felm object to an lm object to obtain some additional lm functionality (if all the required information is present in the object to perform the necessary computations).
The lfe package is intended to be run on very large datasets, and effort was made to conserve memory: as a direct result, the felm object does not contain a QR decomposition, unlike the lm object. Unfortunately, lm's predict method relies on that information to compute the predictions. Hence, coercing the felm object and calling predict will fail:
> model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
> class(model2) <- c("lm","felm") # coerce to lm object
> predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
Rank zero or should not have used lm(.., qr=FALSE).
If you really must use this package to perform the predictions, you could write your own simplified version of this functionality using the information available in the felm object. For example, the OLS regression coefficients are available via model2$coefficients.
This should work for cases where you wish to ignore the group effects in the prediction, are predicting for new X's, and only want confidence intervals. It first looks for a clustervcv attribute, then robustvcv, then vcv.
predict.felm <- function(object, newdata, se.fit = FALSE,
                         interval = "none",
                         level = 0.95) {
  if (missing(newdata)) {
    stop("predict.felm requires newdata and predicts for all group effects = 0.")
  }
  tt <- terms(object)
  Terms <- delete.response(tt)
  attr(Terms, "intercept") <- 0
  m.mat <- model.matrix(Terms, data = newdata)
  m.coef <- as.numeric(object$coef)
  fit <- as.vector(m.mat %*% object$coef)
  fit <- data.frame(fit = fit)
  if (se.fit | interval != "none") {
    if (!is.null(object$clustervcv)) {
      vcov_mat <- object$clustervcv
    } else if (!is.null(object$robustvcv)) {
      vcov_mat <- object$robustvcv
    } else if (!is.null(object$vcv)) {
      vcov_mat <- object$vcv
    } else {
      stop("No vcv attached to felm object.")
    }
    se.fit_mat <- sqrt(diag(m.mat %*% vcov_mat %*% t(m.mat)))
  }
  if (interval == "confidence") {
    t_val <- qt((1 - level) / 2 + level, df = object$df.residual)
    fit$lwr <- fit$fit - t_val * se.fit_mat
    fit$upr <- fit$fit + t_val * se.fit_mat
  } else if (interval == "prediction") {
    stop("interval = \"prediction\" not yet implemented")
  }
  if (se.fit) {
    return(list(fit = fit, se.fit = se.fit_mat))
  } else {
    return(fit)
  }
}
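A quick usage sketch of the method above on the iris example (since the group effects are set to zero, only the non-fixed-effect regressors in newdata are used):
# Sketch: felm model from the question, predicted with the predict.felm() defined above.
model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"),
        se.fit = TRUE, interval = "confidence")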
To extend the answer from pbaylis, I created a slightly long-winded function that extends nicely to allow for more than one fixed effect. Note that you have to pass in the original dataset used in the felm model. The function returns a list with two items: the vector of predictions, and a data frame based on new_data that includes the predictions and fixed effects as columns.
predict_felm <- function(model, data, new_data) {
  require(dplyr)
  # Get the names of all the variables
  y <- model$lhs
  x <- rownames(model$beta)
  fe <- names(model$fe)
  # Demean according to fixed effects
  data_demeaned <- demeanlist(data[c(y, x)],
                              as.list(data[fe]),
                              na.rm = TRUE)
  # Create formula for LM and run prediction
  lm_formula <- as.formula(
    paste(y, "~", paste(x, collapse = "+"))
  )
  lm_model <- lm(lm_formula, data = data_demeaned)
  lm_predict <- predict(lm_model,
                        newdata = new_data)
  # Collect coefficients for fe
  fe_coeffs <- getfe(model) %>%
    select(fixed_effect = effect, fe_type = fe, idx)
  # For each fixed effect, merge estimated fixed effect back into new_data
  new_data_merge <- new_data
  for (i in fe) {
    fe_i <- fe_coeffs %>% filter(fe_type == i)
    by_cols <- c("idx")
    names(by_cols) <- i
    new_data_merge <- left_join(new_data_merge, fe_i, by = by_cols) %>%
      select(-matches("^idx"))
  }
  if (length(lm_predict) != nrow(new_data_merge)) stop("unmatching number of rows")
  # Sum all the fixed effects
  all_fixed_effects <- base::rowSums(select(new_data_merge, matches("^fixed_effect")))
  # Create dataframe with predictions
  new_data_predict <- new_data_merge %>%
    mutate(lm_predict = lm_predict,
           felm_predict = all_fixed_effects + lm_predict)
  return(list(predict = new_data_predict$felm_predict,
              data = new_data_predict))
}
model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict_felm(model = model2, data = iris, new_data = data.frame(Sepal.Width = 3, Species = "virginica"))
# Returns prediction and data frame
I think what you're looking for might be the lme4 package. I was able to get a predict to work using this:
library(lme4)
data(iris)
model2 <- lmer(data = iris, Sepal.Length ~ (Sepal.Width | Species))
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
1
6.610102
You may have to play around a little to specify the particular effects you're looking for, but the package is well-documented so it shouldn't be a problem.
