How to extract random intercepts from mixed effects Tidymodels - r

I am trying to extract random intercepts from tidymodels using lme4 and multilevelmod. I able to do this using lme4 below:
Using R and lme4:
library("tidyverse")
library("lme4")
# set up model
mod <- lmer(Reaction ~ Days + (1|Subject),data=sleepstudy)
# create expanded df
expanded_df <- with(sleepstudy,
data.frame(
expand.grid(Subject=levels(Subject),
Days=seq(min(Days),max(Days),length=51))))
# create predicted df with **random intercepts**
predicted_df <- data.frame(expanded_df,resp=predict(mod,newdata=expanded_df))
predicted_df
# plot intercepts
ggplot(predicted_df,aes(x=Days,y=resp,colour=Subject))+
geom_line()
Using tidymodels:
# example from
# https://github.com/tidymodels/multilevelmod
library("multilevelmod")
library("tidymodels")
library("tidyverse")
library("lme4")
#> Loading required package: parsnip
data(sleepstudy, package = "lme4")
# set engine to lme4
mixed_model_spec <- linear_reg() %>% set_engine("lmer")
# create model
mixed_model_fit_tidy <-
mixed_model_spec %>%
fit(Reaction ~ Days + (1 | Subject), data = sleepstudy)
expanded_df_tidy <- with(sleepstudy,
data.frame(
expand.grid(Subject=levels(Subject),
Days=seq(min(Days),max(Days),length=51))))
predicted_df_tidy <- data.frame(expanded_df_tidy,resp=predict(mixed_model_fit_tidy,new_data=expanded_df_tidy))
ggplot(predicted_df_tidy,aes(x=Days,y=.pred,colour=Subject))+
geom_line()
Using the predict() function seems to gives only the fixed effect predictions.
Is there a way to extract the random intercepts from tidymodels and multilevelmod? I know the package is still in development so it might not be possible at this stage.

I think you can work around this as follows:
predicted_df_tidy <- mutate(expanded_df_tidy,
.pred = predict(mixed_model_fit_tidy,
new_data=expanded_df_tidy,
type = "raw", opts=list(re.form=NULL)))
bind_cols() instead of mutate() might be useful in some circumstances?
the issue is that multilevelmod internally sets the default for prediction to re.form = NA; the code above resets it to re.form = NULL (which is the lme4 default, i.e. include all random effects in the prediction)
If you actually want the random intercepts (only) I guess you could predicted_df_tidy %>% filter(Days==0)
PS If you want to be more 'tidy' about this I think you can use purrr::cross_df() in place of expand.grid and pipe the results directly to mutate() ...

Related

plotting an interaction term in moderated regression using MICE imputation

I'm using imputed data to test a series of regression models, including some moderation models.
Imputation
imp_data <- mice(data,m=20,maxit=20,meth='cart',seed=12345)
I then convert this to long format so I can recode / sum variables as needed, beore turning back to mids format
impdatlong_mids<-as.mids(impdat_long)
Example model:
model1 <- with(impdatlong_mids,
lm(Outcome ~ p1_sex + p2 + p3 + p4
+ p5+ p6+ p7+ p8+ p9+ p10
+ p11+ p1_sex*p12+ p1_sex*p13 + p14)
in non-imputed data, to create a graphic representation of the significant ineraction, I'd use (e.g.)
interact_plot (model=model1, pred = p1_sex, modx = p12)
This doesn't work with imputed data / mids objects.
Has anyone plotted an interaction using imputed data, and able to help or share examples?
Thanks
EDIT: Reproducible example
library(tidyverse)
library(interactions)
library(mice)
# library(reprex) does not work with this
set.seed(42)
options(warn=-1)
#---------------------------------------#
# Data preparations
# loading an editing data
d <- mtcars
d <- d %>% mutate_at(c('cyl','am'),factor)
# create missing data and impute it
mi_d <- d
nr_of_NAs <- 30
for (i in 1:nr_of_NAs) {
mi_d[sample(nrow(mi_d),1),sample(ncol(mi_d),1)] <- NA
}
mi_d <- mice(mi_d, m=2, maxit=2)
#---------------------------------------#
# regressions
#not imputed
lm_d <- lm(qsec ~ cyl*am + mpg*disp, data=d)
#imputed dataset
lm_mi <- with(mi_d,lm(qsec ~ cyl*am + mpg*disp))
lm_mi_pool <- pool(lm_mi)
#---------------------------------------#
# interaction plots
# not imputed
#continuous
interactions::interact_plot(lm_d, pred=mpg,modx=disp, interval=T,int.width=0.3)
#categorical
interactions::cat_plot(lm_d, pred = cyl, modx = am)
#---------------------------------------#
# interaction plots
# imputed
#continuous
interactions::interact_plot(lm_mi_pool, pred=mpg,modx=disp, interval=T,int.width=0.3)
# Error in model.frame.default(model) : object is not a matrix
#categorical
interactions::cat_plot(lm_mi_pool, pred = cyl, modx = am)
# Error in model.frame.default(model) : object is not a matrix
The problem seems to be that neither interact_plot, cat_plot or any other available package allows for (at least categorical) interaction plotting with objects of class mipo or pooled regression outputs.
I am using the walking data from the mice package as an example. One way to get the interaction plot (well version of one type of interaction plot) is to use the gtsummary package. Under the hood it will take the model1 use pool() from mice to average over the models and then use a combo of tbl_regression() and plot() to output a plot of the coefficients in the model. The tbl_regression() function is what is calling the pool() function.
library(mice)
library(dplyr)
library(gtsummary)
imp_data <- mice(mice::walking,m=20,maxit=20,meth='cart',seed=12345)
model1 <- with(imp_data,
lm(age ~ sex*YA))
model1 %>%
tbl_regression() %>%
plot()
The package emmeans allows you to extract interaction effects from a mira object. Here is a gentle introduction. After that, the interactions can be plotted with appropriate ggplot. This example is for the categorical variables but could be extended to the continous case - after the emmeans part things get relatively straighforward.
library(ggplot2)
library(ggstance)
library(emmeans)
library(khroma)
library(jtools)
lm_mi <- with(mi_d,lm(qsec ~ gear*carb))
#extracting interaction effects
emcatcat <- emmeans(lm_mi, ~gear*carb)
tidy <- as_tibble(emcatcat)
#plotting
pd <- position_dodge(0.5)
ggplot(tidy, aes(y=gear, x=emmean, colour=carb)) +
geom_linerangeh(aes(xmin=lower.CL, xmax=upper.CL), position=pd,size = 2) +
geom_point(position=pd,size = 4)+
ggtitle('Interactions') +
labs (x = "aggreageted interaction effect") +
scale_color_bright() +
theme_nice()
this can be extended to a three-way interaction plot with facet_grid as long as you have a third categorical interaction term.

Tbl_regression from the gtsummary package for negative binomial regressions

To assess if there is an association between certain groups of patients (patient_group; categorical variable) and a disease (disease_outcome; count variable) I am running negative binomial regression models (due to overdispersion). To check for confounding by other variables I am running 3 models with increasing amounts of covariates.
To display the IRRs and CIs i want to use the tbl_regression function from the package gtsummary (I am using the latest version 1.3.7.9022). However, calling the function returns the IRR and the corresponding 95% CIs non-exponentiated, even though I put exponentiate=TRUE:
# Load packages
library(haven)
library(magrittr)
library(MASS)
library(dplyr)
install.packages("gtsummary")
remotes::install_github("ddsjoberg/gtsummary")
library(gtsummary)
# Load example data.
dat <- read_dta("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
id <- factor(id)
})
# Run negative binomial regression and pipe in the tbl_regression function
Model 1 <-
glm.nb(data=dat, formula=daysabs ~ prog) %>%
tbl_regression(exponentiate=TRUE)
Model 1
This returns the summary table, but the regression coefficients have not been exponentiated. Is there a way to get gtsummary to return exponentiated coefficients and CIs?
Thanks!
I was just doing some poking around to see what is going on. The tbl_regression() function uses broom::tidy() in the background. Support for negbin models was just added 7 days ago, but for some reason an exponentiate= argument was not added for this type of model.
I am going to request that it be added. In the meantime, this code should get you up and going with negbin models.
library(gtsummary)
library(tidyverse)
# add a custom tidying function
my_negbin_tidy <- function(x, exponentiate = FALSE, ...) {
df_tidy <- broom::tidy(x, ...)
# exponentiate coef and CI if requested
if (exponentiate) {
df_tidy <-
df_tidy %>%
mutate_at(vars(any_of(c("estimate", "conf.low", "conf.high"))), exp)
}
df_tidy
}
# build model
mod <- MASS::glm.nb(response ~ age, gtsummary::trial)
# summarize model results
tbl <-
tbl_regression(
mod,
exponentiate = TRUE,
tidy_fun = my_negbin_tidy
)
Created on 2021-04-12 by the reprex package (v2.0.0)

Broom::tidy function not working for glm2 object?

I'm trying to 'tidy' up a binary regression (so using a log link not a logit link -> so I get RR estimates not OR) using the broom function 'tidy' on a 'glm2' object. However its giving me an error saying
> tidy(model, conf.int=TRUE, exponentiate=TRUE)
Error: no valid set of coefficients has been found: please supply starting values
Here is a reproducible example of what I mean:
library(tidyverse)
library(glm2)
library(broom)
data(iris)
glimpse(iris)
table(iris$Species)
##create an outcome
df <-iris %>%
mutate(outcome = case_when(Petal.Width>2 ~1,
TRUE ~0))
#fit stardard glm
glm(outcome ~ Sepal.Length+Sepal.Width, data=df,
family = binomial(link="log"))
# -> doesnt converge using a log link due to parameter space issues (common in fitting binary regression).
# go to glm2 to fit the model instead, but need starting values for this:
p0 <- sum(as.numeric(df$outcome))/length(as.numeric(df$outcome))
start.val <- c(log(p0),rep(0,2))
model<-glm2(outcome ~ Sepal.Length+Sepal.Width, data=df,
family = binomial(link="log"),
start = start.val)
##get warnings, but converges
model$converged
##now tidy up and display model
tidy(model, conf.int=TRUE, exponentiate=TRUE)
#error -> wants starting values again? also shows warnings from previous
# (which are now saying model hasnt converged?)
tidy(model, conf.int=TRUE, exponentiate=TRUE, start=start.val)
# doesnt recognise starting values?
Any ideas on how to get tidy to work, or do I just do it manually?

Cluster-Robust Standard Errors in Stargazer

Does anyone know how to get stargazer to display clustered SEs for lm models? (And the corresponding F-test?) If possible, I'd like to follow an approach similar to computing heteroskedasticity-robust SEs with sandwich and popping them into stargazer as in http://jakeruss.com/cheatsheets/stargazer.html#robust-standard-errors-replicating-statas-robust-option.
I'm using lm to get my regression models, and I'm clustering by firm (a factor variable that I'm not including in the regression models). I also have a bunch of NA values, which makes me think multiwayvcov is going to be the best package (see the bottom of landroni's answer here - Double clustered standard errors for panel data - and also https://sites.google.com/site/npgraham1/research/code)? Note that I do not want to use plm.
Edit: I think I found a solution using the multiwayvcov package...
library(lmtest) # load packages
library(multiwayvcov)
data(petersen) # load data
petersen$z <- petersen$y + 0.35 # create new variable
ols1 <- lm(y ~ x, data = petersen) # create models
ols2 <- lm(y ~ x + z, data = petersen)
cl.cov1 <- cluster.vcov(ols1, data$firmid) # cluster-robust SEs for ols1
cl.robust.se.1 <- sqrt(diag(cl.cov1))
cl.wald1 <- waldtest(ols1, vcov = cl.cov1)
cl.cov2 <- cluster.vcov(ols2, data$ticker) # cluster-robust SEs for ols2
cl.robust.se.2 <- sqrt(diag(cl.cov2))
cl.wald2 <- waldtest(ols2, vcov = cl.cov2)
stargazer(ols1, ols2, se=list(cl.robust.se.1, cl.robust.se.2), type = "text") # create table in stargazer
Only downside of this approach is you have to manually re-enter the F-stats from the waldtest() output for each model.
Using the packages lmtest and multiwayvcov causes a lot of unnecessary overhead. The easiest way to compute clustered standard errors in R is the modified summary() function. This function allows you to add an additional parameter, called cluster, to the conventional summary() function. The following post describes how to use this function to compute clustered standard errors in R:
https://economictheoryblog.com/2016/12/13/clustered-standard-errors-in-r/
You can easily the summary function to obtain clustered standard errors and add them to the stargazer output. Based on your example you could simply use the following code:
# estimate models
ols1 <- lm(y ~ x)
# summary with cluster-robust SEs
summary(ols1, cluster="cluster_id")
# create table in stargazer
stargazer(ols1, se=list(coef(summary(ols1,cluster = c("cluster_id")))[, 2]), type = "text")
I would recommend lfe package, which is much more powerful package than lm package. You can easily specify the cluster in the regression model:
ols1 <- felm(y ~ x + z|0|0|firmid, data = petersen)
summary(ols1)
stargazer(OLS1, type="html")
The clustered standard errors will be automatically produced. And stargazer will report the clustered-standard error accordingly.
By the way (allow me to do more marketing), for micro-econometric analysis, felm is highly recommended. You can specify fixed effects and IV easily using felm. The grammar is like:
ols1 <- felm(y ~ x + z|FixedEffect1 + FixedEffect2 | IV | Cluster, data = Data)

How to get the corr(u_i, Xb) for panel data fixed effects regression in R

I am trying to develop a fixed effect regression model for a panel data using the plm package in R. I want to get the correlation between fixed effects and the regressors. Something like the corr(u_i, Xb) that comes in the Stata output.
How to get it in R?
I have tried the following (using the in-built dataset in the plm package):-
data("Grunfeld", package = "plm")
library(plm)
# build the model
gi <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
# extract the fixed effects fixef(gi)
summary(fixef(gi))
fixefs <- fixef(gi)[index(gi, which = "id")] ## get the fixed effects
newdata <- as.data.frame(cbind(fixefs, Grunfeld$value, Grunfeld$capital))
colnames(newdata) <- c("fixed_effects", "value", "capital")
cor(newdata)
EDIT: I asked this question on cross validated first and I got this reply- "Questions that are solely about programming or carrying out an operation within a statistical package are off-topic for this site and may be closed." Since my question has more to do with a operation in a package, so I guess this is the right place!
How about the following considering functions of plm:
# Run the model
gi <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
# Get the residuals (res) and fixed effects (fix)
res = residuals(gi)
fix = fixef(gi)
# Aggregate residuals and fixed effects
newdata = cbind(res, fix)
# Correlation
cor(newdata)
res fix
res 1.00000000 0.05171279
fix 0.05171279 1.00000000

Resources