MICE package: getting standardized betas from lm in R

I have a dataset with missing values. I imputed it using the mice package and ran my linear model using lm and pool for the results. I only get unstandardized beta weights. Is there a way to get standardized beta weights?

There are (at least) two ways I know of to do this:
1) First method:
You need to scale your data first. Assuming you have already imputed your data, you can then do the following.
A toy example:
mtcars1 <- mtcars[,c("mpg", "disp", "hp", "wt", "qsec", "drat")]
mtcars_scaled <- data.frame(sapply(mtcars1, scale))  ## scale each column for standardization
model_fit_st <- lm(mpg ~ disp + wt + drat, data=mtcars_scaled)
Here model_fit_st gives you the standardized results. It does still include an intercept (which is a bit odd for standardized betas; the reason is that lm estimates one by default), but apart from that, the coefficient values match those of the QuantPsyc::lm.beta function.
2) Second method:
Here QuantPsyc::lm.beta can be used, once you install the QuantPsyc package, to generate standardized betas directly:
QuantPsyc::lm.beta(lm(mpg ~ disp + wt + drat, data=mtcars))
Of course, apart from the intercept (an intercept has no meaning for standardized betas), the two results (via scaling and via QuantPsyc) match here.
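Since the question is specifically about mice, here is a minimal sketch of how the scaling approach carries over to the pooled workflow. This is my own addition, not tested on your data: it uses airquality as example data and mice::as.mira to turn the list of fits into a poolable object.
library(mice)
imp <- mice(airquality, m = 5, printFlag = FALSE)
## Scale each completed dataset, refit the model on the scaled data, then pool as usual
fits <- lapply(1:imp$m, function(i) {
  d <- data.frame(sapply(complete(imp, i), scale))
  lm(Ozone ~ Solar.R + Wind + Temp, data = d)
})
summary(pool(as.mira(fits)))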

Related

fixest::feols and ggeffects::ggeffect not working together in R

I'm having a hard time getting a fixest object to play nicely with ggeffects in R, when fixed effects are included.
When I run the following code:
library(fixest)
library(ggeffects)
m <- feols(mpg ~ disp + gear + hp | cyl, mtcars,
           cluster = c("am", "cyl"))
summary(m)
marg1 <- ggeffect(m, terms = c("disp"))
I get an error reading:
Can't compute marginal effects, 'effects::Effect()' returned an error.
Reason: non-conformable arguments
You may try 'ggpredict()' or 'ggemmeans()'.
However, there are no problems when I remove the fixed-effects term, or include it without using the | syntax:
m <- feols(mpg ~ disp + gear + hp + cyl, mtcars,
           cluster = c("am", "cyl"))
summary(m)
marg1 <- ggeffect(m, terms = c("disp"))
ggpredict also returns an error on my data (Could not compute variance-covariance matrix of predictions. No confidence intervals are returned.) but I am unable to replicate that same error using the toy data.

How do I combine fitted models on imputed data into a usable model for new predictions?

I'm performing predictive analysis where I train a model to a portion of my data and test the model with the remaining portion. I'm familiar with the MICE package and the imputation procedure using predictive mean matching.
My understanding is that the proper way to utilize imputation is to create numerous imputed data sets, fit a model to each of those imputed data sets, then combine the coefficients across all of those fitted models into one single model. I know how to do this and view the summary of the coefficients with which I can perform inference on the variables. However, that is not my objective; I need to end up with a single model that I can use to predict new values.
Simply put, when I try to use the predict function with the model I get from mice, it doesn't work.
Any suggestions? I am coding this in R.
Edit: using the airquality data set as an example, my code looks like this:
imputed_data <- mice(airquality, method = c(rep("pmm", 6)), m = 5, maxit = 5)
model <- with(imputed_data, lm(Ozone ~ Solar.R + Wind + Temp + Month + Day))
pooled_model <- pool(model)
This gives me a pooled model across my 5 imputed data sets. However, I am unable to use the predict function with this model. When I then execute:
predict(pooled_model, newdata = airquality)
I get this error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('mira', 'matrix')"
Not sure exactly what you're looking for, but something like this might work:
library(mice)
library(mitools)
data(mtcars)
mtcars$qsec[c(4, 6, 8, 21)] <- NA
imps <- mice(mtcars, m = 10)
comps <- lapply(1:imps$m, function(i) complete(imps, i))
mods <- lapply(comps, function(x) lm(qsec ~ hp + drat + wt, data = x))
pmod <- MIcombine(mods)
pmod$coefficients
#> (Intercept) hp drat wt
#> 18.15389098 -0.02570887 0.11434023 0.92348390
newvals <- data.frame(hp=300, drat=4, wt=2.58)
X <- model.matrix(~hp + drat + wt, data=newvals)
preds <- X %*% pmod$coefficients
preds
#> [,1]
#> 1 13.28118
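If you would rather stay within mice itself, here is a similar sketch applied to the airquality example above. One hedge: I am assuming the term and estimate columns returned by summary() on a pooled object, which is the behaviour of recent mice versions.
pooled <- pool(model)  ## 'model' is the object from the with() call above
est <- summary(pooled)$estimate  ## pooled coefficients
names(est) <- summary(pooled)$term
X <- model.matrix(~ Solar.R + Wind + Temp + Month + Day, data = airquality)
preds <- X %*% est
head(preds)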

make R report adjusted R squared and F-test in output with robust standard errors

I have estimated a linear regression model using lm(x ~ y1 + y2 + ... + yn), and to counter the heteroscedasticity present I had R estimate robust standard errors with
coeftest(model, vcov = vcovHC(model, type = "HC0"))
I know that the (robust) R squared and F statistic from the "normal" model are still valid, but how do I get R to report them in the output? I want to combine several regression outputs from different specifications with stargazer, and it would become very chaotic if I had to include the non-robust model alongside just to get these statistics. Ideally I want to pass stargazer a regression output that already contains these statistics, importing them into its framework.
Thanks in advance for all answers
I don't have a solution with stargazer, but I do have a couple of viable alternatives for regression tables with robust standard errors:
Option 1
Use the modelsummary package to make your tables.
It has a statistic_override argument which allows you to supply a function that calculates a robust variance-covariance matrix (e.g., sandwich::vcovHC).
library(modelsummary)
library(sandwich)
mod1 <- lm(drat ~ mpg, mtcars)
mod2 <- lm(drat ~ mpg + vs, mtcars)
mod3 <- lm(drat ~ mpg + vs + hp, mtcars)
models <- list(mod1, mod2, mod3)
modelsummary(models, statistic_override = vcovHC)
Note 1: The output here is an HTML table, but the modelsummary package can also save Word, LaTeX or markdown tables (see the sketch after these notes).
Note 2: I am the author of this package, so please treat this as a potentially biased view.
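As a quick illustration of the export point (the file name here is just an example I chose), modelsummary infers the output format from the file extension:
modelsummary(models, statistic_override = vcovHC, output = "robust_models.tex")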
Option 2
Use the estimatr::lm_robust function, which automatically includes robust standard errors. I believe that estimatr is supported by stargazer, but I know that it is supported by modelsummary.
library(estimatr)
mod1 <- lm_robust(drat ~ mpg, mtcars)
mod2 <- lm_robust(drat ~ mpg + vs, mtcars)
mod3 <- lm_robust(drat ~ mpg + vs + hp, mtcars)
models <- list(mod1, mod2, mod3)
modelsummary(models)
This is how to go about it in stargazer itself. You need to use a model object that stargazer supports as a template, and then you can provide a list of standard errors to be used:
library(dplyr)
library(lmtest)
library(sandwich)  # provides vcovHC
library(stargazer)
# Basic Model ---------------------------------------------------------------------------------
model1 <- lm(hp ~ factor(gear) + qsec + cyl + factor(am), data = mtcars)
summary(model1)
# Robust standard Errors ----------------------------------------------------------------------
model_robust <- coeftest(model1, vcov = vcovHC(model1, type = "HC0"))
# Get robust standard errors (sqrt of the diagonal of the variance-covariance matrix)
se <- vcovHC(model1, type = "HC0") %>% diag() %>% sqrt()
stargazer(model1, model1,
          se = list(NULL, se), type = 'text')
Using this approach you can use stargazer even for model objects that are not supported: you only need the coefficients, standard errors and p-values as vectors, and you can then 'mechanically insert' even unsupported models.
One last note: you are correct that R squared can still be used when heteroskedasticity is present. However, the overall F-test as well as the t-tests are NOT valid anymore.
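To complement that last point, a heteroskedasticity-robust overall test can be computed with a Wald test that plugs in the robust variance-covariance matrix. A minimal sketch (my addition, using lmtest::waldtest against an intercept-only comparison):
library(lmtest)
library(sandwich)
model1 <- lm(hp ~ factor(gear) + qsec + cyl + factor(am), data = mtcars)
## Compare the full model against the intercept-only model using the HC0 robust vcov
waldtest(model1, . ~ 1, vcov = vcovHC(model1, type = "HC0"), test = "F")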

How to construct columns for interaction terms in a data frame

I have a data frame with 6 columns,
dat <- data.frame(x1, x2, x3, x4, x5, x6)
Right now, I need to build two extra columns for the interaction terms x1*x2 and x3*x4*x5. How do I do that in R? And are there any special considerations when some of the variables, such as x2, are categorical?
I guess the function model.matrix does exactly what you want.
For instance, you can fit a linear model including the variables and interaction terms you're interested in, and then extract the model matrix from that fitted object:
model.matrix(lm(drat ~ mpg * cyl + disp * hp * wt, data = mtcars))
Factors need to be explicitly coded as factors; see the example below:
mtcars$cyl <- factor(mtcars$cyl)
model.matrix(lm(drat ~ mpg * cyl + disp * hp * wt, data = mtcars))
The default contrasts used for factors are treatment coding. You can easily change this to sum coding (or other codings; see ?contr.sum) with the command below:
contrasts(mtcars$cyl) <- contr.sum
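To actually attach the interaction columns to a data frame, you can pull them out of the model matrix and bind them on. A sketch (my addition, using mtcars in place of your dat; note that after recoding cyl as a factor above, a single interaction term expands into several columns):
mm <- model.matrix(~ mpg:cyl + disp:hp:wt, data = mtcars)
## Drop the intercept column and bind the interaction columns onto the data;
## names such as "disp:hp:wt" need backticks if referenced later
mtcars2 <- cbind(mtcars, mm[, -1, drop = FALSE])
head(mtcars2)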

Plot each predictor variable from multivariate GLM versus response (other predictors held constant)

I can plot one predictor variable (from a multivariate logistic, binomial GLM) versus the predicted response. I do it like this:
library(ggplot2)
m3 <- mtcars  # example with mtcars
model <- glm(vs ~ cyl + mpg + wt + disp + drat, family = binomial, data = m3)
newdata <- m3
newdata$cyl <- mean(m3$cyl)
newdata$mpg <- mean(m3$mpg)
newdata$wt <- mean(m3$wt)
newdata$disp <- mean(m3$disp)
newdata$drat <- m3$drat
newdata$vs <- predict(model, newdata = newdata, type = "response")
ggplot(newdata, aes(x = drat, y = vs)) + geom_line()
Above: drat vs. vs with all other predictors held constant. However, I would like to do this for each of the predictor variables, and repeating the above process each time seems tedious. Is there a smarter way to do this? I'd like to visualize the response against each of the predictors and eventually, perhaps, at different constants.
Check the response.plot2 function in the biomod2 package. It was developed to create response curves for species distribution models, but it essentially does what you need: it generates a multi-panel plot with the response for each variable used in your model. It also returns the underlying data in a structure that you can use to plot however you like.
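If you would rather stay with base R and ggplot2, here is a minimal sketch of the looped version of the approach in the question (predictor names taken from the example above; .data is ggplot2's pronoun for referring to columns programmatically):
library(ggplot2)
m3 <- mtcars
model <- glm(vs ~ cyl + mpg + wt + disp + drat, family = binomial, data = m3)
preds <- c("cyl", "mpg", "wt", "disp", "drat")
plots <- lapply(preds, function(v) {
  ## Hold every predictor at its mean, then let v vary over its observed values
  newdata <- as.data.frame(lapply(m3[preds], function(x) rep(mean(x), nrow(m3))))
  newdata[[v]] <- m3[[v]]
  newdata$vs <- predict(model, newdata = newdata, type = "response")
  ggplot(newdata, aes(x = .data[[v]], y = vs)) + geom_line() + xlab(v)
})
plots[[5]]  ## the drat curve from the original example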
