I was wondering what the best way is to calculate and present standardized coefficients using fixest. Here is what I tried using easystats:
library(parameters)
library(effectsize)
library(fixest)
m <- lm(rating ~ complaints, data = attitude)
standardize_parameters(m, method="basic")# works
m <- feols(rating ~ complaints, data = attitude)
standardize_parameters(m, method="basic")# Error in stats::model.frame(model)[[1]] : subscript out of bounds
I also tried the modelsummary approach, but it shows unstandardized coefficients with no error.
library(modelsummary)
library(parameters)
library(effectsize)
library(fixest)
m <- lm(rating ~ complaints, data = attitude)
modelsummary(m, standardize="refit") # works, coeffs are different
m <- feols(rating ~ complaints, data = attitude)
modelsummary(m, standardize="refit")# doesn't work, coeffs are the same
Any insight or advice on how to elegantly and easily pull standardized coefficients out of fixest estimation results would be greatly appreciated. My goal is to replicate the venerable, easy-to-use listcoef package in Stata. Many thanks to the authors of the packages mentioned in this post!
Edit:
> packageVersion("modelsummary")
[1] ‘1.1.0.9000’
One potential solution is to calculate the standardized coefficients manually, as [detailed here][1]. As an example, below I compute the standard deviations of your predictor and outcome, then use them to rescale the raw slope into the standardized beta coefficient of the only predictor in your model.
#### Standard Deviations of Predictor and Outcome ####
scale.x <- sd(attitude$complaints)
scale.y <- sd(attitude$rating)
#### Obtain Standardized Coefficients ####
sb <- coef(m)[2] * scale.x / scale.y
sb
Which gives you this (you can ignore the column name, as it is just borrowing it from the original coef vector):
complaints
0.8254176
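An alternative that avoids the manual rescaling is to z-score the variables first and fit the model on the scaled data, so that feols() reports standardized coefficients directly. This is only a sketch: `attitude_z` and `m_z` are names made up for this example, and for models with fixed effects, standardizing before estimation may not match what other tools report.

```r
library(fixest)

# Variables to standardize (outcome and predictor from the question)
vars <- c("rating", "complaints")

# Copy the data and replace each variable with its z-score
attitude_z <- attitude
attitude_z[vars] <- lapply(attitude[vars], function(v) as.numeric(scale(v)))

# Fit the same model on the standardized data
m_z <- feols(rating ~ complaints, data = attitude_z)

# The slope is now the standardized coefficient
coef(m_z)["complaints"]
```

For this single-predictor model the slope is the same as the manually computed value, since both equal the correlation between rating and complaints.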
[1]: https://www.sciencedirect.com/topics/mathematics/standardized-regression-coefficient#:~:text=The%20standardized%20regression%20coefficient%2C%20found,one%20of%20its%20standardized%20units%20(
Related
I am using the mice package and lmer from lme4 for my analyses. However, pool.r.squared() won't work on this output. I am looking for suggestions on how to include the computation of the adjusted R squared in the following workflow.
library(lme4)
library(mice)
imp <- mice(nhanes)
imp2 <- mice::complete(imp, "all") # This step is necessary in my analyses to include other variables/covariates following the multiple imputation
fit <- lapply(imp2, lme4::lmer,
formula = bmi ~ (1|age) + hyp + chl,
REML = T)
est <- pool(fit)
summary(est)
You have two separate problems here.
First, there are several opinions about what an R-squared for multilevel/mixed-model regressions actually is. This is the reason why pool.r.squared does not work for you, as it does not accept results from anything other than lm(). I cannot tell you which R-squared-like measure is right for your model, and since that is a statistics question rather than a programming one, I am not going into detail. However, a quick search indicates that for some kinds of multilevel R-squared, there are functions available for R, e.g. mitml::multilevelR2.
Second, in order to pool a statistic across imputation samples, it should be normally distributed. Therefore, you have to transform R-squared into Fisher's Z and back-transform it after pooling. See https://stefvanbuuren.name/fimd/sec-pooling.html
In the following I assume that you have a way (or several options) to calculate your (adjusted) R-squared. Assuming that you use mitml::multilevelR2 and choose the method by LaHuis et al. (2014), you can compute and pool it across your imputations with the following steps:
# what you did before:
imp <- mice::mice(nhanes)
imp2 <- mice::complete(imp, "all")
fit_l <- lapply(imp2, lme4::lmer,
formula = bmi ~ (1|age) + hyp + chl,
REML = T)
# get your R-squareds in a vector (replace `mitml::multilevelR2` with your preferred function for this)
Rsq <- lapply(fit_l, mitml::multilevelR2, print="MVP")
Rsq <- as.double(Rsq)
# convert the R-squareds into Fisher's Z-scores
Zrsq <- 1/2*log( (1+sqrt(Rsq)) / (1-sqrt(Rsq)) )
# get the variance of Fisher's Z (same for all imputation samples)
Var_z <- 1 / (nrow(imp2$`1`)-3)
Var_z <- rep(Var_z, imp$m)
# pool the Zs
Z_pool <- pool.scalar(Zrsq, Var_z, n=nrow(imp$data))$qbar # n is the sample size; a mids object has no element `n`
# back-transform pooled Z to Rsquared
Rsq_pool <- ( (exp(2*Z_pool) - 1) / (exp(2*Z_pool) + 1) )^2
Rsq_pool #done
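The transform and back-transform used above can also be written with base R's atanh() and tanh(), which may be easier to read. The sketch below uses made-up R-squared values purely for illustration; `r2_to_z` and `z_to_r2` are hypothetical helper names, and the plain mean stands in for the pooled estimate (`qbar`) that pool.scalar() returns.

```r
# Fisher's Z of sqrt(R^2): identical to 1/2 * log((1 + r) / (1 - r))
r2_to_z <- function(r2) atanh(sqrt(r2))

# Back-transform a (pooled) Z to the R^2 scale
z_to_r2 <- function(z) tanh(z)^2

# Made-up R-squared values from three imputed data sets (illustration only)
rsq <- c(0.40, 0.45, 0.42)

z <- r2_to_z(rsq)

# The pooled point estimate is the average of the Z values
rsq_pooled <- z_to_r2(mean(z))
rsq_pooled
```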
I'm trying to use bootcov to get clustered standard errors for a regression analysis on panel data. In the analysis, I'm including the cluster variable as a fixed effect to address cluster-level confounding. However, including the cluster variable as a fixed effect causes bootcov to throw an error ("Warning message:...fit failure in 200 resamples. Might try increasing maxit"). I imagine this is because the coefficient matrix varies over bootstrap replications depending on which clusters are selected (here's a similar issue and solution in Stata).
Does anyone know a way around this problem? If not, I can try to manually edit the function myself. Unfortunately, I can't use the cluster option in robcov because my analysis actually requires the Glm function rather than the ols function. Furthermore, I want to stick with the rms package because my analysis involves restricted cubic splines, which rms makes easy to visualize, and test via ANOVA (although I'm open to other suggestions).
Thanks for the help. I copied an example below.
#load package
library(rms)
#make df
x <- rnorm(1000)
y <- sample(c(1:100),1000, replace=TRUE)
z <- factor(rep(1:50, 20))
df <- data.frame(y,x,z)
#set datadist
dd <- datadist(df)
options(datadist='dd')
#works when cluster variable isn't included as fixed effect in regression
reg <- ols(x ~ y, df, x=TRUE, y=TRUE)
reg_clus <- bootcov(reg, df$z)
summary(reg_clus)
#doesn't work when cluster variable included as fixed effect in regression
reg2 <- ols(x ~ y + z, df, x=TRUE, y=TRUE)
reg_clus2 <- bootcov(reg2, df$z)
summary(reg_clus2)
Does anyone know how to get stargazer to display clustered SEs for lm models? (And the corresponding F-test?) If possible, I'd like to follow an approach similar to computing heteroskedasticity-robust SEs with sandwich and popping them into stargazer as in http://jakeruss.com/cheatsheets/stargazer.html#robust-standard-errors-replicating-statas-robust-option.
I'm using lm to get my regression models, and I'm clustering by firm (a factor variable that I'm not including in the regression models). I also have a bunch of NA values, which makes me think multiwayvcov is going to be the best package (see the bottom of landroni's answer here - Double clustered standard errors for panel data - and also https://sites.google.com/site/npgraham1/research/code)? Note that I do not want to use plm.
Edit: I think I found a solution using the multiwayvcov package...
library(lmtest) # load packages
library(multiwayvcov)
library(stargazer)
data(petersen) # load data
petersen$z <- petersen$y + 0.35 # create new variable
ols1 <- lm(y ~ x, data = petersen) # create models
ols2 <- lm(y ~ x + z, data = petersen)
cl.cov1 <- cluster.vcov(ols1, petersen$firmid) # cluster-robust SEs for ols1
cl.robust.se.1 <- sqrt(diag(cl.cov1))
cl.wald1 <- waldtest(ols1, vcov = cl.cov1)
cl.cov2 <- cluster.vcov(ols2, petersen$firmid) # cluster-robust SEs for ols2 (petersen has firmid, not ticker)
cl.robust.se.2 <- sqrt(diag(cl.cov2))
cl.wald2 <- waldtest(ols2, vcov = cl.cov2)
stargazer(ols1, ols2, se=list(cl.robust.se.1, cl.robust.se.2), type = "text") # create table in stargazer
The only downside of this approach is that you have to manually re-enter the F-stats from the waldtest() output for each model.
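One way to avoid retyping the F-statistic might be to pull it from the waldtest() result and pass it to stargazer as an extra row. The following is a sketch under that assumption, shown for a single model; `f_stat` is a name introduced here, and the row label is arbitrary.

```r
library(lmtest)
library(multiwayvcov)
library(stargazer)

data(petersen)
ols1 <- lm(y ~ x, data = petersen)

# Cluster-robust covariance matrix and the matching Wald F-test
cl.cov1  <- cluster.vcov(ols1, petersen$firmid)
cl.wald1 <- waldtest(ols1, vcov = cl.cov1)

# waldtest() returns an anova-style table; the statistic sits in row 2
f_stat <- cl.wald1$F[2]

# Drop the default (non-robust) F row and add the cluster-robust one
stargazer(ols1,
          se = list(sqrt(diag(cl.cov1))),
          omit.stat = "f",
          add.lines = list(c("Cluster-robust F", sprintf("%.2f", f_stat))),
          type = "text")
```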
Using the packages lmtest and multiwayvcov causes a lot of unnecessary overhead. The easiest way to compute clustered standard errors in R is the modified summary() function. This function allows you to add an additional parameter, called cluster, to the conventional summary() function. The following post describes how to use this function to compute clustered standard errors in R:
https://economictheoryblog.com/2016/12/13/clustered-standard-errors-in-r/
You can easily use the summary function to obtain clustered standard errors and add them to the stargazer output. Based on your example you could simply use the following code:
# estimate models
ols1 <- lm(y ~ x)
# summary with cluster-robust SEs
summary(ols1, cluster="cluster_id")
# create table in stargazer
stargazer(ols1, se=list(coef(summary(ols1,cluster = c("cluster_id")))[, 2]), type = "text")
I would recommend the lfe package, whose felm() function is much more powerful than lm(). You can easily specify the cluster in the regression model:
library(lfe)
library(stargazer)
ols1 <- felm(y ~ x + z|0|0|firmid, data = petersen)
summary(ols1)
stargazer(ols1, type="html")
The clustered standard errors are produced automatically, and stargazer reports them accordingly.
By the way (allow me to do more marketing), for micro-econometric analysis, felm is highly recommended. You can specify fixed effects and IV easily using felm. The syntax looks like this:
ols1 <- felm(y ~ x + z|FixedEffect1 + FixedEffect2 | IV | Cluster, data = Data)
I can extract the p-values for my slope & intercept from an ols object this way:
library(rms)
m1 <- ols(wt ~ cyl, data= mtcars, x= TRUE, y= TRUE)
coef(summary.lm(m1))
But when I try the same thing with a robcov object, summary.lm gives me the p-values from the original model (m1), not the robcov model:
m2 <- robcov(m1)
m2
coef(summary.lm(m2))
I think this must be related to the Warning from the robcov help page,
Warnings
Adjusted ols fits do not have the corrected standard errors printed
with print.ols. Use sqrt(diag(adjfit$var)) to get this, where adjfit
is the result of robcov.
but I'm not sure how.
Is there a way to extract the p-values from a robcov object? (I'm really only interested in the one for the slope, if that makes a difference...)
Hacking through print.ols and prModFit, I came up with this.
errordf <- m2$df.residual
beta <- m2$coefficients
se <- sqrt(diag(m2$var))
Z <- beta/se
P <- 2 * (1 - pt(abs(Z), errordf))
To use this with a different robcov fit, replace m2 with that object.
You can check the result by comparing P with the p-values shown by print(m2).
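The same steps can be wrapped in a small helper so they are easy to reuse across fits. This is a sketch only: `robust_coef_table` is a hypothetical name, and it assumes the fit stores `coefficients`, `var` and `df.residual` as the robcov object does in the code above.

```r
library(rms)

# Build a coefficient table from the adjusted covariance matrix in fit$var
robust_coef_table <- function(fit) {
  beta <- fit$coefficients
  se   <- sqrt(diag(fit$var))
  tval <- beta / se
  # Two-sided p-value from the t distribution with the residual df
  pval <- 2 * pt(abs(tval), df = fit$df.residual, lower.tail = FALSE)
  cbind(coef = beta, se = se, t = tval, p = pval)
}

m1 <- ols(wt ~ cyl, data = mtcars, x = TRUE, y = TRUE)
m2 <- robcov(m1)

tab <- robust_coef_table(m2)
tab["cyl", "p"]  # p-value for the slope only
```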
I am using NeweyWest standard errors to correct my lm() / dynlm() output. E.g.:
fit1<-dynlm(depvar~covariate1+covariate2)
coeftest(fit1,vcov=NeweyWest)
Coefficients are displayed the way I'd like them to be, but unfortunately I lose all the regression output information like R-squared, F-test etc. that is displayed by summary. So I wonder how I can display robust SEs and all the other information in the same summary output.
Is there a way to either get everything in one call or to overwrite the 'old' estimates?
I bet I just missed something badly, but that is really relevant when sweaving the output.
Test example, taken from ?dynlm.
require(dynlm)
require(sandwich)
require(lmtest) # provides coeftest()
data("UKDriverDeaths", package = "datasets")
uk <- log10(UKDriverDeaths)
dfm <- dynlm(uk ~ L(uk, 1) + L(uk, 12))
#shows R-squared, etc.
summary(dfm)
#no such information
coeftest(dfm, vcov = NeweyWest)
By the way, the same applies to vcovHC.
coefficients is just a matrix in the lm (or dynlm) summary object, so all you need to do is unclass the coeftest() output.
library(dynlm)
library(sandwich)
library(lmtest)
temp.lm <- dynlm(runif(100) ~ rnorm(100))
temp.summ <- summary(temp.lm)
temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest))
If you specify the covariance matrix, the F-statistic changes and you need to compute it again using waldtest(), right? Because
temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest))
only overwrites the coefficients.
The F-statistic changes, but R^2 remains the same.
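As a rough illustration of why the F-statistic depends on the covariance matrix, a Wald-type F for all slopes can be computed by hand from the coefficients and the Newey-West covariance. This is a sketch, not necessarily identical to what waldtest() prints; `F_robust` and `p_robust` are names introduced here.

```r
library(dynlm)
library(sandwich)

data("UKDriverDeaths", package = "datasets")
uk  <- log10(UKDriverDeaths)
dfm <- dynlm(uk ~ L(uk, 1) + L(uk, 12))

b <- coef(dfm)
V <- NeweyWest(dfm)  # robust covariance matrix of the coefficients

# Test all slopes jointly (everything except the intercept)
keep <- names(b) != "(Intercept)"
q    <- sum(keep)

# Wald statistic b' V^{-1} b, divided by the number of restrictions
F_robust <- drop(b[keep] %*% solve(V[keep, keep], b[keep])) / q
p_robust <- pf(F_robust, q, df.residual(dfm), lower.tail = FALSE)

c(F = F_robust, p = p_robust)
```

R-squared does not involve the covariance matrix at all, which is why it is unaffected.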