Regression table with clustered standard errors in R jupyter notebook?

Regression table with clustered standard errors in R jupyter notebook? - r

I'm using export_summs in R to make a regression table, but when I use coeftest to get clustered standard errors, the table no longer reports N or R^2 properly in those columns. The coefficients and standard errors look good, just missing those additional stats. (I'm used to outreg2 in Stata which is much simpler.)
I tried using tidy_override() as suggested in the last example here (https://hughjonesd.github.io/huxtable/huxreg.pdf), no change.
# Reproducible example
datareg <- NULL
datareg$y <- rnorm(1000)
datareg$x <- rnorm(1000)
datareg$cluster_var <- rnorm(1000)
datareg <- data.frame(datareg)
reg0 <- lm(y ~ x
, data = datareg)
reg1 <- coeftest(
lm(y ~ x
, data = datareg)
, vcovCL, cluster = datareg$cluster_var)
export_summs(reg0, reg1,
model.names = c("Basic", "Cluster SE"))
Issues warning and output:

This is a case where the error message is fairly clear: the broom package does not have a glance method for coeftest objects. This is not an accident--the nature of the coeftest object does not allow for broom to calculate model summary statistics. It retains very little information about the original model:
> str(reg1)
'coeftest' num [1:2, 1:4] 0.0483 0.0153 0.0329 0.0341 1.4668 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "(Intercept)" "x"
..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
- attr(*, "method")= chr "t test of coefficients"
- attr(*, "df")= int 998
One option is to use the lm_robust function from the estimatr package. It returns objects with robust standard errors that are amenable to both glance and tidy:
reg2 <- estimatr::lm_robust(y ~ x
, data = datareg)
export_summs(reg0, reg2,
model.names = c("Basic", "Cluster SE"), number_format = NA )
──────────────────────────────────────────────────────────────────
Basic Cluster SE
────────────────────────────────────────────────────
(Intercept) 0.0482678107925753 0.0482678107925755
(0.032842483472098) (0.0329070612421128)
x 0.0152928320138191 0.015292832013819
(0.0333488383365212) (0.034094868727288)
────────────────────────────────────────────────────
N 1000 1000
R2 0.000210664993144995 0.000210665
──────────────────────────────────────────────────────────────────
*** p < 0.001; ** p < 0.01; * p < 0.05.
Column names: names, Basic, Cluster SE

Huxtable author here. This is how to do it with tidy_override:
library(generics)
library(huxtable)
library(jtools)
library(lmtest)
library(sandwich)
datareg <- NULL
datareg$y <- rnorm(1000)
datareg$x <- rnorm(1000)
datareg$cluster_var <- rnorm(1000)
datareg <- data.frame(datareg)
reg0 <- lm(y ~ x, data = datareg)
reg1 <- coeftest(reg0, vcovCL, cluster = datareg$cluster_var)
reg1 <- tidy_override(reg1, glance = list(nobs = 1000L, r.squared = 0.000),
extend = TRUE) # extend = TRUE is important
export_summs(reg0, reg1, model.names = c("Basic", "Cluster SE"))
Which gives:
────────────────────────────────────────────────────
Basic Cluster SE
───────────────────────────────────
(Intercept) -0.01 -0.01
(0.03) (0.03)
x -0.05 -0.05
(0.03) (0.03)
───────────────────────────────────
N 1000 1000
R2 0.00 0.00
────────────────────────────────────────────────────
*** p < 0.001; ** p < 0.01; * p < 0.05.
Column names: names, Basic, Cluster SE
This was fairly tricky and I appreciate your difficulties... I have improved the error reporting in huxreg as a result!

Related

GLMM Random Intercept estimators in lme4

How do you get the random intercept effects estimators from a lme4 result object?
set.seed(247)
# Create Data
n=1000
x = runif(n)
id = rep(NA,n)
for (i in 1:10) {
id_s = (i-1)*100+1
id_e = i*100
id[id_s:id_e] = i
}
effects = rnorm(10)
lp = -0.5+0.5*x + effects[id]
probs = exp(lp)/(1+exp(lp))
Y2 = rbinom(n, 1, probs)
library(lme4)
fit_glmm2 = glmer(Y2 ~ x + (1|id), family = "binomial",control = glmerControl(calc.derivs = FALSE))
I thought maybe they are the u's but there's a slight difference between them:
yy = coef(fit_glmm2) # looking only at the intercept
fit_glmm2#u + fit_glmm2#beta[1]

If you want the random effects, ranef() is the best way to get them:
r <- ranef(fit_glmm2)
str(r)
## List of 1
## $ id:'data.frame': 10 obs. of 1 variable:
## ..$ (Intercept): num [1:10] -0.693 0.297 0.54 -0.467 0.755 ...
## ..- attr(*, "postVar")= num [1, 1, 1:10] 0.0463 0.0385 0.0392 0.0436 0.0409 ...
## - attr(*, "class")= chr "ranef.mer"
raw <- unname(unlist(ranef(fit_glmm2)$id))
identical(raw, fit_glmm2#u*fit_glmm2#theta) ## TRUE
As described in vignette("lmer", package = "lme4"), the #u values are the spherical random effects, i.e. they're iid N(0,1) and need to be transformed to get to the random effects b used in the formula X %*% beta + Z %*% b. In this case (an intercept-only RE), theta corresponds to the standard deviation of the random effect. u*theta won't work for more complicated cases ... in this case you need getME(fit_glmm2, "Lambda") %*% getME(fit_glmm2, "u").
getME(., "b") will also work, but again for more complex models you'll have to work out how the b-vector is split into random intercepts, slopes, different RE terms, etc..

Turns out you can get them by multiplying the u parameter with the theta parameter, or by calling getME(.,"b"):
yy = coef(fit_glmm2) # looking only at the intercept
fit_glmm2#u*fit_glmm2#theta + fit_glmm2#beta[1] # or
# getME(fit_glmm2,"b") + fit_glmm2#beta[1]

Clustered standard errors, stars, and summary statistics in modelsummary for multinom models

I want to create a regression table with modelsummary (amazing package!!!) for multinomial logistic models run with nnet::multinom that includes clustered standard errors, as well as corresponding "significance" stars and summary statistics.
Unfortunately, I cannot do this automatically with the vcov parameter within modelsummary because the sandwich package that modelsummary uses does not support nnet objects.
I was able to calculate robust standard errors with a customized function originally developed by Daina Chiba and modified by Davenport, Soule, Armstrong (available from: https://journals.sagepub.com/doi/suppl/10.1177/0003122410395370/suppl_file/Davenport_online_supplement.pdf).
I was also able to include these standard errors in the modelsummary table instead of the original ones. Yet, neither the "significance" stars nor the model summary statistics adapt to these new standard errors. I think this is because they are calculated via broom::tidy automatically by modelsummary.
I would be thankful for any advice for how to include stars and summary statistics that correspond to the clustered standard errors and respective p-values.
Another smaller question I have is whether there is any easy way of "spreading" the model statistics (e.g. number of observations or R2) such that they center below all response levels of the dependent variable and not just the first level. I am thinking about a multicolumn solution in Latex.
Here is some example code that includes how I calculate the standard errors. (Note, that the calculated clustered SEs are extremely small because they don't make sense with the example mtcars data. The only take-away is that the respective stars should correspond to the new SEs, and they don't).
# load data
dat_multinom <- mtcars
dat_multinom$cyl <- sprintf("Cyl: %s", dat_multinom$cyl)
# run multinomial logit model
mod <- nnet::multinom(cyl ~ mpg + wt + hp, data = dat_multinom, trace = FALSE)
# function to calculate clustered standard errors
mlogit.clust <- function(model,data,variable) {
beta <- c(t(coef(model)))
vcov <- vcov(model)
k <- length(beta)
n <- nrow(data)
max_lev <- length(model$lev)
xmat <- model.matrix(model)
# u is deviance residuals times model.matrix
u <- lapply(2:max_lev, function(x)
residuals(model, type = "response")[, x] * xmat)
u <- do.call(cbind, u)
m <- dim(table(data[,variable]))
u.clust <- matrix(NA, nrow = m, ncol = k)
fc <- factor(data[,variable])
for (i in 1:k) {
u.clust[, i] <- tapply(u[, i], fc, sum)
}
cl.vcov <- vcov %*% ((m / (m - 1)) * t(u.clust) %*% (u.clust)) %*% vcov
return(cl.vcov = cl.vcov)
}
# get coefficients, variance, clustered standard errors, and p values
b <- c(t(coef(mod)))
var <- mlogit.clust(mod,dat_multinom,"am")
se <- sqrt(diag(var))
p <- (1-pnorm(abs(b/se))) * 2
# modelsummary table with clustered standard errors and respective p-values
modelsummary(
mod,
statistic = "({round(se,3)}),[{round(p,3)}]",
shape = statistic ~ response,
stars = c('*' = .1, '**' = .05, '***' = .01)
)
# modelsummary table with original standard errors and respective p-values
modelsummary(
models = list(mod),
statistic = "({std.error}),[{p.value}]",
shape = statistic ~ response,
stars = c('*' = .1, '**' = .05, '***' = .01)
)
This code produces the following tables:
Model 1 / Cyl: 6
Model 1 / Cyl: 8
(Intercept)
22.759*
-6.096***
(0.286),[0]
(0.007),[0]
mpg
-38.699
-46.849
(5.169),[0]
(6.101),[0]
wt
23.196
39.327
(3.18),[0]
(4.434),[0]
hp
6.722
7.493
(0.967),[0]
(1.039),[0]
Num.Obs.
32
R2
1.000
R2 Adj.
0.971
AIC
16.0
BIC
27.7
RMSE
0.00
Note:
^^ * p < 0.1, ** p < 0.05, *** p < 0.01
Model 1 / Cyl: 6
Model 1 / Cyl: 8
(Intercept)
22.759*
-6.096***
(11.652),[0.063]
(0.371),[0.000]
mpg
-38.699
-46.849
(279.421),[0.891]
(448.578),[0.918]
wt
23.196
39.327
(210.902),[0.913]
(521.865),[0.941]
hp
6.722
7.493
(55.739),[0.905]
(72.367),[0.918]
Num.Obs.
32
R2
1.000
R2 Adj.
0.971
AIC
16.0
BIC
27.7
RMSE
0.00
Note:
^^ * p < 0.1, ** p < 0.05, *** p < 0.01

This is not super easy at the moment, I just opened a Github issue to track progress. This should be easy to improve, however, so I expect changes to be published in the next release of the package.
In the meantime, you can install the dev version of modelsummary:
library(remotes)
install_github("vincentarelbundock/modelsummary")
Them, you can use the tidy_custom mechanism described here to override standard errors and p values manually:
library(modelsummary)
tidy_custom.multinom <- function(x, ...) {
b <- coef(x)
var <- mlogit.clust(x, dat_multinom, "am")
out <- data.frame(
term = rep(colnames(b), times = nrow(b)),
response = rep(row.names(b), each = ncol(b)),
estimate = c(t(b)),
std.error = sqrt(diag(var))
)
out$p.value <- (1-pnorm(abs(out$estimate / out$std.error))) * 2
row.names(out) <- NULL
return(out)
}
modelsummary(
mod,
output = "markdown",
shape = term ~ model + response,
stars = TRUE)
Model 1 / Cyl: 6
Model 1 / Cyl: 8
(Intercept)
22.759***
-6.096***
(0.286)
(0.007)
mpg
-38.699***
-46.849***
(5.169)
(6.101)
wt
23.196***
39.327***
(3.180)
(4.434)
hp
6.722***
7.493***
(0.967)
(1.039)
Num.Obs.
32
R2
1.000
R2 Adj.
0.971
AIC
16.0
BIC
27.7
RMSE
0.00

Linear Regression Model based on Log-Cosh Loss Function in R

I have read about loss functions theoretically and also how to build regression models based on them in R.
I can apply all of the regression models based on different loss functions in R programming except for Log-Cosh Loss Function.
For example, I would like to build a linear regression model on 5-folds subsets of the DATA, and then extract the coefficients and calculate the individuals and the aggregated variance as follows.
data = read.csv("train.csv") # "critical_temp" is the dependent variable.
data_nom_df=as.data.frame(scale(data))#Normalization
#Cross Validation
set.seed(12345)
k = 5
folds <- createFolds(data_nom_df$critical_temp, k = k, list = TRUE, returnTrain = TRUE)
## Ordinary Least Square regression
#block A
lm = list()
for (i in 1:k) {
lm[[i]] = lm(critical_temp~ .,
data = data_nom_df[folds[[i]],])
}
#block B
lm_coef = list()
lm_coef_var = list()
for(j in 1:(lm[[1]]$coefficients %>% length())){
for(i in 1:k){
lm_coef[[i]] = lm[[i]]$coefficients[j]
lm_coef_var[[j]] = lm_coef %>% unlist() %>% var()
}
}
#block C
lm_var = unlist(lm_coef_var)
lm_df = cbind(coefficients = lm[[1]]$coefficients %>% names() %>% as.data.frame()
, variance = lm_var %>% as.data.frame())
colnames(lm_df) = c("coefficients", "variance_lm")
lm_df
#block D
lm_var_sum = sum(lm_var)
lm_var_sum
The same for the rest of the regression models. However, I do not find any code or package to apply a regression model based on Log-Cosh Loss Function in R.
Could you please guide me to any source that would help me to solve this problem.

This can be done from first principles. Also note the existence of the logcosh function in the limma package which could be used in place of log(cosh(.)) if you have numeric difficulties.
f <- function(b) with(cars, sum(log(cosh(dist - b[1] - b[2] * speed))))
fm0 <- lm(dist ~ speed, cars)
res <- optim(coef(fm0), f, method = "BFGS")
str(res)
## List of 5
## $ par : Named num [1:2] -12.82 3.47
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "speed"
## $ value : num 532
## $ counts : Named int [1:2] 28 10
## ..- attr(*, "names")= chr [1:2] "function" "gradient"
## $ convergence: int 0
## $ message : NULL
Graphics
# the black line is the ordinary least squares regression line and
# the red line is the log cosh regression line
plot(cars)
abline(fm0)
yfit <- res$par[1] + res$par[2] * cars$speed
lines(cars$speed, yfit, col = "red")
ADDED
Note that the optimization can also be written like this which may be useful if you have many independent variables.
fm0 <- lm(dist ~ speed, cars)
X <- model.matrix(fm0)
f <- function(b) with(cars, sum(log(cosh(dist - X %*% b))))
res <- optim(coef(fm0), f, method = "BFGS")
res
giving:
$par
(Intercept) speed
-12.816190 3.469536
$value
[1] 531.5872
$counts
function gradient
28 10
$convergence
[1] 0
$message
NULL

Confidence intervals with clustered standard errors and texreg?

I'm trying to reproduce the 95% CI that Stata produces when you run a model with clustered standard errors. For example:
regress api00 acs_k3 acs_46 full enroll, cluster(dnum)
Regression with robust standard errors Number of obs = 395
F( 4, 36) = 31.18
Prob > F = 0.0000
R-squared = 0.3849
Number of clusters (dnum) = 37 Root MSE = 112.20
------------------------------------------------------------------------------
| Robust
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
acs_k3 | 6.954381 6.901117 1.008 0.320 -7.041734 20.9505
acs_46 | 5.966015 2.531075 2.357 0.024 .8327565 11.09927
full | 4.668221 .7034641 6.636 0.000 3.24153 6.094913
enroll | -.1059909 .0429478 -2.468 0.018 -.1930931 -.0188888
_cons | -5.200407 121.7856 -0.043 0.966 -252.193 241.7922
------------------------------------------------------------------------------
I am able to reproduce the coefficients and the standard errors:
library(readstata13)
library(texreg)
library(sandwich)
library(lmtest)
clustered.se <- function(model_result, data, cluster) {
model_variables <-
intersect(colnames(data), c(colnames(model_result$model), cluster))
model_rows <- rownames(model_result$model)
data <- data[model_rows, model_variables]
cl <- data[[cluster]]
M <- length(unique(cl))
N <- nrow(data)
K <- model_result$rank
dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
uj <-
apply(estfun(model_result), 2, function(x)
tapply(x, cl, sum))
vcovCL <- dfc * sandwich(model_result, meat = crossprod(uj) / N)
standard.errors <- coeftest(model_result, vcov. = vcovCL)[, 2]
p.values <- coeftest(model_result, vcov. = vcovCL)[, 4]
clustered.se <-
list(vcovCL = vcovCL,
standard.errors = standard.errors,
p.values = p.values)
return(clustered.se)
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <-
lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll,
data = elemapi2)
clustered_se <-
clustered.se(model_result = lm1,
data = elemapi2,
cluster = "dnum")
htmlreg(
lm1,
override.se = clustered_se$standard.errors,
override.p = clustered_se$p.value,
star.symbol = "\\*",
digits = 7
)
=============================
Model 1
-----------------------------
(Intercept) -5.2004067
(121.7855938)
acs_k3 6.9543811
(6.9011174)
acs_46 5.9660147 *
(2.5310751)
full 4.6682211 ***
(0.7034641)
enroll -0.1059909 *
(0.0429478)
-----------------------------
R^2 0.3848830
Adj. R^2 0.3785741
Num. obs. 395
RMSE 112.1983218
=============================
*** p < 0.001, ** p < 0.01, * p < 0.05
Alas, I cannot reproduce the 95% confidence Interval:
screenreg(
lm1,
override.se = clustered_se$standard.errors,
override.p = clustered_se$p.value,
digits = 7,
ci.force = TRUE
)
========================================
Model 1
----------------------------------------
(Intercept) -5.2004067
[-243.8957845; 233.4949710]
acs_k3 6.9543811
[ -6.5715605; 20.4803228]
acs_46 5.9660147 *
[ 1.0051987; 10.9268307]
full 4.6682211 *
[ 3.2894567; 6.0469855]
enroll -0.1059909 *
[ -0.1901670; -0.0218148]
----------------------------------------
R^2 0.3848830
Adj. R^2 0.3785741
Num. obs. 395
RMSE 112.1983218
========================================
* 0 outside the confidence interval
If I do it 'by hand', I get the same thing than with texreg:
level <- 0.95
a <- 1-(1 - level)/2
coeff <- lm1$coefficients
se <- clustered_se$standard.errors
lb <- coeff - qnorm(a)*se
ub <- coeff + qnorm(a)*se
> lb
(Intercept) acs_k3 acs_46 full enroll
-243.895784 -6.571560 1.005199 3.289457 -0.190167
> ub
(Intercept) acs_k3 acs_46 full enroll
233.49497100 20.48032276 10.92683074 6.04698550 -0.02181481
What is Stata doing and how can I reproduce it in R?
PS: This is a follow up question.
PS2: The Stata data is available here.

It looks like Stata is using confidence intervals based on t(36) rather than Z (i.e. Normal errors).
Taking the values from the Stata output
coef=6.954381; rse= 6.901117 ; lwr= -7.041734; upr= 20.9505
(upr-coef)/rse
## [1] 2.028095
(lwr-coef)/rse
## [1] -2.028094
Computing/cross-checking the tail values for t(36):
pt(2.028094,36)
## [1] 0.975
qt(0.975,36)
## [1] 2.028094
I don't know how you pass confidence intervals to texreg. Since you haven't given a reproducible example (I don't have elemapi2.dta) I can't say exactly how you would get the df, but it looks like you would want tdf <- length(unique(elemapi2$dnum))-1
level <- 0.95
a <- 1- (1 - level)/2
bounds <- coef(lm1) + c(-1,1)*clustered_se*qt(a,tdf)

Indeed Stata is using the t distribution rather than the normal distribution. There is now a really easy solution to getting confidence intervals that match Stata into texreg using lm_robust from the estimatr package, which you can install from CRAN install.packages(estimatr).
> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, clusters = cyl, se_type = "stata")
> screenreg(lmro)
===========================
Model 1
---------------------------
(Intercept) 30.10 *
[13.48; 46.72]
hp -0.07
[-0.15; 0.01]
---------------------------
R^2 0.60
Adj. R^2 0.59
Num. obs. 32
RMSE 3.86
===========================
* 0 outside the confidence interval

Clustered standard errors with texreg?

I'm trying to reproduce this stata example and move from stargazer to texreg. The data is available here.
To run the regression and get the se I run this code:
library(readstata13)
library(sandwich)
cluster_se <- function(model_result, data, cluster){
model_variables <- intersect(colnames(data), c(colnames(model_result$model), cluster))
model_rows <- as.integer(rownames(model_result$model))
data <- data[model_rows, model_variables]
cl <- data[[cluster]]
M <- length(unique(cl))
N <- nrow(data)
K <- model_result$rank
dfc <- (M/(M-1))*((N-1)/(N-K))
uj <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum));
vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
sqrt(diag(vcovCL))
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <- lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll, data = elemapi2)
se.lm1 <- cluster_se(model_result = lm1, data = elemapi2, cluster = "dnum")
stargazer::stargazer(lm1, type = "text", style = "aer", se = list(se.lm1))
==========================================================
api00
----------------------------------------------------------
acs_k3 6.954
(6.901)
acs_46 5.966**
(2.531)
full 4.668***
(0.703)
enroll -0.106**
(0.043)
Constant -5.200
(121.786)
Observations 395
R2 0.385
Adjusted R2 0.379
Residual Std. Error 112.198 (df = 390)
F Statistic 61.006*** (df = 4; 390)
----------------------------------------------------------
Notes: ***Significant at the 1 percent level.
**Significant at the 5 percent level.
*Significant at the 10 percent level.
texreg produces this:
texreg::screenreg(lm1, override.se=list(se.lm1))
========================
Model 1
------------------------
(Intercept) -5.20
(121.79)
acs_k3 6.95
(6.90)
acs_46 5.97 ***
(2.53)
full 4.67 ***
(0.70)
enroll -0.11 ***
(0.04)
------------------------
R^2 0.38
Adj. R^2 0.38
Num. obs. 395
RMSE 112.20
========================
How can I fix the p-values?

Robust Standard Errors with texreg are easy: just pass the coeftest directly!
This has become much easier since the question was last answered: it appears you can now just pass the coeftest with the desired variance-covariance matrix directly. Downside: you lose the goodness of fit statistics (such as R^2 and number of observations), but depending on your needs, this may not be a big problem
How to include robust standard errors with texreg
> screenreg(list(reg1, coeftest(reg1,vcov = vcovHC(reg1, 'HC1'))),
custom.model.names = c('Standard Standard Errors', 'Robust Standard Errors'))
=============================================================
Standard Standard Errors Robust Standard Errors
-------------------------------------------------------------
(Intercept) -192.89 *** -192.89 *
(55.59) (75.38)
x 2.84 ** 2.84 **
(0.96) (1.04)
-------------------------------------------------------------
R^2 0.08
Adj. R^2 0.07
Num. obs. 100
RMSE 275.88
=============================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
To generate this example, I created a dataframe with heteroscedasticity, see below for full runnable sample code:
require(sandwich);
require(texreg);
set.seed(1234)
df <- data.frame(x = 1:100);
df$y <- 1 + 0.5*df$x + 5*100:1*rnorm(100)
reg1 <- lm(y ~ x, data = df)

First, notice that your usage of as.integer is dangerous and likely to cause problems once you use data with non-numeric rownames. For instance, using the built-in dataset mtcars whose rownames consist of car names, your function will coerce all rownames to NA, and your function will not work.
To your actual question, you can provide custom p-values to texreg, which means that you need to compute the corresponding p-values. To achieve this, you could compute the variance-covariance matrix, compute the test-statistics, and then compute the p-value manually, or you just compute the variance-covariance matrix and supply it to e.g. coeftest. Then you can extract the standard errors and p-values from there. Since I am unwilling to download any data, I use the mtcars-data for the following:
library(sandwich)
library(lmtest)
library(texreg)
cluster_se <- function(model_result, data, cluster){
model_variables <- intersect(colnames(data), c(colnames(model_result$model), cluster))
model_rows <- rownames(model_result$model) # changed to be able to work with mtcars, not tested with other data
data <- data[model_rows, model_variables]
cl <- data[[cluster]]
M <- length(unique(cl))
N <- nrow(data)
K <- model_result$rank
dfc <- (M/(M-1))*((N-1)/(N-K))
uj <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum));
vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
}
lm1 <- lm(formula = mpg ~ cyl + disp, data = mtcars)
vcov.lm1 <- cluster_se(model_result = lm1, data = mtcars, cluster = "carb")
standard.errors <- coeftest(lm1, vcov. = vcov.lm1)[,2]
p.values <- coeftest(lm1, vcov. = vcov.lm1)[,4]
texreg::screenreg(lm1, override.se=standard.errors, override.p = p.values)
And just for completeness sake, let's do it manually:
t.stats <- abs(coefficients(lm1) / sqrt(diag(vcov.lm1)))
t.stats
(Intercept) cyl disp
38.681699 5.365107 3.745143
These are your t-statistics using the cluster-robust standard errors. The degree of freedom is stored in lm1$df.residual, and using the built in functions for the t-distribution (see e.g. ?pt), we get:
manual.p <- 2*pt(-t.stats, df=lm1$df.residual)
manual.p
(Intercept) cyl disp
1.648628e-26 9.197470e-06 7.954759e-04
Here, pt is the distribution function, and we want to compute the probability of observing a statistic at least as extreme as the one we observe. Since we testing two-sided and it is a symmetric density, we first take the left extreme using the negative value, and then double it. This is identical to using 2*(1-pt(t.stats, df=lm1$df.residual)). Now, just to check that this yields the same result as before:
all.equal(p.values, manual.p)
[1] TRUE

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Regression table with clustered standard errors in R jupyter notebook? - r

Related

GLMM Random Intercept estimators in lme4

Clustered standard errors, stars, and summary statistics in modelsummary for multinom models

Linear Regression Model based on Log-Cosh Loss Function in R

Confidence intervals with clustered standard errors and texreg?

Clustered standard errors with texreg?

Categories

Resources