Stargazer and t-Statistics

Stargazer and t-Statistics - r

Does anyone know how to change to the format of t-stats in stargazer? I tried a bunch of things but haven't had any luck.
I would like the t-statistics shown below the coefficient and in brackets? i.e. drop the "t =" and replace with the t-statistic being shown in inside ( xxxx)
For example:
(1)
Variable 1 0.102
t = 3.494
I would like
(1)
Variable 1 0.102
(3.494)

If you are interested in a non-stargazer option, you might want to try the modelsummary package (disclaimer: I am the author). modelsummary accepts strings enclosed in curly braces in the glue package format, so you can do a bunch of weird things. You can read the details at this link, but here's an example:
library(modelsummary)
models <- list(
lm(hp ~ mpg, data = mtcars),
lm(hp ~ mpg + drat, data = mtcars))
modelsummary(models,
statistic = c(
"statistic",
"conf.int",
"Std. Error: {std.error}{stars}"))

There is no real option to do this with stargazer as the format of t-statistics is hard-coded.
Instead, replace the standard errors with the t-statistics and override p-values so the right stars appear.
I did this for multiple models below, as this is the more general solution (it works for one model too).
library(stargazer)
#>
#> Please cite as:
#> Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
#> R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
models <- list()
models[[1]] <- lm(mpg ~ cyl + disp, data = mtcars)
models[[2]] <- lm(mpg ~ cyl + disp + wt, data = mtcars)
get_ts <- function(fm) {
summary(fm)$coefficients[,3]
}
get_pvals <- function(fm) {
summary(fm)$coefficients[,4]
}
ts <- lapply(models, get_ts)
pvals <- lapply(models, get_pvals)
stargazer(models, type = "text", report=('vc*s'), se = ts, p = pvals)
#>
#> =================================================================
#> Dependent variable:
#> ---------------------------------------------
#> mpg
#> (1) (2)
#> -----------------------------------------------------------------
#> cyl -1.587** -1.785***
#> (-2.230) (-2.940)
#>
#> disp -0.021* 0.007
#> (-2.007) (0.631)
#>
#> wt -3.636***
#> (-3.495)
#>
#> Constant 34.661*** 41.108***
#> (13.609) (14.462)
#>
#> -----------------------------------------------------------------
#> Observations 32 32
#> R2 0.760 0.833
#> Adjusted R2 0.743 0.815
#> Residual Std. Error 3.055 (df = 29) 2.595 (df = 28)
#> F Statistic 45.808*** (df = 2; 29) 46.424*** (df = 3; 28)
#> =================================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Created on 2021-05-11 by the reprex package (v2.0.0)

Can you give a minimal reproduciable example? I think there is no "t = " as default.
library(stargazer)
model <- lm(mpg ~ cyl + disp, data=mtcars)
stargazer(model, type="text")
===============================================
Dependent variable:
---------------------------
mpg
-----------------------------------------------
cyl -1.587**
(0.712)
disp -0.021*
(0.010)
Constant 34.661***
(2.547)
-----------------------------------------------
Observations 32
R2 0.760
Adjusted R2 0.743
Residual Std. Error 3.055 (df = 29)
F Statistic 45.808*** (df = 2; 29)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01

Related

R - passing object to function based on the NAME of the object

Suppose in R I have multiple GLM objects from multiple glm() function calls.
glm_01
glm_02
...
glm_nn
...and suppose that I want to do all possible pairwise comparisons using a chi-squared or F ANOVA test.
anova(glm_01, glm_02, test = "F")
anova(glm_01, glm_03, test = "F")
anova(glm_01, glm_04, test = "F")
...
I don't want to do this manually because the list of models is quite long. Instead I'd like to grab a list of relevant model objects (anything starting with "glm_") and do all pairwise comparisons automatically. However I'm unsure how to pass the model objects (rather than their names in string form) to the anova() function.
As a simple example:
data(mtcars)
# create some models
glm_01 <- glm(mpg ~ cyl , mtcars, family = gaussian())
glm_02 <- glm(mpg ~ cyl + disp , mtcars, family = gaussian())
glm_03 <- glm(mpg ~ cyl + disp + hp , mtcars, family = gaussian())
glm_04 <- glm(mpg ~ cyl + disp + hp + wt, mtcars, family = gaussian())
# get list of relevant model objects from the R environment
model_list <- ls()
model_list <- model_list[substr(model_list, 1, 4) == "glm_"]
# create a table to store the pairwise ANOVA results
n_models <- length(model_list)
anova_table <- matrix(0, nrow = n_models, ncol = n_models)
# loop through twice and do pairwise comparisons
for(row_index in 1:n_models) {
for(col_index in 1:n_models) {
anova_table[row_index, col_index] <- anova(model_list[row_index], model_list[col_index], test = "F")$'Pr(>F)'[2]
}
}
...but of course this loop at the end doesn't work because I'm not passing model objects to anova(), I'm passing the names of the objects as strings instead. How do I tell anova() to use the object that the string refers to, instead of the string itself?
Thank you.
======================
Possible solution:
data(mtcars)
glm_list <- list()
glm_list$glm_01 <- glm(mpg ~ cyl , mtcars, family = gaussian())
glm_list$glm_02 <- glm(mpg ~ cyl + disp , mtcars, family = gaussian())
glm_list$glm_03 <- glm(mpg ~ cyl + disp + hp , mtcars, family = gaussian())
glm_list$glm_04 <- glm(mpg ~ cyl + disp + hp + wt, mtcars, family = gaussian())
# create a table to store the pairwise ANOVA results
n_models <- length(glm_list)
anova_table <- matrix(0, nrow = n_models, ncol = n_models)
# loop through twice and do pairwise comparisons
row_idx <- 0
col_idx <- 0
for(row_glm in glm_list)
{
row_idx <- row_idx + 1
for(col_glm in glm_list)
{
col_idx <- col_idx + 1
anova_table[row_idx, col_idx] <- anova(row_glm, col_glm, test = "F")$'Pr(>F)'[2]
}
col_idx <- 0
}
row_idx <- 0

The easiest way to do this would be to keep all your models in a list. This makes it simple to iterate over them. For example, you can create all of your models and do a pairwise comparison between all of them like this:
data(mtcars)
f_list <- list(mpg ~ cyl,
mpg ~ cyl + disp,
mpg ~ cyl + disp + hp,
mpg ~ cyl + disp + hp + wt)
all_glms <- lapply(f_list, glm, data = mtcars, family = gaussian)
all_pairs <- as.data.frame(combn(length(all_glms), 2))
result <- lapply(all_pairs, function(i) anova(all_glms[[i[1]]], all_glms[[i[2]]]))
Which gives you:
result
#> $V1
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 29 270.74 1 37.594
#>
#> $V2
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp + hp
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 28 261.37 2 46.965
#>
#> $V3
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 27 170.44 3 137.89
#>
#> $V4
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp
#> Model 2: mpg ~ cyl + disp + hp
#> Resid. Df Resid. Dev Df Deviance
#> 1 29 270.74
#> 2 28 261.37 1 9.3709
#>
#> $V5
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 29 270.74
#> 2 27 170.44 2 100.3
#>
#> $V6
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp + hp
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 28 261.37
#> 2 27 170.44 1 90.925
Created on 2020-08-25 by the reprex package (v0.3.0)

If you want to reference arbitrary objects in an accessible environment by symbol without putting them into a list object, the standard way to return the top object on the search list whose symbol is equal to a string is get(), or the vector equivalent mget(). I.e. get("glm_01") gets you the top object on the search list that has the symbol glm_01. The most minimal modification to your approach would be to wrap your calls to model_list[row_index] and model_list[col_index] in get().
You can be more precise about where to look for objects by assigning the models in a named environment and only getting from that environment (using the envir parameter to get()).

Custom R lm function within the tidyverse?

I have a simple question about creating custom lm functions within the tidyverse framework.
I basically want a function that runs a custom model with me with one free variable.
model <- function(x){
lmer(paste("cyl ~", x, "+ (1|disp)"), data = .)
}
And then I want to use this in dplyr's do
mtcars %>%
do(x = model("hp"))
How should I approach this problem?

You could pass data to the function :
library(dplyr)
library(lme4)
model <- function(data, x){
lmer(paste("cyl ~", x, "+", "(1|disp)"), data = data)
}
and then call it like :
mtcars %>% model('hp')
#Linear mixed model fit by REML ['lmerMod']
#Formula: cyl ~ hp + (1 | disp)
# Data: data
#REML criterion at convergence: 96.2
#Random effects:
# Groups Name Std.Dev.
# disp (Intercept) 0.927
# Residual 0.441
#Number of obs: 32, groups: disp, 27
#Fixed Effects:
#(Intercept) hp
# 3.1866 0.0196
Or
mtcars %>% summarise(mod = list(model(., 'hp')))
# mod
#1 <S4 class ‘lmerMod’ [package “lme4”] with 13 slots>

How to get between and overall R2 from plm FE regression with stargazer?

Disclaimer: This question is extremely related to this one I asked two days ago - but now it relates to the implementation of between and overall R2 in stargazer() output not in summary() as before.
Is there a way to get plm() to calculate between R2 and overall R2 for me and include them in the stargazer() output?
To clarify what I mean with between, overall, and within R2 see this answer on StackExchange.
My understanding is that plm only calculates within R2.
I am running a Twoways effects Within Model.
library(plm)
library(stargazer)
# Create some random data
set.seed(1)
x=rnorm(100); fe=rep(rnorm(10),each=10); id=rep(1:10,each=10); ti=rep(1:10,10); e=rnorm(100)
y=x+fe+e
data=data.frame(y,x,id,ti)
# Get plm within R2
reg=plm(y~x,model="within",index=c("id","ti"), effect = "twoways", data=data)
stargazer(reg)
I now also want to include between and overall R2 in the stargazer() output. How can I do that?
To make it explicit what I mean with between and overall R2:
# Pooled Version (overall R2)
reg1=lm(y~x)
summary(reg1)$r.squared
# Between R2
y.means=tapply(y,id,mean)[id]
x.means=tapply(x,id,mean)[id]
reg2=lm(y.means~x.means)
summary(reg2)$r.squared

To do this in stargazer, you can use the add.lines() argument. However, this adds the lines to the beginning of the summary stats section and there is no way to alter this without messing with the source code, which is beastly. I much prefer huxtable, which provides a grammar of table building and is much more extensible and customizable.
library(tidyverse)
library(plm)
library(huxtable)
# Create some random data
set.seed(1)
x=rnorm(100); fe=rep(rnorm(10),each=10); id=rep(1:10,each=10); ti=rep(1:10,10); e=rnorm(100)
y=x+fe+e
data=data.frame(y,x,id,ti)
# Get plm within R2
reg=plm(y~x,model="within",index=c("id","ti"), effect = "twoways", data=data)
stargazer(reg, type = "text",
add.lines = list(c("Overall R2", round(r.squared(reg, model = "pooled"), 3)),
c("Between R2", round(r.squared(update(reg, effect = "individual", model = "between")), 3))))
#>
#> ========================================
#> Dependent variable:
#> ---------------------------
#> y
#> ----------------------------------------
#> x 1.128***
#> (0.113)
#>
#> ----------------------------------------
#> Overall R2 0.337
#> Between R2 0.174
#> Observations 100
#> R2 0.554
#> Adjusted R2 0.448
#> F Statistic 99.483*** (df = 1; 80)
#> ========================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
# I prefer huxreg, which is much more customizable!
# Create a data frame of the R2 values
r2s <- tibble(
name = c("Overall R2", "Between R2"),
value = c(r.squared(reg, model = "pooled"),
r.squared(update(reg, effect = "individual", model = "between"))))
tab <- huxreg(reg) %>%
# Add new R2 values
add_rows(hux(r2s), after = 4)
# Rename R2
tab[7, 1] <- "Within R2"
tab %>% huxtable::print_screen()
#> ─────────────────────────────────────────────────
#> (1)
#> ─────────────────────────
#> x 1.128 ***
#> (0.113)   
#> ─────────────────────────
#> N 100        
#> Overall R2 0.337    
#> Between R2 0.174    
#> Within R2 0.554    
#> ─────────────────────────────────────────────────
#> *** p < 0.001; ** p < 0.01; * p < 0.05.
#>
#> Column names: names, model1
Created on 2020-04-08 by the reprex package (v0.3.0)

How can I display only significant path lines on a path diagram? [R: lavaan, semPlot]

I would like to make changes to my path diagram that I made with the lavaan and semPlot packages.
require(lavaan); require(semPlot)
head(mtcars)
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
semPaths(fit, "std", fade = F, residuals = F)
Because mpg <- gear and mpg <- cyl are not significant, I would like to have it displayed in a transparent way (e.g., adding * to the significant pathlines or preventing from non-significant pathlines from showing up on a path diagram). Is there any way to do that?
Thank you for your support!

I know it's an old thread but I found it while looking for this, and figured I should provide my solution for others.
require(lavaan); require(semPlot) ; require(tidyverse)
#> Loading required package: lavaan
#> This is lavaan 0.6-3
#> lavaan is BETA software! Please report any bugs.
#> Loading required package: semPlot
#> Registered S3 methods overwritten by 'huge':
#> method from
#> plot.sim BDgraph
#> print.sim BDgraph
#> Loading required package: tidyverse
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
# got this warning, but simply ignored it.
#> Warning in lav_partable_check(lavpartable, categorical =
#> lavoptions$categorical, : lavaan WARNING: parameter table does not contain
#> thresholds
lavaan::standardizedSolution(fit) %>% dplyr::filter(!is.na(pvalue)) %>% arrange(desc(pvalue)) %>% mutate_if("is.numeric","round",3) %>% select(-ci.lower,-ci.upper,-z)
#> lhs op rhs est.std se pvalue
#> 1 mpg ~ gear 0.022 0.087 0.801
#> 2 mpg ~ cyl -0.166 0.260 0.524
#> 3 mpg ~ hp -0.694 0.242 0.004
#> 4 hp ~~ hp 0.101 0.034 0.003
#> 5 hp ~1 -2.674 0.600 0.000
#> 6 hp ~ disp 0.444 0.094 0.000
#> 7 hp ~ cyl 0.529 0.098 0.000
#> 8 mpg ~1 4.514 0.751 0.000
#> 9 mpg ~~ mpg 0.258 0.039 0.000
pvalue_cutoff <- 0.05
obj <- semPlot:::semPlotModel(fit)
# save a copy of the original, so we can compare it later and be sure we removed only what we intended to remove
original_Pars <- obj#Pars
check_Pars <- obj#Pars %>% dplyr::filter(!(edge %in% c("int","<->") | lhs == rhs)) # this is the list of paramater to sift thru
keep_Pars <- obj#Pars %>% dplyr::filter(edge %in% c("int","<->") | lhs == rhs) # this is the list of paramater to keep asis
test_against <- lavaan::standardizedSolution(fit) %>% dplyr::filter(pvalue < pvalue_cutoff, rhs != lhs)
test_against_rev <- test_against %>% rename(rhs2 = lhs, # for some reason, the rhs and lhs are reversed in the standardizedSolution() output, for some of the values
lhs = rhs) %>% # I'll have to reverse it myself, and test against both orders
rename(rhs = rhs2)
checked_Pars <-
check_Pars %>% semi_join(test_against, by = c("lhs", "rhs")) %>% bind_rows(
check_Pars %>% semi_join(test_against_rev, by = c("lhs", "rhs"))
)
obj#Pars <- keep_Pars %>% bind_rows(checked_Pars)
#let's verify by looking at the list of the edges we removed from the object
anti_join(original_Pars,obj#Pars)
#> Joining, by = c("label", "lhs", "edge", "rhs", "est", "std", "group", "fixed", "par")
#> label lhs edge rhs est std group fixed par
#> 1 gear ~> mpg 0.1582792 0.0218978 FALSE 2
#> 2 cyl ~> mpg -0.4956938 -0.1660012 FALSE 3
# great, let's plot
semPlot::semPaths(obj, "std",fade = F, residuals = F)
Note this is highly tinkered, and the criterion for exclusion should be modified to your needs (especially the (edge %in% c("int","<->") parts)
Created on 2019-07-09 by the reprex package (v0.3.0)
redacted session_info()
#> lavaan * 0.6-3 2018-09-22 [1] CRAN (R 3.6.0)
#> semPlot * 1.1.1 2019-04-05 [1] CRAN (R 3.6.0)
#> tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.6.0)

I have recently discovered the lavaanPlot package, which allows to show the coefficients for a specified significance criteria. The code is:
require(lavaan); require(lavaanPlot)
head(mtcars)
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
sem.model <- lavaanPlot(model = fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, sig = 0.05)
The resulting image looks like this:
I believe it can be further customized.

map a vector of characters to lm formula in r

I'm trying to make a list of lm object using purrr::map.
use mtcars as an example:
vars <- c('hp', 'wt', 'disp')
map(vars, ~lm(mpg~.x, data=mtcars))
error:
Error in model.frame.default(formula = mpg ~ .x, data = mtcars, drop.unused.levels = TRUE) : variable lengths differ (found for '.x')
I also tried:
map(vars, function(x) {x=sym(x); lm(mpg~!!x, data=mtcars)})
I got error message:
Error in !x : invalid argument type
Can anyone tell what I did wrong? Thanks in advance.

The usual way is to paste together formulas as strings, convert them by mapping as.formula (you can't make a vector of formulas; it has to be a list), and then map lm. You can combine it all to a single call if you like, but I've come to prefer mapping single functions, which makes code easier to read:
library(purrr)
c('hp', 'wt', 'disp') %>%
paste('mpg ~', .) %>%
map(as.formula) %>%
map(lm, data = mtcars)
#> [[1]]
#>
#> Call:
#> .f(formula = .x[[i]], data = ..1)
#>
#> Coefficients:
#> (Intercept) hp
#> 30.09886 -0.06823
#>
#>
#> [[2]]
#>
#> Call:
#> .f(formula = .x[[i]], data = ..1)
#>
#> Coefficients:
#> (Intercept) wt
#> 37.285 -5.344
#>
#>
#> [[3]]
#>
#> Call:
#> .f(formula = .x[[i]], data = ..1)
#>
#> Coefficients:
#> (Intercept) disp
#> 29.59985 -0.04122
It's actually unnecessary to call map(as.formula) as lm will coerce it into a formula, but not all models are so generous (e.g. mgcv::gam).
A downside of this approach are that the call listed in the object looks funky, but the coefficients tell you which is which easily enough anyway. A useful alternative is to keep the formula as a string in one column of a data.frame and the model in a list column, e.g.
library(tidyverse)
data_frame(formula = paste('mpg ~', c('hp', 'wt', 'disp')),
model = map(formula, lm, data = mtcars))
#> # A tibble: 3 x 2
#> formula model
#> <chr> <list>
#> 1 mpg ~ hp <S3: lm>
#> 2 mpg ~ wt <S3: lm>
#> 3 mpg ~ disp <S3: lm>

The elegant tidyverse approach demonstrated by #alistaire worked well for me until I tried to pass the list column to the stargazer package and received "% Error: Unrecognized object type."
In case it is helpful for anyone else trying to use purrr map and stargazer, this slight modification solved the issue:
models_out <- data_frame(
formula = paste('mpg ~', c('hp', 'wt', 'disp')),
model = map(
.x = formula,
.f = function(x) lm(x, data = mtcars))
)
stargazer(models_out$model, type = 'text')
===========================================================
Dependent variable:
-----------------------------
mpg
(1) (2) (3)
-----------------------------------------------------------
hp -0.068***
(0.010)
wt -5.344***
(0.559)
disp -0.041***
(0.005)
Constant 30.099*** 37.285*** 29.600***
(1.634) (1.878) (1.230)
-----------------------------------------------------------
Observations 32 32 32
R2 0.602 0.753 0.718
Adjusted R2 0.589 0.745 0.709
Residual Std. Error (df = 30) 3.863 3.046 3.251
F Statistic (df = 1; 30) 45.460*** 91.375*** 76.513***
===========================================================
Note: *p<0.1; **p<0.05; ***p<0.01

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Stargazer and t-Statistics - r

Related

R - passing object to function based on the NAME of the object

Custom R lm function within the tidyverse?

How to get between and overall R2 from plm FE regression with stargazer?

How can I display only significant path lines on a path diagram? [R: lavaan, semPlot]

map a vector of characters to lm formula in r

Categories

Resources