I want to create a regression table for GLS models that reports the R-squared and the number of observations in the place where statistics such as "Log Likelihood" currently appear. The p-values should be shown below the coefficients. Here is some example code:
# import the necessary packages
library(nlme)
library(dplyr)
library(stargazer)
# create a new subset that only includes observations with a value in the "Price.Book.Value" column
dotcom_subset_MBV <- dotcom_subset %>% filter(!is.na(Price.Book.Value))
financial_subset_MBV <- financial_subset %>% filter(!is.na(Price.Book.Value))
covid_subset_MBV <- covid_subset %>% filter(!is.na(Price.Book.Value))
# Hypothesis 2: Fit GLS models
dotcom_model_MBV <- gls(X1.Month.Equity.Premium ~ crisis*Price.Book.Value, data = dotcom_subset_MBV, method = "ML")
financial_model_MBV <- gls(X1.Month.Equity.Premium ~ crisis*Price.Book.Value, data = financial_subset_MBV, method = "ML")
covid_model_MBV <- gls(X1.Month.Equity.Premium ~ crisis*Price.Book.Value, data = covid_subset_MBV, method = "ML")
stargazer(dotcom_model_MBV, financial_model_MBV, covid_model_MBV,
          type = "text",
          column.labels = c("Dotcom", "Financial", "Covid"),
          report = "vc*p")
The only problem with the code above is that it shows the Log Likelihood, Akaike Inf. Crit. and Bayesian Inf. Crit. instead of the R squared values. The rest would be okay.
I tried the following:
omit.stat = c("ll", "AIC", "BIC")
and it works. However, it still doesn't show me the R squared. Then I tried:
add.lines = list(c(paste0("R-squared = ", round(r2_dotcom, 2)
and it includes a line that is called "R Squared" but without any values.
Here's a working example using the mtcars data:
data(mtcars)
library(nlme)
library(stargazer)
#>
#> Please cite as:
#> Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
#> R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
m1 <- gls(qsec ~ cyl + wt, data=mtcars)
m2 <- gls(mpg ~ cyl + wt, data=mtcars)
r2 <- c(cor(fitted(m1), mtcars$qsec)^2,
cor(fitted(m2), mtcars$mpg)^2)
stargazer(m1, m2,
type="text",
omit.stat = c("ll", "AIC", "BIC"),
add.lines = list(c("R-squared", sprintf("%.2f", r2))))
#>
#> =========================================
#> Dependent variable:
#> ----------------------------
#> qsec mpg
#> (1) (2)
#> -----------------------------------------
#> cyl -1.173*** -1.508***
#> (0.197) (0.415)
#>
#> wt 1.356*** -3.191***
#> (0.360) (0.757)
#>
#> Constant 20.743*** 39.686***
#> (0.815) (1.715)
#>
#> -----------------------------------------
#> R-squared 0.56 0.83
#> Observations 32 32
#> =========================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Created on 2023-02-01 by the reprex package (v2.0.1)
The add.lines option expects a list; each vector in the list is appended to the table as one row, with one element per column. So the vector needs to be the label "R-squared" followed by each of the R-squared values formatted as a string (that's what sprintf() does).
Also note that I've calculated the R-squared as the squared correlation between the observed and fitted values, but I make no claim that this is statistically sound for GLS (though it is one way of calculating R-squared in the OLS model).
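As a quick check on that caveat, here is a minimal sketch using only nlme and base R. It assumes a gls fit with no correlation or variance structure, in which case the coefficients coincide with OLS and the squared correlation matches the R-squared reported by lm. Once a correlation or weights structure is added, that agreement is not guaranteed. The object names are just for illustration.

```r
library(nlme)

# gls() without correlation/weights arguments reproduces the OLS coefficients
m_gls <- gls(mpg ~ cyl + wt, data = mtcars)
m_ols <- lm(mpg ~ cyl + wt, data = mtcars)

# squared correlation between fitted and observed values
r2_gls <- cor(as.numeric(fitted(m_gls)), mtcars$mpg)^2

# conventional R-squared from the OLS fit
r2_ols <- summary(m_ols)$r.squared

# compare the two numbers
all.equal(r2_gls, r2_ols)
```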
I would like to display the t value and p value from my regression output using the stargazer package. So far, I've found ways to show one or the other. I did find something online that displays both; however, it omits the coefficient names. Is there a function or something else that will enable me to show all three?
library(stargazer)
data("mtcars")
model <- lm(hp ~ wt, mtcars)
stargazer(model, type='text',report = "csp")
This shows p-values, but no beta coefficient names.
The other variant shows beta coefficient names, but no p-values. I would like to keep all three: coefficient names, t-statistic, and p-value.
Edit: removed a function used on the website that I did not use.
You can request it by including "t" in the report argument. From the documentation:
a character string containing only elements of "v", "c", "s", "t", "p",
"*" that determines whether, and in which order, variable names ("v"),
coefficients ("c"), standard errors/confidence intervals ("s"), test
statistics ("t") and p-values ("p") should be reported in regression
tables. If one of the aforementioned letters is followed by an
asterisk ("*"), significance stars will be reported next to the
corresponding statistic.
Here is a reproducible example:
library(stargazer)
data("mtcars")
model <- lm(hp ~ wt, mtcars)
stargazer(model, type='text', report = "vcstp")
#>
#> ===============================================
#> Dependent variable:
#> ---------------------------
#> hp
#> -----------------------------------------------
#> wt 46.160
#> (9.625)
#> t = 4.796
#> p = 0.00005
#>
#> Constant -1.821
#> (32.325)
#> t = -0.056
#> p = 0.956
#>
#> -----------------------------------------------
#> Observations 32
#> R2 0.434
#> Adjusted R2 0.415
#> Residual Std. Error 52.437 (df = 30)
#> F Statistic 22.999*** (df = 1; 30)
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Created on 2022-11-12 with reprex v2.0.2
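If you want to cross-check the t and p rows that stargazer prints, the coefficient matrix returned by summary() holds the same statistics and needs only base R. A minimal sketch; the name `coefs` is just for illustration:

```r
model <- lm(hp ~ wt, data = mtcars)

# coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(model))

# the statistics that report = "vcstp" places under each coefficient
round(coefs[, c("Std. Error", "t value", "Pr(>|t|)")], 3)
```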
I'd like to create an LM with HC3 corrected standard errors and a fixest model with cluster robust standard errors in the same table.
See my MRE below:
library(fixest)
library(modelsummary)
df <- mtcars
models <- list()
models[[1]] <- lm(cyl ~ disp, data = df)
models[[2]] <- feols(cyl ~ disp | as.factor(gear), data = df)
# this works
modelsummary::modelsummary(models)
# but these do not
modelsummary::modelsummary(models, vcov = c("HC3", "cluster"))
modelsummary::modelsummary(models, vcov = c(HC3, cluster))
modelsummary::modelsummary(models, vcov = list(HC3, cluster))
modelsummary::modelsummary(models, vcov = list(vcovHC, cluster))
modelsummary::modelsummary(models, vcov = list(vcovHC, vcovHC))
modelsummary::modelsummary(models, vcov = c(vcovHC, vcovHC))
Okay, I figured out a hack, but I'm leaving the question open in case someone finds a slicker solution.
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()
library(fixest)    # feols()
library(modelsummary)
df <- mtcars
models <- list()
fit <- lm(cyl ~ disp, data = df)
models[[1]] <- coeftest(fit, vcovHC(fit, type = "HC3"))
models[[2]] <- summary(feols(cyl ~ disp | as.factor(gear), data = df), "cluster")
# this works
modelsummary::modelsummary(models)
By default fixest automatically computes cluster-robust standard errors when you include fixed effects. If you set vcov=NULL, modelsummary will return the default standard errors, which will then be cluster-robust.
Alternatively, you can set vcov=~gear to compute the standard errors using the sandwich::vcovCL function under the hood.
Using version 0.10.0 of modelsummary and 0.10.4 of fixest, we can do:
library(modelsummary)
library(fixest)
models <- list(
lm(cyl~disp, data = mtcars),
feols(cyl ~ disp | gear, data = mtcars),
feols(cyl ~ disp | gear, data = mtcars))
modelsummary(models,
fmt = 6,
vcov = list("HC3", NULL, ~gear))
                Model 1       Model 2       Model 3
(Intercept)     3.188568
                (0.289130)
disp            0.012998      0.012061      0.012061
                (0.001310)    (0.002803)    (0.002803)
Num.Obs.        32            32            32
R2              0.814         0.819         0.819
R2 Adj.         0.807         0.800         0.800
R2 Within                     0.614         0.614
R2 Pseudo
AIC             79.1          80.2          80.2
BIC             83.5          86.1          86.1
Log.Lik.        -36.573       -36.107       -36.107
F               98.503
Std.Errors      HC3           by: gear      by: gear
FE: gear                      X             X
Notice that the results are the same in the 2nd and 3rd column, and also equal to the plain fixest summary:
summary(models[[2]])
#> OLS estimation, Dep. Var.: cyl
#> Observations: 32
#> Fixed-effects: gear: 3
#> Standard-errors: Clustered (gear)
#> Estimate Std. Error t value Pr(>|t|)
#> disp 0.012061 0.002803 4.30299 0.049993 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.747808 Adj. R2: 0.799623
#> Within R2: 0.614334
In some past versions of modelsummary there were issues with labelling of the standard errors at the bottom of the table. I will soon be working on a more robust system to make sure the label matches the standard errors. In most cases, modelsummary calculates the robust standard errors itself, so it has full control over labelling and computation. Packages like fixest and estimatr make things a bit more tricky because they sometimes hold several SEs, and because the default is not always “classical”.
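To see what the "HC3" label corresponds to, the heteroskedasticity-robust standard errors for the lm column can be reproduced by hand in base R. This is only a sketch of the textbook sandwich formula, not the code modelsummary runs internally; the helper name `sandwich_se` is made up for illustration.

```r
fit <- lm(cyl ~ disp, data = mtcars)

X <- model.matrix(fit)         # design matrix
e <- residuals(fit)            # OLS residuals
h <- hatvalues(fit)            # leverages (diagonal of the hat matrix)
bread <- solve(crossprod(X))   # (X'X)^-1

# sandwich estimator: bread %*% meat %*% bread, with per-observation weights w
sandwich_se <- function(w) {
  meat <- crossprod(X * sqrt(w))   # equals X' diag(w) X
  sqrt(diag(bread %*% meat %*% bread))
}

se_hc0 <- sandwich_se(e^2)               # White's original estimator
se_hc3 <- sandwich_se(e^2 / (1 - h)^2)   # HC3 leverage correction

round(cbind(HC0 = se_hc0, HC3 = se_hc3), 6)
```

The HC3 column should line up with the standard errors shown for Model 1 in the table above. Because the HC3 weights are never smaller than the HC0 weights, the HC3 standard errors are never smaller either.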
I am trying to build a fixed effects regression with the plm package in R. I am using country level panel data with year and country fixed effects.
My problem concerns two explanatory variables: one is an interaction term of two variables and the other is a squared term of one of the variables.
The model is basically:
y = x1 + x1^2 + x3 + x1*x3 + ... + xn, with all variables in log form
It is central to the model to include the squared term, but when I run the regression it always gets excluded because of "singularities", as x1 and x1^2 are obviously correlated.
In other words, the regression runs and I get estimates for my variables, just not for x1^2 and x1*x2.
How do I circumvent this?
library(plm)
fe_reg<- plm(log(y) ~ log(x1)+log(x2)+log(x2^2)+log(x1*x2)+dummy,
data = df,
index = c("country", "year"),
model = "within",
effect = "twoways")
summary(fe_reg)
I have tried defining the interaction and squared terms as vectors, which helped with the interaction term but not the squared term.
df1.pd<- df1 %>% mutate_at(c('x1'), ~(scale(.) %>% as.vector))
df1.pd<- df1 %>% mutate_at(c('x2'), ~(scale(.) %>% as.vector))
I am pretty new to R, so apologies if this is not a very well structured question.
You just found two properties of the logarithm function:
log(x^2) = 2 * log(x)
log(x*y) = log(x) + log(y)
Then, obviously, log(x) is collinear with 2*log(x), and one of the two collinear variables is dropped from the estimation. The same holds for log(x*y) and log(x) + log(y).
So, the model as you specified it is not estimable by linear regression methods. You might want to consider data transformations other than log, or use the original variables.
See also the reproducible example below, where I just used log(x^2) = 2*log(x). Linear dependence can be detected, e.g., via the function detect.lindep from package plm (also shown below). Coefficients being dropped from the estimation also hints at collinear columns in the model matrix. At times, linear dependence appears only after the data transformations involved in the estimation functions; for an example with the within transformation, see the Examples section of the help page ?detect.lindep.
library(plm)
data("Grunfeld")
pGrun <- pdata.frame(Grunfeld)
pGrun$lvalue <- log(pGrun$value) # log(x)
pGrun$lvalue2 <- log(pGrun$value^2) # log(x^2) == 2 * log(x)
mod <- plm(inv ~ lvalue + lvalue2 + capital, data = pGrun, model = "within")
summary(mod)
#> Oneway (individual) effect Within Model
#>
#> Call:
#> plm(formula = inv ~ lvalue + lvalue2 + capital, data = pGrun,
#> model = "within")
#>
#> Balanced Panel: n = 10, T = 20, N = 200
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -186.62916 -20.56311 -0.17669 20.66673 300.87714
#>
#> Coefficients: (1 dropped because of singularities)
#> Estimate Std. Error t-value Pr(>|t|)
#> lvalue 30.979345 17.592730 1.7609 0.07988 .
#> capital 0.360764 0.020078 17.9678 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Total Sum of Squares: 2244400
#> Residual Sum of Squares: 751290
#> R-Squared: 0.66525
#> Adj. R-Squared: 0.64567
#> F-statistic: 186.81 on 2 and 188 DF, p-value: < 2.22e-16
detect.lindep(mod) # run on the model
#> [1] "Suspicious column number(s): 1, 2"
#> [1] "Suspicious column name(s): lvalue, lvalue2"
detect.lindep(pGrun) # run on the data
#> [1] "Suspicious column number(s): 6, 7"
#> [1] "Suspicious column name(s): lvalue, lvalue2"
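The same behaviour shows up outside plm. The sketch below uses base R on the built-in mtcars data (chosen purely for illustration): log(x^2) is dropped as NA, whereas the square of the logged variable, I(log(x)^2), and the product of two logs are different regressors that are not exact linear functions of log(x) and can be estimated. Whether those terms are what your model calls for is a separate modelling question.

```r
# log(x^2) is exactly 2 * log(x), so lm() cannot estimate both
fit_collinear <- lm(log(mpg) ~ log(wt) + log(wt^2), data = mtcars)
coef(fit_collinear)   # the log(wt^2) coefficient is NA

# (log x)^2 and log(x1) * log(x2) are not linear functions of the logs
fit_squared <- lm(log(mpg) ~ log(wt) + I(log(wt)^2) + log(hp) + log(wt):log(hp),
                  data = mtcars)
coef(fit_squared)     # every coefficient is estimated
```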
Forgive me if this is silly or easy, as I'm brand new to R, but I've been looking for hours to no avail.
I have a series of GLM models, and I'd like to report the standardized/reparametrized coefficients for each alongside the direct coefficients in a Stargazer table. I created two separate models, one with standardized coefficients using the arm package.
require(arm)
model1 <- glm(...)
model1.2 <- standardize(model1)
Both models work fine and give the outputs I want, but
I can't seem to figure out a way to get Stargazer to emulate this structure/look:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.699&rep=rep1&type=pdf
This can be done by asking stargazer to produce output for both models and making sure that the coefficients in the models have the same names so that stargazer knows that the standardized coefficient and the un-standardized coefficient should go on the same row.
The code below should help you get started.
# generate fake data
x <- runif(100)
y <- rbinom(100, 1, x)
# fit the model
m1 <- glm(y~x, family = binomial())
# standardize it
m2 <- arm::standardize(m1)
# we make sure the coefficients have the same names in both models
names(m2$coefficients) <- names(m1$coefficients)
# we feed to stargazer
stargazer::stargazer(m1, m2, type = "text",
column.labels = c("coef (s.e.)", "standardized coef (s.e.)"),
single.row = TRUE)
#>
#> ===========================================================
#> Dependent variable:
#> -----------------------------------------
#> y
#> coef (s.e.) standardized coef (s.e.)
#> (1) (2)
#> -----------------------------------------------------------
#> x 4.916*** (0.987) 2.947*** (0.592)
#> Constant -2.123*** (0.506) 0.248 (0.247)
#> -----------------------------------------------------------
#> Observations 100 100
#> Log Likelihood -50.851 -50.851
#> Akaike Inf. Crit. 105.703 105.703
#> ===========================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Created on 2019-02-13 by the reprex package (v0.2.1)
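To make the relationship between the two columns concrete, here is a base R sketch. It assumes that the standardization centres a numeric predictor and divides it by two standard deviations, which is my understanding of the default in arm::standardize; check ?standardize before relying on it. Under that assumption the standardized slope is the raw slope multiplied by 2 * sd(x).

```r
set.seed(1)  # arbitrary seed, for reproducibility only
x <- runif(100)
y <- rbinom(100, 1, x)

# model on the original scale
m_raw <- glm(y ~ x, family = binomial())

# centre x and divide by two standard deviations (assumed rescaling)
z <- (x - mean(x)) / (2 * sd(x))
m_std <- glm(y ~ z, family = binomial())

# the slope on z equals the raw slope times 2 * sd(x)
c(refit    = unname(coef(m_std)["z"]),
  rescaled = unname(coef(m_raw)["x"]) * 2 * sd(x))
```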
You can usually figure out how to get the output you want by diving into the stargazer help file and looking at this helpful webpage: https://www.jakeruss.com/cheatsheets/stargazer/
I have a glm model for which I use coeftest from the lmtest package to estimate robust standard errors. When I use stargazer to produce regression tables I get the correct results but without the number of observations and other relevant statistics like the null deviance and the model deviance.
Here's an example:
library(lmtest)
library(stargazer)
m1 <- glm(am ~ mpg + cyl + disp, mtcars, family = binomial)
# Simple binomial regression
# For whatever reason, let's say I want to use coeftest to estimate something
m <- coeftest(m1)
stargazer(m, type = "text", single.row = T) # This is fine, but I want to also include the number of observations
# the null deviance and the model deviance.
I'm specifically interested in the number of observations, the null deviance and the residual deviance.
I thought that If I replaced the old coefficient matrix with the new one, I'd get the correct estimates with the correct statistics and stargazer would recognize the model and print it correctly. For that, I've tried substituting the coefficients, SE's, z statistic and p values from the coeftest model in the m1 model but some of these statistics are computed with summary.glm and are not included in the m1 output. I could easily substitute these coefficients in the summary output but stargazer doesn't recognize summary type class. I've tried adding attributes to the m object with the specific statistics but they don't show up in the output and stargazer doesn't recognize it.
Note: I know stargazer can compute robust SE's but I'm also doing other computations, so the example needs to include the coeftest output.
Any help is appreciated.
It may be easiest to pass the original models into stargazer, and then use coeftest to pass in custom values for standard errors (se = ), confidence intervals (ci.custom = ) and/or p values (p = ). See below for how to easily handle a list containing multiple models.
suppressPackageStartupMessages(library(lmtest))
suppressPackageStartupMessages(library(stargazer))
mdls <- list(
m1 = glm(am ~ mpg, mtcars, family = poisson),
m2 = glm(am ~ mpg + cyl + disp, mtcars, family = poisson)
)
# Extract robust standard errors (sandwich estimator)
se_robust <- function(x)
coeftest(x, vcov. = sandwich::sandwich)[, "Std. Error"]
# Original SE
stargazer(mdls, type = "text", single.row = T, report = "vcsp")
#>
#> ===============================================
#> Dependent variable:
#> -----------------------------
#> am
#> (1) (2)
#> -----------------------------------------------
#> mpg 0.106 (0.042) 0.028 (0.083)
#> p = 0.012 p = 0.742
#> cyl 0.435 (0.496)
#> p = 0.381
#> disp -0.014 (0.009)
#> p = 0.151
#> Constant -3.247 (1.064) -1.488 (3.411)
#> p = 0.003 p = 0.663
#> -----------------------------------------------
#> Observations 32 32
#> Log Likelihood -21.647 -20.299
#> Akaike Inf. Crit. 47.293 48.598
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
# With robust SE
stargazer(
mdls, type = "text", single.row = TRUE, report = "vcsp",
se = lapply(mdls, se_robust))
#>
#> ===============================================
#> Dependent variable:
#> -----------------------------
#> am
#> (1) (2)
#> -----------------------------------------------
#> mpg 0.106 (0.025) 0.028 (0.047)
#> p = 0.00002 p = 0.560
#> cyl 0.435 (0.292)
#> p = 0.137
#> disp -0.014 (0.007)
#> p = 0.042
#> Constant -3.247 (0.737) -1.488 (2.162)
#> p = 0.00002 p = 0.492
#> -----------------------------------------------
#> Observations 32 32
#> Log Likelihood -21.647 -20.299
#> Akaike Inf. Crit. 47.293 48.598
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Created on 2020-11-09 by the reprex package (v0.3.0)
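Because the original glm objects are passed to stargazer, the number of observations is reported automatically. For the null and residual deviance, one option is to pull them from the models and hand them to add.lines. A minimal sketch: the binomial models and the name `extra_lines` are my own choices for illustration, and the stargazer call only runs if the package is installed.

```r
mdls <- list(
  m1 = glm(am ~ mpg, data = mtcars, family = binomial),
  m2 = glm(am ~ hp + wt, data = mtcars, family = binomial)
)

# one vector per extra row: the row label, then one string per model column
extra_lines <- list(
  c("Null deviance",
    sprintf("%.3f", sapply(mdls, function(m) m$null.deviance))),
  c("Residual deviance",
    sprintf("%.3f", sapply(mdls, deviance)))
)
extra_lines

# optional: print the table with the extra rows appended
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(mdls, type = "text", add.lines = extra_lines)
}
```

The same list can be combined with the se = argument shown above, so robust standard errors and the deviance rows appear in one table.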
If I understand you correctly, you could try the following.
First, assign your stargazer output to an object like this:
stargazer.values <- stargazer(m, type = "text", single.row = T)
Then inspect the code of the stargazer function with body(stargazer).
Hopefully you can find objects for values that stargazer uses but does not report. You can then access them like this (if there is, for example, an object named "null.deviance"):
stargazer.values$null.deviance
Or, if it is part of another data frame, say df, it could look like this:
stargazer.values$df$null.deviance
Code like this might also be helpful:
print(null.deviance <- stargazer.values$null.deviance)
Hope this helps!