Stargazer pulls apart variables when observations dropped - r

I use stargazer to create a table for multiple models. They are actually the same model, but the first is based on all observations while each of the others drops a different set of observations. All variables are named the same, so what surprises me is that when I export the table to LaTeX, two lines, one for a dummy variable and another for an interaction term, are duplicated.
What is really strange is that I cannot replicate the results, but I will post a minimal working example nonetheless. Perhaps you can help me based on my description alone.
This is the code for my MWE:
library(tibble)
library(stargazer)
df <- as_tibble(data.frame(first = rnorm(100, 50), second = rnorm(100, 30), third = rnorm(100, 100), fourth = c(rep(0, 50), rep(1, 50))))
model.1 <- lm(first ~ second + third + fourth + third*fourth, data = df)
model.2 <- lm(first ~ second + third + fourth + third*fourth, data = df[!rownames(df) %in% "99",])
stargazer(model.1, model.2)
I would now post the LaTeX output that includes the error I am trying to fix, but with this snippet everything seems to work just fine.
What I would like to have, of course, is the table as produced by this snippet (I feel very stupid for not being able to reproduce the problem).

You could take a look at the names of your models' coefficients using coefficients(). Make sure they are identical, i.e. identical(names(coefficients(model.1)), names(coefficients(model.2))). Note that names(model.1) on its own returns the names of the components of the lm object, which are the same for any two lm fits. Then use stargazer's keep argument to make sure you get the coefficients you want.
Here is the example above, keeping selected variables:
coefficients(model.1)
#> (Intercept) second third fourth third:fourth
#> 57.27352606 0.02674072 -0.08236250 20.23596216 -0.20288137
coefficients(model.2)
#> (Intercept) second third fourth third:fourth
#> 57.06149556 0.03305134 -0.08214812 20.85087288 -0.20885718
identical(names(coefficients(model.1)), names(coefficients(model.2)))
#> [1] TRUE
I'm using type = "text" to make the output easier to read on SO, but I expect it to be the same with LaTeX:
stargazer(model.1, model.2, type = "text", keep=c("third","third:fourth"))
#>
#> =========================================================
#> Dependent variable:
#> -------------------------------------
#> first
#> (1) (2)
#> ---------------------------------------------------------
#> third -0.082 -0.082
#> (0.166) (0.167)
#>
#> third:fourth -0.203 -0.209
#> (0.222) (0.223)
#>
#> ---------------------------------------------------------
#> Observations 100 99
#> R2 0.043 0.044
#> Adjusted R2 0.002 0.004
#> Residual Std. Error 1.044 (df = 95) 1.047 (df = 94)
#> F Statistic 1.056 (df = 4; 95) 1.089 (df = 4; 94)
#> =========================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
However, it might be hard to rule out a local issue unless we find a way to reproduce your problem.
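One possible cause that can be checked in the same spirit (this is an assumption, since the original problem could not be reproduced): if the dummy is numeric in one data set and a factor in another, lm() names its coefficient differently, so both the dummy and its interaction term end up under two different names. A minimal base R sketch with simulated data follows; df.factor, model.a and model.b are illustrative names that do not appear in the question.

```r
set.seed(42)
df <- data.frame(
  first  = rnorm(100, 50),
  second = rnorm(100, 30),
  third  = rnorm(100, 100),
  fourth = rep(c(0, 1), each = 50)   # numeric dummy
)

# Hypothetical second data set: one row dropped AND the dummy stored as a factor
df.factor <- df[-99, ]
df.factor$fourth <- factor(df.factor$fourth)

model.a <- lm(first ~ second + third * fourth, data = df)
model.b <- lm(first ~ second + third * fourth, data = df.factor)

# Compare the coefficient names, not the names of the lm objects
names.a <- names(coef(model.a))
names.b <- names(coef(model.b))

only.in.a <- setdiff(names.a, names.b)   # "fourth"  "third:fourth"
only.in.b <- setdiff(names.b, names.a)   # "fourth1" "third:fourth1"
print(only.in.a)
print(only.in.b)
```

If both setdiff() calls return empty vectors for your real models, the coefficient names agree and the cause lies elsewhere.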

Related

How to rename complicated variable name in fixest etable

I'm wondering how to change a complex variable name with dict in etable in fixest package.
For example, I have a regression Y ~ x1 + x2:abs(x3):x4 and I'd like to change the name of x2:abs(x3):x4.
I have tried
etable(...,
dict = c(`x2:abs(x3):x4` = 'myvar')
)
etable(...,
dict = c("x2:abs(x3):x4" = 'myvar')
)
etable(...,
dict = c("x2*abs(x3)*x4" = 'myvar')
)
But no success. Is there an easy fix for this?
It works. It's likely a version problem:
library(fixest)
est = feols(mpg ~ cyl:abs(disp):hp, mtcars)
etable(est, dict=c("cyl:abs(disp):hp" = "New coef"))
#> est
#> Dependent Var.: mpg
#>
#> (Intercept) 25.05*** (0.9560)
#> New coef -1.65e-5*** (2.3e-6)
#> _______________ ____________________
#> S.E. type Standard
#> Observations 32
#> R2 0.63073
#> Adj. R2 0.61842
Otherwise, please provide a minimal reproducible example.
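When a dictionary entry seems to be ignored, it can help to check that the key matches the coefficient name character for character. The sketch below does this with base R only, using lm() instead of feols(), on the assumption that both label the term the same way (the output above suggests the key "cyl:abs(disp):hp" is what fixest expects). The names dict, unmatched and renamed are illustrative.

```r
# Fit the same formula with lm() to inspect the coefficient names
fit <- lm(mpg ~ cyl:abs(disp):hp, data = mtcars)
coef.names <- names(coef(fit))

# Dictionary in the same form as passed to etable(dict = ...)
dict <- c("cyl:abs(disp):hp" = "New coef")

# Keys that do not correspond to any coefficient (should be empty)
unmatched <- setdiff(names(dict), coef.names)

# Apply the renaming by hand to preview the labels
renamed <- unname(ifelse(coef.names %in% names(dict), dict[coef.names], coef.names))
print(unmatched)
print(renamed)
```

A non-empty unmatched vector would point to a typo or a different term ordering in the key rather than to a package problem.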

contrast of contrast with emmeans (second differences)

I am using emmeans to conduct a contrast of a contrast (i.e., testing for an interaction effect through 1st/2nd differences).
It involves 3 steps:
1. estimate means using “emmeans”
2. estimate if there is a difference in means (1st difference) using “pairs”
3. estimate if there is a difference in the difference (2nd difference) using ????
While I can execute steps 1 and 2 (see the reprex below with fictitious data), I'm stuck on step 3. Any tips?
(the contrast of a contrast shown in the vignette here is for alternative functional forms, which is somewhat different than what I want to test)
suppressPackageStartupMessages({
library(emmeans)})
# create ex. data set. 1 row per respondent (dataset shows 2 resp).
cedata.1 <- data.frame( id = c(1,1,1,1,1,1,2,2,2,2,2,2),
QES = c(1,1,2,2,3,3,1,1,2,2,3,3), # Choice set
Alt = c(1,2,1,2,1,2,1,2,1,2,1,2), # Alt 1 or Alt 2 in choice set
Choice = c(0,1,1,0,1,0,0,1,0,1,0,1), # Dep variable. if Chosen (1) or not (0)
LOC = c(0,0,1,1,0,1,0,1,1,0,0,1), # Indep variable per Choice set, binary categorical
SIZE = c(1,1,1,0,0,1,0,0,1,1,0,1), # Indep variable per Choice set, binary categorical
gender = c(1,1,1,1,1,1,0,0,0,0,0,0) # Indep variable per indvidual, binary categorical
)
# estimate model
glm.model <- glm(Choice ~ LOC*SIZE, data=cedata.1, family = binomial(link = "logit"))
# estimate means (i.e., values used to calc 1st diff).
comp1.loc.size <- emmeans(glm.model, ~ LOC * SIZE)
# calculate 1st diff (and p value)
pairs(comp1.loc.size, simple = "SIZE") # gives result I want
#> LOC = 0:
#> contrast estimate SE df z.ratio p.value
#> 0 - 1 -1.39 1.73 Inf -0.800 0.4235
#>
#> LOC = 1:
#> contrast estimate SE df z.ratio p.value
#> 0 - 1 0.00 1.73 Inf 0.000 1.0000
#>
#> Results are given on the log odds ratio (not the response) scale.
# calculate 2nd diff (and p value)
# ** the following gives the relevant values for doing the 2nd diff comparison (i.e., -1.39 and 0.00)...but how to make the statistical comparison?
pairs(comp1.loc.size, simple = "SIZE")
#> LOC = 0:
#> contrast estimate SE df z.ratio p.value
#> 0 - 1 -1.39 1.73 Inf -0.800 0.4235
#>
#> LOC = 1:
#> contrast estimate SE df z.ratio p.value
#> 0 - 1 0.00 1.73 Inf 0.000 1.0000
#>
#> Results are given on the log odds ratio (not the response) scale.
# second difference: contrast of the two first differences
pairs(pairs(comp1.loc.size, simple = "SIZE"), by = NULL)
Another solution:
# estimate means (i.e., values used to calc 1st diff).
comp1.loc.size <- emmeans(glm.model, ~ LOC | SIZE)
# second difference:
pairs(pairs(emmeans::regrid(comp1.loc.size)), by = NULL)
PS: This solution is almost a copy of the solution here: Testing contrast of contrast (first/second difference) in outcome
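As a cross-check that needs only base R: on the log odds scale, the second difference in a model with two binary predictors and their interaction is just the interaction coefficient, so its z test is the test of the second difference. The sketch below recomputes it from predictions using the question's data. Note that this is on the link scale; the regrid() variant above works on the probability scale and will generally give different numbers. The names grid, eta, first.diff and second.diff are illustrative.

```r
cedata.1 <- data.frame(
  Choice = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1),
  LOC    = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1),
  SIZE   = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)
)
glm.model <- glm(Choice ~ LOC * SIZE, data = cedata.1,
                 family = binomial(link = "logit"))

# Predicted log odds for the four LOC x SIZE cells
# (rows: LOC0/SIZE0, LOC1/SIZE0, LOC0/SIZE1, LOC1/SIZE1)
grid <- expand.grid(LOC = 0:1, SIZE = 0:1)
eta <- unname(predict(glm.model, newdata = grid, type = "link"))

# First differences: SIZE 0 minus SIZE 1, within each level of LOC
first.diff <- c(LOC0 = eta[1] - eta[3], LOC1 = eta[2] - eta[4])

# Second difference: difference between the two first differences
second.diff <- unname(first.diff["LOC0"] - first.diff["LOC1"])

# The same quantity with its standard error, z value and p-value
interaction.row <- summary(glm.model)$coefficients["LOC:SIZE", ]
print(first.diff)
print(second.diff)
print(interaction.row)
```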

How to get between and overall R2 from plm FE regression with stargazer?

Disclaimer: This question is closely related to one I asked two days ago, but it now concerns including between and overall R2 in stargazer() output rather than in summary() as before.
Is there a way to get plm() to calculate between R2 and overall R2 for me and include them in the stargazer() output?
To clarify what I mean by between, overall, and within R2, see this answer on StackExchange.
My understanding is that plm only calculates within R2.
I am running a Twoways effects Within Model.
library(plm)
library(stargazer)
# Create some random data
set.seed(1)
x=rnorm(100); fe=rep(rnorm(10),each=10); id=rep(1:10,each=10); ti=rep(1:10,10); e=rnorm(100)
y=x+fe+e
data=data.frame(y,x,id,ti)
# Get plm within R2
reg=plm(y~x,model="within",index=c("id","ti"), effect = "twoways", data=data)
stargazer(reg)
I now also want to include between and overall R2 in the stargazer() output. How can I do that?
To make explicit what I mean by between and overall R2:
# Pooled Version (overall R2)
reg1=lm(y~x)
summary(reg1)$r.squared
# Between R2
y.means=tapply(y,id,mean)[id]
x.means=tapply(x,id,mean)[id]
reg2=lm(y.means~x.means)
summary(reg2)$r.squared
To do this in stargazer, you can use the add.lines argument. However, this adds the lines to the beginning of the summary stats section, and there is no way to alter this without messing with the source code, which is beastly. I much prefer huxtable, which provides a grammar of table building and is much more extensible and customizable.
library(tidyverse)
library(plm)
library(stargazer)
library(huxtable)
# Create some random data
set.seed(1)
x=rnorm(100); fe=rep(rnorm(10),each=10); id=rep(1:10,each=10); ti=rep(1:10,10); e=rnorm(100)
y=x+fe+e
data=data.frame(y,x,id,ti)
# Get plm within R2
reg=plm(y~x,model="within",index=c("id","ti"), effect = "twoways", data=data)
stargazer(reg, type = "text",
add.lines = list(c("Overall R2", round(r.squared(reg, model = "pooled"), 3)),
c("Between R2", round(r.squared(update(reg, effect = "individual", model = "between")), 3))))
#>
#> ========================================
#> Dependent variable:
#> ---------------------------
#> y
#> ----------------------------------------
#> x 1.128***
#> (0.113)
#>
#> ----------------------------------------
#> Overall R2 0.337
#> Between R2 0.174
#> Observations 100
#> R2 0.554
#> Adjusted R2 0.448
#> F Statistic 99.483*** (df = 1; 80)
#> ========================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
# I prefer huxreg, which is much more customizable!
# Create a data frame of the R2 values
r2s <- tibble(
name = c("Overall R2", "Between R2"),
value = c(r.squared(reg, model = "pooled"),
r.squared(update(reg, effect = "individual", model = "between"))))
tab <- huxreg(reg) %>%
# Add new R2 values
add_rows(hux(r2s), after = 4)
# Rename R2
tab[7, 1] <- "Within R2"
tab %>% huxtable::print_screen()
#> ─────────────────────────────────────────────────
#> (1)
#> ─────────────────────────
#> x 1.128 ***
#> (0.113)   
#> ─────────────────────────
#> N 100        
#> Overall R2 0.337    
#> Between R2 0.174    
#> Within R2 0.554    
#> ─────────────────────────────────────────────────
#> *** p < 0.001; ** p < 0.01; * p < 0.05.
#>
#> Column names: names, model1
Created on 2020-04-08 by the reprex package (v0.3.0)
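For readers who want to see where the within R2 comes from, here is a base R sketch of the two-way within transformation on the same simulated data. It assumes a balanced panel, which holds here. The helper demean2 and the object names are illustrative; the result should correspond to the R2 that plm reports for this model, but that correspondence is my expectation rather than something verified against plm in this sketch.

```r
set.seed(1)
x  <- rnorm(100); fe <- rep(rnorm(10), each = 10)
id <- rep(1:10, each = 10); ti <- rep(1:10, 10)
e  <- rnorm(100)
y  <- x + fe + e

# Two-way within transformation (valid as written for a balanced panel):
# subtract individual means and time means, add back the grand mean
demean2 <- function(v) v - ave(v, id) - ave(v, ti) + mean(v)
y.dd <- demean2(y)
x.dd <- demean2(x)

# Regression on the transformed data, no intercept needed
within.fit <- lm(y.dd ~ x.dd - 1)

# Equivalent dummy-variable (LSDV) regression gives the same slope
lsdv.fit <- lm(y ~ x + factor(id) + factor(ti))

# Within R2: share of the transformed variation in y that x explains
within.r2 <- 1 - sum(resid(within.fit)^2) / sum(y.dd^2)
print(coef(within.fit))
print(within.r2)
```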

How can I omit the regression intercept from my results table in stargazer

I run a regression of the type
model <- lm(y~x1+x2+x3, weights = wei, data=data1)
and then create my table
t <- stargazer(model, omit="x2", omit.labels="x1")
but I haven't found a way to omit the intercept results from the table. I need it in the regression, yet I don't want to show it in the table.
Is there a way to do it through stargazer?
I don't have your dataset, but typing omit = c("Constant", "x2") should work.
As a reproducible example (stargazer 5.2)
stargazer::stargazer(
lm(Fertility ~ . ,
data = swiss),
type = "text",
omit = c("Constant", "Agriculture"))
Edit: Add in omit.labels
mdls <- list(
m1 = lm(Days ~ -1 + Reaction, data = lme4::sleepstudy),
m2 = lm(Days ~ Reaction, data = lme4::sleepstudy),
m3 = lm(Days ~ Reaction + Subject, data = lme4::sleepstudy)
)
stargazer::stargazer(
mdls, type = "text", column.labels = c("Omit none", "Omit int.", "Omit int/subj"),
omit = c("Constant", "Subject"),
omit.labels = c("Intercept", "Subj."),
keep.stat = "n")
#>
#> ==============================================
#> Dependent variable:
#> ---------------------------------
#> Days
#> Omit none Omit int. Omit int/subj
#> (1) (2) (3)
#> ----------------------------------------------
#> Reaction 0.015*** 0.027*** 0.049***
#> (0.001) (0.003) (0.004)
#>
#> ----------------------------------------------
#> Intercept No No No
#> Subj. No No No
#> ----------------------------------------------
#> Observations 180 180 180
#> ==============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Created on 2020-05-08 by the reprex package (v0.3.0)
Note that the table should read as follows; the output above appears to be a bug (stargazer 5.2.2).
#> Intercept No Yes Yes
#> Subj. No No Yes
I got a way of doing it. It is not the most clever way, but works.
I just change the omit command to a keep command. In my example above:
library(stargazer)
model <- lm(y~x1+x2+x3, weights = wei, data=data1)
t <- stargazer(model, keep=c("x1","x3"), omit.labels="x1")
However, it's not an efficient way when you have many variables you want to keep in the regression table.
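Since the stargazer documentation describes omit and keep as vectors of regular expressions, a short pattern can match more variables than intended. The base R sketch below only previews plain regex matching with grepl() on a made-up vector of coefficient names; it does not reproduce stargazer's internals, and the intercept is handled separately there (it is addressed as "Constant" in the answer above).

```r
# Hypothetical coefficient names, for illustration only
coef.names <- c("(Intercept)", "x1", "x2", "x3", "x10", "x1:x3")

# An unanchored pattern matches every name that contains "x1"
matched <- coef.names[grepl("x1", coef.names)]

# Anchoring the pattern restricts the match to the exact name
anchored <- coef.names[grepl("^x1$", coef.names)]

print(matched)   # "x1" "x10" "x1:x3"
print(anchored)  # "x1"
```

With many variables to keep, a few anchored patterns can therefore replace a long explicit list.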

Exporting R regression summary for publishable paper

I have multiple regression models in R, which I want to summarize in a nice table format that could be included in the publication. I have all the results ready, but couldn't find a way to export them, and it wouldn't be efficient to do this by hand as I need about 20 tables.
So, one of my models is:
felm1=felm(ROA~BC+size+sizesq+age | stateyeard+industryyeard, data=data)
And I'm getting desired summary in R.
However, what I want for my paper is to have only the following in the table: the estimates with the t-statistic in brackets, and also the significance codes (*, **, etc.).
Is there a way to create any type of table which will include the above? LyX, Excel, Word, .rtf, anything really.
Even better, another model that I have is (with some variables different):
felm2=felm(ROA~BC+BCHHI+size+sizesq+age | stateyeard+industryyeard, data=data)
could I have summary of the two regressions combined in one table (where same variables would be on the same row, and others would produce empty cells)?
Thank you in advance; I'd appreciate any help.
Here is a reproducible example:
x<-rnorm(1:20)
y<-(1:20)/10+x
summary(lm(y~x))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.0539 0.1368 7.702 4.19e-07 ***
x 1.0257 0.1156 8.869 5.48e-08 ***
This is the result in R. I want the result in a table to look like
(Intercept) 1.0539*** (7.702)
X 1.0257*** (8.869)
Is this possible?
The broom package is very good for making regression tables nice for export. Results can then be exported to CSV for tidying up in Excel, or one can use R Markdown and the kable function from knitr to make Word documents (or LaTeX).
require(broom) # for tidy()
require(knitr) # for kable()
x<-rnorm(1:20)
y<-(1:20)/10+x
model <- lm(y~x)
out <- tidy(model)
out
term estimate std.error statistic p.value
1 (Intercept) 1.036583 0.1390777 7.453261 6.615701e-07
2 x 1.055189 0.1329951 7.934044 2.756835e-07
kable(out)
|term | estimate| std.error| statistic| p.value|
|:-----------|--------:|---------:|---------:|-------:|
|(Intercept) | 1.036583| 0.1390777| 7.453261| 7e-07|
|x | 1.055189| 0.1329951| 7.934044| 3e-07|
I should mention that I now use the excellent pixiedust for exporting regression results, as it allows much finer control of the output, letting the user do more in R and less in any other package.
See the vignette on CRAN.
library(dplyr) # for pipe (%>%) command
library(pixiedust)
dust(model) %>%
sprinkle(cols = c("estimate", "std.error", "statistic"), round = 2) %>%
sprinkle(cols = "p.value", fn = quote(pvalString(value))) %>%
sprinkle_colnames("Term", "Coefficient", "SE", "T-statistic",
"P-value")
Term Coefficient SE T-statistic P-value
1 (Intercept) 1.08 0.14 7.44 < 0.001
2 x 0.93 0.14 6.65 < 0.001
For text table, try this:
x<-rnorm(1:20)
y<-(1:20)/10+x
result <- lm(y~x)
library(stargazer)
stargazer(result, type = "text")
results in...
===============================================
Dependent variable:
---------------------------
y
-----------------------------------------------
x 0.854***
(0.108)
Constant 1.041***
(0.130)
-----------------------------------------------
Observations 20
R2 0.777
Adjusted R2 0.765
Residual Std. Error 0.579 (df = 18)
F Statistic 62.680*** (df = 1; 18)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
For multiple models, just pass them all to stargazer (the same model is used twice here for illustration):
stargazer(result, result, type = "text")
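And, just for the sake of producing the requested layout:
addStars <- function(coeffs) {
  # estimate followed by significance stars based on the p-value (column 4)
  fb <- format(coeffs[, 1], digits = 4)
  s <- cut(coeffs[, 4],
           breaks = c(-1, 0.01, 0.05, 0.1, 1),
           labels = c("***", "**", "*", ""))
  paste0(fb, s)
}
addPar <- function(coeffs) {
  # t-statistic (column 3) in parentheses, as requested in the question
  tv <- format(coeffs[, 3], digits = 4)
  paste0("(", tv, ")")
}
textTable <- function(result) {
  # summary() returns the coefficient matrix; result$coefficients of an lm
  # object is only a named vector and cannot be indexed by column
  coeffs <- summary(result)$coefficients
  lab <- rownames(coeffs)
  sb <- addStars(coeffs)
  pse <- addPar(coeffs)
  out <- cbind(lab, sb, pse)
  colnames(out) <- NULL
  out
}
print(textTable(result), quote = FALSE)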
You can use xtable::xtable, Hmisc::latex, Gmisc::htmltable etc. once you have a text table. Someone posted a link in comments. :)
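The question also asks for two models side by side, with shared variables on the same row and empty cells elsewhere. A base R sketch of that idea follows, using lm() on the built-in mtcars data as a stand-in for the felm models (an assumption; felm objects may need a different way to extract the coefficient matrix). The helper format_model and the file name are hypothetical.

```r
# Format one model as "estimate + stars (t-statistic)" per term
format_model <- function(fit) {
  ct <- summary(fit)$coefficients
  stars <- as.character(cut(ct[, 4],
                            breaks = c(-Inf, 0.01, 0.05, 0.1, Inf),
                            labels = c("***", "**", "*", "")))
  data.frame(term = rownames(ct),
             cell = sprintf("%.3f%s (%.3f)", ct[, 1], stars, ct[, 3]),
             stringsAsFactors = FALSE)
}

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Full outer join on the term name keeps variables that appear in only one model
tab <- merge(format_model(fit1), format_model(fit2),
             by = "term", all = TRUE, suffixes = c(".1", ".2"))
tab[is.na(tab)] <- ""   # empty cell where a model lacks the variable

# Export for further editing in Excel or Word
out.file <- file.path(tempdir(), "regression_table.csv")
write.csv(tab, out.file, row.names = FALSE)
print(tab)
```

The resulting data frame can also be passed to knitr::kable() or xtable::xtable() as mentioned in the answers above.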
