How to hide certain rows from modelsummary regression output in R

I have a regression output as follows:
regression1 <- lm(cnt ~ temperature + weathersit + humidity + windvelocity, data=captialbikedata)
modelsummary(regression1)
I am using modelsummary to display it as a table in markdown.
I want to hide the following rows of the regression output:
AIC
BIC
Log.Lik.
F
How would I do that?

The gof_omit argument allows you to omit goodness-of-fit statistics, which is what modelsummary calls all the statistics reported in the bottom section of the table.
This argument accepts regular expressions, which allow you to use partial matches and a variety of other tricks. One nice trick is to use the vertical bar (|, meaning "OR") to say that you want to omit any one of several patterns.
library("modelsummary")
mod <- lm(hp ~ mpg, mtcars)
modelsummary(mod, gof_omit = "AIC|BIC|Log|F")
              Model 1
(Intercept)   324.082
              (27.433)
mpg           -8.830
              (1.310)
Num.Obs.      32
R2            0.602
R2 Adj.       0.589
In addition, you can omit coefficients using the coef_omit argument in a similar way. Finally, you can omit the standard errors in parentheses by setting statistic=NULL.
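For example, building on the model above, dropping the intercept row and the parenthesized standard errors would look something like this (coef_omit is matched as a regular expression against the coefficient names):
# drop the intercept row; statistic = NULL suppresses the standard errors
modelsummary(mod, gof_omit = "AIC|BIC|Log|F", coef_omit = "Intercept", statistic = NULL)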

Related

Marginal effects plot of PLM

I've run an individual fixed-effects panel model in R using the plm package. I now want to plot the marginal effects.
However, neither plot_model() nor effect_plot() works for plm objects. plot_model() works for type = "est" but not for type = "pred".
My online search so far only suggests using ggplot (which, however, only displays OLS regressions, not fixed effects) or outdated functions (e.g., sjp.lm()).
Does anyone have any recommendations for how I can visualize the effects of plm objects?
IFE_Aut_uc <- plm(LoC_Authorities_rec ~ Compassion_rec, index = c("id","wave"), model = "within", effect = "individual", data = D3_long2)
summary(IFE_Aut_uc)
plot_model(IFE_Aut_uc, type = "pred")
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 50238, 82308
and:
effect_plot(IFE_Pol_uc, pred = Compassion_rec)
Error in `stop_wrap()`:
! ~ does not appear to be a one- or two-sided formula.
LoC_Politicians_rec does not appear to be a one- or two-sided formula.
Compassion_rec does not appear to be a one- or two-sided formula.
Backtrace:
1. jtools::effect_plot(IFE_Pol_uc, pred = Compassion_rec)
2. jtools::get_data(model, warn = FALSE)
4. jtools:::get_lhs(formula)
Edit 2022-08-20: The latest version of plm on CRAN now includes a predict() method for within models. In principle, the commands illustrated below using fixest should now work with plm as well.
In my experience, plm models are kind of tricky to deal with, and many of the packages which specialize in “post-processing” fail to handle these objects properly.
One alternative would be to estimate your “within” model using the fixest package and to plot the results using the marginaleffects package. (Disclaimer: I am the marginaleffects author.)
Note that many of the models estimated by plm are officially supported and tested with marginaleffects (e.g., random effects, Amemiya, Swamy-Arora). However, this is not the case for this specific "within" model, which is even trickier than the others to support.
First, we estimate two models to show that the plm and fixest versions are equivalent:
library(plm)
library(fixest)
library(marginaleffects)
library(modelsummary)

data("EmplUK")

mod1 <- plm(
    emp ~ wage * capital,
    index = c("firm", "year"),
    model = "within",
    effect = "individual",
    data = EmplUK)

mod2 <- feols(
    emp ~ wage * capital | firm,
    se = "standard",
    data = EmplUK)

models <- list("PLM" = mod1, "FIXEST" = mod2)
modelsummary(models)
                  PLM        FIXEST
wage              0.000      0.000
                  (0.034)    (0.034)
capital           2.014      2.014
                  (0.126)    (0.126)
wage × capital    -0.043     -0.043
                  (0.004)    (0.004)
Num.Obs.          1031       1031
R2                0.263      0.986
R2 Adj.           0.145      0.984
R2 Within                    0.263
R2 Within Adj.               0.260
AIC               4253.9     4253.9
BIC               4273.7     4273.7
RMSE              1.90       1.90
Std.Errors                   IID
FE: firm                     X
Now, we use the marginaleffects package to plot the results. There are two main functions for this:
plot_cap(): plot conditional adjusted predictions. How does my predicted outcome change as a function of a covariate?
plot_cme(): plot conditional marginal effects. How does the slope of my model with respect to one variable (i.e., a derivative or “marginal effect”) change with respect to another variable?
See the website for definitions and details: https://vincentarelbundock.github.io/marginaleffects/
plot_cap(mod2, condition = "capital")
plot_cme(mod2, effect = "wage", condition = "capital")
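Note: newer releases of marginaleffects have renamed these plotting functions (plot_cap() became plot_predictions(), and plot_cme() became plot_slopes()). If the calls above fail with a recent version, the equivalents should be:
# same plots with the renamed functions in recent marginaleffects versions
plot_predictions(mod2, condition = "capital")
plot_slopes(mod2, variables = "wage", condition = "capital")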

In R, the output of my linear model shows a positive correlation but my ggplot graph indicates a negative correlation?

I'm trying to identify how Sycamore_biomass affects the day on which a bird lays its first egg (First_egg). My model output indicates a weak positive relationship, i.e. as sycamore biomass increases, the day the first egg is laid should increase (i.e. be later). Note that I'm including confounding factors in this model:
Call:
lm(formula = First_egg ~ Sycamore_biomass + Distance_to_road +
    Distance_to_light + Anthropogenic_cover + Canopy_cover, data = egglay_date)

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)          39.61055   16.21391   2.443   0.0347 *
Sycamore_biomass      0.15123    0.53977   0.280   0.7851
Distance_to_road      0.01773    0.46323   0.038   0.9702
Distance_to_light    -0.02626    0.44225  -0.059   0.9538
Anthropogenic_cover  -0.13879    0.28306  -0.490   0.6345
Canopy_cover         -0.30219    0.20057  -1.507   0.1628
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.99 on 10 degrees of freedom
Multiple R-squared: 0.2363,    Adjusted R-squared: -0.1455
F-statistic: 0.6189 on 5 and 10 DF,  p-value: 0.6891
However, when I plot this using ggplot, the regression line indicates a negative relationship. Can anyone help me out with what is happening here?
ggplot(egglay_date, aes(x = Sycamore_biomass, y = First_egg)) +
  geom_point(shape = 19, alpha = 1/4) +
  geom_smooth(method = lm)
[Plot: Sycamore biomass vs. first egg date, with a downward-sloping regression line]
I suppose this is because you are looking at the raw data you fed into the model, not at the model's predictions. The plot doesn't "isolate" a single predictor; it shows the result of all predictors acting on the response variable together, so the effect of this one predictor is presumably "overshadowed" by the effects of the others.
To look at the effect of a single predictor on its own, you need to predict new values from the model while holding all the other predictors fixed. You can try something along the lines of:
preds <- predict(yourmodel, newdata = data.frame(
  "Sycamore_biomass" = 0:25,
  "Distance_to_road" = mean(egglay_date$Distance_to_road),
  "Distance_to_light" = mean(egglay_date$Distance_to_light),
  "Anthropogenic_cover" = mean(egglay_date$Anthropogenic_cover),
  "Canopy_cover" = mean(egglay_date$Canopy_cover)))

new_data <- data.frame(
  "Sycamore_biomass" = 0:25,
  "First_egg" = preds)

ggplot(new_data, aes(x = Sycamore_biomass, y = First_egg)) +
  geom_point(shape = 19, alpha = 1/4) +
  geom_smooth(method = lm)
This should give you the predictions of your model when only considering the effect of the one predictor.
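One small note on that last step: since preds already lie exactly on the fitted line (the model is linear and all other predictors are held at their means), the geom_smooth(method = lm) layer just refits a line through points that are already collinear. Drawing the predictions directly is equivalent and arguably clearer:
# the predictions are already linear, so a simple line layer suffices
ggplot(new_data, aes(x = Sycamore_biomass, y = First_egg)) +
  geom_line()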
The answer to your question is quite simple (but I understand why it may seem complex at first).
First off, your model indicates a positive relationship because it includes all your other variables: the coefficient on Sycamore_biomass is its effect after the other predictors are accounted for. Keep in mind that the best-fit line is the one that minimizes the sum of squared residuals, and with several predictors that fit happens across all of them jointly, not along a single axis.
Since you didn't provide your data (please do in future posts, or at least something to work with), I will illustrate my point with the mtcars data built into R:
data("mtcars")
df <- mtcars
This dataset has many variables; to see them all, just type names(df).
Let's just work with three of them to see whether miles per gallon (mpg) is explained by:
1) cyl : # of cylinders
2) hp : horse power
3) drat : rear axle ratio
attach(df)
model <- lm(mpg ~ cyl + hp + drat)
summary(model)
Let's say I just want to plot the relationship between cylinders and mpg (for you, it would be Sycamore biomass and first-egg date). Here, from our model summary, we see that the relationship is negative (a negative estimate, a.k.a. coefficient), and that the intercept is 22.5.
So I do what you just did and plot mpg ~ cyl (without considering my other variables):
plot(mpg ~ cyl, pch = 15, col = "blue", cex = 2, cex.axis = 2,
     ylab = "MPG", xlab = "Number of Cylinders", cex.lab = 1.5)
abline(lm(mpg ~ cyl), lwd = 2, col = "red")
First off, we see that the y-intercept is not 22.5, but rather above 25.
If I were to do the math from the first model, with 4 cylinders I should predict:
22.51406 + (4 * -1.3606) = 17.07
So let's see if our prediction is correct on our graph.
Definitely not.
So let's run a new model (which is what you need to do), where we model just mpg ~ cyl:
reduced_model <- lm(mpg ~ cyl)
summary(reduced_model)
See how the intercept and coefficient (estimates) changed? Yours will too when you run a reduced model. Let's see if the plot now makes sense, following the same steps as above and predicting for 4 cylinders:
37.8846 + (4 * -2.8758)  # 26.38
plot(mpg ~ cyl, pch = 15, col = "blue", cex = 2, cex.axis = 2,
     ylab = "MPG", xlab = "Number of Cylinders", cex.lab = 1.5)
abline(lm(mpg ~ cyl), lwd = 2, col = "red")
abline(h = 26.38, v = 4, lwd = 2, col = "green")
Looks like everything checks out.
Summary: you need to run a simple model with just your two variables of interest if you want to correctly interpret your plot.
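Applied back to the original question, the line drawn by geom_smooth(method = lm) corresponds to this reduced bivariate model, which can be fit directly to confirm its negative slope:
# the bivariate fit matches the geom_smooth() line in the original plot
reduced <- lm(First_egg ~ Sycamore_biomass, data = egglay_date)
summary(reduced)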

How do I get standardized beta coefficients using modelsummary in R? And how to omit more than one variable using modelsummary?

I am using modelsummary to create a table. I would like the estimates (regression coefficients) to be standardized. I used lm.beta(), but estimate = is still giving me the non-standardized coefficients. Also, I would like to use coef_omit to take out more than one variable. How might I do this?
This solution only works using the development version of modelsummary. This version should be on CRAN in the next few weeks, but you can install it now:
library(remotes)
install_github("vincentarelbundock/modelsummary")
Under the hood, modelsummary uses the parameters package to extract parameters from model objects. As you can see here, that package can apply several different kinds of standardization. You can pass a standardize argument directly to modelsummary(), which will then pass it down to parameters.
The coef_omit argument accepts regular expressions. See the documentation to learn how to omit several coefficients, e.g. coef_omit = "x|y|z".
For example:
library(modelsummary)
mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)
modelsummary(mod, standardize = "basic")
              Model 1
(Intercept)   0.000
              (0.000)
hp            -0.273
              (0.175)
factor(cyl)6  -0.416
              (0.114)
factor(cyl)8  -0.713
              (0.195)
Num.Obs.      32
R2            0.754
R2 Adj.       0.727
AIC           169.9
BIC           177.2
F             28.585
RMSE          2.94
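To answer the second part of the question, the two arguments can be combined. For instance, a single regex should drop the intercept and both cylinder dummies from the table above:
# one regex omits several coefficients at once
modelsummary(mod, standardize = "basic", coef_omit = "Intercept|cyl")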

How to transpose a regression output with modelsummary package?

I JUST found out about this amazing R package, modelsummary.
It doesn't seem like it offers an ability to transpose regression outputs.
I know that you cannot do a transposition within kableExtra, which is my go-to for ordinary table outputs in R. Since modelsummary relies on kableExtra for post-processing, I'm wondering if this is possible. Has anyone else figured it out?
Ideally I'd like to preserve the stars of my regression output.
This is available in Stata.
Thanks in advance!
You can flip the order of the terms in the group argument formula. See documentation here and also here for many examples.
library(modelsummary)

mod <- list(
    lm(mpg ~ hp, mtcars),
    lm(mpg ~ hp + drat, mtcars))

modelsummary(mod, group = model ~ term)
           (Intercept)   hp        drat
Model 1    30.099        -0.068
           (1.634)       (0.010)
Model 2    10.790        -0.052    4.698
           (5.078)       (0.009)   (1.192)
The main problem with this strategy is that there is not (yet) an automatic way to append goodness of fit statistics. So you would probably have to rig something up by creating a data.frame and feeding it to the add_columns argument. For example:
N <- sapply(mod, function(x) get_gof(x)$nobs)
N <- data.frame(N = c(N[1], "", N[2], ""))

modelsummary(mod,
    group = model ~ term,
    add_columns = N,
    align = "lcccc")
           (Intercept)   hp        drat      N
Model 1    30.099        -0.068              32
           (1.634)       (0.010)
Model 2    10.790        -0.052    4.698     32
           (5.078)       (0.009)   (1.192)
If you have ideas about the best default behavior for goodness of fit statistics, please file a feature request on Github.
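Note: in current releases of modelsummary, the group argument has been replaced by shape. If the calls above no longer work, the transposed table should be obtainable with something like:
# newer modelsummary versions use `shape` instead of `group`
modelsummary(mod, shape = model ~ term)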

How to output several variables in the same row using stargazer in R

I would like to output the interaction terms from several regressions in the same row and call it "Interaction". So far what I have is that the interaction terms show up in two different rows called "Interaction" (see code below).
This question has already been asked here, but my score isn't high enough yet to upvote it or comment on it: https://stackoverflow.com/questions/28859569/several-coefficients-in-one-line.
library("stargazer")
stargazer(attitude)
stargazer(attitude, summary=FALSE)
# 2 OLS models with Interactions
linear.1 <- lm(rating ~ complaints + privileges + complaints*privileges
, data=attitude)
linear.2 <- lm(rating ~ complaints + learning + complaints*learning, data=attitude)
stargazer(linear.1, linear.2, title="Regression Results", type="text",
covariate.labels=c("Complaints", "Privileges", "Interaction", "Learning", "Interaction"))
Thank you for your help.
I think this is not natively supported because it is not a good idea. You're asking to obfuscate the meaning of the numbers in your table, which won't help your reader.
That caveat now stated, you can do this by modifying the contents of the lm objects:
# copy objects just for demonstration
m1 <- linear.1
m2 <- linear.2
# see names of coefficients
names(m1$coefficients)
# [1] "(Intercept)" "complaints" "privileges" "complaints:privileges"
names(m2$coefficients)
# [1] "(Intercept)" "complaints" "learning" "complaints:learning"
# replace names
names(m1$coefficients)[names(m1$coefficients) == "complaints:privileges"] <- "interaction"
names(m2$coefficients)[names(m2$coefficients) == "complaints:learning"] <- "interaction"
The result:
> stargazer(m1, m2, title="Regression Results", type="text")

Regression Results
==========================================================
                                   Dependent variable:
                               ----------------------------
                                         rating
                                   (1)            (2)
----------------------------------------------------------
complaints                       1.114**         0.307
                                 (0.401)        (0.503)
privileges                        0.434
                                 (0.570)
learning                                        -0.171
                                                (0.570)
interaction                      -0.007          0.006
                                 (0.008)        (0.009)
Constant                         -7.737         31.203
                                (27.409)       (31.734)
----------------------------------------------------------
Observations                       30             30
R2                                0.692          0.713
Adjusted R2                       0.657          0.680
Residual Std. Error (df = 26)     7.134          6.884
F Statistic (df = 3; 26)        19.478***      21.559***
==========================================================
Note:                          *p<0.1; **p<0.05; ***p<0.01
In case anyone is wondering, I needed this for a different purpose, with models fit by felm() (from the lfe package). There, the coefficient names live in the row names of two matrices, so the renaming looks like this:
reg <- felm(....)
rownames(reg$coefficients)[rownames(reg$coefficients) == 'oldname'] <- 'newname'
rownames(reg$beta)[rownames(reg$beta) == 'oldname'] <- 'newname'
This seems to work in the majority of cases, although I've had issues with it at times. It is needed when IV is used with felm: while it is nice to differentiate between variables fit with and without instruments, the resulting tables come out cumbersome when compared with other models, and this renaming helps.
