lapply to estimate with many dependent variables then tabulate with Stargazer - r

I'm trying to (1) estimate multiple models where only the dependent variable changes, and (2) tabulate the results with the stargazer package.
The following code works, but I have to repeat a line of code for each model:
library(stargazer)
data(mtcars)
reg1 <- lm(mpg ~ am + gear + carb, data=mtcars)
reg2 <- lm(cyl ~ am + gear + carb, data=mtcars)
reg3 <- lm(disp ~ am + gear + carb, data=mtcars)
stargazer(reg1, reg2, reg3,
title="Regression Results", type="text",
df=FALSE, digits=3)
You can see that the (trimmed) output has the correct headings for the dependent variables (mpg, cyl, disp):
Regression Results
==================================================
Dependent variable:
------------------------------
mpg cyl disp
(1) (2) (3)
--------------------------------------------------
am 3.545* -0.176 -40.223
(1.897) (0.615) (48.081)
If I use lapply and paste, it ends up changing the headings of the dependent variables in stargazer:
dependents <- c('mpg', 'cyl', 'disp')
outs <- lapply(dependents, function(x) {
fit <- lm(paste(x,'~', 'am + gear + carb'), data=mtcars)})
stargazer(outs[[1]], outs[[2]], outs[[3]],
title="Regression Results", type="text",
df=FALSE, digits=3)
gives the output where x is the heading for the dependent variables:
Regression Results
==================================================
Dependent variable:
------------------------------
x
(1) (2) (3)
--------------------------------------------------
am 3.545* -0.176 -40.223
(1.897) (0.615) (48.081)
Is there any way for me to fix this? Thank you.

If you create the formula before you run the regression, it should work. I just separated the formula creation from the regression:
dependents <- c('mpg', 'cyl', 'disp')
outs <- lapply(dependents, function(x) {
  formula <- as.formula(paste(x, '~', 'am + gear + carb'))
  fit <- lm(formula, data=mtcars)
})
stargazer(outs[[1]], outs[[2]], outs[[3]],
title="Regression Results", type="text",
df=FALSE, digits=3)
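A variation on the answer above, as a minimal sketch rather than the answerer's own method: reformulate() builds each formula from variable names, and eval(bquote(...)) splices the formula into the lm() call so that the stored call contains the actual formula instead of a variable name. The names rhs, dv and fitted_dvs are chosen here for illustration. How stargazer derives its headings was not checked here; the sketch only confirms what each fitted model records.

```r
# Fit one model per response variable; only the left-hand side changes.
dependents <- c("mpg", "cyl", "disp")
rhs <- c("am", "gear", "carb")   # illustrative name for the shared predictors

outs <- lapply(dependents, function(dv) {
  fml <- reformulate(rhs, response = dv)   # e.g. mpg ~ am + gear + carb
  # bquote() splices the formula object into the call, so the stored call
  # holds the formula itself rather than the symbol `fml`.
  eval(bquote(lm(.(fml), data = mtcars)))
})
names(outs) <- dependents

# Response variable recorded in each fitted model.
fitted_dvs <- vapply(outs, function(m) all.vars(formula(m))[1], character(1))
print(fitted_dvs)
```

The resulting models can be passed to stargazer in the same way as in the answer, for example stargazer(outs[[1]], outs[[2]], outs[[3]], type="text").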

Related

Iterating and looping over multiple columns in glm in r using a name from another variable

I am trying to iterate over multiple columns for a glm function in R.
View(mtcars)
names <- names(mtcars[-c(1,2)])
for(i in 1:length(names)){
print(paste0("Starting iterations for ",names[i]))
model <- glm(mpg ~ cyl + paste0(names[i]), data=mtcars, family = gaussian())
summary(model)
print(paste0("Iterations for ",names[i], " finished"))
}
However, I am getting the following error:
[1] "Starting iterations for disp"
Error in model.frame.default(formula = mpg ~ cyl + paste0(names[i]), data = mtcars, :
variable lengths differ (found for 'paste0(names[i])')
I'm not sure how to correct this.
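Neither mpg ~ cyl + paste0(names[i]) nor mpg ~ cyl + names[i] is valid formula syntax: the term is evaluated as a length-one character vector rather than looked up as a column, which is why the error says "variable lengths differ". Use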
reformulate(c("cyl", names[i]), "mpg")
instead, which dynamically creates a formula from variable names.
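To make the difference concrete, here is a small sketch. The vector is called vars here (an illustrative name) to avoid masking the base function names().

```r
vars <- names(mtcars[-c(1, 2)])
i <- 1

# Inside a formula, paste0(vars[i]) is evaluated as data:
# a character vector of length 1, while mtcars has 32 rows.
print(length(paste0(vars[i])))
print(nrow(mtcars))

# reformulate() turns the variable names into an actual formula instead.
fml <- reformulate(c("cyl", vars[i]), response = "mpg")
print(fml)
```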
Since you need to build your model formula dynamically from a string, you need as.formula. Alternatively, consider reformulate, which takes the RHS variable names and the response:
...
fml <- reformulate(c("cyl", names[i]), "mpg")
model <- glm(fml, data=mtcars, family = gaussian())
print(summary(model))  # print() is needed: results are not auto-printed inside a for loop
...
glm takes a formula, which you can create using as.formula():
predictors <- names(mtcars[-c(1,2)])
for (predictor in predictors) {
  print(paste0("Starting iterations for ", predictor))
  model <- glm(as.formula(paste0("mpg ~ cyl + ", predictor)),
               data=mtcars,
               family = gaussian())
  print(summary(model))
  print(paste0("Iterations for ", predictor, " finished"))
}
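If the fitted models are needed after the loop, one option is to collect them in a named list with lapply. This is only a sketch building on the answers above; models and added_coefs are illustrative names.

```r
# All columns except the response (mpg) and the fixed covariate (cyl).
predictors <- setdiff(names(mtcars), c("mpg", "cyl"))

# One gaussian glm per additional predictor, stored under the predictor's name.
models <- lapply(setNames(predictors, predictors), function(p) {
  glm(reformulate(c("cyl", p), response = "mpg"),
      data = mtcars, family = gaussian())
})

# Coefficient of the added predictor in each model.
added_coefs <- vapply(predictors, function(p) coef(models[[p]])[[p]], numeric(1))
print(added_coefs)
```

Individual fits remain available afterwards, for example summary(models[["wt"]]).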

margins.plot: using the 'which' argument to choose which margins to include in plot

I am trying to plot marginal effects in R based on a logistic regression. For example:
data <- mtcars
mod <- glm(am ~ cyl + hp + wt + mpg, family = binomial, data = data)
library(margins)
marg <- margins(mod, atmeans = TRUE)
summary(marg)
I can run the margins plot command:
plot(marg)
which plots marginal effects and confidence intervals for all of the IVs. I only want to include cyl and hp, my explanatory variables of interest, in the plot. According to the R documentation, this can be accomplished using the 'which' argument, which takes a character vector. However, the documentation doesn't say how to use this argument. Does anyone know how to use the 'which' argument to ask margins.plot to plot only selected marginal effects? Unfortunately, the margins plot help page, linked above, does not have any examples.
Before plotting, we can specify the variables of interest with the variables option of the margins() function.
mod <- glm(am ~ cyl + hp + wt + mpg, family=binomial, data=mtcars)
library(margins)
marg <- margins(mod, variables=c("cyl", "hp"))
plot(marg)
This gives a plot showing only the marginal effects of cyl and hp.

create a variable to replace variables in a model in R

When writing a statistical model, I usually use many covariates to adjust the model, so I need to write out the same variables again and again. Even though I could copy and paste, the model looks very long. Could I create a variable that stands in for many variables? E.g.:
fm <- lm(y ~ a+b+c+d+e, data)
I would like to create a variable such as model1 = a+b+c+d+e, so that the model looks like:
fm <-lm(y ~ model1, data)
I tried many ways, such as model1 <- c(a+b+c+d), but none of them worked.
Could someone help me with this?
How about saving it as a formula?
model <- ~a+b+c+d
You can then extract the terms using terms, or update the formula using update.
Example:
model <- mpg ~ disp + wt + cyl
lm(model, mtcars)
## Call:
## lm(formula = model, data = mtcars)
##
## Coefficients:
## (Intercept) disp wt cyl
## 41.107678 0.007473 -3.635677 -1.784944
model <- update(model, ~. + qsec)
lm(model, mtcars)
## Call:
## lm(formula = model, data = mtcars)
##
## Coefficients:
## (Intercept) disp wt cyl qsec
## 30.17771 0.01029 -4.55318 -1.24109 0.55277
Edit:
As Kristoffer Winther Balling mentioned in the comments, a cleverer way to do this is to save the formula as a string (e.g. "mpg ~ disp + wt + cyl") and then use as.formula. You can then use familiar paste or other string manipulation functions to change the formula.
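Both routes can be sketched side by side on mtcars. The covariate set and the names covars, base_rhs, fm_mpg, fm_qsec and fm_str are illustrative and do not come from the question.

```r
# The adjustment set, written once.
covars <- c("disp", "wt", "cyl")

# Route 1: keep a one-sided formula and attach a response with update().
base_rhs <- reformulate(covars)                        # ~disp + wt + cyl
fm_mpg  <- lm(update(base_rhs, mpg ~ .),  data = mtcars)
fm_qsec <- lm(update(base_rhs, qsec ~ .), data = mtcars)

# Route 2: build the formula as a string, then convert it with as.formula().
fml_str <- paste("mpg ~", paste(covars, collapse = " + "))
fm_str  <- lm(as.formula(fml_str), data = mtcars)

print(formula(fm_mpg))
print(formula(fm_qsec))
print(all.equal(coef(fm_mpg), coef(fm_str)))
```

Either way the covariates are listed once; adding a new one means editing covars only.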

Dependent variable labels in stargazer tables

When passing a character vector to the dep.var.labels argument to stargazer, I expected the dependent variable labels to consist of this vector. But this only seems to happen when the models are of different types.
data(mtcars)
m0 <- lm(mpg ~ hp, data=mtcars)
m1 <- lm(mpg ~ wt, data=mtcars)
m2 <- glm(cyl ~ disp, data=mtcars)
## Only shows the label 'foo' for both models.
stargazer(m0, m1, dep.var.labels=c('foo','bar'))
## shows 'foo' and 'bar' as labels.
stargazer(m0, m2, dep.var.labels=c('foo','bar'))
How can I get stargazer to show different dependent variable labels even when the models are of the same type?
stargazer shows the same dependent variable label because your dependent variables are the same, not because you are using the same kind of statistical model. You may be interested in the column.labels argument:
data(mtcars)
m0 <- lm(mpg ~ hp, data=mtcars)
m1 <- lm(mpg ~ wt, data=mtcars)
m2 <- glm(cyl ~ disp, data=mtcars)
## Shows 'foo' and 'bar' as column labels under the shared dependent variable.
stargazer(m0, m1, column.labels=c('foo','bar'), type="text")
## Also shows 'foo' and 'bar' as column labels.
stargazer(m0, m2, column.labels=c('foo','bar'), type="text")

Linear regression with interaction fails in the rms-package

I'm playing around with interaction in the formula. I wondered if it's possible to do a regression with interaction for one of the two dummy variables. This seems to work in regular linear regression using the lm() function but with the ols() function in the rms package the same formula fails. Anyone know why?
Here's my example:
data(mtcars)
mtcars$gear <- factor(mtcars$gear)
regular_lm <- lm(mpg ~ wt + cyl + gear + cyl:gear, data=mtcars)
summary(regular_lm)
regular_lm <- lm(mpg ~ wt + cyl + gear + cyl:I(gear == "4"), data=mtcars)
summary(regular_lm)
And now the rms example:
library(rms)
dd <- datadist(mtcars)
options(datadist = "dd")
regular_ols <- ols(mpg ~ wt + cyl + gear + cyl:gear, data=mtcars)
regular_ols
# Fails with:
# Error in if (!length(fname) || !any(fname == zname)) { :
# missing value where TRUE/FALSE needed
regular_ols <- ols(mpg ~ wt + cyl + gear + cyl:I(gear == "4"), data=mtcars)
This experiment might not be the wisest thing to do statistically, as the estimates seem to change significantly, but I'm curious why ols() fails, since it should use the "same fitting routines used by lm".
I don't know exactly, but it has to do with the way the formula is evaluated rather than with the way the fit is done once the model has been translated. Using traceback() shows that the problem occurs within Design(eval.parent(m)); using options(error=recover) gets you to the point where you can see that
Browse[1]> fname
[1] "wt" "cyl" "gear"
Browse[1]> zname
[1] NA
in other words, zname is some internal variable that hasn't been set right because the Design function can't quite handle defining the interaction between cylinders and the (gear==4) dummy on the fly.
This works though:
mtcars$cylgr <- with(mtcars,interaction(cyl,gear == "4"))
regular_ols <- ols(mpg ~ wt + cyl + gear + cylgr, data=mtcars)
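Note that interaction(cyl, gear == "4") builds a factor with one level per cyl/gear combination, so the workaround is not the same parameterization as cyl:I(gear == "4"). If the original term is what is wanted, another way to precompute it is a plain numeric product column. The sketch below only checks this against lm(); whether ols() accepts the column was not verified here, and cars and cyl_gear4 are hypothetical names.

```r
# Work on a copy so mtcars itself stays untouched.
cars <- mtcars
cars$gear <- factor(cars$gear)

# Interaction written inline, as in the question.
fit_inline <- lm(mpg ~ wt + cyl + gear + cyl:I(gear == "4"), data = cars)

# The same term precomputed as a numeric column: cyl where gear is 4, else 0.
cars$cyl_gear4 <- cars$cyl * (cars$gear == "4")
fit_precomp <- lm(mpg ~ wt + cyl + gear + cyl_gear4, data = cars)

# Both fits describe the same model space, so the fitted values should agree.
print(all.equal(unname(fitted(fit_inline)), unname(fitted(fit_precomp))))
```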
