Using list of LM estimates as stargazer input - r

I'm trying to use stargazer on several LM estimates at once, say "OLS1",...,"OLS5".
I would usually insert them as separate arguments at the beginning of the stargazer input. What I'm looking for is a way to pass them all as a single argument, namely a list that contains them. Something like
stargazer(list,...)
The stargazer documentation for its arguments states that
one or more model objects (for regression analysis tables) or data frames/vectors/matrices (for summary statistics, or direct output of content). They can also be included as lists (or even lists within lists).
I was wondering what is the correct way to gather LM estimates in a list so that this would work. When I just save the results in a list I get the following error
Error in list.of.objects[[i]] : subscript out of bounds
I should mention that I create the objects storing the estimates using assign, e.g.:
assign(some_string,lm(...))
So what I have is a string, called some_string, and I want to put the LM result named by some_string inside a list. Using get doesn't help with that.

EDIT: I think you want mget
library(stargazer)
Y <- rnorm(100)
X <- rnorm(100)
assign("string_1", lm(Y ~ X))
assign("string_2", lm(Y ~ X))
my_list <- mget(x = c("string_1", "string_2"))
stargazer(my_list)
works for me?
library(stargazer)
Y <- rnorm(100)
X <- rnorm(100)
fit_1 <- lm(Y ~ X)
fit_2 <- lm(Y ~ X)
stargazer(list(fit_1, fit_2))
Did you name your list list? Maybe it's grabbing the function?
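A minimal sketch of an alternative that avoids assign altogether, assuming the models differ only in their formula: fit them with lapply into a named list, or collect existing objects by name with mget. The data frame dat and the model names OLS1 to OLS5 are made up for illustration. The stargazer call is left as a comment because it needs the package installed; per the documentation quoted in the question, a list of model objects should be accepted.

```r
set.seed(42)
dat <- data.frame(Y = rnorm(100), X1 = rnorm(100), X2 = rnorm(100))

# One formula per model; the names become the names of the list
formulas <- list(
  OLS1 = Y ~ X1,
  OLS2 = Y ~ X2,
  OLS3 = Y ~ X1 + X2
)

# lapply keeps the names, so fits is a named list of lm objects
fits <- lapply(formulas, function(f) lm(f, data = dat))

# If the models already exist as separate objects, mget collects them by name
OLS4 <- lm(Y ~ X1, data = dat)
OLS5 <- lm(Y ~ X2, data = dat)
by_name <- mget(c("OLS4", "OLS5"))

# Either list can then be passed as a single argument, e.g.
# stargazer::stargazer(fits, type = "text")  # requires the stargazer package
```

Keeping the fits in a list from the start means there are no name strings to look up later.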

Related

using lm function in R with a variable name in a loop

I am trying to create a simple linear model in R inside a for loop, where one of the variables is specified as a parameter and looped through, creating a different model on each pass of the loop. The following does NOT work:
model <- lm(test_par[i] ~ weeks, data=all_data_plant)
If I tried the same model with the "test_par[i]" replaced with the variable's explicit name, it works just as expected:
model <- lm(weight_dry ~ weeks, data=all_data_plant)
I tried reformulate and paste without success. Any thoughts?
Maybe try something like this:
# Before running the next line, set:
#   n to the column position of the first variable
#   m to the column position of the last variable
lm_models <- lapply(n:m, function(x) lm(all_data_plant[,x] ~ weeks, data=all_data_plant))
You can pass the "formula" argument of lm() as a character string built with paste(). Here is a working example:
data("trees")
test_par <- names(trees)
model <- lm(Girth ~ Height, data = trees)
model <- lm("Girth ~ Height", data = trees) # character formula works
model <- lm(paste(test_par[1], "~ Height"), data=trees)
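As a sketch of how this extends to a loop, the example below fits one model per response variable using reformulate(), which builds the formula from strings. The choice of Girth and Volume as responses and Height as the predictor is just an assumption for illustration with the trees data.

```r
data("trees")
responses <- c("Girth", "Volume")

# reformulate() builds `<response> ~ Height` from character strings
models <- lapply(responses, function(v) {
  lm(reformulate("Height", response = v), data = trees)
})
names(models) <- responses

# Each model keeps a readable formula rather than an indexing expression
lapply(models, formula)
```

Because the formula names the variable directly, summary() and predict() on each model behave as if the formula had been typed by hand.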

Error including correlation structure in function with gamm

I am trying to create my own function that contains 1.) the mgcv gamm function and 2.) a nested autocorrelation (ARMA) argument. I am getting an error when I try to run the function like this:
df <- AirPassengers
df <- as.data.frame(df)
df$month <- rep(1:12)
df$yr <- rep(1949:1960,each=12)
df$datediff <- 1:nrow(df)
try_fxn1 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+s(datediff,bs="ts",k=20), data=dfz,correlation = corARMA(form = ~ 1|yr, p=2))}
try_fxn1(df,"x")
Error in eval(predvars, data, env) : object 'dfz' not found
I know the issue is with the correlation portion of the formula, as when I run the same function without the correlation structure included (as seen below), the function behaves as expected.
try_fxn2 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+ s(datediff,bs="ts",k=20), data=dfz)}
try_fxn2(df,"x")
Any ideas on how I can modify try_fxn1 to make the function behave as expected? Thank you!
You are confusing a vector with the symbolic representation of that vector when building a formula.
You don't want dfz[[colz]] as the response in the formula, you want x or whatever you set colz to. What you are getting is
dfz[[colz]] ~ ...
when what you really want is the variable colz:
colz ~ ...
And you don't want a literal colz but whatever colz evaluates to. To do this you can create a formula by pasting the parts together:
fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
This turns colz into whatever it was storing, not the literal colz:
> fml
[1] "x ~ s(month, bs=\"cc\", k=12) + s(datediff,bs=\"ts\",k=20)"
Then convert the string into a formula object using formula() or as.formula().
The final solution then is:
fit_fun <- function(dfz, colz) {
  fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
  fml <- formula(fml)
  # use the data frame passed in (dfz), not the global df
  gamm(fml, data = dfz, correlation = corARMA(form = ~ 1|yr, p=2))
}
This really is not an issue with the corARMA() part, other than that it triggers somewhat different evaluation code for the formula. The guiding mantra here is to always end up with the formula as you would type it if you were not programming with formulas. You would never (or should never) write a formula like
gamm(df[[var]] ~ x + s(z), ....)
While this might work in some settings, it will fail miserably if you ever want to use predict(), and it fails when you have to do something a little more complicated.
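To illustrate the predict() point, here is a small sketch using lm() in place of gamm() so that it runs with base R only. The helper fit_by_name and the use of mtcars are assumptions made for the example. The formula is built from column names, so predict() can look the predictor up in newdata.

```r
fit_by_name <- function(dat, response, predictor) {
  # Build e.g. `mpg ~ wt` from the column names, then fit as usual
  fml <- as.formula(paste(response, "~", predictor))
  lm(fml, data = dat)
}

fit <- fit_by_name(mtcars, "mpg", "wt")

# The formula refers to `wt` by name, so predict() finds it in newdata
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))
preds <- predict(fit, newdata = new_cars)
preds
```

Had the formula used something like dat[[predictor]], the model would have no variable name to match against the columns of new_cars.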

Using Vector of Character Variables within certain Part of the lm() Function of R

I am performing a regression analysis within R that looks the following:
lm_carclass_mod <- lm(log(count_faves+1)~log(views+1)+dateadded+group_url+license+log(precontext.nextphoto.views+1)+log(precontext.prevphoto.views+1)+log(oid.Bridge+1)+log(oid.Face+1)+log(oid.Quail+1)+log(oid.Sky+1)+log(oid.Car+1)+log(oid.Auditorium+1)+log(oid.Font+1)+log(oid.Lane+1)+log(oid.Bmw+1)+log(oid.Racing+1)+log(oid.Wheel+1),data=flickrcar_wo_country)
confint(lm_carclass_mod,level=0.95)
summary(lm_carclass_mod)
The dependent variable as well as some of the independent variables are quite variable throughout my analysis, which is why I would like to keep inserting them manually.
However, I am looking for a way to replace all of the "oid. ..." variables with one single function.
So far I have come up with the following:
g <- paste("log(",variables,"+1)", collapse="+")
Unfortunately this does not work inside the lm() function. Neither does a formula like this:
g <- as.formula(
paste("log(",variables,"+1)", collapse="+")
)
The vector variables has the following elements in it:
variables <- c("oid.Bridge", "oid.Face", "oid.Quail", "oid.Off-roading", "oid.Sky", "oid.Car", "oid.Auditorium", "oid.Font", "oid.Lane", "oid.Bmw", "oid.Racing", "oid.Wheel")
In the end my regression model should look something like this:
lm_carclass_mod <- lm(log(count_faves+1)~log(views+1)+dateadded+group_url+license+log(precontext.nextphoto.views+1)+log(precontext.prevphoto.views+1)+g,data=flickrcar_wo_country)
confint(lm_carclass_mod,level=0.95)
summary(lm_carclass_mod)
Thanks for your help in advance!
You would need to convert both of the parts into a string and then make the formula:
#the manual bit
manual <- "log(count_faves+1)~log(views+1)+dateadded+group_url+license+log(precontext.nextphoto.views+1)+log(precontext.prevphoto.views+1)"
#the variables:
oid_variables <- c("oid.Bridge", "oid.Face", "oid.Quail", "oid.Off-roading", "oid.Sky", "oid.Car", "oid.Auditorium", "oid.Font", "oid.Lane", "oid.Bmw", "oid.Racing", "oid.Wheel")
#paste them together
g <- paste("log(", oid_variables, "+1)", collapse="+")
#make the formula
myformula <- as.formula(paste(manual, '+', g))
Then you add the formula into lm:
lm_carclass_mod <- lm(myformula, data=flickrcar_wo_country)
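One caveat worth sketching: a column name such as oid.Off-roading contains a hyphen, which a formula would read as a subtraction unless the name is wrapped in backticks. The example below uses a small simulated data frame (dat, with made-up columns) to show the same paste approach with backticked names. Whether the real data actually has a hyphenated column name is an assumption.

```r
set.seed(1)
dat <- data.frame(
  count_faves = rpois(50, 5),
  views = rpois(50, 100),
  oid.Car = rpois(50, 3),
  `oid.Off-roading` = rpois(50, 2),
  check.names = FALSE  # keep the hyphenated column name as it is
)

oid_vars <- c("oid.Car", "oid.Off-roading")

# Backticks protect names that are not syntactically valid in R
oid_terms <- sprintf("log(`%s` + 1)", oid_vars)

# The manually maintained part of the model
manual <- "log(count_faves + 1) ~ log(views + 1)"

fml <- as.formula(paste(manual, "+", paste(oid_terms, collapse = " + ")))
fit <- lm(fml, data = dat)
names(coef(fit))
```

Backticks are harmless around ordinary names, so they can be applied to every variable in the vector.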

Exporting output of custom multiple regressions from R to Latex

I am trying to export the results of multiple regressions in a single table. Ideally, it should be formatted similar to stargazer() output.
The problem is that I have not found reliably working R functions for the kind of regressions I need (Fama-MacBeth regressions), so I use my custom regression functions, which produce all necessary output (estimates of coefficients, standard errors, t-stat, R^2).
Does stargazer() or a similar function have parameters that allow me to export the results of multiple regressions to LaTeX in a nice form when the output of my regression is just a data frame?
EDIT: I was just wondering whether it is possible to create publication-style tables, looking like this:
Here's a simple example that might help you forward (example is too long for a comment, so making this an answer):
library(stargazer)
library(broom)
## generate dummy data
set.seed(123)
x <- runif(1000)
z <- x^0.5
y <- x + z + rnorm(1000, sd=.05)
model1 <- lm(y ~ x)
model2 <- lm(y ~ z)
## transform model summaries into dataframes
model1_tidy <- tidy(model1)
model2_tidy <- tidy(model2)
output <- merge(model1_tidy, model2_tidy, by = 'term', all = TRUE)
stargazer(output, type='latex', summary=FALSE)
You will need to figure out the column headers by yourself but I believe you get the idea.
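For custom regression functions that already return data frames, a rough sketch of the same idea without broom might look like this. The data frames res1 and res2, their column names and the numbers in them are placeholders, not real estimates. Each model is collapsed into one "estimate (std. error)" column and the models are merged by term.

```r
# Placeholder output of two custom regressions (illustrative numbers only)
res1 <- data.frame(term = c("(Intercept)", "x"),
                   estimate = c(0.10, 1.50), std.error = c(0.02, 0.30))
res2 <- data.frame(term = c("(Intercept)", "z"),
                   estimate = c(0.20, 2.10), std.error = c(0.03, 0.40))

# Format one cell per coefficient as "estimate (std. error)"
cell <- function(est, se) sprintf("%.2f (%.2f)", est, se)
res1$model1 <- cell(res1$estimate, res1$std.error)
res2$model2 <- cell(res2$estimate, res2$std.error)

# One row per term, one column per model; terms missing from a model stay blank
tab <- merge(res1[, c("term", "model1")], res2[, c("term", "model2")],
             by = "term", all = TRUE)
tab[is.na(tab)] <- ""
tab

# The table can then be exported as is, e.g.
# stargazer::stargazer(tab, type = "latex", summary = FALSE)  # requires stargazer
```

This gives the familiar layout with one column per model, though significance stars and fit statistics would still need to be added by hand.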

Extracting p-value from lapply list of glm fits

I am using lapply to perform several glm regressions of one dependent variable on one independent variable at a time. Right now I am specifically interested in the Pr(>|z|) of each independent variable. However, I am unsure how to report just Pr(>|z|) using the list from lapply.
If I was just running one model at a time:
coef(summary(fit))[,"Pr(>|z|)"]
or
summary(fit)$coefficients[,4]
Would work (as described here), but trying something similar with lapply does not seem to work. Can I get just the p-values using lapply and glm with an accessor method or from directly calling from the models?
#mtcars dataset
vars <- names(mtcars)[2:8]
# am (0/1) is used as the response here: family=binomial needs a binary outcome, which mpg is not
fits <- lapply(vars, function(x) {glm(substitute(am ~ i, list(i = as.name(x))), family=binomial, data = mtcars)})
lapply(fits,summary) # this works
lapply(fits, coefficients) # this works
#lapply(fits, summary(fits)$coefficients[,4])# this for example does not work
You want to do:
lapply(fits, function(f) summary(f)$coefficients[,4])
However, if each item is just a p-value, you would probably rather have a vector than a list, so you could use sapply instead of lapply:
sapply(fits, function(f) summary(f)$coefficients[,4])
When you run lapply(fits, summary) it creates a list of summary.glm objects, each of which is printed using print.summary.glm.
If you save this
summaries <- lapply(fits, summary)
You can then go through and extract the coefficient matrix
coefmat <- lapply(summaries, '[[', 'coefficients')
and then the 4th column
lapply(coefmat, '[', , 4)
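If only the p-value of the predictor (not the intercept) is wanted, a sketch with vapply returns a named numeric vector with one entry per model. It assumes a binary response, here the am column of mtcars, so that family = binomial is valid. The variable names are chosen for illustration.

```r
vars <- names(mtcars)[2:8]

# am is a 0/1 column, so it is a valid response for family = binomial
fits <- lapply(vars, function(v) {
  glm(reformulate(v, response = "am"), family = binomial, data = mtcars)
})

# Row 1 of the coefficient matrix is the intercept, row 2 is the predictor
p_values <- vapply(
  fits,
  function(f) coef(summary(f))[2, "Pr(>|z|)"],
  numeric(1)
)
names(p_values) <- vars
p_values
```

vapply also checks that every model returns exactly one number, which catches fits where the coefficient is missing.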
