Exporting output of custom multiple regressions from R to LaTeX

I am trying to export the results of multiple regressions in a single table. Ideally, it should be formatted similar to stargazer() output.
The problem is that I have not found reliably working R functions for the kind of regressions I need (Fama-MacBeth regressions), so I use my custom regression functions, which produce all necessary output (estimates of coefficients, standard errors, t-stat, R^2).
Does stargazer() or a similar function have parameters that would let me export the results of multiple regressions to LaTeX in a nice form when the output of each regression is just a data frame?
EDIT: I was just wondering whether it is possible to create publication-style tables, looking like this:

Here's a simple example that might help you forward (example is too long for a comment, so making this an answer):
library(stargazer)
library(broom)
## generate dummy data
set.seed(123)
x <- runif(1000)
z <- x^0.5
y <- x + z + rnorm(1000, sd=.05)
model1 <- lm(y ~ x)
model2 <- lm(y ~ z)
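## transform model summaries into data frames
model1_tidy <- tidy(model1)
model2_tidy <- tidy(model2)
## full outer join on the coefficient name, so terms missing from one model are kept
output <- merge(model1_tidy, model2_tidy, by = 'term', all = TRUE)
## summary = FALSE prints the data frame as-is instead of summary statistics
stargazer(output, type = 'latex', summary = FALSE)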
You will need to figure out the column headers by yourself but I believe you get the idea.
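If your custom functions already return data frames of estimates and standard errors, a minimal sketch of the same idea without broom could look like this. The data frames fm1 and fm2, their column names, and the helper fmt_model are all made up for illustration; substitute whatever your Fama-MacBeth functions actually return.

```r
# Hypothetical output of two custom regression functions (values are invented)
fm1 <- data.frame(term = c("(Intercept)", "beta"),
                  estimate = c(0.012, 0.45),
                  std.error = c(0.004, 0.10))
fm2 <- data.frame(term = c("(Intercept)", "beta", "size"),
                  estimate = c(0.010, 0.40, -0.08),
                  std.error = c(0.005, 0.11, 0.03))

# Hypothetical helper: turn one model into a column of "estimate (std.error)" strings
fmt_model <- function(df, name) {
  out <- data.frame(term = df$term,
                    cell = sprintf("%.3f (%.3f)", df$estimate, df$std.error),
                    stringsAsFactors = FALSE)
  names(out)[2] <- name
  out
}

# One row per term, one column per model; terms absent from a model become empty cells
tab <- merge(fmt_model(fm1, "model1"), fmt_model(fm2, "model2"),
             by = "term", all = TRUE, sort = FALSE)
tab[is.na(tab)] <- ""
print(tab)
```

The resulting tab can then be passed to stargazer(tab, type = 'latex', summary = FALSE, rownames = FALSE) in the same way as the merged data frame above.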

Related

Using Vector of Character Variables within certain Part of the lm() Function of R

I am performing a regression analysis within R that looks the following:
lm_carclass_mod <- lm(log(count_faves+1)~log(views+1)+dateadded+group_url+license+log(precontext.nextphoto.views+1)+log(precontext.prevphoto.views+1)+log(oid.Bridge+1)+log(oid.Face+1)+log(oid.Quail+1)+log(oid.Sky+1)+log(oid.Car+1)+log(oid.Auditorium+1)+log(oid.Font+1)+log(oid.Lane+1)+log(oid.Bmw+1)+log(oid.Racing+1)+log(oid.Wheel+1),data=flickrcar_wo_country)
confint(lm_carclass_mod,level=0.95)
summary(lm_carclass_mod)
The dependent variable as well as some of the independent variables are quite variable throughout my analysis, which is why I would like to keep inserting them manually.
However, I am looking for a way to replace all of the "oid. ..." variables with one single function.
So far I have come up with the following:
g <- paste("log(",variables,"+1)", collapse="+")
Unfortunately, this does not work inside the lm() function. Neither does a formula like this:
g <- as.formula(
paste("log(",variables,"+1)", collapse="+")
)
The vector variables has the following elements in it:
variables <- c("oid.Bridge", "oid.Face", "oid.Quail", "oid.Off-roading", "oid.Sky", "oid.Car", "oid.Auditorium", "oid.Font", "oid.Lane", "oid.Bmw", "oid.Racing", "oid.Wheel")
In the end my regression model should look something like this:
lm_carclass_mod <- lm(log(count_faves+1)~log(views+1)+dateadded+group_url+license+log(precontext.nextphoto.views+1)+log(precontext.prevphoto.views+1)+g,data=flickrcar_wo_country)
confint(lm_carclass_mod,level=0.95)
summary(lm_carclass_mod)
Thanks for your help in advance!
You would need to convert both of the parts into a string and then make the formula:
#the manual bit
manual <- "log(count_faves+1)~log(views+1)+dateadded+group_url+license+log(precontext.nextphoto.views+1)+log(precontext.prevphoto.views+1)"
#the variables:
oid_variables <- c("oid.Bridge", "oid.Face", "oid.Quail", "oid.Off-roading", "oid.Sky", "oid.Car", "oid.Auditorium", "oid.Font", "oid.Lane", "oid.Bmw", "oid.Racing", "oid.Wheel")
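#paste them together; the backticks keep non-syntactic names such as oid.Off-roading from being parsed as a subtraction
g <- paste0("log(`", oid_variables, "` + 1)", collapse = " + ")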
#make the formula
myformula <- as.formula(paste(manual, '+', g))
Then you add the formula into lm:
lm_carclass_mod <- lm(myformula, data=flickrcar_wo_country)
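To see the string-to-formula approach run end to end, here is a small self-contained sketch on invented data. The data frame d, its columns, and the shortened variable list are assumptions for illustration only; they stand in for flickrcar_wo_country. One column name contains a hyphen on purpose to show why backticks are worth adding.

```r
set.seed(1)
# Toy stand-in for the real data (all values are made up)
d <- data.frame(y = rpois(50, 3), views = rpois(50, 20))
d[["oid.Car"]] <- rpois(50, 2)
d[["oid.Off-roading"]] <- rpois(50, 2)   # non-syntactic name because of the hyphen

vars <- c("oid.Car", "oid.Off-roading")

# Build the variable part as a string, wrapping each name in backticks
rhs <- paste0("log(`", vars, "` + 1)", collapse = " + ")

# Glue the manual part and the generated part together, then convert to a formula
f <- as.formula(paste("log(y + 1) ~ log(views + 1) +", rhs))
fit <- lm(f, data = d)
print(f)
```

Without the backticks, log(oid.Off-roading + 1) would be read as oid.Off minus roading, and lm() would complain that the object cannot be found.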

Using list of LM estimates as stargazer input

I'm trying to use stargazer on several LM estimates at once, say "OLS1",...,"OLS5".
I would usually insert them as separate arguments at the beginning of the stargazer input. What I'm looking for is a way to pass them all as a single argument: one list that contains them. Something like
stargazer(list,...)
The stargazer documentation says of its arguments:
one or more model objects (for regression analysis tables) or data frames/vectors/matrices (for summary statistics, or direct output of content). They can also be included as lists (or even lists within lists).
I was wondering what is the correct way to gather LM estimates in a list so that this would work. When I just save the results in a list I get the following error
Error in list.of.objects[[i]] : subscript out of bounds
I should mention that I create the objects storing the estimates using assign, e.g.:
assign(some_string,lm(...))
So what I have is a string, called some_string, and I want to put the LM result named by some_string inside a list. Using get doesn't help with that.
EDIT: I think you want mget
library(stargazer)
Y <- rnorm(100)
X <- rnorm(100)
assign("string_1", lm(Y ~ X))
assign("string_2", lm(Y ~ X))
my_list <- mget(x = c("string_1", "string_2"))
stargazer(my_list)
This works for me:
library(stargazer)
Y <- rnorm(100)
X <- rnorm(100)
fit_1 <- lm(Y ~ X)
fit_2 <- lm(Y ~ X)
stargazer(list(fit_1, fit_2))
Did you name your list "list"? Maybe stargazer is grabbing the list() function instead.
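If you control how the models are created, one way to sidestep assign() and get() entirely is to build the models directly into a named list. This is only a sketch on simulated data; the names OLS1 to OLS3 and the formulas are placeholders.

```r
set.seed(42)
d <- data.frame(Y = rnorm(100), X1 = rnorm(100), X2 = rnorm(100))

# Hypothetical model names and specifications
specs <- list(OLS1 = Y ~ X1, OLS2 = Y ~ X2, OLS3 = Y ~ X1 + X2)

# Fit each specification; the result is a named list of lm objects.
# Calling lm() inside a wrapper keeps "lm" as the function name in each stored call.
fits <- lapply(specs, function(f) lm(f, data = d))

print(names(fits))
```

The list fits is then a single object that can be handed to stargazer as one argument, as in stargazer(fits), and it does not shadow the base function list().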

Using Zelig with ggplot2, graphing simulations and models

I am attempting to use ggplot2 to graph some basic simulations and multi-variable regression models but am at a loss.
I am using Zelig 3.5 (as newer Zeligs have glitches with simulations).
Based on a blog I found, I tried this:
AppMod1 <- (s1$qi)
AppMod1 <- data.frame(AppMod1$ev)
AppMod1 <- melt(AppMod1, measure=1:86)
AppMod1 <- ggplot(AppMod1, aes(approve, year)) +
geom_point() +
geom_smooth(colour="blue") +
theme_tufte()
AppMod1
This didn't work. I got an error
"Error: measure variables not found in data:NA"
My models are m1, m2, and m3, and my simulations are s1 and s2. I am using the "approval" data set which comes in Zelig.
The models are calculated as follows
data(approval)
m1 <- zelig(approve~avg.price, model="ls", data=approval)
m2 <- zelig(approve~avg.price+sept.oct.2001+iraq.war, model="ls", data=approval)
m3 <- zelig(approve~avg.price+sept.oct.2001+avg.price:sept.oct.2001, model="ls", data=approval)
And the simulations are
x1 <- setx(m2, sept.oct.2001= 1)
s1 <- sim(m2, x=x1)
summary(s1)
x1 <- setx(m2, sept.oct.2001= 0)
s1 <- sim(m2, x=x1)
summary(s1)
oilprice <- min(approval$avg.price):max(approval$avg.price)
x2 <- setx(m2, sept.oct.2001=0, avg.price=oilprice)
s2 <-sim (m2, x=x2)
plot.ci(s2)
oilprice <- min(approval$avg.price):max(approval$avg.price)
x2 <- setx(m2, sept.oct.2001=1, avg.price=oilprice)
s2 <-sim (m2, x=x2)
plot.ci(s2)
It looks like the error resulted from your call to melt.
Note that in the second line of code AppMod1 <- data.frame(AppMod1$ev) you overwrite the assignment you made in your first line of code AppMod1 <- (s1$qi). So after these two lines of code AppMod1 is equal to a data frame with the single column ev.
Now melt tries to melt this data frame and the call to melt indicates that there are 86 columns of measure.vars, when in fact there is only one column in the data frame. That results in the error you described.
I can't quite tell from your code what you're expecting AppMod1 to look like. When I run your code, s1$qi contains only NULL values. At the very least, you'll need AppMod1 to include columns for approve and year in order for your ggplot code to work as written.
Hopefully this is enough information to go on for now. It will be easier to provide additional help if you show what you expect AppMod1 to look like before and after the call to melt.
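For reference, here is a rough sketch of the kind of data frame a ggplot call would need, built in base R. It assumes the simulated expected values can be obtained as a plain numeric matrix with one row per draw and one column per avg.price value; that layout, the matrix ev, and the numbers in it are invented here and are not taken from Zelig's actual output.

```r
set.seed(10)
price_grid <- seq(50, 250, by = 50)   # hypothetical avg.price values

# Hypothetical simulation output: 1000 draws (rows) x 5 price values (columns)
ev <- sapply(price_grid, function(p) rnorm(1000, mean = 80 - 0.1 * p, sd = 2))

# One row per price value: mean expected value plus a 95% interval
plot_df <- data.frame(
  avg.price = price_grid,
  mean  = colMeans(ev),
  lower = apply(ev, 2, quantile, probs = 0.025),
  upper = apply(ev, 2, quantile, probs = 0.975)
)
print(plot_df)
```

A data frame shaped like plot_df has named columns for both axes, so it could be plotted with something like ggplot(plot_df, aes(avg.price, mean)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) + geom_line().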

Ideas to re-write looping regression with 'for' loops

I'm having a brain freeze, and hoping one of you can point me in the right direction. My end goal is the output of various regression coefficients (mainly interested in price elasticity), which I achieved via simple multiple regression, using the "by" function.
I am using the "by" function to loop through the regression formula for each iteration of the "State.UPC" variable. Since my data is quite large (~1MM rows), I had to subset my data into groups of 3-4 states (see mystates1...mystates10). I am then performing the regression on those subsets, each time changing my data source in the "datastep3" data frame. And this is where I need your help:
What is the best way to efficiently re-write this with a combination of my existing "by" regression function, and the "for" loops, so I can bypass the step of constantly changing the data frame name in "datastep3" and the "write.csv" steps. Essentially R looping through each "mystates" data subset and doing the regression by the "State.UPC" attributes?
I have tried several combinations with no success. Pardon the amateurish question...still learning R. Here is my code:
data <-read.csv("PriceData.csv")
datastep1 <-subset(data, subset=X..Vol>0 & Unit.Vol>0)
datastep2 <- transform(datastep1, State.UPC = paste(State,UPC, sep="."))
mystates1 <- c("AL","AR","AZ")
mystates2 <- c("CA","CO","FL")
mystates3 <- c("GA","IA","IL")
mystates4 <- c("IN","KS","KY")
mystates5 <- c("LA","MI","MN")
mystates6 <- c("MO","MS","NC")
mystates7 <- c("NJ","NM","NV")
mystates8 <- c("NY","OH","OK")
mystates9 <- c("SC","TN","TX")
mystates10 <- c("UT","VA","WI","WV")
datastep3 <-subset(datastep2, subset=State %in% mystates10)
datastep4 <-na.omit(datastep3)
PEbyItem <- by(datastep4, datastep4$State.UPC, function(df)
lm(log(Unit.Vol)~log(Price) + Distribution+Independence.Day+Labor.Day+Memorial.Day+Thanksgiving+Christmas+New.Years+
Year+Month, data=df))
x <- do.call("rbind",lapply(PEbyItem, coef))
y <-data.frame(x)
write.csv(x, file="mystates10.csv", row.names=TRUE)
Impossible to test this because you do not provide any data, but theoretically you could just combine the various mystatesN into a list and then run lapply(...) on that.
## Not tested...
get.PEbyItem <- function(i) {
datastep3 <-subset(datastep2, subset=State %in% mystates[[i]])
datastep4 <-na.omit(datastep3)
PEbyItem <- by(datastep4, datastep4$State.UPC, function(df)
lm(log(Unit.Vol)~log(Price) + Distribution+Independence.Day+Labor.Day+
Memorial.Day+Thanksgiving+Christmas+New.Years+Year+Month,
data=df))
x <- do.call("rbind",lapply(PEbyItem, coef))
y <-data.frame(x)
write.csv(x, file=paste(names(mystates[i]),"csv",sep="."), row.names=TRUE)
}
mystates <- list(ms1=mystates1, ms2=mystates2, ..., ms10=mystates10)
lapply(1:length(mystates),get.PEbyItem)
There are lots of other things that could be improved but without the dataset it's pointless to try.
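Since the original data is not available, here is a small runnable sketch of the same pattern on made-up data. The data frame toy, the two state groups, the helper fit_group, and the reduced formula (only log(Price)) are all assumptions for illustration; the coefficients are collected in memory instead of being written to CSV files.

```r
set.seed(1)
# Toy stand-in for datastep2 (all values are invented)
toy <- data.frame(
  State = rep(c("AL", "AR", "CA", "CO"), each = 20),
  UPC   = rep(rep(c("u1", "u2"), each = 10), times = 4),
  Price = runif(80, 1, 5)
)
toy$Unit.Vol  <- exp(3 - 1.2 * log(toy$Price) + rnorm(80, sd = 0.1))
toy$State.UPC <- paste(toy$State, toy$UPC, sep = ".")

# Named list of state groups instead of separate mystatesN objects
state_groups <- list(ms1 = c("AL", "AR"), ms2 = c("CA", "CO"))

# Hypothetical helper: subset to one group, fit one regression per State.UPC,
# and return a matrix of coefficients (one row per State.UPC)
fit_group <- function(states, data) {
  sub <- na.omit(data[data$State %in% states, ])
  fits <- by(sub, sub$State.UPC,
             function(df) lm(log(Unit.Vol) ~ log(Price), data = df))
  do.call(rbind, lapply(fits, coef))
}

coef_by_group <- lapply(state_groups, fit_group, data = toy)
all_coefs <- do.call(rbind, coef_by_group)   # stack all groups into one matrix
print(all_coefs)
```

If separate files are still wanted, each element of coef_by_group can be written out with write.csv using names(coef_by_group) to build the file names.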

Creating a matrix of summary output

How can I insert summary outputs from multiple regression analyses in a matrix type variable in R statistics package?
Here is my script, which runs the regression and collect intercepts and co-eff in a variable:
for (i in 2:(ncol(data.base))) {
Test <- lm(data.base[,i] ~ log(database$var.1))
results <- rbind(results, c(Test$coefficients))
}
What I would like to do is import summary(lm-test) for each regression into a matrix-type variable. I assume a matrix-type variable is what I need.
I appreciate your help.
Yuck! Some nasty variable naming there, in my opinion.
I see data.base has outcomes, and you don't want the first column but each is a separate outcome. You also have database which is a data.frame with a variable var.1. Run each regression, store them in a matrix format.
This is a start:
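# fit one regression per outcome column (every column except the first)
fits <- apply(data.base[, -1], 2, function(y) lm(y ~ log(database$var.1)))
# summarise each fit, then pull the coefficient table out of each summary
summ <- lapply(fits, summary)
summ <- lapply(summ, coef)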
Reduce(cbind, summ)
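If a long layout is easier to work with than side-by-side columns, a variation is to stack the coefficient tables with a label for each outcome. This sketch uses simulated data; the objects predictor and outcomes are invented stand-ins for database$var.1 and the outcome columns of data.base.

```r
set.seed(7)
predictor <- runif(40, 1, 10)
outcomes <- data.frame(a =  2 * log(predictor) + rnorm(40),
                       b = -1 * log(predictor) + rnorm(40))

# For each outcome: fit, take the summary's coefficient table (a 2 x 4 matrix),
# and label its rows with the outcome name and the term
rows <- lapply(names(outcomes), function(nm) {
  fit <- lm(outcomes[[nm]] ~ log(predictor))
  tab <- coef(summary(fit))
  data.frame(outcome = nm, term = rownames(tab), tab,
             row.names = NULL, check.names = FALSE)
})

# Stack everything into one data frame: one row per outcome and term
result <- do.call(rbind, rows)
print(result)
```

Here result keeps the estimate, standard error, t value and p-value for every regression; as.matrix(result[, -(1:2)]) gives a purely numeric matrix if that is what is needed.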
