Creating a matrix of summary output - r

How can I store the summary output from multiple regression analyses in a matrix-type variable in R?
Here is my script, which runs the regressions and collects the intercepts and coefficients in a variable:
for (i in 2:(ncol(data.base))) {
Test <- lm(data.base[,i] ~ log(database$var.1))
results <- rbind(results, c(Test$coefficients))
}
What I would like to do is import summary(lm-test) for each regression into a matrix-type variable. I assume a matrix-type variable is what I need.
I appreciate your help.

Yuck! Some nasty variable naming there, in my opinion.
As I read it, data.base holds the outcomes: you skip the first column, and each remaining column is a separate outcome. You also have database, which is a data.frame with a variable var.1. The idea is to run each regression and store the results in matrix form.
This is a start:
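# one lm fit per outcome column (everything except the first column)
fits <- apply(data.base[, -1], 2, function(y) lm(y ~ log(database$var.1)))
# summary object for each fit
summ <- lapply(fits, summary)
# coefficient table (estimate, SE, t value, p-value) from each summary
coefs <- lapply(summ, coef)
# bind the tables side by side into one matrix
Reduce(cbind, coefs)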
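To make the steps above concrete, here is a minimal sketch on simulated data. The objects database and data.base below are made-up stand-ins for the asker's data, and the column labels are my own choice. It flattens each coef(summary(fit)) table into one row, so you end up with a matrix that has one row per outcome.

```r
set.seed(1)
# Hypothetical stand-ins for the asker's objects
database  <- data.frame(var.1 = runif(50, 1, 10))
data.base <- data.frame(id = 1:50,
                        y1 = 2 + 3 * log(database$var.1) + rnorm(50),
                        y2 = 1 - log(database$var.1) + rnorm(50))

# One fit per outcome column (first column skipped)
fits <- lapply(data.base[, -1], function(y) lm(y ~ log(database$var.1)))

# coef(summary(fit)) is a matrix: rows are terms, columns are
# Estimate, Std. Error, t value, Pr(>|t|)
tabs <- lapply(fits, function(f) coef(summary(f)))

# Flatten each table term by term and stack: one row per outcome
results <- do.call(rbind, lapply(tabs, function(m) as.vector(t(m))))
colnames(results) <- as.vector(t(outer(rownames(tabs[[1]]),
                                       colnames(tabs[[1]]),
                                       paste, sep = " | ")))
results
```

If you would rather keep the full tables, tabs is already a named list of matrices, one per outcome.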

Related

Exporting output of custom multiple regressions from R to Latex

I am trying to export the results of multiple regressions in a single table. Ideally, it should be formatted similar to stargazer() output.
The problem is that I have not found reliably working R functions for the kind of regressions I need (Fama-MacBeth regressions), so I use my custom regression functions, which produce all necessary output (estimates of coefficients, standard errors, t-stat, R^2).
Does stargazer() or a similar function have parameters that would let me export the results of multiple regressions to LaTeX in a nice form when the output of each regression is just a data frame?
EDIT: I was just wondering whether it is possible to create publication-style tables, looking like this:
Here's a simple example that might help you forward (example is too long for a comment, so making this an answer):
library(stargazer)
library(broom)
## generate dummy data
set.seed(123)
x <- runif(1000)
z <- x^0.5
y <- x + z + rnorm(1000, sd=.05)
model1 <- lm(y ~ x)
model2 <- lm(y ~ z)
## transform model summaries into dataframes
model1_tidy <- tidy(model1)
model2_tidy <- tidy(model2)
## full outer join on the term column
output <- merge(model1_tidy, model2_tidy, by = 'term', all = TRUE)
stargazer(output, type='latex', summary=FALSE)
You will need to figure out the column headers by yourself but I believe you get the idea.
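Since the question is about custom regression functions that return plain data frames, here is a rough base R sketch of the same merge idea without broom. The helper names as_df and fmt, and the column names term, estimate and std.error, are my own assumptions, and lm() only stands in for the custom estimator. Each model becomes one column of "estimate (std.error)" strings.

```r
set.seed(123)
x <- runif(100)
z <- sqrt(x)
y <- x + z + rnorm(100, sd = 0.05)

# Hypothetical helper: mimics a custom regression function that
# returns its results as a plain data frame
as_df <- function(fit) {
  m <- coef(summary(fit))
  data.frame(term = rownames(m),
             estimate = m[, "Estimate"],
             std.error = m[, "Std. Error"],
             row.names = NULL, stringsAsFactors = FALSE)
}
res1 <- as_df(lm(y ~ x))
res2 <- as_df(lm(y ~ z))

# Hypothetical helper: one display column per model
fmt <- function(d, name) {
  out <- data.frame(term = d$term,
                    value = sprintf("%.3f (%.3f)", d$estimate, d$std.error),
                    stringsAsFactors = FALSE)
  names(out)[2] <- name
  out
}

# Outer join on term; blank out terms a model does not contain
tab <- merge(fmt(res1, "model1"), fmt(res2, "model2"), by = "term", all = TRUE)
tab[is.na(tab)] <- ""
tab
```

The resulting tab is an ordinary data frame, so it can be handed to stargazer(..., summary = FALSE) in the same way as output in the answer.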

Loop through a list of variables to add to a base survival model then keep the key output in a table

Two-part question:
Firstly, I have a list of n variables in a data frame that I want to sequentially substitute into a survival model (thus creating n new models), and from the output of each, I want to retain only the summary table line (HR, SE's etc) related to that variable (so an n-row table).
#create list of variables from dataset
bloods <- colnames(data)[c(123,127, 129:132, 135:140, 143:144, 190:195)]
Then I loop through, creating a new model each time. The following doesn't work, but I'm not sure why:
for (i in 1:length(bloods)){
x <- coxph(Surv(time, event) ~ i + var1+var2+var3, data=data, na.action=na.omit)
}
Not sure how to select and append the first row of the summary table (summary(x)[7]) to a table each time? I suppose I must create the table before the loop?
Any help very much appreciated!
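The loop fails because i inside the formula is the loop counter itself, not the column name held in bloods[i], and x is overwritten on every pass. Consider lapply with a dynamically built formula, which returns a list of summary tables: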
bloods <- colnames(data)[c(123,127, 129:132, 135:140, 143:144, 190:195)]
sumtables <- lapply(bloods, function(i) {
  # STRING INTERPOLATION WITH sprintf, THEN CONVERTED TO FORMULA OBJECT
  iformula <- as.formula(sprintf("Surv(time, event) ~ %s + var1+var2+var3", i))
  # RUN MODEL REFERENCING DYNAMIC FORMULA
  x <- coxph(iformula, data=data, na.action=na.omit)
  # RETURN COEFF MATRIX RESULTS (BY NAME, WHICH IS SAFER THAN BY POSITION)
  summary(x)$coefficients
})
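To get the n-row table the question asks for, you can keep only the row for the substituted variable and bind the rows together. The following is a sketch under assumptions: it uses the lung data set shipped with the survival package and three of its columns in place of the asker's data and bloods vector, and age + sex in place of var1+var2+var3. It also assumes each substituted variable is numeric, so that its row name equals the column name.

```r
library(survival)

# Hypothetical stand-in for the asker's vector of column names
bloods <- c("ph.karno", "meal.cal", "wt.loss")

rows <- lapply(bloods, function(v) {
  # Build the formula with the variable name pasted in
  f <- as.formula(sprintf("Surv(time, status) ~ %s + age + sex", v))
  fit <- coxph(f, data = lung, na.action = na.omit)
  # Keep only the coefficient row for the substituted variable
  summary(fit)$coefficients[v, , drop = FALSE]
})

# One row per variable: coef, exp(coef) (the HR), se(coef), z, p-value
hr_table <- do.call(rbind, rows)
hr_table
```

There is no need to create the table before the loop; do.call(rbind, ...) assembles it from the list at the end.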

Ideas to re-write looping regression with 'for' loops

I'm having a brain freeze, and hoping one of you can point me in the right direction. My end goal is the output of various regression coefficients (mainly interested in price elasticity), which I achieved via simple multiple regression, using the "by" function.
I am using the "by" function to loop through the regression formula for each iteration of the "State.UPC" variable. Since my data is quite large (~1MM rows), I had to subset my data into groups of 3-4 states (see mystates1...mystates10). I am then performing the regression on those subsets, each time changing my data source in the "datastep3" data frame. And this is where I need your help:
What is the best way to efficiently re-write this with a combination of my existing "by" regression function, and the "for" loops, so I can bypass the step of constantly changing the data frame name in "datastep3" and the "write.csv" steps. Essentially R looping through each "mystates" data subset and doing the regression by the "State.UPC" attributes?
I have tried several combinations with no success. Pardon the amateurish question...still learning R. Here is my code:
data <-read.csv("PriceData.csv")
datastep1 <-subset(data, subset = X..Vol > 0 & Unit.Vol > 0)
datastep2 <- transform(datastep1, State.UPC = paste(State,UPC, sep="."))
mystates1 <- c("AL","AR","AZ")
mystates2 <- c("CA","CO","FL")
mystates3 <- c("GA","IA","IL")
mystates4 <- c("IN","KS","KY")
mystates5 <- c("LA","MI","MN")
mystates6 <- c("MO","MS","NC")
mystates7 <- c("NJ","NM","NV")
mystates8 <- c("NY","OH","OK")
mystates9 <- c("SC","TN","TX")
mystates10 <- c("UT","VA","WI","WV")
datastep3 <-subset(datastep2, subset=State %in% mystates10)
datastep4 <-na.omit(datastep3)
PEbyItem <- by(datastep4, datastep4$State.UPC, function(df)
lm(log(Unit.Vol)~log(Price) + Distribution+Independence.Day+Labor.Day+Memorial.Day+Thanksgiving+Christmas+New.Years+
Year+Month, data=df))
x <- do.call("rbind",lapply(PEbyItem, coef))
y <-data.frame(x)
write.csv(x, file="mystates10.csv", row.names=TRUE)
Impossible to test this because you do not provide any data, but theoretically you could just combine the various mystatesN into a list and then run lapply(...) on that.
## Not tested...
get.PEbyItem <- function(i) {
  datastep3 <- subset(datastep2, subset = State %in% mystates[[i]])
  datastep4 <- na.omit(datastep3)
  PEbyItem <- by(datastep4, datastep4$State.UPC, function(df)
    lm(log(Unit.Vol) ~ log(Price) + Distribution + Independence.Day + Labor.Day +
         Memorial.Day + Thanksgiving + Christmas + New.Years + Year + Month,
       data = df))
  x <- do.call("rbind", lapply(PEbyItem, coef))
  y <- data.frame(x)
  write.csv(x, file = paste(names(mystates[i]), "csv", sep = "."), row.names = TRUE)
}
mystates <- list(ms1=mystates1, ms2=mystates2, ..., ms10=mystates10)
lapply(1:length(mystates),get.PEbyItem)
There are lots of other things that could be improved but without the dataset it's pointless to try.
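For what it's worth, here is a small self-contained sketch of the same pattern that can actually be run. Everything in it is made up for illustration: the miniature datastep2, the two state groups, the reduced formula with only log(Price), and the helper name fit_group. It iterates over the named list directly instead of over indices, and writes one CSV per group into a temporary directory.

```r
set.seed(42)
# Hypothetical miniature of datastep2: 4 states x 2 UPCs x 20 rows
datastep2 <- data.frame(
  State = rep(c("AL", "AR", "CA", "CO"), each = 40),
  UPC   = rep(rep(c("u1", "u2"), each = 20), times = 4),
  Price = runif(160, 1, 5)
)
datastep2$Unit.Vol  <- exp(3 - 1.2 * log(datastep2$Price) + rnorm(160, sd = 0.1))
datastep2$State.UPC <- paste(datastep2$State, datastep2$UPC, sep = ".")

# Named list of state groups (stand-in for mystates1 ... mystates10)
mystates <- list(ms1 = c("AL", "AR"), ms2 = c("CA", "CO"))

# Subset to one group, fit one model per State.UPC, return a coefficient matrix
fit_group <- function(states) {
  d <- na.omit(subset(datastep2, State %in% states))
  fits <- by(d, d$State.UPC, function(df) lm(log(Unit.Vol) ~ log(Price), data = df))
  do.call(rbind, lapply(fits, coef))
}

coefs <- lapply(mystates, fit_group)

# One CSV per group, named after the list element
out_dir <- tempdir()
for (nm in names(coefs)) {
  write.csv(coefs[[nm]], file = file.path(out_dir, paste0(nm, ".csv")), row.names = TRUE)
}
```

Keeping the fitting separate from the file writing means you can inspect coefs in the session before anything is written to disk.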

apply series of commands to split data frame

I'm having some difficulties figuring out how to approach this problem. I have a data frame that I am splitting into distinct sites (link5). Once split I basically want to run a linear regression model on the subsets. Here is the code I'm working with, but it's definitely not correct. Also, It would be great if I could output the model results to a new data frame such that each site would have one row with the model parameter estimates - that is just a wish and not a necessity right now. Thank you for any help!
les_events <- split(les, les$link5)
result <- lapply(les_events) {
lm1 <-lm(cpe~K,data=les_events)
coef <- coef(lm1)
q.hat <- -coef(lm1)[2]
les_events$N0.hat <- coef(lm1[1]/q.hat)
}
You have a number of issues:
You haven't passed a function (the FUN argument) to lapply.
Your closure (the bit inside {}) is almost, but not quite, the body you want for your function.
Something like the following will return the coefficients from your models:
result <- lapply(les_events, function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
This will return a list of data.frames containing columns for each coefficient.
lapply(les_events, lm, formula = 'cpe~K')
will return a list of linear model objects, which may be more useful.
For more general split / apply / combine approaches, use plyr or data.table.
data.table
library(data.table)
DT <- data.table(les)
# one row per link5, one column per coefficient
result <- DT[, {lm1 <- lm(cpe ~ K, data = .SD)
                as.list(coef(lm1))}, by = link5]
plyr
library(plyr)
result <- ddply(les, .(link5), function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
# or to return a list of linear model objects
dlply(les, .(link5), function(DD){ lm(cpe ~ K, data = DD) })
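If you want to stay in base R and also get the one-row-per-site data frame mentioned in the question, a sketch along these lines might do. The simulated les data is made up, and I am assuming the question's N0.hat line was meant to be the intercept divided by q.hat, which is my reading rather than something the question states.

```r
set.seed(7)
# Hypothetical data: cpe declining with K at three sites
les <- data.frame(link5 = rep(c("siteA", "siteB", "siteC"), each = 6),
                  K     = rep(seq(0, 50, by = 10), times = 3))
les$cpe <- 20 - 0.3 * les$K + rnorm(nrow(les), sd = 0.5)

per_site <- lapply(split(les, les$link5), function(DD) {
  cf    <- coef(lm(cpe ~ K, data = DD))
  q.hat <- -cf[["K"]]                      # negative of the slope
  data.frame(intercept = cf[["(Intercept)"]],
             slope     = cf[["K"]],
             q.hat     = q.hat,
             N0.hat    = cf[["(Intercept)"]] / q.hat)  # assumed meaning
})

# Stack the one-row data frames; row names are the site labels
result <- do.call(rbind, per_site)
result
```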

Extracting p-value from lapply list of glm fits

I am using lapply to perform several glm regressions on one dependent variable by one independent variable at a time. Right now I am specifically interested in the Pr(>|z|) of each independent variable. However, I am unsure on how to report just Pr(>|z|) using the list from lapply.
If I was just running one model at a time:
coef(summary(fit))[,"Pr(>|z|)"]
or
summary(fit)$coefficients[,4]
Would work (as described here), but trying something similar with lapply does not seem to work. Can I get just the p-values using lapply and glm with an accessor method or from directly calling from the models?
#mtcars dataset
vars <- names(mtcars)[2:8]
fits <- lapply(vars, function(x) {glm(substitute(mpg ~ i, list(i = as.name(x))), family=binomial, data = mtcars)})
lapply(fits,summary) # this works
lapply(fits, coefficients) # this works
#lapply(fits, summary(fits)$coefficients[,4])# this for example does not work
You want to do:
lapply(fits, function(f) summary(f)$coefficients[,4])
However, if each item is just a p-value, you would probably rather have a vector than a list, so you could use sapply instead of lapply:
sapply(fits, function(f) summary(f)$coefficients[,4])
When you run lapply(fits, summary) it creates a list of summary.glm objects each of which is printed using print.summary.glm
If you save this
summaries <- lapply(fits, summary)
You can then go through and extract the coefficient matrix
coefmat <- lapply(summaries, '[[', 'coefficients')
and then the 4th column
lapply(coefmat, '[', , 4)
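Putting this together, here is a short sketch that returns a single named vector with only the predictor's p-value from each fit. Two things are my own assumptions: I use am as the response because a binomial family needs a 0/1 outcome, and I pick three predictors purely for illustration. reformulate() is used in place of substitute() to build each formula.

```r
# Illustrative predictors (an assumption, not the question's full list)
vars <- c("wt", "hp", "qsec")

# One logistic regression per predictor; am is a 0/1 column in mtcars
fits <- lapply(vars, function(v) {
  glm(reformulate(v, response = "am"), family = binomial, data = mtcars)
})
names(fits) <- vars

# Row 2 of the coefficient table is the predictor (row 1 is the intercept)
pvals <- sapply(fits, function(f) coef(summary(f))[2, "Pr(>|z|)"])
pvals
```

Selecting the column by name rather than by position avoids surprises if the layout of the coefficient table differs between model families.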

Resources