Extracting p-value from lapply list of glm fits - r

I am using lapply to perform several glm regressions on one dependent variable, one independent variable at a time. Right now I am specifically interested in the Pr(>|z|) of each independent variable. However, I am unsure how to report just Pr(>|z|) using the list from lapply.
If I was just running one model at a time:
coef(summary(fit))[,"Pr(>|z|)"]
or
summary(fit)$coefficients[,4]
Would work (as described here), but trying something similar with lapply does not seem to work. Can I get just the p-values using lapply and glm with an accessor method or from directly calling from the models?
#mtcars dataset
vars <- names(mtcars)[2:8]
# am is 0/1; family = binomial needs a binary response, so mpg would fail here
fits <- lapply(vars, function(x) {glm(substitute(am ~ i, list(i = as.name(x))), family=binomial, data = mtcars)})
lapply(fits,summary) # this works
lapply(fits, coefficients) # this works
#lapply(fits, summary(fits)$coefficients[,4])# this for example does not work

You want to do:
lapply(fits, function(f) summary(f)$coefficients[,4])
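However, a list is rarely the most convenient shape for p-values. sapply simplifies the result: a vector if each fit yields a single p-value, or a matrix with one column per fit when, as here, each fit yields one p-value per coefficient (intercept and predictor):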
sapply(fits, function(f) summary(f)$coefficients[,4])
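If only the predictor's p-value is wanted, one option is to index the second row of each coefficient table and collect the results in a named vector. This is a minimal sketch, not the asker's exact setup: it assumes a binary response (am from mtcars) so that family = binomial can be fitted, and it builds the formula with reformulate, which is my substitution for the substitute() call in the question.

```r
# Fit one logistic regression per predictor (am is a 0/1 column in mtcars;
# using it as the response is an assumption made for this sketch).
vars <- names(mtcars)[2:8]
fits <- lapply(vars, function(x) {
  glm(reformulate(x, response = "am"), family = binomial, data = mtcars)
})
names(fits) <- vars

# Row 1 of the coefficient table is the intercept, row 2 is the predictor.
slope_p <- vapply(fits, function(f) coef(summary(f))[2, "Pr(>|z|)"], numeric(1))
slope_p
```

vapply is used instead of sapply so that the call fails loudly if a fit ever returns something other than a single number.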

When you run lapply(fits, summary) it creates a list of summary.glm objects, each of which is printed using print.summary.glm.
If you save this
summaries <- lapply(fits, summary)
You can then go through and extract the coefficient matrix
coefmat <- lapply(summaries, '[[', 'coefficients')
and then the 4th column
lapply(coefmat, '[', , 4)

Related

Extracting p-values from a large list of lm

I am trying to extract p-values from a large list of lm which I created by:
S_models <- dlply(S, "BayStation", function(df)
lm(Temp ~ Date, data = df))
This leaves me with a list of 24 lm fits, one for each group (BayStation). I was able to extract the intercept and slope using:
coef<-ldply(S_models, coef)
However, I cannot seem to figure out how to get a list or dataframe of p-values from this large list without doing it individually or manually.
Try using sapply/lapply :
result <- sapply(S_models, function(x) summary(x)$coefficients[,4])
#With `lapply`
#result <- lapply(S_models, function(x) summary(x)$coefficients[,4])
We can use map
library(purrr)
result <- map(S_models, ~ summary(.x)$coefficients[,4])
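The following uses base R only and computes the p-value of each model's overall F test with the function pf. The F statistic and its degrees of freedom are in the fstatistic element of the list returned by summary. For a single-predictor model such as Temp ~ Date, this equals the p-value of the slope.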
sapply(S_models, function(x){
ff <- summary(x)$fstatistic
pf(ff[1], df1 = ff[2], df2 = ff[3], lower.tail = FALSE)
})
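As a quick check of the pf approach, the sketch below compares the F-test p-value with the slope's p-value from the coefficient table. S and BayStation are not available here, so it assumes mtcars split by cyl as a stand-in for the grouped data.

```r
# Stand-in for S_models: one simple regression per group of mtcars.
models <- lapply(split(mtcars, mtcars$cyl), function(df) lm(mpg ~ wt, data = df))

# p-value of the slope, read from the coefficient table.
p_t <- sapply(models, function(m) summary(m)$coefficients["wt", "Pr(>|t|)"])

# p-value of the overall F test, computed with pf.
p_f <- sapply(models, function(m) {
  ff <- summary(m)$fstatistic
  unname(pf(ff[1], df1 = ff[2], df2 = ff[3], lower.tail = FALSE))
})

# With a single predictor the two agree up to numerical tolerance.
all.equal(p_t, p_f)
```

With more than one predictor the two differ: the F test is for the model as a whole, while the coefficient table gives one p-value per term.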

Using list of LM estimates as stargazer input

I'm trying to use stargazer on several LM estimates at once, say "OLS1",...,"OLS5".
I would usually insert them as separate arguments at the beginning of the stargazer input. What I'm looking for is a way to input them all with a list that contains them all, being one argument. Something like
stargazer(list,...)
stargazer arguments explanation states that
one or more model objects (for regression analysis tables) or data frames/vectors/matrices (for summary statistics, or direct output of content). They can also be included as lists (or even lists within lists).
I was wondering what is the correct way to gather LM estimates in a list so that this would work. When I just save the results in a list I get the following error
Error in list.of.objects[[i]] : subscript out of bounds
I will mention that I create the elements storing the estimate using assign. E.G:
assign(some_string,lm(...))
So what I have is a string, called some_string, and I want to put the LM result named by some_string inside a list. Using get doesn't help with that.
EDIT: I think you want mget
library(stargazer)
Y <- rnorm(100)
X <- rnorm(100)
assign("string_1", lm(Y ~ X))
assign("string_2", lm(Y ~ X))
my_list <- mget(x = c("string_1", "string_2"))
stargazer(my_list)
works for me?
library(stargazer)
Y <- rnorm(100)
X <- rnorm(100)
fit_1 <- lm(Y ~ X)
fit_2 <- lm(Y ~ X)
stargazer(list(fit_1, fit_2))
Did you name your list list? Maybe stargazer is grabbing the function instead?
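To tie the two answers together, here is a small sketch that builds the models under generated names, as assign() would, and then gathers them with mget. The names OLS1 to OLS3 and the simulated Y and X are made up for illustration; the stargazer call is left commented out so the snippet runs without the package.

```r
# Simulated data, only for illustration.
set.seed(1)
Y <- rnorm(100)
X <- rnorm(100)

# Create the fits under programmatically built names.
model_names <- paste0("OLS", 1:3)
for (nm in model_names) assign(nm, lm(Y ~ X))

# mget looks the names up and returns a named list of the objects.
model_list <- mget(model_names)
names(model_list)

# stargazer(model_list)  # the quoted documentation says lists are accepted
```

If nothing else needs the individual variables, storing each fit directly with model_list[[nm]] <- lm(Y ~ X) avoids assign and mget altogether.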

Using apply to loop over different datasets in a regression

I found this way of looping over variables in an lm() when the variable names are stored as characters (http://www.ats.ucla.edu/stat/r/pages/looping_strings.htm):
models <- lapply(varlist, function(x) {
lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})
My first question is: Is there a more efficient/faster way?
What if I want to loop over different data instead of looping over variables?
Example:
reg1 <- lm(a~b, data=dataset1)
reg2 <- lm(a~b, data=dataset2)
Can I apply something similar to the code shown above? Using the substitute function for the data did not work.
Thank You!
The substitute in your example is used to construct the formula. If you want to apply lm to a number of data.frames, use:
lapply(list(dataset1, dataset2), lm, formula = a ~ b)
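A runnable variant of the same idea, assuming subsets of mtcars as stand-ins for dataset1 and dataset2 (the real datasets are not shown in the question). Naming the list elements carries the labels through to the results.

```r
# Stand-ins for dataset1, dataset2, ...: mtcars split by number of gears.
datasets <- split(mtcars, mtcars$gear)

# Same formula, different data; the list names are kept on the result.
regs <- lapply(datasets, function(d) lm(mpg ~ wt, data = d))

# One row per dataset, one column per coefficient.
coefs <- t(sapply(regs, coef))
coefs
```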

apply series of commands to split data frame

I'm having some difficulties figuring out how to approach this problem. I have a data frame that I am splitting into distinct sites (link5). Once split, I basically want to run a linear regression model on the subsets. Here is the code I'm working with, but it's definitely not correct. Also, it would be great if I could output the model results to a new data frame such that each site would have one row with the model parameter estimates; that is just a wish and not a necessity right now. Thank you for any help!
les_events <- split(les, les$link5)
result <- lapply(les_events) {
lm1 <-lm(cpe~K,data=les_events)
coef <- coef(lm1)
q.hat <- -coef(lm1)[2]
les_events$N0.hat <- coef(lm1[1]/q.hat)
}
You have a number of issues.
You haven't passed a function (the FUN argument) to lapply.
The bit inside {} is almost, but not quite, the body you want for that function.
Something like the following will return the coefficients from your models:
result <- lapply(les_events, function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
This will return a list of data.frames containing columns for each coefficient.
lapply(les_events, lm, formula = cpe ~ K)
will return a list of linear model objects, which may be more useful.
For more general split / apply / combine approaches, use plyr or data.table.
data.table
library(data.table)
DT <- data.table(les)
# .SD is the subset of rows for the current link5 group
result <- DT[, {lm1 <- lm(cpe ~ K, data = .SD)
as.list(coef(lm1))}, by = link5]
plyr
library(plyr)
result <- ddply(les, .(link5), function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
# or to return a list of linear model objects
dlply(les, .(link5), function(DD){ lm(cpe ~K, data =DD)})
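For the "one row per site" wish, a base R sketch without extra packages could look like the following. The data frame les_demo is simulated, since les itself is not shown, and N0.hat is taken to be the intercept divided by q.hat, which is only my reading of the last line of the question's code.

```r
# Simulated stand-in for les: three sites, cpe declining with K.
set.seed(42)
les_demo <- data.frame(
  link5 = rep(c("A", "B", "C"), each = 10),
  K     = rep(1:10, times = 3)
)
les_demo$cpe <- 5 - 0.3 * les_demo$K + rnorm(30, sd = 0.2)

# Split by site, fit, and return a one-row data frame per site.
per_site <- lapply(split(les_demo, les_demo$link5), function(DD) {
  cf    <- coef(lm(cpe ~ K, data = DD))
  q.hat <- -cf[["K"]]                      # negative of the slope
  data.frame(site      = DD$link5[1],
             intercept = cf[["(Intercept)"]],
             q.hat     = q.hat,
             N0.hat    = cf[["(Intercept)"]] / q.hat)  # assumed definition
})

# Combine: stack the one-row data frames.
result <- do.call(rbind, per_site)
result
```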

Creating a matrix of summary output

How can I insert summary outputs from multiple regression analyses in a matrix type variable in R statistics package?
Here is my script, which runs the regression and collect intercepts and co-eff in a variable:
for (i in 2:(ncol(data.base))) {
Test <- lm(data.base[,i] ~ log(database$var.1))
results <- rbind(results, c(Test$coefficients))
}
What I would like to do is store summary(Test) for each regression in a matrix-type variable. I assume a matrix-type variable is what I need.
I appreciate your help.
Yuck! Some nasty variable naming there, in my opinion.
As I read it, data.base holds the outcomes, one per column, and you want to skip the first column. You also have database, a data.frame with a variable var.1. So: run each regression, then store the results in matrix form.
This is a start:
fits <- apply(data.base[, -1], 2, function(y) lm(y ~ log(database$var.1)))
summ <- lapply(fits, summary)
# coef() on a summary returns the full table: estimate, std. error, t value, p-value
coefs <- lapply(summ, coef)
Reduce(cbind, coefs)
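If a single tall matrix is easier to work with, the coefficient tables can be stacked by row instead. This sketch assumes three mtcars columns as the outcomes and log(mtcars$wt) as the predictor, standing in for data.base and database$var.1.

```r
# Stand-ins: three outcome columns and one shared predictor.
outcomes <- mtcars[, c("mpg", "hp", "qsec")]
fits <- lapply(outcomes, function(y) lm(y ~ log(mtcars$wt)))

# Coefficient table (estimate, std. error, t value, p-value) per model.
tabs <- lapply(fits, function(f) coef(summary(f)))

# Stack the tables and label each row with the outcome it belongs to.
results <- do.call(rbind, tabs)
rownames(results) <- paste(rep(names(tabs), each = 2), rownames(results), sep = ": ")
results
```

Each model contributes two rows (intercept and slope), so results[, "Pr(>|t|)"] gives all p-values at once.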
