Extracting p-values from a large list of lm - r

I am trying to extract p-values from a large list of lm which I created by:
S_models <- dlply(S, "BayStation", function(df)
lm(Temp ~ Date, data = df))
This leaves me with a list of 24 lm objects, one for each group (BayStation). I was able to extract the intercept and slope using:
coef<-ldply(S_models, coef)
However, I cannot seem to figure out how to get a list or dataframe of p-values from this large list without doing it individually or manually.

Try using sapply/lapply :
result <- sapply(S_models, function(x) summary(x)$coefficients[,4])
#With `lapply`
#result <- lapply(S_models, function(x) summary(x)$coefficients[,4])

We can use map
library(purrr)
result <- map(S_models, ~ summary(.x)$coefficients[,4])

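The following uses base R only and gets the p-value of each model's overall F-test with the function pf. The F statistic and its degrees of freedom are in the fstatistic element of the list returned by summary. With a single predictor, as here, this p-value is the same as the one for the slope.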
sapply(S_models, function(x){
ff <- summary(x)$fstatistic
pf(ff[1], df1 = ff[2], df2 = ff[3], lower.tail = FALSE)
})
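If a data frame with one row per group is wanted, the pieces above can be combined roughly as follows. This is only a sketch: it uses the built-in mtcars data split by cyl as a stand-in for S and BayStation, and the names models, p_table and p_f are made up for the illustration.

```r
# Stand-in data: one simple regression per cylinder group of mtcars
models <- lapply(split(mtcars, mtcars$cyl), function(df) lm(mpg ~ wt, data = df))

# One row per group: intercept, slope, and the p-value of the slope
p_table <- do.call(rbind, lapply(names(models), function(g) {
  cf <- summary(models[[g]])$coefficients
  data.frame(group     = g,
             intercept = cf["(Intercept)", "Estimate"],
             slope     = cf["wt", "Estimate"],
             p_slope   = cf["wt", "Pr(>|t|)"])
}))

# Overall F-test p-value per model, as in the pf() answer
p_f <- sapply(models, function(m) {
  ff <- summary(m)$fstatistic
  pf(ff[1], df1 = ff[2], df2 = ff[3], lower.tail = FALSE)
})

p_table
```

Since each model has a single predictor, p_f and p_table$p_slope should agree up to floating point error.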

Related

Looping over objects in R

I am trying to loop over objects in R.
myfunc.linear.pred <- function(x){
linear.pred <- predict(object = x)
w <- exp(linear.pred)/(1+exp(linear.pred))
as.vector(w)
}
The function here works perfectly as it should. It returns a vector of 48 rows and it comes from the object x. Now 'x' is nothing but the full regression model from a GLM function (think: mod.fit <- glm (dep~indep, data = data)). The problem is that I have 20 different such ('mod.fit') objects and need to find predictions for each of these. I could literally repeat the code, but I was looking to find a neater solution. So what I want is a matrix with 48 rows and 20 columns for the above function. This is probably basic for an advanced user, but I have only ever used "apply" and "for" loops for numbers and never objects. I looked into lapply but couldn't figure it out.
I tried: (and this is probably dumb)
allmodels <- c(mod.fit, mod.fit2, mod.fit3)
lpred.matrix <- matrix(data=NA, nrow=48, ncol=20)
for(i in allmodels){
lpred.matrix[i,] <- myfunc.linear.pred(i)
}
which obviously won't work because allmodels has a class of "list" and it contains all the stuff from the GLM function. Hope someone can help. Thanks!
In order to use lapply, you need to store the models in a list built with list(), not c(): c() flattens the model objects into one long list of their components. Something like this should work:
## Load data
data("mtcars")
# fit models
mod.fit1 <- glm (mpg~disp, data = mtcars)
mod.fit2 <- glm (mpg~drat, data = mtcars)
mod.fit3 <- glm (mpg~wt, data = mtcars)
# build function
myfunc.linear.pred <- function(x){
linear.pred <- predict(object = x)
w <- exp(linear.pred)/(1+exp(linear.pred))
as.vector(w)
}
# put models in a list
allmodels <- list("mod1" = mod.fit1, "mod2" = mod.fit2, "mod3" = mod.fit3)
# use lapply and do.call to generate a matrix of prediction results
# (one column per model, one row per observation)
pred_matrix <- do.call(cbind, lapply(allmodels, myfunc.linear.pred))
Hope this helps
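A shorter variant of the same idea, as a sketch only: sapply simplifies equal-length vectors into a matrix directly, and plogis(z) computes exp(z)/(1+exp(z)). The names fits and pred_matrix are made up for the illustration, and mtcars again stands in for the real data.

```r
# Named list of models; the names become the column names of the result
fits <- list(
  disp = glm(mpg ~ disp, data = mtcars),
  drat = glm(mpg ~ drat, data = mtcars),
  wt   = glm(mpg ~ wt,   data = mtcars)
)

# plogis() is the inverse logit; unname() drops the observation names
pred_matrix <- sapply(fits, function(m) unname(plogis(predict(m))))

dim(pred_matrix)  # rows = observations, columns = models
```

With 20 models fitted to 48 observations each, the same call would give the 48 by 20 matrix asked for in the question.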

lm function gives estimate for the y-variable also

I am trying to run a simple lm model. I am using the following
dt <- data.table(
y=rnorm(100,0,1),
x1=rnorm(100,0,1),
x2=rnorm(100,0,1),
x3=rnorm(100,0,1))
y_var2 <- names(dt)[names(dt)%like%"y"]
x_var2 <- names(dt)[names(dt)%like%"x"]
tmp2 <- summary(a <- lm(get(y_var2)~.,dt[,c(x_var2,y_var2),with=F]))
coefs2 <- as.data.table(tmp2$coefficients,keep.rownames = T)
So in the end, coefs2 should contain the estimates, p-values etc. But in the last row of coefs2 I also see the y-variable.
But if I use
tmp2 <- summary(a <- lm(y~.,dt[,c(x_var2,y_var2),with=F]))
Then this does not happen. Why is that?
This has to do with how the . in the formula is expanded. The . stands for every column in the data that does not already appear in the formula. y_var2 holds the string "y", and get(y_var2) on the left-hand side is a function call, not the name y, so R does not recognise that the column y is already in use and adds it as a predictor as well. You have to give lm the formula y ~ . itself, built from the string:
lm( formula(paste(y_var2,"~.")),dt[,c(x_var2,y_var2),with=F])
will do the trick. formula constructs a formula out of the string with which I constructed the expression.
Actually it would probably be cleaner just to make the formula with reformulate() and the data= parameter of lm
tmp2 <- summary(a <- lm(reformulate(x_var2, y_var2), dt))
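The difference can be checked on a small example. This is a sketch using a plain data.frame rather than a data.table; the names df, y_var, x_vars, fit_get and fit_ref are made up for the illustration.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = rnorm(100),
                 x2 = rnorm(100), x3 = rnorm(100))
y_var  <- "y"
x_vars <- c("x1", "x2", "x3")

# get(y_var) is not the name y, so the dot also expands to column y
fit_get <- lm(get(y_var) ~ ., data = df)
names(coef(fit_get))

# reformulate() builds y ~ x1 + x2 + x3, so y is only the response
fit_ref <- lm(reformulate(x_vars, response = y_var), data = df)
names(coef(fit_ref))
```

The first set of coefficient names contains y, the second does not.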

Deduplicate a list of lm objects in R

I have a list of lm model objects that may contain repeats, so I'd like to find a way of checking whether some of these lm objects are equal and, if so, deleting the repeats. In other words, I want to "deduplicate" my list.
I'd appreciate very much any help.
An example of the problem:
## Creates outcome and predictors
outcome <- c(names(mtcars)[1:3])
predictors <- c(names(mtcars)[4:11])
dataset <- mtcars
## Creates model list
model_list <- lapply(seq_along((predictors)), function(n) {
left_hand_side <- outcome[1]
right_hand_side <- apply(X = combn(predictors, n), MARGIN = 2, paste, collapse = " + ")
paste(left_hand_side, right_hand_side, sep = " ~ ")
})
## Convert model list into a vector
model_vector <- unlist(model_list)
## Fit linear models to all items from the vector of models
list_of_fit <- lapply(model_vector, function(x) {
formula <- as.formula(x)
fit <- step(lm(formula, data = dataset))
fit
})
# Exclude possible missing
list_of_fit <- Filter(Negate(function(x) is.null(unlist(x))), list_of_fit)
# These models are the same in my list
lm253 <- list_of_fit[[253]];lm253
lm254 <- list_of_fit[[254]];lm254
lm255 <- list_of_fit[[255]];lm255
I want to exclude duplicated entries in list_of_fit.
It seems wasteful to fit so many models and then throw away most of them. Your object names make your code hard to read for me, but it seems your models can be distinguished based on their formula. Maybe this helps:
list_of_fit[!duplicated(vapply(list_of_fit,
function(m) paste(deparse(m$call), collapse = ""),
FUN.VALUE = "a"))]
I made a simple correction to your code, Roland, and it worked for me.
I changed deparse(m$call) to deparse(formula(m)), so that the complete formulas are compared.
list_of_fit[!duplicated(vapply(list_of_fit, function(m) paste(deparse(formula(m)), collapse = ""), FUN.VALUE = "a"))]
Thank you very much!
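A small self-contained sketch of the same deduplication, assuming that two models count as equal when their formulas deparse to the same string. The names formulas, fits, keys and unique_fits are made up for the illustration.

```r
# Five fits, two of which repeat an earlier formula
formulas <- c("mpg ~ wt", "mpg ~ hp", "mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")
fits <- lapply(formulas, function(f) lm(as.formula(f), data = mtcars))

# One string key per model; paste() guards against deparse() splitting
# a long formula over several strings
keys <- vapply(fits,
               function(m) paste(deparse(formula(m)), collapse = " "),
               character(1))

# Keep only the first model for each distinct key
unique_fits <- fits[!duplicated(keys)]
length(unique_fits)
```

Note that this compares text, so mpg ~ wt + hp and mpg ~ hp + wt would be treated as different models.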

apply series of commands to split data frame

I'm having some difficulties figuring out how to approach this problem. I have a data frame that I am splitting into distinct sites (link5). Once split, I basically want to run a linear regression model on the subsets. Here is the code I'm working with, but it's definitely not correct. Also, it would be great if I could output the model results to a new data frame such that each site would have one row with the model parameter estimates. That is just a wish and not a necessity right now. Thank you for any help!
les_events <- split(les, les$link5)
result <- lapply(les_events) {
lm1 <-lm(cpe~K,data=les_events)
coef <- coef(lm1)
q.hat <- -coef(lm1)[2]
les_events$N0.hat <- coef(lm1[1]/q.hat)
}
You have a number of issues:
You haven't passed a function (the FUN argument) to lapply.
Your closure (the bit inside {}) is almost, but not quite, the body you want for your function.
Something like the following will return the coefficients from your models:
result <- lapply(les_events, function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
This will return a list of data.frames containing columns for each coefficient.
lapply(les_events, lm, formula = 'cpe~K')
will return a list of linear model objects, which may be more useful.
For more general split / apply / combine approaches, use plyr or data.table.
data.table
library(data.table)
DT <- data.table(les)
# one row per site, one column per coefficient
result <- DT[, {lm1 <- lm(cpe ~ K, data = .SD)
as.list(coef(lm1))}, by = link5]
plyr
library(plyr)
result <- ddply(les, .(link5), function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
# or to return a list of linear model objects
dlply(les, .(link5), function(DD){ lm(cpe ~ K, data = DD)})
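The one-row-per-site table can also be built in base R alone. This is a sketch: les is not available here, so mtcars split by cyl stands in for it (mpg for cpe, wt for K), and the names groups and coef_table are made up for the illustration.

```r
# Split the stand-in data into groups, as split(les, les$link5) would
groups <- split(mtcars, mtcars$cyl)

# Fit one model per group and return its coefficients as a one-row data frame
coef_table <- do.call(rbind, lapply(names(groups), function(g) {
  cf <- coef(lm(mpg ~ wt, data = groups[[g]]))
  data.frame(site      = g,
             intercept = cf[["(Intercept)"]],
             slope     = cf[["wt"]])
}))

coef_table
```

Derived quantities such as q.hat from the question could be added as further columns inside the same function.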

Extracting p-value from lapply list of glm fits

I am using lapply to perform several glm regressions on one dependent variable by one independent variable at a time. Right now I am specifically interested in the Pr(>|z|) of each independent variable. However, I am unsure on how to report just Pr(>|z|) using the list from lapply.
If I was just running one model at a time:
coef(summary(fit))[,"Pr(>|z|)"]
or
summary(fit)$coefficients[,4]
Would work (as described here), but trying something similar with lapply does not seem to work. Can I get just the p-values using lapply and glm with an accessor method or from directly calling from the models?
#mtcars dataset
vars <- names(mtcars)[2:8]
fits <- lapply(vars, function(x) {glm(substitute(mpg ~ i, list(i = as.name(x))), family=binomial, data = mtcars)})
lapply(fits,summary) # this works
lapply(fits, coefficients) # this works
#lapply(fits, summary(fits)$coefficients[,4])# this for example does not work
You want to do:
lapply(fits, function(f) summary(f)$coefficients[,4])
However, since each item is just a short vector of p-values, you would probably rather have them simplified than kept in a list, so you could use sapply instead of lapply, which here returns a matrix with one column per model:
sapply(fits, function(f) summary(f)$coefficients[,4])
When you run lapply(fits, summary), it creates a list of summary.glm objects, each of which is printed using print.summary.glm.
If you save this
summaries <- lapply(fits, summary)
You can then go through and extract the coefficient matrix
coefmat <- lapply(summaries, '[[', 'coefficients')
and then the 4th column
lapply(coefmat, '[', , 4)
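If only the p-value of the predictor itself is wanted, not the intercept's, the row can be selected by name. This is a sketch under some assumptions: it uses the binary column am as the response so that a binomial fit is valid, a shorter made-up predictor list, and reformulate() instead of substitute(); the name p_values is made up for the illustration.

```r
vars <- c("wt", "hp", "qsec")

# One logistic regression of am on each predictor
fits <- lapply(vars, function(v) {
  glm(reformulate(v, response = "am"), family = binomial, data = mtcars)
})
names(fits) <- vars

# Pick the predictor's row and the Pr(>|z|) column from each coefficient table
p_values <- vapply(vars, function(v) {
  coef(summary(fits[[v]]))[v, "Pr(>|z|)"]
}, numeric(1))

p_values
```

vapply returns a named numeric vector here, one p-value per predictor.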

Resources