Use mapply to fit list of lm models to list of data frames [R] - r

Is this possible? I can get mapply to work with the help examples, but I can't get a trivial example with lm to work. Here's my attempt which returns a matrix, instead of a list of lm objects.
temp.df <- list(
data.frame(a = rep(1:10, each = 10), b = 1:100, c = rnorm(100), d = rnorm(100, 2))
)
temp.df[[2]] <- subset(temp.df[[1]], a > 2)
temp.mod <- list(a ~ b,
a ~ b + c,
a ~ b + c + d)
temp.lm <- mapply(lm, formula = temp.mod, data = temp.df[c(1,1,2)])
temp.sum <- lapply(temp.lm, summary)
Should I just stick with lapply and specifying data = each time? Thanks!

Not sure what you mean by specifying data each time, but if you pack everything into a bigger (nested) list, and write your own wrapper function that calls lm() and summary(), lapply is a good option:
bigList <- list(m1=list(dat=temp.df[[1]],mod=temp.mod[[1]]),
m1=list(dat=temp.df[[2]],mod=temp.mod[[2]]))
fitLM <- function(x){
lm1 <- lm(x$mod,data=x$dat)
return(summary(lm1))
}
temp.lm <- lapply(bigList,FUN=fitLM)
Edit: Just to follow up on the mapply behavior, when I run your code, I get a 12x3 matrix that actually does contain all the relevant lm object information, but the class attribute has been lost. Resetting that and then lapply-ing summary() works with your original code, I believe. But I think a nested list of arguments and lapply() is simpler in this case.
lmList <- list(a=temp.lm[,1],b=temp.lm[,2],c=temp.lm[,3])
lmList <- lapply(lmList,function(x){class(x) <- "lm";return(x)})
temp.sum <- lapply(lmList, summary)

Related

Creating a loop through a list of variables for an LM model in R

I am trying to create multiple linear regression models from a list of variable combinations (I also have them separately as a data-frame if that is more useful!)
The list of variables looks like this:
Vars
x1+x2+x3
x1+x2+x4
x1+x2+x5
x1+x2+x6
x1+x2+x7
The loop I'm using looks like this:
for (i in 1:length(var_list)){
lm(independent_variable ~ var_list[i],data = training_data)
i+1
}
However it is not recognizing the string of var_list[i] which gives x1+x2+x3 etc. as a model input.
Does any-one know how to fix it?
Thanks for your help.
You don't even have to use loops. Apply should work nicely.
training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))
Vars <- as.list(c("x1+x2+x3",
"x1+x2+x4",
"x1+x2+x5",
"x1+x2+x6",
"x1+x2+x7"))
allModelsList <- lapply(paste("independent_variable ~", Vars), as.formula)
allModelsResults <- lapply(allModelsList, function(x) lm(x, data = training_data))
If you need models summaries you can add :
allModelsSummaries = lapply(allModelsResults, summary)
For example you can access the coefficient R² of the model lm(independent_variable ~ x1+x2+x3) by doing this:
allModelsSummaries[[1]]$r.squared
I hope it helps.
We can create the formula with paste
out <- vector('list', length(var_list))
for (i in seq_along(var_list)){
out[[i]] <- lm(paste('independent_variable', '~', var_list[i]),
data = training_data)
}
Or otherwise, it can be done with reformulate
lm(reformulate(var_list[i], 'independent_variable'), data = training_data)

Looping over objects in R

I am trying to loop over objects in R.
myfunc.linear.pred <- function(x){
linear.pred <- predict(object = x)
w <- exp(linear.pred)/(1+exp(linear.pred))
as.vector(w)
}
The function here works perfectly as it should. It returns a vector of 48 rows and it comes from the object x. Now 'x' is nothing but the full regression model from a GLM function (think: mod.fit <- glm (dep~indep, data = data)). The problem is that I have 20 different such ('mod.fit') objects and need to find predictions for each of these. I could literally repeat the code, but I was looking to find a neater solution. So what I want is a matrix with 48 rows and 20 columns for the above function. This is probably basic for an advanced user, but I have only ever used "apply" and "for" loops for numbers and never objects. I looked into lapply but couldn't figure it out.
I tried: (and this is probably dumb)
allmodels <- c(mod.fit, mod.fit2, mod.fit3)
lpred.matrix <- matrix(data=NA, nrow=48, ncol=20)
for(i in allmodels){
lpred.matrix[i,] <- myfunc.linear.pred(i)
}
which obviously won't work because allmodels has a class of "list" and it contains all the stuff from the GLM function. Hope someone can help. Thanks!
In order to use lapply, you must have a list object not a vector object. Something like this should work:
## Load data
data("mtcars")
# fit models
mod.fit1 <- glm (mpg~disp, data = mtcars)
mod.fit2 <- glm (mpg~drat, data = mtcars)
mod.fit3 <- glm (mpg~wt, data = mtcars)
# build function
myfunc.linear.pred <- function(x){
linear.pred <- predict(object = x)
w <- exp(linear.pred)/(1+exp(linear.pred))
as.vector(w)
}
# put models in a list
allmodels <- list("mod1" = mod.fit1, "mod2" = mod.fit2, "mod2" =
mod.fit3)
# use lapply and do.call to generate matrix of prediction results
df <- do.call('cbind', lapply(allmodels, function(x){
a <- myfunc.linear.pred(x)
}))
Hope this helps

Deduplicate a list of lm objects in R

I have a list of lm models objects with possible repeated, so I'd like to find a way of checking if some of these lm objects are equal, if so them delete it. In words, I want to "deduplicate" my list.
I'd appreciate very much any help.
An example of the problem:
## Creates outcome and predictors
outcome <- c(names(mtcars)[1:3])
predictors <- c(names(mtcars)[4:11])
dataset <- mtcars
## Creates model list
model_list <- lapply(seq_along((predictors)), function(n) {
left_hand_side <- outcome[1]
right_hand_side <- apply(X = combn(predictors, n), MARGIN = 2, paste, collapse = " + ")
paste(left_hand_side, right_hand_side, sep = " ~ ")
})
## Convert model list into a verctor
model_vector <- unlist(model_list)
## Fit linear models to all itens from the vector of models
list_of_fit <- lapply(model_vector, function(x) {
formula <- as.formula(x)
fit <- step(lm(formula, data = dataset))
fit
})
# Exclude possible missing
list_of_fit <- Filter(Negate(function(x) is.null(unlist(x))), list_of_fit)
# These models are the same in my list
lm253 <- list_of_fit[[253]];lm253
lm254 <- list_of_fit[[254]];lm254
lm255 <- list_of_fit[[255]];lm255
I want to exclude duplicated entries in list_of_fit.
It seems wasteful to fit so many models and then throw away most of them. Your object names make your code hard to read for me, but it seems your models can be distinguished based on their formula. Maybe this helps:
lista_de_ajustes[!duplicated(vapply(lista_de_ajustes,
function(m) deparse(m$call),
FUN.VALUE = "a"))]
I made a simple correction in you code Roland, so it worked for me.
I changed from deparse(m$call) to deparse(formula(m)), due this I'm able to compare the complete formulas.
lista_de_ajustes[!duplicated(vapply(lista_de_ajustes, function(m) deparse(formula(m)), FUN.VALUE = "a"))]
Thank you very much!

variable scope & resolution in R function

I want to loop through the vars in a dataframe, calling lm() on each one, and so I wrote this:
findvars <- function(x = samsungData, dv = 'activity', id = 'subject') {
# Loops through the possible predictor vars, does an lm() predicting the dv
# from each, and returns a data.frame of coefficients, one row per IV.
r <- data.frame()
# All varnames apart from the dependent var, and the case identifier
ivs <- setdiff(names(x), c(dv, id))
for (iv in ivs) {
print(paste("trying", iv))
m <- lm(dv ~ iv, data = x, na.rm = TRUE)
# Take the absolute value of the coefficient, then transpose.
c <- t(as.data.frame(sapply(m$coefficients, abs)))
c$iv <- iv # which IV produced this row?
r <- c(r, c)
}
return(r)
}
This doesn't work, I believe b/c the formula in the lm() call consists of function-local variables that hold strings naming vars in the passed-in dataframe (e.g., "my_dependant_var" and "this_iv") as opposed to pointers to the actual variable objects.
I tried wrapping that formula in eval(parse(text = )), but could not get that to work.
If I'm right about the problem, can someone explain to me how to get R to resolve the contents of those vars iv & dv into the pointers I need? Or if I'm wrong, can someone explain what else is going on?
Many thanks!
Here is some repro code:
library(datasets)
data(USJudgeRatings)
findvars(x = USJudgeRatings, dv = 'CONT', id = 'DILG')
So there's enough bad stuff happening in your function besides your trouble with the formula, that I think someone should walk you through it all. Here are some annotations, followed by a better version:
#For small examples, "growing" objects isn't a huge deal,
# but you will regret it very, very quickly. It's a bad
# habit. Learn to ditch it now. So don't inititalize
# empty lists and data frames.
r <- data.frame()
ivs <- setdiff(names(x), c(dv, id))
for (iv in ivs) {
print(paste("trying", iv))
#There is no na.rm argument to lm, only na.action
m <- lm(dv ~ iv, data = x, na.rm = TRUE)
#Best not to name variables c, its a common function, see two lines from now!
# Also, use the coef() extractor functions, not $. That way, if/when
# authors change the object structure your code won't break.
#Finally, abs is vectorized, no need for sapply
c <- t(as.data.frame(sapply(m$coefficients, abs)))
#This is probably best stored in the name
c$iv <- iv # which IV produced this row?
#Growing objects == bad! Also, are you sure you know what happens when
# you concatenate two data frames?
r <- c(r, c)
}
return(r)
}
Try something like this instead:
findvars <- function(x,dv,id){
ivs <- setdiff(names(x),c(dv,id))
#initialize result list of the appropriate length
result <- setNames(vector("list",length(ivs)),ivs)
for (i in seq_along(ivs)){
result[[i]] <- abs(coef(lm(paste(dv,ivs[i],sep = "~"),data = x,na.action = na.omit)))
}
result
}

Pass glm predictors from a list

I have a large set of model specifications to test, which share a dv but have unique IVs. In the following example
foo <- data.frame(dv = sample(c(0,1), 100, replace=T),
x1 = runif(100),
x2 = runif(100))
I want the first model to only include x1, the second x2, the third both, and the fourth their interaction. So I thought a sensible way would be to build a list of formula statements:
bar <- list("x1",
"x2",
"x1+x2",
"x1*x2")
which I would then use in a llply call from the plyr package to obtain a list of model objects.
require(plyr)
res <- llply(bar, function(i) glm(dv ~ i, data = foo, family = binomial()))
Unfortunately I'm told
Error in model.frame.default(formula = dv ~ i, data = foo, drop.unused.levels = TRUE):variable lengths differ (found for 'i')
Obviously I'm mixing up something fundamental--do I need to manipulate the original foo list in some fashion?
Your problem is with how you are specifying the formula, since inside the function i is a variable. This would work:
glm(paste("dv ~", i), data = foo, family = binomial())
The problem is that dv ~ i isn't a formula. i is (inside the anonymous function) simply a symbol that represents a variable containing a character value.
Try this:
bar <- list("dv~x1",
"dv~x2",
"dv~x1+x2",
"dv~x1*x2")
res <- llply(bar, function(i) glm(i, data = foo, family = binomial()))
But setting statistical issues aside, it might possibly be easier to use something like ?step or ?stepAIC in the MASS package for tasks similar to this?

Resources