Creating a loop through a list of variables for an LM model in R - r

I am trying to create multiple linear regression models from a list of variable combinations (I also have them separately as a data-frame if that is more useful!)
The list of variables looks like this:
Vars
x1+x2+x3
x1+x2+x4
x1+x2+x5
x1+x2+x6
x1+x2+x7
The loop I'm using looks like this:
for (i in 1:length(var_list)){
lm(independent_variable ~ var_list[i],data = training_data)
i+1
}
However it is not recognizing the string of var_list[i] which gives x1+x2+x3 etc. as a model input.
Does any-one know how to fix it?
Thanks for your help.

You don't even have to use loops. Apply should work nicely.
training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))
Vars <- as.list(c("x1+x2+x3",
"x1+x2+x4",
"x1+x2+x5",
"x1+x2+x6",
"x1+x2+x7"))
allModelsList <- lapply(paste("independent_variable ~", Vars), as.formula)
allModelsResults <- lapply(allModelsList, function(x) lm(x, data = training_data))
If you need models summaries you can add :
allModelsSummaries = lapply(allModelsResults, summary)
For example you can access the coefficient R² of the model lm(independent_variable ~ x1+x2+x3) by doing this:
allModelsSummaries[[1]]$r.squared
I hope it helps.

We can create the formula with paste
out <- vector('list', length(var_list))
for (i in seq_along(var_list)){
out[[i]] <- lm(paste('independent_variable', '~', var_list[i]),
data = training_data)
}
Or otherwise, it can be done with reformulate
lm(reformulate(var_list[i], 'independent_variable'), data = training_data)

Related

Looping over objects in R

I am trying to loop over objects in R.
myfunc.linear.pred <- function(x){
linear.pred <- predict(object = x)
w <- exp(linear.pred)/(1+exp(linear.pred))
as.vector(w)
}
The function here works perfectly as it should. It returns a vector of 48 rows and it comes from the object x. Now 'x' is nothing but the full regression model from a GLM function (think: mod.fit <- glm (dep~indep, data = data)). The problem is that I have 20 different such ('mod.fit') objects and need to find predictions for each of these. I could literally repeat the code, but I was looking to find a neater solution. So what I want is a matrix with 48 rows and 20 columns for the above function. This is probably basic for an advanced user, but I have only ever used "apply" and "for" loops for numbers and never objects. I looked into lapply but couldn't figure it out.
I tried: (and this is probably dumb)
allmodels <- c(mod.fit, mod.fit2, mod.fit3)
lpred.matrix <- matrix(data=NA, nrow=48, ncol=20)
for(i in allmodels){
lpred.matrix[i,] <- myfunc.linear.pred(i)
}
which obviously won't work because allmodels has a class of "list" and it contains all the stuff from the GLM function. Hope someone can help. Thanks!
In order to use lapply, you must have a list object not a vector object. Something like this should work:
## Load data
data("mtcars")
# fit models
mod.fit1 <- glm (mpg~disp, data = mtcars)
mod.fit2 <- glm (mpg~drat, data = mtcars)
mod.fit3 <- glm (mpg~wt, data = mtcars)
# build function
myfunc.linear.pred <- function(x){
linear.pred <- predict(object = x)
w <- exp(linear.pred)/(1+exp(linear.pred))
as.vector(w)
}
# put models in a list
allmodels <- list("mod1" = mod.fit1, "mod2" = mod.fit2, "mod2" =
mod.fit3)
# use lapply and do.call to generate matrix of prediction results
df <- do.call('cbind', lapply(allmodels, function(x){
a <- myfunc.linear.pred(x)
}))
Hope this helps

how to loop through a list of variable names to use it with the update function from lm

I have a list of variable names that I would like to sequentially exclude from a best fitted model using the function update from lm. Because the list of variables are likely to change I want to loop through a given list but I can not get the elements of the list to be read as dependent variable.
I found some code that I thought it could work:
Example code
hsb2 <-read.csv("www.ats.ucla.edu/stat/data/hsb2.csv")
names(hsb2)
varlist <- names(hsb2)[8:11]
models <- lapply(varlist, function(x) {
lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})
But not if I use the update function on a previous lm object
words<-c('Age','Sex', 'Residuals')
models <- lapply(words, function(x){update(substitute(
lmobject,~.-i,list(i = as.name(x))),data =data_complete)})
I also tried
re<-c()
for (i in 1:3) {
lmt<-update(lmobject,~.-words[i])
r2no_i<-summary(lmt)$r.squared
re<-c(re, r2no_i)
}
I think this is pretty simple but I could not make the variable to be read properly
Any tip is highly appreciated
Is it possible that the built-in stats::drop1() function would do what you need?
Read data (note "http://..." is needed)
hsb2 <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")
varlist <- names(hsb2)[8:11]
Fit all models: construct list of formulas:
formList <- lapply(varlist,reformulate,response="read")
Construct models (a little bit of fancy footwork is needed to get the $call component to look right)
modList <- lapply(formList,
function(x) {
m <- lm(x,data=hsb2)
m$call$formula <- eval(m$call$formula)
return(m)
})
words <- c('Age','Sex', 'Residuals')
pick one of the fitted models:
lmobject <- modList[[1]]
construct formulas of the form . ~ . - w
minus_form <- function(w)
reformulate(c(".",paste0("-",w)),response=".")
minus_form("abc") ## . ~ . + -abc
Refit models with dropped terms:
newMods <- lapply(words,
function(w) {
update(lmobject,minus_form(w))
})

Deduplicate a list of lm objects in R

I have a list of lm models objects with possible repeated, so I'd like to find a way of checking if some of these lm objects are equal, if so them delete it. In words, I want to "deduplicate" my list.
I'd appreciate very much any help.
An example of the problem:
## Creates outcome and predictors
outcome <- c(names(mtcars)[1:3])
predictors <- c(names(mtcars)[4:11])
dataset <- mtcars
## Creates model list
model_list <- lapply(seq_along((predictors)), function(n) {
left_hand_side <- outcome[1]
right_hand_side <- apply(X = combn(predictors, n), MARGIN = 2, paste, collapse = " + ")
paste(left_hand_side, right_hand_side, sep = " ~ ")
})
## Convert model list into a verctor
model_vector <- unlist(model_list)
## Fit linear models to all itens from the vector of models
list_of_fit <- lapply(model_vector, function(x) {
formula <- as.formula(x)
fit <- step(lm(formula, data = dataset))
fit
})
# Exclude possible missing
list_of_fit <- Filter(Negate(function(x) is.null(unlist(x))), list_of_fit)
# These models are the same in my list
lm253 <- list_of_fit[[253]];lm253
lm254 <- list_of_fit[[254]];lm254
lm255 <- list_of_fit[[255]];lm255
I want to exclude duplicated entries in list_of_fit.
It seems wasteful to fit so many models and then throw away most of them. Your object names make your code hard to read for me, but it seems your models can be distinguished based on their formula. Maybe this helps:
lista_de_ajustes[!duplicated(vapply(lista_de_ajustes,
function(m) deparse(m$call),
FUN.VALUE = "a"))]
I made a simple correction in you code Roland, so it worked for me.
I changed from deparse(m$call) to deparse(formula(m)), due this I'm able to compare the complete formulas.
lista_de_ajustes[!duplicated(vapply(lista_de_ajustes, function(m) deparse(formula(m)), FUN.VALUE = "a"))]
Thank you very much!

Using apply to loop over different datasets in a regression

I found this way of looping over variables in an lm() when the variable names are stored as characters (http://www.ats.ucla.edu/stat/r/pages/looping_strings.htm):
models <- lapply(varlist, function(x) {
lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})
My first question is: Is there a more efficient/faster way?
What if I want to loop over different data instead of looping over variables?
Example:
reg1 <- lm(a~b, data=dataset1)
reg2 <- lm(a~b, data=dataset2)
Can I apply something similar to the code shown above? Using the substitute function for the data did not work.
Thank You!
The substitute in your example is used to construct the formula. If you want to to apply lm to a number of data.frames use:
lapply(list(dataset1, dataset2), lm, formula = a ~ b)

apply series of commands to split data frame

I'm having some difficulties figuring out how to approach this problem. I have a data frame that I am splitting into distinct sites (link5). Once split I basically want to run a linear regression model on the subsets. Here is the code I'm working with, but it's definitely not correct. Also, It would be great if I could output the model results to a new data frame such that each site would have one row with the model parameter estimates - that is just a wish and not a necessity right now. Thank you for any help!
les_events <- split(les, les$link5)
result <- lapply(les_events) {
lm1 <-lm(cpe~K,data=les_events)
coef <- coef(lm1)
q.hat <- -coef(lm1)[2]
les_events$N0.hat <- coef(lm1[1]/q.hat)
}
You have a number of issues.
You haven't passed a function (the FUN argument) to lapply
Your closure ( The bit inside {} is almost, but not quite the body you want for your function)
something like th following will return the coefficients from your models
result <- lapply(les_events, function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
This will return a list of data.frames containing columns for each coefficient.
lapply(les_events, lm, formula = 'cpe~K')
will return a list of linear model objects, which may be more useful.
For a more general split / apply / combine approaches use plyr or data.table
data.table
library(data.table)
DT <- data.table(les)
result <- les[, {lm1 <- lm(cpe ~ K, data = .SD)
as.list(lm1)}, by = link5]
plyr
library(plyr)
result <- ddply(les, .(link5), function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
# or to return a list of linear model objects
dlply(les, link5, function(DD){ lm(cpe ~K, data =DD)})

Resources