Multiple linear models in the same data frame - R

I have a function that takes a data frame; the first column must be Y, and the user selects which column will be X. I need to run multiple linear models on the same data frame (to find which lm gives the best results for my user).
Using the mtcars dataset, this is what I have for a single linear model:
results_LM <- function(data, var) {
  # Build "Y ~ var", where Y is the first column of the data frame
  fm1 <- as.formula(paste(colnames(data)[1], "~", var))
  lm1 <- lm(fm1, data = data)
  return(lm1)
}
fit <- results_LM(mtcars, "disp")
I would do the same for each linear model I want to test (and store them in a final list that I'll use later):
results_LM <- function(data, var) {
  fm1 <- as.formula(paste(colnames(data)[1], "~", var))
  lm1 <- lm(fm1, data = data)
  # Same predictor plus its square
  fm2 <- as.formula(paste(colnames(data)[1], "~", var, "+ I(", var, "^2)"))
  lm2 <- lm(fm2, data = data)
  all_lm <- list("FirstLM" = lm1, "SecondLM" = lm2)
  return(all_lm)
}
And this goes on for fm3, lm3 ... fm99, lm99.
This would work, but I guess there is a MUCH better way to do this.
Any ideas on how to run multiple linear models in the same data frame?

Already solved, after looking at this post.
I put all my formulas inside a list and used lapply to fit all of them:
results_LM <- function(data, var) {
  y <- colnames(data)[1]
  formulas <- list(as.formula(paste(y, "~", var)),
                   as.formula(paste(y, "~", var, "+ I(", var, "^2)")))
  # Fit one lm per formula; `data` is passed through to lm()
  models <- lapply(formulas, lm, data = data)
  return(models)
}
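Since the goal is to find which model gives the best results, one option is to name the formulas and rank the fits. The sketch below assumes "best" means highest adjusted R-squared (taken from summary()); it builds the formulas with stats::reformulate() instead of pasting strings, and the names `linear` and `quadratic` are only illustrative.

```r
# Sketch: fit several candidate formulas for Y = first column and rank them.
# Assumption: "best" is judged by adjusted R-squared.
results_LM <- function(data, var) {
  y <- colnames(data)[1]
  formulas <- list(
    linear    = reformulate(var, response = y),
    quadratic = reformulate(c(var, sprintf("I(%s^2)", var)), response = y)
  )
  lapply(formulas, lm, data = data)  # names of `formulas` carry over to the result
}

fits <- results_LM(mtcars, "disp")
adj_r2 <- vapply(fits, function(m) summary(m)$adj.r.squared, numeric(1))
best <- names(which.max(adj_r2))
adj_r2
best
```

Adding a third candidate is then one more named entry in `formulas`, rather than another fmN/lmN pair.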

Related

How to loop over columns to evaluate different fixed effects in consecutive lme4 mixed models and extract the coefficients and P values?

I am new to R and am trying to loop a mixed model across 90 columns in a dataset.
My real dataset looks like the example below but has 90 predictors instead of 7, each of which I need to evaluate as a fixed effect in consecutive models.
I then need to store the model output (coefficients and P values) to construct a figure summarizing the effect size of each predictor. I am aware of the debate about P value estimates from lme4 mixed models.
For example:
library(dplyr)  # provides tibble(), %>% and arrange()
set.seed(101)
mydata <- tibble(id = rep(1:32, times = 25),
                 time = sample(1:800),
                 experiment = rep(1:4, times = 200),
                 Y = sample(1:800),
                 predictor_1 = runif(800),
                 predictor_2 = rnorm(800),
                 predictor_3 = sample(1:800),
                 predictor_4 = sample(1:800),
                 predictor_5 = seq(1:800),
                 predictor_6 = sample(1:800),
                 predictor_7 = runif(800)) %>%
  arrange(id, time)
The model to iterate across the N predictors is:
library(lme4)
library(lmerTest) # adds p-values to summary() of lmer fits
mixed.model <- lmer(Y ~ predictor_1 + time + (1|id) + (1|experiment), data = mydata)
summary(mixed.model)
My coding skills are not yet good enough to set up a loop that repeats the model across the N predictors in my dataset and stores the coefficients and P values in a data frame.
Using lapply, I have been able to iterate across all the predictors with linear models instead of mixed models, but I have failed to apply the same strategy to mixed models.
varlist <- names(mydata)[5:11]
lm_models <- lapply(varlist, function(x) {
lm(substitute(Y ~ i, list(i = as.name(x))), data = mydata)
})
One option is to update the formula of a restricted model (without the predictor) in an lapply loop over the predictors. Then summarize each fitted model in the resulting list and subset its coefficient matrix using a Vectorized function.
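library(lmerTest)
# Restricted model without any of the candidate predictors
mixed.model <- lmer(Y ~ time + (1|id) + (1|experiment), data = mydata)
preds <- grep('pred', names(mydata), value=TRUE)
# Refit once per predictor, adding it to the fixed effects
fits <- lapply(preds, \(x) update(mixed.model, paste('. ~ . + ', x)))
# Row 3 of the coefficient table is the added predictor; columns 1 and 5 are Estimate and Pr(>|t|)
extract_coef_p <- Vectorize(\(x) x |> summary() |> coef() |> {\(.) .[3, c(1, 5)]}())
res <- `rownames<-`(t(extract_coef_p(fits)), preds)
res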
# Estimate Pr(>|t|)
# predictor_1 -7.177579138 0.8002737
# predictor_2 -5.010342111 0.5377551
# predictor_3 -0.013030513 0.7126500
# predictor_4 -0.041702039 0.2383835
# predictor_5 -0.001437124 0.9676346
# predictor_6 0.005259293 0.8818644
# predictor_7 31.304496255 0.2511275
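The same update-and-extract pattern also works for ordinary lm() fits, which makes it easy to try without lme4. In this sketch mtcars is used and wt stands in for the always-included covariate `time` (an arbitrary choice). Indexing the coefficient table by name rather than position sidesteps the fact that an lm table has its p-value in column 4, whereas the lmerTest table above has it in column 5 because of the extra df column.

```r
# Sketch: the update-and-extract pattern with base lm() instead of lmer().
# Assumption: wt plays the role of the always-included covariate (`time` above).
base_fit <- lm(mpg ~ wt, data = mtcars)
preds <- c("disp", "hp", "qsec")

res <- t(vapply(preds, function(x) {
  fit <- update(base_fit, paste(". ~ . +", x))  # add one predictor to the base model
  # Index by name, so the row/column positions of the table don't matter
  coef(summary(fit))[x, c("Estimate", "Pr(>|t|)")]
}, numeric(2)))
res
```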

Write a function to list all possible combinations of models

I'm attempting to write a function to run all possible regression models for the variables in a dataset. I was able to get it to run for each single variable; this is what I have so far:
library(tidyverse)
data("mtcars")
model1 <- function(DATA) {
  DATA %>%
    select(-mpg) %>%                      # don't regress mpg on itself
    map(~ lm(mpg ~ .x, data = DATA)) %>%  # one single-predictor model per column
    map(summary) %>%
    map_dbl("adj.r.squared") %>%
    enframe(name = "variable", value = "adj.r.squared")
}
model1(mtcars)
I am new to R and to writing functions, so I am sure there are some issues with it. I want a tibble of the adjusted R-squared values for all possible models. How do I write a function that does the same thing for two, three, or more variables?
I am not aware of any packages that allow one to automate this. So, let's try a brute force approach. The idea is to generate all possible combinations by hand and iterate over them.
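vars <- names(mtcars)[-1]
models <- list()
# i = number of predictors; seq_along(vars) covers every subset size
for (i in seq_along(vars)) {
  vc <- combn(vars, i)
  for (j in seq_len(ncol(vc))) {
    model <- as.formula(paste0("mpg ~ ", paste0(vc[, j], collapse = " + ")))
    models[[length(models) + 1]] <- model
  }
}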
You can then use these formulas to run the linear models:
lapply(models, function(x) lm(x, data = mtcars))
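To get the table of adjusted R-squared values the question asks for, the combinations can be generated and fitted in one pass. This base R sketch builds every non-empty subset of the mtcars predictors with combn(), fits each with reformulate(), and collects the results in a data frame. With 10 predictors that is 2^10 - 1 = 1023 models, so the count grows quickly with more columns.

```r
vars <- names(mtcars)[-1]  # the 10 candidate predictors

# All non-empty subsets: a list of character vectors, one per model
combos <- unlist(lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
                 recursive = FALSE)

adj_r2 <- vapply(combos, function(v) {
  fit <- lm(reformulate(v, response = "mpg"), data = mtcars)
  summary(fit)$adj.r.squared
}, numeric(1))

results <- data.frame(
  model = vapply(combos, paste, character(1), collapse = " + "),
  adj.r.squared = adj_r2
)
head(results[order(results$adj.r.squared, decreasing = TRUE), ])
```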

How to create a loop for a linear model in R

I need your help.
I have to run a series of OLS regressions on multiple dependent variables, using the same set of independent variables for each.
That is, I have a data frame of size 1510x5 in which each column represents the return of a portfolio, and I would like to regress each one against the same set of independent variables (1510x4), which in my case are the factors from the Carhart model. Since, besides the coefficient values, I am interested in both their P-values and the R2 of each regression, is there a way to build a loop that stores this information?
What I have tried so far is:
for (i in seq_len(ncol(EW_Portfolio))) {
  lmfit <- lm(EW_Portfolio[, i] ~ FFM)
  print(summary(lmfit))  # inside a loop, results are only shown if printed explicitly
}
in the hope that, every time the loop repeated itself, I could see the result of each individual regression.
The easiest way is to store the results in a list:
resultsList <- list()
for (i in seq_len(ncol(EW_Portfolio))) {
  lmfit <- lm(EW_Portfolio[, i] ~ FFM)
  resultsList[[i]] <- summary(lmfit)  # keep the full summary (coefficients, p-values, R-squared)
}
You can then access the results you mention:
resultsList[[1]]$coefficients
resultsList[[1]]$r.squared
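Instead of a loop, lm() also accepts a matrix response: each column is fitted by OLS against the same regressors, and summary() returns one summary per column. Because EW_Portfolio and FFM aren't shown, this sketch uses mtcars columns as stand-ins (an assumption): mpg, qsec and hp play the portfolios, wt and disp play the factors. It then gathers estimates, p-values and R-squared into one data frame.

```r
# Stand-ins (assumption): Y columns ~ portfolio returns, X columns ~ Carhart factors
Y <- as.matrix(mtcars[, c("mpg", "qsec", "hp")])
X <- as.matrix(mtcars[, c("wt", "disp")])

fit <- lm(Y ~ X)       # one OLS fit per column of Y
sums <- summary(fit)   # list of summary.lm objects, one per response

# Collect estimates, p-values and R-squared into one data frame
res <- do.call(rbind, lapply(seq_along(sums), function(i) {
  cf <- coef(sums[[i]])
  data.frame(response = colnames(Y)[i], term = rownames(cf),
             estimate = cf[, "Estimate"], p.value = cf[, "Pr(>|t|)"],
             r.squared = sums[[i]]$r.squared,
             row.names = NULL, stringsAsFactors = FALSE)
}))
res
```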
It may be something like this (I'm not sure about the p-values):
data("mtcars")
formulas <- list(
mpg ~ disp,
mpg ~ disp + wt
)
res <- vector("list", length = length(formulas))
my.r2 <- vector("list", length = length(formulas))
my.sum <- vector("list", length = length(formulas))
for(i in seq_along(formulas)){
res[[i]] <- lm(formulas[[i]], data = mtcars)
my.r2[[i]] <- (summary(res[[i]]))$adj.r.squared
my.sum[[i]] <- (summary(res[[i]]))
}
res
unlist(my.r2)
my.sum
lapply(formulas, lm, data = mtcars)
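On the p-values: the per-coefficient p-values are the "Pr(>|t|)" column of coef(summary(fit)), and the overall F-test p-value can be recomputed from summary(fit)$fstatistic with pf(). A short sketch for the same two formulas:

```r
formulas <- list(mpg ~ disp, mpg ~ disp + wt)
fits <- lapply(formulas, lm, data = mtcars)

# Per-coefficient p-values (one named vector per model)
coef_p <- lapply(fits, function(f) coef(summary(f))[, "Pr(>|t|)"])

# Overall F-test p-value from the stored F statistic and its degrees of freedom
model_p <- vapply(fits, function(f) {
  fs <- summary(f)$fstatistic
  pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
}, numeric(1))

coef_p
model_p
```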

Formula from Data.frame Columns

I want to create a regression model of a vector (IC50) against a number of different molecular descriptors (A, B, C, D, etc.).
I want to use:
model <- lm(IC50 ~ A + B + C + D)
The molecular descriptors are found in the columns of a data.frame. I would like to use a function that takes the IC50 vector and the appropriately subsetted data.frame as inputs.
My problem is that I can't convert the columns into a formula for the model.
Can anyone help?
Sample data and my feeble attempt:
IC50 <- c(0.1,0.2,0.55,0.63,0.005)
descs <- data.frame(A=c(0.002,0.2,0.654,0.851,0.654),
B=c(56,25,89,55,60),
C=c(0.005,0.006,0.004,0.009,0.007),
D=c(189,202,199,175,220))
model <- function(x=IC50,y=descs) {
a <- lm(x ~ y)
return(a)
}
I went down the substitute/deparse route but this didn't import the data.
You can simply do:
model <- function(x = IC50, y = descs)
lm(x ~ ., data = y)
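If only some descriptor columns should enter the model, the formula can also be built from a character vector of column names with reformulate(). Here `fit_descriptors` is a hypothetical helper name. Note that with only five observations, all four descriptors plus an intercept would leave no residual degrees of freedom, so the example uses two.

```r
IC50 <- c(0.1, 0.2, 0.55, 0.63, 0.005)
descs <- data.frame(A = c(0.002, 0.2, 0.654, 0.851, 0.654),
                    B = c(56, 25, 89, 55, 60),
                    C = c(0.005, 0.006, 0.004, 0.009, 0.007),
                    D = c(189, 202, 199, 175, 220))

# Hypothetical helper: regress y on the chosen columns of X
fit_descriptors <- function(y, X, cols = names(X)) {
  dat <- cbind(IC50 = y, X[cols])  # one data frame holding response and descriptors
  lm(reformulate(cols, response = "IC50"), data = dat)
}

m <- fit_descriptors(IC50, descs, c("A", "B"))
formula(m)
coef(m)
```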

R: Regression of each variable depending on all the others

In R, I have the following data.frame:
df <- data.frame(var1,var2,var3)
I would like to fit a regression function, like multinom, for each variable with respect to the others, without using the variable names explicitly. In other words, I would like to obtain this result:
fit1 <- multinom(var1 ~ ., data=df)
fit2 <- multinom(var2 ~ ., data=df)
fit3 <- multinom(var3 ~ ., data=df)
But in a for loop, without using the variable names (so that I can use the same code for any data.frame). Something similar to this:
for (i in colnames(df))
{
fit[i] <- lm(i ~ ., data=df)
}
(This code does not work.)
Maybe my question is trivial, but I have no idea on how to proceed.
Thanks!
You need to add an extra step that builds the formula object using string operations:
fit <- setNames(vector(mode = "list", length = ncol(df)), colnames(df))
for (i in colnames(df)) {
  fm <- as.formula(paste0(i, " ~ ."))  # e.g. "var1 ~ ."
  fit[[i]] <- lm(fm, data = df)        # swap lm for nnet::multinom as needed
}
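The same idea can be wrapped in a function that works for any data frame and any fitting function with a formula/data interface. `fit_each` is a hypothetical name; for categorical responses the question's multinom (from the nnet package) could be passed as `fitter`, while this sketch is checked with lm on three numeric mtcars columns.

```r
# Hypothetical helper: regress each column on all the others
fit_each <- function(df, fitter = lm, ...) {
  fits <- lapply(names(df), function(v) {
    fitter(as.formula(paste(v, "~ .")), data = df, ...)  # "." = all other columns
  })
  setNames(fits, names(df))
}

df <- mtcars[, c("mpg", "wt", "hp")]
fits <- fit_each(df)
coef(fits$mpg)
```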
