lapply for multiple lmer models [duplicate] - r

I have seen an example of list apply (lapply) that works nicely to take a list of data objects,
and return a list of regression output, which we can pass to Stargazer for nicely formatted output.
Using stargazer with a list of lm objects created by lapply-ing over a split data.frame
library(MASS)
library(stargazer)
data(Boston)
by.river <- split(Boston, Boston$chas)
class(by.river)
fit <- lapply(by.river, function(dd)lm(crim ~ indus,data=dd))
stargazer(fit, type = "text")
What i would like to do is, instead of passing a list of datasets to do the same regression on each data set (as above),
pass a list of independent variables to do different regressions on the same data set. In long hand it would look like this:
fit2 <- vector(mode = "list", length = 2)
fit2[[1]] <- lm(nox ~ indus, data = Boston)
fit2[[2]] <- lm(crim ~ indus, data = Boston)
stargazer(fit2, type = "text")
with lapply, i tried this and it doesn't work. Where did I go wrong?
myvarc <- c("nox","crim")
class(myvarc)
myvars <- as.list(myvarc)
class(myvars)
fit <- lapply(myvars, function(dvar)lm(dvar ~ indus,data=Boston))
stargazer(fit, type = "text")

Consider creating dynamic formulas from string:
fit <- lapply(myvars, function(dvar)
lm(as.formula(paste0(dvar, " ~ indus")),data=Boston))

This should work:
fit <- lapply(myvars, function(dvar) lm(eval(paste0(dvar,' ~ wt')), data = Boston))

You can also use a dplyr & purrr approach, keep everything in a tibble, pull out what you want, when you need it. No difference in functionality from the lapply methods.
library(dplyr)
library(purrr)
library(MASS)
library(stargazer)
var_tibble <- tibble(vars = c("nox","crim"), data = list(Boston))
analysis <- var_tibble %>%
mutate(models = map2(data, vars, ~lm(as.formula(paste0(.y, " ~ indus")), data = .x))) %>%
mutate(tables = map2(models, vars, ~stargazer(.x, type = "text", dep.var.labels.include = FALSE, column.labels = .y)))

You can also use get():
# make a list of independent variables
list_x <- list("nox","crim")
# create regression function
my_reg <- function(x) { lm(indus ~ get(x), data = Boston) }
# run regression
results <- lapply(list_x, my_reg)

Related

Running Multiple Linear Regression Models in for-Loop

The logic is similar to the content-based recommender,
content
undesirable
desirable
user_1
...
user_10
1
3.00
2.77
0.11
NA
...
5000
2.50
2.11
NA
0.12
I need to run the model for undesirable and desirable as independent values and each user as the dependent value, thus I need run 10 times to fit the model and predict each user's NA value.
This is the code that I hard coding, but I wonder how to use for loop, I just searched for several methods but they do not work for me...
the data as 'test'
hard code
#fit model
fit_1 = lm(user_1 ~ undesirable + desirable, data = test)
...
fit_10 = lm(user_10 ~ undesirable + desirable, data = test)
#prediction
u_1_na = test[is.na(test$user_1), c('user_1', 'undesirable', 'desirable')]
result1 = predict(fit_1, newdata = u_1_na)
which(result1 == max(result1))
max(result1)
...
u_10_na = test[is.na(test$user_10), c('user_10', 'undesirable', 'desirable')]
result10 = predict(fit_10, newdata = u_10_na)
which(result10 == max(result10))
max(result10)
#make to csv file
apply each max predict value to csv.
this is what I try for now(for loop)
mod_summaries <- list()
for(i in 1:10) {
predictors_i <- colnames(data)[1:10]
mod_summaries[[i - 1]] <- summary(
lm(predictors_i ~ ., test[ , c("undesirable", 'desirable')]))
}
An apply method:
mod_summaries_lapply <-
lapply(
colnames(mtcars),
FUN = function(x)
summary(lm(reformulate(".", response = x), data = mtcars))
)
A for loop method to make linear models for each column. The key is the reformulate() function, which creates the formula from strings. In the question, the function is made of a string and results in error invalid term in model formula. The string needs to be evaluated with eval() . This example uses the mtcars dataset.
mod_summaries <- list()
for(i in 1:11) {
predictors_i <- colnames(mtcars)[i]
mod_summaries[[i]] <- summary(lm(reformulate(".", response = predictors_i), data=mtcars))
#summary(lm(reformulate(". -1", response = predictors_i), data=mtcars)) # -1 to exclude intercept
#summary(lm(as.formula(paste(predictors_i, "~ .")), data=mtcars)) # a "paste as formula" method
}
You could use the function as.formula together with the paste function to create your formula. Following is an example
formula_lm <- as.formula(
paste(response_var,
paste(expl_var, collapse = " + "),
sep = " ~ "))
This implies that you have more than one explanatory variable (separated in the paste with +). If you only have one, omit the second paste.
With the created formula, you can use the lm funciton like this:
lm(formula_lm, data)
Edit: the vector expl_var would in your case include the undesirable and desirable variable.
Avoid the loop. Make your data tidy. Something like:
library(tidyverse)
test %>%
select(-content) %>%
pivot_longer(
starts_with("user"),
names_to="user",
values_to="value"
) %>%
group_by(user) %>%
group_map(
function(.x, .y) {
summary(lm(user ~ ., data=.x))
}
)
Untested code since your example is not reproducible.

Using purrr to map right-hand-side variables to regression functions

I have a large number of regression models specified by reference to right-hand-side variables, and I want to use purrr to generate a set of lists with the models. Here is an example of the desired end result using toy data:
m.1.1 <- "cyl"
m.1.2 <- paste(c(m.1.1, "disp"), collapse = " + ")
m.1.1_reg <- lm(
data = mtcars,
formula = as.formula(paste0("mpg ~ ", m.1.1)))
m.1.2_reg <- lm(
data = mtcars,
formula = as.formula(paste0("mpg ~ ", m.1.2)))
How can I achieve the same outcome (i.e., a list named m.1.1_reg and another list named m.1.2_reg) using purrr?
Just loop over the 'm.1.' objects in map and create the formula
library(purrr)
out <- map(dplyr::lst(m.1.1, m.1.2),
~ lm(data = mtcars, formula = as.formula(paste0("mpg ~ ", .x))))
-checking the names
> names(out)
[1] "m.1.1" "m.1.2"

Function or package to compare model fits side by side?

Is there a more elegant way of doing this?
m1 <- lm(price ~ carat, data = diamonds)
m2 <- lm(price ~ carat + cut, data = diamonds)
m3 <- lm(price ~ carat + cut + depth, data = diamonds)
m1r2 <- summary(m1)$r.squared
m2r2 <- summary(m2)$r.squared
m3r2 <- summary(m3)$r.squared
data.frame(
model = c("m1", "m2", "m3"),
RSqd = c(m1r2, m2r2, m3r2)
)
With caret I often use the following to compare multiple models side by side:
resamples(list_of_models) %>% summary()
Is there a conventional approach to comparing models on fit such as R.Squared, AIC, RSE? As opposed to crudely writing a dataframe in the way I have above?
An option is to use mget
stack(mget(ls(pattern = "^m\\d+r\\d+$")))
From the input 'm's, get thee objects in to a list with mget and apply the summary by looping over the list and extract the r.squared
lapply(mget(ls(pattern = "^m\\d+$")), function(x) summary(x)$r.squared)
Also, this can be done with reformulate by passing the independent variables in a list
lapply(list('carat', c('carat', 'cut'), c('carat', 'cut', 'depth')),
function(nm) summary(lm(reformulate(nm, 'price'),
data = diamonds))$r.squared)
If we want to get multiple components
library(broom)
lapply(mget(ls(pattern = "^m\\d+$")), glance)
Here is a similar more tidyverse-based approach.
With only three variables it won't save you much typing, but once you have a two digit number of variables the 'many models approach' is really convenient.
library(dplyr)
library(purrr)
library(broom)
library(ggplot2)
reg_vars <- c("carat", "cut", "depth")
tibble(id = 1:3) %>%
mutate(equ = map(id, ~ reformulate(reg_vars[1:.x], response = "price")),
mod = map(equ, ~ lm(.x, data = diamonds)),
res = map(mod, glance)) %>%
pull(res) %>%
bind_rows(., .id = "model")
You could also try texreg package.
library(texreg)
screenreg(list(m1, m2, m3))

Use lapply for multiple regression with formula changing, not the dataset

I have seen an example of list apply (lapply) that works nicely to take a list of data objects,
and return a list of regression output, which we can pass to Stargazer for nicely formatted output.
Using stargazer with a list of lm objects created by lapply-ing over a split data.frame
library(MASS)
library(stargazer)
data(Boston)
by.river <- split(Boston, Boston$chas)
class(by.river)
fit <- lapply(by.river, function(dd)lm(crim ~ indus,data=dd))
stargazer(fit, type = "text")
What i would like to do is, instead of passing a list of datasets to do the same regression on each data set (as above),
pass a list of independent variables to do different regressions on the same data set. In long hand it would look like this:
fit2 <- vector(mode = "list", length = 2)
fit2[[1]] <- lm(nox ~ indus, data = Boston)
fit2[[2]] <- lm(crim ~ indus, data = Boston)
stargazer(fit2, type = "text")
with lapply, i tried this and it doesn't work. Where did I go wrong?
myvarc <- c("nox","crim")
class(myvarc)
myvars <- as.list(myvarc)
class(myvars)
fit <- lapply(myvars, function(dvar)lm(dvar ~ indus,data=Boston))
stargazer(fit, type = "text")
Consider creating dynamic formulas from string:
fit <- lapply(myvars, function(dvar)
lm(as.formula(paste0(dvar, " ~ indus")),data=Boston))
This should work:
fit <- lapply(myvars, function(dvar) lm(eval(paste0(dvar,' ~ wt')), data = Boston))
You can also use a dplyr & purrr approach, keep everything in a tibble, pull out what you want, when you need it. No difference in functionality from the lapply methods.
library(dplyr)
library(purrr)
library(MASS)
library(stargazer)
var_tibble <- tibble(vars = c("nox","crim"), data = list(Boston))
analysis <- var_tibble %>%
mutate(models = map2(data, vars, ~lm(as.formula(paste0(.y, " ~ indus")), data = .x))) %>%
mutate(tables = map2(models, vars, ~stargazer(.x, type = "text", dep.var.labels.include = FALSE, column.labels = .y)))
You can also use get():
# make a list of independent variables
list_x <- list("nox","crim")
# create regression function
my_reg <- function(x) { lm(indus ~ get(x), data = Boston) }
# run regression
results <- lapply(list_x, my_reg)

How to pass a large amount of models to gather_predictions

In the modelr package the function gather_predictions can be used to add predictions from multiple models to a data frame, I'm however unsure on how to specify these models in the function call. The help documentation gives the following exmaple:
df <- tibble::data_frame(
x = sort(runif(100)),
y = 5 * x + 0.5 * x ^ 2 + 3 + rnorm(length(x))
)
m1 <- lm(y ~ x, data = df)
grid <- data.frame(x = seq(0, 1, length = 10))
grid %>% add_predictions(m1)
m2 <- lm(y ~ poly(x, 2), data = df)
grid %>% spread_predictions(m1, m2)
grid %>% gather_predictions(m1, m2)
here the models are specifically mentioned in the function call. That works fine if we have a few models we want predictions for, but what if we have a large or unknown amount of models? In this case manually specifying the models isn't really workable anymore.
the way the help documentation phrases the arguments segment seems to suggest you need to add every model as a separate argument.
gather_predictions and spread_predictions take multiple models. The
name will be taken from either the argument name of the name of the
model.
And for example inputting a list of models into gather_predictions doesn't work.
Is there some easy way to input a list / large amount of models to gather_predictions?
example for 10 models in a list:
modelslist <- list()
for (N in 1:10) {
modelslist[[N]] <- lm(y ~ poly(x, N), data = df)
}
If having the models stored some other way than a list works better, that's fine as well.
m <- grid %>% gather_predictions(lm(y ~ poly(x, 1), data = df))
for (N in 2:10) {
m <- rbind(m, grid %>% gather_predictions(lm(y ~ poly(x, N), data = df)))
}
There are workarounds to solve this problem. My approach was to:
1. build a list of models with specific names
2. use a tweaked version of modelr::gather_predictions() to apply all models in the list to data
# prerequisites
library(tidyverse)
set.seed(1363)
# I'll use generic name 'data' throughout the code, so you can easily try other datasets.
# for this example I'll use your data df
data=df
# data visualization
ggplot(data, aes(x, y)) +
geom_point(size=3)
your sample data
# build a list of models
models <-vector("list", length = 5)
model_names <- vector("character", length=5)
for (i in 1:5) {
modelformula <- str_c("y ~ poly(x,", i, ")", sep="")
models[[i]] <- lm(as.formula(modelformula), data = data)
model_names[[i]] <- str_c('model', i) # remember we name the models here sequantially
}
# apply names to the models list
names(models) <- model_names
# this is modified verison of modelr::gather_predictions() in order to accept list of models
gather.predictions <- function (data, models, .pred = "pred", .model = "model")
{
df <- map2(models, .pred, modelr::add_predictions, data = data)
names(df) <- names(models)
bind_rows(df, .id = .model)
}
# the rest is the same as modelr's function...
grids <- gather.predictions(data = data, models = models, .pred = "y")
ggplot(data, aes(x, y)) +
geom_point() +
geom_line(data = grids, colour = "red") +
facet_wrap(~ model)
example of polynomial models (degree 1:5) applied to your sample data
side note: there are good reasons why I chose strings to build the model...to discuss.

Resources