Evaluating the loop index as a variable name, rather than a string - r

I would like to evaluate a string, e.g. x1, where x1 <- "disp", as its underlying value, i.e. disp, when x1 is the loop index.
A reproducible example using the mtcars dataset is below:
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")
for (x in vars){
print(x)
}
Which gives me
#> [1] "x1"
#> [1] "x2"
Desired Outcome:
What I'm trying to get is a loop that runs these commands:
print(x1)
print(x2)
resulting in:
#> [1] "disp"
#> [1] "hp"
I recognise that the simplest solution would be to bypass x1 and x2 completely:
vars <- c("disp", "hp")
for (x in vars){
print(x)
}
But that's less helpful, as it would be very useful to have x1, x2, etc. in my (unsimplified) problem.
Also, if purrr is a better way to do something like this, instead of a loop, I'd be very interested to understand that better.
If anyone has a suggestion on a better title for the question, I will also be very interested.
Deeper Question
I simplified my question above, hoping that would be enough to get what I needed, but for context, I'm trying to do something like this:
df <- mtcars
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")
for (x in vars){
lm(mpg ~ x, data = mtcars)
}
Created on 2019-07-11 by the reprex package (v0.2.1)

The answer to your original question is to use get. However, since you want to go beyond that and keep vars as it is, we can combine get with as.formula:
lst <- vector("list", length(vars))
for (x in seq_along(vars)) {
lst[[x]] <- lm(as.formula(paste0("mpg ~", get(vars[x]))), mtcars)
}
lst # print the list of fitted models
#[[1]]
#Call:
#lm(formula = as.formula(paste0("mpg ~", get(vars[x]))), data = mtcars)
#Coefficients:
#(Intercept) disp
# 29.5999 -0.0412
#[[2]]
#Call:
#lm(formula = as.formula(paste0("mpg ~", get(vars[x]))), data = mtcars)
#Coefficients:
#(Intercept) hp
# 30.0989 -0.0682
Using purrr you can do that with map
purrr::map(seq_along(vars), ~lm(as.formula(paste0("mpg ~", get(vars[.x]))), mtcars))
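For the simplified question on its own (printing "disp" and "hp"), get() inside the original loop is enough. A minimal sketch, reusing x1, x2 and vars from the question; the vals vector is just an illustrative name:

```r
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")

# get() looks up the object whose name is stored in the string x
vals <- character(0)
for (x in vars) {
  print(get(x))
  vals[x] <- get(x)   # keep the looked-up values, named by variable
}
vals
```

The loop index is still "x1" and "x2"; only the lookup through get() turns it into the stored value.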

We can use lapply from base R and reformulate
lapply(mget(vars), function(x)
lm(reformulate(response = "mpg", termlabels = x), data = mtcars))
#$x1
#Call:
#lm(formula = reformulate(response = "mpg", termlabels = x), data = mtcars)
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
#$x2
#Call:
#lm(formula = reformulate(response = "mpg", termlabels = x), data = mtcars)
#Coefficients:
#(Intercept) hp
# 30.09886 -0.06823
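The $x1 and $x2 labels in that output come from mget, which returns a named list of the looked-up values, and lapply keeps those names. A small sketch to illustrate, assuming the same x1, x2 and vars as in the question (predictors and fits are illustrative names):

```r
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")

# mget() returns a list named after the variables it looked up
predictors <- mget(vars)
str(predictors)

fits <- lapply(predictors, function(x) {
  lm(reformulate(x, response = "mpg"), data = mtcars)
})
names(fits)                  # names carried over from mget()
names(coef(fits$x1))         # intercept plus the predictor stored in x1
```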

Already answered, but:
library(rlang)
library(tidyverse)
vars <- exprs(disp, hp) # without "character-quotes"
map(seq_along(vars), ~eval(expr(lm(mpg ~ !!vars[[.x]], mtcars))))
# or
vars <- c("disp", "hp")
map(vars, ~exec("lm", str_c("mpg ~ ", .x), data = mtcars))
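For comparison, base R's bquote() does a similar kind of unquoting to !! without any extra packages. This is a sketch of that alternative rather than what the answer above uses; fits is an illustrative name:

```r
vars <- c("disp", "hp")

fits <- lapply(vars, function(v) {
  # .(...) splices the symbol into the formula before lm() is evaluated,
  # so the stored call reads mpg ~ disp rather than a placeholder
  eval(bquote(lm(mpg ~ .(as.name(v)), data = mtcars)))
})
fits[[1]]$call
```

A side effect is that the Call line of each fit shows the actual formula.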

Related

How to dynamically reference datasets in function call of linear regression

Let's say I have a function like this:
data("mtcars")
ncol(mtcars)
test <- function(string){
fit <- lm(mpg ~ cyl,
data = string)
return(fit)
}
I'd like to be able to have the "string" variable evaluated as the dataset for a linear regression like so:
test("mtcars")
However, I get an error:
Error in eval(predvars, data, env) : invalid 'envir' argument of type 'character'
I've tried using combinations of eval and parse, but to no avail. Any ideas?
You can use get() to search by name for an object.
test <- function(string){
fit <- lm(mpg ~ cyl, data = get(string))
return(fit)
}
test("mtcars")
# Call:
# lm(formula = mpg ~ cyl, data = get(string))
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
You can add one more line to make the output look better. Notice that the Call part of the output changes from data = get(string) to data = mtcars.
test <- function(string){
fit <- lm(mpg ~ cyl, data = get(string))
fit$call$data <- as.name(string)
return(fit)
}
test("mtcars")
# Call:
# lm(formula = mpg ~ cyl, data = mtcars)
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
Try this slight change to your code:
#Code
test <- function(string){
fit <- lm(mpg ~ cyl,
data = eval(parse(text=string)))
return(fit)
}
#Apply
test("mtcars")
Output:
Call:
lm(formula = mpg ~ cyl, data = eval(parse(text = string)))
Coefficients:
(Intercept) cyl
37.885 -2.876
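If the set of candidate datasets is known in advance, a different technique from get() or eval(parse()) is to keep them in a named list and index it by the string. This is only a sketch of that alternative, not what the answers above do; the datasets list and its contents are assumptions for illustration:

```r
# hypothetical lookup table of datasets, keyed by name
datasets <- list(mtcars = mtcars, first10 = mtcars[1:10, ])

test <- function(string) {
  # fail early with a clear message if the name is unknown
  if (!string %in% names(datasets)) stop("unknown dataset: ", string)
  lm(mpg ~ cyl, data = datasets[[string]])
}

fit <- test("mtcars")
coef(fit)
```

This restricts the function to names you have listed, which may or may not suit your use case.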

lapply function to pass single and + arguments to LM

I am stuck trying to pass "+" arguments to lm.
My 2 lines of code below work fine for single arguments like:
model_combinations=c('.', 'Long', 'Lat', 'Elev')
lm_models = lapply(model_combinations, function(x) {
lm(substitute(Y ~ i, list(i=as.name(x))), data=climatol_ann)})
But same code fails if I add 'Lat+Elev' at end of list of model_combinations as in:
model_combinations=c('.', 'Long', 'Lat', 'Elev', 'Lat+Elev')
Error in eval(expr, envir, enclos) : object 'Lat+Elev' not found
I've scanned posts but am unable to find a solution.
I've generally found it more robust/easier to understand to use reformulate to construct formulas via string manipulations rather than trying to use substitute() to modify an expression, e.g.
model_combinations <- c('.', 'Long', 'Lat', 'Elev', 'Lat+Elev')
model_formulas <- lapply(model_combinations,reformulate,
response="Y")
lm_models <- lapply(model_formulas,lm,data=climatol_ann)
Because reformulate works at a string level, it doesn't have a problem if the elements are themselves non-atomic (e.g. Lat+Elev). The only tricky situation here is if your data argument or variables are constructed in some environment that can't easily be found, but passing an explicit data argument usually avoids problems.
(You can also use as.formula(paste(...)) or as.formula(sprintf(...)); reformulate() is just a convenient wrapper.)
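To see what reformulate does with a compound term, the formulas can be inspected without any data. A small sketch using the model_combinations from the question (f_paste is an illustrative name):

```r
model_combinations <- c(".", "Long", "Lat", "Elev", "Lat+Elev")
model_formulas <- lapply(model_combinations, reformulate, response = "Y")

# "Lat+Elev" is parsed into two separate terms
model_formulas[[5]]
all.vars(model_formulas[[5]])

# the paste-based equivalent yields the same formula text
f_paste <- as.formula(paste("Y ~", "Lat+Elev"))
f_paste
```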
With as.formula you can do:
models = lapply(model_combinations,function(x) lm(as.formula(paste("Y ~ ",x)), data=climatol_ann))
For the mtcars dataset:
model_combs = c("hp","cyl","hp+cyl")
testModels = lapply(model_combs,function(x) lm(as.formula(paste("mpg ~ ",x)), data=mtcars) )
testModels
#[[1]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) hp
# 30.09886 -0.06823
#
#
#[[2]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) cyl
# 37.885 -2.876
#
#
#[[3]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) hp cyl
# 36.90833 -0.01912 -2.26469
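Since every Call line in that output shows the same as.formula(paste(...)) expression, it can help to name the list by the right-hand-side strings so each model stays identifiable. A short sketch, assuming the same model_combs as above:

```r
model_combs <- c("hp", "cyl", "hp+cyl")
testModels <- lapply(model_combs, function(x) {
  lm(as.formula(paste("mpg ~", x)), data = mtcars)
})
names(testModels) <- model_combs   # label each fit by its predictors

# coefficients of the two-predictor model, looked up by name
coef(testModels[["hp+cyl"]])
```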

Combining cbind and paste in linear model

I would like to know how I can come up with an lm formula syntax that would enable me to use paste together with cbind for multiple multivariate regression.
Example
In my model I have a set of variables, which corresponds to the primitive example below:
data(mtcars)
depVars <- paste("mpg", "disp")
indepVars <- paste("qsec", "wt", "drat")
Problem
I would like to create a model with my depVars and indepVars. The model, typed by hand, would look like that:
modExample <- lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
I'm interested in generating the same formula without referring to variable names and only using depVars and indepVars vectors defined above.
Attempt 1
For example, what I had in mind would correspond to:
mod1 <- lm(formula = formula(paste(cbind(paste(depVars, collapse = ",")), " ~ ",
indepVars)), data = mtcars)
Attempt 2
I tried this as well:
mod2 <- lm(formula = formula(cbind(depVars), paste(" ~ ",
paste(indepVars,
collapse = " + "))),
data = mtcars)
Side notes
I found a number of good examples on how to use paste with formula but I would like to know how I can combine with cbind.
This is mostly a syntax question; in my real data I have a number of variables I would like to introduce to the model, and making use of the previously generated vectors is more parsimonious and makes the code more presentable. In effect, I'm only interested in creating a formula object that would contain cbind with variable names corresponding to one vector and the remaining variables corresponding to another vector.
In a word, I want to arrive at the formula in modExample without having to type variable names.
I think this works.
data(mtcars)
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
lm(formula(paste('cbind(',
paste(depVars, collapse = ','),
') ~ ',
paste(indepVars, collapse = '+'))), data = mtcars)
All the solutions below use these definitions:
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
1) character string formula. Create a character string representing the formula and then run lm using do.call. Note that the formula shown in the output displays correctly and is written out in full.
fo <- sprintf("cbind(%s) ~ %s", toString(depVars), paste(indepVars, collapse = "+"))
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = "cbind(mpg, disp) ~ qsec+wt+drat", data = mtcars)
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
1a) This would also work:
fo <- sprintf("cbind(%s) ~.", toString(depVars))
do.call("lm", list(fo, quote(mtcars[c(depVars, indepVars)])))
giving:
Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars[c(depVars,
indepVars)])
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
2) reformulate. @akrun and @Konrad, in comments below the question, suggest using reformulate. This approach produces a "formula" object whereas the ones above produce a character string as the formula. (If this were desired for the prior solutions above it would be possible using fo <- formula(fo) .) Note that it is important that the response argument to reformulate be a call object and not a character string or else reformulate will interpret the character string as the name of a single variable.
fo <- reformulate(indepVars, parse(text = sprintf("cbind(%s)", toString(depVars)))[[1]])
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
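The call object for the response can also be assembled without parsing a string, using as.call and as.name. This is a sketch of an alternative construction, not part of the answer above; lhs is an illustrative name:

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

# build the call cbind(mpg, disp) from symbols instead of parsing text
lhs <- as.call(c(as.name("cbind"), lapply(depVars, as.name)))
fo <- reformulate(indepVars, response = lhs)
fo

fit <- lm(fo, data = mtcars)
dim(coef(fit))   # one row per term, one column per response
```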
3) lm.fit Another way that does not use a formula at all is:
m <- as.matrix(mtcars)
fit <- lm.fit(cbind(1, m[, indepVars]), m[, depVars])
The output is a list with these components:
> names(fit)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
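As a quick sanity check, the coefficients from lm.fit can be compared with those from the formula interface. A sketch, assuming depVars and indepVars as defined above; naming the intercept column is optional and only done here for readability (X and ref are illustrative names):

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

m <- as.matrix(mtcars)
# lm.fit needs the intercept column supplied explicitly
X <- cbind("(Intercept)" = 1, m[, indepVars])
fit <- lm.fit(X, m[, depVars])

# reference fit through the usual formula interface
ref <- lm(cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
all.equal(unname(fit$coefficients), unname(coef(ref)))
```

Keep in mind that lm.fit returns a plain list, so methods such as summary() for lm objects are not available on it.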

Using lapply to fit multiple model -- how to keep the model formula self-contained in lm object

The following code fits 4 different model formulas to the mtcars dataset, using either for loop or lapply. In both cases, the formula stored in the result is referred to as formulas[[1]], formulas[[2]], etc. instead of the human-readable formula.
formulas <- list(
mpg ~ disp,
mpg ~ I(1 / disp),
mpg ~ disp + wt,
mpg ~ I(1 / disp) + wt
)
res <- vector("list", length=length(formulas))
for (i in seq_along(formulas)) {
res[[i]] <- lm(formulas[[i]], data=mtcars)
}
res
lapply(formulas, lm, data=mtcars)
Is there a way to make the full, readable formula show up in the result?
This should work
lapply(formulas, function(x, data) eval(bquote(lm(.(x),data))), data=mtcars)
And it returns
[[1]]
Call:
lm(formula = mpg ~ disp, data = data)
Coefficients:
(Intercept) disp
29.59985 -0.04122
[[2]]
Call:
lm(formula = mpg ~ I(1/disp), data = data)
Coefficients:
(Intercept) I(1/disp)
10.75 1557.67
....etc
We use bquote to insert the formula into the call to lm and then evaluate the expression.
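To make the two steps visible, the call can be built and inspected before it is evaluated. A minimal sketch with a single formula (f, cl and fit are illustrative names):

```r
f <- mpg ~ disp

# bquote() builds the call with the formula spliced in; nothing is fitted yet
cl <- bquote(lm(.(f), data = mtcars))
cl

# evaluating the call runs lm(); the stored call keeps the readable formula
fit <- eval(cl)
fit$call
```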
Why not just:
lapply( formulas, function(frm) lm( frm, data=mtcars))
#------------------
[[1]]
Call:
lm(formula = frm, data = mtcars)
Coefficients:
(Intercept) disp
29.59985 -0.04122
[[2]]
Call:
lm(formula = frm, data = mtcars)
Coefficients:
(Intercept) I(1/disp)
10.75 1557.67
snipped....
If you wanted the names of the result to be the 'character'-ized version of the formulas, it would just be:
names(res) <- as.character(formulas)
res[1]
#-----
$`mpg ~ disp`
Call:
lm(formula = frm, data = mtcars)
Coefficients:
(Intercept) disp
29.59985 -0.04122
You can also try something like
library(purrr)
library(tibble)
models <- map(formulas, lm, data = mtcars)
models
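Another option, borrowing the fit$call trick used for the data argument in a related answer, is to overwrite the formula slot of the stored call after fitting. A hedged sketch; note it edits only what is printed, not the fit itself:

```r
formulas <- list(
  mpg ~ disp,
  mpg ~ I(1 / disp),
  mpg ~ disp + wt,
  mpg ~ I(1 / disp) + wt
)

models <- lapply(formulas, function(f) {
  fit <- lm(f, data = mtcars)
  fit$call$formula <- f   # replace the placeholder symbol f with the formula
  fit
})
models[[2]]$call
```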

Easily performing the same regression on different datasets

I'm performing the same regression on several different datasets (same dependent and independent variables). However, there are many independent variables, and I often want to test adding/removing different variables. I'd like to avoid making all these changes to different lines of code, just because they use different datasets. Can I instead just copy the formula that was used to create some object, and then create a new object using a different dataset? For example, something like:
fit1 <- lm(y ~ x1 + x2 + x3 + ..., data = dataset1)
fit2 <- lm(fit1$call, data = dataset2) # this doesn't work
fit3 <- lm(fit1$call, data = dataset3) # this doesn't work
This way, if I want to update numerous regressions, I just update the first one and then rerun them all.
Can this be done? Preferably without using a loop or paste().
Thanks!
Or use update
(fit <- lm(mpg ~ wt, data = mtcars))
# Call:
# lm(formula = mpg ~ wt, data = mtcars)
#
# Coefficients:
# (Intercept) wt
# 37.285 -5.344
update(fit, data = mtcars[mtcars$hp < 100, ])
# Call:
# lm(formula = mpg ~ wt, data = mtcars[mtcars$hp < 100, ])
#
# Coefficients:
# (Intercept) wt
# 39.295 -5.379
update(fit, data = mtcars[1:10, ])
# Call:
# lm(formula = mpg ~ wt, data = mtcars[1:10, ])
#
# Coefficients:
# (Intercept) wt
# 33.774 -4.285
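update can also be combined with lapply when the datasets are held in a list, so that only the first model has to be edited. A sketch using the same two subsets of mtcars as above; subsets and refits are illustrative names:

```r
fit <- lm(mpg ~ wt, data = mtcars)

subsets <- list(
  low_hp  = mtcars[mtcars$hp < 100, ],
  first10 = mtcars[1:10, ]
)

# refit the same model specification on each dataset
refits <- lapply(subsets, function(d) update(fit, data = d))
lapply(refits, coef)
```

One caveat: the Call line of each refit will show data = d rather than the subset expression.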
Collect your datasets into a list and then use lapply. E.g.:
dsets <- list(dataset1,dataset2,dataset3)
lapply(dsets, function(x) lm(y ~ x1 + x2, data=x) )
I'm not entirely sure this is what you want, but you can do it as follows:
formula <- y ~ x1 + x2 + x3 + ...
fit1 <- lm(formula, data = dataset1)
fit2 <- lm(formula, data = dataset2)
fit3 <- lm(formula, data = dataset3)
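The formula can also be pulled back out of an existing fit with formula(), which is close to what the question attempted with fit1$call. A minimal sketch; dataset1 and dataset2 here are stand-ins built from mtcars, not the asker's data:

```r
# stand-in datasets for illustration only
dataset1 <- mtcars[mtcars$am == 0, ]
dataset2 <- mtcars[mtcars$am == 1, ]

fit1 <- lm(mpg ~ wt + hp, data = dataset1)

# reuse the formula stored in fit1 on a different dataset
fit2 <- lm(formula(fit1), data = dataset2)
formula(fit2)
```

Changing the formula in fit1 and rerunning then propagates to every model derived from it.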

Resources