lapply function to pass single and + arguments to LM - r

I am stuck trying to pass "+" arguments to lm.
My 2 lines of code below work fine for single arguments like:
model_combinations=c('.', 'Long', 'Lat', 'Elev')
lm_models = lapply(model_combinations, function(x) {
lm(substitute(Y ~ i, list(i=as.name(x))), data=climatol_ann)})
But the same code fails if I add 'Lat+Elev' to the end of model_combinations, as in:
model_combinations=c('.', 'Long', 'Lat', 'Elev', 'Lat+Elev')
Error in eval(expr, envir, enclos) : object 'Lat+Elev' not found
I've searched existing posts but haven't been able to find a solution.

I've generally found it more robust/easier to understand to use reformulate to construct formulas via string manipulations rather than trying to use substitute() to modify an expression, e.g.
model_combinations <- c('.', 'Long', 'Lat', 'Elev', 'Lat+Elev')
model_formulas <- lapply(model_combinations,reformulate,
response="Y")
lm_models <- lapply(model_formulas,lm,data=climatol_ann)
Because reformulate works at a string level, it doesn't have a problem if the elements are themselves non-atomic (e.g. Lat+Elev). The only tricky situation here is if your data argument or variables are constructed in some environment that can't easily be found, but passing an explicit data argument usually avoids problems.
(You can also use as.formula(paste(...)) or as.formula(sprintf(...)); reformulate() is just a convenient wrapper.)
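To see why the substitute() version fails, it may help to compare what each approach actually builds. This is a minimal sketch that assumes the built-in mtcars data as a stand-in for climatol_ann (which isn't available here), with mpg playing the role of Y:

```r
# as.name() turns the whole string into ONE symbol, so lm() goes looking
# for a single variable literally called `hp+cyl`.
sym <- as.name("hp+cyl")
is.name(sym)                      # a symbol, not a call to `+`

# reformulate() parses the string, so "hp+cyl" becomes two separate terms.
f <- reformulate("hp+cyl", response = "mpg")
all.vars(f)                       # the variables the formula refers to

# "." still works, because lm() expands it against `data` when fitting.
fits <- lapply(c(".", "hp", "hp+cyl"), function(x) {
  lm(reformulate(x, response = "mpg"), data = mtcars)
})
lengths(lapply(fits, coef))       # number of coefficients per model
```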

With as.formula you can do:
models = lapply(model_combinations,function(x) lm(as.formula(paste("Y ~ ",x)), data=climatol_ann))
For the mtcars dataset:
model_combs = c("hp","cyl","hp+cyl")
testModels = lapply(model_combs,function(x) lm(as.formula(paste("mpg ~ ",x)), data=mtcars) )
testModels
#[[1]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) hp
# 30.09886 -0.06823
#
#
#[[2]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) cyl
# 37.885 -2.876
#
#
#[[3]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) hp cyl
# 36.90833 -0.01912 -2.26469
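Every Call line in that output shows the same as.formula(paste(...)) expression, so the fits are hard to tell apart when printed. One possible workaround, sketched here with mtcars, is to name the list elements by their formula text; the names formula_text and adj_r2 are just illustrative:

```r
model_combs <- c("hp", "cyl", "hp+cyl")
formula_text <- paste("mpg ~", model_combs)

testModels <- lapply(formula_text, function(f) lm(as.formula(f), data = mtcars))
names(testModels) <- formula_text   # label each fit by its formula

# One adjusted R-squared per model, named by formula.
adj_r2 <- sapply(testModels, function(m) summary(m)$adj.r.squared)
adj_r2
```

The named list also makes it easy to pull out a single fit, e.g. testModels[["mpg ~ hp+cyl"]].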

Related

Model syntax in R: how to input dynamic variables?

I want to run linear models (in this case, multivariate models with two response variables) within a for loop in which a new data frame called bc_applied is created at each iteration, as well as the vector targets. In my code, the column names "target1" and "target2" change at every iteration, which means I can't write the variable names explicitly. Instead, I want to extract them from the vector targets.
Here is an example:
targets <- c("target1","target2")
bc_applied <- data.frame("dsRNA" = c(rep("gene1",5),rep("gene2",5),rep("gene3",5)),
"target1" = runif(15), "target2" = runif(15))
But when running
lm(bc_applied[,targets] ~ dsRNA, data = bc_applied)
The following error is returned:
Error in model.frame.default(formula = bc_applied[, targets] ~ dsRNA, :
invalid type (list) for variable 'bc_applied[, targets]'
The desired output is given by
lm(cbind(target1, target2) ~ dsRNA, data = bc_applied)
According to ?lm
If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix.
cbind creates a matrix, so what lm needs on the left-hand side is a matrix. A data frame subset such as bc_applied[,targets] is a list, which is why the call fails. After subsetting the dataset with the columns, convert it to a matrix with as.matrix and it should work:
lm(as.matrix(bc_applied[,targets]) ~ dsRNA, data = bc_applied)
-output
#Call:
#lm(formula = as.matrix(bc_applied[, targets]) ~ dsRNA, data = bc_applied)
#Coefficients:
# target1 target2
#(Intercept) 0.45161 0.47457
#dsRNAgene2 0.36341 0.29226
#dsRNAgene3 -0.07115 -0.03003
Or another option is to create a formula with paste
lm(paste0('cbind(', toString(targets),') ~ dsRNA'), data = bc_applied)
-output
#Call:
#lm(formula = paste0("cbind(", toString(targets), ") ~ dsRNA"),
# data = bc_applied)
#Coefficients:
# target1 target2
#(Intercept) 0.45161 0.47457
#dsRNAgene2 0.36341 0.29226
#dsRNAgene3 -0.07115 -0.03003
or create the formula with glue
lm(glue::glue('cbind({toString(targets)}) ~ dsRNA'), bc_applied)
or another option is
lm(do.call(cbind, asplit(bc_applied[, targets], 2)) ~ dsRNA, bc_applied)
Crosschecking with cbind
lm(cbind(target1, target2)~ dsRNA, data = bc_applied)
-output
#Call:
#lm(formula = cbind(target1, target2) ~ dsRNA, data = bc_applied)
#Coefficients:
# target1 target2
#(Intercept) 0.45161 0.47457
#dsRNAgene2 0.36341 0.29226
#dsRNAgene3 -0.07115 -0.03003
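Since the question is about a loop in which targets changes each iteration, the string-based approach could be wrapped in a small function. This is only a sketch: fit_targets() is a hypothetical helper name, and set.seed() is added purely so the random example data are reproducible.

```r
set.seed(1)  # only so the random example data are reproducible
targets <- c("target1", "target2")
bc_applied <- data.frame(
  dsRNA   = rep(c("gene1", "gene2", "gene3"), each = 5),
  target1 = runif(15),
  target2 = runif(15)
)

# Hypothetical helper: build "cbind(a, b) ~ dsRNA" from a character vector.
fit_targets <- function(targets, data) {
  f <- as.formula(sprintf("cbind(%s) ~ dsRNA", toString(targets)))
  lm(f, data = data)
}

fit_string <- fit_targets(targets, bc_applied)
fit_matrix <- lm(as.matrix(bc_applied[, targets]) ~ dsRNA, data = bc_applied)

# Both routes should give the same coefficient matrix.
all.equal(unname(coef(fit_string)), unname(coef(fit_matrix)))
```

Inside the loop, the call would then be fit_targets(targets, bc_applied) with whatever targets and bc_applied hold at that iteration.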

How to dynamically reference datasets in function call of linear regression

Let's say I have a function like this:
data("mtcars")
ncol(mtcars)
test <- function(string){
fit <- lm(mpg ~ cyl,
data = string)
return(fit)
}
I'd like to be able to have the "string" variable evaluated as the dataset for a linear regression like so:
test("mtcars")
However, I get an error:
Error in eval(predvars, data, env) : invalid 'envir' argument of
type 'character'
I've tried using combinations of eval and parse, but to no avail. Any ideas?
You can use get() to search by name for an object.
test <- function(string){
fit <- lm(mpg ~ cyl, data = get(string))
return(fit)
}
test("mtcars")
# Call:
# lm(formula = mpg ~ cyl, data = get(string))
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
You can add one more line to make the output look better. Notice the change of the Call part in the output. It turns from data = get(string) to data = mtcars.
test <- function(string){
fit <- lm(mpg ~ cyl, data = get(string))
fit$call$data <- as.name(string)
return(fit)
}
test("mtcars")
# Call:
# lm(formula = mpg ~ cyl, data = mtcars)
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
Try this slight change to your code:
#Code
test <- function(string){
fit <- lm(mpg ~ cyl,
data = eval(parse(text=string)))
return(fit)
}
#Apply
test("mtcars")
Output:
Call:
lm(formula = mpg ~ cyl, data = eval(parse(text = string)))
Coefficients:
(Intercept) cyl
37.885 -2.876
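Both get() and eval(parse()) will look up (or evaluate) whatever string is passed in. If the set of candidate datasets is known in advance, an alternative is to look the name up in a named list instead. A minimal sketch, where datasets is a hypothetical registry made up for this example:

```r
# Hypothetical registry of the data frames the function is allowed to use.
datasets <- list(mtcars = mtcars, iris = iris)

test <- function(name, registry = datasets) {
  if (!name %in% names(registry)) {
    stop("unknown dataset: ", name)
  }
  fit <- lm(mpg ~ cyl, data = registry[[name]])
  fit$call$data <- as.name(name)   # cosmetic: show the dataset name in the call
  fit
}

fit <- test("mtcars")
coef(fit)
```

A misspelled name then fails with a clear message rather than a lookup error from deep inside lm.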

Evaluating the loop index as a variable name, rather than a string

I would like to evaluate a string, eg x1, where x1 <- "disp", as the underlying value, i.e. disp, when x1 is the loop index.
A reproducible example, using the mtcars dataset as an example is below:
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")
for (x in vars){
print(x)
}
Which gives me
#> [1] "x1"
#> [1] "x2"
Desired Outcome:
What I'm trying to get is a loop that runs these commands:
print(x1)
print(x2)
resulting in:
#> [1] "disp"
#> [1] "hp"
I recognise that the simplest solution would be to bypass x1 and x2 completely:
vars <- c("disp", "hp")
for (x in vars){
print(x)
}
But that's less useful, as it would be very helpful to have x1, x2, etc. in my (unsimplified) problem.
Also, if purrr is a better way to do something like this, instead of a loop, I'd be very interested to understand that better.
If anyone has a suggestion on a better title for the question, I will also be very interested.
Deeper Question
I simplified my question above, hoping that would be enough to get what I needed, but for context, I'm trying to do something like this:
df <- mtcars
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")
for (x in vars){
lm(mpg ~ x, data = mtcars)
}
Created on 2019-07-11 by the reprex package (v0.2.1)
The answer to your original question is to use get. However, since you want to do something beyond that and want to use vars as it is, we can combine get with as.formula:
lst <- vector("list", length(vars))
for (x in seq_along(vars)) {
lst[[x]] <- lm(as.formula(paste0("mpg ~", get(vars[x]))), mtcars)
}
#[[1]]
#Call:
#lm(formula = as.formula(paste0("mpg ~", get(vars[x]))), data = mtcars)
#Coefficients:
#(Intercept) disp
# 29.5999 -0.0412
#[[2]]
#Call:
#lm(formula = as.formula(paste0("mpg ~", get(vars[x]))), data = mtcars)
#Coefficients:
#(Intercept) hp
# 30.0989 -0.0682
Using purrr you can do that with map
purrr::map(seq_along(vars), ~lm(as.formula(paste0("mpg ~", get(vars[.x]))), mtcars))
We can use lapply from base R and reformulate
lapply(mget(vars), function(x)
lm(reformulate(response = "mpg", termlabels = x), data = mtcars))
#$x1
#Call:
#lm(formula = reformulate(response = "mpg", termlabels = x), data = mtcars)
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
#$x2
#Call:
#lm(formula = reformulate(response = "mpg", termlabels = x), data = mtcars)
#Coefficients:
#(Intercept) hp
# 30.09886 -0.06823
Already answered, but:
library(rlang)
library(tidyverse)
vars <- exprs(disp, hp) # without "character-quotes"
map(seq_along(vars), ~eval(expr(lm(mpg ~ !!vars[[.x]], mtcars))))
# or
vars <- c("disp", "hp")
map(vars, ~exec("lm", str_c("mpg ~ ", .x), data = mtcars))
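If the only reason for x1, x2, etc. is to keep a label attached to each variable name, a named character vector may be enough, and it avoids get() altogether. A small sketch under that assumption:

```r
# Keep the x1/x2 labels as names instead of as separate variables.
vars <- c(x1 = "disp", x2 = "hp")

fits <- lapply(vars, function(v) {
  lm(reformulate(v, response = "mpg"), data = mtcars)
})

names(fits)                                    # the labels carry over
sapply(fits, function(m) names(coef(m))[2])    # predictor used by each fit
```

purrr::map(vars, ...) would keep the names in the same way.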

Take elements out of quotes in R [duplicate]

I am trying to build an R shiny application where I have the user select variables for a model. The elements that the user selects get put into a vector. How do I remove the quotes as well as put spaces between each element they select, to be variables in a model?
As an example:
> vars <- c("cyl", "disp", "hp")
> my.model <- lm(mpg ~ paste(vars, collapse = "+"), data = mtcars)
Gives the error:
Error in model.frame.default(formula = mpg ~ paste(vars, collapse = "+"), :
variable lengths differ (found for 'paste(vars, collapse = "+")')
From reading other somewhat similar questions on Stackoverflow, someone suggested to use as.name() to remove the quotation marks, but this produces another error:
> vars <- c("cyl", "disp", "hp")
> my.model <- lm(mpg ~ as.name(paste(vars, collapse = "+")), data = mtcars)
Error in model.frame.default(formula = mpg ~ as.name(paste(vars, collapse = "+")), :
invalid type (symbol) for variable 'as.name(paste(vars, collapse = "+"))'
A formula is not just a string without quotes. It's a collection of unevaluated symbols. Try using the built-in reformulate function to build your formula.
vars <- c("cyl", "disp", "hp")
my.model <- lm(reformulate(vars,"mpg"), data = mtcars)
as.formula can also coerce a string into a formula:
lm(as.formula((paste("mpg ~", paste(vars, collapse = "+")))), data = mtcars)
#Call:
#lm(formula = as.formula((paste("mpg ~", paste(vars, collapse = "+")))),
# data = mtcars)
#Coefficients:
#(Intercept) cyl disp hp
# 34.18492 -1.22742 -0.01884 -0.01468
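Because the variable names come from user input in a Shiny app, it may be worth checking them against the data before building the formula. The following is a sketch only; build_formula() is a hypothetical helper, not part of shiny or stats:

```r
# Hypothetical helper: refuse names that are not columns of `data`.
build_formula <- function(vars, response, data) {
  if (length(vars) == 0) {
    stop("select at least one variable")
  }
  unknown <- setdiff(c(vars, response), names(data))
  if (length(unknown) > 0) {
    stop("not columns of data: ", toString(unknown))
  }
  reformulate(vars, response = response)
}

vars <- c("cyl", "disp", "hp")
f <- build_formula(vars, "mpg", mtcars)
my.model <- lm(f, data = mtcars)
names(coef(my.model))
```

The empty-selection check matters because reformulate raises an error when given a zero-length vector of term labels.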

Combining cbind and paste in linear model

I would like to know how I can come up with an lm formula syntax that would enable me to use paste together with cbind for multiple multivariate regression.
Example
In my model I have a set of variables, which corresponds to the primitive example below:
data(mtcars)
depVars <- paste("mpg", "disp")
indepVars <- paste("qsec", "wt", "drat")
Problem
I would like to create a model with my depVars and indepVars. The model, typed by hand, would look like that:
modExample <- lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
I'm interested in generating the same formula without referring to variable names and only using depVars and indepVars vectors defined above.
Attempt 1
For example, what I had on mind would correspond to:
mod1 <- lm(formula = formula(paste(cbind(paste(depVars, collapse = ",")), " ~ ",
indepVars)), data = mtcars)
Attempt 2
I tried this as well:
mod2 <- lm(formula = formula(cbind(depVars), paste(" ~ ",
paste(indepVars,
collapse = " + "))),
data = mtcars)
Side notes
I found a number of good examples on how to use paste with formula but I would like to know how I can combine with cbind.
This is mostly a syntax question; in my real data I have a number of variables I would like to introduce to the model, and making use of the previously generated vectors is more parsimonious and makes the code more presentable. In effect, I'm only interested in creating a formula object that would contain cbind with variable names corresponding to one vector and the remaining variables corresponding to another vector.
In a word, I want to arrive at the formula in modExample without having to type variable names.
I think this works:
data(mtcars)
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
lm(formula(paste('cbind(',
paste(depVars, collapse = ','),
') ~ ',
paste(indepVars, collapse = '+'))), data = mtcars)
All the solutions below use these definitions:
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
1) character string formula Create a character string representing the formula and then run lm using do.call. Note that the formula shown in the output displays correctly and is written out.
fo <- sprintf("cbind(%s) ~ %s", toString(depVars), paste(indepVars, collapse = "+"))
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = "cbind(mpg, disp) ~ qsec+wt+drat", data = mtcars)
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
1a) This would also work:
fo <- sprintf("cbind(%s) ~.", toString(depVars))
do.call("lm", list(fo, quote(mtcars[c(depVars, indepVars)])))
giving:
Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars[c(depVars,
indepVars)])
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
2) reformulate @akrun and @Konrad, in comments below the question, suggest using reformulate. This approach produces a "formula" object, whereas the ones above produce a character string as the formula. (If this were desired for the prior solutions above, it would be possible using fo <- formula(fo).) Note that it is important that the response argument to reformulate be a call object and not a character string, or else reformulate will interpret the character string as the name of a single variable.
fo <- reformulate(indepVars, parse(text = sprintf("cbind(%s)", toString(depVars)))[[1]])
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
3) lm.fit Another way that does not use a formula at all is:
m <- as.matrix(mtcars)
fit <- lm.fit(cbind(1, m[, indepVars]), m[, depVars])
The output is a list with these components:
> names(fit)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
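As a sanity check, the lm.fit route can be compared against the formula route. This sketch assumes the same depVars and indepVars as above; the names fit_raw and fit_lm are introduced only for the comparison:

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

# No formula: build the design matrix by hand, with a column of 1s
# for the intercept.
m <- as.matrix(mtcars)
fit_raw <- lm.fit(cbind(1, m[, indepVars]), m[, depVars])

# Formula route, built from the same two vectors.
fo <- as.formula(sprintf("cbind(%s) ~ %s", toString(depVars),
                         paste(indepVars, collapse = "+")))
fit_lm <- lm(fo, data = mtcars)

# Dimension names may differ between the two, so compare values only.
all.equal(unname(fit_raw$coefficients), unname(coef(fit_lm)))
```

Note that lm.fit returns a plain list rather than an "lm" object, so methods such as summary() and predict() are not available on it.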
