Changing the arguments of this function to abstract variables in R

In order to use a for loop, I'm trying to replace the arguments in this function with variables:
lm(mpg~cylinders, data=Auto)
So I did this:
var1='cylinders'
lm((paste('mpg ~',var1)), data = Auto)
It worked fine.
Now I wonder how to replace the arguments cylinders+acceleration with var1 and var2.
I tried the same method, replacing this:
lm(mpg~cylinders+acceleration, data=Auto)
by
var1='cylinders'
var2 = 'acceleration'
lm((paste('mpg ~',var1+var2)), data = Auto)
But I got an error message:
Error in var1 + var2 : non-numeric argument to binary operator
So I want to learn how to work with var1 and var2 so that I can use a for loop afterwards.

Use reformulate to generate the formula.
var1 <- 'cyl'
var2 <- 'disp'
fo <- reformulate(c(var1, var2), "mpg")
lm(fo, mtcars)
Alternatively, use do.call. It gives the same fit; the only difference is the Call: line of the output, which shows fo literally with the code above but shows the expanded formula with the code below.
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = mpg ~ cyl + disp, data = mtcars)
Coefficients:
(Intercept)          cyl         disp
   34.66099     -1.58728     -0.02058
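Since the goal is to use this inside a for loop, a minimal sketch of that might look like the following. It uses the built-in mtcars data rather than Auto, and the names predictor_sets and fits are placeholders chosen for illustration, not part of the original question.

```r
# Hypothetical predictor sets to loop over (mtcars columns, not Auto)
predictor_sets <- list("cyl", c("cyl", "disp"), c("cyl", "disp", "hp"))

fits <- vector("list", length(predictor_sets))
for (i in seq_along(predictor_sets)) {
  # reformulate() joins the term labels with "+" and adds the response
  fo <- reformulate(predictor_sets[[i]], response = "mpg")
  fits[[i]] <- lm(fo, data = mtcars)
}

# Recover the formula used for each fit
lapply(fits, formula)
```

Preallocating fits and indexing with seq_along keeps the loop safe even if the list of predictor sets is empty.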

Related

How to dynamically reference datasets in function call of linear regression

Let's say I have a function like this:
data("mtcars")
ncol(mtcars)
test <- function(string){
fit <- lm(mpg ~ cyl,
data = string)
return(fit)
}
I'd like to be able to have the "string" variable evaluated as the dataset for a linear regression like so:
test("mtcars")
However, I get an error:
Error in eval(predvars, data, env) : invalid 'envir' argument of
type 'character'
I've tried using combinations of eval and parse, but to no avail. Any ideas?
You can use get() to search by name for an object.
test <- function(string){
fit <- lm(mpg ~ cyl, data = get(string))
return(fit)
}
test("mtcars")
# Call:
# lm(formula = mpg ~ cyl, data = get(string))
#
# Coefficients:
# (Intercept)          cyl
#      37.885       -2.876
You can add one more line to make the output look better. Notice the change of the Call part in the output. It turns from data = get(string) to data = mtcars.
test <- function(string){
fit <- lm(mpg ~ cyl, data = get(string))
fit$call$data <- as.name(string)
return(fit)
}
test("mtcars")
# Call:
# lm(formula = mpg ~ cyl, data = mtcars)
#
# Coefficients:
# (Intercept)          cyl
#      37.885       -2.876
Try this slight change to your code:
#Code
test <- function(string){
fit <- lm(mpg ~ cyl,
data = eval(parse(text=string)))
return(fit)
}
#Apply
test("mtcars")
Output:
Call:
lm(formula = mpg ~ cyl, data = eval(parse(text = string)))
Coefficients:
(Intercept)          cyl
     37.885       -2.876
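Another option, sketched here under the assumption that the data set is visible from where the function is called, is to build the call with do.call and pass the name as a symbol. This is a variation on the answers above rather than a replacement for them.

```r
test <- function(string) {
  # as.name() turns the string into a symbol, so the constructed call is
  # lm(mpg ~ cyl, data = mtcars) and the data set is looked up by name
  do.call("lm", list(mpg ~ cyl, data = as.name(string)))
}

fit <- test("mtcars")
fit$call
```

With this approach the Call: line shows data = mtcars without having to patch fit$call afterwards.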

robust linear regression with lapply

I'm having trouble running a robust linear regression model (using rlm from the MASS package) over a list of data frames.
Reproducible example:
var1 <- c(1:100)
var2 <- var1*var1
df1 <- data.frame(var1, var2)
var1 <- var1 + 50
var2 <- var2*2
df2 <- data.frame(var1, var2)
lst1 <- list(df1, df2)
Linear model (works):
lin_mod <- lapply(lst1, lm, formula = var1 ~ var2)
summary(lin_mod[[1]])
My code for the robust model:
rob_mod <- lapply(lst1, MASS::rlm, formula = var1 ~ var2)
gives the following error:
Error in rlm.default(X[[i]], ...) :
argument "y" is missing, with no default
How could I solve this?
The error in my actual data is:
Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion
You can also try a purrr::map solution:
library(tidyverse)
map(lst1, ~MASS::rlm(var1 ~ var2, data=.))
or as joran commented
map(lst1, MASS:::rlm.formula, formula = var1 ~ var2)
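As ?lm shows, lm offers only a formula interface. In contrast, ?rlm documents both a formula method and a default (x, y) method, and rlm picks the method based on its first argument. When lapply passes the data frame first, rlm.default is called and expects x and y. To use the formula method, pass the formula first and supply the data frame explicitly as data=.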
Your call is missing the data argument. lapply will call FUN with each member of the list as the first argument of FUN but data is the second argument to rlm.
The solution is to define an anonymous function.
lin_mod <- lapply(lst1, function(DF) MASS::rlm(formula = var1 ~ var2, data = DF))
summary(lin_mod[[1]])
#
#Call: rlm(formula = var1 ~ var2, data = DF)
#Residuals:
#    Min      1Q  Median      3Q     Max
#-18.707  -5.381   1.768   6.067   7.511
#
#Coefficients:
#            Value    Std. Error  t value
#(Intercept) 19.6977   1.0872     18.1179
#var2         0.0092   0.0002     38.2665
#
#Residual standard error: 8.827 on 98 degrees of freedom
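To collect results from every data frame in the list, something along these lines may help. It is a sketch that rebuilds the example data from the question, and the name rob_mod is simply reused from the question.

```r
library(MASS)  # rlm() lives in MASS, a recommended package shipped with R

var1 <- 1:100
df1 <- data.frame(var1 = var1, var2 = var1^2)
df2 <- data.frame(var1 = var1 + 50, var2 = 2 * var1^2)
lst1 <- list(df1, df2)

# Formula first, data frame passed by name: dispatches to rlm.formula
rob_mod <- lapply(lst1, function(DF) rlm(var1 ~ var2, data = DF))

# One column of coefficients per data frame
sapply(rob_mod, coef)
```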

lapply function to pass single and + arguments to LM

I am stuck trying to pass "+" arguments to lm.
My 2 lines of code below work fine for single arguments like:
model_combinations=c('.', 'Long', 'Lat', 'Elev')
lm_models = lapply(model_combinations, function(x) {
lm(substitute(Y ~ i, list(i=as.name(x))), data=climatol_ann)})
But the same code fails if I add 'Lat+Elev' at the end of model_combinations, as in:
model_combinations=c('.', 'Long', 'Lat', 'Elev', 'Lat+Elev')
Error in eval(expr, envir, enclos) : object 'Lat+Elev' not found
I've scanned posts but am unable to find a solution.
I've generally found it more robust/easier to understand to use reformulate to construct formulas via string manipulations rather than trying to use substitute() to modify an expression, e.g.
model_combinations <- c('.', 'Long', 'Lat', 'Elev', 'Lat+Elev')
model_formulas <- lapply(model_combinations,reformulate,
response="Y")
lm_models <- lapply(model_formulas,lm,data=climatol_ann)
Because reformulate works at a string level, it doesn't have a problem if the elements are themselves non-atomic (e.g. Lat+Elev). The only tricky situation here is if your data argument or variables are constructed in some environment that can't easily be found, but passing an explicit data argument usually avoids problems.
(You can also use as.formula(paste(...)) or as.formula(sprintf(...)); reformulate() is just a convenient wrapper.)
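Because climatol_ann is not available here, the following sketch shows the same pattern on mtcars instead. The column subset dat and the choice of adjusted R-squared for comparison are assumptions made for illustration.

```r
model_combinations <- c(".", "hp", "cyl", "hp+cyl")

# Keep only the columns involved so that "." means "hp and cyl" here
dat <- mtcars[c("mpg", "hp", "cyl")]

model_formulas <- lapply(model_combinations, reformulate, response = "mpg")
lm_models <- lapply(model_formulas, lm, data = dat)
names(lm_models) <- model_combinations

# Compare the fits, e.g. by adjusted R-squared
sapply(lm_models, function(m) summary(m)$adj.r.squared)
```

Naming the list by the term strings makes it easy to pull out a single model later, for example lm_models[["hp+cyl"]].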
With as.formula you can do:
models = lapply(model_combinations,function(x) lm(as.formula(paste("Y ~ ",x)), data=climatol_ann))
For the mtcars dataset:
model_combs = c("hp","cyl","hp+cyl")
testModels = lapply(model_combs,function(x) lm(as.formula(paste("mpg ~ ",x)), data=mtcars) )
testModels
#[[1]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept)           hp
#   30.09886     -0.06823
#
#
#[[2]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept)          cyl
#     37.885       -2.876
#
#
#[[3]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept)           hp          cyl
#   36.90833     -0.01912     -2.26469
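In the output above every Call: line shows as.formula(paste("mpg ~ ", x)) rather than the actual formula. If that matters, one possible workaround (a sketch, not the only way) is to substitute the formula into the call with bquote before evaluating it.

```r
model_combs <- c("hp", "cyl", "hp+cyl")

testModels <- lapply(model_combs, function(x) {
  fo <- as.formula(paste("mpg ~", x))
  # bquote() substitutes the value of fo into the call before it is
  # evaluated, so the stored call holds the formula itself
  eval(bquote(lm(.(fo), data = mtcars)))
})

testModels[[3]]$call
```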

Combining cbind and paste in linear model

I would like to know how I can come up with an lm formula syntax that would enable me to use paste together with cbind for multiple multivariate regression.
Example
In my model I have a set of variables, which corresponds to the primitive example below:
data(mtcars)
depVars <- paste("mpg", "disp")
indepVars <- paste("qsec", "wt", "drat")
Problem
I would like to create a model with my depVars and indepVars. The model, typed by hand, would look like this:
modExample <- lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
I'm interested in generating the same formula without referring to variable names and only using depVars and indepVars vectors defined above.
Attempt 1
For example, what I had in mind would correspond to:
mod1 <- lm(formula = formula(paste(cbind(paste(depVars, collapse = ",")), " ~ ",
indepVars)), data = mtcars)
Attempt 2
I tried this as well:
mod2 <- lm(formula = formula(cbind(depVars), paste(" ~ ",
paste(indepVars,
collapse = " + "))),
data = mtcars)
Side notes
I found a number of good examples on how to use paste with formula but I would like to know how I can combine with cbind.
This is mostly a syntax question; in my real data I have a number of variables I would like to introduce to the model, and making use of the previously generated vectors is more parsimonious and makes the code more presentable. In effect, I'm only interested in creating a formula object that contains cbind with variable names taken from one vector and the remaining variables taken from another vector.
In a word, I want to arrive at the formula in modExample without having to type variable names.
I think this works.
data(mtcars)
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
lm(formula(paste('cbind(',
paste(depVars, collapse = ','),
') ~ ',
paste(indepVars, collapse = '+'))), data = mtcars)
All the solutions below use these definitions:
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
1) character string formula. Create a character string representing the formula and then run lm using do.call. Note that the formula shown in the output displays correctly and is written out.
fo <- sprintf("cbind(%s) ~ %s", toString(depVars), paste(indepVars, collapse = "+"))
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = "cbind(mpg, disp) ~ qsec+wt+drat", data = mtcars)
Coefficients:
            mpg      disp
(Intercept) 11.3945  452.3407
qsec         0.9462  -20.3504
wt          -4.3978   89.9782
drat         1.6561  -41.1148
1a) This would also work:
fo <- sprintf("cbind(%s) ~.", toString(depVars))
do.call("lm", list(fo, quote(mtcars[c(depVars, indepVars)])))
giving:
Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars[c(depVars,
indepVars)])
Coefficients:
            mpg      disp
(Intercept) 11.3945  452.3407
qsec         0.9462  -20.3504
wt          -4.3978   89.9782
drat         1.6561  -41.1148
2) reformulate. @akrun and @Konrad, in comments below the question, suggest using reformulate. This approach produces a "formula" object whereas the ones above produce a character string as the formula. (If a formula object were desired for the prior solutions, fo <- formula(fo) would produce one.) Note that the response argument to reformulate must be a call object and not a character string, or else reformulate will interpret the character string as the name of a single variable.
fo <- reformulate(indepVars, parse(text = sprintf("cbind(%s)", toString(depVars)))[[1]])
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
Coefficients:
            mpg      disp
(Intercept) 11.3945  452.3407
qsec         0.9462  -20.3504
wt          -4.3978   89.9782
drat         1.6561  -41.1148
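The call object for the response can also be assembled from symbols rather than parsed from text. The following is a sketch of that variation; the name lhs is introduced here purely for illustration.

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

# Build the call cbind(mpg, disp) from symbols instead of parsing text
lhs <- as.call(c(list(as.name("cbind")), lapply(depVars, as.name)))
fo <- reformulate(indepVars, response = lhs)
fo

fit <- lm(fo, data = mtcars)
```

Working with symbols avoids having to worry about how the variable names would be interpreted if they were pasted into a string and parsed.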
3) lm.fit. Another way that does not use a formula at all is:
m <- as.matrix(mtcars)
fit <- lm.fit(cbind(1, m[, indepVars]), m[, depVars])
The output is a list with these components:
> names(fit)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
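As a quick sanity check, the lm.fit coefficients can be compared with those from the formula interface. This sketch names the intercept column, which the code above leaves unnamed; X and ref are names assumed for illustration.

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

m <- as.matrix(mtcars)
# Name the intercept column so the coefficient rows are labelled like lm()'s
X <- cbind("(Intercept)" = 1, m[, indepVars])
fit <- lm.fit(X, m[, depVars])

# Reference fit via the formula interface
ref <- lm(cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
all.equal(unname(fit$coefficients), unname(coef(ref)))
```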

Change printed variable names for summary()

I am using summary() to create a, yes, summary from my regression. What now is printed is my variable names, including underscore.
Is there any way to change the printed variable names so that I can see e.g. "Age of dog" instead of dog_age?
I cannot change the variable names themselves since they cannot contain spaces.
Something like this?
> x <- summary(lm(mpg ~ cyl+wt, mtcars))
> rownames(x$coef) <- c("YOUR", "NAMES", "HERE")
> x$coef
# Estimate Std. Error t value Pr(>|t|)
# YOUR 39.6863 1.7150 23.141 < 2e-16
# NAMES -1.5078 0.4147 -3.636 0.001064
# HERE -3.1910 0.7569 -4.216 0.000222
Or you could just change the names in the data before running regression
> names(mtcars)[1:3] <- rownames(x$coef)
> lm(YOUR ~ NAMES+HERE, mtcars)
# Call:
# lm(formula = YOUR ~ NAMES + HERE, data = mtcars)
# Coefficients:
# (Intercept) NAMES HERE
# 34.66099 -1.58728 -0.02058
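If only some terms need display names, a lookup vector may be less error-prone than retyping every row name by position. The sketch below assumes a hypothetical labels vector and uses datasets::mtcars so that it does not depend on any renaming done earlier in the session.

```r
# Hypothetical display labels, keyed by the original term names
labels <- c(cyl = "Number of cylinders", wt = "Weight (1000 lbs)")

x <- summary(lm(mpg ~ cyl + wt, data = datasets::mtcars))

rn <- rownames(x$coefficients)
hit <- rn %in% names(labels)
rn[hit] <- labels[rn[hit]]        # terms without a label keep their name
rownames(x$coefficients) <- rn

x$coefficients
```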
You can use backticks (`) to allow spaces in variable names:
dat = data.frame(`Age of dog`=1:10,`T`=1:10,check.names=FALSE)
summary(lm(T~`Age of dog`,data=dat))
