Take elements out of quotes in R [duplicate] - r

This question already has answers here: Formula with dynamic number of variables (5 answers)
I am trying to build an R shiny application where I have the user select variables for a model. The elements that the user selects get put into a vector. How do I remove the quotes as well as put spaces between each element they select, to be variables in a model?
As an example:
> vars <- c("cyl", "disp", "hp")
> my.model <- lm(mpg ~ paste(vars, collapse = "+"), data = mtcars)
Gives the error:
Error in model.frame.default(formula = mpg ~ paste(vars, collapse = "+"), :
variable lengths differ (found for 'paste(vars, collapse = "+")')
From reading other somewhat similar questions on Stack Overflow, I saw a suggestion to use as.name() to remove the quotation marks, but this produces another error:
> vars <- c("cyl", "disp", "hp")
> my.model <- lm(mpg ~ as.name(paste(vars, collapse = "+")), data = mtcars)
Error in model.frame.default(formula = mpg ~ as.name(paste(vars, collapse = "+")), :
invalid type (symbol) for variable 'as.name(paste(vars, collapse = "+"))'

A formula is not just a string without quotes. It's a collection of unevaluated symbols. Try using the built-in reformulate function to build your formula.
vars <- c("cyl", "disp", "hp")
my.model <- lm(reformulate(vars,"mpg"), data = mtcars)
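To see why the pasted string fails while reformulate works, here is a minimal sketch on mtcars. The object names rhs, f, fit_dynamic and fit_by_hand are only illustrative.

```r
vars <- c("cyl", "disp", "hp")

# paste() only produces a length-one character vector, not model terms
rhs <- paste(vars, collapse = "+")
class(rhs)

# reformulate() parses the term labels and returns a formula object
f <- reformulate(vars, response = "mpg")
class(f)
all.vars(f)

# The dynamically built formula fits the same model as the hand-written one
fit_dynamic <- lm(f, data = mtcars)
fit_by_hand <- lm(mpg ~ cyl + disp + hp, data = mtcars)
all.equal(coef(fit_dynamic), coef(fit_by_hand))
```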

as.formula can coerce a string into a formula:
lm(as.formula((paste("mpg ~", paste(vars, collapse = "+")))), data = mtcars)
#Call:
#lm(formula = as.formula((paste("mpg ~", paste(vars, collapse = "+")))),
# data = mtcars)
#Coefficients:
#(Intercept) cyl disp hp
# 34.18492 -1.22742 -0.01884 -0.01468
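In a Shiny app the selection can be empty or contain an unexpected name, so it may help to wrap the formula construction in a small function. The sketch below is one possible shape, not the only one: fit_selected is a hypothetical helper, and its selected argument stands in for whatever vector the input widget returns.

```r
# Hypothetical helper: `selected` stands in for the vector a Shiny input returns
fit_selected <- function(selected, response = "mpg", data = mtcars) {
  unknown <- setdiff(selected, names(data))
  if (length(unknown) > 0) {
    stop("not columns of the data: ", paste(unknown, collapse = ", "))
  }
  # reformulate() needs at least one term label, so fall back to "1"
  # (intercept only) when nothing is selected
  rhs_terms <- if (length(selected) == 0) "1" else selected
  lm(reformulate(rhs_terms, response = response), data = data)
}

fit_none <- fit_selected(character(0))
fit_some <- fit_selected(c("cyl", "hp"))
names(coef(fit_none))
names(coef(fit_some))
```

Validating the names before building the formula keeps arbitrary user text from being parsed as part of the model.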

Related

How to dynamically reference datasets in function call of linear regression

Let's say I have a function like this:
data("mtcars")
ncol(mtcars)
test <- function(string){
fit <- lm(mpg ~ cyl,
data = string)
return(fit)
}
I'd like to be able to have the "string" variable evaluated as the dataset for a linear regression like so:
test("mtcars")
However, I get an error:
Error in eval(predvars, data, env) : invalid 'envir' argument of
type 'character'
I've tried using combinations of eval and parse, but to no avail. Any ideas?
You can use get() to search by name for an object.
test <- function(string){
fit <- lm(mpg ~ cyl, data = get(string))
return(fit)
}
test("mtcars")
# Call:
# lm(formula = mpg ~ cyl, data = get(string))
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
You can add one more line to make the output look better. Notice the change of the Call part in the output. It turns from data = get(string) to data = mtcars.
test <- function(string){
fit <- lm(mpg ~ cyl, data = get(string))
fit$call$data <- as.name(string)
return(fit)
}
test("mtcars")
# Call:
# lm(formula = mpg ~ cyl, data = mtcars)
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
Try this slight change to your code:
#Code
test <- function(string){
fit <- lm(mpg ~ cyl,
data = eval(parse(text=string)))
return(fit)
}
#Apply
test("mtcars")
Output:
Call:
lm(formula = mpg ~ cyl, data = eval(parse(text = string)))
Coefficients:
(Intercept) cyl
37.885 -2.876
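The get() approach can be combined with a couple of checks so that a wrong name fails early with a clear message. This is only a sketch: fit_by_name and its env argument are hypothetical names introduced for illustration.

```r
# Hypothetical helper: look a data frame up by name, then fit the model
fit_by_name <- function(name, env = parent.frame()) {
  stopifnot(is.character(name), length(name) == 1L)
  dat <- get(name, envir = env)  # errors if no object of that name is found
  if (!is.data.frame(dat)) {
    stop("'", name, "' is not a data frame")
  }
  fit <- lm(mpg ~ cyl, data = dat)
  # Record the dataset name in the call so the printed output reads naturally
  fit$call$data <- as.name(name)
  fit
}

fit <- fit_by_name("mtcars")
fit$call
```

Compared with eval(parse(text = string)), get() only looks up an object name, so a string containing arbitrary code is not executed.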

Problem extracting model covariates for model summary table

I'm a graduate student using a linear regression (count) model to understand drivers of fish movement into and out of tidal wetlands. I am currently trying to generate a publication-worthy model summary table in R. I've been using the sel.table function which has been working well for this purpose.
However, I've been unable to generate a column that contains the individual model formulas. Below is my code which is based off of some nice instructions for using the MuMIn package. https://sites.google.com/site/rforfishandwildlifegrads/home/mumin_usage_examples
So to recap, my question pertains to the last line of code below,
How can I insert model formulas into a model selection table?
install.packages("MuMIn")
library(MuMIn)
data = mtcars
models = list(
model1 <- lm(mpg ~ cyl, data = data),
model2 <- lm(mpg ~ cyl + hp, data = data),
model3 <- lm(mpg ~ cyl * hp, data = data)
)
#create an object “out.put” that contains all of the model selection information
out.put <- model.sel(models)
#coerce the object out.put into a data frame
sel.table <-as.data.frame(out.put)[6:10]
#add a column for model names
sel.table$Model <- rownames(sel.table)
#replace model name with formulas
for(i in 1:nrow(sel.table)) sel.table$Model[i]<- as.character(formula(paste(sel.table$Model[i])))[3]
#Any help on this topic would be greatly appreciated!
UPDATED CODE
My method of pulling out model names is pretty clunky but otherwise this code seems to generate what I intended (a complete model selection table). Also, I'm not sure if the model coefficients are displayed properly but I hope to follow up on this for my final answer.
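library(MuMIn)
library(dplyr)      # provides %>%, select() and mutate()
library(flextable)  # provides flextable() and autofit()
data = mtcars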
#write linear models
models = list(
model1 <- lm(mpg ~ cyl, data = data),
model2 <- lm(mpg ~ cyl + hp, data = data),
model3 <- lm(mpg ~ cyl * hp + disp, data = data),
model4 <- lm(mpg ~ cyl * hp + disp + wt + drat, data = data)
)
#create an object “out.put” that contains all of the model selection information
out.put <- model.sel(models)
#coerce the object out.put into a data frame
sel.table <-as.data.frame(out.put)
#slightly rename intercept column
names(sel.table)[1]="Intercept"
#select variables to display in model summary table
sel.table <- sel.table %>%
select(Intercept,cyl,hp,disp,wt,drat,df,logLik,AICc,delta)
#round numerical columns
sel.table[,1:6]<- round(sel.table[,1:6],2)
sel.table[,8:10]<-round(sel.table[,8:10],2)
#add a column for model (row) names
sel.table$Model <- rownames(sel.table)
#extract model formulas
form <- data.frame(name = as.character(lapply(models, `[[`, c(10,2))))
#generate a column with model (row) numbers (beside associated model formulas)
form <- form %>%
mutate(Model=(1:4))
#merge model table and model formulas
sum_table <- merge (form,sel.table,by="Model")
#rename model equation column
names(sum_table)[2]="Formula"
print <- flextable(head(sum_table))
print <- autofit(print)
print
6/1/20 UPDATE:
Two issues remain with the code (originally illustrated in an image). I've found a workaround for the first but I'm still investigating the second:
1) Models end up being misnumbered
2) Model formula columns are being generated for each model
I believe there is a part of the code missing in the examples you followed, that is why your code does not work.
The easiest way to generate formula-like strings is simply to deparse the right hand side of the model formulas (i.e. 3-rd element):
sapply(get.models(out.put, TRUE), function(mo) deparse(formula(mo)[[3]], width.cutoff = 500))
or, if you want A*B's expanded into A + B + A:B:
sapply(get.models(out.put, TRUE), function(mo) deparse(terms(formula(mo), simplify = TRUE)[[3]], width.cutoff = 500))
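The same deparse idea can be tried without MuMIn on a plain named list of lm fits. This sketch uses mtcars, and the names models, rhs_as_written and rhs_expanded are illustrative.

```r
models <- list(
  model1 = lm(mpg ~ cyl, data = mtcars),
  model2 = lm(mpg ~ cyl + hp, data = mtcars),
  model3 = lm(mpg ~ cyl * hp, data = mtcars)
)

# Right-hand side exactly as it was written in each formula
rhs_as_written <- sapply(models, function(mo)
  deparse(formula(mo)[[3]], width.cutoff = 500))

# Right-hand side with interactions expanded into main effects and A:B terms
rhs_expanded <- sapply(models, function(mo)
  deparse(terms(formula(mo), simplify = TRUE)[[3]], width.cutoff = 500))

rhs_as_written
rhs_expanded
```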
Update: the original example code improved and simplified:
library(MuMIn)
data <- mtcars
#! Feed the models directly to `model.sel`. No need to create a separate list of
#! models.
gm <- lm(mpg ~ cyl, data = data)
out.put <- model.sel(
model1 = gm,
model2 = update(gm, . ~. + hp),
model3 = update(gm, . ~ . * hp + disp),
model4 = update(gm, . ~ . * hp + disp + wt + drat)
)
sel.table <- out.put
sel.table$family <- NULL
sel.table <- round(sel.table, 2)
#! Use `get.models` to get the list of models in the same order as in the
#! selection table
sel.table <- cbind(
Model =
#! Update (2): model number according to their original order, use:
attr(out.put, "order"),
#! otherwise: seq(nrow(sel.table)),
#!
#! Update (2): add a large `width.cutoff` to `deparse` so that the result is
#! always a single string and `sapply` returns a character vector
#! rather than a list.
#! For oversize formulas, use `paste0(deparse(...), collapse = "")`
formula = sapply(get.models(out.put, TRUE),
function(mo) deparse(formula(mo)[[3]], width.cutoff = 500)),
#!
sel.table
)
#slightly rename intercept column
colnames(sel.table)[3] <- 'Intercept'
# #select summary columns for model selection table
# sel.table <- sel.table %>%
# select(Model,formula,Intercept,df,logLik,AICc,delta,weight)
library(flextable)  # provides flextable() and autofit()
print <- flextable(head(sel.table))
print <- autofit(print)
print
Since your question isn't reproducible, I'll try with something else; maybe it's what you're looking for:
data = mtcars
models = list(
model1 = lm(mpg ~ cyl, data = data),
model2 = lm(mpg ~ cyl + hp, data = data)
)
data.frame(name = as.character(lapply(models, `[[`, c(10,2))),
other.column = NA)
#> name other.column
#> 1 mpg ~ cyl NA
#> 2 mpg ~ cyl + hp NA
Created on 2020-05-28 by the reprex package (v0.3.0)
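The call of an lm object is at position 10 of the list here, and the formula is the second element of that call, hence c(10,2). You can count the position yourself from the completions offered when you type model1$. You can use rownames() instead of a column, but that's not recommended.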
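Indexing by position relies on the layout of the fitted object, which can shift, for example when weights are supplied. A sketch of a position-independent alternative using formula(), where fit_plain and fit_weighted are illustrative names and the weighted fit exists only to change the layout:

```r
fit_plain <- lm(mpg ~ cyl + hp, data = mtcars)
# Hypothetical weighted fit, included only to show that the layout can change
fit_weighted <- lm(mpg ~ cyl + hp, data = mtcars, weights = wt)

# Position of the "call" element in each fitted object
match("call", names(fit_plain))
match("call", names(fit_weighted))

# formula() does not depend on where the call is stored
deparse(formula(fit_plain))
deparse(formula(fit_weighted))
```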
EDIT AFTER REPRODUCIBLE EXAMPLE
library(MuMIn)
data = mtcars
models = list(
model1 <- lm(mpg ~ cyl, data = data),
model2 <- lm(mpg ~ cyl + hp, data = data),
model3 <- lm(mpg ~ cyl * hp, data = data)
)
# create an object that contains all of the model selection information
out.put <- model.sel(models)
#coerce the object out.put into a data frame
sel.table <-as.data.frame(out.put)[6:10]
# formulas as names; model.sel() sorts the rows by AICc, so index the
# formulas by the row names (the original model numbers) to keep them aligned
sel.table$name = as.character(lapply(models, `[[`, c(10,2)))[as.integer(rownames(sel.table))]
# reordering
sel.table = sel.table[, c(6,1,2,3,4,5)]
sel.table
#> name df logLik AICc delta weight
#> 3 mpg ~ cyl * hp 5 -78.14329 168.5943 0.000000 0.5713716
#> 1 mpg ~ cyl 3 -81.65321 170.1636 1.569298 0.2607054
#> 2 mpg ~ cyl + hp 4 -80.78092 171.0433 2.449068 0.1679230
Created on 2020-05-31 by the reprex package (v0.3.0)

Evaluating the loop index as a variable name, rather than a string

I would like to evaluate a string, e.g. x1, where x1 <- "disp", as the underlying value, i.e. disp, when x1 is the loop index.
A reproducible example, using the mtcars dataset as an example is below:
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")
for (x in vars){
print(x)
}
Which gives me
#> [1] "x1"
#> [1] "x2"
Desired Outcome:
What I'm trying to get is a loop that runs these commands:
print(x1)
print(x2)
resulting in:
#> [1] "disp"
#> [1] "hp"
I recognise that the simplest solution would be to bypass x1 and x2 completely:
vars <- c("disp", "hp")
for (x in vars){
print(x)
}
But that's less helpful, because I need to keep x1, x2, etc. in my (unsimplified) problem.
Also, if purrr is a better way to do something like this, instead of a loop, I'd be very interested to understand that better.
If anyone has a suggestion on a better title for the question, I will also be very interested.
Deeper Question
I simplified my question above, hoping that would be enough to get what I needed, but for context, I'm trying to do something like this:
df <- mtcars
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")
for (x in vars){
lm(mpg ~ x, data = mtcars)
}
Created on 2019-07-11 by the reprex package (v0.2.1)
The answer to your original question is to use get. However, since you want to go beyond that and keep vars as it is, we can combine get with as.formula:
lst <- vector("list", length(vars))
for (x in seq_along(vars)) {
lst[[x]] <- lm(as.formula(paste0("mpg ~", get(vars[x]))), mtcars)
}
#[[1]]
#Call:
#lm(formula = as.formula(paste0("mpg ~", get(vars[.x]))), data = mtcars)
#Coefficients:
#(Intercept) disp
# 29.5999 -0.0412
#[[2]]
#Call:
#lm(formula = as.formula(paste0("mpg ~", get(vars[.x]))), data = mtcars)
#Coefficients:
#(Intercept) hp
# 30.0989 -0.0682
Using purrr you can do that with map
purrr::map(seq_along(vars), ~lm(as.formula(paste0("mpg ~", get(vars[.x]))), mtcars))
We can use lapply from base R and reformulate
lapply(mget(vars), function(x)
lm(reformulate(response = "mpg", termlabels = x), data = mtcars))
#$x1
#Call:
#lm(formula = reformulate(response = "mpg", termlabels = x), data = mtcars)
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
#$x2
#Call:
#lm(formula = reformulate(response = "mpg", termlabels = x), data = mtcars)
#Coefficients:
#(Intercept) hp
# 30.09886 -0.06823
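If the x1 and x2 labels mainly serve as identifiers, one alternative is to store them as the names of a single character vector, which avoids the lookup by name altogether. This is a sketch under that assumption; it changes how vars is defined compared with the question.

```r
# Assumption: the x1/x2 labels are kept as names instead of separate objects
vars <- c(x1 = "disp", x2 = "hp")

# lapply() keeps the names, so each fit stays labelled with x1 / x2
fits <- lapply(vars, function(v) {
  lm(reformulate(v, response = "mpg"), data = mtcars)
})

names(fits)
# Name of the predictor actually used in each model
sapply(fits, function(fit) names(coef(fit))[2])
```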
Already answered, but:
library(rlang)
library(tidyverse)
vars <- exprs(disp, hp) # without "character-quotes"
map(seq_along(vars), ~eval(expr(lm(mpg ~ !!vars[[.x]], mtcars))))
# or
vars <- c("disp", "hp")
map(vars, ~exec("lm", str_c("mpg ~ ", .x), data = mtcars))
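A base R counterpart of the rlang version is bquote(), which splices a symbol into the call before it is evaluated, so the stored call shows the real variable name. The sketch below reuses x1, x2 and vars from the question; fits is an illustrative name.

```r
x1 <- "disp"
x2 <- "hp"
vars <- c("x1", "x2")

# mget() looks up x1 and x2 by name; .() inside bquote() inserts the symbol
fits <- lapply(mget(vars), function(v) {
  eval(bquote(lm(mpg ~ .(as.name(v)), data = mtcars)))
})

fits$x1$call
fits$x2$call
```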

lapply function to pass single and + arguments to LM

I am stuck trying to pass "+" arguments to lm.
My 2 lines of code below work fine for single arguments like:
model_combinations=c('.', 'Long', 'Lat', 'Elev')
lm_models = lapply(model_combinations, function(x) {
lm(substitute(Y ~ i, list(i=as.name(x))), data=climatol_ann)})
But same code fails if I add 'Lat+Elev' at end of list of model_combinations as in:
model_combinations=c('.', 'Long', 'Lat', 'Elev', 'Lat+Elev')
Error in eval(expr, envir, enclos) : object 'Lat+Elev' not found
I've scanned posts but am unable to find a solution.
I've generally found it more robust/easier to understand to use reformulate to construct formulas via string manipulations rather than trying to use substitute() to modify an expression, e.g.
model_combinations <- c('.', 'Long', 'Lat', 'Elev', 'Lat+Elev')
model_formulas <- lapply(model_combinations,reformulate,
response="Y")
lm_models <- lapply(model_formulas,lm,data=climatol_ann)
Because reformulate works at a string level, it doesn't have a problem if the elements are themselves non-atomic (e.g. Lat+Elev). The only tricky situation here is if your data argument or variables are constructed in some environment that can't easily be found, but passing an explicit data argument usually avoids problems.
(You can also use as.formula(paste(...)) or as.formula(sprintf(...)); reformulate() is just a convenient wrapper.)
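The difference between the two approaches can be inspected with all.vars(). In this sketch "hp+cyl" stands in for 'Lat+Elev', and f_sub and f_ref are illustrative names.

```r
term <- "hp+cyl"

# as.name() turns the whole string into ONE symbol called `hp+cyl`,
# which is why lm() reports that the object is not found
f_sub <- substitute(Y ~ i, list(i = as.name(term)))
all.vars(f_sub)

# reformulate() parses the string, so "+" is read as the formula operator
f_ref <- reformulate(term, response = "Y")
all.vars(f_ref)
```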
With as.formula you can do:
models = lapply(model_combinations,function(x) lm(as.formula(paste("Y ~ ",x)), data=climatol_ann))
For the mtcars dataset:
model_combs = c("hp","cyl","hp+cyl")
testModels = lapply(model_combs,function(x) lm(as.formula(paste("mpg ~ ",x)), data=mtcars) )
testModels
#[[1]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) hp
# 30.09886 -0.06823
#
#
#[[2]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) cyl
# 37.885 -2.876
#
#
#[[3]]
#
#Call:
#lm(formula = as.formula(paste("mpg ~ ", x)), data = mtcars)
#
#Coefficients:
#(Intercept) hp cyl
# 36.90833 -0.01912 -2.26469

Combining cbind and paste in linear model

I would like to know how can I come up with a lm formula syntax that would enable me to use paste together with cbind for multiple multivariate regression.
Example
In my model I have a set of variables, which corresponds to the primitive example below:
data(mtcars)
depVars <- paste("mpg", "disp")
indepVars <- paste("qsec", "wt", "drat")
Problem
I would like to create a model with my depVars and indepVars. The model, typed by hand, would look like that:
modExample <- lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
I'm interested in generating the same formula without referring to variable names and only using depVars and indepVars vectors defined above.
Attempt 1
For example, what I had on mind would correspond to:
mod1 <- lm(formula = formula(paste(cbind(paste(depVars, collapse = ",")), " ~ ",
indepVars)), data = mtcars)
Attempt 2
I tried this as well:
mod2 <- lm(formula = formula(cbind(depVars), paste(" ~ ",
paste(indepVars,
collapse = " + "))),
data = mtcars)
Side notes
I found a number of good examples on how to use paste with formula but I would like to know how I can combine with cbind.
This is mostly a syntax question; in my real data I have a number of variables I would like to introduce to the model, and making use of the previously generated vectors is more parsimonious and makes the code more presentable. In effect, I'm only interested in creating a formula object that would contain cbind with variable names corresponding to one vector and the remaining variables corresponding to another vector.
In a word, I want to arrive at the formula in modExample without having to type variable names.
I think this works:
data(mtcars)
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
lm(formula(paste('cbind(',
paste(depVars, collapse = ','),
') ~ ',
paste(indepVars, collapse = '+'))), data = mtcars)
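The paste logic can be wrapped in a small function and checked against the hand-written model. This is a sketch: mv_formula is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: build "cbind(<responses>) ~ <predictors>" as a formula
mv_formula <- function(dep, indep) {
  lhs <- sprintf("cbind(%s)", paste(dep, collapse = ", "))
  rhs <- paste(indep, collapse = " + ")
  as.formula(paste(lhs, "~", rhs))
}

depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

fo <- mv_formula(depVars, indepVars)
fit_dynamic <- lm(fo, data = mtcars)
fit_by_hand <- lm(cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)

# Both fits should have the same coefficient matrix
all.equal(coef(fit_dynamic), coef(fit_by_hand))
```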
All the solutions below use these definitions:
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
1) character string formula Create a character string representing the formula and then run lm using do.call. Note that the formula shown in the output displays correctly and is written out.
fo <- sprintf("cbind(%s) ~ %s", toString(depVars), paste(indepVars, collapse = "+"))
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = "cbind(mpg, disp) ~ qsec+wt+drat", data = mtcars)
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
1a) This would also work:
fo <- sprintf("cbind(%s) ~.", toString(depVars))
do.call("lm", list(fo, quote(mtcars[c(depVars, indepVars)])))
giving:
Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars[c(depVars,
indepVars)])
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
2) reformulate @akrun and @Konrad, in comments below the question, suggest using reformulate. This approach produces a "formula" object whereas the ones above produce a character string as the formula. (If this were desired for the prior solutions above it would be possible using fo <- formula(fo) .) Note that it is important that the response argument to reformulate be a call object and not a character string or else reformulate will interpret the character string as the name of a single variable.
fo <- reformulate(indepVars, parse(text = sprintf("cbind(%s)", toString(depVars)))[[1]])
do.call("lm", list(fo, quote(mtcars)))
giving:
Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)
Coefficients:
mpg disp
(Intercept) 11.3945 452.3407
qsec 0.9462 -20.3504
wt -4.3978 89.9782
drat 1.6561 -41.1148
3) lm.fit Another way that does not use a formula at all is:
m <- as.matrix(mtcars)
fit <- lm.fit(cbind(1, m[, indepVars]), m[, depVars])
The output is a list with these components:
> names(fit)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
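As a quick check, the coefficients from lm.fit can be compared with those from the formula interface. This is a sketch; X, fit_matrix and fit_formula are illustrative names, and the intercept column is named here only so the coefficient rows are labelled.

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

m <- as.matrix(mtcars)
# Design matrix with an explicit, named intercept column
X <- cbind("(Intercept)" = 1, m[, indepVars])
fit_matrix <- lm.fit(X, m[, depVars])

fit_formula <- lm(cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)

# Compare the numeric values only, ignoring row and column labels
all.equal(fit_matrix$coefficients, coef(fit_formula), check.attributes = FALSE)
```

lm.fit skips the formula and model-frame machinery, so it returns a plain list rather than an "lm" object, and methods such as summary() are not available for it.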
