Substituting variable for string argument in function call - r

I am trying to call a function that expects a string as one of the arguments. However, attempting to substitute a variable containing the string throws an error.
library(jtools)
# Fit linear model
fitiris <- lm(Petal.Length ~ Petal.Width * Species, data = iris)
# Plot interaction effect: works!
interact_plot(fitiris, pred = "Petal.Width", modx = "Species")
# Substitute variable name for string: doesn't work!
predictor <- "Petal.Width"
interact_plot(fitiris, pred = predictor, modx = "Species")
Error in names(modxvals2) <- modx.labels :
attempt to set an attribute on NULL

{jtools} uses non-standard evaluation so you can specify unquoted column names, e.g.
library(jtools)
fitiris <- lm(Petal.Length ~ Petal.Width * Species, data = iris)
interact_plot(fitiris, pred = Petal.Width, modx = Species)
...but it's not robustly implemented, so the (common!) case you've run into breaks it. If you really need it to work, you can use bquote to restructure the call (with .(...) around what you want substituted), and then run it with eval:
predictor <- "Petal.Width"
eval(bquote(interact_plot(fitiris, pred = .(predictor), modx = "Species")))
...but this is diving pretty deep into R. A better approach is to make the plot yourself using an ordinary plotting library like {ggplot2}.
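The toy function below is a hypothetical illustration, not jtools' actual code. It shows a common way non-standard evaluation is implemented and why passing a variable can break it. substitute() captures whatever the caller typed, so a variable comes through as its own name rather than its value. bquote() avoids this by splicing the value into the call before eval() runs it.

```r
# Hypothetical toy (not jtools' code): captures a column name the way
# many NSE functions do
col_name <- function(pred) {
  arg <- substitute(pred)                 # the unevaluated argument
  if (is.character(arg)) arg else deparse(arg)
}

predictor <- "Petal.Width"

col_name(Petal.Width)    # "Petal.Width" (bare name)
col_name("Petal.Width")  # "Petal.Width" (string literal)
col_name(predictor)      # "predictor": the variable's name, not its value

# bquote() splices the value in, producing the call col_name("Petal.Width")
cl <- bquote(col_name(.(predictor)))
eval(cl)                 # "Petal.Width"
```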

I'm the developer of this package.
A short note: this function has just been moved to a new package, called interactions, which is in the process of being added to CRAN. If you want to install it before it gets to CRAN (I expect this to happen within the week), you'll need to use this code to download it from GitHub:
if (!requireNamespace("remotes")) {
install.packages("remotes")
}
remotes::install_github("jacob-long/interactions")
In this new version, I've changed the non-standard evaluation to follow the tidyeval model. This means it should be more straightforward to write a function that plugs in arguments to pred, modx, and/or mod2.
For example:
library(interactions)
plot_wrapper <- function(my_model, my_pred, my_modx) {
interact_plot(my_model, pred = !! my_pred, modx = !! my_modx)
}
fiti <- lm(Income ~ Frost + Murder * Illiteracy, data = as.data.frame(state.x77))
plot_wrapper(fiti, my_pred = "Murder", my_modx = "Illiteracy") # Works
pred_var <- "Murder"
modx_var <- "Illiteracy"
plot_wrapper(fiti, my_pred = pred_var, my_modx = modx_var) # Works
Or just to give an example of using variables in a loop...
variables <- c("Murder", "Illiteracy")
for (var in variables) {
print(interact_plot(fiti, pred = !! var, modx = !! (variables[variables != var])))
}

Related

How to change a call object within a purrr::map?

I am taking a list of fixest objects and performing a wild cluster bootstrap on the standard errors using the fwildclusterboot package. I am doing this with purrr::map so that I can condense the code. However, when I try to get the goodness-of-fit statistics with broom::glance, I get no output, because inside purrr::map the $call$object is set to the function argument's name (x). Essentially, broom::glance tries to take statistics from $call$object, which in my code is x (i.e., the function argument name), and this fails. What WOULD work is changing the call object to fixest_regressions[[1]], fixest_regressions[[2]], etc. However, I am unsure how to do this.
Here is the original code:
library(tidyverse)
library(fixest)
library(fwildclusterboot)
## two fixest regressions - these are only for demonstration
reg1 <- feols(mpg~cyl | gear + carb, cluster = "gear", data = mtcars)
reg2 <- feols(mpg~cyl | gear + carb, cluster = "gear", data = mtcars)
fixest_regressions <- list(reg1, reg2)
output <- map(fixest_regressions, function(x) {
bootstrap <- fwildclusterboot::boottest(x, param = "cyl", clustid = "gear", B =9999)
return(bootstrap)
})
My first thought is to do the following:
output <- map(fixest_regressions, function(x) {
new_call_object <- as.symbol(deparse(substitute(x)))
bootstrap <- fwildclusterboot::boottest(x, param = "cyl", clustid = "gear", B =9999)
bootstrap$call$object <- new_call_object
return(bootstrap)
})
However, this does not seem to work.
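One way to get the call the question describes is to iterate over indices rather than elements and build the expression fixest_regressions[[i]] explicitly with bquote(). The sketch below is base R only: plain lm() fits stand in for the fixest regressions, and a hypothetical list with a `call` element stands in for the boottest() result. It assumes, as the question states, that glance() re-evaluates `$call$object`; whether glance() then succeeds on real boottest output is not tested here. purrr::map(seq_along(...), ...) or purrr::imap() would work the same way.

```r
# Stand-ins for the fixest regressions (plain lm fits, for illustration)
fixest_regressions <- list(lm(mpg ~ cyl, data = mtcars),
                           lm(mpg ~ hp, data = mtcars))

output <- lapply(seq_along(fixest_regressions), function(i) {
  x <- fixest_regressions[[i]]
  # Hypothetical stand-in for boottest(): a result whose stored call
  # refers to the local name x, as in the question
  bootstrap <- list(fit = x,
                    call = quote(boottest(object = x, param = "cyl")))
  # Replace `object = x` with the expression fixest_regressions[[i]]
  bootstrap$call$object <- bquote(fixest_regressions[[.(i)]])
  bootstrap
})

output[[2]]$call
```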

Sequentially calling functions within function in R

How can you create a function that calls several predefined functions in sequence?
E.g. I have 3 different functions like
myplot(data)
fmodel(data)
mymodel <- fmodel(data)
myconclusion(model = mymodel)
Now I want to create a new function that calls those predefined functions (from 1 to 3). What should I do?
I tried something like the code below and received the following error message, but I don't know what is wrong.
P.S.: my model is a linear regression, and I've already supplied the 'data' arguments.
myplot(mydata)
fmodel(mydata)
myconclusion(mymodel)
funlist <- list(
F1 = myplot,
F2 = fmodel,
mymodel <- fmodel,
F3 = myconclusion
)
callfun <- function(funrange, data, ...){
for(i in funrange){
funlist[[i]](...)
}
}
callfun(1:3, data = mydata)
#Error in model.frame.default(formula = Y ~ X, data = mydata, drop.unused.levels = TRUE) :
#argument "data" is missing, with no default
Running the 3 functions inside another function should execute them. However, depending on what the functions actually do, there may not be any visible output.
f1 <- function(mydata, mymodel){
myplot(mydata)
fmodel(mydata)
myconclusion(mymodel)
}
f1(mydata, mymodel)
Again, what these functions actually do will dictate the output.
EDIT
Here is an example for you
library(ggplot2)
library(magrittr)  # provides %>%
my_plot <- function(my_data){
my_data %>%
ggplot(aes(mpg, hp))+
geom_point()
}
my_model <- function(my_data){
my_data %>%
lm(mpg ~ hp, data = .) %>%
summary
}
my_model_2 <- function(my_data){
my_data %>%
lm(mpg ~ disp, data = .) %>%
summary
}
f1 <- function(my_data){
my_plot(my_data)
my_model(my_data)
my_model_2(my_data)
}
If you call f1(mtcars), all you will see is the output from my_model_2(), because it was the last function to be executed and its value is what f1() returns. my_plot() and my_model() were still executed, but their results were not displayed: a ggplot object or a model summary only appears when it is printed, and R auto-prints only the value returned by a top-level call, not values computed along the way inside a function.
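The auto-printing rule can be seen without ggplot2 or a model. In this base-R sketch, summary() of a small vector stands in for the model summaries, and capture.output() records what each call actually prints:

```r
f_quiet <- function() {
  summary(c(1, 2, 3))   # computed, then discarded: not auto-printed
  "last value"          # the return value, auto-printed at top level
}
f_loud <- function() {
  print(summary(c(1, 2, 3)))   # explicit print() makes it visible
  "last value"
}

quiet_out <- capture.output(f_quiet())
loud_out  <- capture.output(f_loud())
quiet_out   # only the returned value
loud_out    # the summary table, then the returned value
```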
One way to 'see' the plot produced by my_plot() is to change what it does, from previewing a plot in the viewer, to saving a copy of the plot. This may be done like this:
my_plot <- function(my_data){
p <- my_data %>%
ggplot(aes(mpg, hp))+
geom_point()
# pass the plot explicitly instead of relying on ggsave()'s default, last_plot()
ggsave('my_saved_plot.png', plot = p)
}
Alternatively, wrapping each call in print() will print the model summaries to the console and show the plot in the viewer:
f1 <- function(my_data){
print(my_plot(my_data))
print(my_model(my_data))
print(my_model_2(my_data))
}
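As for the error in the question's callfun(): `data = mydata` is bound to callfun()'s own `data` argument, so `...` is empty and `funlist[[i]](...)` calls each function with no arguments at all. A minimal sketch of an alternative, using hypothetical stand-ins for myplot(), fmodel() and myconclusion(), passes the data explicitly, feeds the fitted model into the conclusion step, and returns every result in a list so none of them is lost:

```r
# Hypothetical stand-ins for the asker's functions
myplot <- function(data) hist(data$mpg, plot = FALSE)  # "histogram" object; plot() it to draw
fmodel <- function(data) lm(mpg ~ hp, data = data)
myconclusion <- function(model) coef(summary(model))

callfun <- function(data) {
  results <- list()
  results$plot <- myplot(data)
  results$model <- fmodel(data)                      # fit once...
  results$conclusion <- myconclusion(results$model)  # ...and reuse it
  results                                            # return everything
}

res <- callfun(mtcars)
res$conclusion
```

Returning a list also sidesteps the auto-printing issue: each piece can be printed or plotted afterwards.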

Recommended way of creating reusable objects within an R function

Suppose we have the following data:
# simulate data to fit
set.seed(21)
y = rnorm(100)
x = .5*y + rnorm(100, 0, sqrt(.75))
Let's also suppose the user has fit a model:
# user fits a lm
mod = lm(y~x)
Now suppose I have an R package designed to perform several operations on the object mod. For simplicity, suppose we have two functions: one that plots the data and one that computes the coefficients. However, as an intermediary, suppose we want to perform some operation on the data (in this example, add ten).
Example:
# function that adds ten to all scores
add_ten = function(model) {
data = model$model
data = data + 10
return(data)
}
# functions I defined that do something to the "add_ten" dataset
plot_ten = function(model) {
new_data = data.frame(add_ten(model))
x = all.vars(formula(model))[2]
y = all.vars(formula(model))[1]
ggplot2::ggplot(new_data, ggplot2::aes_string(x=x, y=y)) + ggplot2::geom_point() + ggplot2::geom_smooth()
}
coefs_ten = function(model) {
new_data = data.frame(add_ten(model))
coef(lm(formula(model), new_data))
}
(Obviously, this is pretty silly to do. In actuality, the operation I want to perform is multiple imputation, which is computationally intensive).
Notice in the above example I have to call the add_ten function twice, once for plot_ten and once for coefs_ten. This is inefficient.
So, now to my question: what is the best way to create a reusable object within a function?
I could, of course, create an object to be placed in the user's global environment:
add_ten = function(model) {
# check for add_ten_data in the global environment
if (exists("add_ten_data", where = .GlobalEnv)) return(get("add_ten_data", envir = .GlobalEnv))
data = model$model
data = data + 10
# assign add_ten_data to the global environment
assign('add_ten_data', data, envir = .GlobalEnv)
return(data)
}
I'm happy to do so, but worry about the "netiquette" of putting something in the user's environment. There's also a potential problem if users happen to have an object called "add_ten_data" in their environment.
So, what is the best way of accomplishing this?
Thanks in advance!
You should certainly avoid writing an object to the global environment. If you find that you have to repeat the same computationally expensive task at the top of a number of different functions, it means you are carrying out the computationally expensive task too late.
For example, you could create an S3 class that holds the necessary components to produce a "cheap" plot and a "cheap" extraction of the coefficients. It even has the benefits of generic dispatch:
add_ten <- function(model) model$model + 10
lm_tens <- function(formula, data)
{
model <- if(missing(data)) lm(formula) else lm(formula, data = data)
structure(list(data = data.frame(add_ten(model)), model = model),
class = "tens")
}
plot.tens <- function(x, ...) {
vars <- all.vars(formula(x$model))
# .data[[ ]] looks the columns up by name, rather than relying on
# the variables happening to be called x and y
ggplot2::ggplot(x$data, ggplot2::aes(x = .data[[vars[2]]], y = .data[[vars[1]]])) +
ggplot2::geom_point() +
ggplot2::geom_smooth()
}
coef.tens <- function(object, ...) {
coef(lm(formula(object$model), data = object$data))
}
So now we just need to do:
set.seed(21)
y = rnorm(100)
x = .5*y + rnorm(100, 0, sqrt(.75))
mod <- lm_tens(y ~ x)
coef(mod)
#> (Intercept) x
#> 4.3269914 0.5775404
plot(mod)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Note that we only need to call add_ten once here.
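To check that the expensive step really runs only once, the sketch below instruments add_ten() with a call counter (the counter is purely for illustration) and restates lm_tens() and coef.tens() in base R, using mtcars so it is self-contained and dropping the missing-data branch for brevity. Building the object runs add_ten() once; repeated coef() calls reuse the stored data.

```r
n_calls <- 0
add_ten <- function(model) {
  n_calls <<- n_calls + 1      # illustration only: count invocations
  model$model + 10
}
lm_tens <- function(formula, data) {
  model <- lm(formula, data = data)
  structure(list(data = data.frame(add_ten(model)), model = model),
            class = "tens")
}
coef.tens <- function(object, ...) {
  coef(lm(formula(object$model), data = object$data))
}

mod <- lm_tens(mpg ~ wt, data = mtcars)
coef(mod)
coef(mod)
n_calls   # 1: add_ten() ran when the object was built, not per coef() call
```

Because adding ten to both variables only shifts the intercept, the slope matches an ordinary lm(mpg ~ wt) fit.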

Passing the Data Argument in R User Defined Functions

For functions like lm() in R, you pass a "data" argument, usually a data frame, and R then finds the columns by name inside it, so you can write x = column instead of x = df$column. How can I use the same approach in my own user-defined functions?
A simple example:
library(tidyverse)
df <- tibble(x=1:100,y=x*(1+rnorm(n=100)))
test_corr <- function(x,y) {
cor(x,y) %>% return()
}
# Right now I would do this
test_corr(df$x,df$y)
# I want to be able to do this
test_corr(data=df, x, y)
Since you are using tidyverse functions, it would make sense to use tidy evaluation for this type of task. For this function you could do:
test_corr <- function(data, x, y) {
quo( cor({{x}}, {{y}}) ) %>%
rlang::eval_tidy(data=data)
}
test_corr(df, x, y)
First we make a quosure to build the expression you want to evaluate and we use the {{ }} (embrace) syntax to insert the variable names you pass in to the function into the expression. We then evaluate that quosure in the context of the data.frame you supply with eval_tidy.
You might also be interested in the tidyselect package vignette where more options are discussed.
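If you would rather not depend on rlang, base R supports the same pattern through substitute() and eval(). This is a different technique from the quosure answer: capture the unevaluated argument, then evaluate it with the data frame as the environment, falling back to the caller's environment for anything not found in the data. A sketch on a plain data.frame:

```r
test_corr <- function(data, x, y) {
  # evaluate the unquoted arguments inside `data`, then in the caller's frame
  x_val <- eval(substitute(x), data, parent.frame())
  y_val <- eval(substitute(y), data, parent.frame())
  cor(x_val, y_val)
}

df <- data.frame(a = 1:100, b = (1:100) * (1 + rnorm(100)))
test_corr(df, a, b)
```

As with the jtools case in the first question, functions built on substitute() can be awkward to call from inside other functions, because they see the wrapper's argument names rather than the values.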
You could use reformulate
apply_fun <- function(response, terms, data) {
lm(reformulate(terms, response), data)
}
apply_fun("mpg", "cyl", mtcars)
#Call:
#lm(formula = reformulate(terms, response), data = data)
#Coefficients:
#(Intercept) cyl
# 37.885 -2.876
apply_fun("mpg", c("cyl", "am"), mtcars)
#Call:
#lm(formula = reformulate(terms, response), data = data)
#Coefficients:
#(Intercept) cyl am
# 34.522 -2.501 2.567
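The printed Call: line shows reformulate(terms, response) rather than the actual formula. If that matters, one option (a sketch, not part of the answer above) is to build the formula first and splice it into the lm() call with bquote() before evaluating, the same trick used for jtools in the first answer on this page:

```r
apply_fun <- function(response, terms, data) {
  f <- reformulate(terms, response = response)   # e.g. mpg ~ cyl + am
  # splice the formula object into the call so the stored call contains it
  eval(bquote(lm(.(f), data = data)))
}

fit <- apply_fun("mpg", c("cyl", "am"), mtcars)
fit$call
```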

Inside function F, use an argument to F as an argument to update()

I would like to have a function like my_lm, exemplified below:
library(rlang)
base_formula <- new_formula(lhs = quote(potato),
rhs = quote(Sepal.Width + Petal.Length))
my_lm <- function(response) {
lm(formula = update(old = base_formula, new = quote(response) ~ . ),
data = iris)
}
my_lm(response = Sepal.Length)
But I am met with the following error:
Error in model.frame.default(formula = update(old = base_formula, new = enquo(response) ~ :
object is not a matrix
I suspect I am misusing rlang, but I can't seem to figure out what combination of quoting, unquoting and formulating would solve this problem.
EDIT: desired output is as if I ran:
lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length,
data = iris)
EDIT2: I should also clarify, I'm really interested in a solution that uses rlang to solve this problem through update more so than a solution that uses paste, gsub, and formula.
Here's something that works
base_formula <- new_formula(lhs = quote(potato),
rhs = quote(Sepal.Width + Petal.Length))
my_lm <- function(response) {
newf <- new_formula(get_expr(enquo(response)), quote(.))
lm(formula = update(old = base_formula, new = newf),
data = iris)
}
my_lm(response = Sepal.Length)
It seems to be a bit of a mess because quosures are also basically formulas and you're trying to make a regular formula with them. And then new_formula doesn't seem to allow for !! expansion.
If you are really just interested in changing the left-hand side, something like this might be more direct:
my_lm <- function(response) {
newf <- base_formula
f_lhs(newf) <- get_expr(enquo(response))
lm(formula = get_expr(newf),
data = iris)
}
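If a base-R route is acceptable, update() can also be driven without any string manipulation: capture the symbol the caller typed with substitute() and splice it into the left-hand side of the `new` template with bquote(). This is a swapped-in technique rather than the rlang approach above, and it defines base_formula as a plain formula instead of via new_formula():

```r
base_formula <- potato ~ Sepal.Width + Petal.Length

my_lm <- function(response) {
  lhs <- substitute(response)                       # the symbol typed by the caller
  new_f <- update(base_formula, bquote(.(lhs) ~ .)) # "." keeps the old right-hand side
  lm(new_f, data = iris)
}

fit <- my_lm(Sepal.Length)
formula(fit)   # Sepal.Length ~ Sepal.Width + Petal.Length
```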
