How to change a call object within a purrr::map? - r

I am taking a list of fixest objects, and performing a wild-cluster-bootstrap on the standard errors using the fwildclusterboot package. I am doing this using the purrr::map function so that I can condense the code. However, when trying to get the goodness-of-fit statistics using the broom::glance method, I cannot get the output since purrr::map changes the $call$object to the function input's name (x). Essentially, broom::glance is attempting to take statistics from the $call$object which is named x in my code (e.g., the function argument name), which is unsuccessful. What WOULD work is changing the call object to fixest_regressions[[1]], fixest_regressions[[2]], etc. However, I am unsure of how to do this.
Here is the original code
library(tidyverse)
library(fixest)
library(fwildclusterboot)
## two fixest regressions - these are only for demonstration
reg1 <- feols(mpg~cyl | gear + carb, cluster = "gear", data = mtcars)
reg2 <- feols(mpg~cyl | gear + carb, cluster = "gear", data = mtcars)
fixest_regressions <- list(reg1, reg2)
output <- map(fixest_regressions, function(x) {
bootstrap <- fwildclusterboot::boottest(x, param = "cyl", clustid = "gear", B =9999)
return(bootstrap)
})
My first thought it to do the following:
output <- map(fixest_regressions, function(x) {
new_call_object <- as.symbol(deparse(substitute(x))
bootstrap <- fwildclusterboot::boottest(x, param = "cyl", clustid = "gear", B =9999)
bootstrap$call$object <- new_call_object
return(bootstrap)
})
However, this does not seem to work.

Related

How do I add a non-error message to my custom R function

I have a custom function in r that involves writing and extracting terms from a model. For example:
myfunction <- function(data, outcome1, predictor1, predictor2){
modelx <- glm(reformulate(predictor1, outcome1), data = data)
modelx$aic
}
myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl")
In reality my function returns a dataframe, but I am making it simpler for the purposes of this question.
I would like to add a message to the function that automatically prints out the regression formula being called when I run my function. Outside of a function, I can print this information by running:
> reformulate("cyl", "mpg")
mpg ~ cyl
Therefore, I think I should be able to have my function print the message by adding something like this to the end of the function:
simpleMessage(reformulate(predictor1, outcome1))
However, I can't get this to work, and I understand that it has something to do with the fact that a function can only return one output value. How can I get this message to work?
Close. You can use message() directly, which actually does have a call to simpleMessage() within it, but chances are you don't have to actually worry about constructing a message.
You can pass the formula into message() (with the help of format() to print it out and still have your model output.
Below has this implemented as well as a quick demonstration for why format() is helpful with the formula class.
myfunction <- function(data, outcome1, predictor1, predictor2){
ref <- reformulate(predictor1, outcome1) # save as object to use twice
modelx <- glm(ref, data = data)
message(format(ref)) # message() with format()
modelx$aic
}
myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl")
#> mpg ~ cyl
#> [1] 169.3064
# wrap with suppressMessages() to silence
suppressMessages(myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl"))
#> [1] 169.3064
# Why format?
ref <- reformulate("a", "b")
print(ref)
#> b ~ a
as.character(ref)
#> [1] "~" "b" "a"
format(ref)
#> [1] "b ~ a"
Created on 2022-10-06 with reprex v2.0.2

control scoping of arguments supplied to lm() from whithin a function called by lapply

I have a function that takes a dataset, extracts different variables, and then makes linear models from those variables (it expects the response in the last column). I want the data argument of the calls for these models to use objects in the global environment so that I can manipulate them with other functions outside this function. The following gives the expected behavior when provided with a single dataset.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) ) # Unevaluated to show output
}
d_sub <- substitute(dataset)
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
# Make some different datasets
ex1 <- ex2 <- ex3 <- mtcars[c(3,4,6,1)]
new_data <- function(x) {
x + rnorm(length(x), mean = 0, sd = sd(x))
}
ex2[-length(ex2)] <- lapply(ex2[-length(ex2)], new_data)
ex3[-length(ex3)] <- lapply(ex3[-length(ex3)], new_data)
make_mods(ex1)
I also want to be able to use this function within lapply
# List of datasets for testing function with lapply
ex_l <- mget(c("ex1", "ex2", "ex3"))
lapply(ex_l, make_mods)
But here the model calls end up looking like this: lm(mpg ~ disp, data = X[[i]]) and, of course, this model call doesn't evaluate in the default environment (the actual function evaluates the model call in the function). The desired output is a list of lists of models that look like: lm(mpg ~ disp, data = ex_l[["ex1"]]), i.e., they have valid calls with data arguments that reference data frames in the global environment.
I've experimented with passing names to lapply and different wrapper functions for calling make_mods from lapply but it seems like my function, in using substitute only gives the expected behavior when called from the global environment. I'm new to working with scoping and environments. How can I get my function to give the desired lm call both when passed a data frame from the global environment, and when passed data frames from within lapply.
The only thing that I could think of was to add an if statement to my make mods function that tests if the input is a call or not. If it's a call, it expects it to be a call for a dataset in the global environment.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) )
}
if(is.call(dataset)) {
d_sub <- dataset
dataset <- eval(dataset)
} else {
d_sub <- substitute(dataset)
}
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
Then I can use lapply like this:
out <- lapply(names(ex_l), function(x){
g <- bquote(ex_l[[.(x)]])
make_mods(g)
})
names(out) <- names(ex_l)
which gives me this:
$ex1
$ex1[[1]]
lm(mpg ~ disp, data = ex_l[["ex1"]])
$ex1[[2]]
lm(mpg ~ hp, data = ex_l[["ex1"]])
$ex1[[3]]
lm(mpg ~ wt, data = ex_l[["ex1"]])
<<output truncated>>
Maybe not an elegant solution, but it's working.

Passing the Data Argument in R User Defined Functions

For functions like lm() in R, you pass the "data" argument into the function, usually a dataframe, and then R knows all of the columns by name rather than referencing them. So the difference being x=column instead of referencing in the fashion x=df$column. So how can I use that same method in my own user defined functions?
A simple example:
library(tidyverse)
df <- tibble(x=1:100,y=x*(1+rnorm(n=100)))
test_corr <- function(x,y) {
cor(x,y) %>% return()
}
# Right now I would do this
test_corr(df$x,df$y)
# I want to be able to do this
test_corr(data=df, x, y)
Since you are using tidyverse functions, it would make sense to use tidy evaulation for this type of task. For this function you could do
test_corr <- function(data, x, y) {
quo( cor({{x}}, {{y}}) ) %>%
rlang::eval_tidy(data=data)
}
test_corr(df, x, y)
First we make a quosure to build the expression you want to evaluate and we use the {{ }} (embrace) syntax to insert the variable names you pass in to the function into the expression. We then evaluate that quosure in the context of the data.frame you supply with eval_tidy.
You might also be interested in the tidyselect package vignette where more options are discussed.
You could use reformulate
apply_fun <- function(response, terms, data) {
lm(reformulate(terms, response), data)
}
apply_fun("mpg", "cyl", mtcars)
#Call:
#lm(formula = reformulate(terms, response), data = data)
#Coefficients:
#(Intercept) cyl
# 37.885 -2.876
apply_fun("mpg", c("cyl", "am"), mtcars)
#Call:
#lm(formula = reformulate(terms, response), data = data)
#Coefficients:
#(Intercept) cyl am
# 34.522 -2.501 2.567

Substituting variable for string argument in function call

I am trying to call a function that expects a string as one of the arguments. However, attempting to substitute a variable containing the string throws an error.
library(jtools)
# Fit linear model
fitiris <- lm(Petal.Length ~ Petal.Width * Species, data = iris)
# Plot interaction effect: works!
interact_plot(fitiris, pred = "Petal.Width", modx = "Species")
# Substitute variable name for string: doesn't work!
predictor <- "Petal.Width"
interact_plot(fitiris, pred = predictor, modx = "Species")
Error in names(modxvals2) <- modx.labels :
attempt to set an attribute on NULL
{jtools} uses non-standard evaluation so you can specify unquoted column names, e.g.
library(jtools)
fitiris <- lm(Petal.Length ~ Petal.Width * Species, data = iris)
interact_plot(fitiris, pred = Petal.Width, modx = Species)
...but it's not robustly implemented, so the (common!) case you've run into breaks it. If you really need it to work, you can use bquote to restructure the call (with .(...) around what you want substituted), and then run it with eval:
predictor <- "Petal.Width"
eval(bquote(interact_plot(fitiris, pred = .(predictor), modx = "Species")))
...but this is diving pretty deep into R. A better approach is to make the plot yourself using an ordinary plotting library like {ggplot2}.
I'm the developer of this package.
A short note: this function has just been moved to a new package, called interactions, which is in the process of being added to CRAN. If you want to install it before it gets to CRAN (I expect this to happen within the week), you'll need to use this code to download it from Github:
if (!requireNamespace("remotes") {
install.packages("remotes")
}
remotes::install_github("jacob-long/interactions")
In this new version, I've changed the non-standard evaluation to follow the tidyeval model. This means it should be more straightforward to write a function that plugs in arguments to pred, modx, and/or mod2.
For example:
library(interactions)
plot_wrapper <- function(my_model, my_pred, my_modx) {
interact_plot(my_model, pred = !! my_pred, modx = !! my_modx)
}
fiti <- lm(Income ~ Frost + Murder * Illiteracy, data = as.data.frame(state.x77))
plot_wrapper(fiti, my_pred = "Murder", my_modx = "Illiteracy") # Works
pred_var <- "Murder"
modx_var <- "Illiteracy"
plot_wrapper(fiti, my_pred = pred_var, my_modx = modx_var) # Works
Or just to give an example of using variables in a loop...
variables <- c("Murder", "Illiteracy")
for (var in variables) {
print(interact_plot(fiti, pred = !! var, modx = !! (variables[variables != var])))
}

Change arguments in a "call" object

I have a call object and I want to add an argument, and I don't want to use parse in the way this answer does.
So, say I have an lm object, and so the call from the lm
lma <- lm(mpg ~ cyl, data=mtcars)
lma$call
# lm(formula = mpg ~ cyl, data = mtcars)
now, say I wanted to add an argument, weights=wt, using the call. I realize that there is an incredibly easy way to just create a new call, but I'm wondering if I can work with a call object. there is also a way to edit weights if it were already in there
lmb <- lm(mpg ~ cyl, data=mtcars, wei=wt)
cl <- lmb$call
wtpos <- which.max(pmatch(names(cl), "weights"))
cl[[wtpos]] <- mtcars$qsec
eval(cl)
but this won't work on lma$call because there is no weights argument in lma$call.
so, it feels like I should be able to simply "grow" the call by adding another element, but I don't see how to do that. For example, the following fails:
cl <- lma$call
cl <- c(cl, weights=quote(wt))
eval(cl)
# [[1]]
# lm(formula = mpg ~ cyl, data = mtcars)
#
# $weights
# wt
so, I would hope for the result to be a new "lm" object equal to the lmb, not just a list.
While there is a workaround, that doesn't use parse (to modify a copy of lm to have wt=weights as the default similar to in this solution) that, again, doesn't involve editing the call object.
I believe that the pryr package provides some useful functions for manipulating calls:
lma <- lm(mpg ~ cyl, data=mtcars)
lm_call <- lma$call
library(pryr)
modify_call(lm_call,list(weights = runif(32)))
> lm_call2 <- modify_call(lm_call,list(weights = runif(32)))
> eval(lm_call2)
Call:
lm(formula = mpg ~ cyl, data = mtcars, weights = c(0.934802365722135,
0.983909613220021, 0.762353664264083, 0.23217184189707, 0.850970500381663,
0.430563687346876, 0.962665138067678, 0.318865151610225, 0.697970792884007,
0.389103061752394, 0.824285467388108, 0.676439745584503, 0.344414771301672,
0.292265978176147, 0.925716639030725, 0.517001488478854, 0.726312294835225,
0.842773627489805, 0.669753148220479, 0.618112818570808, 0.139365098671988,
0.843711007386446, 0.851153723662719, 0.134744396666065, 0.92681276681833,
0.00274682720191777, 0.732672147220001, 0.4184603120666, 0.0912447033915669,
0.427389309043065, 0.721000595251098, 0.614837386412546))
Coefficients:
(Intercept) cyl
38.508 -2.945
You can look inside pryr::modify_call to see what it's doing if you'd like to do it manually, I suppose.

Resources