For functions like lm() in R, you pass the "data" argument into the function, usually a dataframe, and then R knows all of the columns by name rather than referencing them. So the difference being x=column instead of referencing in the fashion x=df$column. So how can I use that same method in my own user defined functions?
A simple example:
library(tidyverse)
df <- tibble(x=1:100,y=x*(1+rnorm(n=100)))
test_corr <- function(x,y) {
cor(x,y) %>% return()
}
# Right now I would do this
test_corr(df$x,df$y)
# I want to be able to do this
test_corr(data=df, x, y)
Since you are using tidyverse functions, it would make sense to use tidy evaulation for this type of task. For this function you could do
test_corr <- function(data, x, y) {
quo( cor({{x}}, {{y}}) ) %>%
rlang::eval_tidy(data=data)
}
test_corr(df, x, y)
First we make a quosure to build the expression you want to evaluate and we use the {{ }} (embrace) syntax to insert the variable names you pass in to the function into the expression. We then evaluate that quosure in the context of the data.frame you supply with eval_tidy.
You might also be interested in the tidyselect package vignette where more options are discussed.
You could use reformulate
apply_fun <- function(response, terms, data) {
lm(reformulate(terms, response), data)
}
apply_fun("mpg", "cyl", mtcars)
#Call:
#lm(formula = reformulate(terms, response), data = data)
#Coefficients:
#(Intercept) cyl
# 37.885 -2.876
apply_fun("mpg", c("cyl", "am"), mtcars)
#Call:
#lm(formula = reformulate(terms, response), data = data)
#Coefficients:
#(Intercept) cyl am
# 34.522 -2.501 2.567
Related
I have a custom function in r that involves writing and extracting terms from a model. For example:
myfunction <- function(data, outcome1, predictor1, predictor2){
modelx <- glm(reformulate(predictor1, outcome1), data = data)
modelx$aic
}
myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl")
In reality my function returns a dataframe, but I am making it simpler for the purposes of this question.
I would like to add a message to the function that automatically prints out the regression formula being called when I run my function. Outside of a function, I can print this information by running:
> reformulate("cyl", "mpg")
mpg ~ cyl
Therefore, I think I should be able to have my function print the message by adding something like this to the end of the function:
simpleMessage(reformulate(predictor1, outcome1))
However, I can't get this to work, and I understand that it has something to do with the fact that a function can only return one output value. How can I get this message to work?
Close. You can use message() directly, which actually does have a call to simpleMessage() within it, but chances are you don't have to actually worry about constructing a message.
You can pass the formula into message() (with the help of format() to print it out and still have your model output.
Below has this implemented as well as a quick demonstration for why format() is helpful with the formula class.
myfunction <- function(data, outcome1, predictor1, predictor2){
ref <- reformulate(predictor1, outcome1) # save as object to use twice
modelx <- glm(ref, data = data)
message(format(ref)) # message() with format()
modelx$aic
}
myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl")
#> mpg ~ cyl
#> [1] 169.3064
# wrap with suppressMessages() to silence
suppressMessages(myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl"))
#> [1] 169.3064
# Why format?
ref <- reformulate("a", "b")
print(ref)
#> b ~ a
as.character(ref)
#> [1] "~" "b" "a"
format(ref)
#> [1] "b ~ a"
Created on 2022-10-06 with reprex v2.0.2
I am taking a list of fixest objects, and performing a wild-cluster-bootstrap on the standard errors using the fwildclusterboot package. I am doing this using the purrr::map function so that I can condense the code. However, when trying to get the goodness-of-fit statistics using the broom::glance method, I cannot get the output since purrr::map changes the $call$object to the function input's name (x). Essentially, broom::glance is attempting to take statistics from the $call$object which is named x in my code (e.g., the function argument name), which is unsuccessful. What WOULD work is changing the call object to fixest_regressions[[1]], fixest_regressions[[2]], etc. However, I am unsure of how to do this.
Here is the original code
library(tidyverse)
library(fixest)
library(fwildclusterboot)
## two fixest regressions - these are only for demonstration
reg1 <- feols(mpg~cyl | gear + carb, cluster = "gear", data = mtcars)
reg2 <- feols(mpg~cyl | gear + carb, cluster = "gear", data = mtcars)
fixest_regressions <- list(reg1, reg2)
output <- map(fixest_regressions, function(x) {
bootstrap <- fwildclusterboot::boottest(x, param = "cyl", clustid = "gear", B =9999)
return(bootstrap)
})
My first thought it to do the following:
output <- map(fixest_regressions, function(x) {
new_call_object <- as.symbol(deparse(substitute(x))
bootstrap <- fwildclusterboot::boottest(x, param = "cyl", clustid = "gear", B =9999)
bootstrap$call$object <- new_call_object
return(bootstrap)
})
However, this does not seem to work.
I have a function that takes a dataset, extracts different variables, and then makes linear models from those variables (it expects the response in the last column). I want the data argument of the calls for these models to use objects in the global environment so that I can manipulate them with other functions outside this function. The following gives the expected behavior when provided with a single dataset.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) ) # Unevaluated to show output
}
d_sub <- substitute(dataset)
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
# Make some different datasets
ex1 <- ex2 <- ex3 <- mtcars[c(3,4,6,1)]
new_data <- function(x) {
x + rnorm(length(x), mean = 0, sd = sd(x))
}
ex2[-length(ex2)] <- lapply(ex2[-length(ex2)], new_data)
ex3[-length(ex3)] <- lapply(ex3[-length(ex3)], new_data)
make_mods(ex1)
I also want to be able to use this function within lapply
# List of datasets for testing function with lapply
ex_l <- mget(c("ex1", "ex2", "ex3"))
lapply(ex_l, make_mods)
But here the model calls end up looking like this: lm(mpg ~ disp, data = X[[i]]) and, of course, this model call doesn't evaluate in the default environment (the actual function evaluates the model call in the function). The desired output is a list of lists of models that look like: lm(mpg ~ disp, data = ex_l[["ex1"]]), i.e., they have valid calls with data arguments that reference data frames in the global environment.
I've experimented with passing names to lapply and different wrapper functions for calling make_mods from lapply but it seems like my function, in using substitute only gives the expected behavior when called from the global environment. I'm new to working with scoping and environments. How can I get my function to give the desired lm call both when passed a data frame from the global environment, and when passed data frames from within lapply.
The only thing that I could think of was to add an if statement to my make mods function that tests if the input is a call or not. If it's a call, it expects it to be a call for a dataset in the global environment.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) )
}
if(is.call(dataset)) {
d_sub <- dataset
dataset <- eval(dataset)
} else {
d_sub <- substitute(dataset)
}
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
Then I can use lapply like this:
out <- lapply(names(ex_l), function(x){
g <- bquote(ex_l[[.(x)]])
make_mods(g)
})
names(out) <- names(ex_l)
which gives me this:
$ex1
$ex1[[1]]
lm(mpg ~ disp, data = ex_l[["ex1"]])
$ex1[[2]]
lm(mpg ~ hp, data = ex_l[["ex1"]])
$ex1[[3]]
lm(mpg ~ wt, data = ex_l[["ex1"]])
<<output truncated>>
Maybe not an elegant solution, but it's working.
I am trying to write a function around "lm" using tidyeval (non-standard evaluation).Using base R NSE, it works:
lm_poly_raw <- function(df, y, x, degree = 1, ...){
lm_formula <-
substitute(expr = y ~ poly(x, degree, raw = TRUE),
env = list(y = substitute(y),
x = substitute(x),
degree = degree))
eval(lm(lm_formula, data = df, ...))
}
lm_poly_raw(mtcars, hp, mpg, degree = 2)
However, I have not figured out how to write this function using tidyeval and rlang. I assume that substitute should be replaced be enquo, and eval by !!. There are some hints in Hadley's Adv-R, but I could not figure it out.
Here is the kind of formula constructor that might make its way in rlang in the future:
f <- function(x, y, flatten = TRUE) {
x <- enquo(x)
y <- enquo(y)
# Environments should be the same
# They could be different if forwarded through dots
env <- get_env(x)
stopifnot(identical(env, get_env(y)))
# Flatten the quosures. This warns the user if nested quosures are
# found. Those are not supported by functions like lm()
if (flatten) {
x <- quo_expr(x, warn = TRUE)
y <- quo_expr(y, warn = TRUE)
}
new_formula(x, y, env = env)
}
# This can be used for unquoting symbols
var <- "cyl"
lm(f(disp, am + (!! sym(var))), data = mtcars)
The tricky parts are:
The LHS and RHS could come from different environments if forwarded through different layers of .... We need to check for this.
We need to check that the user doesn't unquote quosures. lm() and co do not support those. quo_expr() flattens all the quosures and optionally warns if some were found.
I have a call object and I want to add an argument, and I don't want to use parse in the way this answer does.
So, say I have an lm object, and so the call from the lm
lma <- lm(mpg ~ cyl, data=mtcars)
lma$call
# lm(formula = mpg ~ cyl, data = mtcars)
now, say I wanted to add an argument, weights=wt, using the call. I realize that there is an incredibly easy way to just create a new call, but I'm wondering if I can work with a call object. there is also a way to edit weights if it were already in there
lmb <- lm(mpg ~ cyl, data=mtcars, wei=wt)
cl <- lmb$call
wtpos <- which.max(pmatch(names(cl), "weights"))
cl[[wtpos]] <- mtcars$qsec
eval(cl)
but this won't work on lma$call because there is no weights argument in lma$call.
so, it feels like I should be able to simply "grow" the call by adding another element, but I don't see how to do that. For example, the following fails:
cl <- lma$call
cl <- c(cl, weights=quote(wt))
eval(cl)
# [[1]]
# lm(formula = mpg ~ cyl, data = mtcars)
#
# $weights
# wt
so, I would hope for the result to be a new "lm" object equal to the lmb, not just a list.
While there is a workaround, that doesn't use parse (to modify a copy of lm to have wt=weights as the default similar to in this solution) that, again, doesn't involve editing the call object.
I believe that the pryr package provides some useful functions for manipulating calls:
lma <- lm(mpg ~ cyl, data=mtcars)
lm_call <- lma$call
library(pryr)
modify_call(lm_call,list(weights = runif(32)))
> lm_call2 <- modify_call(lm_call,list(weights = runif(32)))
> eval(lm_call2)
Call:
lm(formula = mpg ~ cyl, data = mtcars, weights = c(0.934802365722135,
0.983909613220021, 0.762353664264083, 0.23217184189707, 0.850970500381663,
0.430563687346876, 0.962665138067678, 0.318865151610225, 0.697970792884007,
0.389103061752394, 0.824285467388108, 0.676439745584503, 0.344414771301672,
0.292265978176147, 0.925716639030725, 0.517001488478854, 0.726312294835225,
0.842773627489805, 0.669753148220479, 0.618112818570808, 0.139365098671988,
0.843711007386446, 0.851153723662719, 0.134744396666065, 0.92681276681833,
0.00274682720191777, 0.732672147220001, 0.4184603120666, 0.0912447033915669,
0.427389309043065, 0.721000595251098, 0.614837386412546))
Coefficients:
(Intercept) cyl
38.508 -2.945
You can look inside pryr::modify_call to see what it's doing if you'd like to do it manually, I suppose.