Programming a function for "lm" using tidyeval - r

I am trying to write a function around "lm" using tidyeval (non-standard evaluation).Using base R NSE, it works:
lm_poly_raw <- function(df, y, x, degree = 1, ...){
lm_formula <-
substitute(expr = y ~ poly(x, degree, raw = TRUE),
env = list(y = substitute(y),
x = substitute(x),
degree = degree))
eval(lm(lm_formula, data = df, ...))
}
lm_poly_raw(mtcars, hp, mpg, degree = 2)
However, I have not figured out how to write this function using tidyeval and rlang. I assume that substitute should be replaced be enquo, and eval by !!. There are some hints in Hadley's Adv-R, but I could not figure it out.

Here is the kind of formula constructor that might make its way in rlang in the future:
f <- function(x, y, flatten = TRUE) {
x <- enquo(x)
y <- enquo(y)
# Environments should be the same
# They could be different if forwarded through dots
env <- get_env(x)
stopifnot(identical(env, get_env(y)))
# Flatten the quosures. This warns the user if nested quosures are
# found. Those are not supported by functions like lm()
if (flatten) {
x <- quo_expr(x, warn = TRUE)
y <- quo_expr(y, warn = TRUE)
}
new_formula(x, y, env = env)
}
# This can be used for unquoting symbols
var <- "cyl"
lm(f(disp, am + (!! sym(var))), data = mtcars)
The tricky parts are:
The LHS and RHS could come from different environments if forwarded through different layers of .... We need to check for this.
We need to check that the user doesn't unquote quosures. lm() and co do not support those. quo_expr() flattens all the quosures and optionally warns if some were found.

Related

Passing optional arguments with `...` to multiple functions; control argument matching?

I'm trying to write a parent function that calls a bunch of sub-functions that all have pretty sensible defaults and are well documented. Based on the value of one parameter, there are potentially different arguments I'd like to pass down to different sub-functions for customization. Is there a way to pass arguments to multiple functions using elipsis or another strategy?
Here's a simple example; here the challenge is being able to pass na.rm and/or base when a user wants, but otherwise use the existing defaults:
dat <- c(NA, 1:5)
# I want a flexible function that uses sensible defaults unless otherwise specified
meanLog<-function(x, ...){
y <- log(x, ...)
z <- mean(y, ...)
return(z)
}
# I know I can pass ... to one function wrapped inside this one.
justLog <- function(x, ...){
log(x, ...)
}
justLog(dat)
justLog(dat, base = 2)
# or another
justMean <- function(x, ...){
mean(x, ...)
}
justMean(dat)
justMean(dat, na.rm =T)
# but I can't pass both arguments
meanLog(dat) # works fine, but I want to customize a few things
meanLog(dat, na.rm =T, base = 2)
justMean(dat, base =2)
# In this case that is because justLog breaks if it gets an unused na.rm
justLog(dat, na.rm =T)
1) Define do.call2 which is like do.call except that it accepts unnamed arguments as well as named argument in the character vector accepted which defaults to the formals in the function.
Note that the arguments of mean do not include na.rm -- it is slurped up by the dot dot dot argument -- but the mean.default method does. Also primitive functions do not have formals so the accepted argument must be specified explicitly for those rather than defaulted.
do.call2 <- function(what, args, accepted = formalArgs(what)) {
ok <- names(args) %in% c("", accepted)
do.call(what, args[ok])
}
# test
dat <- c(NA, 1:5)
meanLog <- function(x, ...){
y <- do.call2("log", list(x, ...), "base")
z <- do.call2("mean.default", list(y, ...))
return(z)
}
meanLog(dat, na.rm = TRUE, base = 2)
## [1] 1.381378
# check
mean(log(dat, base = 2), na.rm = TRUE)
## [1] 1.381378
2) Another possibility is to provide separate arguments for mean and log.
(A variation of that is to use dot dot dot for one of the functions and argument lists for the others. For example nls in R uses dot dot dot but also uses a control argument to specify other arguments.)
# test
dat <- c(NA, 1:5)
meanLog <- function(x, logArgs = list(), meanArgs = list()) {
y <- do.call("log", c(list(x), logArgs))
z <- do.call("mean", c(list(y), meanArgs))
return(z)
}
meanLog(dat, logArgs = list(base = 2), meanArgs = list(na.rm = TRUE))
## [1] 1.381378
# check
mean(log(dat, base = 2), na.rm = TRUE)
## [1] 1.381378

How to find object name passed to function

I have a function which takes a dataframe and its columns and processes it in various ways (left out for simplicity). We can put in column names as arguments or transform columns directly inside function arguments (like here). I need to find out what object(s) are passed in the function.
Reproducible example:
df <- data.frame(x= 1:10, y=1:10)
myfun <- function(data, col){
col_new <- eval(substitute(col), data)
# magic part
object_name <- ...
# magic part
plot(col_new, main= object_name)
}
For instance, the expected output for myfun(data= df, x*x) is the plot plot(df$x*df$x, main= "x"). So the title is x, not x*x. What I have got so far is this:
myfun <- function(data, col){
colname <- tryCatch({eval(substitute(col))}, error= function(e) {geterrmessage()})
colname <- gsub("' not found", "", gsub("object '", "", colname))
plot(eval(substitute(col), data), main= colname)
}
This function gives the expected output but there must be some more elegant way to find out to which object the input refers to. The answer must be with base R.
Use substitute to get the expression passed as col and then use eval and all.vars to get the values and name.
myfun <- function(data, col){
s <- substitute(col)
plot(eval(s, data), main = all.vars(s), type = "o", ylab = "")
}
myfun(df, x * x)
Anothehr possibility is to pass a one-sided formula.
myfun2 <- function(formula, data){
plot(eval(formula[[2]], data), main = all.vars(formula), type = "o", ylab = "")
}
myfun2(~ x * x, df)
The rlang package can be very powerful when you get a hang of it. Does something like this do what you want?
library(rlang)
myfun <- function (data, col){
.col <- enexpr(col)
unname(sapply(call_args(.col), as_string))
}
This gives you back the "wt" column.
myfun(mtcars, as.factor(wt))
# [1] "wt"
I am not sure your use case, but this would work for multiple inputs.
myfun(mtcars, sum(x, y))
# [1] "x" "y"
And finally, it is possible you might not even need to do this, but rather store the expression and operate directly on the data. The tidyeval framework can help with that as well.

Nested use of call_modify

I am trying to create a call to a function f whose first argument is a call to another function (for which I've chosen dbinom as an example). The call to dbinom (passed on to f) does not include values for all the arguments as these should be finalised within f, and the completed call is returned by f. Here is my failed minimal attempt:
f <- function(a_call) {
call_modify(a_call, x=1)
}
a_call <- call2(dbinom, size=1, prob=0.5)
y <- call2(f, a_call)
The output for y is:
(function(a_call) {
call_modify(a_call, x=1)
})((function (x, size, prob, log = FALSE)
.Call(C_dbinom, x, size, prob, log))(size = 1, prob = 0.5))
This call will
call a_call without any arguments, and then;
pass this result on to f.
If I evaluate y, it errors because dinom's first argument is missing.
I similar-but-related construct:
> call2(call2(dbinom, x=1, size=1, prob=0.5))
((function (x, size, prob, log = FALSE)
.Call(C_dbinom, x, size, prob, log))(x = 1, size = 1, prob = 0.5))()
(function (x, size, prob, log = FALSE)
I get the sense there is something 'not even wrong' with what I'm trying here, and nesting a call modification is best done another way.
It seems that what you are trying to do is handled more naturally by purrr::partial(), which fills in one or more arguments of a function:
f <- function( a_fun ) {purrr::partial( a_fun, x=1 )}
a_fun <- purrr::partial( dbinom, size=1, prob=0.5 )
y <- f(a_fun)
y(...) is now effectively dbinom( x=1, size=1, prob=0.5, ... )
y() # 0.5
y(log=TRUE) # -0.6931472
The great thing about partial() is that it can be naturally chained with the %>% pipe:
z <- partial(dbinom, size=1) %>% partial(prob=0.5) %>% partial(x=1)
z(log=TRUE) # -0.6931472
If I understand correctly what you're trying to do,
then maybe this works better:
f <- function(a_call) {
call_modify(call_standardise(call2(ensym(a_call)),
caller_env()),
x=1)
}
Which you can use with or without characters:
f(print)
# print(x = 1)
f("print")
# print(x = 1)
eval(f(print))
# 1
Or with more indirection:
a_call <- expr(print)
eval(call2(f, a_call))
# print(x = 1)
eval(expr(f(!!a_call)))
# print(x = 1)
Since we do a bit of non-standard evaluation here,
things get a bit tricky.
call_standardise needs to be able to find the function you specify,
and it's very probable that it will be found in the environment that calls f,
and not necessarily in the environment that calls call_standardise,
which would be f's execution environment in this case.
That's why caller_env() is explicitly specified when calling call_standardise even though that's the default for the latter's env,
because default arguments are evaluated in the function's execution environment,
whereas explicit arguments are evaluated in the caller's environment.
Here's a contrived-looking example for this problem:
f2 <- function(a_call) {
call_modify(call_standardise(call2(ensym(a_call))),
x=1)
}
e <- new.env()
e$foo <- function(x) { x + 1 }
with(e, f(foo))
# foo(x = 1)
with(e, f2(foo))
# Error in eval_bare(node_car(expr), env) : object 'foo' not found
However, if you were to develop a package that provides f,
the example is no longer contrived:
f would live in your package's environment,
and other packages could call it for functions that are only available in their respective namespaces.
For more specifics and depictions,
check this reference,
and maybe try drawing the call tree for my example.
call2 constructs a call by passing evaluated ... arguments on to the callable object (the first argument). For example, the command below outputs to the console "y" as the second argument passed to call2 is evaluated,
> A <- call2(print, x=print('y'))
[1] "y"
and constructs a call to print which takes x="y" as its argument (not x=print("y")):
> A
(function (x, ...)
UseMethod("print"))(x = "y")
In order to get around a_call being evaluated and then passed (to f) in the constructed call, it can be quoted, e.g.
f <- function(a_call) {
call_modify(a_call, x=1)
}
a_call <- call2(dbinom, size=1, prob=0.5)
y <- call2(f, quote(a_call))
Now:
> y
(function(a_call) {
call_modify(a_call, x=1)
})(a_call)

Functions and non-standard evaluation in dplyr

I just finished reading 'Programming with dplyr' and 'Define aesthetic mappings programatically' to start to get a grip on non-standard evaluation of functions. The specific question for this post is, "How do I write the code directly below using the tidyverse (eg quo(), !!, etc.)" instead of the base-R approach eval(), substitute, etc..
library(tidyverse)
xy <- data.frame(xvar = 1:10, yvar = 11:20)
plotfunc <- function(data, x, y){
y.sqr <- (eval(substitute(y), envir = data))^2
print(
ggplot(data, aes_q(x = substitute(x), y = substitute(y.sqr))) +
geom_line()
)
}
plotfunc(xy, xvar, yvar)
Can you provide the answer? It would be a bonus if you could work in the following concept, being, why is the function above non-standard whereas this other function below is standard? I read the Advanced R chapters on functions and non-standard evaluation, but it's above my head at this point. Can you explain in layperson terms? The function below is clear and concise (to me) whereas the function above is a hazy mess.
rescale01 <- function(x) {
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}
rescale01(c(0, 5, 10))
You could do the following :
library(tidyverse)
xy <- data.frame(xvar = 1:10, yvar = 11:20)
plotfunc <- function(data, x, y){
x <- enquo(x)
y <- enquo(y)
print(
ggplot(data, aes(x = !!x, y = (!!y)^2)) +
geom_line()
)
}
plotfunc(xy, xvar, yvar)
Non standard evaluation basically means that you're passing the argument as an expression rather than a value. quo and enquo also associate an evaluation environment to this expression.
Hadley Wickham introduces it like this in his book :
In most programming languages, you can only access the values of a
function’s arguments. In R, you can also access the code used to
compute them. This makes it possible to evaluate code in non-standard
ways: to use what is known as non-standard evaluation, or NSE for
short. NSE is particularly useful for functions when doing interactive
data analysis because it can dramatically reduce the amount of typing.
With rlang_0.4.0, we can use the tidy-evaluation operator ({{...}}) or curly-curly which abstracts quote-and-unquote into a single interpolation step. This makes it easier to create functions
library(rlang)
library(ggplot2)
plotfunc <- function(data, x, y){
print(
ggplot(data, aes(x = {{x}}, y = {{y}}^2)) +
geom_line()
)
}
plotfunc(xy, xvar, yvar)
-output

Write a wrapper function to successfully take addition arguments (like subset) via ellipsis (...)

I am writing a function that calls another function (e.g. lm), and I would like to pass other
arguments to it using ellipsis (...). However, the data to be used is not
in the global environment, but inside a list. A minimal example:
L <- list(data = chickwts, other = 1:5)
wrapper <- function(list, formula = NULL, ...){
if (missing(formula)) formula <- formula(weight~feed)
lm(formula, data = list$data, ...)
}
wrapper(L, subset = feed != "casein") #fails
I can make it work using attach but I'm sure there is more efficient ways of doing it by specifying the evaluation frame...?
wrapper2 <- function(list, formula = NULL, ...){
if (missing(formula)) formula <- formula(weight~feed)
attach(list$data)
m <- lm(formula, ...)
detach(list$data)
return(m)
}
wrapper2(L, subset = feed != "casein") #works
Another solution I have used before is to use list(...), and dealing with the arguments manually, but that would not be practical in the real situation.
I can see that this is fairly basic, but I couldn't find a solution. Any suggestion to the specific problem and also a link to a good conceptual explanation of environments in general would be appreciated.
We would need to construct a call and eval it.
wrapper <- function(list, formula = NULL, ...){
if (missing(formula)) formula <- weight ~ feed
cl <- match.call()
cl$list <- NULL
cl$formula <- formula
cl$data <- quote(list$data)
cl[[1]] <- quote(stats::lm)
eval(cl)
}
Reproducible example:
L <- list(data = trees, other = 1:5)
wrapper(L, Height ~ Girth, subset = Volume > 20)

Resources