Capturing functions using rlang's enexprs in R

I'm writing a function such that callers of this function can write schemas declaratively:
myschema <- Schema(
  patientID = character,
  temp = numeric,
  treated = logical,
  reason_treated = factor(levels=c('fever', 'chills', 'nausea'))
)
Later, I'd like to be able to assemble dataframes using the types declared in this schema. I think the best candidate for this job is to use the metaprogramming features available in rlang:
Schema = function(...) {
  schematypes = rlang::enexprs(...)
}
However, most of the examples pertain to capturing expressions and thereafter using them as arguments to functions, rather than as functions themselves. That is, I'm finding it hard to capture the right-hand side of the following expression:
patientID = character
and then being able to evaluate it later as character(myvec), once I get myvec. The same applies to the following:
reason_treated = factor(levels=c('fever', 'chills', 'nausea'))
which I would later like to evaluate as factor(myvec, levels=c('fever', 'chills', 'nausea'))
Thanks!

If I understand correctly, you are effectively constructing a schema out of functions, and you want to apply those functions to some arguments when those become available. This falls under the umbrella of functional programming rather than rlang metaprogramming.
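To make the functional-programming idea concrete before reaching for purrr, here is a minimal base-R sketch (my own, not part of the original answer; Schema_base is a hypothetical name): a schema is simply a named list of functions that you apply later.
Schema_base <- function(...) list(...)   # store the coercion functions as-is
myschema0 <- Schema_base(
  patientID = as.character,
  temp = as.numeric
)
myschema0$patientID(c(1234, 5678))
# [1] "1234" "5678"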
A large portion of the functionality you want is already captured by purrr::map and its "engine" as_mapper. You can employ it directly to define
Schema <- function(...) { purrr::map( list(...), purrr::as_mapper ) }
You can now use it to build new schemas as you suggested (with minor modifications to the function definitions):
myschema <- Schema(
  patientID = as.character,      # Note the correct function name
  temp = as.numeric,             # Ditto
  treated = as.logical,          # Tritto
  reason_treated = ~factor(., levels=c('fever', 'chills', 'nausea'))
)
# $patientID
# function (x, ...)
# as.character(x = x, ...)
# <environment: base>
#
# $temp
# function (x, ...)
# as.double(x = x, ...)
# <environment: base>
#
# $treated
# function (x, ...)
# as.logical(x = x, ...)
# <environment: base>
#
# $reason_treated
# function (..., .x = ..1, .y = ..2, . = ..1)
# factor(., levels = c("fever", "chills", "nausea"))
# <bytecode: 0x00000000027a2d00>
Given your new schema, registering new patients can be done using a sister function of map that lines up arguments from two lists / vectors:
register_patient <- function(myvec) { purrr::map2( myschema, myvec, ~.x(.y) ) }
JohnDoe <- register_patient( c(1234, 100, TRUE, "fever") )
# $patientID
# [1] "1234"
#
# $temp
# [1] 100
#
# $treated
# [1] TRUE
#
# $reason_treated
# [1] fever
# Levels: fever chills nausea
Let's verify the type of each element:
purrr::map( JohnDoe, class )
# $patientID
# [1] "character"
#
# $temp
# [1] "numeric"
#
# $treated
# [1] "logical"
#
# $reason_treated
# [1] "factor"

Related

R: instrument function to capture all assignments

Given a regular R function f, I'd like to be able to create a new function f_debug that acts just like f, but lets me keep track of all the assignments to function-local variables that happened inside it.
For example:
f <- function(x, y) {
  z <- x + y
  df <- data.frame(z = z)
  df
}
# This function doesn't work as intended - would like it to (in the case of `f` above)
# write out a list containing `z` and `df` to an RDS file
capturing <- function(func) {
  e <- new.env()
  altered <- function(...) {
    parent <- parent.frame()
    e <- something...(func, environment(), parent, etc., etc.)
    result <- func(...)
    saveRDS(as.list(e), 'foo.rds')
    result
  }
  environment(func) <- e
  altered
}
f_debug <- capturing(f)
I'm not sure whether my knowledge gap to do this is large or small, anyone have a solution?
Solution 1: Steal the function's code
Here's a solution which doesn't return a new function that captures intermediate calculations, but rather runs the given function's code internally. There are some limitations: for example, it probably only works with named arguments. Instead of storing the intermediate calculations as an RDS, it attaches them to the result as an attribute.
capturing <- function(fun, ...) {
  fun <- match.fun(fun)
  code <- body(fun)
  parent <- environment(fun)
  env <- new.env(parent = parent)
  for (val in names(list(...))) {
    env[[val]] <- list(...)[[val]]
  }
  result <- eval(code, envir = env, enclos = parent.frame())
  attr(result, "intermediate") <- env
  result
}
my_add <- function(x, y) {
  z <- x + y
  u <- x - y
  w <- x * y
  x + y
}
intermediates <- function(x) {
  attr(x, "intermediate", exact = TRUE)
}
value <- capturing(my_add, x = 1, y = 7)
ls(envir = intermediates(value))
#> [1] "u" "w" "x" "y" "z"
intermediates(value)$x
#> [1] 1
# Created on 2022-02-08 by the reprex package (v2.0.1)
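The same helper can be applied to the f from the original question (assuming f and capturing are both defined in the session), capturing z and df as asked:
value2 <- capturing(f, x = 1, y = 2)
ls(envir = intermediates(value2))
#> should list "df" "x" "y" "z"
intermediates(value2)$z
#> [1] 3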
Solution 2: Modify the function's code
One weakness of this solution is that if the chosen function contains a call to on.exit() without add = TRUE, some additional work is needed to modify the function so the internal environment is still captured. However, it does work when the function accepts ... arguments.
my_add <- function(x, y) {
  z <- x + y
  u <- x - y
  w <- x * y
  x + y
}
insert_capture <- function(code) {
  # `<<-` assigns into the global environment if no variable of the given name is found
  # while traveling up to the global environment. If you need this assignment to go elsewhere,
  # I'd recommend passing in `assign()`. Of course, you could also modify the `on.exit()`
  # to use saveRDS.
  parse(text = append(deparse(code),
                      "on.exit(._last_capture <<- environment(), add = TRUE)",
                      after = 1L))
}
capturing2 <- function(fun) {
  fun <- match.fun(fun)
  code <- insert_capture(body(fun))
  body(fun) <- code
  fun
}
my_add2 <- capturing2(my_add)
my_add2(1, 7)
#> [1] 8
ls(envir = ._last_capture)
#> [1] "u" "w" "x" "y" "z"
._last_capture$u
#> [1] -6
Created on 2022-02-08 by the reprex package (v2.0.1)
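As the comment in insert_capture() hints, the injected on.exit() line can instead write the captured environment to an RDS file, which is what the question originally asked for. A sketch of that variant (my own; insert_capture_rds and capturing3 are hypothetical names):
insert_capture_rds <- function(code, file = "foo.rds") {
  # inject a saveRDS() call right after the opening `{` of the function body
  line <- sprintf("on.exit(saveRDS(as.list(environment()), %s), add = TRUE)",
                  deparse(file))
  parse(text = append(deparse(code), line, after = 1L))
}
capturing3 <- function(fun, file = "foo.rds") {
  fun <- match.fun(fun)
  body(fun) <- insert_capture_rds(body(fun), file)
  fun
}
f_debug <- capturing3(f)   # f from the question; f_debug(1, 2) should write foo.rds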
What you are describing is already implemented in base R with utils::dump.frames, in an even more sophisticated way. It saves the frame (environment) associated with each call in the call stack to an object of class "dump.frames", which you can explore retroactively with utils::debugger as if you had actually run your code under a debugger.
capturing <- function(func, ...) {
  cc <- as.call(c(quote(utils::dump.frames), list(...)))
  cc <- call("on.exit", cc, add = TRUE)
  body(func) <- call("{", cc, body(func))
  func
}
capturing injects the call on.exit(utils::dump.frames(...), add = TRUE) into the body of func and returns the modified function.
Here, ... is a list of arguments to dump.frames:
dumpto, a character string giving the name to be used for the "dump.frames" object
to.file, a logical flag indicating whether the "dump.frames" object should be assigned in the global environment or saved (via save) to paste0(dumpto, ".rda") in the current working directory
include.GlobalEnv, a logical flag indicating whether the global environment should be saved as well
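Before the full example below, it may help to see what the injection produces (my own check):
body(capturing(function(x) x + 1, dumpto = "zzz"))
## {
##     on.exit(utils::dump.frames(dumpto = "zzz"), add = TRUE)
##     x + 1
## }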
A quick example, which you should try yourself:
tmp <- tempfile()
dir.create(tmp)
cwd <- setwd(tmp)
f <- function(x, y) {
  z <- x + y
  z + 1
}
g <- capturing(f, dumpto = "zzz", to.file = TRUE)
h <- function(a, b) {
  d <- g(a, b)
  d + 1
}
h12 <- h(1, 2)
load("zzz.rda")
zzz
## $`h(1, 2)`
## <environment: 0x14c16cb58>
##
## $`#2: g(a, b)`
## <environment: 0x14c16ca40>
##
## attr(,"error.message")
## [1] ""
## attr(,"class")
## [1] "dump.frames"
ls(zzz[[1L]])
## [1] "a" "b"
ls(zzz[[2L]])
## [1] "z" "x" "y"
utils::debugger(zzz)
## Message: Available environments had calls:
## 1: h(1, 2)
## 2: #2: g(a, b)
##
## Enter an environment number, or 0 to exit
## Selection: 2
## Browsing in the environment with call:
## #2: g(a, b)
## Called from: debugger.look(ind)
## Browse[1]> ls()
## [1] "x" "y" "z"
## Browse[1]> x == 1 && y == 2 && z == x + y
## [1] TRUE
## Browse[1]> Q
setwd(cwd)
unlink(tmp, recursive = TRUE)
See ?browser if you are unfamiliar with R's environment browser.
My capturing function has the limitation that on.exit calls in the body of func must also use add = TRUE. If you have written func yourself, then it is not much of a limitation at all, and passing add = TRUE is a good habit anyway.
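To illustrate that limitation (my own sketch): if func itself calls on.exit() without add = TRUE, the injected dump.frames() call is discarded and nothing is dumped.
f_bad <- function(x) {
  on.exit(message("cleanup"))   # add = TRUE omitted: replaces earlier on.exit expressions
  x + 1
}
g_bad <- capturing(f_bad, dumpto = "zzz", to.file = TRUE)
g_bad(1)   # "cleanup" is printed, but no "zzz.rda" file is written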
Ultimately, there is no completely safe way to inject code into functions, but, in an interactive setting, I would say that this level of "unsafety" is fine.

Using partial within map

I've come across an interesting problem. I have a function of three variables; let's say (for simplicity and transparency) it is this:
my_fun <- function(a, b, c) paste(a, b, c, sep = '-')
I want to create multiple functions with only the argument c, for several combinations of a and b. I am using the functions map2 and partial (both from the purrr package).
require(purrr)
funs <- map2(letters[1:5], LETTERS[1:5], partial, ...f = my_fun)
I would expect each function in the list to produce a different output, but that is not the case.
funs[[1]]('hi') # [1] "e-E-hi"
funs[[3]]('hi') # [1] "e-E-hi"
funs[[5]]('hi') # [1] "e-E-hi"
I am able to come up with a different solution to my problem, so my question isn't "how to do it"; I am rather interested in why this happens.
Another example using base mapply:
mapply(partial, letters[1:5], LETTERS[1:5], MoreArgs = list(...f = my_fun))[[1]]('hi')
# [1] "e-E-hi"
The problem stems from the fact that partial uses lazy evaluation, which within map2 means that it stores the expressions .x and .y rather than the values "a" and "A". Luckily there is a function argument for that, and we can use:
funs <- map2(letters[1:5], LETTERS[1:5], partial, ...f = my_fun, .lazy = FALSE)
funs[[1]]('hi')
# [1] "a-A-hi"
If you look at your version, we see this:
funs[[1]]
# function (...)
# my_fun(.x[[i]], .y[[i]], ...)
# <environment: 0x00000000201d9598>
And the same for each one of the other 4.
Now, if we look into that environment, we can see:
ls(envir = environment(funs[[1]]))
# [1] "i"
So there is an object i stored in there, which determines which .x and .y elements we get, and its value is:
get('i', environment(funs[[1]]))
# [1] 5
Also note that your arguments are stored there as well, but are hidden because their names start with a dot:
ls(envir = environment(funs[[1]]), all.names = TRUE)
# [1] "..." ".f" ".x" ".y" "i"
get('.x', envir = environment(funs[[1]]))
# [1] "a" "b" "c" "d" "e"
So for all of these, we get the same result. Specifically, the executed call ends up being:
my_fun(letters[1:5][[5]], LETTERS[1:5][[5]], 'hi')
The lazy evaluation is not playing nice here: each of the partialised functions ends up using the internal loop counter i stored inside map2.
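One way around it without partial()'s .lazy argument (my own sketch, not part of the original answer): build the closures yourself and force() the arguments so each function captures its own values rather than the shared counter.
funs2 <- purrr::map2(letters[1:5], LETTERS[1:5], function(x, y) {
  force(x); force(y)             # evaluate the arguments now
  function(c) my_fun(x, y, c)
})
funs2[[1]]('hi')
# [1] "a-A-hi"
funs2[[3]]('hi')
# [1] "c-C-hi"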

In R, making packages with functions that are by() and with() compliant

Working on improving my package summarytools, I'm looking for a way to capture the information about each of by()'s groups and integrate it into some functions' output. To give a bit more context, the functions in this package print out the dataframe name and the variable name(s) being summarized. Functions like by() make this difficult because they use generic names such as dd[x, ] when slicing the data and feeding it to functions. substitute() is thus not an option for getting at the x parameter in that case, and the values of the IND variable(s) are also hidden (to a certain extent).
To illustrate, in the following example, the group information (e.g. "gender: F" and "smoker: No") is simply printed out with cat() when print.by() is invoked, using attributes of the object of class "by":
dat <- data.frame(gender = rep(c("F", "M"), each = 15),
                  smoker = rep(c("Yes", "No")),
                  someQty = runif(n = 30, min = 0, max = 10))
by(dat$someQty, INDICES = list(gender=dat$gender, smoker=dat$smoker), FUN = mean)
## gender: F
## smoker: No
## [1] 5.560505
## -------------------------------------------------------------------------------
## gender: M
## smoker: No
## [1] 2.568055
## -------------------------------------------------------------------------------
## gender: F
## smoker: Yes
## [1] 4.057938
## -------------------------------------------------------------------------------
## gender: M
## smoker: Yes
## [1] 3.416027
Now what I need is to get the info for each group during the by-group processing (as opposed to recovering it after the "by" object has been created).
I worked on a solution, but before I repeat a similar work for making functions comply with with(), %>%, and possibly others... and their combinations, I'm wondering if there might be a simpler approach to this.
Here's what I have so far to deal with by():
# Initialise variable in package-specific environment that
# will help keeping track of the by-processing
myenv <- new.env()
myenv$byInfo <- list()
# Declare some function that will return the `by` variables values
# at each iteration (it's a sort of dummy function that does just that)
myfunc <- function(x) {
  sc <- sys.calls()
  sf <- sys.frames()
  # Find position of by.default() and tapply() in the sys.calls list
  by_pos <- which(as.character(lapply(sc, head, 1)) == "by.default()")
  tapply_pos <- which(as.character(lapply(sc, head, 1)) == "tapply()")
  if (length(by_pos) == 1) {
    # check if this is the first "by" iteration
    if (length(myenv$byInfo) == 0) {
      # Standardise the call (adds argument names)
      by_call <- as.list(pryr::standardise_call(sc[[by_pos]]))
      # Extract the data argument
      by_data <- deparse(by_call$data)
      # Extract the IND variable names
      by_IND <- as.character(by_call$IND)
      by_IND <- by_IND[-which(by_IND == "list")]
      # Get the levels of these IND variables
      by_levels <- sf[[tapply_pos]]$namelist
      levels_df <- expand.grid(by_levels, stringsAsFactors = FALSE)
      # Store the info in the package-specific environment
      myenv$byInfo$iter <- 1
      myenv$byInfo$levels_df <- levels_df
      myenv$byInfo$nb_iter <- nrow(levels_df)
    }
    levels_df <- myenv$byInfo$levels_df
    info <- paste(colnames(myenv$byInfo$levels_df),
                  as.character(myenv$byInfo$levels_df[myenv$byInfo$iter, ]),
                  sep = " = ", collapse = ", ")
    if (myenv$byInfo$iter == myenv$byInfo$nb_iter)
      myenv$byInfo <- list()
    else
      myenv$byInfo$iter <- myenv$byInfo$iter + 1
    return(info)
  }
  return()
}
b <- by(data = dat$someQty,
        INDICES = list(gender = dat$gender, smoker = dat$smoker),
        FUN = myfunc)
b[1:4]
## [1] "gender = F, smoker = No" "gender = M, smoker = No"
## [3] "gender = F, smoker = Yes" "gender = M, smoker = Yes"
So yes, it does give me what I want, but I'd like to know if I'm missing something more straightforward here.
Note: I thought of adding a by= parameter to some functions and just ignoring base R's by() altogether, but I'd rather use the preexisting base functions people are accustomed to.
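As a hypothetical sketch of that by= alternative (my own; my_summary is an invented name), passing the grouping variables explicitly makes the group labels trivially available to the function:
my_summary <- function(x, by = NULL) {
  if (is.null(by)) return(mean(x))
  # build one "var = value" label per observation, e.g. "gender = F, smoker = No"
  pieces <- Map(function(nm, val) paste(nm, val, sep = " = "), names(by), by)
  labels <- do.call(paste, c(pieces, sep = ", "))
  vapply(split(x, labels), mean, numeric(1))
}
my_summary(dat$someQty, by = dat[c("gender", "smoker")])
## a named vector whose names are the group labels, e.g. "gender = F, smoker = No"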

Why do some primitives have byte-codes and some do not?

I've noticed that when I call args() on some of the primitive functions, a bytecode line shows up as well, but on other primitives no bytecode appears. For example:
args(length)
# function (x)
# NULL
args(list)
# function (...)
# NULL
# <bytecode: 0x44a0f38>
Why is that?
At first I thought it might be related to the ... argument, but the following disproves that theory.
args(dim)
# function (x)
# NULL
args(unclass)
# function (x)
# NULL
# <bytecode: 0x44a0450>
It's confusing to me that a byte-code only shows up in some of these, and not in others. I have always been under the impression that all primitives are special and that they all share the same "attributes" (for lack of a better word, not the actual R attributes).
As agstudy noted, this is an oddity related to how args() prints things. That is, whether args() includes a bytecode line in its output isn't a reliable indicator of whether or not the function was byte-compiled. Compare:
args(writeLines)
## function (text, con = stdout(), sep = "\n", useBytes = FALSE)
## NULL
writeLines
## function (text, con = stdout(), sep = "\n", useBytes = FALSE)
## {
## if (is.character(con)) {
## con <- file(con, "w")
## on.exit(close(con))
## }
## .Internal(writeLines(text, con, sep, useBytes))
## }
## <bytecode: 0x000000001bf3aeb0>
We can compare printing of a bytecode line for args vs. standard function printing.
arg_shows_bytecode <- function(fn) {
  output <- capture.output(args(fn))
  grepl("^<bytecode", output[length(output)])
}
printing_shows_bytecode <- function(fn) {
  output <- capture.output(print(fn))
  length(output) > 1 && grepl("^<bytecode", output[length(output) - 1])
}
base_fns <- Filter(is.function, mget(ls(baseenv()), baseenv()))
yn_args <- vapply(base_fns, arg_shows_bytecode, logical(1))
yn_print <- vapply(base_fns, printing_shows_bytecode, logical(1))
It's worth noting that all functions where args shows bytecode information are primitives.
head(base_fns[yn_args])
## $`%*%`
## function (x, y) .Primitive("%*%")
##
## $as.call
## function (x) .Primitive("as.call")
##
## $attr
## function (x, which, exact = FALSE) .Primitive("attr")
##
## $`attr<-`
## function (x, which, value) .Primitive("attr<-")
##
## $attributes
## function (obj) .Primitive("attributes")
##
## $`attributes<-`
## function (obj, value) .Primitive("attributes<-")
The converse isn't true: some base functions where args doesn't show bytecode information are primitives; others are not.
yn_prim <- vapply(base_fns, is.primitive, logical(1))
table(yn_args, yn_print, yn_prim)
## , , yn_prim = FALSE
##
## yn_print
## yn_args FALSE TRUE
## FALSE 0 988
## TRUE 0 0
##
## , , yn_prim = TRUE
##
## yn_print
## yn_args FALSE TRUE
## FALSE 119 0
## TRUE 63 0
So non-primitive functions in the base package are all compiled, but args doesn't mention it. Primitive functions don't show a bytecode message when printed, and only sometimes show a bytecode message when called with args.
Thanks for the report. This behavior is unintentional (a bug, as Hadley says); it is not consistent internally, as the bytecode address is displayed only for builtins and specials, and only when their formals are in .ArgsEnv (they can also be in .GenericArgsEnv). It is now fixed in R-devel. Bug reports are best directed straight to R's Bugzilla (the R-devel mailing list works as well).
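To poke at this yourself (my own exploration, assuming the two environments are reachable from baseenv()), the split should match the behaviour observed above:
argsenv <- get(".ArgsEnv", envir = baseenv())
genenv <- get(".GenericArgsEnv", envir = baseenv())
# list and unclass (whose args() output showed a bytecode line) should be in .ArgsEnv,
# while the internal generics length and dim should be in .GenericArgsEnv instead
c("list", "unclass") %in% ls(argsenv)
c("length", "dim") %in% ls(genenv)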

Given a function defined in an R env, obtain function parameters

What I'm trying to do is trivial but I've not found a clear solution to it:
For instance, I have the following function:
sample.function <- function(a, b, named="test") {
...
}
I wish I could inspect the function and obtain its arguments (maybe as an R list). Given that ret is the value returned by the desired magicfunction, the following assertions should all be TRUE:
ret <- magicfunction(sample.function)
ret[[1]] == "a"
ret[[2]] == "b"
ret$named == "test"
can it be done?
Here are a few things you can look at, inside or outside of the function.
> f <- function(FUN = sum, na.rm = FALSE) {
c(formals(f), args(f), match.fun(FUN))
}
> f()
$FUN
sum
$na.rm
[1] FALSE
[[3]]
function (FUN = sum, na.rm = FALSE)
NULL
[[4]]
function (..., na.rm = FALSE) .Primitive("sum")
This will work if the function encloses its body with brace brackets (which nearly all functions do). It gives a list whose names are the argument names and whose values are the defaults:
sample.function <- function(a, b, named="test") {} # test function
L <- as.list(formals(sample.function)); L
## $a
##
## $b
##
## $named
## [1] "test"
This is slightly longer but works even for functions whose bodies are not surrounded by brace brackets:
head(as.list(args(sample.function)), -1)
# same output
head(as.list(args(sin)), -1) # sin has no {}
## $x
Returning to the first example, we can check which arguments are missing a default value:
sapply(L, identical, formals(function(x) {})$x)
## a b named
## TRUE TRUE FALSE
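Putting these pieces together, a sketch of the magicfunction() from the question (my own; it combines formals() with the missing-default check above, using the test version of sample.function defined earlier):
magicfunction <- function(f) {
  fm <- as.list(formals(f))
  # TRUE for arguments that have no default value
  no_default <- vapply(fm, identical, logical(1), formals(function(x) {})$x)
  # names of default-less arguments first, then the name = default pairs
  c(as.list(names(fm)[no_default]), fm[!no_default])
}
ret <- magicfunction(sample.function)
ret[[1]] == "a"      # TRUE
ret[[2]] == "b"      # TRUE
ret$named == "test"  # TRUE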