Given a regular R function f, I'd like to be able to create a new function f_debug that acts just like f, but lets me keep track of all the assignments to function-local variables that happened inside it.
For example:
f <- function(x, y) {
z <- x + y
df <- data.frame(z=z)
df
}
# This function doesn't work as intended - would like it to (in the case of `f` above)
# write out a list containing `z` and `df` to an RDS file
capturing <- function(func) {
e <- new.env()
altered <- function(...) {
parent <- parent.frame()
e <- something...(func, environment(), parent, etc., etc.)
result <- func(...)
saveRDS(as.list(e), 'foo.rds')
result
}
environment(func) <- e
altered
}
f_debug <- capturing(f)
I'm not sure whether my knowledge gap to do this is large or small, anyone have a solution?
Solution 1: Steal the function's code
Here's a solution which doesn't return a new function which captures intermediate calculations, but rather calls the given function's code internally. There's some limitations, such as it probably only works with named arguments. Instead of storing the intermediate calculations as an RDS, it attaches them as an attribute.
capturing <- function(fun, ...) {
fun <- match.fun(fun)
code <- body(fun)
parent <- environment(fun)
env <- new.env(parent = parent)
for (val in names(list(...))) {
env[[val]] <- list(...)[[val]]
}
result <- eval(code, envir = env, enclos = parent.frame())
attr(result, "intermediate") <- env
result
}
my_add <- function(x, y) {
z <- x+y
u <- x-y
w <- x*y
x + y
}
intermediates <- function(x) {
attr(x, "intermediate", exact = TRUE)
}
value <- capturing(my_add, x = 1, y = 7)
ls(envir = intermediates(value))
#> [1] "u" "w" "x" "y" "z"
intermediates(value)$x
#> [1] 1
# Created on 2022-02-08 by the reprex package (v2.0.1)
Solution 2: Modify the function's code
One weakness of this solution is that if the chosen function features a call to on.exit(add=FALSE), some additional work needs to be done to modify the function so the internal environment is captured. However, it does work when the function accepts ... arguments.
my_add <- function(x, y) {
z <- x+y
u <- x-y
w <- x*y
x + y
}
insert_capture <- function(code) {
# `<<-` assigns into the global environment if no variable of the given name is found
# while traveling up to the global environment. If you need this assignment to go elsewhere,
# I'd recommend passing in `assign()`. Of course, you could also modify the `on.exit()`
# to use saveRDS.
parse(text=append(deparse(code),
"on.exit(._last_capture <<- environment(), add = TRUE)",
after = 1L))
}
capturing2 <- function(fun) {
fun <- match.fun(fun)
code <- insert_capture(body(fun))
body(fun) <- code
fun
}
my_add2 <- capturing2(my_add)
my_add2(1, 7)
#> [1] 8
ls(envir = ._last_capture)
#> [1] "u" "w" "x" "y" "z"
._last_capture$u
#> [1] -6
Created on 2022-02-08 by the reprex package (v2.0.1)
What you are describing is already implemented in base R with utils::dump.frames, in an even more sophisticated way. It saves the frame (environment) associated with each call in the call stack to an object of class "dump.frames", which you can explore retroactively with utils::debugger as if you had actually run your code under a debugger.
capturing <- function(func, ...) {
cc <- as.call(c(quote(utils::dump.frames), list(...)))
cc <- call("on.exit", cc, add = TRUE)
body(func) <- call("{", cc, body(func))
func
}
capturing injects the call on.exit(utils::dump.frames(...), add = TRUE) into the body of func and returns the modified function.
Here, ... is a list of arguments to dump.frames:
dumpto, a character string giving the name to be used for the "dump.frames" object
to.file, a logical flag indicating whether the "dump.frames" object should be assigned in the global environment or save-ed to paste0(dumpto, ".rda") in the current working directory
include.GlobalEnv, a logical flag indicating whether the global environment should be saved as well
A quick example, which you should try yourself:
tmp <- tempfile()
dir.create(tmp)
cwd <- setwd(tmp)
f <- function(x, y) {
z <- x + y
z + 1
}
g <- capturing(f, dumpto = "zzz", to.file = TRUE)
h <- function(a, b) {
d <- g(a, b)
d + 1
}
h12 <- h(1, 2)
load("zzz.rda")
zzz
## $`h(1, 2)`
## <environment: 0x14c16cb58>
##
## $`#2: g(a, b)`
## <environment: 0x14c16ca40>
##
## attr(,"error.message")
## [1] ""
## attr(,"class")
## [1] "dump.frames"
ls(zzz[[1L]])
## [1] "a" "b"
ls(zzz[[2L]])
## [1] "z" "x" "y"
utils::debugger(zzz)
## Message: Available environments had calls:
## 1: h(1, 2)
## 2: #2: g(a, b)
##
## Enter an environment number, or 0 to exit
## Selection: 2
## Browsing in the environment with call:
## #2: g(a, b)
## Called from: debugger.look(ind)
## Browse[1]> ls()
## [1] "x" "y" "z"
## Browse[1]> x == 1 && y == 2 && z == x + y
## [1] TRUE
## Browse[1]> Q
setwd(cwd)
unlink(tmp, recursive = TRUE)
See ?browser if you are unfamiliar with R's environment browser.
My capturing function has the limitation that on.exit calls in the body of func must also use add = TRUE. If you have written func yourself, then it is not much of a limitation at all, and passing add = TRUE is a good habit anyway.
Ultimately, there is no completely safe way to inject code into functions, but, in an interactive setting, I would say that this level of "unsafety" is fine.
Related
I have the following requirement: I have a list of variables and functions defined in a config.R file:
# config.R
x <- 1
foo <- function(y) {
2
}
z <- x + 1
I want the above to be "sourced" in a list defined in the .Globalenv
I have a way to do this by creating a local environment:
source_in_list <- function(path) {
e <- new.env()
source(path, local = e)
return(as.list(e))
}
p <- source_in_list("config.R")
p
$x
[1] 1
$z
[1] 2
$foo
function(y) {
2
}
<environment: 0x2f99d90>
My problem is that foo is linked to the <environment: 0x2f99d90>, which means if I was to redefine foo in the .Globalenv p$foo would be unaffected, and this is not what I want.
Essentially, I would like to do as if I was:
creating p in .Globalenv
executing every line within p
so result would be like:
p
$x
[1] 1
$z
[1] 2
$foo
function(y) {
2
}
How can I do this ?
EDIT:
I realized that what I wanted was define functions from the source file in the globalenv() and the rest in a list
source_in_list <- function(path) {
e <- new.env()
source(path, local = e)
# types
is_fun <- sapply(e, FUN = function(x) inherits(x, "function"))
# define functions from e into globalenv
if(any(is_fun)) {
for(fun_name in names(which(is_fun))) {
# assign in globalenv
assign(x = fun_name, value = eval(parse(text = deparse(get(fun_name, envir = e))), envir = globalenv()), envir = globalenv())
# remove from local env
rm(list = fun_name, envir = e)
}
}
return(as.list(e))
}
p1 <- source_in_list("config.R")
p1
$x
[1] 1
$z
[1] 2
foo
function (y)
{
2
}
>
I think you have a misconception: if foo is stored in your list p, then redefining foo in .Globalenv won't have any effect. Those will be separate objects.
The purpose of the environment associated with a function is to tell R where to look for non-local variables used in the function. Your original version will end up with two copies of everything you sourced, one in the list and one in the local environment you created. If foo referred to x, it would see the one in the local environment. For example, look at this change to your code where foo() returns x:
# config.R
x <- 1
foo <- function() {
x
}
z <- x + 1
and then
source_in_list <- function(path) {
e <- new.env()
source(path, local = e)
return(as.list(e))
}
p <- source_in_list("config.R")
x <- 42 # Set a global variable x
p$foo()
# [1] 1 # It is ignored
p$x <- 123 # Set x in p
p$foo()
# [1] 1 # It is ignored
You probably don't want two copies of everything. But then it's not clear that what you want to do is possible. A list can't act as the environment of a function, so there's no way to make p$x be the target of references from within foo.
I'd suggest that instead of returning a list, you just return the local environment you created. Then things will work as you'd expect:
source_to_local <- function(path) {
e <- new.env()
source(path, local = e)
return(e)
}
e <- source_to_local("config.R")
x <- 42 # set a global
e$foo()
[1] 1 # it is ignored
e$x <- 123 # set x in e
e$foo()
[1] 123 # it responds
The main disadvantage of returning the environment is that they don't print the way lists do, but you could probably write a function to print an environment and make everything in it visible.
For example, suppose I would like to be able to define a function that returned the name of the assignment variable concatenated with the first argument:
a <- add_str("b")
a
# "ab"
The function in the example above would look something like this:
add_str <- function(x) {
arg0 <- as.list(match.call())[[1]]
return(paste0(arg0, x))
}
but where the arg0 line of the function is replaced by a line that will get the name of the variable being assigned ("a") rather than the name of the function.
I've tried messing around with match.call and sys.call, but I can't get it to work. The idea here is that the assignment operator is being called on the variable and the function result, so that should be the parent call of the function call.
I think that it's not strictly possible, as other solutions explained, and the reasonable alternative is probably Yosi's answer.
However we can have fun with some ideas, starting simple and getting crazier gradually.
1 - define an infix operator that looks similar
`%<-add_str%` <- function(e1, e2) {
e2_ <- e2
e1_ <- as.character(substitute(e1))
eval.parent(substitute(e1 <- paste0(e1_,e2_)))
}
a %<-add_str% "b"
a
# "ab"
2 - Redefine := so that it makes available the name of the lhs to the rhs through a ..lhs() function
I think it's my favourite option :
`:=` <- function(lhs,rhs){
lhs_name <- as.character(substitute(lhs))
assign(lhs_name,eval(substitute(rhs)), envir = parent.frame())
lhs
}
..lhs <- function(){
eval.parent(quote(lhs_name),2)
}
add_str <- function(x){
res <- paste0(..lhs(),x)
res
}
a := add_str("b")
a
# [1] "ab"
There might be a way to redefine <- based on this, but I couldn't figure it out due to recursion issues.
3 - Use memory address dark magic to hunt lhs (if it exists)
This comes straight from: Get name of x when defining `(<-` operator
We'll need to change a bit the syntax and define the function fetch_name for this purpose, which is able to get the name of the rhs from a *<- function, where as.character(substitute(lhs)) would return "*tmp*".
fetch_name <- function(x,env = parent.frame(2)) {
all_addresses <- sapply(ls(env), pryr:::address2, env)
all_addresses <- all_addresses[names(all_addresses) != "*tmp*"]
all_addresses_short <- gsub("(^|<)[0x]*(.*?)(>|$)","\\2",all_addresses)
x_address <- tracemem(x)
untracemem(x)
x_address_short <- tolower(gsub("(^|<)[0x]*(.*?)(>|$)","\\2",x_address))
ind <- match(x_address_short, all_addresses_short)
x_name <- names(all_addresses)[ind]
x_name
}
`add_str<-` <- function(x,value){
x_name <- fetch_name(x)
paste0(x_name,value)
}
a <- NA
add_str(a) <- "b"
a
4- a variant of the latter, using .Last.value :
add_str <- function(value){
x_name <- fetch_name(.Last.value)
assign(x_name,paste0(x_name,value),envir = parent.frame())
paste0(x_name,value)
}
a <- NA;add_str("b")
a
# [1] "ab"
Operations don't need to be on the same line, but they need to follow each other.
5 - Again a variant, using a print method hack
Extremely dirty and convoluted, to please the tortured spirits and troll the others.
This is the only one that really gives the expected output, but it works only in interactive mode.
The trick is that instead of doing all the work in the first operation I also use the second (printing). So in the first step I return an object whose value is "b", but I also assigned a class "weird" to it and a printing method, the printing method then modifies the object's value, resets its class, and destroys itself.
add_str <- function(x){
class(x) <- "weird"
assign("print.weird", function(x) {
env <- parent.frame(2)
x_name <- fetch_name(x, env)
assign(x_name,paste0(x_name,unclass(x)),envir = env)
rm(print.weird,envir = env)
print(paste0(x_name,x))
},envir = parent.frame())
x
}
a <- add_str("b")
a
# [1] "ab"
(a <- add_str("b") will have the same effect as both lines above. print(a <- add_str("b")) would also have the same effect but would work in non interactive code, as well.
This is generally not possible because the operator <- is actually parsed to a call of the <- function:
rapply(as.list(quote(a <- add_str("b"))),
function(x) if (!is.symbol(x)) as.list(x) else x,
how = "list")
#[[1]]
#`<-`
#
#[[2]]
#a
#
#[[3]]
#[[3]][[1]]
#add_str
#
#[[3]][[2]]
#[1] "b"
Now, you can access earlier calls on the call stack by passing negative numbers to sys.call, e.g.,
foo <- function() {
inner <- sys.call()
outer <- sys.call(-1)
list(inner, outer)
}
print(foo())
#[[1]]
#foo()
#[[2]]
#print(foo())
However, help("sys.call") says this (emphasis mine):
Strictly, sys.parent and parent.frame refer to the context of the
parent interpreted function. So internal functions (which may or may
not set contexts and so may or may not appear on the call stack) may
not be counted, and S3 methods can also do surprising things.
<- is such an "internal function":
`<-`
#.Primitive("<-")
`<-`(x, foo())
x
#[[1]]
#foo()
#
#[[2]]
#NULL
As Roland pointed, the <- is outside of the scope of your function and could only be located looking at the stack of function calls, but this fail. So a possible solution could be to redefine the '<-' else than as a primitive or, better, to define something that does the same job and additional things too.
I don't know if the ideas behind following code can fit your needs, but you can define a "verbose assignation" :
`:=` <- function (var, value)
{
call = as.list(match.call())
message(sprintf("Assigning %s to %s.\n",deparse(call$value),deparse(call$var)))
eval(substitute(var <<- value))
return(invisible(value))
}
x := 1:10
# Assigning 1:10 to x.
x
# [1] 1 2 3 4 5 6 7 8 9 10
And it works in some other situation where the '<-' is not really an assignation :
y <- data.frame(c=1:3)
colnames(y) := "b"
# Assigning "b" to colnames(y).
y
# b
#1 1
#2 2
#3 3
z <- 1:4
dim(z) := c(2,2)
#Assigning c(2, 2) to dim(z).
z
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
>
I don't think the function has access to the variable it is being assigned to. It is outside of the function scope and you do not pass any pointer to it or specify it in any way. If you were to specify it as a parameter, you could do something like this:
add_str <- function(x, y) {
arg0 <-deparse(substitute(x))
return(paste0(arg0, y))
}
a <- 5
add_str(a, 'b')
#"ab"
Hadley's new pryr package that shows the address of a variable is really great for profiling. I have found that whenever a variable is passed into a function, no matter what that function does, a copy of that variable is created. Furthermore, if the body of the function passes the variable into another function, another copy is generated. Here is a clear example
n = 100000
p = 100
bar = function(X) {
print(pryr::address(X))
}
foo = function(X) {
print(pryr::address(X))
bar(X)
}
X = matrix(rnorm(n*p), n, p)
print(pryr::address(X))
foo(X)
Which generates
> X = matrix(rnorm(n*p), n, p)
> print(pryr::address(X))
[1] "0x7f5f6ce0f010"
> foo(X)
[1] "0x92f6d70"
[1] "0x92f3650"
The address changes each time, despite the functions not doing anything. I'm confused by this behavior because I've heard R described as copy on write - so variables can be passed around but copies are only generated when a function wants to write into that variable. What is going on in these function calls?
For best R development is it better to not write multiple small functions, rather keep the content all in one function? I have also found some discussion on Reference Classes, but I see very little R developers using this. Is there another efficient way to pass the variable that I am missing?
I'm not entirely certain, but address may point to the memory address of the pointer to the object. Take the following example.
library(pryr)
n <- 100000
p <- 500
X <- matrix(rep(1,n*p), n, p)
l <- list()
for(i in 1:10000) l[[i]] <- X
At this point, if each element of l was a copy of X, the size of l would be ~3.5Tb. Obviously this is not the case as your computer would have started smoking. And yet the addresses are different.
sapply(l[1:10], function(x) address(x))
# [1] "0x1062c14e0" "0x1062c0f10" "0x1062bebc8" "0x10641e790" "0x10641dc28" "0x10641c640" "0x10641a800" "0x1064199c0"
# [9] "0x106417380" "0x106411d40"
pryr::address passes an unevaluated symbol to an internal function that returns its address in the parent.frame():
pryr::address
#function (x)
#{
# address2(check_name(substitute(x)), parent.frame())
#}
#<environment: namespace:pryr>
Wrapping of the above function can lead to returning address of a "promise". To illustrate we can simulate pryr::address's functionality as:
ff = inline::cfunction(sig = c(x = "symbol", env = "environment"), body = '
SEXP xx = findVar(x, env);
Rprintf("%s at %p\\n", type2char(TYPEOF(xx)), xx);
if(TYPEOF(xx) == PROMSXP) {
SEXP pr = eval(PRCODE(xx), PRENV(xx));
Rprintf("\tvalue: %s at %p\\n", type2char(TYPEOF(pr)), pr);
}
return(R_NilValue);
')
wrap1 = function(x) ff(substitute(x), parent.frame())
where wrap1 is an equivalent of pryr::address.
Now:
x = 1:5
.Internal(inspect(x))
##256ba60 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 1,2,3,4,5
pryr::address(x)
#[1] "0x256ba60"
wrap1(x)
#integer at 0x0256ba60
#NULL
with further wrapping, we can see that a "promise" object is being constructed while the value is not copied:
wrap2 = function(x) wrap1(x)
wrap2(x)
#promise at 0x0793f1d4
# value: integer at 0x0256ba60
#NULL
wrap2(x)
#promise at 0x0793edc8
# value: integer at 0x0256ba60
#NULL
# wrap 'pryr::address' like your 'bar'
( function(x) pryr::address(x) )(x)
#[1] "0x7978a64"
( function(x) pryr::address(x) )(x)
#[1] "0x79797b8"
You can use the profmem package (I'm the author), to see what memory allocations take place. It requires that your R session was build with "profmem" capabilities:
capabilities()["profmem"]
## profmem
## TRUE
Then, you can do something like this:
n <- 100000
p <- 100
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
object.size(X)
## 80000200 bytes
## No copies / no new objects
bar <- function(X) X
foo <- function(X) bar(X)
## One new object
bar2 <- function(X) 2*X
foo2 <- function(X) bar2(X)
profmem::profmem(foo(X))
## Rprofmem memory profiling of:
## foo(X)
##
## Memory allocations:
## bytes calls
## total 0
profmem::profmem(foo2(X))
## Rprofmem memory profiling of:
## foo2(X)
##
## Memory allocations:
## bytes calls
## 1 80000040 foo2() -> bar2()
## total 80000040
The first call to the function f works, the second does not. How can I pass a String ("v") to the function f so that the function works as exspected?
library(data.table)
f<-function(t,x) t[,deparse(substitute(x)),with=F]
dat<-data.table(v="a")
f(dat,v)
# v
# 1: a
f(dat,eval(parse(text="v")))
# Error in `[.data.table`(t, , deparse(substitute(x)), with = F) :
# column(s) not found: eval(parse(text = "v"))
It won't be a one-liner anymore but you can test for what you're passing in:
library(data.table)
library(purrr)
dat <- data.table(v="a")
f <- function(dt, x) {
# first, see if 'x' is a variable holding a string with a column name
seval <- safely(eval)
res <- seval(x, dt, parent.frame())
# if it is, then get the value, otherwise substitute() it
if ((!is.null(res$result)) && inherits(res$result, "character")) {
y <- res$result
} else {
y <- substitute(x)
}
# if it's a bare name, then we deparse it, otherwise we turn
# the string into name and then deparse it
if (inherits(y, "name")) {
y <- deparse(y)
} else if (inherits(y, "character")) {
y <- deparse(as.name(x))
}
dt[, y, with=FALSE]
}
f(dat,v)
## v
## 1: a
f(dat, "v")
## v
## 1: a
V <- "v"
f(dat, V)
## v
## 1: a
f(dat, VVV)
#> throws an Error
I switched it from t to dt since I don't like using the names of built-in functions (like t()) as variable names unless I really have to. It can introduce subtle errors in larger code blocks that can be frustrating to debug.
I'd also move the safely() call outside the f() function to save a function call each time you run f(). You can use old-school try() instead, if you like, but you have to check for try-error which may break some day. You could also tryCatch() wrap it, but the safely() way just seems cleaner to me.
I'm trying to get the names of arguments in the global environment within a function. I know I can use substitute to get the name of named arguments, but I would like to be able to do the same thing with ... arguments. I kinda got it to work for the first element of ... but can't figure out how to do it for the rest of the elements. Any idea how to get this working as intended.
foo <- function(a,...)
{
print(substitute(a))
print(eval(enquote(substitute(...))))
print(sapply(list(...),function(x) eval(enquote(substitute(x)),env=.GlobalEnv)))
}
x <- 1
y <- 2
z <- 3
foo(x,y,z)
x
y
[[1]]
X[[1L]]
[[2]]
X[[2L]]
The canonical idiom here is deparse(substitute(foo)), but the ... needs slightly different processing. Here is a modification that does what you want:
foo <- function(a, ...) {
arg <- deparse(substitute(a))
dots <- substitute(list(...))[-1]
c(arg, sapply(dots, deparse))
}
x <- 1
y <- 2
z <- 3
> foo(x,y,z)
[1] "x" "y" "z"
I would go with
foo <- function(a, ...) {
print( n <- sapply(as.list(substitute(list(...)))[-1L], deparse) )
n
}
Then
foo(x,y,z)
# [1] "y" "z"
Related question was previously on StackOverflow:
How to use R's ellipsis feature when writing your own function? Worth reading.
Second solution, using match.call
foo <- function(a, ...) {
sapply(match.call(expand.dots=TRUE)[-1], deparse)
}