Based on the answer provided in question 1088639, I set up a pair of functions which both access the same sub-function's environment. This example works, but I wanted to see if I'd missed some cleaner way to "connect" both top-level functions to the internal environment.
(Back story: I wanted to write a pair of complementary functions which share a variable, e.g. "count" in this example, while meeting CRAN package requirements, which do not allow functions to modify the global environment.)
static.f <- function() {
    count <- 0
    f <- function(x) {
        count <<- count + 1
        return(list(mean = mean(x), count = count))
    }
    return(f)
}
# make sure not to delete this command, even tho' it's not
# creating a function.
f1 <- static.f()
statfoo <- function(x) {
    tmp <- f1(x)
    tmp <- list(tmp, plus = 2)
    return(tmp)
}
statbar <- function(x) {
    tmp <- f1(x)
    tmp <- list(tmp, minus = 3)
    return(tmp)
}
Sample output:
> statfoo(5)
[[1]]
[[1]]$mean
[1] 5
[[1]]$count
[1] 1
$plus
[1] 2
> statfoo(5)
[[1]]
[[1]]$mean
[1] 5
[[1]]$count
[1] 2
$plus
[1] 2
> statbar(4)
[[1]]
[[1]]$mean
[1] 4
[[1]]$count
[1] 3
$minus
[1] 3
> statfoo(5)
[[1]]
[[1]]$mean
[1] 5
[[1]]$count
[1] 4
$plus
[1] 2
A cleaner method would be to use an object-oriented approach. There is already an answer using reference classes.
A typical object-oriented approach with classes would create a class and then create a singleton object, i.e. a single object of that class. Of course, it is a bit wasteful to create a class only to create one object from it, so here we provide a proto example. (Creating a function to enclose count and the function doing the real work has a similar problem -- you create an enclosing function only to run it once.) The proto model allows one to create an object directly, bypassing the need to create a class only to use it once. Here foobar is the proto object with property count and methods stats, statfoo and statbar. Note that we factored out stats to avoid duplicating its code in each of statfoo and statbar. (continued further down)
library(proto)

foobar <- proto(count = 0,
    stats = function(., x) {
        .$count <- .$count + 1
        list(mean = mean(x), count = .$count)
    },
    statfoo = function(., x) c(.$stats(x), plus = 2),
    statbar = function(., x) c(.$stats(x), plus = 3)
)
foobar$statfoo(1:3)
foobar$statbar(2:4)
giving:
> foobar$statfoo(1:3)
$mean
[1] 2
$count
[1] 1
$plus
[1] 2
> foobar$statbar(2:4)
$mean
[1] 3
$count
[1] 2
$plus
[1] 3
A second design would be to have statfoo and statbar as independent functions and only keep count and stats in foobar. (continued further down)
library(proto)

foobar <- proto(count = 0,
    stats = function(., x) {
        .$count <- .$count + 1
        list(mean = mean(x), count = .$count)
    }
)
statfoo <- function(x) c(foobar$stats(x), plus = 2)
statbar <- function(x) c(foobar$stats(x), plus = 3)
statfoo(1:3)
statbar(2:4)
giving similar output to the prior example.
Third approach: Of course, the second variation could easily be implemented by using local and a function, getting us close to where you started. This does not use any packages and does not create a function only to throw it away:
foobar <- local({
    count <- 0
    function(x) {
        count <<- count + 1
        list(mean = mean(x), count = count)
    }
})
statfoo <- function(x) c(foobar(x), plus = 2)
statbar <- function(x) c(foobar(x), plus = 3)
statfoo(1:3)
statbar(2:4)
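For reference, this version should behave just like the proto versions above, with the counter shared across both wrappers. A sketch of the expected results, inferred from the code rather than a verified transcript:
statfoo(1:3)
# $mean
# [1] 2
# $count
# [1] 1
# $plus
# [1] 2
statbar(2:4)
# $mean
# [1] 3
# $count
# [1] 2
# $plus
# [1] 3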
Another simple option is to create an environment and assign it to both functions. Here I use simpler functions for illustrative purposes, but this can easily be extended:
f1 <- function() {count <<- count + 1; return(paste("hello", count))}
f2 <- function() {count <<- count + 1; return(paste("goodbye", count))}
environment(f1) <- environment(f2) <- list2env(list(count=0))
Then:
> f1()
[1] "hello 1"
> f2()
[1] "goodbye 2"
> f1()
[1] "hello 3"
Both functions have the same environment.
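A quick way to convince yourself that the state really is shared (a sketch; the count of 3 assumes exactly the three calls above):
identical(environment(f1), environment(f2))
# [1] TRUE
environment(f1)$count
# [1] 3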
You can use a reference class like this:
foobar <- setRefClass(
    'foobar',
    fields = list(count = 'numeric'),
    methods = list(
        initialize = function() {
            .self$initFields(count = 0L)
        },
        statfoo = function(x) {
            count <<- count + 1L
            list(list(mean = mean(x), count = count), plus = 2)
        },
        statbar = function(x) {
            count <<- count + 1L
            list(list(mean = mean(x), count = count), minus = 3)
        }
    )
)()
foobar$statfoo(5)
foobar$statbar(3)
It makes it relatively clear that neither statfoo nor statbar is a pure function.
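For completeness, the two calls above should print the same nested-list shape as in the question; sketched output, not a verified transcript:
> foobar$statfoo(5)
[[1]]
[[1]]$mean
[1] 5
[[1]]$count
[1] 1
$plus
[1] 2
> foobar$statbar(3)
[[1]]
[[1]]$mean
[1] 3
[[1]]$count
[1] 2
$minus
[1] 3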
You could get rid of the factory functions and use environments more explicitly. A solution like this would work as well:
.env <- (function() {
    count <- 0
    f <- function(x) {
        count <<- count + 1
        return(list(mean = mean(x), count = count))
    }
    return(environment())
})()
statfoo <- function(x) {
    list(.env$f(x), plus = 2)
}
statbar <- function(x) {
    list(.env$f(x), minus = 3)
}
The .env variable is created by immediately executing an anonymous function and capturing its environment. We then call the function stored in that environment; each call updates the count kept there.
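A short usage sketch (return values inferred from the code):
statfoo(5)    # list(list(mean = 5, count = 1), plus = 2)
statbar(4)    # list(list(mean = 4, count = 2), minus = 3)
.env$count    # 2 -- the shared counter lives in .env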
Given a regular R function f, I'd like to be able to create a new function f_debug that acts just like f, but lets me keep track of all the assignments to function-local variables that happened inside it.
For example:
f <- function(x, y) {
    z <- x + y
    df <- data.frame(z = z)
    df
}
# This function doesn't work as intended - would like it to (in the case of `f` above)
# write out a list containing `z` and `df` to an RDS file
capturing <- function(func) {
    e <- new.env()
    altered <- function(...) {
        parent <- parent.frame()
        e <- something...(func, environment(), parent, etc., etc.)
        result <- func(...)
        saveRDS(as.list(e), 'foo.rds')
        result
    }
    environment(func) <- e
    altered
}
f_debug <- capturing(f)
I'm not sure whether my knowledge gap to do this is large or small, anyone have a solution?
Solution 1: Steal the function's code
Here's a solution which doesn't return a new function that captures intermediate calculations, but rather calls the given function's code internally. There are some limitations; for instance, it probably only works with named arguments. Instead of storing the intermediate calculations as an RDS, it attaches them as an attribute.
capturing <- function(fun, ...) {
    fun <- match.fun(fun)
    code <- body(fun)
    parent <- environment(fun)
    env <- new.env(parent = parent)
    for (val in names(list(...))) {
        env[[val]] <- list(...)[[val]]
    }
    result <- eval(code, envir = env, enclos = parent.frame())
    attr(result, "intermediate") <- env
    result
}
my_add <- function(x, y) {
    z <- x + y
    u <- x - y
    w <- x * y
    x + y
}

intermediates <- function(x) {
    attr(x, "intermediate", exact = TRUE)
}
value <- capturing(my_add, x = 1, y = 7)
ls(envir = intermediates(value))
#> [1] "u" "w" "x" "y" "z"
intermediates(value)$x
#> [1] 1
# Created on 2022-02-08 by the reprex package (v2.0.1)
Solution 2: Modify the function's code
One weakness of this solution is that if the chosen function features a call to on.exit(add=FALSE), some additional work needs to be done to modify the function so the internal environment is captured. However, it does work when the function accepts ... arguments.
my_add <- function(x, y) {
    z <- x + y
    u <- x - y
    w <- x * y
    x + y
}

insert_capture <- function(code) {
    # `<<-` assigns into the global environment if no variable of the given name is found
    # while traveling up to the global environment. If you need this assignment to go elsewhere,
    # I'd recommend passing in `assign()`. Of course, you could also modify the `on.exit()`
    # to use saveRDS.
    parse(text = append(deparse(code),
                        "on.exit(._last_capture <<- environment(), add = TRUE)",
                        after = 1L))
}

capturing2 <- function(fun) {
    fun <- match.fun(fun)
    code <- insert_capture(body(fun))
    body(fun) <- code
    fun
}
my_add2 <- capturing2(my_add)
my_add2(1, 7)
#> [1] 8
ls(envir = ._last_capture)
#> [1] "u" "w" "x" "y" "z"
._last_capture$u
#> [1] -6
Created on 2022-02-08 by the reprex package (v2.0.1)
What you are describing is already implemented in base R with utils::dump.frames, in an even more sophisticated way. It saves the frame (environment) associated with each call in the call stack to an object of class "dump.frames", which you can explore retroactively with utils::debugger as if you had actually run your code under a debugger.
capturing <- function(func, ...) {
    cc <- as.call(c(quote(utils::dump.frames), list(...)))
    cc <- call("on.exit", cc, add = TRUE)
    body(func) <- call("{", cc, body(func))
    func
}
capturing injects the call on.exit(utils::dump.frames(...), add = TRUE) into the body of func and returns the modified function.
Here, ... is a list of arguments to dump.frames:
dumpto, a character string giving the name to be used for the "dump.frames" object
to.file, a logical flag indicating whether the "dump.frames" object should be assigned in the global environment or save-ed to paste0(dumpto, ".rda") in the current working directory
include.GlobalEnv, a logical flag indicating whether the global environment should be saved as well
A quick example, which you should try yourself:
tmp <- tempfile()
dir.create(tmp)
cwd <- setwd(tmp)

f <- function(x, y) {
    z <- x + y
    z + 1
}
g <- capturing(f, dumpto = "zzz", to.file = TRUE)

h <- function(a, b) {
    d <- g(a, b)
    d + 1
}
h12 <- h(1, 2)
load("zzz.rda")
zzz
## $`h(1, 2)`
## <environment: 0x14c16cb58>
##
## $`#2: g(a, b)`
## <environment: 0x14c16ca40>
##
## attr(,"error.message")
## [1] ""
## attr(,"class")
## [1] "dump.frames"
ls(zzz[[1L]])
## [1] "a" "b"
ls(zzz[[2L]])
## [1] "z" "x" "y"
utils::debugger(zzz)
## Message: Available environments had calls:
## 1: h(1, 2)
## 2: #2: g(a, b)
##
## Enter an environment number, or 0 to exit
## Selection: 2
## Browsing in the environment with call:
## #2: g(a, b)
## Called from: debugger.look(ind)
## Browse[1]> ls()
## [1] "x" "y" "z"
## Browse[1]> x == 1 && y == 2 && z == x + y
## [1] TRUE
## Browse[1]> Q
setwd(cwd)
unlink(tmp, recursive = TRUE)
See ?browser if you are unfamiliar with R's environment browser.
My capturing function has the limitation that on.exit calls in the body of func must also use add = TRUE. If you have written func yourself, then it is not much of a limitation at all, and passing add = TRUE is a good habit anyway.
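To illustrate that caveat with a hypothetical function (f_bad is my own example, not from the original answer): an on.exit call without add = TRUE discards any previously registered handlers, including the injected dump.frames call:
f_bad <- function(x) {
    on.exit(message("cleanup"))  # clobbers the handler injected by capturing()
    x + 1
}
g_bad <- capturing(f_bad, dumpto = "zzz", to.file = TRUE)
g_bad(1)  # prints "cleanup" and returns 2, but no zzz.rda is written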
Ultimately, there is no completely safe way to inject code into functions, but, in an interactive setting, I would say that this level of "unsafety" is fine.
I am scratching my head at the following problem:
I am creating two functions inside a for loop with parameters that depend on some dataframe. Each function is then put inside a list.
Printing the parameters inside the for loop shows that each function is well defined. Yet when I use them outside of the loop, only the last parameters are used for both functions. The following example should make that clearer.
dt <- data.frame(color = c("red", "blue"),
                 a = c(3, 9),
                 b = c(1.3, 1.8))

function_list <- list()
for (col in dt$color) {
    a <- dt$a[dt$color == col]
    b <- dt$b[dt$color == col]
    foo <- function(x) {
        a * x^b
    }
    print(paste(col, foo(1)))
    function_list[[col]] <- foo
}
[1] "red 3"
[1] "blue 9"
function_list[["red"]](1)
[1] 9
function_list[["blue"]](1)
[1] 9
To note, this is inspired by the following question: R nested for loop to write multiple functions and plot them
The equivalent solution with assign and get works (my answer to the previous question).
The relevant values of a and b are those at the time you call the function, not those at the time you define it. The way you create the list, they are taken from the global environment. The solution is to create closures. I'd use Map for this, but you can do the same with a for loop (see the sketch after this example):
funs <- Map(function(a, b) function(x) a*x^b, a = dt$a, b = dt$b)
print(funs)
#[[1]]
#function (x)
#a * x^b
#<environment: 0x000000000a9a4298>
#
#[[2]]
#function (x)
#a * x^b
#<environment: 0x000000000a9a3728>
Notice the different environments.
environment(funs[[1]])$a
#[1] 3
environment(funs[[2]])$a
#[1] 9
funs[[1]](1)
#[1] 3
funs[[2]](1)
#[1] 9
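The same closures can be built with a for loop by calling a factory function once per iteration; make_fun is a name I'm introducing here for illustration. Note the force() calls, which evaluate the promises before the loop moves on:
make_fun <- function(a, b) {
    force(a); force(b)
    function(x) a * x^b
}
funs <- list()
for (i in seq_len(nrow(dt))) {
    funs[[as.character(dt$color[i])]] <- make_fun(dt$a[i], dt$b[i])
}
funs[["red"]](1)   # 3
funs[["blue"]](1)  # 9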
Your confusion will be resolved by going a bit deeper into environments.
Let's check why your code doesn't work. When you print(function_list), you can see that both stored functions simply evaluate a*x^b:
# Part 1 : Why it doesn't work
# --------------------------
print(function_list)
# $red
# function (x)
# {
#     a * x^b
# }
#
# $blue
# function (x)
# {
#     a * x^b
# }
If you try to remove a and re-run the function, an error is returned.
rm(a)
function_list[['red']](1)
# Error in function_list[["red"]](1) : object 'a' not found
And now to how to make your code work:
There is more than one way to make it work, most of which require either playing around with your environments or changing the data structure.
One way to manage your environments - in such a way that the functions keep your values instead of searching for the variables in the global environment - is to return a function from a function.
# Part 2 : How to make it work
# ----------------------------
function_list <- list()
for (col in dt$color) {
    a <- dt$a[dt$color == col]
    b <- dt$b[dt$color == col]
    foo1 <- function(inner.a, inner.b) {
        return(function(x) {inner.a * x^inner.b})
    }
    foo2 <- foo1(a, b)
    print(paste(col, foo2(1)))
    function_list[[col]] <- foo2
}
Now, if we check what's in function_list, you can see that the functions live in two different environments:
print(function_list)
# $red
# function (x)
# {
#     inner.a * x^inner.b
# }
# <environment: 0x186fb40>
#
# $blue
# function (x)
# {
#     inner.a * x^inner.b
# }
# <environment: 0x2536438>
The output is also as expected. And even when we remove a, it will still work as expected.
function_list[['red']](1) # 3
function_list[['blue']](1) # 9
rm(a)
function_list[['red']](1) #[1] 3
I think that the for loop does not create new environments (you can check this with print(environment()) inside the loop), so the values of a and b used by foo are taken from the global environment, where they are 9 and 1.8, i.e. their last assigned values.
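If that is right, one fix in the spirit of the other answers (a sketch, not the only way) is to give each iteration its own environment with local():
function_list <- list()
for (col in dt$color) {
    function_list[[col]] <- local({
        a <- dt$a[dt$color == col]
        b <- dt$b[dt$color == col]
        function(x) a * x^b
    })
}
function_list[["red"]](1)   # 3
function_list[["blue"]](1)  # 9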
In the following example I created the add_timing function operator. The input is a function (say mean) and it returns a function that does the same as mean, but reports on how long it took for the function to complete. See the following example:
library(pryr)
add_timing = function(input_function, specific_info) {
    if (missing(specific_info)) specific_info = function(l) 'That'
    function(...) {
        relevant_value = specific_info(list(...))
        start_time = Sys.time()
        res = input_function(...)
        cat(sprintf('%s took', relevant_value), difftime(Sys.time(), start_time, units = 'secs'), 'sec', '\n')
        res
    }
}
timed_mean = add_timing(mean)
# > timed_mean(runif(10000000))
# That took 0.4284899 sec
# [1] 0.4999762
Next I tried to use pryr::compose to create the same timed_mean function (I like the syntax):
timed_mean_composed = pryr::compose(add_timing, mean)
But this does not get me the required output:
# > timed_mean_composed(runif(100))
# function(...) {
#     relevant_value = specific_info(list(...))
#     start_time = Sys.time()
#     res = input_function(...)
#     cat(sprintf('%s took', relevant_value), difftime(Sys.time(), start_time, units = 'secs'), 'sec', '\n')
#     res
# }
It seems that the compose operation does not lead to add_timing actually being executed. Only after calling the function does the new timed_mean_composed actually show the correct function output.
Based on the following example from Advanced R by Hadley Wickham, I expected this to work as I used it (see below for an excerpt):
dot_every <- function(n, f) {
    i <- 1
    function(...) {
        if (i %% n == 0) cat(".")
        i <<- i + 1
        f(...)
    }
}
download <- pryr::compose(
    partial(dot_every, 10),
    memoise,
    partial(delay_by, 1),
    download_file
)
Where the dot_every function operator is used in the same way I use add_timing above.
What am I missing?
The difference is that in your first attempt, you are calling
(add_timing(mean))(runif(1e7))
and with the compose syntax you are calling something more similar to
add_timing(mean(runif(1e7)))
These are not exactly equivalent. Actually, the pryr compose function is really expanding the syntax to something more like
x <- runif(1e7)
x <- mean(x)
x <- add_timing(x)
Maybe looking at this will help
a <- function(x) {print(paste("a:", x));x}
b <- function(x) {print(paste("b:", x));x}
x <- pryr::compose(a,b)(print("c"))
# [1] "c"
# [1] "b: c"
# [1] "a: c"
Notice how a isn't called until after b. This means that a would have no way to time b. compose would not be an appropriate way to create a timer wrapper.
The issue is that pryr::compose is aimed at doing something completely different from what you're trying to do in your initial example. You want to create a function factory (called add_timing), which will take a function as input and return a new function as output that does the same thing as the input function but with an additional time printing. I would write that as follows:
add_timing <- function(FUN) { function(...) { print(system.time(r <- FUN(...))); r }}
mean(1:5)
# [1] 3
add_timing(mean)(1:5)
# user system elapsed
# 0 0 0
# [1] 3
The compose function, by contrast, returns a function that represents a series of functions to be evaluated in sequence. The examples in ? compose are helpful here. Here's an example that builds on that:
add1 <- function(x) x + 1
times2 <- function(x) x * 2
# the following two are identical:
add1(1)
# [1] 2
compose(add1)(1)
# [1] 2
# the following two are identical:
times2(1)
# [1] 2
compose(times2)(1)
# [1] 2
compose becomes useful for nesting, when the order of nesting is important:
add1(times2(2))
# [1] 5
compose(add1, times2)(2)
# [1] 5
times2(add1(2))
# [1] 6
compose(times2, add1)(2)
# [1] 6
This means that the reason your example does not work is because your functions are not actually nested in the way that compose is intended to work. In your example, you're asking system.time to, for example, calculate the time to evaluate 3 (the output of mean) rather than the time to evaluate mean(1:5).
I have written a stack "class" with the following functions: add, push, pop, size, isEmpty, clear (and some more).
I'd like to use this "class" as a generic in R, so I may create multiple instances of stacks within my script. How do I go about doing this?
(I have "class" in quotes because my stack functions are written in a different script, not necessarily as the definition of a class per se.)
Thanks in advance
list <- ""
cursor = 0
#Initializes stack to empty
stack <- function(){
list <- c()
cursor = -1
assign("list",list,.GlobalEnv)
assign("cursor",cursor,.GlobalEnv)
}
#Where item is a item to be added to generic list
push <- function(item){
if(size(list) == 0){
add(item, -1)
}else{
add(item, 0)
}
assign("list",list,.GlobalEnv)
}
This is a simpler version of the stack implementation @GSee references that avoids using any of the formal object-orientation systems available in R. The simplification proceeds from the fact that all functions in R are closures, and functions created during a function call are bound to the environment created for that call.
new_stack <- function() {
    stack <- vector()
    push <- function(x) stack <<- c(stack, x)
    pop <- function() {
        tmp <- tail(stack, 1)
        stack <<- stack[-length(stack)]
        return(tmp)
    }
    structure(list(pop = pop, push = push), class = 'stack')
}
x <- new_stack()
x$push(1:3)
x$pop()
# [1] 3
x$pop()
# [1] 2
Here's an S4 implementation, for comparison.
setClass('Stack',
    representation(list = 'list', cursor = 'numeric'),  # type defs
    prototype(list = list(), cursor = NA_real_))        # default values

setGeneric('push', function(obj, ...) standardGeneric('push'))
setMethod('push', signature(obj = 'Stack'),
    function(obj, x) {
        obj@list <- c(x, obj@list)
        obj
    })

setGeneric('pop', function(obj, ...) standardGeneric('pop'))
setMethod('pop', signature(obj = 'Stack'),
    function(obj) {
        obj@cursor <- obj@list[[1]]
        obj@list <- obj@list[-1]
        obj
    }
)
x <- new('Stack')

# cursor is empty to start
x@cursor
# [1] NA

# add items
x <- push(x, 1)
x <- push(x, 2)

# pop them (move next item to cursor, remove from list)
x <- pop(x)
x@cursor
# [1] 2
x <- pop(x)
x@cursor
# [1] 1
Since you are specifically talking about a stack "class" with push and pop methods, here's an implementation by Jeff Ryan taken from Introducing Closures which you can read for an explanation of what's going on here.
new_stack <- function() {
    stack <- new.env()
    stack$.Data <- vector()
    stack$push <- function(x) .Data <<- c(.Data, x)
    stack$pop <- function() {
        tmp <- .Data[length(.Data)]
        .Data <<- .Data[-length(.Data)]
        return(tmp)
    }
    environment(stack$push) <- as.environment(stack)
    environment(stack$pop) <- as.environment(stack)
    class(stack) <- "stack"
    stack
}
> x <- new_stack()
> x$push(1:3)
> x$pop()
[1] 3
> x$pop()
[1] 2
Then, if you create S3 generics...
push <- function(x, value, ...) UseMethod("push")
pop <- function(x, ...) UseMethod("pop")
push.stack <- function(x, value, ...) x$push(value)
pop.stack <- function(x) x$pop()
> push(x, 5)
> pop(x)
[1] 5
> pop(x)
[1] 1
I would like to write a function that handles multiple data types. Below is an example that works but seems clunky. Is there a standard (or better) way of doing this?
(It's times like this I miss Matlab where everything is one type :>)
myfunc = function(x) {
    # does some stuff to x and returns a value
    # at some point the function will need to find out the number of elements
    # at some point the function will need to access an element of x.
    #
    # args:
    #   x: a column of data taking on many possible types
    #      e.g., vector, matrix, data.frame, timeSeries, list
    x.vec <- as.vector(as.matrix(as.data.frame(x)))
    n <- length(x.vec)
    ret <- x.vec[n/3]  # this line only for concreteness
    return(ret)
}
Use S3 methods. A quick example to get you started:
myfunc <- function(x) {
    UseMethod("myfunc", x)
}

myfunc.data.frame <- function(x) {
    x.vec <- as.vector(as.matrix(x))
    myfunc(x.vec)
}

myfunc.numeric <- function(x) {
    n <- length(x)
    ret <- x[n/3]
    return(ret)
}

myfunc.default <- function(x) {
    stop("myfunc not defined for class ", class(x), "\n")
}
Two notes:
1. The ... syntax passes any additional arguments on to functions. If you're extending an existing S3 method (e.g. writing something like summary.myobject), then including the ... is a good idea, because you can pass along arguments conventionally given to the canonical function.
print.myclass <- function(x, ...) {
    print(x$keyData, ...)
}
2. You can call functions from other functions and keep things nice and parsimonious, as the example below shows.
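For instance, the myfunc.data.frame method above just coerces its input and calls myfunc() again, so dispatch lands on myfunc.numeric and the core logic is written only once (expected value sketched from the code, not a verified transcript):
myfunc(data.frame(x = 1:9))  # data.frame method delegates to the numeric method
# [1] 3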
Hmm, your documentation for the function is
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
and if one supplies an object as you claim is required, isn't it already a vector and not a matrix or a data frame, hence obviating the need for separate methods/specific handling?
> dat <- data.frame(A = 1:10, B = runif(10))
> class(dat[,1])
[1] "integer"
> is.vector(dat[,1])
[1] TRUE
> is.vector(dat$A)
[1] TRUE
> is.numeric(dat$A)
[1] TRUE
> is.data.frame(dat$A)
[1] FALSE
I would:
myfunc <- function(x) {
    # args:
    #   x: a column of data taking on many possible types
    #      e.g., vector, matrix, data.frame, timeSeries, list
    n <- length(x)
    ret <- x[n/3]  # this line only for concreteness
    return(ret)
}
> myfunc(dat[,1])
[1] 3
Now, if you want to handle different types of objects and extract a column, then S3 methods would be the way to go. Perhaps your example is oversimplified for actual use? Anyway, S3 methods would be something like:
myfunc <- function(x, ...)
    UseMethod("myfunc", x)

myfunc.matrix <- function(x, j = 1, ...) {
    x <- x[, j]
    myfunc.default(x, ...)
}

myfunc.data.frame <- function(x, j = 1, ...) {
    x <- data.matrix(x)
    myfunc.matrix(x, j, ...)
}

myfunc.default <- function(x, ...) {
    n <- length(x)
    x[n/3]
}
Giving:
> myfunc(dat)
[1] 3
> myfunc(data.matrix(dat))
[1] 3
> myfunc(data.matrix(dat), j = 2)
[1] 0.2789631
> myfunc(dat[,2])
[1] 0.2789631
You probably should use an S3 method to write a function that will handle multiple data types.
A good reference is here: http://www.biostat.jhsph.edu/~rpeng/biostat776/classes-methods.pdf