I have the following snippet
library(magrittr)
test_fun <- function(x) {
foo <- x %>%
assign("boo", .)
boo
}
test_fun("hello") # I want this to return "hello"
# Error in test_fun("hello") : object 'boo' not found
I'd like to be able to assign values to names in the scope of the function. Is there a way to do it?
EDIT:
The reason behind the pipe here is that, in my actual use case, I'd like to save some intermediate result that can be referred to later on in the pipeline. Put another way, instead of writing e.g.
foo1 <- data %>% ...
foo2 <- foo1 %>% ...
foo3 <- some manipulation with foo1 and foo2
I can do something like
foo <- data %>% ... %>% (assign to foo2) %>% ... %>% some manipulation with foo2
I'm under the impression that this would be a "cleaner" way to code, but am happy to learn otherwise if it's not good practice.
I don't necessarily recommend this (without more understanding of all the requirements), but the issue you are having is with the scope that assign sees at the time of execution. Regardless of where it is looking, what's important is that it is in fact assigning to a variable named boo but not in the scope of test_fun nor in a way that is easily retrievable later.
A quick solution might be to do something like:
test_fun <- function(x) {
myenv <- environment()
foo <- x %>%
assign("boo", ., envir = myenv)
# something else with 'foo'? side-effect? unknown ...
boo
}
test_fun("hello")
# [1] "hello"
Since assign invisibly returns the value stored ("." in your context), you should be able to keep the pipeline moving after it.
Related
I'm trying to get a better understanding of closures, in particular details on a function's scope and how to work with its enclosing environment(s)
Based on the Description section of the help page on rlang::fn_env(), I had the understanding, that a function always has access to all variables in its scope and that its enclosing environment belongs to that scope.
But then, why isn't it possible to manipulate the contents of the closure environment "after the fact", i.e. after the function has been created?
By means of R's lexical scoping, shouldn't bar() be able to find x when I put into its enclosing environment?
foo <- function(fun) {
env_closure <- rlang::fn_env(fun)
env_closure$x <- 5
fun()
}
bar <- function(x) x
foo(bar)
#> Error in fun(): argument "x" is missing, with no default
Ah, I think I got it down now.
It has to do with the structure of a function's formal arguments:
If an argument is defined without a default value, R will complain when you call the function without specifiying that even though it might technically be able to look it up in its scope.
One way to kick off lexical scoping even though you don't want to define a default value would be to set the defaults "on the fly" at run time via rlang::fn_fmls().
foo <- function(fun) {
env_enclosing <- rlang::fn_env(fun)
env_enclosing$x <- 5
fun()
}
# No argument at all -> lexical scoping takes over
baz <- function() x
foo(baz)
#> [1] 5
# Set defaults to desired values on the fly at run time of `foo()`
foo <- function(fun) {
env_enclosing <- rlang::fn_env(fun)
env_enclosing$x <- 5
fmls <- rlang::fn_fmls(fun)
fmls$x <- substitute(get("x", envir = env_enclosing, inherits = FALSE))
rlang::fn_fmls(fun) <- fmls
fun()
}
bar <- function(x) x
foo(bar)
#> [1] 5
I can't really follow your example as I am unfamiliar with the rlang library but I think a good example of a closure in R would be:
bucket <- function() {
n <- 1
foo <- function(x) {
assign("n", n+1, envir = parent.env(environment()))
n
}
foo
}
bar <- bucket()
Because bar() is define in the function environment of bucket then its parent environment is bucket and therefore you can carry some data there. Each time you run it you modify the bucket environment:
bar()
[1] 2
bar()
[1] 3
bar()
[1] 4
Imagine you have a simple function that specifies which statistical tests to run for each variable. Its syntax, simplified for the purposes of this question is as follows:
test <- function(...) {
x <- list(...)
return(x)
}
which takes argument pairs such as Gender = 'Tukey', and intends to pass its result to other functions down the line. The output of test() is as follows:
test(Gender = 'Tukey')
# $Gender
# [1] "Tukey"
What is desired is the ability to replace the literal Gender by a dynamically assigned variable varname (e.g., for looping purposes). Currently what happens is:
varname <- 'Gender'
test(varname = 'Tukey')
# $varname
# [1] "Tukey"
but what is desired is this:
varname <- 'Gender'
test(varname = 'Tukey')
# $Gender
# [1] "Tukey"
I tried tinkering with functions such as eval() and parse(), but to no avail. In practice, I resolved the issue by simply renaming the resulting list, but it is an ugly solution and I am sure there is an elegant R way to achieve it. Thank in advance for the educational value of your answer.
NB: This question occurred to me while trying to program a custom function which uses mcp() from the effects package in its internals. The said mcp() function is the real world counterpart of test().
EDIT1: Perhaps it needs to be clarified that (for educational purposes) changing test() is not an option. The question is about how to pass the tricky argument to test(). If you take a look at NB, it becomes clear why: the real world counterpart of test(), namely mcp(), comes with a package. And while it is possible to create a modified copy of it, I am really curious whether there exists a simple solution in somehow 'converting' the dynamically assigned variable to a literal in the context of dot-arguments.
This works:
test <- function(...) {
x = list(...)
names(x) <- sapply(names(x),
function(p) eval(as.symbol(p)))
return(x)
}
apple = "orange"
test(apple = 5)
We can use
test <- function(...) {
x <- list(...)
if(exists(names(x))) names(x) <- get(names(x))
x
}
test(Gender = 'Tukey')
#$Gender
#[1] "Tukey"
test(varname = 'Tukey')
#$Gender
#[1] "Tukey"
What about this:
varname <- "Gender"
args <- list()
args[[varname]] <- "Tukey"
do.call(test, args)
I am looking for advice on the best way to code passing a list of functions as argument.
What I want to do:
I would like to pass as an argument a list a functions to apply them to a specific input. And give output name based on those.
A "reproducible" example
input = 1:5
Functions to pass are mean, min
Expected call:
foo(input, something_i_ask_help_for)
Expected output:
list(mean = 3, min = 1)
If it's not perfectly clear, please see my two solutions to have an illustration.
Solution 1: Passing functions as arguments
foo <- function(input, funs){
# Initialize output
output = list()
# Compute output
for (fun_name in names(funs)){
# For each function I calculate it and store it in output
output[fun_name] = funs[[fun_name]](input)
}
return(output)
}
foo(1:5, list(mean=mean, min=min))
What I don't like with this method is that we can't call it by doing: foo(1:5, list(mean, min)).
Solution 2: passing functions names as argument and using get
foo2 <- function(input, funs){
# Initialize output
output = list()
# Compute output
for (fun in funs){
# For each function I calculate it and store it in output
output[fun] = get(fun)(input)
}
return(output)
}
foo2(1:5, c("mean", "min"))
What i don't like with this method is that we are not really passing the function-object as argument.
My question:
Both ways works, but I not quite sure which one to choose.
Could you help me by:
Telling me which one is the best?
What are the advantages and defaults of each methods?
Is there a another (better) method
If you need any more information, don't hesitate to ask.
Thanks!
Simplifying solutions in question
The first of the solutions in the question requires that the list be named and the second requires that the functions have names which are passed as character strings. Those two user interfaces could be implemented using the following simplifications. Note that we add an envir argument to foo2 to ensure function name lookup occurs as expected. Of those the first seems cleaner but if the functions were to be used interactively and less typing were desired then the second does do away with having to specify the names.
foo1 <- function(input, funs) Map(function(f) f(input), funs)
foo1(1:5, list(min = min, max = max)) # test
foo2 <- function(input, nms, envir = parent.frame()) {
Map(function(nm) do.call(nm, list(input), envir = envir), setNames(nms, nms))
}
foo2(1:5, list("min", "max")) # test
Alternately we could build foo2 on foo1:
foo2a <- function(input, funs, envir = parent.frame()) {
foo1(input, mget(unlist(funs), envir = envir, inherit = TRUE))
}
foo2a(1:5, list("min", "max")) # test
or base the user interface on passing a formula containing the function names since formulas already incorporate the notion of environment:
foo2b <- function(input, fo) foo2(input, all.vars(fo), envir = environment(fo))
foo2b(1:5, ~ min + max) # test
Optional names while passing function itself
However, the question indicates that it is preferred that
the functions themselves be passed
names are optional
To incorporate those features the following allows the list to have names or not or a mixture. If a list element does not have a name then the expression defining the function (usually its name) is used.
We can derive the names from the list's names or when a name is missing we can use the function name itself or if the function is anonymous and so given as its definition then the name can be the expression defining the function.
The key is to use match.call and pick it apart. We ensure that funs is a list in case it is specified as a character vector. match.fun will interpret functions and character strings naming functions and look them up in the parent frame so we use a for loop instead of Map or lapply in order that we not generate a new function scope.
foo3 <- function(input, funs) {
cl <- match.call()[[3]][-1]
nms <- names(cl)
if (is.null(nms)) nms <- as.character(cl)
else nms[nms == ""] <- as.character(cl)[nms == ""]
funs <- as.list(funs)
for(i in seq_along(funs)) funs[[i]] <- match.fun(funs[[i]])(input)
setNames(funs, nms)
}
foo3(1:5, list(mean = mean, min = "min", sd, function(x) x^2))
giving:
$mean
[1] 3
$min
[1] 1
$sd
[1] 1.581139
$`function(x) x^2`
[1] 1 4 9 16 25
One thing that you are missing is replacing the for loops with lapply. Also for functional programming it is often good practice to separate functions to do one thing. I personally like the version from solution 1 where you pass the functions in directly because it avoids another call in R and therefore is more efficient. In solution 2, it is best to use match.fun instead of get. match.fun is stricter than get in searching for functions.
x <- 1:5
foo <- function(input, funs) {
lapply(funs, function(fun) fun(input))
}
foo(x, c(mean=mean, min=min))
The above code simplifies your solution 1. To add to this function, you could add some error handling such as is.numeric for x and is.function for funs.
It seems possible to assign a vector of functions in R like this:
F <- c(function(){return(0)},function(){return(1)})
so that they can be invoked like this (for example): F[[1]]().
This gave me the impression I could do this:
DF <- data.frame(F=c(function(){return(0)}))
which results in the following error
Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot
coerce class ""function"" to a data.frame
Does this mean it is not possible to put functions into a data frame? Or am I doing something wrong?
No, you cannot directly put a function into a data-frame.
You can, however, define the functions beforehand and put their names in the data frame.
foo <- function(bar) { return( 2 + bar ) }
foo2 <- function(bar) { return( 2 * bar ) }
df <- data.frame(c('foo', 'foo2'), stringsAsFactors = FALSE)
Then use do.call() to use the functions:
do.call(df[1, 1], list(4))
# 6
do.call(df[2, 1], list(4))
# 8
EDIT
The above work around will work as long as you have a named function.
The issue seems to be that R see's the class of the object as a function, looks up the appropriate method for as.data.frame() (i.e. as.data.frame.function()) but can't find it. That causes a call to as.data.frame.default() which pretty must is a wrapper for a stop() call with the message you reported.
In short, they just seem not to have implemented it for that class.
While you can't put a function or other object directly into a data.frame, you can make it work if you go via a matrix.
foo <- function() {print("qux")}
m <- matrix(c("bar", foo), nrow=1, ncol=2)
df <- data.frame(m)
df$X2[[1]]()
Yields:
[1] "qux"
And the contents of df look like:
X1 X2
1 bar function () , {, print("qux"), }
Quite why this works while the direct path does not, I don't know. I suspect that doing this in any production code would be a "bad thing".
I'm working on an R package that has a number of functions that follow a non-R-standard practice of modifying in place the object passed in as an argument. This normally works OK, but fails when the object to be modified is on a list.
An function to give an example of the form of the assignments:
myFun<-function(x){
xn <- deparse(substitute(x))
ev <- parent.frame()
# would do real stuff here ..
# instead set simple value to modify local copy
x[[1]]<-"b"
# assign in parent frame
if (exists(xn, envir = ev))
on.exit(assign(xn, x, pos = ev))
# return invisibly
invisible(x)
}
This works:
> myObj <-list("a")
> myFun(myObj)
> myObj
[[1]]
[1] "b"
But it does not work if the object is a member of a list:
> myObj <-list("a")
> myList<-list(myObj,myObj)
> myFun(myList[[1]])
> myList
[[1]]
[[1]][[1]]
[1] "a"
[[2]]
[[2]][[1]]
[1] "a"
After reading answers to other questions here, I see the docs for assign clearly state:
assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc.
Since there is an existing codebase using these functions, we cannot abandon the modify-in-place syntax. Does anyone have suggestions for workarounds or alternative approaches for modifying objects which are members of a list in a parent frame?
UPDATE:
I've considered trying to roll my own assignment function, something like:
assignToListInEnv<-function(name,env,value){
# assume name is something like "myList[[1]]"
#check for brackets
index<-regexpr('[[',name,fixed=TRUE)[1]
if(index>0){
lname<-substr(name,0,index-1)
#check that it exists
if (exists(lname,where=env)){
target<-get(lname,pos=env)
# make sure it is a list
if (is.list(target)){
eval(parse(text=paste('target',substr(name,index,999),'<-value',sep='')))
assign(lname, target, pos = env)
} else {
stop('object ',lname,' is not a list in environment ',env)
}
} else {
stop('unable to locate object ',lname,' in frame ',env)
}
}
}
But it seems horrible brittle, would need to handle many more cases ($ and [ as well as [[) and would probably still fail for [[x]] because x would be evaluated in the wrong frame...
Since it was in the first search results to my query, here's my solution :
You can use paste() with "<<-" to create an expression which will assign the value to your list element when evaluated.
assignToListInEnv<-function(name, value, env = parent.frame()){
cl <- as.list(match.call())
lang <- str2lang(paste(cl["name"], "<<-", cl["value"]))
eval(lang, envir = env)
}
EDIT : revisiting this answer because it got a vote up
I'm not sure why I used <<- instead of <-. If using the 'env' argument, <<-with assign to the parent.frame of that env.
So if you always want it to be the first parent.frame it can just be :
assignToListInParentFrame<-function(name, value){
cl <- as.list(match.call())
paste(cl["name"], "<<-", cl["value"]) |>
str2lang() |>
eval()
}
and if you want to precise in which env to modify the list :
assignToListInEnv<-function(name, value, env){
cl <- as.list(match.call())
paste(cl["name"], "<-", cl["value"]) |>
str2lang() |>
eval(envir = env)
}