I've got a counter function that I like to wrap around another function ("fun") to help keep track of how many times I've called it. I keep track of the calls by creating a new environment "counter.env" if it doesn't already exist and storing the count there.
counter <- function(fun) {
if (!exists("counter.env", envir = .GlobalEnv)) {
counter.env <<- new.env(parent = globalenv())
assign("i", 0, envir = counter.env)
}
function(...) {
local(i <- i+1, env = counter.env)
fun(...)
}
}
Also I have a function "get_calls" which is simply a call to get the count from the environment. I'd like it to run a 0 in case the user calls this before the actual function they're calling, for whatever reason they'd do this.
get_calls <- function() {
if (!exists("counter.env", envir = .GlobalEnv)) {
counter.env <<- new.env(parent = .GlobalEnv)
assign("i", 0, envir = counter.env)
}
get("i", envir = counter.env)
}
Finally lets say the function I'm wrapping is a function with its own argument, "fun(arg1)". So I wrap it.
count.and.call <- counter(fun)
And I call it like this:
count.and.call(arg1)
Immediately "counter.env" is created in my global environment and I can return the call with get_calls.
Now, drum roll When I put these functions in a package, and I build the package, and run
count.and.call(arg1)
the counter.env is not created in the global env. and it shows
error in eval(quote(i <- i + 1), counter.env) :
object 'counter.env' not found
My immediate concern is to fix my counter, which is probably something to do with the environment scoping.
However I am also not sure if I have used the best practices for my counter function, if so, could I get some advice?
The best practice is that your package should not meddle with the global environment. If you want to store state, create an environment for it in your package's namespace. You don't even have to specify the location yourself, it happens automatically by default.
In a source file:
counter.env <- new.env()
# this gets run every time your package is loaded
.onLoad <- function(libname, pkgname)
{
counter.env$i <- 0
}
counter <- function(fun)
{
# do stuff...
counter.env$i <- counter.env$i + 1
}
reset_counter <- function()
{
counter.env$i <- 0
}
# necessary if you want the user to see the counter and you don't export counter.env
get_counter <- function()
{
counter.env$i
}
Another way very R-ish way to do this is using closures. For example:
countingFun <- function(fun) {
count <- 0
function(x) {
count <<- count + 1
fun(x)
}
}
count <- function(fun) {
environment(fun)$count
}
This keeps the count in the environment of the function, which is created automatically, containing all the variables that are local to the call to countingFun. Then you can do
myMean <- countingFun(mean)
mySd <- countingFun(sd)
myMean(x)
mySd(x)
myMean(x)
count(myMean) # 2
count(mySd) # 1
You might want to add some error checking to count, to make sure it isn't being called on a function that isn't being counted.
Related
I am doing some heavy computations which I would like to speed up by performing it in a parallel loop. Moreover, I want the result of each calculation to be assigned to the global environment based on the name of the data currently processed:
fun <- function(arg) {
assign(arg, arg, envir = .GlobalEnv)
}
For loop
In a simple for loop, that would be the following and this works just fine:
for_fun <- function() {
data <- letters[1:10]
for(i in 1:length(data)) {
dat <- quote(data[i])
call <- call("fun", dat)
eval(call)
}
}
# Works as expected
for_fun()
In this function, I first get some data, loop over it, quote it (although not necessary) to be used in a function call. In reality, this function name is also dynamic which is why I am doing it this way.
Foreach
Now, I want to speed this up. My first thought was to use the foreach package (with a doParallel backend):
foreach_fun <- function() {
# Set up parallel backend
cl <- parallel::makeCluster(parallel::detectCores())
doParallel::registerDoParallel(cl)
data <- letters[1:10]
foreach(i = 1:length(data)) %dopar% {
dat <- quote(data[i])
call <- call("fun", dat)
eval(call)
}
# Stop the parallel backend
parallel::stopCluster(cl)
doParallel::stopImplicitCluster()
}
# Error in { : task 1 failed - "could not find function "fun""
foreach_fun()
Replacing the whole quote-call-eval procedure with simply fun(data[i]) resolves the error but still nothing gets assigned.
Future
To ensure it wasn't a problem with the foreach package, I also tried the future package (although I am not familiar with it).
future_fun <- function() {
# Plan a parallel future
cl <- parallel::makeCluster(parallel::detectCores())
future::plan(cluster, workers = cl)
data <- letters[1:10]
# Create an explicit future
future(expr = {
for(i in 1:length(data)) {
dat <- quote(data[i])
call <- call("fun", dat)
eval(call)
}
})
# Stop the parallel future
parallel::stopCluster(cl)
future::plan(sequential)
}
# No errors but nothing assigned
# probably the future was never evaluated
future_fun()
Forcing the future to be evaluated (f <- future(...); value(f)) triggers the same error as by using foreach: Error in { : task 1 failed - "could not find function "fun""
Summary
In short, my questions are:
How do you assign variables to the global environment in a parallel loop?
Why does the function lookup fail?
The following code is supposed to change the value of the enclosed variable some.var calling the function set.var. The latter calls itself the outer function g, whose environment is changed to the parent environment of set.var
new.obj <- function(){
some.var = NULL
set.var <- function(...) {
environment(g) <- parent.frame()
g(x="some.var", ...)
}
get.var <- function(){some.var}
return(list(set.var=set.var, get.var=get.var))
}
g <- function(x) assign(x,1)
However, obj<-new.obj(); obj$set.var(); obj$get.var() returns NULL. What goes wrong here and how can the behaviour be fixed?
The reason why I am considering this construction is, that I would like to reuse the code within g in different closures. Hence, it should be placed outside of these.
I suspect that you are making things more complicated than necessary. Take a look at this example and comment if you need to achieve something that this cannot do:
do_the_job <- function(x) {
return(x * 1000)
}
wrapper_function <- function(y) {
return( do_the_job(y) )
}
my_value <- 5
my_new_value <- wrapper_function(my_value)
Not entirely sure what I'm doing here myself.
I hope it helps/inspires you:
new.obj <- function(){
env1 <- new.env()
env1$some.var = NULL
f <- function() {
environment(g) <- parent.frame()
g(x="some.var", envir = env1)
}
get.var <- function(){ env1$some.var }
return(list(f=f, get.var=get.var))
}
g <- function(x, ...) assign(x, "hihi_changed", envir = ...)
obj<-new.obj(); obj$get.var() null is returned as we expect.
obj$f(); obj$get.var() function g is called eventually that changes some.var.
My trick is to add the variable some.var to a new environment and always refer to that env1 environment.
So in function g(), always use the ellipsis to refer to the new.obj environment env1, where currently some.var lives.
Hope this keeps you going.
I'm having a little trouble understanding why, in R, the two functions below, functionGen1 and functionGen2 behave differently. Both functions attempt to return another function which simply prints the number passed as an argument to the function generator.
In the first instance the generated functions fail as a is no longer present in the global environment, but I don't understand why it needs to be. I would've thought it was passed as an argument, and is replaced with aNumber in the namespace of the generator function, and the printing function.
My question is: Why do the functions in the list list.of.functions1 no longer work when a is not defined in the global environment? (And why does this work for the case of list.of.functions2 and even list.of.functions1b)?
functionGen1 <- function(aNumber) {
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
functionGen2 <- function(aNumber) {
thisNumber <- aNumber
printNumber <- function() {
print(thisNumber)
}
return(printNumber)
}
list.of.functions1 <- list.of.functions2 <- list()
for (a in 1:2) {
list.of.functions1[[a]] <- functionGen1(a)
list.of.functions2[[a]] <- functionGen2(a)
}
rm(a)
# Throws an error "Error in print(aNumber) : object 'a' not found"
list.of.functions1[[1]]()
# Prints 1
list.of.functions2[[1]]()
# Prints 2
list.of.functions2[[2]]()
# However this produces a list of functions which work
list.of.functions1b <- lapply(c(1:2), functionGen1)
A more minimal example:
functionGen1 <- function(aNumber) {
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#Error in print(aNumber) : object 'a' not found
Your question is not about namespaces (that's a concept related to packages), but about variable scoping and lazy evaluation.
Lazy evaluation means that function arguments are only evaluated when they are needed. Until you call myfun it is not necessary to evaluate aNumber = a. But since a has been removed then, this evaluation fails.
The usual solution is to force evaluation explicitly as you do with your functionGen2 or, e.g.,
functionGen1 <- function(aNumber) {
force(aNumber)
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#[1] 1
I have a variable in my global environment called myList. I have a function that modifies myList and re-assigns it to the global environment called myFunction. I only want myList to be modified by myFunction. Is there a way to prevent any other function from modifying myList?
For background, I am building a general tool for R users. I don't want users of the tool to be able to define their own function to modify myList. I also don't want to myself to be able to modify myList with a function I may write in the future.
I have a potential solution, but I don't like it. When the tool is executed, I could examine the text of every function defined by a user and search for the text that will assign myList to the global environment. I don't like the fact that I need to search over all functions.
Does anyone know if what I am looking for is implementable in R? Thanks for any help that can be provided.
For a reproducible example. I need code that will make the following example possible:
assign('myList', list(), envir = globalenv())
myFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
userFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
myFunction() # I need some code that will allow this function to run successfully
userFunction() # and cause an error when this function runs
Sounds like you need the modules package.
Basically, each unit of code has its own scope.
e.g.
# install.packages("modules")
# Load library
library("modules")
# Create a basic module
m <- module({
.myList <- list()
myFunction <- function() {
.myList <<- c(.myList, 'test')
}
get <- function() .myList
})
# Accessor
m$get()
# list()
# Your function
m$myFunction()
# Modification
m$get()
# [[1]]
# [1] "test"
Note, we tweaked the example slightly by changing the variable name to .myList from myList. So, we'll need to update that in the userfunction()
userFunction <- function() {
.myList <- c(.myList, 'test')
}
Running this, we now get:
userFunction()
# Error in userFunction() : object '.myList' not found
As desired.
For more detailed examples see modules vignette.
The alternative is you can define an environment (new.env()) and then lock it after you have loaded myList.
This is all around a bad idea. Beginning with assignment into the global environment (I'd never use a package that does this) to surprising your users. You should probably just use S4 or reference classes.
Anyway, you can lock the bindings (or environment if you followed better practices). You wouldn't stop an advanced user with that, but they would at least know that you don't want them to change the object.
createLocked <- function(x, name, env) {
assign(name, x, envir = env)
lockBinding(name, env)
invisible(NULL)
}
createLocked(list(), "myList", globalenv())
myFunction <- function() {
unlockBinding("myList", globalenv())
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
lockBinding("myList", globalenv())
invisible(NULL)
}
userFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
myFunction() # runs successfully
userFunction()
#Error in assign("myList", myList, envir = globalenv()) :
# cannot change value of locked binding for 'myList'
In my .Rprofile I have the following two lines defined in my .First
makeActiveBinding(".refresh", function() { system("R"); q("no") }, .GlobalEnv)
makeActiveBinding('.rm', function() {rm(list=ls(envir = .GlobalEnv),envir=.GlobalEnv); gc()}, .GlobalEnv)
They're usually harmless, unless I type them by accident! The first makes a .refresh function that will quit and restart the R session. The second empties the global environment. However, when using the tables() function from data.table these two functions are run which isn't exactly desirable.
For the moment, I've removed them from my .First but I'm curious if there is a way to avoid this. The offending lines in the tables() function are:
tt = objects(envir = env, all.names = TRUE)
ss = which(as.logical(sapply(tt, function(x) is.data.table(get(x,
envir = env)))))
I think you just discovered a downside to using active bindings in that way. Why don't you instead create ordinary functions .rm and .refresh, that you call in the usual way (i.e. .rm() and .refresh()), and which won't be executed upon simple inspection?
Here's what part of your .First might then look like:
.First <- function() {
assign(".rm",
function() {rm(list=ls(envir=.GlobalEnv), envir=.GlobalEnv)},
pos = .GlobalEnv)
}
## Try it out
j <- 1:10
ls()
.First()
.rm()
ls()
Edit, with solution:
On further thought, this seems to work, only executing the core bits when .rm is 'called' directly. It works by inspecting the length of the call stack, and only running rm(...) if there is just one call in it (representing the current call to .rm(). If .rm is called/touched by a call to some other function (e.g. tables()), the call stack will be longer, and rm() won't be executed.:
makeActiveBinding('.rm',
function() {
if(length(sys.calls())==1) {
rm(list=ls(envir = .GlobalEnv),envir=.GlobalEnv); gc()
}
},
.GlobalEnv)
## Try _it_ out
library(data.table)
j <- 100
.rm
ls()
j <- 100
tables()
ls()