Here is a toy example to illustrate my problem.
library(foreach)
library(doMC)
registerDoMC(cores=2)
foreach(i = 1:2) %dopar% {
  i + 2
}
[[1]]
[1] 3
[[2]]
[1] 4
So far so good...
But if the code i + 2 is saved in the file addition.R and I call that file using source(), then I get an error:
> foreach(i = 1:2) %dopar%{
+ source("addition.R")
+ }
Error in { : task 1 failed - "object 'i' not found"
I cannot fully reproduce your toy example, but I had a similar problem, which I was able to solve by:
source(file, local = TRUE)
which should evaluate the sourced code in the local environment, where i is defined.
The comment by NiceE and the answer by Sosel already address this: calling source(file) defaults to source(file, local = FALSE), which means that the code in the sourced file is evaluated in the global environment (the "user's workspace"), cf. ?source. Note that there is no variable i in the global environment. The solution is to make sure the file is sourced in the environment that calls it, i.e. to use source(file, local = TRUE).
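The difference between the two settings can be checked without foreach at all. The following is a minimal sketch, assuming a throwaway script written to a temporary file in place of addition.R. The helper name run_script is made up for illustration, and the failing case assumes that no variable named i exists in the global environment.

```r
# Stand-in for addition.R: a one-line script written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines("i + 2", script)

# Hypothetical helper: 'i' exists only inside this function's frame.
run_script <- function(i, local) {
  # source() returns a list whose $value is the last evaluated expression.
  source(script, local = local)$value
}

# local = TRUE evaluates the script in run_script()'s frame, where i lives.
ok <- run_script(1, local = TRUE)

# local = FALSE (the default) evaluates in the global environment, so the
# lookup of i fails as long as no global i exists (assumption).
failed <- tryCatch(run_script(1, local = FALSE), error = function(e) e)

print(ok)
print(inherits(failed, "error"))
```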
Solution:
library("foreach")
y <- foreach(i = 1:2) %dopar% {
  i + 2
}
str(y)
doMC::registerDoMC(cores = 2L)
y <- foreach(i = 1:2) %dopar% {
  source("addition.R", local = TRUE)
}
str(y)
Example of the same problem with a for() loop:
The fact that source() evaluates in the global environment, which is different from the calling environment where i lives, can also be illustrated with a regular for loop. Run the loop in an environment other than the global one, e.g. inside a function or via local():
local({
  for(i in 1:2) {
    source("addition.R")
  }
})
which gives:
Error in eval(ei, envir) : object 'i' not found
Now, the reason why the above foreach(i = 1:2) %dopar% { source("addition.R") } works with registerDoSEQ() if and only if it is called from the global environment is that the foreach iteration is then evaluated in the calling environment, which is the global environment, which is also the environment that source() uses. However, if one uses local(foreach(i = 1:2) %dopar% { ... }), this also fails, analogously to the above local(for(i in 1:2) { ... }) call.
In conclusion: nothing magic happens, but it is a bit tedious to understand.
I finally solved the problem by converting the contents of addition.R into a function and simply passing the variables into it. I don't know why, but the suggested solutions based on source(file, local = TRUE) do not work.
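The function-based workaround could look roughly like this. It is a sketch under assumptions: the function name add_two and the temporary file are hypothetical, and lapply() stands in for the foreach loop so that the snippet runs without a parallel backend.

```r
# Hypothetical rewrite of addition.R: the file defines a function instead of
# relying on a free variable i.
script <- tempfile(fileext = ".R")
writeLines("add_two <- function(i) i + 2", script)

# Source once, up front; local = TRUE defines add_two in the calling environment.
source(script, local = TRUE)

# The loop variable is passed in explicitly, so no environment lookup is needed.
y <- lapply(1:2, function(i) add_two(i))
str(y)
```

Inside foreach(i = 1:2) %dopar% { ... } the body would then simply be add_two(i). Depending on the backend, the function may have to be made available to the workers, for example through foreach's .export argument.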
Related
I am doing some heavy computations which I would like to speed up by running them in a parallel loop. Moreover, I want the result of each calculation to be assigned to the global environment based on the name of the data currently processed:
fun <- function(arg) {
  assign(arg, arg, envir = .GlobalEnv)
}
For loop
In a simple for loop, that would be the following and this works just fine:
for_fun <- function() {
  data <- letters[1:10]
  for(i in 1:length(data)) {
    dat <- quote(data[i])
    call <- call("fun", dat)
    eval(call)
  }
}
# Works as expected
for_fun()
In this function, I first get some data, loop over it, quote it (although not necessary) to be used in a function call. In reality, this function name is also dynamic which is why I am doing it this way.
Foreach
Now, I want to speed this up. My first thought was to use the foreach package (with a doParallel backend):
foreach_fun <- function() {
  # Set up parallel backend
  cl <- parallel::makeCluster(parallel::detectCores())
  doParallel::registerDoParallel(cl)
  data <- letters[1:10]
  foreach(i = 1:length(data)) %dopar% {
    dat <- quote(data[i])
    call <- call("fun", dat)
    eval(call)
  }
  # Stop the parallel backend
  parallel::stopCluster(cl)
  doParallel::stopImplicitCluster()
}
# Error in { : task 1 failed - "could not find function "fun""
foreach_fun()
Replacing the whole quote-call-eval procedure with simply fun(data[i]) resolves the error but still nothing gets assigned.
Future
To ensure it wasn't a problem with the foreach package, I also tried the future package (although I am not familiar with it).
future_fun <- function() {
  # Plan a parallel future
  cl <- parallel::makeCluster(parallel::detectCores())
  future::plan(cluster, workers = cl)
  data <- letters[1:10]
  # Create an explicit future
  future(expr = {
    for(i in 1:length(data)) {
      dat <- quote(data[i])
      call <- call("fun", dat)
      eval(call)
    }
  })
  # Stop the parallel future
  parallel::stopCluster(cl)
  future::plan(sequential)
}
# No errors but nothing assigned
# probably the future was never evaluated
future_fun()
Forcing the future to be evaluated (f <- future(...); value(f)) triggers the same error as by using foreach: Error in { : task 1 failed - "could not find function "fun""
Summary
In short, my questions are:
How do you assign variables to the global environment in a parallel loop?
Why does the function lookup fail?
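One way to look at both questions: each worker is a separate R process with its own global environment, so assign(..., envir = .GlobalEnv) on a worker never reaches the main session, and a function that is only mentioned as a string (as in call("fun", dat)) is likely missed when foreach works out what to export. A minimal sketch of the usual pattern, which returns the values and assigns them in the main session, is shown below. It uses the base parallel package rather than foreach to stay self-contained, and toupper() is just a placeholder for the heavy computation.

```r
library(parallel)

# Placeholder for the real computation: it returns its result instead of
# assigning it on the worker.
fun <- function(arg) toupper(arg)

data <- letters[1:10]

cl <- makeCluster(2L)
# Workers start as fresh R sessions, so fun has to be sent to them explicitly.
clusterExport(cl, "fun", envir = environment())
results <- parLapply(cl, data, function(x) fun(x))
stopCluster(cl)

# Back in the main session: one global variable per element of data.
names(results) <- data
invisible(list2env(results, envir = globalenv()))
```

The same idea should carry over to foreach: let the loop body return the value, collect the list that foreach() gives back, and do the assignment afterwards in the calling session.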
I have written a function that sources files that contain scripts for other functions and stores these functions in an alternative environment so that they aren't cluttering up the global environment. The code works, but contains three instances of eval(parse(...)):
# sourceFunctionHidden ---------------------------
# source a function and hide the function from the global environment
sourceFunctionHidden <- function(functions, environment = "env", ...) {
  if (environment %in% search()) {
    while (environment %in% search()) {
      if (!exists("counter", inherits = F)) counter <- 0
      eval(parse(text = paste0("detach(", environment, ")")))
      counter <- counter + 1
    }
    cat("detached", counter, environment, "s\n")
  } else {cat("no", environment, "attached\n")}
  if (!environment %in% ls(.GlobalEnv, all.names = T)) {
    assign(environment, new.env(), pos = .GlobalEnv)
    cat("created", environment, "\n")
  } else {cat(environment, "already exists\n")}
  sapply(functions, function(func) {
    source(paste0("C:/Users/JT/R/Functions/", func, ".R"))
    eval(parse(text = paste0(environment, "$", func," <- ", func)))
    cat(func, "created in", environment, "\n")
  })
  eval(parse(text = paste0("attach(", environment, ")")))
  cat("attached", environment, "\n\n")
}
Much has been written about the sub-optimality of the eval(parse(...)) construction (see here and here). However, the discussions that I've found mostly deal with alternate strategies for subsetting. The first and third instances of eval(parse(...)) in my code don't involve subsetting (the second instance might be related to subsetting).
Is there a way to call new.env(...), [environment name]$[function name] <- [function name], and attach(...) without resorting to eval(parse(...))? Thanks.
N.B.: I don't want to change the names of my functions to .name to hide them in the global environment
For what it's worth, the function source actually uses eval(parse(...)), albeit in a somewhat subtle way. First, .Internal(parse(...)) is used to create expressions, which after more processing are later passed to eval. So eval(parse(...)) seems to be good enough for the R core team in this instance.
That said, you don't need to jump through hoops to source functions into a new environment. source provides an argument local that can be used for precisely this.
local: TRUE, FALSE or an environment, determining where the parsed expressions are evaluated.
An example:
env = new.env()
source('test.r', local = env)
testing it works:
env$test('hello', 'world')
# [1] "hello world"
ls(pattern = 'test')
# character(0)
And an example test.r file to use this on:
test = function(a,b) paste(a,b)
If you want to keep it off global_env, put it into a package. It's common for people in the R community to put a bunch of frequently used helper functions into their own personal package.
tl;dr: The right way to convert quoted strings to object names is to use assign() and get(). See this post.
The long answer: The answer from @dww about being able to source() directly to a specific environment led me to change the second instance of eval(parse(...)) as follows:
# old version
source(paste0("C:/Users/JT/R/Functions/", func, ".R"))
eval(parse(text = paste0(environment, "$", func," <- ", func)))
# new version
source(
  paste0("C:/Users/JT/R/Functions/", func, ".R"),
  local = get(environment)
)
The answer from @dww also got me exploring attach(). attach() accepts an environment object as its first argument, and its name argument sets the label shown on the search path. This led me to change the third instance of eval(parse(...)) (below). Note the use of get() to convert the string "env" held in environment into the actual env object that attach() requires.
# old version
eval(parse(text = paste0("attach(", environment, ")")))
# new version
attach(get(environment), name = environment)
Finally, at some point in this process I was reminded that library() has a character.only argument. detach() accepts the same argument, so I changed the first instance of eval(parse(...)) as below:
# old version
eval(parse(text = paste0("detach(", environment, ")")))
# new version
detach(environment, character.only = T)
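The attach() and detach() replacements can be tried in isolation. This is a small sketch, assuming a made-up environment name demo_env and a dummy function hello; neither comes from the original code.

```r
env_name <- "demo_env"          # hypothetical name, held as a string
assign(env_name, new.env())     # create the environment under that name
assign("hello", function() "hi", envir = get(env_name))

# attach() takes the environment object; name = sets its search-path label.
attach(get(env_name), name = env_name)
attached <- env_name %in% search()
greeting <- hello()             # found via the search path

# detach() accepts the name as a string when character.only = TRUE.
detach(env_name, character.only = TRUE)
still_attached <- env_name %in% search()

print(c(attached = attached, still_attached = still_attached))
```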
So my new code is:
# sourceFunctionHidden ---------------------------
# source a function and hide the function from the global environment
sourceFunctionHidden <- function(functions, environment = "env", ...) {
  if (environment %in% search()) {
    while (environment %in% search()) {
      if (!exists("counter", inherits = F)) counter <- 0
      detach(environment, character.only = T)
      counter <- counter + 1
    }
    cat("detached", counter, environment, "s\n")
  } else {cat("no", environment, "attached\n")}
  if (!environment %in% ls(.GlobalEnv, all.names = T)) {
    assign(environment, new.env(), pos = .GlobalEnv)
    cat("created", environment, "\n")
  } else {cat(environment, "already exists\n")}
  sapply(functions, function(func) {
    source(
      paste0("C:/Users/JT/R/Functions/", func, ".R"),
      local = get(environment)
    )
    cat(func, "created in", environment, "\n")
  })
  attach(get(environment), name = environment)
  cat("attached", environment, "\n\n")
}
I am trying to get my head around the Snowfall library and its usage.
Having written a simulation that makes use of environments, I encountered the following issue. If I source a file to load functions within the parallel mode, the function seems to use a different environment than when I declare the function within the parallel mode directly.
To make things a little clearer, let's consider the following two scripts:
q_func.R declares the function
foo.bar <- function(x, envname) assign("val", x, envir = get(envname))
# assigns the value x to the variable "val" in the environment envname
q_snowfall.R main function that uses snowfall
library(snowfall)
SnowFunc <- function(envname) {
  # load the functions
  # Option 1 not working
  source("q_func.R")
  # Option 2 working...
  # foo.bar <- function(x, envname) assign("val", x, envir = get(envname))
  # create the new environment
  assign(envname, new.env())
  # use the function as declared in q_func.R
  # to assign random numbers to the new env
  foo.bar(x = rnorm(1), envname = envname)
  # return the environment including the random values
  return(get("val", envir = get(envname)))
}
sfInit(parallel = TRUE, cpus = 2)
# create environment 'a' and 'b' that each will get a new variable
# called 'val' that gets assigned a random value
envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()
If I execute the script "q_snowfall.R" I get the error
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: object 'a' not found
However, if I use the second option (declaring the function within the SnowFunc function), the error disappears.
Do you know how Snowfall handles the different environments? Or do you even have a solution for the issue? (Note that 'q_func.R' actually takes some 100 lines of code, so I would prefer to have it in a separate file; keeping option 2 is therefore not a solution!)
Thank you very much!
Edit
If I change all get(envname) to get(envname, envir = globalenv()) it seems to work. But it seems to me that this is more or less a workaround and not a very snowfall-like solution.
I think the issue is not with snowfall but with the fact that you're passing the environment by name (as a character string). You don't need to change all occurrences of get, and having it look in the global environment may indeed be unsafe.
It is sufficient to change the get call in foo.bar to look in parent.frame() instead (i.e., the environment from which foo.bar was called). The following worked on my machine.
new q_func.R
foo.bar <- function(x, envname) {
  assign("val", x, envir = get(envname, pos = parent.frame()))
}
(not so) new q_snowfall.R
library(snowfall)
SnowFunc <- function(envname) {
  assign(envname, new.env())
  foo.bar(x = rnorm(1), envname = envname)
  return(get("val", envir = get(envname)))
}
source("q_func.R")
sfInit(parallel = TRUE, cpus = 2)
sfExport("foo.bar")
envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()
Note also that I source'd before starting the cluster and used sfExport to export foo.bar to each node.
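The lookup difference can be reproduced without a cluster. The sketch below is only an illustration with made-up names (foo_global, foo_caller, runner, sim_env). It contrasts a plain get(envname), which searches from the function's own frame and then the environment where the function was defined, with a lookup that starts in the caller's frame.

```r
# Searches its own frame, then the environment where the function was
# defined; the caller's frame is never searched.
foo_global <- function(x, envname) {
  assign("val", x, envir = get(envname))
}

# Searches the frame of whoever called the function.
foo_caller <- function(x, envname) {
  target <- get(envname, envir = parent.frame())
  assign("val", x, envir = target)
}

# Mimics SnowFunc: the new environment exists only in this function's frame.
runner <- function(f, envname) {
  assign(envname, new.env())
  f(x = 1, envname = envname)
  get("val", envir = get(envname))
}

ok  <- runner(foo_caller, "sim_env")
err <- tryCatch(runner(foo_global, "sim_env"), error = function(e) e)

print(ok)
print(inherits(err, "error"))
```

Passing the environment object itself instead of its name would avoid the lookup altogether.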
In my .Rprofile I have the following two lines defined in my .First:
makeActiveBinding(".refresh", function() { system("R"); q("no") }, .GlobalEnv)
makeActiveBinding('.rm', function() {rm(list=ls(envir = .GlobalEnv),envir=.GlobalEnv); gc()}, .GlobalEnv)
They're usually harmless, unless I type them by accident! The first makes a .refresh function that will quit and restart the R session. The second empties the global environment. However, when using the tables() function from data.table these two functions are run which isn't exactly desirable.
For the moment, I've removed them from my .First but I'm curious if there is a way to avoid this. The offending lines in the tables() function are:
tt = objects(envir = env, all.names = TRUE)
ss = which(as.logical(sapply(tt, function(x) is.data.table(get(x,
envir = env)))))
I think you just discovered a downside to using active bindings in that way. Why don't you instead create ordinary functions .rm and .refresh, that you call in the usual way (i.e. .rm() and .refresh()), and which won't be executed upon simple inspection?
Here's what part of your .First might then look like:
.First <- function() {
  assign(".rm",
         function() {rm(list=ls(envir=.GlobalEnv), envir=.GlobalEnv)},
         pos = .GlobalEnv)
}
## Try it out
j <- 1:10
ls()
.First()
.rm()
ls()
Edit, with solution:
On further thought, this seems to work, only executing the core bits when .rm is 'called' directly. It works by inspecting the length of the call stack, and only running rm(...) if there is just one call in it (representing the current call to .rm). If .rm is called/touched by a call to some other function (e.g. tables()), the call stack will be longer, and rm() won't be executed:
makeActiveBinding('.rm',
  function() {
    if(length(sys.calls())==1) {
      rm(list=ls(envir = .GlobalEnv),envir=.GlobalEnv); gc()
    }
  },
  .GlobalEnv)
## Try _it_ out
library(data.table)
j <- 100
.rm
ls()
j <- 100
tables()
ls()
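The reason tables() fires the bindings is that calling get() on an active binding runs its function. A minimal sketch of that effect, and of one possible way to skip active bindings when scanning an environment, follows. The names e, trap, plain, and hits are invented for the example, and this is not a description of how data.table itself handles the case.

```r
e <- new.env()
hits <- 0

# An active binding: its function runs every time the binding is read.
makeActiveBinding("trap", function() {
  hits <<- hits + 1
  invisible(NULL)
}, e)
assign("plain", 1:3, envir = e)

# Listing names does not read the bindings ...
nms <- ls(e, all.names = TRUE)
hits_after_ls <- hits

# ... but get() does, which is what looping get() over all names amounts to.
for (nm in nms) get(nm, envir = e)
hits_after_get <- hits

# bindingIsActive() reports active bindings without evaluating them.
safe <- Filter(function(nm) !bindingIsActive(nm, e), nms)

print(c(after_ls = hits_after_ls, after_get = hits_after_get))
print(safe)
```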