I am getting the following error because I believe a clusterExport() (parallel package) I'm doing is referring to the wrong environment:
Error in get(name, envir = envir) : object 'simulatedExpReturn' not found
I am getting this in a function and specifically at the clusterExport() line of this part:
simulatedExpReturn = list()
# Calculate the number of cores
no_cores <- detectCores()
# Initiate cluster
cl <- makeCluster(no_cores)
clusterExport(cl, c("simulatedExpReturn",
"covariance",
"numAssets",
"assetNames",
"numTimePoints-lag",
"stepSize"), envir = environment(Michaud1998MonteCarlo))
covariance, numAssets, assetNames, numTimePoints-lag, and stepSize are all passed into the function. I have also tried envir = envir and envir = .GlobalEnv and neither worked.
How can this be fixed?
This is a scoping problem: clusterExport() looks up each name with get(), starting in the environment you pass as envir, and assigns the value into the global environment of each worker. Because the lookup starts in that environment, a binding found there takes precedence over one of the same name in .GlobalEnv, where you have defined simulatedExpReturn.
This is why the following returns 1 and not an empty list:
> Michaud1998MonteCarlo <- new.env()
> simulatedExpReturn = list()
> assign("simulatedExpReturn", 1, envir = Michaud1998MonteCarlo)
>
> # Calculate the number of cores
> no_cores <- detectCores()
>
> # Initiate cluster
> cl <- makeCluster(no_cores)
>
> clusterExport(cl, c("simulatedExpReturn"), envir = Michaud1998MonteCarlo)
> clusterCall(cl, function() simulatedExpReturn)
[[1]]
[1] 1
[[2]]
[1] 1
[[3]]
[1] 1
[[4]]
[1] 1
To resolve, simply assign the value to the environment before running the clusterExport:
assign("simulatedExpReturn", list(), envir = Michaud1998MonteCarlo)
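If simulatedExpReturn is created inside the function, as the question describes, another option is to export from the frame of the running call. Note that environment(f) returns the environment in which f was defined, not the frame that holds its local variables; calling environment() with no arguments inside the function returns that frame. A minimal sketch, where run_simulation is a hypothetical stand-in for the function in the question and the worker count of 2 is arbitrary:

```r
library(parallel)

# Hypothetical stand-in for the function in the question.
run_simulation <- function(covariance, numAssets) {
  simulatedExpReturn <- list()

  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)

  # environment() with no arguments is the frame of this call, which holds
  # both the arguments and the local variable simulatedExpReturn.
  clusterExport(cl, c("simulatedExpReturn", "covariance", "numAssets"),
                envir = environment())

  # Evaluate in each worker's global environment to confirm the export.
  clusterEvalQ(cl, numAssets)
}

res <- run_simulation(diag(2), 2)
```

clusterEvalQ() is used for the check because it evaluates in each worker's global environment, which is where clusterExport() places the values.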
A simple example of passing a variable by name to another function:
print.variable.from.env <- function (x,e) { cat("Echoing", get(x, envir = e)) }
my.f <- function()
{
my.local <- "my local "
print.variable.from.env("my.local", environment())
}
my.f()
If you run it, it simply prints
Echoing my local
i.e., by passing the environment to print.variable.from.env, the function can access the variable whose name is given in x.
And one more example:
print.variable.from.env <- function (x,e) { cat("Echoing", get(x, envir = e), "\n") }
my.f <- function()
{
my.local <- "my local "
print.variable.from.env("my.local", environment())
print.variable.from.env("global.variable", parent.env(environment()))
}
global.variable <- "global"
my.f()
This shows access to "global.variable" through the function's enclosing (parent) environment, which here is the global environment.
When executed it'll print
Echoing my local
Echoing global
Or even simpler, if you just want to access the caller's environment:
print.variable.from.env <- function (x) {
cat("Echoing", get(x, envir = parent.frame()))
}
my.f <- function() {
my.local <- "my local "
print.variable.from.env("my.local")
}
my.f()
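The examples above use two different notions of "parent": parent.frame() is the frame of the caller, while parent.env(environment()) is the environment in which the function was defined. A small sketch of the difference, with hypothetical function and variable names:

```r
where_am_i <- function() {
  list(
    caller    = parent.frame(),              # frame of the function that called us
    enclosure = parent.env(environment())    # environment where where_am_i was defined
  )
}

caller_fun <- function() {
  caller_only_marker <- "local to caller_fun"
  envs <- where_am_i()
  list(
    in_caller    = exists("caller_only_marker", envir = envs$caller,
                          inherits = FALSE),
    in_enclosure = exists("caller_only_marker", envir = envs$enclosure,
                          inherits = FALSE)
  )
}

found <- caller_fun()
```

Here found$in_caller is TRUE and found$in_enclosure is FALSE, because the marker lives only in the frame of caller_fun.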
I was looking for an alternative to furrr::future_map() because when this function is run inside another function it copies all objects defined inside that function to each worker, regardless of whether those objects are explicitly passed (https://github.com/DavisVaughan/furrr/issues/26).
It looks like parLapply() does the same thing when using clusterExport():
fun <- function(x) {
big_obj <- 1
cl <- parallel::makeCluster(2)
parallel::clusterExport(cl, c("x"), envir = environment())
parallel::parLapply(cl, c(1), function(x) {
x + 1
env <- environment()
parent_env <- parent.env(env)
return(list(this_env = env, parent_env = parent_env))
})
}
res <- fun(1)
names(res[[1]]$parent_env)
#> [1] "cl" "big_obj" "x"
Created on 2020-01-06 by the reprex package (v0.3.0)
How can I keep big_obj from getting copied to each worker? I am using a Windows machine so forking isn't an option.
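The anonymous function is a closure, so its enclosing environment (the frame of fun, including big_obj) is serialized and sent along with it. You can avoid this by giving the worker function an environment that does not contain big_obj, e.g. the base environment: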
fun <- function(x) {
big_obj <- 1
cl <- parallel::makeCluster(2)
on.exit(parallel::stopCluster(cl), add = TRUE)
parallel::clusterExport(cl, c("x"), envir = environment())
local_fun <- function(x) {
x + 1
env <- environment()
parent_env <- parent.env(env)
return(list(this_env = env, parent_env = parent_env))
}
environment(local_fun) <- baseenv()
parallel::parLapply(cl, c(1), local_fun)
}
res <- fun(1)
"big_obj" %in% names(res[[1]]$parent_env) # FALSE
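One way to see the effect without starting a cluster is to compare the serialized size of the worker function before and after its environment is replaced, since serialization is what happens when a function is sent to a socket worker. This is only a sketch: make_worker_fun is a hypothetical helper, and the vector length is arbitrary.

```r
make_worker_fun <- function(strip_env) {
  big_obj <- numeric(1e5)          # about 800 kB of doubles in this frame
  f <- function(i) i + 1
  if (strip_env) {
    # The global environment is serialized as a reference, not by value.
    environment(f) <- globalenv()
  }
  f
}

size_with_closure <- length(serialize(make_worker_fun(FALSE), NULL))
size_stripped     <- length(serialize(make_worker_fun(TRUE), NULL))
```

size_with_closure includes big_obj, so it is far larger than size_stripped.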
Why this matters
For drake, I want users to be able to execute mclapply() calls within a locked global environment. The environment is locked for the sake of reproducibility. Without locking, data analysis pipelines could invalidate themselves.
Evidence that mclapply() adds or removes global bindings
set.seed(0)
a <- 1
# Works as expected.
rnorm(1)
#> [1] 1.262954
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2)
# No new bindings allowed.
lockEnvironment(globalenv())
# With a locked environment
a <- 2 # Existing bindings are not locked.
b <- 2 # As expected, we cannot create new bindings.
#> Error in eval(expr, envir, enclos): cannot add bindings to a locked environment
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2) # Unexpected error.
#> Warning in parallel::mclapply(1:2, identity, mc.cores = 2): all scheduled
#> cores encountered errors in user code
Created on 2019-01-16 by the reprex package (v0.2.1)
EDIT
For the original motivating problem, see https://github.com/ropensci/drake/issues/675 and https://ropenscilabs.github.io/drake-manual/hpc.html#parallel-computing-within-targets.
I think parallel:::mc.set.stream() has the answer. Apparently, mclapply() tries to remove .Random.seed from the global environment by default. Since the default RNG algorithm is Mersenne Twister, we dive into the else block below.
> parallel:::mc.set.stream
function ()
{
if (RNGkind()[1L] == "L'Ecuyer-CMRG") {
assign(".Random.seed", get("LEcuyer.seed", envir = RNGenv),
envir = .GlobalEnv)
}
else {
if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
rm(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
}
}
<bytecode: 0x4709808>
<environment: namespace:parallel>
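The underlying restriction can be reproduced without mclapply(): a locked environment rejects removal of bindings, even though existing bindings can still be changed. A minimal sketch using a throwaway environment and a stand-in value rather than a real RNG state:

```r
e <- new.env()
assign(".Random.seed", 1L, envir = e)   # stand-in value, not a real RNG state
lockEnvironment(e)

# Removing a binding from a locked environment signals an error.
rm_result <- tryCatch({
  rm(".Random.seed", envir = e)
  "removed"
}, error = function(err) "blocked")

# Changing the value of an existing binding is still allowed,
# because only the environment is locked, not its bindings.
assign(".Random.seed", 2L, envir = e)
new_value <- get(".Random.seed", envir = e)
```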
We can use mc.set.seed = FALSE to make the following code work, but this is probably not a good idea in practice.
set.seed(0)
lockEnvironment(globalenv())
parallel::mclapply(1:2, identity, mc.cores = 2, mc.set.seed = FALSE)
I wonder if there is a way to lock the environment while still allowing us to remove .Random.seed.
You can remove .Random.seed yourself before you lock the environment. You also need to load the package (or call the function once beforehand) and create tmp before locking, because no new bindings can be added afterwards.
library(parallel)
tmp <- NULL
rm(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
lockEnvironment(globalenv())
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2)
Of course this will not allow functions that need .Random.seed like rnorm to work.
A workaround is to change the RNG kind to "L'Ecuyer-CMRG"; see also ?nextRNGStream:
library(parallel)
tmp <- NULL
RNGkind("L'Ecuyer-CMRG")
lockEnvironment(globalenv())
tmp <- parallel::mclapply(1:2, rnorm, mc.cores = 2)
EDIT
I thought of another solution to your problem, and I think it will work with any RNG (I did not test it much). You can override the function that removes .Random.seed with one that just sets it to NULL:
library(parallel)
mc.set.stream <- function () {
if (RNGkind()[1L] == "L'Ecuyer-CMRG") {
assign(".Random.seed", get("LEcuyer.seed", envir = RNGenv),
envir = .GlobalEnv)
} else {
if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
assign(".Random.seed", NULL, envir = .GlobalEnv)
}
}
}
assignInNamespace("mc.set.stream", mc.set.stream, asNamespace("parallel"))
tmp <- NULL
set.seed(0)
lockEnvironment(globalenv())
tmp <- parallel::mclapply(1:2, rnorm, mc.cores = 2)
One final thought: you can create a new environment containing all things you don't want to be changed, lock it and work in there.
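A minimal sketch of that last idea, assuming the protected objects can live in a dedicated environment (pipeline_env and try_assign are hypothetical names). The global environment stays unlocked, so mclapply() remains free to manage .Random.seed there.

```r
pipeline_env <- new.env(parent = globalenv())
assign("a", 1, envir = pipeline_env)

# bindings = TRUE also locks the existing bindings, so values cannot change.
lockEnvironment(pipeline_env, bindings = TRUE)

try_assign <- function(name, value) {
  tryCatch({
    assign(name, value, envir = pipeline_env)
    "allowed"
  }, error = function(e) "blocked")
}

add_result    <- try_assign("b", 2)    # adding a new binding
change_result <- try_assign("a", 99)   # changing a locked binding

# Reading and computing inside the locked environment still works.
read_result <- evalq(a + 1, pipeline_env)
```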
Surely this is possible, but I can't seem to find how to do it:
I'd like to have a default of a function input, but override the default and get() a variable from the global environment if it exists in the global environment. If it doesn't exist in the global environment, take the default of the function, with any setting in the function being top level and overriding them all.
Ideally it would work like this made-up non-working function:
###Does not work, desired example
myfunc <- function(x=30){
if(exists.in.global.env(x)){x <- get(x)}
###Top level is tough
if(x.is.defined.as.function.input=TRUE ????){x <- x.defined.as.input}
}else{ x <- 30}
return(x)
}
So that if I do:
myfunc()
[1] 30
But if I create x I want it to override the default x=30 of the function and take the global environment value instead:
x <- 100
myfunc()
[1] 100
But if I have x defined inside the function, I'd like that to be top level, i.e. override everything else even if x is defined globally:
x <- 100
myfunc(x=300)
[1] 300
Thanks in advance!
You can modify your function to check if x exists in the .GlobalEnv and get it from there if it does, otherwise return the default value.
myfunc <- function(x = 30) {
if ("x" %in% ls(envir = .GlobalEnv)) {
get("x", envir = .GlobalEnv)
} else {
x
}
}
So if "x" %in% ls(envir = .GlobalEnv) is FALSE it would return
myfunc()
[1] 30
If x is found, it is returned instead. For example, after x <- 100:
myfunc()
[1] 100
Edit after comment
If you want to make sure to only return x from the global environment if x is not specified as an argument to myfunc, you can use missing(). It returns TRUE if x was not passed and FALSE if it was:
myfunc <- function(x = 30) {
if ("x" %in% ls(envir = .GlobalEnv) && missing(x)) {
get("x", envir = .GlobalEnv)
} else {
x
}
}
So for your example:
x <- 100
myfunc(x=300)
[1] 300
The simplest method would be to set an appropriate default argument:
myfunc <- function(x = get("x", globalenv()))
{
x
}
> x <- 100
> myfunc()
[1] 100
> myfunc(30)
[1] 30
> rm(x)
> myfunc()
Error in get("x", globalenv()) : object 'x' not found
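As the last call shows, the default-argument approach errors when no global x exists. A possible variant that falls back to the function's own default uses get0() with ifnotfound. This is a sketch only; myfunc_get0 is a hypothetical name.

```r
myfunc_get0 <- function(x = 30) {
  # An explicitly supplied argument always wins.
  if (!missing(x)) {
    return(x)
  }
  # Otherwise use a global x if one exists, else the default (30).
  get0("x", envir = globalenv(), inherits = FALSE, ifnotfound = x)
}

# Usage: start from a state with no global x.
if (exists("x", envir = globalenv(), inherits = FALSE)) {
  rm("x", envir = globalenv())
}
no_global     <- myfunc_get0()      # falls back to the default
assign("x", 100, envir = globalenv())
with_global   <- myfunc_get0()      # picks up the global value
with_argument <- myfunc_get0(300)   # explicit argument overrides both
rm("x", envir = globalenv())
```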
I wrote a function in which I define variables and load objects. Here's a simplified version:
fn1 <- function(x) {
load("data.RData") # a vector named "data"
source("myFunctions.R")
library(raster)
library(rgdal)
a <- 1
b <- 2
r1 <- raster(ncol = 10, nrow = 10)
r1 <- init(r1, fun = runif)
r2 <- r1 * 100
names(r1) <- "raster1"
names(r2) <- "raster2"
m <- stack(r1, r2) # basically, a list of two rasters in which it is possible to access a raster by its name, like this: m[["raster1"]]
c <- fn2(m)
}
Function "fn2" can be found in "myFunctions.R" and is defined as:
fn2 <- function(x) {
fn3 <- function(y) {
x[[y]] * 100 * data
}
cl <- makeSOCKcluster(8)
clusterExport(cl, list("x"), envir = environment())
clusterExport(cl, list("a", "b", "data"))
clusterEvalQ(cl, c(library(raster), library(rgdal), rasterOptions(maxmemory = a, chunksize = b)))
f <- parLapply(cl, names(x), fn3)
stopCluster(cl)
}
Now, when I run fn1, I get an error like this:
Error in get(name, envir = envir) : object 'a' not found
From what I understand from ?clusterExport, the default value for envir is .GlobalEnv, so I would assume that "a" and "b" would be accessible to fn2. However, it doesn't seem to be the case. How can I access the environment to which "a" and "b" belong?
So far, the only solution I have found is to pass "a" and "b" as arguments to fn2. Is there a way to use these two variables in fn2 without passing them as arguments?
Thanks a lot for your help.
You're getting the error when calling clusterExport(cl, list("a", "b", "data")) because clusterExport is looking for the variables in .GlobalEnv, whereas fn1 creates them in its own local environment.
An alternative is to pass the local environment of fn1 to fn2, and specify that environment to clusterExport. The call to fn2 would be:
c <- fn2(m, environment())
If the arguments to fn2 are function(x, env), then the call to clusterExport would be:
clusterExport(cl, list("a", "b", "data"), envir = env)
Since environments are passed by reference, there should be no performance problem doing this.
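A reduced sketch of this approach, with the raster code replaced by plain numbers. The names outer_fun, inner_fun and worker are hypothetical, and the default env = parent.frame() is an added assumption that lets the caller omit the argument while still exporting from its frame.

```r
library(parallel)

inner_fun <- function(x, env = parent.frame()) {
  force(env)  # resolve the caller's frame before doing anything else

  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)

  clusterExport(cl, "x", envir = environment())   # local to inner_fun
  clusterExport(cl, c("a", "b"), envir = env)     # local to the caller

  worker <- function(i) x[[i]] * a + b
  # Resolve x, a and b in each worker's global environment, where
  # clusterExport() placed them, instead of shipping this frame along.
  environment(worker) <- globalenv()

  parLapply(cl, seq_along(x), worker)
}

outer_fun <- function() {
  a <- 10
  b <- 2
  inner_fun(list(1, 2, 3))
}

res <- outer_fun()
```

With these inputs res is a list containing 12, 22 and 32.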
I'm trying to write a function 'exported' in R that will assign a value to a name in a desired environment (say .GlobalEnv). I'd like to use the following syntax.
# desired semantics: x <- 60
exported(x) <- 60
# ok if quotes prove necessary.
exported("x") <- 60
I've tried several variations. Most basically:
`exported<-` <- function(x, obj) {
call <- as.list(match.call())
elem <- as.character(call[[2]])
assign(elem, obj, .GlobalEnv)
get(elem, .GlobalEnv)
}
exported(x) <- 50
The foregoing gives an error about the last argument being unused. The following complains that "object 'x' is not found."
setGeneric("exported<-", function(object, ...) {
standardGeneric("exported<-")
})
setReplaceMethod("exported", "ANY", function(object, func) {
call <- as.list(match.call())
name <- as.character(call$object)
assign(name, func, other.env)
assign(name, func, .GlobalEnv)
get(name, .GlobalEnv)
})
exported(x) <- 50
The above approach using a character vector in place of a name yields "target of assignment expands to non-language object."
Is this possible in R?
EDIT: I would actually like to do more work inside 'exported.' Code was omitted for brevity. I also realize I can do something like:
exported <- function(name, func) {
...
}
but am interested in seeing if my syntax is possible.
Why not simply use assign()?
assign( "x" , 60 , env = .GlobalEnv )
x
[1] 60
The envir argument (abbreviated above as env, which works through partial argument matching) specifies the environment into which to assign the variable.
e <- new.env()
assign( "y" , 50 , env = e )
ls( env = e )
[1] "y"
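On the syntax question itself: as far as I can tell, R evaluates exported(x) <- 60 roughly as x <- `exported<-`(x, value = 60), so x must already exist, the last argument must be named value, and the function only sees the value of x, not its name. If extra work is needed around the assignment, an ordinary function is probably the simpler route. A sketch, where export_value and its envs argument are hypothetical:

```r
export_value <- function(name, value, envs = list(globalenv())) {
  # Any additional work on `value` would go here.
  for (e in envs) {
    assign(name, value, envir = e)
  }
  invisible(value)
}

# Usage with a scratch environment, to avoid touching the global one.
other_env <- new.env()
export_value("y", 50, envs = list(other_env))
exported_y <- get("y", envir = other_env)
```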