I have an expensive problem I'm trying to split into pieces.
It's an optimization problem, and consists of an initial expensive setup step, followed by a recursive structure, such that the workers can only perform one step at a time before the results need to be collected, and a new task sent to the workers.
A complicating feature is that an initial setup step for the sub computations that should occur on each worker, has to be performed directly on each worker, and cannot be exported to the worker via clusterExport or similar.
I had hoped to be able to use clusterApply to assign the outcome of this initial setup to be stored on the specific worker, but can't seem to achieve this.
The first part of my code below shows my current attempts and describes what I would like, the second shows an attempt to see all objects available on the worker and where they are located.
library(parallel)
### What I would like to do:
test2<-function(){
MYOBJECT <-0
cl=makeCluster(2,type='PSOCK')
clusterExport(cl,c('MYOBJECT'),envir = environment())
clusterApply(cl,1:2,function(x) { #attempt to modify / create MYOBJECT on the worker processes
y <- x * 2 #expensive operation I only want to do once, that *cannot* be exported to the worker
MYOBJECT <<- y
MYOBJECT <- y
assign('MYOBJECT',y,envir = parent.frame()) #envs[[1]])
})
clusterApply(cl,1:2,function(x) MYOBJECT * .5) #cheap operation to be done many times
}
test2() #should return a list of 1 and 2, without assignment into the test2 function environment / re exporting
#trying to find out where MYOBJECT is on the worker
test<-function(){
MYOBJECT <-1
cl=makeCluster(1,type='PSOCK')
clusterExport(cl,c('MYOBJECT'),envir = environment())
clusterApply(cl,1,function(x) {
MYOBJECT <<- list('hello')
assign('MYOBJECT',list('hellohello'),envir = parent.frame()) #envs[[1]])
})
clusterApply(cl,1,function(x)
lapply(sys.frames(),ls) #where is MYOBJECT?
)
}
test()
Simple solution in the end -- to modify the contents of individual workers in a persistent manner, the assignment within the clusterApply function needs to be made to the global environment.
library(parallel)
### What I would like to do:
test2<-function(){
MYOBJECT <-0
cl=makeCluster(2,type='PSOCK')
clusterExport(cl,c('MYOBJECT'),envir = environment())
clusterApply(cl,1:2,function(x) { #attempt to modify / create MYOBJECT on the worker processes
y <- x * 2 #expensive operation I only want to do once, that *cannot* be exported to the worker
assign('MYOBJECT2',y,envir = globalenv()) #envs[[1]])
})
clusterApply(cl,1:2,function(x) MYOBJECT2 * .5) #cheap operation to be done many times
}
test2() #should return a list of 1 and 2, without assignment into the test2 function environment / re exporting
Related
There is an object that must be returned by a function regardless of why it terminated.
For example, while that function is running I press the red button on the top right corner of Rstudio to force it to terminate. In this situation, the function should return the object as well.
I remember there being something like a "on.exist" function that does something when a function exists. can't remember.
You can't use on.exit for that purpose but you could just write out any intermediate calculation so that if the function is halted you still have the last value calculated. Alternately you could write it out to a file in which case you would have it even if R or the computer crashes.
# test function
f <- function() {
for(i in 1:100) {
x <<- i
Sys.sleep(10)
}
i
}
x <- NULL
x <- f()
# press red button before it finishes
x # last value it calculated
I am working with a program which includes many function calls inside a for loop. For short, it is something like this:
function something()
....
....
timer = zeros(NSTEP);
for it = 1:NSTEP # time steps
tic = time_ns();
Threads.#threads for p in 1:2 # Star parallel of two sigma functions
Threads.lock(l);
Threads.unlock(l);
arg_in_sig[p] = func_sig[p](arg_in_sig[p]);
end
.....
.....
Threads.#threads for p in 1:2
Threads.lock(l)
Threads.unlock(l)
arg_in_vel[p] = func_vel[p](arg_in_vel[p])
end
toc=time_ns();
timer[i] = toc-tic;
end # time loop
writedlm("timer.txt",timer)
return
end
What I am trying to do, is to meassure the time that takes to perform on each loop iteration, saving the result in an output file called "timer.txt". The thing is that it doesn't work.
It saves a file with all zeros on it (except two or three values, which is more confusing).
I made a toy example like:
using DelimitedFiles;
function test()
a=zeros(1000)
for i=1:1000
tic = time_ns();
C = rand(20,20)*rand(20,20);
toc = time_ns();
a[i] = toc-tic;
end
writedlm("aaa.txt",a);
return a;
end
and these actually works (it saves fine!). Is there something to do with the fact that I am implementing Threads.#threads?. What can be happening between writedlm() and time_ns() in my program?
Any help would be much apreciated!
You are iterating over it but try to save by:
timer[i] = toc-tic;
while it should be
timer[it] = toc-tic;
Perhaps you have some i in global scope and hence the code still works.
Additionally locking the thread and immediately unlocking does not seem to make much sense. Moreover, when you iterate over p which happens to be also index of the Vector cell where you save the results there is no need to use the locking mechanism at all (unless you are calling some functions that depend on a global state).
In R, is there a way to exit from the calling function and return a value? Something like return(), but from the parent function?
parent <- function(){
child()
# stuff afterward should not be executed
}
child <- function(){
returnFromParent("a message returned by parent()")
}
It seems stop() is doing something like that. What I want to do is to write a small replacement for stop() that returns the message that stop() writes to stderr.
Update after G5W's suggestion: I have a large number of checks, each resulting in a stop() if the test fails, but subsequent conditions cannot be evaluated if earlier checks fail, so the function must exit after a failing one. To do this 'properly', I would have to build up a huge if else construct, which I wanted to avoid.
Got it. I guess I was looking for something like this:
parent <- function(){
parent_killing_child()
print("do not run this")
}
parent_killing_child <- function(){
do.call(return, list("my message"), envir = sys.frame(-1))
}
parent()
Thanks for all the advices.
Disclaimer: This sounds a XY problem, printing the stop message to stdout has few to no value, if interactive it should not be a problem, if in a script just use the usual redirection 2 > &1 to write stderr messages to stdout, or maybe use sink as in answer in this question.
Now, if I understood properly what you're after I'll do something like the following to avoid too much code refactoring.
First define a function to handle errors:
my_stop <- function() {
e <- geterrmessage()
print(e)
}
Now configure the system to send errors to your function (error handler) and suppress error messages:
options(error = my_stop)
options(show.error.messages=FALSE)
Now let's test it:
f1 <- function() {
f2()
print("This should not be seen")
}
f2 <- function() {
stop("This is a child error message")
}
Output:
> f1()
[1] "Error in f2() : This is a child error message\n"
For the parent function, make a list of tests. Then loop over the tests, and return your message at the first failed test. Subsequent tests will not be executed after the first failure.
Sample code:
test1 <- function(){criteria <- T; return(ifelse(criteria,T,F))}
test2 <- function(){criteria <- F; return(ifelse(criteria,T,F))}
test3 <- function(){criteria <- T; return(ifelse(criteria,T,F))}
parent <- function() {
tests <- c('test1', 'test2', 'test3')
for (i in 1:length(tests)) {
passed <- do.call(tests[i],args = list())
#print(passed)
if (!passed){
return(paste("Testing failed on test ", i, ".", sep=''))
}
}
return('Congrats! All tests passed!')
}
parent()
Update
Kudos to #chris for their clever application of do.call() in their successful solution.
In five years since then, the R team has released the rlang package within the tidyverse, which provides the apt function rlang::return_from() in tandem with rlang::return_to().
While base::return() can only return from the current local frame,
these two functions will return from any frame on the current
evaluation stack, between the global and the currently active context.
They provide a way of performing arbitrary non-local jumps out of the
function currently under evaluation.
Solution
Thus, you can simply do
child <- function() {
rlang::return_from(
# Return from the parent context (1 frame back).
frame = rlang::caller_env(n = 1),
# Return the message text.
value = "some text returned by parent()"
)
}
where the parent is identified via rlang::caller_env().
Results
When called from a parent() function
parent <- function() {
child()
# stuff afterward should not be executed
return("text that should NOT be returned by parent()")
}
the child() function will force parental behavior like this:
parent()
#> [1] "some text returned by parent()"
Bonus
See my solution here for throwing an error from a parent (or from any arbitrary "ancestor").
Does R have any mechanism to run different calculation in different threads (Windows-like mechanism of threads/tasks)? Let's
func1 <- function(x) { return (x^2); }
func2 <- function(y) { return (y^3); }
I need to execute something like this (imagine code):
thread1 <- thread_run(func1);
thread2 <- thread_run(func2);
with same mechanism of synchronization, like:
wait(thread1);
wait(thread2);
You can do that with the future package
install.packages(future)
library(future)
And then just use your code and just change the assigment to
thread1 %<-% thread_run(func1);
thread2 %<-% thread_run(func2);
Here more to read: http://www.r-bloggers.com/a-future-for-r-slides-from-user-2016/
I was just wondering if there was a way to force a function to only accept certain data types, without having to check for it within the function; or, is this not possible because R's type-checking is done at runtime (as opposed to those programming languages, such as Java, where type-checking is done during compilation)?
For example, in Java, you have to specify a data type:
class t2 {
public int addone (int n) {
return n+1;
}
}
In R, a similar function might be
addone <- function(n)
{
return(n+1)
}
but if a vector is supplied, a vector will (obviously) be returned. If you only want a single integer to be accepted, then is the only way to do to have a condition within the function, along the lines of
addone <- function(n)
{
if(is.vector(n) && length(n)==1)
{
return(n+1)
} else
{
return ("You must enter a single integer")
}
}
Thanks,
Chris
This is entirely possible using S3 classes. Your example is somewhat contrived in the context or R, since I can't think of a practical reason why one would want to create a class of a single value. Nonetheless, this is possible. As an added bonus, I demonstrate how the function addone can be used to add the value of one to numeric vectors (trivial) and character vectors (so A turns to B, etc.):
Start by creating a generic S3 method for addone, utlising the S3 despatch mechanism UseMethod:
addone <- function(x){
UseMethod("addone", x)
}
Next, create the contrived class single, defined as the first element of whatever is passed to it:
as.single <- function(x){
ret <- unlist(x)[1]
class(ret) <- "single"
ret
}
Now create methods to handle the various classes. The default method will be called unless a specific class is defined:
addone.default <- function(x) x + 1
addone.character <- function(x)rawToChar(as.raw(as.numeric(charToRaw(x))+1))
addone.single <- function(x)x + 1
Finally, test it with some sample data:
addone(1:5)
[1] 2 3 4 5 6
addone(as.single(1:5))
[1] 2
attr(,"class")
[1] "single"
addone("abc")
[1] "bcd"
Some additional information:
Hadley's devtools wiki is a valuable source of information on all things, including the S3 object system.
The S3 method doesn't provide strict typing. It can quite easily be abused. For stricter object orientation, have a look at S4 classes, reference based classesor the proto package for Prototype object-based programming.
You could write a wrapper like the following:
check.types = function(classes, func) {
n = as.name
params = formals(func)
param.names = lapply(names(params), n)
handler = function() { }
formals(handler) = params
checks = lapply(seq_along(param.names), function(I) {
as.call(list(n('assert.class'), param.names[[I]], classes[[I]]))
})
body(handler) = as.call(c(
list(n('{')),
checks,
list(as.call(list(n('<-'), n('.func'), func))),
list(as.call(c(list(n('.func')), lapply(param.names, as.name))))
))
handler
}
assert.class = function(x, cls) {
stopifnot(cls %in% class(x))
}
And use it like
f = check.types(c('numeric', 'numeric'), function(x, y) {
x + y
})
> f(1, 2)
[1] 3
> f("1", "2")
Error: cls %in% class(x) is not TRUE
Made somewhat inconvenient by R not having decorators. This is kind of hacky
and it suffers from some serious problems:
You lose lazy evaluation, because you must evaluate an argument to determine
its type.
You still can't check the types until call time; real static type checking
lets you check the types even of a call that never actually happens.
Since R uses lazy evaluation, (2) might make type checking not very useful,
because the call might not actually occur until very late, or never.
The answer to (2) would be to add static type information. You could probably
do this by transforming expressions, but I don't think you want to go there.
I've found stopifnot() to be highly useful for these situations as well.
x <- function(n) {
stopifnot(is.vector(n) && length(n)==1)
print(n)
}
The reason it is so useful is because it provides a pretty clear error message to the user if the condition is false.