Asynchronous programming in R

Overview
I am writing a program (in R) that makes API calls at certain designated times. The API calls take a while, but I need the timer (main loop) to continue counting while the API call is made. To do so, I need to "outsource" the API call to another CPU thread. I believe this is possible and have looked into the future and promises packages, but haven't found a solution yet.
Reproducible Example
Let's run a for loop that counts from 0 to 100. When the counter (i) gets to 50, it has to complete a resource-intensive task (calling the function sampler(), which shuffles a vector of 1 million values 10,000 times, purely to occupy the CPU). The desire is for the counter to continue counting while sampler() does its work on another thread.
# Something to take up computation space
sampler <- function() {
  for (s in 1:10000) sample(1000000)
}

# Get this counter to continue while sampler() runs on another thread
for (i in 1:100) {
  message(i)
  if (i == 50) {
    sampler()
  }
}
What I have tried (unsuccessfully)
library(future)

sampler <- function() {
  for (s in 1:10000) sample(1000000)
}

for (i in 1:100) {
  message(i)
  if (i == 50) {
    mySamples <- future({ sampler() }) %plan% multiprocess
  }
}

It seems to me your call only blocks while the workers are created, not for the duration of the actual work. For example, if you call plan() first, the counter does not block:
library(future)

sampler <- function() {
  for (s in 1:10000) sample(1000000)
}

plan(multiprocess)

for (i in 1:100) {
  message(i)
  if (i == 50) {
    mySamples <- future({ sampler() })
  }
}
Also note that the runtime of sampler() is much longer than the duration of the blocking call in your code, and that, after your code has finished executing, mySamples still has the status resolved: FALSE and CPU usage is still high.
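If the result of sampler() is needed later, a minimal sketch (assuming a recent version of future, where the deprecated multiprocess plan has been superseded by multisession) would poll resolved() without blocking and call value() only when the result is actually required:

library(future)
plan(multisession)  # multiprocess is deprecated in newer releases of future

sampler <- function() {
  for (s in 1:10000) sample(1000000)
  "done"  # a return value so there is something to collect
}

mySamples <- NULL
for (i in 1:100) {
  message(i)
  if (i == 50) {
    mySamples <- future({ sampler() })
  }
}

resolved(mySamples)        # typically still FALSE right after the loop
result <- value(mySamples) # blocks here until the background work finishes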

Related

How to redo tryCatch after error in for loop

I am trying to implement tryCatch in a for loop.
The loop is built to download data from a remote server. Sometimes the server stops responding (when the query is big).
I have implemented tryCatch in order to keep the loop going.
I have also added a Sys.sleep() pause if an error occurs, in order to wait some minutes before sending the next query to the remote server (it works).
The problem is that I can't figure out how to make the loop redo the query that failed and led to the tryCatch error (and to the Sys.sleep()).
for (i in 1:1000) {
  tmp <- tryCatch({
    download_data(list$tool[i])
  },
  error = function(e) { Sys.sleep(800) })
}
Could you give me some hints?
You can do something like this:
for (i in 1:1000) {
  download_finished <- FALSE
  while (!download_finished) {
    tmp <- tryCatch({
      download_data(list$tool[i])
      download_finished <- TRUE
    },
    error = function(e) { Sys.sleep(800) })
  }
}
If you are certain that waiting 800 seconds always fixes the issue, this change should do it:
for (i in 1:1000) {
  tmp <- tryCatch({
    download_data(list$tool[i])
  },
  error = function(e) {
    Sys.sleep(800)
    download_data(list$tool[i])
  })
}
A more sophisticated approach would be to collect the information about which requests failed and then rerun the script until all requests have succeeded.
One way to do this is to use the possibly() function from the purrr package. It would look something like this:
library(purrr)

todo <- rep(TRUE, length(list$tool))
res <- list()
while (any(todo)) {
  res[todo] <- map(list$tool[todo],
                   possibly(download_data, otherwise = NA))
  todo <- map_lgl(res, ~ is.na(.))
}
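A hedged variant of the same idea, in case some request can fail permanently: cap the number of passes so the while loop cannot spin forever (download_data() and list$tool are the objects from the question; the cap of 5 passes is an arbitrary assumption):

library(purrr)

todo <- rep(TRUE, length(list$tool))
res <- vector("list", length(list$tool))
max_passes <- 5  # assumption: give up after 5 full passes

for (pass in seq_len(max_passes)) {
  if (!any(todo)) break
  res[todo] <- map(list$tool[todo],
                   possibly(download_data, otherwise = NA))
  todo <- map_lgl(res, ~ identical(.x, NA))  # TRUE for requests that still failed
}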

Execution of future package in R results in an endless waiting time

I have a problem with the future package. In my task I try to set up an asynchronous process using futures. If I run my script for the first time (in a clean R session) everything works fine and as expected. Running the same function a second time, within the same R session, ends up in an endless wait. The execution stops at the line where the futures are started. No errors are thrown; the code just runs forever. If I interrupt the code manually, the browser is called from the line:
Sys.sleep(interval)
If I interrupt a little bit earlier, the call is made from:
socketSelect(list(con), write = FALSE, timeout = timeout)
I have written a small program which has basically the same structure as my script, and the same problem occurs. While not obvious in this little example, this structure has some advantages in my original code:
library(future)
library(parallel)

asynchronousfunction <- function() {
  Threads.2.start <- availableCores()
  cl <- parallel::makePSOCKcluster(Threads.2.start)
  plan(cluster, workers = cl)

  threads <- lapply(1:Threads.2.start, function(index) {
    future::cluster({ Sys.getpid() }, persistent = TRUE, workers = cl[[index]])
  })

  while (!any(resolved(threads))) {
    Sys.sleep(0.1)
  }

  threads <- lapply(1:Threads.2.start, function(index) {
    future::cluster({ Sys.getpid() }, persistent = TRUE, workers = cl[[index]])
  })

  stopCluster(cl = cl)
}

asynchronousfunction() # First call to the function. Everything works fine.
asynchronousfunction() # Second call to the function. Endless execution.
I am working on Windows 10, the R version is 3.4.2, and the future package version is 1.6.2.
I hope you guys can help me.
Thanks in advance.
Best regards,
Harvard
Author of future here. It looks like you've tried to overdo it a bit, and I am not 100% sure what you're trying to achieve. Things that look suspicious to me are your use of:
cluster() - call future() instead.
cluster(..., workers = cl[[index]]) - don't specify workers when you set up a future.
Is there a reason why you want to use persistent = TRUE?
resolve(threads) basically does the same as your while() loop.
You are not collecting the values of the futures, i.e. you're not calling value() or values().
For troubleshooting, you can get more details on what's going on under the hood by setting options(future.debug = TRUE).
If I rewrite your example staying as close as possible to what you have now, a working version would look like this:
library("future")
asynchronousfunction <- function() {
n <- availableCores()
cl <- makeClusterPSOCK(n)
plan(cluster, workers = cl)
fs <- lapply(1:n, function(index) {
future({ Sys.getpid() }, persistent = TRUE)
})
## Can be replaced by resolve(fs)
while(!any(resolved(fs))) {
Sys.sleep(0.1)
}
fs <- lapply(1:n, function(index) {
future({ Sys.getpid() }, persistent = TRUE)
})
parallel::stopCluster(cl = cl)
}
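A small follow-up sketch tying back to the points above: resolve() replaces the polling loop and value() collects the results (this assumes the fs list from the rewrite above):

resolve(fs)                 # waits until all futures are resolved
pids <- lapply(fs, value)   # collect one PID per future
str(pids)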
Instead of rolling your own lapply() + future(), would it be sufficient for you to use future_lapply()? For example,
asynchronousfunction <- function() {
  n <- availableCores()
  cl <- makeClusterPSOCK(n)
  plan(cluster, workers = cl)

  pids <- future_lapply(1:n, function(ii) {
    Sys.getpid()
  })
  str(pids)

  pids <- future_lapply(1:n, function(ii) {
    Sys.getpid()
  })
  str(pids)

  parallel::stopCluster(cl = cl)
}
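As a side note (a hedged sketch, not from the original answer): in more recent releases, future_lapply() has moved out of the future package into the future.apply package, so the same idea would be written roughly like this:

library(future)
library(future.apply)

plan(multisession)  # or plan(cluster, workers = cl) as above
pids <- future_lapply(1:availableCores(), function(ii) Sys.getpid())
str(pids)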

How to flush the print buffer in R?

I want to run a long-running simulation and have updates printed periodically. However, I am finding that my print statements are being buffered, even when I explicitly try to flush. Here, for example:
for (i in 1:10) {
  print(i)
  flush(stdout())
  Sys.sleep(1)
}
I would expect this to print one number per second, but it outputs everything at the end, after 10 seconds.
How would you force a flush of the print buffer?
I usually do it like this:
for (i in 1:10) {
  message(i, "\r", appendLF = FALSE)
  flush.console()
  Sys.sleep(1)
}
You can also use cat():
for (i in 1:10) {
  # Sleep for 1 second
  Sys.sleep(1)
  # Print the current iteration
  cat(paste0("\r", i))
}
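If the goal is simply a periodic progress display, another option (a small sketch using base utils, not mentioned in the answers above) is txtProgressBar(), which handles the carriage returns and console flushing for you:

pb <- txtProgressBar(min = 0, max = 10, style = 3)
for (i in 1:10) {
  Sys.sleep(1)
  setTxtProgressBar(pb, i)
}
close(pb)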

Is there any way to break out of a foreach loop?

I am using the R package foreach() with %dopar% to do long (~days) calculations in parallel. I would like the ability to stop the entire set of calculations in the event that one of them produces an error. However, I have not found a way to achieve this, and from the documentation and various forums I have found no indication that this is possible. In particular, break() does not work and stop() only stops the current calculation, not the whole foreach loop.
Note that I cannot use a simple for loop, because ultimately I want to parallelize this using the doRNG package.
Here is a simplified, reproducible version of what I am attempting (shown here in serial with %do%, but I have the same problem when using doRNG and %dopar%). Note that in reality I want to run all of the elements of this loop (here 10) in parallel.
library(foreach)

myfunc <- function() {
  x <- foreach(k = 1:10, .combine = "cbind", .errorhandling = "stop") %do% {
    cat("Element ", k, "\n")
    Sys.sleep(0.5) # just to show that stop does not cause exit from foreach
    if (is.element(k, 2:6)) {
      cat("Should stop\n")
      stop("Has stopped")
    }
    k
  }
  return(x)
}

x <- myfunc()
# stop() halts the processing of k=2:6, but it does not stop the foreach loop itself.
# x is not returned. The execution produces the error message
#   Error in { : task 2 failed - "Has stopped"
What I would like to achieve is that the entire foreach loop can be exited immediately upon some condition (here, when the stop() is encountered).
I have found no way to achieve this with foreach. It seems that I would need a way to send a message to all the other processes to make them stop too.
If not possible with foreach, does anyone know of alternatives? I have also tried to achieve this with parallel::mclapply, but that does not work either.
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] foreach_1.4.0
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.0 iterators_1.0.6
It sounds like you want an impatient version of the "stop" error handling. You could implement that by writing a custom combine function, and arranging for foreach to call it as soon as each result is returned. To do that you need to:
Use a backend that supports calling combine on-the-fly, like doMPI or doRedis
Don't enable .multicombine
Set .inorder to FALSE
Set .init to something (like NULL)
Here's an example that does that:
library(foreach)

parfun <- function(errval, n) {
  abortable <- function(errfun) {
    comb <- function(x, y) {
      if (inherits(y, 'error')) {
        warning('This will leave your parallel backend in an inconsistent state')
        errfun(y)
      }
      c(x, y)
    }
    foreach(i = seq_len(n), .errorhandling = 'pass', .export = 'errval',
            .combine = 'comb', .inorder = FALSE, .init = NULL) %dopar% {
      if (i == errval)
        stop('testing abort')
      Sys.sleep(10)
      i
    }
  }
  callCC(abortable)
}
Note that I also set the error handling to "pass" so foreach will call the combine function with an error object. The callCC function is used to return from the foreach loop regardless of the error handling used within foreach and the backend. In this case, callCC calls the abortable function, passing it a function object that can be used to force callCC to return immediately. By calling that function from the combine function, we can escape from the foreach loop when we detect an error object and have callCC return that object. See ?callCC for more information.
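To make the callCC mechanism itself concrete, here is a tiny standalone illustration (my own sketch, not part of the original answer): the function passed to callCC receives an exit function, and calling that function returns from callCC immediately:

f <- function(exit) {
  for (i in 1:10) {
    if (i == 3) exit("stopped early at i = 3")
  }
  "finished all iterations"
}
callCC(f)  # returns "stopped early at i = 3"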
You can actually use parfun without a parallel backend registered and verify that the foreach loop "breaks" as soon as it executes a task that throws an error, but that could take a while since the tasks are executed sequentially. For example, this takes 20 seconds to execute if no backend is registered:
print(system.time(parfun(3, 4)))
When executing parfun in parallel, we need to do more than simply break out of the foreach loop: we also need to stop the workers, otherwise they will continue to compute their assigned tasks. With doMPI, the workers can be stopped using mpi.abort:
library(doMPI)

cl <- startMPIcluster()
registerDoMPI(cl)

r <- parfun(getDoParWorkers(), getDoParWorkers())
if (inherits(r, 'error')) {
  cat(sprintf('Caught error: %s\n', conditionMessage(r)))
  mpi.abort(cl$comm)
}
Note that the cluster object can't be used after the loop aborts, because things weren't properly cleaned up, which is why the normal "stop" error handling doesn't work this way.
It's not a direct answer to your question, but using when() you can avoid entering the loop if a condition is satisfied:
x <- foreach(k = 1:10, .combine = "cbind", .errorhandling = "stop") %:%
  when(!is.element(k, 2:6)) %do%
  {
    cat("Element ", k, "\n")
    Sys.sleep(0.5)
    k
  }
EDIT:
I forgot something: I think it's by design that you cannot simply stop the foreach loop. If you run the loop in parallel, each iteration is processed independently, which means that when you stop the entire loop at k=2 it is not predictable whether the process for k=1 has already terminated or is still running. Hence, using the when() condition gives you a deterministic result.
EDIT 2: Another solution considering your comment.
shouldStop <- FALSE
x <- foreach(k = 1:10, .combine = "cbind", .errorhandling = "stop") %do%
  {
    if (!shouldStop) {
      # put your time consuming code here
      cat("Element ", k, "\n")
      Sys.sleep(0.5)
      shouldStop <- shouldStop || is.element(k, 2:6)
      k
    }
  }
With this solution, the iterations that are already running when the stop condition becomes true still run to completion, but you avoid spending time on all upcoming iterations.
Instead of trying to break out of a loop, I write a small file to disk when I've reached my terminal iteration, then have all remaining iterations simply skip based on the existence of that file.
Checking whether a file exists costs less than a millisecond of computing time.
# 1.4 seconds to check if a file exists a million times
system.time(lapply(1:1e6, function(x) file.exists("checker.txt")))
   user  system elapsed
  1.204   0.233   1.437
This is great when you don't have a fixed number of iterations, or when your process can finish before all of the iterations are complete (when it converges, for example).
library(foreach)

alist <- foreach(i = 1:5000) %dopar% {
  if (file.exists("checker.txt")) {
    return(NULL)
  } else {
    if (i == 20) {
      write("", "checker.txt") # write an empty file
    }
    return(i)
  }
}

file.remove("checker.txt")
The great thing about this is that even if your list is extremely long, unlist() gives you only the non-NULL values.
> length(alist)
[1] 5000
> unlist(alist)
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Don't bother trying to break, instead, just "skip the rest"!
The answer I got from REvolution Technical support: "no--foreach doesn't currently have a way to stop all parallel computations on an error to any one".
I am not having much luck getting foreach to do what I want, so here is a solution using the parallel package which seems to do the job. I use the intermediate option of mccollect() to pass results from my function, do.task(), immediately to the function check.res(). If do.task() throws an error, this is used in check.res() to trigger a call to tools::pskill that explicitly kills all workers. This might not be very elegant, but it works in the sense that it causes an instant stop of all workers. Furthermore, I can simply inherit all the variables I need for the processing in do.task() from the present environment. (In reality do.task() is a much more complex function requiring many variables to be passed in.)
library(parallel)

# do.task() and check.res() inherit some variables from the enclosing environment
do.task <- function(x) {
  cat("Starting task", x, "\n")
  Sys.sleep(5 * x)
  if (x == stopat) {
    stop("Error in job", x) # thrown to mccollect() which sends it to check.res()
  }
  cat(" Completed task", x, "\n")
  return(10 * x)
}

check.res <- function(r) { # r is the list of results so far
  cat("Called check.res\n")
  sendKill <- FALSE
  for (j in 1:Njob) { # check whether we need to kill
    if (inherits(r[[j]], 'try-error')) {
      sendKill <- TRUE
    }
  }
  if (sendKill) { # then kill all
    for (j in 1:Njob) {
      cat("Killing job", job[[j]]$pid, "\n")
      tools::pskill(job[[j]]$pid) # mckill not accessible
    }
  }
}

Tstart <- Sys.time()
stopat <- 3
Njob <- 4
job <- vector("list", length = Njob)
for (j in 1:Njob) {
  job[[j]] <- mcparallel(do.task(j))
}
res <- mccollect(job, intermediate = check.res) # res is in order 1:Njob, regardless of how long jobs took
cat("Collected\n")
Tstop <- Sys.time()
print(difftime(Tstop, Tstart))
for (j in 1:Njob) {
  if (inherits(res[[j]], 'try-error')) {
    stop("Parallel part encountered an error")
  }
}
This gives the following screen dump and results for variable res
> source("exp5.R")
Starting task 1
Starting task 2
Starting task 3
Starting task 4
Completed task 1
Called check.res
Called check.res
Completed task 2
Called check.res
Called check.res
Called check.res
Killing job 21423
Killing job 21424
Killing job 21425
Killing job 21426
Called check.res
Killing job 21423
Killing job 21424
Killing job 21425
Killing job 21426
Called check.res
Killing job 21423
Killing job 21424
Killing job 21425
Killing job 21426
Collected
Time difference of 15.03558 secs
Error in eval(expr, envir, enclos) : Parallel part encountered an error
> res
$`21423`
[1] 10
$`21424`
[1] 20
$`21425`
[1] "Error in do.task(j) : Error in job3\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in do.task(j): Error in job3>
$`21426`
NULL
Steve Weston's original answer essentially answered this. But here is a slightly modified version of his answer which also preserves two additional features in the way I need them: (1) random number generation; (2) printing run-time diagnostics.
suppressMessages(library(doMPI))

comb <- function(x, y) {
  if (inherits(y, 'error')) {
    stop(y)
  }
  rbind(x, y) # forces the row names to be 'y'
}

myfunc <- function() {
  writeLines(text = "foreach log", con = "log.txt")
  foreach(i = 1:12, .errorhandling = 'pass', .combine = 'comb',
          .inorder = FALSE, .init = NULL) %dopar% {
    set.seed(100)
    sink("log.txt", append = TRUE)
    if (i == 6) {
      stop('testing abort')
    }
    Sys.sleep(10)
    cat("Completed task", i, "\n")
    sink(NULL)
    rnorm(5, mean = i)
  }
}

myerr <- function(e) {
  cat(sprintf('Caught error: %s\n', conditionMessage(e)))
  mpi.abort(cl$comm)
}

cl <- startMPIcluster(4)
registerDoMPI(cl)
r <- tryCatch(myfunc(), error = myerr)
closeCluster(cl)
When this file is sourced, it exits as intended with an error message
> source("exp2.R")
4 slaves are spawned successfully. 0 failed.
Caught error: testing abort
[ganges.local:16325] MPI_ABORT invoked on rank 0 in communicator with errorcode 0
The 'log.txt' file provides correct diagnostics up to the point of the error, and then provides additional error information. Crucially, the execution of all tasks is halted as soon as the stop() in the foreach loop is encountered: it does not wait until the entire foreach loop has completed. Thus I only see the 'Completed task' message up to i=4. (Note that if Sys.sleep() is shorter, later tasks may be started before the mpi.abort() is processed.)
If I change the stop condition to "i==100", the stop and hence the error is never triggered. The code exits successfully without an error message, and r is a 2D array with dimensions 12*5.
Incidentally, it seems that I don't actually need .inorder=FALSE (I think that just gives me a small speed increase in the event that an error is found).

foreach - dopar do not start workers

I have the following piece of code that I would like to run with the doMC engine:
who_wins <- function(probs_a, probs_b, delta_order = 0, delta_down = 0) {
  # browser()
  team_a <- runif(5, 0, 1)
  team_b <- runif(5, 0, 1)
  sya <- syb <- 0
  for (i in 1:5) {
    for (j in 1:2) {
      if (j == 1) {
        if (sya < syb) {
          team_a[i] <- (1 - delta_down) * team_a[i]
        }
        team_a[i] <- (1 - (i - 1) * delta_order) * team_a[i]
        sya <- sya + (team_a[i] < probs_a[i])
      }
      else {
        if (syb < sya) {
          team_b[i] <- (1 - delta_down) * team_b[i]
        }
        team_b[i] <- (1 - (i - 1) * delta_order) * team_b[i]
        syb <- syb + (team_b[i] < probs_b[i])
      }
    }
  }
  if (sya > syb) {
    return(1)
  }
  else if (sya < syb) {
    return(2)
  }
  else {
    return(0)
  }
}

library(doMC)
registerDoMC(8)

probs_a <- seq(.6, .8, length.out = 5)
probs_b <- probs_a[5:1]
nsim <- 20000

results <- foreach(icount(nsim), .combine = c) %dopar% {
  return(who_wins(probs_a, probs_b))
}
The problem is that a couple of seconds after the first worker starts, the engine tries to launch the remaining ones. I see a spike in all processors, but they all die quickly, even the first one. Then a new process is launched, and the rest of the code runs through this lone worker.
I have tried different pieces of code and the engine works perfectly, but with this specific routine it doesn't.
Can anybody tell me what is happening? Thanks in advance.
Adding a Sys.sleep(0.01) inside your loop, I see all 8 processes busy with it. After they are done, the main process remains busy for some time. I assume that the overhead of collecting the data from the individual processes and combining it into a single result is on a similar scale to the actual benefit of the parallelized computation. If you simply change the "computation" to return(1), you will see that this takes about as long as your computation, so the time is not spent on the workload but on assembling the result.
Neither .inorder=FALSE nor using doParallel instead of doMC changes this. However, I would consider this a problem in the foreach package, as mclapply has significantly less overhead:
library(parallel) # for mclapply()

result <- unlist(mclapply(1:nsim, function(i) {
  return(who_wins(probs_a, probs_b))
}, mc.cores = 8))
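A hedged sketch of the comparison being described, timing the foreach version against the mclapply version (who_wins(), probs_a, probs_b, and nsim come from the question; the exact numbers will vary by machine):

library(doMC)
library(parallel)
library(foreach)
registerDoMC(8)

system.time(
  res_foreach <- foreach(i = 1:nsim, .combine = c) %dopar% who_wins(probs_a, probs_b)
)

system.time(
  res_mclapply <- unlist(mclapply(1:nsim, function(i) who_wins(probs_a, probs_b),
                                  mc.cores = 8))
)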
