Execution of future package in R results in an endless waiting time - r

I have a problem with the future package. In my task I try to set up an asynchronous process using futures. If I run my script for the first time (in a clean R session) everything works as expected. Running the same function a second time, within the same R session, ends up in an endless wait. Execution stops at the line where the futures are started. No errors are thrown; the code just runs forever. If I interrupt the code manually, the browser is called from the line:
Sys.sleep(interval)
Interrupting a little earlier, the call is made from:
Called from: socketSelect(list(con), write = FALSE, timeout = timeout)
I have written a small program that has basically the same structure as my script, and the same problem occurs. While not obvious in this small example, this structure has some advantages in my original code:
library(future)
library(parallel)

asynchronousfunction <- function() {
  Threads.2.start <- availableCores()
  cl <- parallel::makePSOCKcluster(Threads.2.start)
  plan(cluster, workers = cl)

  threads <- lapply(1:Threads.2.start, function(index) {
    future::cluster({ Sys.getpid() }, persistent = TRUE, workers = cl[[index]])
  })

  while (!any(resolved(threads))) {
    Sys.sleep(0.1)
  }

  threads <- lapply(1:Threads.2.start, function(index) {
    future::cluster({ Sys.getpid() }, persistent = TRUE, workers = cl[[index]])
  })

  stopCluster(cl = cl)
}

asynchronousfunction() # First call to the function. Everything is working fine.
asynchronousfunction() # Second call to the function. Endless execution.
I am working on Windows 10 and the R version is 3.4.2. The future package version is 1.6.2.
I hope you guys can help me.
Thanks in advance.
Best regards,
Harvard

Author of future here. It looks like you've tried to overdo it a bit and I am not 100% sure what you're trying to achieve. Things that look suspicious to me are your use of:
- cluster() - call future() instead.
- cluster(..., workers = cl[[index]]) - don't specify workers when you set up a future.
- persistent = TRUE - is there a reason why you want to use this?
- resolve(threads) basically does the same as your while() loop.
- You are not collecting the values of the futures, i.e. you're not calling value() or values().
For troubleshooting, you can get more details on what's going on under the hood by setting options(future.debug = TRUE).
If I rewrite your example to stay as close as possible to what you have now, a working version would look like this:
library("future")
asynchronousfunction <- function() {
n <- availableCores()
cl <- makeClusterPSOCK(n)
plan(cluster, workers = cl)
fs <- lapply(1:n, function(index) {
future({ Sys.getpid() }, persistent = TRUE)
})
## Can be replaced by resolve(fs)
while(!any(resolved(fs))) {
Sys.sleep(0.1)
}
fs <- lapply(1:n, function(index) {
future({ Sys.getpid() }, persistent = TRUE)
})
parallel::stopCluster(cl = cl)
}
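The rewritten example above still never collects the results of the futures, which was one of the points raised earlier. A minimal sketch of how the values could be gathered, assuming the same n and the plan(cluster, workers = cl) set up inside the function:
fs <- lapply(1:n, function(index) {
  future({ Sys.getpid() })
})
resolve(fs)                ## wait until every future is done
pids <- lapply(fs, value)  ## collect the value of each future
str(pids)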
Instead of rolling your own lapply() + future(), would it be sufficient for you to use future_lapply() (these days provided by the future.apply package)? For example,
asynchronousfunction <- function() {
  n <- availableCores()
  cl <- makeClusterPSOCK(n)
  plan(cluster, workers = cl)

  pids <- future_lapply(1:n, function(ii) {
    Sys.getpid()
  })
  str(pids)

  pids <- future_lapply(1:n, function(ii) {
    Sys.getpid()
  })
  str(pids)

  parallel::stopCluster(cl = cl)
}

Related

Communicate progress from future or from the promise back to the master process?

Say I have this:
library(future)
plan(multisession)
promise <- promises::future_promise({
  for (i in 1:100) {
    Sys.sleep(1)
    ## SOMEHOW REPORT PROGRESS TO MAIN HERE ##
  }
  i
})
In the end, it will resolve the promise with a value of 100, but how can I report each i value while the promise's worker is still running? What's the easiest way to do that?
I'm on Windows 10, using RStudio.
Not sure how to do this with a promise, but an alternative could be to use future.apply and progressr:
library(future.apply)
library(progressr)

plan(multisession)
handlers(global = TRUE)
handlers("progress")

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  y <- future_lapply(xs, function(x, ...) {
    Sys.sleep(1)
    p()
  })
}
my_fcn(1:100)
I've found a proper yet simple IPC solution that allows futures to communicate arbitrary messages between child and master processes, without being limited to progressr and future_lapply:
http://htmlpreview.github.io/?https://github.com/fellstat/ipc/blob/master/inst/doc/shinymp.html

How to redo tryCatch after error in for loop

I am trying to implement tryCatch in a for loop.
The loop is built to download data from a remote server. Sometimes the server no longer responds (when the query is big).
I have implemented tryCatch in order to keep the loop going.
I have also added a Sys.sleep() pause if an error occurs, in order to wait some minutes before sending the next query to the remote server (it works).
The problem is that I can't figure out how to make the loop redo the query that failed and led to the tryCatch error (and to Sys.sleep()).
for (i in 1:1000) {
  tmp <- tryCatch({ download_data(list$tool[i]) },
                  error = function(e) { Sys.sleep(800) })
}
Could you give me some hints?
You can do something like this:
for (i in 1:1000) {
  download_finished <- FALSE
  while (!download_finished) {
    tmp <- tryCatch({
      download_data(list$tool[i])
      download_finished <- TRUE
    },
    error = function(e) { Sys.sleep(800) })
  }
}
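If retrying forever is a concern, a bounded variant of the same idea could cap the number of attempts per query (a sketch of my own, not part of the original answer; the limit of 5 attempts is arbitrary):
for (i in 1:1000) {
  attempts <- 0
  repeat {
    attempts <- attempts + 1
    ok <- tryCatch({
      tmp <- download_data(list$tool[i])
      TRUE
    },
    error = function(e) FALSE)
    if (ok) break
    if (attempts >= 5) {            ## give up on this query after 5 tries
      warning("Giving up on item ", i)
      break
    }
    Sys.sleep(800)                  ## wait before retrying
  }
}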
If you are certain that waiting for 800 seconds always fixes the issue, this change should do it.
for (i in 1:1000) {
  tmp <- tryCatch({
    download_data(list$tool[i])
  },
  error = function(e) {
    Sys.sleep(800)
    download_data(list$tool[i])
  })
}
A more sophisticated approach could be to collect the information about which requests failed and then rerun the script until all requests succeed.
One way to do this is to use the possibly() function from the purrr package. It would look something like this:
library(purrr)

todo <- rep(TRUE, length(list$tool))
res <- list()

while (any(todo)) {
  res[todo] <- map(list$tool[todo],
                   possibly(download_data, otherwise = NA))
  todo <- map_lgl(res, ~ is.na(.))
}

Asynchronous programming in R

Overview
I am writing a program (in R) that makes API calls at certain designated times. The API calls take a while, but I need the timer (main loop) to continue counting while the API call is made. To do so, I need to "outsource" the API call to another CPU thread. I believe this is possible and have looked into the future and promises packages, but haven't found a solution yet.
Reproducible Example
Let's run a for loop that counts from 0 to 100. When the counter (i) gets to 50, it has to complete a resource-intensive process (calling the function sampler, which samples 1 million normal distributions 10,000 times for the sake of taking up computation space). The desire is for the counter to continue counting while sampler() is doing its work on another thread.
# Something to take up computation space
sampler <- function() {
  for (s in 1:10000) sample(1000000)
}

# Get this counter to continue while sampler() runs on another thread
for (i in 1:100) {
  message(i)
  if (i == 50) {
    sampler()
  }
}
What I have tried (unsuccessfully)
library(future)

sampler <- function() {
  for (s in 1:10000) sample(1000000)
}

for (i in 1:100) {
  message(i)
  if (i == 50) {
    mySamples <- future({ sampler() }) %plan% multiprocess
  }
}
It seems to me your call is only blocking while the workers are created, but not for the duration of the actual work. E.g. if you do the plan() first, the counter will not block:
library(future)

sampler <- function() {
  for (s in 1:10000) sample(1000000)
}

plan(multiprocess)

for (i in 1:100) {
  message(i)
  if (i == 50) {
    mySamples <- future({ sampler() })
  }
}
Also note that the runtime of sampler() is much longer than the duration of the blocking call in your code, and that, after executing your code, mySamples still has the status resolved: FALSE and CPU usage is still high.
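To actually retrieve the result without blocking the counter, one could poll resolved() inside the loop and only call value() once the future has finished. A minimal sketch of that idea (my own addition, not from the answer above; note that multiprocess has since been deprecated in favour of multisession):
library(future)
plan(multisession)

sampler <- function() {
  for (s in 1:10000) sample(1000000)
}

mySamples <- NULL
for (i in 1:100) {
  message(i)
  if (i == 50) {
    mySamples <- future({ sampler() })
  }
  ## Check in on the future without blocking; in a real program the loop
  ## would typically have its own pacing (e.g. Sys.sleep()) between ticks.
  if (!is.null(mySamples) && resolved(mySamples)) {
    message("sampler() finished at iteration ", i)
    res <- value(mySamples)  ## no longer blocks, since the future is resolved
    mySamples <- NULL
  }
}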

registerDoParallel in doParallel using Shiny

Below is some pseudo code for a general question I think can be answered without reproducible code. If I'm wrong and reproducible code is needed, I will update my post.
I have a Shiny app that makes use of the dopar function from the doParallel package in R. In the server portion of my code, I have the first reactive expression that has a dependency on the action button "runScoring". When it executes, it runs registerDoParallel to register the parallel back end, as required.
Once the process generating firstObject is done, it passes that result to secondObject (which depends on the same action button as firstObject) and then again runs the dopar function within the second reactive expression.
However, I only register the parallel back end within the first reactive expression and do not register it again within the second.
Is it necessary to place registerDoParallel within the second reactive expression in addition to having it within the first?
firstObject <- eventReactive(input$runScoring, {
  registerDoParallel(cores = input$cores4Scoring)
  itx1 <- iter(data1)
  itx2 <- iter(data2)
  result2 <- foreach(i = itx1, j = itx2, .export = funs) %dopar% {
    # do stuff ...
  }
})

secondObject <- eventReactive(input$runScoring, {
  registerDoParallel(cores = input$cores4Scoring)
  itx1 <- iter(firstObject())
  result <- foreach(i = itx1, .export = funs) %dopar% {
    # do stuff ...
  }
})

Skipping slow tasks in a loop in R

I'm trying to run a simulation in R where I make a whole bunch of phylogenetic trees. The tree simulation is a bit of a problem because its run time is highly variable, sometimes 0.005 seconds and sometimes minutes. I want to avoid the slow trees so I'm trying to use evalWithTimeout to skip them. So far I'm having problems because I can't make it kill slow tasks without also killing the loop.
My problem is similar to this question but the solution to that question hasn't helped me.
library(TreeSim)
library(R.utils)

for (i in 1:100) {
  tryCatch(
    expr = {
      evalWithTimeout(sim.rateshift.taxa(10, 1, c(0.5, 2), c(0.5, 0),
                                         c(1, 1), c(0, 0.5), complete = F),
                      timeout = 0.005)
    },
    TimeoutException = function(ex) cat("Timeout. Skipping.\n")
  )
  print(i)
}
This is what I have so far. I would like it to continue printing i regardless of whether the simulation goes over the time limit, but currently it gives me "reached CPU time limit" errors and stops.
Using https://www.rdocumentation.org/packages/R.utils/versions/2.5.0/topics/withTimeout as a source. Here's a test unit that works as expected.
foo <- function() {
  print("Tic")
  x <- ceiling(runif(1) * 100) + 1
  for (kk in 1:x) {
    print(kk)
    Sys.sleep(runif(1))
  }
  print("Tac")
}

bar <- function() {
  for (i in 1:100) {
    tryCatch({
      res <- withTimeout({
        foo()
      }, timeout = 1.08)
    }, TimeoutException = function(ex) {
      cat("Timeout. Skipping.\n")
    })
    print(i)
  }
}
So the question is: is there an error thrown by interrupting sim.rateshift.taxa that is not being caught? Use error = ... as thc mentions in order to catch that, but keep the TimeoutException handler to skip over proper timeouts.
There is also a problem with setting too low a time limit:
https://github.com/mhahsler/arules/issues/22
You may want to simply use setTimeLimit yourself and ensure that transient is set to TRUE; that way you have finer control.
Here's an example taken from http://blog.revolutionanalytics.com/2014/10/r-in-production-controlling-runtime.html
system.time(Sys.sleep(5))
##  user  system elapsed
## 0.000   0.000   5.005

system.time(local({
  setTimeLimit(elapsed = 1, transient = TRUE)
  Sys.sleep(5)
}))
## Error in Sys.sleep(5): reached elapsed time limit
## Timing stopped at: 0 0 5.006
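Applied to the original simulation loop, that might look something like the sketch below (my own adaptation, not from the linked post; the 0.005 s limit is the asker's value, and the limit is lifted again in finally so a timeout in one iteration does not spill over into the next):
library(TreeSim)

for (i in 1:100) {
  tryCatch({
    setTimeLimit(elapsed = 0.005, transient = TRUE)  ## limit this iteration only
    sim.rateshift.taxa(10, 1, c(0.5, 2), c(0.5, 0),
                       c(1, 1), c(0, 0.5), complete = FALSE)
  },
  error = function(ex) cat("Timeout. Skipping.\n"),
  finally = setTimeLimit(elapsed = Inf))             ## remove the limit again
  print(i)
}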
Try this:
library(TreeSim)
library(R.utils)

for (i in 1:100) {
  tryCatch(
    expr = {
      evalWithTimeout(sim.rateshift.taxa(10, 1, c(0.5, 2), c(0.5, 0),
                                         c(1, 1), c(0, 0.5), complete = F),
                      timeout = 0.005)
    },
    error = function(ex) cat("Timeout. Skipping.\n")
  )
  print(i)
}
As #AhmedMasud mentions in the comment, the function is throwing a different error. So using error = ... catches any other issue as well.
