checking status of individual node of cluster created with parallel::makeCluster - r

Let's say I've created a cluster using: cl <- parallel::makeCluster(2) and I send a call to the first node using parallel:::sendCall(cl[[1]], f, arg).
I want to get the results of a specific node (in this case the first node). I can do that using parallel:::recvResult(cl[[1]]). However, this process blocks until a result is received. Is there any way to check the status of of a specific node? I.e. a status like "is processing" or "is finished".

I'd recommend using the standard socketSelect function. For example:
library(parallel)
cl <- makePSOCKcluster(3, outfile="")
# Send task to worker 1
x <- 2
parallel:::sendCall(cl[[1]], sqrt, list(x), tag=1)
# Wait up to 5 seconds for worker 1 to send the result back
ready <- socketSelect(list(cl[[1]]$con), timeout=5)
if (ready > 0) {
result <- parallel:::recvData(cl[[1]])
cat(sprintf("sqrt(%f) = %f\n", x, result$value))
} else {
cat("result not ready after five seconds\n")
}
See the source for the recvOneData.SOCKcluster function in the file snowSOCK.R for a more complete example.

Related

scheduled cores ... did not deliver results, all values of the jobs will be affected in parallel::mclapply() in R 4.0.1

I'm using parallel::mclapply() with R 4.0.1 and getting the following warning: "scheduled cores ... did not deliver results, all values of the jobs will be affected".
Here the result of my investigation: inspecting the function source code, I realized that it happens when the vector dr is not all TRUE. This means that for some cores the second condition inside the for loop below (is.raw(a)) is never executed. a is the value returned by readChild(), that if returned raw data at least once, the condition would be verified at least once. So I'm thinking that readChild() is returning NULL.
readChild and readChildren return a raw vector with a "pid" attribute if data were available, an integer vector of length one with the process ID if a child terminated or NULL if the child no longer exists (no children at all for readChildren).
I ask you to validate or reject my conclusions. Finally if true what are the possible reasons?
while (!all(fin)) {
s <- selectChildren(ac[!fin], -1)
if (is.null(s)) break # no children -> no hope we get anything (should not happen)
if (is.integer(s))
for (ch in s) {
a <- readChild(ch)
if (is.integer(a)) {
core <- which(cp == a)
fin[core] <- TRUE
} else if (is.raw(a)) {
core <- which(cp == attr(a, "pid"))
job.res[[core]] <- ijr <- unserialize(a)
if (inherits(ijr, "try-error"))
has.errors <- c(has.errors, core)
dr[core] <- TRUE
} else if (is.null(a)) {
# the child no longer exists (should not happen)
core <- which(cp == ch)
fin[core] <- TRUE
}
}
}
This error message can occur when the child process dies/crashes, e.g.
> y <- parallel::mclapply(1:2, FUN = function(x) if (x == 1) quit("no") else x)
Warning message:
In parallel::mclapply(1:2, FUN = function(x) if (x == 1) quit("no") else x) :
scheduled core 1 did not deliver a result, all values of the job will be affected
> str(y)
List of 2
$ : NULL
$ : int 2
That a child process completely dies is of course not good. It can happen for several reasons. My best guess is that you parallelize something that must not be parallelized. Forked processing (=mclapply()) is known to be unstable with code that multi-thread, among other things.
For what's it worth, if you use the future framework instead (disclaimer: I'm the author), you'll get an error message that is a bit more informative, e.g.
> library(future.apply)
> plan(multicore)
> y <- future_lapply(1:2, FUN = function(x) if (x == 1) quit("no") else x)
Error: Failed to retrieve the result of MulticoreFuture (future_lapply-1) from
the forked worker (on localhost; PID 19959). Post-mortem diagnostic: No process
exists with this PID, i.e. the forked localhost worker is no longer alive.

Recursion error in R (Fibonacci sequence)

So I am trying to learn R on my own and am just working through the online tutorial. I am trying to code a recursive function that prints the first n terms of the Fibonacci sequence and can't get the code to run without the error:
Error in if (nterms <= 0) { : missing value where TRUE/FALSE needed
My code does ask me for input before entering the if else statement either which I think is odd as well. Below is my code any help is appreciated.
#Define the fibonacci sequence
recurse_fibonacci <- function(n) {
# Define the initial two values of the sequence
if (n <= 1){
return(n)
} else {
# define the rest of the terms of the sequence using recursion
return(recurse_fibonacci(n-1) + recurse_fibonacci(n-2))
}
}
#Take input from the user
nterms = as.integer(readline(prompt="How many terms? "))
# check to see if the number of terms entered is valid
if(nterms <= 0) {
print("please enter a positive integer")
} else {
# This part actually calculates and displays the first n terms of the sequence
print("Fibonacci Sequence: ")
for(i in 0:(nterms - 1)){
print(recurse_fibonacci(i))
}
}
This is a problem of readline in non-interactive mode. readline does not wait for a keypress and immediately executes the next instruction. The solution below is the solution posted in this other SO post.
I post below a complete answer, with the Fibonnaci numbers function a bit modified.
recurse_fibonacci <- function(n) {
# Define the initial two values of the sequence
if (n <= 1){
n
} else{
# define the rest of the terms of the sequence using recursion
Recall(n - 1) + Recall(n - 2)
}
}
#Take input from the user
cat("How many terms?\n")
repeat{
nterms <- scan("stdin", what = character(), n = 1)
if(nchar(nterms) > 0) break
}
nterms <- as.integer(nterms)
# check to see if the number of terms entered is valid
if(nterms <= 0) {
print("please enter a positive integer")
} else {
# This part actually calculates and displays the first n terms of the sequence
print("Fibonacci Sequence: ")
for(i in 0:(nterms - 1)){
print(recurse_fibonacci(i))
}
}
This code is the contents of file fib.R. Running in a Ubuntu 20.04 terminal gives
rui#rui:~$ Rscript fib.R
How many terms?
8
Read 1 item
[1] "Fibonacci Sequence: "
[1] 0
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8
[1] 13
rui#rui:~$
To make it work with Rscript replace
nterms = as.integer(readline(prompt="How many terms? "))
with
cat ("How many terms?")
nterms = as.integer (readLines ("stdin", n = 1))
Then you can run it as Rscript fib.R, assuming that the code is in the file fib.R in the current working directory.
Otherwise, execute it with source ("fib.R") within an R shell.
Rscript does not operate in interactive mode and does not expect any input from the terminal. Check what interactive () returns in both the cases. Rscript will return FALSE as it is non-interactive, but the same function when run within an R shell (with source ()) it will be true.
?readline mentions that it cannot be used in non-interactive mode. Whereas readLines explicitely connect to stdin.
The code works fine but you shouldn't enter it into the terminal as is. My suggestion: put the code into a script file (ending .R) and source it (get help about it with ?source but it's actually pretty straightforward).
In R-Studio you can simply hit the source button.

R - Waiting for a list of promises to resolve

I am receiving some values in my R process and I want to compute them asynchronously. I am using promises and future package.
This is how my current code looks like:
arr = list()
i=0
while(i < 10)
{
a = read messages from KAFKA topic
arr[[i]] = future(DoSomething(a))
i = i + 1
}
Now, arr contains a list of promises
How do I get value() of the promise that has resolved first (and so on)?
Something like promise.race in javascript.
Edit: I just re-read your question and saw that you were asking about getting the first result, not just all results. Below is the code for getting that. It's a while loop that waits for any result to be ready and then moves forward when there's a result ready.
There is also a function called promise_race in the promises package, but the issue with the promises package is that it can only output results. You can't get the value produced back into a variable for further computations in the main thread.
require(future)
plan(multiprocess)
longRunningFunction <- function(value) {
random1<- runif(min= 5 ,max = 30,n = 1)
Sys.sleep(random1)
return(value)
}
arr = list()
#changed starting number to 1 since R lists start at 1, not 0
i=1
#If the number of futures generated is more than the number of cores available, then the main thread will block until the first future completes and allows more futures to be started
while(i < 6)
{
arr[[i]] = future(longRunningFunction(i))
i = i + 1
}
while(all(!resolved(arr))){ }
raceresults_from_future<-lapply(arr[resolved(arr)], value)
print(paste("raceresults_from_future: ",raceresults_from_future) )

R: Let users use the console while inside a function, record their inputs and react

Is it possible to write a function in R which will hold its execution, giving the users control over the console (while in interactive mode of course), meanwhile recording their inputs, and continuing execution either:
after a certain input has been made
or after a certain output has been made
or a certain duration of time has passed
Example: ask the user a question (without using readline() for the answer)
question <- function() {
message("How much is 2 + 2?")
#let users take control of the console
#continue to next statement only if they input "2+2", or "4" or a minute has passed
#meanwhile record their last input similar to ".Last.Value", e.g.:
startTime <- Sys.time()
timeout <- FALSE
lastInput <- lastInput()
while (eval(parse(text = lastInput)) != 4 & !timeout) {
if (difftime(Sys.time(), startTime, units = "mins") > 1) {
timeout <- TRUE
}
lastInput <- lastInput()
}
if (timeout) {
stop("Sorry, timeout.")
} else {
message("Correct! Let's continue with this function:")
}
}
Where lastInput() is a function which "listens" to user input when it changes.
Obviously the above structure is tentative and won't give me what I want, some way to "listen" or "observe" and only react when the user inputs something to the console.
The final user experience should be:
> question()
How much is 2+2?
> #I'm the user, I can do whatever
> head(mtcars)
> plot(1:10)
> 3
> 2 + 2
[1] 4
Correct! Let's continue with this function:
Am I too optimistic or is there some R magic for this?
Thanks to #parth I have looked at swirl's source code and got acquainted with the addTaskCallback function. From the help file:
addTaskCallback registers an R function that is to be called each time a top-level task is completed.
And so we can make R check the users input ("top-level task") with a specific function, responding accordingly.
But since the swirl code is very "heavy", I think I need to supply a minimal example:
swirllike <- function(...){
removeTaskCallback("swirllike")
e <- new.env(globalenv())
e$prompt <- TRUE
e$startTime <- Sys.time()
cb <- function(expr, val, ok, vis, data=e){
e$expr <- expr
e$val <- val
e$ok <- ok
e$vis <- vis
# The result of f() will determine whether the callback
# remains active
return(f(e, ...))
}
addTaskCallback(cb, name = "swirllike")
message("How much is 2+2?")
}
OK, so the swirllike function evokes the 2+2 question, but it also declares a new environment e with some objects the user needs not know. It then adds the swirllike task callback to the task callback list (or rather vector). This "task callback" holds the cb function which calls the f function - the f function will run with every input.
If you run this, make sure you see the swirllike task callback with:
> getTaskCallbackNames()
[1] "swirllike"
Now the f function is similar to my sketch in the question:
f <- function(e, ...){
if (e$prompt) {
if (difftime(Sys.time(), e$startTime, units = "mins") > 1) {
timeout <- TRUE
stop("Sorry, timeout.")
}
if(!is.null(.Last.value) && .Last.value == 4) {
message("Correct! Let's continue with this function:")
e$prompt <- FALSE
while (!e$prompt) {
#continue asking questions or something, but for this example:
break
}
}
}
return(TRUE)
}
And don't forget to remove the swirllike task callback with:
removeTaskCallback("swirllike")

How to manually add a task to a redis queue for a redisWorker?

Is it possible to manually add a task to the redis queue, so that it can be executed by a redis worker?
As a simple example, I'm launching a worker using:
require('doRedis')
redisWorker('jobs')
In another R session, I'm creating a queue and would like to send a simple expression (for example: print("hello world")) to the queue so that it is executed by the worker.
I know how to do it using foreach:
require('doRedis')
registerDoRedis('jobs')
foreach(j=1,.combine=sum,.multicombine=TRUE) %dopar% {
print("hello world")
1
}
I would like to be able to add a task to the queue without using foreach. The reason is, I do not want to have my R session wait for the output (the script would write its results to the disk).
Here is what I have tried so far, based on code in the .doRedis() function:
data <- list(queue = "jobs")
queue <- data$queue
queueCounter <- sprintf("%s:counter", queue) # job task ID counter
ID <- redisIncr(queueCounter)
queueEnv <- sprintf("%s:%.0f.env",queue,ID) # R job environment
queueTasks <- sprintf("%s:%.0f",queue,ID) # Job tasks hash
queueResults <- sprintf("%s:%.0f.results",queue,ID) # Output values
queueStart <- sprintf("%s:%.0f.start*",queue,ID)
queueAlive <- sprintf("%s:%.0f.alive*",queue,ID)
# add the environment to the queue
redisSet(key = queueEnv,
value = list(expr=expression(),
exportenv=baseenv(),
packages=NULL)
# put tasks in queue
taskblock <- list(ex1 <- expression('print("hello world")'))
j <- 1
taskLabel <- I
task_id = as.character(taskLabel(j))
task <- list(task_id=task_id, args=taskblock)
redisHSet(key = queueTasks,
field = task_id,
value = task)
redisRPush(key = queue, value = ID)
It doesn't work, and I think (at least) the definition of the environment is wrong...
Any help is very welcome !

Resources