stop a running mcparallel job prematurely - r

I have three tasks:
1. is disk I/O bound
2. is network I/O bound
3. is CPU bound on a remote machine
The result of task 3 will tell me whether the answer I want will come from task 1 or task 2. Since each task requires separate resources, I'd like to start all three tasks with mcparallel, then wait on the result from the third task and decide whether to terminate task 1 or task 2. However, I cannot determine how to prematurely cancel an mcparallel task from within R. Is it safe to just kill the PID of the forked process from a call to system()? If not, is there a better way to cancel the unneeded computation?

I don't think the parallel package supports an official way to kill a process started via mcparallel, but my guess is that it's safe to do, and you can use the pskill function from the tools package to do it. Here's an example:
library(parallel)
library(tools)
fun1 <- function() {Sys.sleep(20); 1}
fun2 <- function() {Sys.sleep(20); 2}
fun3 <- function() {Sys.sleep(5); sample(2, 1)}
f1 <- mcparallel(fun1())
f2 <- mcparallel(fun2())
f3 <- mcparallel(fun3())
r <- mccollect(f3)
if (r[[1]] == 1) {
  cat('killing fun1...\n')
  pskill(f1$pid)
  print(mccollect(f1))
  r <- mccollect(f2)
} else {
  cat('killing fun2...\n')
  pskill(f2$pid)
  print(mccollect(f2))
  r <- mccollect(f1)
}
print(r)
It's usually dangerous to randomly kill threads within a multi-threaded application because they might be holding a shared lock of some kind, but these of course are processes, and the master process seems to handle the situation just fine.

The current version of parallel::mccollect() supports a wait argument.
Simply pass wait = FALSE to quit any running jobs prematurely.
> mccollect(wait = FALSE)
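A minimal sketch of that usage, reusing the placeholder functions from the answer above and assuming the behaviour described (i.e. that jobs not yet finished are given up on when wait = FALSE):
f1 <- mcparallel(fun1())
f2 <- mcparallel(fun2())
f3 <- mcparallel(fun3())

r <- mccollect(f3)       # block only on the deciding task
mccollect(wait = FALSE)  # grab whatever else is ready and stop waiting on the rest
print(r)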

Related

Manually interrupt a loop in R and continue below

I have a loop in R that does very time-consuming calculations. I can set a max-iterations variable to make sure it doesn't continue forever (e.g. if it is not converging), and gracefully return meaningful output.
But sometimes the iterations could be stopped well before max-iterations is reached. Does anyone have an idea of how to give the user the opportunity to interrupt a loop, without having to wait for user input after each iteration? Preferably something that works in RStudio on all platforms.
I cannot find a function that listens for keystrokes or similar user input without halting until something is done by the user. Another solution would be to listen for a global variable change. But I don't see how I could change such a variable value when a script is running.
The best idea I can come up with is to make another script that creates a file that the first script checks for the existence of, and then breaks if it is there. But that is indeed an ugly hack.
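To make that hack concrete, a minimal sketch of the file-flag approach (the file name is arbitrary; a second R session, or the shell, would create the file to request a stop):
stop_file <- "stop_now.flag"  # created externally to signal the loop to stop

a <- 0
for (i in 1:10000) {
  a <- a + 1
  Sys.sleep(1)
  if (file.exists(stop_file)) {
    message("Stop file found, breaking at iteration ", i)
    break
  }
}
print(a)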
Inspired by Edo's reply, here is an example of what I want to do:
test.it <- function(t) {
  a <- 0
  for (i in 1:10) {
    a <- a + 1
    Sys.sleep(t)
  }
  print(a)
}
test.it(1)
As you see, when I interrupt by hitting the red stop button in RStudio, I break out of the whole function, not just the loop.
Also inspired by Edo's response I discovered the withRestarts function, but I don't think it catches interrupts.
I tried to create a loop as you described it.
a <- 0
for (i in 1:10) {
  a <- a + 1
  Sys.sleep(1)
  if (i == 5) break
}
print(a)
If you let it go till the end, a will be equal to 5, because of the break.
If you stop it manually by clicking on the STOP SIGN in the RStudio console, you get a lower number.
So it actually works as you would like.
If you want a better answer, you should post a reproducible example of your code.
EDIT
Based on the edit you posted, try this.
It's a tryCatch solution that returns the last available value of a:
test_it <- function(t) {
  a <- 0
  tryCatch(
    for (i in 1:10) {
      a <- a + 1
      message("I'm at ", i)
      Sys.sleep(t)
      if (i == 5) break
    },
    interrupt = function(e) {a}
  )
  a
}
test_it(1)
If you stop it by clicking the Stop Sign, it returns the last value a is equal to.

Erlang: Make a ring

I'm quite new to Erlang (Reading through "Software for a Concurrent World"). From what I've read, we link two processes together to form a reliable system.
But if we need more than two processes, I think we should connect them in a ring. Although this is slightly tangential to my actual question, please let me know if this is incorrect.
Given a list of PIDs:
[1,2,3,4,5]
I want to form these in a ring of {My_Pid, Linked_Pid} tuples:
[{1,2},{2,3},{3,4},{4,5},{5,1}]
I have trouble creating an elegant solution that adds the final {5,1} tuple.
Here is my attempt:
% linkedPairs takes [1,2,3] and returns [{1,2},{2,3}]
linkedPairs([]) -> [];
linkedPairs([_]) -> [];
linkedPairs([X1,X2|Xs]) -> [{X1, X2} | linkedPairs([X2|Xs])].

% joinLinks takes [{1,2},{2,3}] and returns [{1,2},{2,3},{3,1}]
joinLinks([{A, _}|_] = P) ->
    {_X, Y} = lists:last(P),
    P ++ [{Y, A}].

% makeRing takes [1,2,3] and returns [{1,2},{2,3},{3,1}]
makeRing(PIDs) -> joinLinks(linkedPairs(PIDs)).
I cringe when looking at my joinLinks function - lists:last is slow (I think), and it doesn't look very "functional".
Is there a better, more idiomatic solution to this?
If other functional programmers (non-Erlang) stumble upon this, please post your solution - the concepts are the same.
Use lists:zip with the original list and its 'rotated' version:
1> L=[1,2,3].
[1,2,3]
2> lists:zip(L, tl(L) ++ [hd(L)]).
[{1,2},{2,3},{3,1}]
If you are manipulating long lists, you can avoid the creation of the intermediate list tl(L) ++ [hd(L)] by using a helper function:
1> L = lists:seq(1,5).
[1,2,3,4,5]
2> Link = fun Link([Last],First,Acc) -> lists:reverse([{Last,First}|Acc]);
Link([X|T],First,Acc) -> Link(T,First,[{X,hd(T)}|Acc]) end.
#Fun<erl_eval.42.127694169>
3> Joinlinks = fun(List) -> Link(List,hd(List),[]) end.
#Fun<erl_eval.6.127694169>
4> Joinlinks(L).
[{1,2},{2,3},{3,4},{4,5},{5,1}]
5>
But if we need more than two process, I think we should connect them in a ring.
No. For instance, suppose you want to download the text of 10 different web pages. Instead of sending a request, then waiting for the server to respond, then sending the next request, etc., you can spawn a separate process for each request. Each spawned process only needs the pid of the main process, and the main process collects the results as they come in. When a spawned process gets a reply from a server, the spawned process sends a message to the main process with the results, then terminates. The spawned processes have no reason to send messages to each other. No ring.
I would guess that it is unlikely that you will ever create a ring of processes in your erlang career.
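A minimal sketch of that scatter/gather pattern (module and function names are illustrative; do_work/1 stands in for the real request):
-module(gather).
-export([fetch_all/1]).

%% Spawn one worker per item; each worker sends {Pid, Result} back to the
%% main process, which collects the replies as they arrive. No ring needed.
fetch_all(Items) ->
    Parent = self(),
    lists:foreach(
        fun(Item) ->
            spawn(fun() -> Parent ! {self(), do_work(Item)} end)
        end,
        Items),
    [receive {_Pid, Result} -> Result end || _ <- Items].

%% Stand-in for the real work, e.g. fetching a web page.
do_work(Item) ->
    Item * 2.
Calling gather:fetch_all([1,2,3,4,5]) returns the doubled values in whatever order the replies arrive.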
I have trouble creating an elegant solution that adds the final {5,1} tuple.
You can create the four other processes passing them self(), which will be different for each spawned process. Then, you can create a separate branch of your create_ring() function that terminates the recursion and returns the pid of the last created process to the main process:
init(N) ->
    LastPid = create_ring(....),

create_ring(0, PrevPid) -> PrevPid;
create_ring(N, PrevPid) when N > 0 ->
    Pid = spawn(?MODULE, loop, [PrevPid]),
    create_ring(.......).
Then, the main process can call (not spawn) the same function that is being spawned by the other processes, passing the function the last pid that was returned by the create_ring() function:
init(N) ->
    LastPid = create_ring(...),
    loop(LastPid).
As a result, the main process will enter into the same message loop as the other processes, and the main process will have the last pid stored in the loop parameter variable to send messages to.
In Erlang, you will often find that you can't do everything you want within a single function, so you call another function to handle the part that is giving you trouble; if that second function can't do everything either, it calls yet another function, and so on. Applied to the ring problem above, I found that init() couldn't do everything I wanted in one function, so I defined the create_ring() function to handle part of the problem.
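Putting those pieces together, here is one way the sketch above could be completed; it is only an illustrative, self-contained version (module, function, and message names are my own), not the original poster's module:
-module(ring).
-export([init/1, loop/1]).

%% Spawn N processes, each holding the pid of the previously created one,
%% then close the ring: the main process enters the same loop, holding the
%% pid of the last process it created.
init(N) ->
    LastPid = create_ring(N, self()),
    loop(LastPid).

create_ring(0, PrevPid) -> PrevPid;
create_ring(N, PrevPid) when N > 0 ->
    Pid = spawn(?MODULE, loop, [PrevPid]),
    create_ring(N - 1, Pid).

%% Each ring member forwards whatever it receives to the next member.
loop(NextPid) ->
    receive
        stop ->
            NextPid ! stop,
            ok;
        Msg ->
            NextPid ! Msg,
            loop(NextPid)
    end.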

R - get worker name when running in parallel

I am running a function in parallel. In order to get progress updates on the state of the work, I would like one, but only one, worker to report periodically on its progress. My natural thought for how to do this would be to have the function that the workers execute check the name of the worker, and only give the status update if the name matches a particular value. But I can't find a reliable way to determine this in advance.

In Julia, for instance, there is a simple myid() function that will give a worker's ID (i.e. 1, 2, etc.). I am looking for something equivalent in R. The best that I've found so far is to have each worker call Sys.getpid(), but I don't know a reliable way to write my script so that I'll know in advance which pids will be assigned to the workers. The basic script that I'm looking to write looks like the example below, except that I'm looking for R's equivalent to the myid() function:
library(parallel)
Test_Fun = function(a){
  for (idx in 1:10){
    Sys.sleep(1)
    if (myid() == 1){
      print(idx)
    }
  }
}
mclapply(1:4, Test_Fun, mc.cores = 4)
The parallel package doesn't provide a worker ID function as of R 3.3.2. There also isn't a mechanism provided to initialize the workers before they start to execute tasks.
I suggest that you pass an additional task ID argument to the worker function by using the mcmapply function. If the number of tasks is equal to the number of workers, the task ID can be used as a worker ID. For example:
library(parallel)
Test_Fun = function(a, taskid){
  for (idx in 1:10){
    Sys.sleep(1)
    if (taskid == 1){
      print(idx)
    }
  }
}
mcmapply(Test_Fun, 1:4, 1:4, mc.cores = 4)
But if there are more tasks than workers, you'll only see the progress messages for the first task. You can work around that by initializing each of the workers when they execute their first task:
WORKERID <- NA # indicates worker is uninitialized

Test_Fun = function(a, taskid){
  if (is.na(WORKERID)) WORKERID <<- taskid
  for (idx in 1:10){
    Sys.sleep(1)
    if (WORKERID == 1){
      print(idx)
    }
  }
}
cores <- 4
mcmapply(Test_Fun, 1:8, 1:cores, mc.cores = cores)
Note that this assumes that mc.preschedule is TRUE, which is the default. If mc.preschedule is FALSE and the number of tasks is greater than the number of workers, the situation is much more dynamic because each task is executed by a different worker process and the workers don't all execute concurrently.
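For illustration (not part of the answer above), a small sketch of that dynamic behaviour: with mc.preschedule = FALSE a fresh process is forked for each element, so Sys.getpid() differs from task to task and cannot serve as a stable worker ID:
library(parallel)

# One process is forked per element, so the eight pids are typically all distinct,
# whereas with the default mc.preschedule = TRUE only four distinct pids appear.
pids <- mclapply(1:8, function(i) Sys.getpid(),
                 mc.cores = 4, mc.preschedule = FALSE)
print(unlist(pids))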

How to run external program from Julia and wait until it finishes, then read its output

I'm trying to execute an external program from Julia via run, then wait until it finishes and store its output into a variable.
The only solution I came up with is this:
callback = function(data)
    print(data)
end

open(`minizinc com.mzn com.dzn`) do f
    x = readall(f)
    callback(x)
end
The problem is that I do not want to use callbacks.
Is there any way to wait until the process has finished and then continue executing?
Thanks in advance
You can just call readall (or readstring on Julia master) on the command object:
julia> readall(`echo Hello`)
"Hello\n"

Detecting keystrokes in Julia

I have a piece of code in Julia in which a solver iterates many, many times as it seeks a solution to a very complex problem. At present, I have to provide a number of iterations for the code to do, set low enough that I don't have to wait hours for the code to halt in order to save the current state, but high enough that I don't have to keep activating the code every 5 minutes.
Is there a way, with the current state of Julia (0.2), to detect a keystroke instructing the code to either end without saving (in case of problems) or end with saving? I require a method such that the code will continue unimpeded unless such a keystroke event has happened, and that will interrupt on any iteration.
Essentially, I'm looking for a command that will read in a keystroke if a keystroke has occurred (while the terminal that Julia is running in has focus), and run certain code if the keystroke was a specific key. Is this possible?
Note: I'm running julia via xfce4-terminal on Xubuntu, in case that affects the required command.
You can use an asynchronous task to read from STDIN, blocking until something is available to read. In your main computation task, when you are ready to check for input, you can call yield() to lend a few cycles to the read task, and check a global to see if anything was read. For example:
input = ""
@async while true
    global input = readavailable(STDIN)
end

for i = 1:10^6 # some long-running computation
    if isempty(input)
        yield()
    else
        println("GOT INPUT: ", input)
        global input = ""
    end
    # do some other work here
end
Note that, since this is cooperative multithreading, there are no race conditions.
You may be able to achieve this by sending an interrupt (Ctrl+C). This should work from the REPL without any changes to your code. If you want to implement saving, you'll have to handle the resulting InterruptException and prompt the user.
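For example, a rough sketch of handling the interrupt (the solver loop and the saving step are placeholders; this assumes the code runs in the REPL, where Ctrl+C raises an InterruptException in the running task):
function solve()
    state = 0
    try
        for i = 1:10^9          # stand-in for the long-running solver loop
            state = i
        end
    catch err
        if isa(err, InterruptException)
            println("Interrupted. Save current state? (y/n)")
            if readline() == "y"
                # save_state(state)   # placeholder for the actual saving code
                println("State saved at iteration $state")
            end
        else
            rethrow()
        end
    end
    return state
end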
I had some trouble with the answer from steven-g-johnson, and ended up using a Channel to communicate between tasks:
function kbtest()
    # allow 'q' pressed on the keyboard to break the loop
    quitChannel = Channel(10)

    @async while true
        kb_input = readline(stdin)
        if contains(lowercase(kb_input), "q")
            put!(quitChannel, 1)
            break
        end
    end

    start_time = time()
    while (time() - start_time) < 10
        if isready(quitChannel)
            break
        end
        println("in loop @ $(time() - start_time)")
        sleep(1)
    end

    println("out of loop @ $(time() - start_time)")
end
This requires pressing q and then Enter, which works well for my needs.
