Python crashes when I put process.join in script when using multiprocessing - python-3.4

I have been researching multiprocessing and came across an example of it on a website. However, when I try to run that example on my MacBook Retina, nothing happens. This was the example:
import random
import multiprocessing

def list_append(count, id, out_list):
    """
    Creates an empty list and then appends a
    random number to the list 'count' number
    of times. A CPU-heavy operation!
    """
    for i in range(count):
        out_list.append(random.random())

if __name__ == "__main__":
    size = 10000000   # Number of random numbers to add
    procs = 2         # Number of processes to create

    # Create a list of jobs and then iterate through
    # the number of processes appending each process to
    # the job list
    jobs = []
    for i in range(0, procs):
        out_list = list()
        process = multiprocessing.Process(target=list_append,
                                          args=(size, i, out_list))
        jobs.append(process)

    # Start the processes (i.e. calculate the random number lists)
    for j in jobs:
        j.start()

    # Ensure all of the processes have finished
    for j in jobs:
        j.join()

    print("List processing complete.")
As it turns out, after I put a print statement in the list_append function, nothing was printed, so the problem is actually not the j.join() call but rather the j.start() bit.

When you create a process with multiprocessing.Process, you prepare a sub-function to be run asynchronously in a different process. The computation starts when you call the start method. The join method waits for the computation to be done. So if you just start the process and do not wait for it to complete (or join it), nothing will happen, as the process will be killed when your program exits.
Here, one issue is that you are not using an object that can be shared across processes. When you use a plain list(), each process works on a different copy of the list in memory. The local copy is discarded when the process exits, and the list in the main process stays empty. If you want to exchange data between processes, you should use a multiprocessing.Queue:
import random
import multiprocessing

def list_append(count, id, out_queue):
    """
    Creates an empty list and then appends a
    random number to the list 'count' number
    of times. A CPU-heavy operation!
    """
    for i in range(count):
        out_queue.put((id, random.random()))

if __name__ == "__main__":
    size = 10000   # Number of random numbers to add
    procs = 2      # Number of processes to create

    # Create a list of jobs and then iterate through
    # the number of processes appending each process to
    # the job list
    jobs = []
    q = multiprocessing.Queue()
    for i in range(0, procs):
        process = multiprocessing.Process(target=list_append,
                                          args=(size, i, q))
        process.start()
        jobs.append(process)

    result = []
    for k in range(procs*size):
        result += [q.get()]

    # Wait for all the processes to finish
    for j in jobs:
        j.join()

    print("List processing complete. {}".format(result))
Note that this code can hang quite easily if you do not correctly compute the number of results sent back through out_queue.
If you try to retrieve too many results, q.get will wait for an extra result that will never come. If you do not retrieve all the results from q, your processes will freeze, as out_queue will be full and out_queue.put will not return. Your processes will thus never exit, and you will not be able to join them.
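As a defensive measure, you can pass a timeout to q.get so that a miscounted loop raises an exception instead of blocking forever. Below is a minimal sketch of the consumer side only, reusing q, procs and size from the example above (the timeout value is an arbitrary choice, not part of the original answer):

import queue  # standard-library module, needed only for the Empty exception

result = []
try:
    for _ in range(procs * size):
        result.append(q.get(timeout=5))  # fail fast instead of waiting forever
except queue.Empty:
    print("Timed out: fewer results than expected; check the producer loops.")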
If your computations are independent, I strongly advise looking at higher-level tools like Pool, or an even more robust third-party library like joblib, as they will take care of these aspects for you. (See this answer for some insights on Process vs Pool/joblib.)
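For example, the same computation can be expressed with multiprocessing.Pool, which starts the workers, collects the results and joins the processes for you. This is only a sketch under the same assumptions as the snippets above (worker count, batch size), not code from the original answer:

import random
import multiprocessing

def make_randoms(args):
    # Build one batch of random numbers inside a worker process.
    size, worker_id = args
    return worker_id, [random.random() for _ in range(size)]

if __name__ == "__main__":
    size = 10000   # Number of random numbers per batch
    procs = 2      # Number of worker processes
    with multiprocessing.Pool(processes=procs) as pool:
        results = pool.map(make_randoms, [(size, i) for i in range(procs)])
    print("List processing complete. {} batches received.".format(len(results)))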
I actually reduced size, as the program becomes too slow if you try to put too many objects in a Queue. If you need to pass a lot of small objects, try passing all of them in one batch:
import random
import multiprocessing

def list_append(count, id, out_queue):
    # Build the whole list locally, then send it back in a single put
    a = [random.random() for i in range(count)]
    out_queue.put((id, a))

if __name__ == "__main__":
    size = 10000   # Number of random numbers to add
    procs = 2      # Number of processes to create

    jobs = []
    q = multiprocessing.Queue()
    for i in range(0, procs):
        process = multiprocessing.Process(target=list_append,
                                          args=(size, i, q))
        process.start()
        jobs.append(process)

    result = [q.get() for _ in range(procs)]

    for j in jobs:
        j.join()

    print("List processing complete.")

Related

How to output subsections in correct order from Jupyter notebooks when using concurrent execution?

My code is, simplified, something like:
import concurrent.futures
from IPython.display import display, display_markdown

def f(parameter):
    display_markdown("## %s" % parameter, raw=True)
    # Do some processing
    display(parameter)
    # Do some more processing
    display(parameter)
    # Do even more processing
    display(parameter)

with concurrent.futures.ThreadPoolExecutor() as executor:
    for result in executor.map(f, range(5)):
        pass  # Intentionally ignore results
The problem with this is that, because the function f is intentionally executed multiple times in parallel and each invocation takes a different amount of time to process, the display_markdown and display calls from different invocations are executed interleaved.
How can I ensure that the subsections/output of each invocation of f are output together, without interleaving with the others?
And, because the processing takes some time, how can I still see the intermediate results/outputs while the invocations are executing?
Logically, Jupyter would somehow have to maintain a cursor/pointer for each invocation of f and insert its output at the memorized point, even when further output has already happened after it, instead of just outputting everything at the end.

Process of executing calls from a recursion stack, how are the calls processed, what do they return and how does that matter?

I have heard that recursive calls take place on a stack. But the toughest thing, I find, is how they are processed from the recursion stack. I have heard that the calls get stored sequentially on a stack, the last call returns something to its previous call on the stack, and the process continues. I could comprehend that while computing the factorial of a number, but I got stuck with the following code for converting a sorted array to a binary search tree.
Following is the code:
class Node:
    def __init__(self, d):
        self.data = d
        self.left = None
        self.right = None

# function to convert sorted array to a
# balanced BST
# input : sorted array of integers
# output: root node of balanced BST
def sortedArrayToBST(arr):
    if not arr:
        return None

    # find middle (integer division, so it can be used as an index)
    mid = len(arr) // 2

    # make the middle element the root
    root = Node(arr[mid])

    # left subtree of root has all
    # values < arr[mid]
    root.left = sortedArrayToBST(arr[:mid])

    # right subtree of root has all
    # values > arr[mid]
    root.right = sortedArrayToBST(arr[mid+1:])
    return root
Now my example set is: [-10,-3,0,5,9]
I understand that this has to be recursive because the same process takes place multiple times. But what baffles me is what's happening inside the stack. When the control encounters a leaf node and adds it as the left child (root.left = sortedArrayToBST(arr[:mid])), how does that call pop off the stack, or even return something to its immediate caller, such that the process continues smoothly?
Can anyone please illustrate using a stack and showing the push and pop tasks happening with every call? Thanks in advance.
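One way to watch the pushes and pops yourself is to instrument the function above with a depth counter. This traced variant is purely illustrative (it is not part of the original question) and reuses the Node class defined above:

def sortedArrayToBST_traced(arr, depth=0):
    # Same algorithm as above, with prints showing each call (push) and return (pop)
    indent = "    " * depth
    print(indent + "push: sortedArrayToBST(" + str(arr) + ")")
    if not arr:
        print(indent + "pop:  None")
        return None
    mid = len(arr) // 2
    root = Node(arr[mid])
    root.left = sortedArrayToBST_traced(arr[:mid], depth + 1)
    root.right = sortedArrayToBST_traced(arr[mid+1:], depth + 1)
    print(indent + "pop:  Node(" + str(root.data) + ")")
    return root

sortedArrayToBST_traced([-10, -3, 0, 5, 9])

Each level of indentation in the output corresponds to a frame pushed onto the call stack, and the matching pop line shows the value that frame returns to its caller.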

Erlang: Make a ring

I'm quite new to Erlang (reading through "Software for a Concurrent World"). From what I've read, we link two processes together to form a reliable system.
But if we need more than two processes, I think we should connect them in a ring. Although this is slightly tangential to my actual question, please let me know if this is incorrect.
Given a list of PIDs:
[1,2,3,4,5]
I want to form these in a ring of {My_Pid, Linked_Pid} tuples:
[{1,2},{2,3},{3,4},{4,5},{5,1}]
I have trouble creating an elegant solution that adds the final {5,1} tuple.
Here is my attempt:
% linkedPairs takes [1,2,3] and returns [{1,2},{2,3}]
linkedPairs([]) -> [];
linkedPairs([_]) -> [];
linkedPairs([X1,X2|Xs]) -> [{X1, X2} | linkedPairs([X2|Xs])].

% joinLinks takes [{1,2},{2,3}] and returns [{1,2},{2,3},{3,1}]
joinLinks([{A, _}|_]=P) ->
    {X, Y} = lists:last(P),
    P ++ [{Y, A}].

% makeRing takes [1,2,3] and returns [{1,2},{2,3},{3,1}]
makeRing(PIDs) -> joinLinks(linkedPairs(PIDs)).
I cringe when looking at my joinLinks function - lists:last is slow (I think), and it doesn't look very "functional".
Is there a better, more idiomatic solution to this?
If other functional programmers (non-Erlang) stumble upon this, please post your solution - the concepts are the same.
Use lists:zip with the original list and its 'rotated' version:
1> L=[1,2,3].
[1,2,3]
2> lists:zip(L, tl(L) ++ [hd(L)]).
[{1,2},{2,3},{3,1}]
If you are manipulating long lists, you can avoid the creation of the intermediate list tl(L) ++ [hd(L)] by using a helper function:
1> L = lists:seq(1,5).
[1,2,3,4,5]
2> Link = fun Link([Last],First,Acc) -> lists:reverse([{Last,First}|Acc]);
Link([X|T],First,Acc) -> Link(T,First,[{X,hd(T)}|Acc]) end.
#Fun<erl_eval.42.127694169>
3> Joinlinks = fun(List) -> Link(List,hd(List),[]) end.
#Fun<erl_eval.6.127694169>
4> Joinlinks(L).
[{1,2},{2,3},{3,4},{4,5},{5,1}]
5>
But if we need more than two processes, I think we should connect them in a ring.
No. For instance, suppose you want to download the text of 10 different web pages. Instead of sending a request, then waiting for the server to respond, then sending the next request, etc., you can spawn a separate process for each request. Each spawned process only needs the pid of the main process, and the main process collects the results as they come in. When a spawned process gets a reply from a server, the spawned process sends a message to the main process with the results, then terminates. The spawned processes have no reason to send messages to each other. No ring.
I would guess that it is unlikely that you will ever create a ring of processes in your erlang career.
I have trouble creating an elegant solution that adds the final {5,1} tuple.
You can create the four other processes passing them self(), which will be different for each spawned process. Then, you can create a separate branch of your create_ring() function that terminates the recursion and returns the pid of the last created process to the main process:
init(N) ->
    LastPid = create_ring(....),

create_ring(0, PrevPid) -> PrevPid;
create_ring(N, PrevPid) when N > 0 ->
    Pid = spawn(?MODULE, loop, [PrevPid]),
    create_ring(.......).
Then, the main process can call (not spawn) the same function that is being spawned by the other processes, passing the function the last pid that was returned by the create_ring() function:
init(N) ->
    LastPid = create_ring(...),
    loop(LastPid).
As a result, the main process will enter into the same message loop as the other processes, and the main process will have the last pid stored in the loop parameter variable to send messages to.
In erlang, you will often find that while you are defining a function, you won't be able to do everything that you want in that function, so you need to call another function to do whatever it is that is giving you trouble, and if in the second function you find you can't do everything you need to do, then you need to call another function, etc. Applied to the ring problem above, I found that init() couldn't do everything I wanted in one function, so I defined the create_ring() function to handle part of the problem.

Julia #distributed: subsequent code run before all workers finish

I have been banging my head against a wall for a few days over this code:
using Distributed
using SharedArrays

# Dimension size
M=10;
N=100;

z_ijw = zeros(Float64,M,N,M)
z_ijw_tmp = SharedArray{Float64}(M*M*N)
i2s = CartesianIndices(z_ijw)

@distributed for iall=1:(M*M*N)
    # get index
    i=i2s[iall][1]
    j=i2s[iall][2]
    w=i2s[iall][3]
    # Assign function value
    z_ijw_tmp[iall]=sqrt(i+j+w) # Any random function would do
end

# Print the last element of the array
println(z_ijw_tmp[end])
println(z_ijw_tmp[end])
println(z_ijw_tmp[end])
The first printed number is always 0; the second is either 0 or 10.95... (the square root of 120, which is correct). The third is either 0 or 10.95 (if the second is 0).
So it appears that the print code (on the main thread?) is allowed to run before all the workers finish. Is there any way for the print code to run properly the first time (without a wait command)?
Without the multiple println calls, I thought it was a problem with scope and spent a few days reading about it @.@
@distributed with a reducer function, i.e. @distributed (+), will be synced, whereas @distributed without a reducer function will be started asynchronously.
Putting a @sync in front of your @distributed should make the code behave the way you want it to.
This is also noted in the documentation here:
Note that without a reducer function, @distributed executes asynchronously, i.e. it spawns independent tasks on all available workers and returns immediately without waiting for completion. To wait for completion, prefix the call with @sync

Detecting keystrokes in Julia

I have a piece of code in Julia in which a solver iterates many, many times as it seeks a solution to a very complex problem. At present, I have to provide a number of iterations for the code to do, set low enough that I don't have to wait hours for the code to halt in order to save the current state, but high enough that I don't have to keep activating the code every 5 minutes.
Is there a way, with the current state of Julia (0.2), to detect a keystroke instructing the code to either end without saving (in case of problems) or end with saving? I require a method such that the code will continue unimpeded unless such a keystroke event has happened, and that will interrupt on any iteration.
Essentially, I'm looking for a command that will read in a keystroke if a keystroke has occurred (while the terminal that Julia is running in has focus), and run certain code if the keystroke was a specific key. Is this possible?
Note: I'm running julia via xfce4-terminal on Xubuntu, in case that affects the required command.
You can use an asynchronous task to read from STDIN, blocking until something is available to read. In your main computation task, when you are ready to check for input, you can call yield() to lend a few cycles to the read task, and check a global to see if anything was read. For example:
input = ""
#async while true
global input = readavailable(STDIN)
end
for i = 1:10^6 # some long-running computation
if isempty(input)
yield()
else
println("GOT INPUT: ", input)
global input = ""
end
# do some other work here
end
Note that, since this is cooperative multithreading, there are no race conditions.
You may be able to achieve this by sending an interrupt (Ctrl+C). This should work from the REPL without any changes to your code – if you want to implement saving you'll have to handle the resulting InterruptException and prompt the user.
I had some trouble with the answer from steven-g-johnson, and ended up using a Channel to communicate between tasks:
function kbtest()
    # allow 'q' pressed on the keyboard to break the loop
    quitChannel = Channel(10)
    @async while true
        kb_input = readline(stdin)
        if contains(lowercase(kb_input), "q")
            put!(quitChannel, 1)
            break
        end
    end

    start_time = time()
    while (time() - start_time) < 10
        if isready(quitChannel)
            break
        end
        println("in loop @ $(time() - start_time)")
        sleep(1)
    end

    println("out of loop @ $(time() - start_time)")
end
This requires pressing q and then Enter, which works well for my needs.
