Multiprocessing with worker.run() runs in series instead of parallel? - python-3.4

I'm trying to create a program which in its essence works like this:
import multiprocessing
import time

def worker(numbers):
    print(numbers)
    time.sleep(2)
    return

if __name__ == '__main__':
    multiprocessing.set_start_method("spawn")
    p1 = multiprocessing.Process(target=worker, args=([0,1,2,3,4],))
    p2 = multiprocessing.Process(target=worker, args=([5,6,7,8],))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    while(1):
        p1.run()
        p2.run()
        p1.join()
        p2.join()
        print('Done!')
The first time the processes are called via p#.start(), they are executed in parallel. The second time they are called via the p#.run() method, they are executed in series.
How can I make sure the subsequent method calls are also performed in parallel?
Edit: It is important that the processes start together. It cannot happen that process 1 gets executed twice while process 2 only gets executed once.
Edit: I should also note that this code is running on a Raspberry Pi 3 Model B.

As far as I know, a process can only be started once. After that, calling the run() method just executes the target as an ordinary function call in the current process, which is why it no longer runs in parallel.
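If the goal is simply to keep running both workers in parallel round after round and keep them in lock-step, one option, sketched here under the assumption that the cost of spawning two new processes each round is acceptable on the Pi, is to create fresh Process objects inside the loop:

import multiprocessing
import time

def worker(numbers):
    print(numbers)
    time.sleep(2)

if __name__ == '__main__':
    multiprocessing.set_start_method("spawn")
    while True:
        # A Process can only be started once, so build new ones every round
        p1 = multiprocessing.Process(target=worker, args=([0, 1, 2, 3, 4],))
        p2 = multiprocessing.Process(target=worker, args=([5, 6, 7, 8],))
        p1.start()
        p2.start()
        # Both joins finish before the next round starts, so neither worker
        # can run twice while the other has only run once
        p1.join()
        p2.join()
        print('Done!')

If the spawn overhead matters, an alternative is to keep two long-lived processes, move the while loop inside worker, and synchronise the rounds with something like a multiprocessing.Barrier.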

Related

Dagster - Execute an @op only when all parallel executions are finished (DynamicOutput)

I have a problem in Dagster that I am not able to solve.
I have the following configuration:
Step 1 gets the data from an endpoint.
Step 2 gets a list of clients dynamically.
Step 3 updates the database with the response from step 1, for each client from step 2, in parallel.
Before calling step 3, I have a function named "parallelize_clients" that creates a DynamicOutput for each client from step 2, so that when it is invoked the step_3 updates run in parallel. Finally, I have a graph that joins the operations.
from dagster import op, graph, DynamicOut, DynamicOutput

@op()
def step_1_get_response():
    return {'exemple': 'data'}

@op()
def step_2_get_client_list():
    return ['client_1', 'client_2', 'client_3']  # the number of clients is dynamic

@op(out=DynamicOut())
def parallelize_clients(context, client_list):
    for client in client_list:
        yield DynamicOutput(client, mapping_key=str(client))

@op()
def step_3_update_database_cliente(response, client):
    ...  # UPDATE operation on the client's database record

@graph()
def job_exemple_graph():
    response = step_1_get_response()
    clients_list = step_2_get_client_list()
    clients = parallelize_clients(clients_list)
    # run the functions in parallel
    clients.map(lambda client: step_3_update_database_cliente(response, client))
According to the documentation, an @op starts as soon as its dependencies are fulfilled; ops with no dependencies are executed immediately, with no guaranteed order of execution. For example, my step 1 and step 2 have no dependencies, so both run in parallel automatically. After the clients are returned, the "parallelize_clients()" function is executed, and finally the map in the graph dynamically creates one execution per client (DynamicOutput).
So far this works, and everything is fine. Here is the problem: I need to execute a specific function only when step 3 is completely finished. Because step 3 is created dynamically, several executions are generated in parallel, and I am not able to make a function run only after all of those parallel executions have finished.
In the graph I tried putting a call to an op "exemplolaststep() step_4" at the end; however, step 4 is executed together with step 1 and step 2. I really want step 4 to execute only after step 3, but I cannot get this to work. Could someone help me?
I tried to create a fake dependency with
from dagster import In, Nothing

@op(ins={"start": In(Nothing)})
def step_4():
    pass
and in the graph, when calling the operations, I tried to pass the map call into the step_4() call. Example:
@graph()
def job_exemple_graph():
    response = step_1_get_response()
    clients_list = step_2_get_client_list()
    clients = parallelize_clients(clients_list)
    # run the functions in parallel
    step_4(start=clients.map(lambda client: step_3_update_database_cliente(response, client)))
I have tried other approaches as well, but to no avail.
You just need to add a .collect() call on the mapped function in your graph, to indicate that all the parallel operations should join before moving on. Something like
@graph()
def job_exemple_graph():
    response = step_1_get_response()
    clients_list = step_2_get_client_list()
    clients = parallelize_clients(clients_list)
    # run the functions in parallel
    step_4(
        start=clients.map(
            lambda client: step_3_update_database_cliente(response, client)
        ).collect()
    )
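If step_4 should also receive the results of the step_3 runs (rather than just a start signal), a variation, sketched here with the same op names, is to give step_4 a regular input; .collect() then passes in the list of all mapped outputs:

@op()
def step_4(step_3_results):
    # Runs only once every parallel step_3_update_database_cliente has finished;
    # step_3_results is the list of their outputs
    ...

@graph()
def job_exemple_graph():
    response = step_1_get_response()
    clients = parallelize_clients(step_2_get_client_list())
    results = clients.map(lambda client: step_3_update_database_cliente(response, client))
    step_4(results.collect())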

Setup Time Delay Before Executing Cells in Jupyter Notebook [duplicate]

How do I put a time delay in a Python script?
This delays for 2.5 seconds:
import time
time.sleep(2.5)
Here is another example where something is run approximately once a minute:
import time

while True:
    print("This prints once a minute.")
    time.sleep(60)  # Delay for 1 minute (60 seconds).
Use sleep() from the time module. It can take a float argument for sub-second resolution.
from time import sleep
sleep(0.1) # Time in seconds
How can I make a time delay in Python?
In a single thread I suggest the sleep function:
>>> from time import sleep
>>> sleep(4)
This function actually suspends the processing of the thread in which it is called by the operating system, allowing other threads and processes to execute while it sleeps.
Use it for that purpose, or simply to delay a function from executing. For example:
>>> def party_time():
...     print('hooray!')
...
>>> sleep(3); party_time()
hooray!
"hooray!" is printed 3 seconds after I hit Enter.
Example using sleep with multiple threads and processes
Again, sleep suspends your thread - it uses next to zero processing power.
To demonstrate, create a script like this (I first attempted this in an interactive Python 3.5 shell, but sub-processes can't find the party_later function for some reason):
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed
from time import sleep, time

def party_later(kind='', n=''):
    sleep(3)
    return kind + n + ' party time!: ' + __name__

def main():
    with ProcessPoolExecutor() as proc_executor:
        with ThreadPoolExecutor() as thread_executor:
            start_time = time()
            proc_future1 = proc_executor.submit(party_later, kind='proc', n='1')
            proc_future2 = proc_executor.submit(party_later, kind='proc', n='2')
            thread_future1 = thread_executor.submit(party_later, kind='thread', n='1')
            thread_future2 = thread_executor.submit(party_later, kind='thread', n='2')
            for f in as_completed([
                    proc_future1, proc_future2, thread_future1, thread_future2,]):
                print(f.result())
    end_time = time()
    print('total time to execute four 3-sec functions:', end_time - start_time)

if __name__ == '__main__':
    main()
Example output from this script:
thread1 party time!: __main__
thread2 party time!: __main__
proc1 party time!: __mp_main__
proc2 party time!: __mp_main__
total time to execute four 3-sec functions: 3.4519670009613037
Multithreading
You can trigger a function to be called at a later time in a separate thread with the Timer threading object:
>>> from threading import Timer
>>> t = Timer(3, party_time, args=None, kwargs=None)
>>> t.start()
>>>
>>> hooray!
>>>
The blank line illustrates that the function printed to my standard output, and I had to hit Enter to ensure I was on a prompt.
The upside of this method is that while the Timer thread was waiting, I was able to do other things, in this case, hitting Enter one time - before the function executed (see the first empty prompt).
There isn't an analogous object in the multiprocessing library. You could create one, but it probably doesn't exist for a reason: a sub-thread makes a lot more sense for a simple timer than a whole new subprocess.
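For completeness, here is a minimal sketch of what a process-based equivalent could look like (process_timer is a hypothetical helper, not part of the standard library):

from multiprocessing import Process
from time import sleep

def _delayed_call(delay, target, args):
    sleep(delay)
    target(*args)

def process_timer(delay, target, args=()):
    """Call target(*args) in a child process after delay seconds."""
    p = Process(target=_delayed_call, args=(delay, target, args))
    p.start()
    return p

# Usage: the callable must be picklable, e.g. a module-level function
# t = process_timer(3, print, ('hooray!',))
# t.terminate()  # cancel it before it fires, if needed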
Delays can also be implemented by using the following methods.
The first method:
import time
time.sleep(5) # Delay for 5 seconds.
The second method (specific to Selenium WebDriver) is the implicit wait:
driver.implicitly_wait(5)
The third method (also Selenium) is more useful when you have to wait until a particular action is completed or until an element is found:
self.wait.until(EC.presence_of_element_located((By.ID, 'UserName')))
There are five methods which I know: time.sleep(), pygame.time.wait(), matplotlib's pyplot.pause(), .after(), and asyncio.sleep().
time.sleep() example (do not use if using tkinter):
import time
print('Hello')
time.sleep(5) # Number of seconds
print('Bye')
pygame.time.wait() example (not recommended if you are not using the pygame window, but you could exit the window instantly):
import pygame
# If you are going to use the time module
# don't do "from pygame import *"
pygame.init()
print('Hello')
pygame.time.wait(5000) # Milliseconds
print('Bye')
matplotlib's function pyplot.pause() example (not recommended if you are not using the graph, but you could exit the graph instantly):
import matplotlib.pyplot
print('Hello')
matplotlib.pyplot.pause(5) # Seconds
print('Bye')
The .after() method (best with Tkinter):
import tkinter as tk  # In Python 2 the module is called Tkinter

root = tk.Tk()

print('Hello')

def ohhi():
    print('Oh, hi!')

root.after(5000, ohhi)  # Milliseconds, then a function
print('Bye')
Finally, the asyncio.sleep() method (has to be in an async loop):
await asyncio.sleep(5)
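A minimal runnable sketch of that last one, assuming Python 3.7+ for asyncio.run:

import asyncio

async def main():
    print('Hello')
    await asyncio.sleep(5)  # Suspends this coroutine without blocking the event loop
    print('Bye')

asyncio.run(main())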
A bit of fun with a sleepy generator.
The question is about time delay. It can be fixed time, but in some cases we might need a delay measured since last time. Here is one possible solution:
Delay measured since last time (waking up regularly)
The situation might be that we want to do something as regularly as possible, without scattering last_time / next_time bookkeeping all around our code.
Buzzer generator
The following code (sleepy.py) defines a buzzergen generator:
import time
from itertools import count

def buzzergen(period):
    nexttime = time.time() + period
    for i in count():
        now = time.time()
        tosleep = nexttime - now
        if tosleep > 0:
            time.sleep(tosleep)
            nexttime += period
        else:
            nexttime = now + period
        yield i, nexttime
Invoking regular buzzergen
from sleepy import buzzergen
import time

buzzer = buzzergen(3)  # Planning to wake up every 3 seconds
print(time.time())
next(buzzer)
print(time.time())
time.sleep(2)
next(buzzer)
print(time.time())
time.sleep(5)  # Sleeping a bit longer than usual
next(buzzer)
print(time.time())
next(buzzer)
print(time.time())
And running it we see:
1400102636.46
1400102639.46
1400102642.46
1400102647.47
1400102650.47
We can also use it directly in a loop:
import random

for ring in buzzergen(3):
    print("now", time.time())
    print("ring", ring)
    time.sleep(random.choice([0, 2, 4, 6]))
And running it we might see:
now 1400102751.46
ring (0, 1400102754.461676)
now 1400102754.46
ring (1, 1400102757.461676)
now 1400102757.46
ring (2, 1400102760.461676)
now 1400102760.46
ring (3, 1400102763.461676)
now 1400102766.47
ring (4, 1400102769.47115)
now 1400102769.47
ring (5, 1400102772.47115)
now 1400102772.47
ring (6, 1400102775.47115)
now 1400102775.47
ring (7, 1400102778.47115)
As we see, this buzzer is not too rigid and allows us to catch up with the regular interval even if we oversleep and fall out of the schedule.
The Tkinter library in the Python standard library is an interactive tool which you can import. Basically, you can create buttons and boxes and popups and stuff that appear as windows which you manipulate with code.
If you use Tkinter, do not use time.sleep(), because it will muck up your program. This happened to me. Instead, use root.after() with the delay given in milliseconds. For example, time.sleep(1) is equivalent to root.after(1000) in Tkinter.
Otherwise, time.sleep(), which many answers have pointed out, is the way to go.
Delays are done with the time library, specifically the time.sleep() function.
To just make it wait for a second:
from time import sleep
sleep(1)
This works because by doing:
from time import sleep
You extract the sleep function only from the time library, which means you can just call it with:
sleep(seconds)
Rather than having to type out
time.sleep()
Which is awkwardly long to type.
With this method, you wouldn't get access to the other features of the time library and you can't have a variable called sleep. But you could create a variable called time.
Doing from [library] import [function] (, [function2]) is great if you just want certain parts of a module.
You could equally do it as:
import time
time.sleep(1)
and you would have access to the other features of the time library like time.clock(), as long as you type time.[function](), but you couldn't create a variable called time because it would overwrite the import. A solution to this is to do
import time as t
which would allow you to reference the time library as t, allowing you to do:
t.sleep()
This works on any library.
If you would like to put a time delay in a Python script:
Use time.sleep or Event().wait like this:
from threading import Event
from time import sleep
delay_in_sec = 2
# Use time.sleep like this
sleep(delay_in_sec) # Returns None
print(f'slept for {delay_in_sec} seconds')
# Or use Event().wait like this
Event().wait(delay_in_sec) # Returns False
print(f'waited for {delay_in_sec} seconds')
However, if you want to delay the execution of a function do this:
Use threading.Timer like this:
from threading import Timer
delay_in_sec = 2
def hello(delay_in_sec):
    print(f'function called after {delay_in_sec} seconds')

t = Timer(delay_in_sec, hello, [delay_in_sec]) # The hello function will be called 2 seconds later with [delay_in_sec] as the *args parameter
t.start() # Returns None
print("Started")
Outputs:
Started
function called after 2 seconds
Why use the latter approach?
It does not stop execution of the whole script (except for the function you pass it).
After starting the timer you can also stop it by doing timer_obj.cancel().
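For example, a pending timer can be cancelled before its delay elapses:

from threading import Timer

t = Timer(10, print, ['this never prints'])
t.start()
# Changed our mind before the 10 seconds elapse:
t.cancel()  # The callback is never invoked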
asyncio.sleep
Notice that in recent Python versions (Python 3.4 or higher) you can use asyncio.sleep. It's related to asynchronous programming and asyncio. Check out the next example:
import asyncio
from datetime import datetime

@asyncio.coroutine
def countdown(iteration_name, countdown_sec):
    """
    Just count for some countdown_sec seconds and do nothing else
    """
    while countdown_sec > 0:
        print(f'{iteration_name} iterates: {countdown_sec} seconds')
        yield from asyncio.sleep(1)
        countdown_sec -= 1

loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(countdown('First Count', 2)),
         asyncio.ensure_future(countdown('Second Count', 3))]
start_time = datetime.utcnow()

# Run both methods. How much time will both run...?
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
print(f'total running time: {datetime.utcnow() - start_time}')
We might expect it to "sleep" for 2 seconds for the first coroutine and then 3 seconds for the second, a total of 5 seconds of running time. But it prints:
total_running_time: 0:00:03.01286
It is recommended to read asyncio official documentation for more details.
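Note that the @asyncio.coroutine / yield from style used above is deprecated in newer Python versions (and removed in 3.11). A sketch of the same example in the async def / await style, assuming Python 3.7+ for asyncio.run:

import asyncio
from datetime import datetime

async def countdown(iteration_name, countdown_sec):
    """Just count down countdown_sec seconds and do nothing else."""
    while countdown_sec > 0:
        print(f'{iteration_name} iterates: {countdown_sec} seconds')
        await asyncio.sleep(1)
        countdown_sec -= 1

async def main():
    # Both countdowns run concurrently, so the total time is about 3 seconds
    await asyncio.gather(countdown('First Count', 2),
                         countdown('Second Count', 3))

start_time = datetime.utcnow()
asyncio.run(main())
print(f'total running time: {datetime.utcnow() - start_time}')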
While everyone else has suggested the de facto time module, I thought I'd share a different method using matplotlib's pyplot function, pause.
An example
from matplotlib import pyplot as plt
plt.pause(5) # Pauses the program for 5 seconds
Typically this is used to prevent the plot from disappearing as soon as it is plotted or to make crude animations.
This would save you an import if you already have matplotlib imported.
This is an easy example of a time delay:
import time

def delay(period=5):
    # If the caller passes nothing, it'll wait 5 seconds
    try:
        time.sleep(period)
    except TypeError:
        # If the argument isn't a number, just return ''
        return ''
Another, in Tkinter:
import tkinter

def tick():
    pass

root = tkinter.Tk()
delay = 100  # Time in milliseconds
root.after(delay, tick)
root.mainloop()
You can also try this:
import time

# The time now
start = time.time()

while time.time() - start < 10:  # Run for 10 seconds
    pass
    # Do the job
This way the shell will not crash or stop responding.

Python script run in background not writing to file?

I've created my first ever Python script and it works fine in the foreground, but when I run it in the background it creates the file but fails to write anything to it. I run the script with: python -u testit.py &
Please help me understand why this happens.
#!/usr/bin/env python
import time
import datetime

dt = datetime.datetime.now()
dtLog = dt.strftime("ThermoLogs/TempLOG%Y%m%d")
f = open(dtLog, "a")

while True:
    dt = datetime.datetime.now()
    print('{:%Y-%m-%d %H:%M:%S}'.format(dt))
    f.write('{:%Y-%m-%d %H:%M:%S}'.format(dt))
    f.write('\n')
    time.sleep(5)
The output arrives in bursts because the file writes are being buffered. With the sleep set to 0.01 I get bursts roughly every 2 seconds, so with a delay of 5 they will be much less frequent. You may also note that all pending output is written when the script terminates.
To make the output appear immediately, call f.flush() after each write.
I don't see any difference between foreground and background; however, stdout should not be buffered when it goes to a terminal.
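For example, the loop from the question could flush after every write (alternatively, the file could be opened with buffering=1 for line buffering in text mode):

while True:
    dt = datetime.datetime.now()
    line = '{:%Y-%m-%d %H:%M:%S}'.format(dt)
    print(line)
    f.write(line + '\n')
    f.flush()  # Push the buffered data to the file right away
    time.sleep(5)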

After the first run, a Jupyter Notebook with Python 3.6.1 running a basic asyncio example gives: RuntimeError: Event loop is closed

In Jupyter Notebook (Python 3.6.1) I went to run the basic Python docs Hello World example (18.5.3.1.1. Example: Hello World coroutine) and noticed that it was giving me a RuntimeError. After trying for a long time to find the problem with the program (my understanding is that the docs may not be totally up to date), I finally noticed that it only does this on the second run, and verified this in a restarted kernel. I've since copied the same small program into two successive cells (In 1 and In 2) and found that it gives the error on the second, not the first, and on both thereafter. This repeats after restarting the kernel.
import asyncio

def hello_world(loop):
    print('Hello World')
    loop.stop()

loop = asyncio.get_event_loop()

# Schedule a call to hello_world()
loop.call_soon(hello_world, loop)

# Blocking call interrupted by loop.stop()
loop.run_forever()
loop.close()
The traceback:
RuntimeError Traceback (most recent call last)
<ipython-input-2-0930271bd896> in <module>()
6 loop = asyncio.get_event_loop()
7 # Blocking call which returns when the hello_world() coroutine
----> 8 loop.run_until_complete(hello_world())
9 loop.close()
/home/pontiac/anaconda3/lib/python3.6/asyncio/base_events.py in run_until_complete(self, future)
441 Return the Future's result, or raise its exception.
442 """
--> 443 self._check_closed()
444
445 new_task = not futures.isfuture(future)
/home/pontiac/anaconda3/lib/python3.6/asyncio/base_events.py in _check_closed(self)
355 def _check_closed(self):
356 if self._closed:
--> 357 raise RuntimeError('Event loop is closed')
358
359 def _asyncgen_finalizer_hook(self, agen):
RuntimeError: Event loop is closed
I don't get this error when running the file in the interpreter with all the debug settings set. I am running this notebook in my recently reinstalled Anaconda setup, which only has Python 3.6.1 installed.
The issue is that loop.close() makes the loop unavailable for future use. That is, you can never use a loop again after calling close(). The loop stays around as an object, but almost all methods on the loop will raise an exception once it is closed. However, asyncio.get_event_loop() returns the same loop if you call it more than once. You often want this, so that multiple parts of an application get the same event loop.
However, if you plan on closing a loop, you are better off calling asyncio.new_event_loop rather than asyncio.get_event_loop. That will give you a fresh event loop. If you call new_event_loop rather than get_event_loop, you're responsible for making sure that the right loop gets used in all parts of the application that run in this thread. If you want to be able to run the code multiple times to test it, you could do something like:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
After that, you'll find that asyncio.get_event_loop returns the same thing as loop. So if you do that near the top of your program, you will have a new fresh event loop each run of the code.
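Putting that together, here is a sketch of a version of the Hello World cell that can be re-run (assuming nothing else in the notebook depends on the default event loop):

import asyncio

def hello_world(loop):
    print('Hello World')
    loop.stop()

# A fresh loop on every execution of the cell, so closing it is harmless
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

loop.call_soon(hello_world, loop)
loop.run_forever()
loop.close()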

Callback from "multiprocessing" with CFFI segfaults after ~100 iterations

A PyPy callback that works perfectly (in an infinite loop) when implemented (straightforwardly) as a method of a Python object segfaults after approximately 100 iterations when I move the Python object into a separate multiprocessing process.
In the main code I have:
import multiprocessing as mp

class Task(object):

    def __init__(self, com, lib):
        self.com = com  # communication queue
        self.lib = lib  # ffi library
        self.proc = mp.Process(target=self.spawn, args=(self.com,))
        self.register_callback()

    def spawn(self, com):
        print('%s spawned.' % self.name)
        # loop (keeping 'self' alive) until BREAK:
        while True:
            cmd = com.get()
            if cmd == self.BREAK:
                break
        print("%s stopped." % self.name)

    @ffi.callback("int(void*, Data*)")  # old cffi (ABI mode)
    def callback(self, data):
        # <work on data>
        return 1

    def register_callback(self):
        s = ffi.new_handle(self)
        self.lib.register_callback(s, self.callback)  # C call
The idea is that multiple tasks should serve an equal number of callbacks concurrently. I have no clue what may cause the segfault, especially since it runs fine for the first ~100 iterations or so. Help much appreciated!
Solution
The handle s is garbage collected when register_callback() returns. Making the handle an attribute of self (and passing that) keeps it alive.
Standard CPython (cffi 1.6.0) segfaulted at the first iteration (i.e. garbage collection was immediate) and gave me a crucial, informative error message, whereas PyPy segfaulted after approximately 100 iterations without any message. Both run fine now.
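In other words, register_callback() can be rewritten roughly like this (a sketch keeping the original names):

def register_callback(self):
    # Store the handle on self so it is not garbage collected when this
    # method returns; the C side only keeps a raw pointer to it.
    self._handle = ffi.new_handle(self)
    self.lib.register_callback(self._handle, self.callback)  # C call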
