How to define a composite solid with multiple arguments, including `Nothing`? - dagster

I have a solid that needs to run after 2 solids. One will return a value, another doesn't return anything but has dependency solids and will take time to run.
I execute the pipeline in multiprocessing mode, where solids run at the same time if they don't have dependencies defined.
Below is the sample situation I am looking for. Say I have below solids.
#solid(input_defs=[InputDefinition("start", Nothing)])
def solid_a(context):
import time
time.sleep(2)
context.log.info('yey')
#solid
def solid_b(context):
return 1
#composite_solid
def my_composite_solid(wait_solid_a: Nothing, solid_b_output: int):
some_other_solid(solid_b_output)
And when executed, these solids will be running in the below timeline.
Time Passed
solid
0
pipeline starts...
1 sec
solid_b started
3 sec
solid_a dependency solids are running. solid_a did not started yet.
5 sec
solid_b finished
10 sec
solid_a started now
15 sec
solid_a finished
20 sec
my_composite_solid should start now.
So, according to this timeline, in order for my_composite_solid to start, I need both solid_a and solid_b to finish executing. However, when I make this, dagster throws an error saying:
dagster.core.errors.DagsterInvalidDefinitionError: #composite_solid 'my_composite_solid' has unmapped input 'wait_solid_a'. Remove it or pass it to the appropriate solid invocation.
If I don't put the solid_a output as a dependency to my_composite_solid, it will start immediately after the result of solid_b. What should I do?

Related

Setup Time Delay Before Executing Cells in Jupyter Notebook [duplicate]

This question already has answers here:
How do I get my program to sleep for 50 milliseconds?
(6 answers)
Closed 3 years ago.
How do I put a time delay in a Python script?
This delays for 2.5 seconds:
import time
time.sleep(2.5)
Here is another example where something is run approximately once a minute:
import time
while True:
print("This prints once a minute.")
time.sleep(60) # Delay for 1 minute (60 seconds).
Use sleep() from the time module. It can take a float argument for sub-second resolution.
from time import sleep
sleep(0.1) # Time in seconds
How can I make a time delay in Python?
In a single thread I suggest the sleep function:
>>> from time import sleep
>>> sleep(4)
This function actually suspends the processing of the thread in which it is called by the operating system, allowing other threads and processes to execute while it sleeps.
Use it for that purpose, or simply to delay a function from executing. For example:
>>> def party_time():
... print('hooray!')
...
>>> sleep(3); party_time()
hooray!
"hooray!" is printed 3 seconds after I hit Enter.
Example using sleep with multiple threads and processes
Again, sleep suspends your thread - it uses next to zero processing power.
To demonstrate, create a script like this (I first attempted this in an interactive Python 3.5 shell, but sub-processes can't find the party_later function for some reason):
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed
from time import sleep, time
def party_later(kind='', n=''):
sleep(3)
return kind + n + ' party time!: ' + __name__
def main():
with ProcessPoolExecutor() as proc_executor:
with ThreadPoolExecutor() as thread_executor:
start_time = time()
proc_future1 = proc_executor.submit(party_later, kind='proc', n='1')
proc_future2 = proc_executor.submit(party_later, kind='proc', n='2')
thread_future1 = thread_executor.submit(party_later, kind='thread', n='1')
thread_future2 = thread_executor.submit(party_later, kind='thread', n='2')
for f in as_completed([
proc_future1, proc_future2, thread_future1, thread_future2,]):
print(f.result())
end_time = time()
print('total time to execute four 3-sec functions:', end_time - start_time)
if __name__ == '__main__':
main()
Example output from this script:
thread1 party time!: __main__
thread2 party time!: __main__
proc1 party time!: __mp_main__
proc2 party time!: __mp_main__
total time to execute four 3-sec functions: 3.4519670009613037
Multithreading
You can trigger a function to be called at a later time in a separate thread with the Timer threading object:
>>> from threading import Timer
>>> t = Timer(3, party_time, args=None, kwargs=None)
>>> t.start()
>>>
>>> hooray!
>>>
The blank line illustrates that the function printed to my standard output, and I had to hit Enter to ensure I was on a prompt.
The upside of this method is that while the Timer thread was waiting, I was able to do other things, in this case, hitting Enter one time - before the function executed (see the first empty prompt).
There isn't a respective object in the multiprocessing library. You can create one, but it probably doesn't exist for a reason. A sub-thread makes a lot more sense for a simple timer than a whole new subprocess.
Delays can be also implemented by using the following methods.
The first method:
import time
time.sleep(5) # Delay for 5 seconds.
The second method to delay would be using the implicit wait method:
driver.implicitly_wait(5)
The third method is more useful when you have to wait until a particular action is completed or until an element is found:
self.wait.until(EC.presence_of_element_located((By.ID, 'UserName'))
There are five methods which I know: time.sleep(), pygame.time.wait(), matplotlib's pyplot.pause(), .after(), and asyncio.sleep().
time.sleep() example (do not use if using tkinter):
import time
print('Hello')
time.sleep(5) # Number of seconds
print('Bye')
pygame.time.wait() example (not recommended if you are not using the pygame window, but you could exit the window instantly):
import pygame
# If you are going to use the time module
# don't do "from pygame import *"
pygame.init()
print('Hello')
pygame.time.wait(5000) # Milliseconds
print('Bye')
matplotlib's function pyplot.pause() example (not recommended if you are not using the graph, but you could exit the graph instantly):
import matplotlib
print('Hello')
matplotlib.pyplot.pause(5) # Seconds
print('Bye')
The .after() method (best with Tkinter):
import tkinter as tk # Tkinter for Python 2
root = tk.Tk()
print('Hello')
def ohhi():
print('Oh, hi!')
root.after(5000, ohhi) # Milliseconds and then a function
print('Bye')
Finally, the asyncio.sleep() method (has to be in an async loop):
await asyncio.sleep(5)
A bit of fun with a sleepy generator.
The question is about time delay. It can be fixed time, but in some cases we might need a delay measured since last time. Here is one possible solution:
Delay measured since last time (waking up regularly)
The situation can be, we want to do something as regularly as possible and we do not want to bother with all the last_time, next_time stuff all around our code.
Buzzer generator
The following code (sleepy.py) defines a buzzergen generator:
import time
from itertools import count
def buzzergen(period):
nexttime = time.time() + period
for i in count():
now = time.time()
tosleep = nexttime - now
if tosleep > 0:
time.sleep(tosleep)
nexttime += period
else:
nexttime = now + period
yield i, nexttime
Invoking regular buzzergen
from sleepy import buzzergen
import time
buzzer = buzzergen(3) # Planning to wake up each 3 seconds
print time.time()
buzzer.next()
print time.time()
time.sleep(2)
buzzer.next()
print time.time()
time.sleep(5) # Sleeping a bit longer than usually
buzzer.next()
print time.time()
buzzer.next()
print time.time()
And running it we see:
1400102636.46
1400102639.46
1400102642.46
1400102647.47
1400102650.47
We can also use it directly in a loop:
import random
for ring in buzzergen(3):
print "now", time.time()
print "ring", ring
time.sleep(random.choice([0, 2, 4, 6]))
And running it we might see:
now 1400102751.46
ring (0, 1400102754.461676)
now 1400102754.46
ring (1, 1400102757.461676)
now 1400102757.46
ring (2, 1400102760.461676)
now 1400102760.46
ring (3, 1400102763.461676)
now 1400102766.47
ring (4, 1400102769.47115)
now 1400102769.47
ring (5, 1400102772.47115)
now 1400102772.47
ring (6, 1400102775.47115)
now 1400102775.47
ring (7, 1400102778.47115)
As we see, this buzzer is not too rigid and allow us to catch up with regular sleepy intervals even if we oversleep and get out of regular schedule.
The Tkinter library in the Python standard library is an interactive tool which you can import. Basically, you can create buttons and boxes and popups and stuff that appear as windows which you manipulate with code.
If you use Tkinter, do not use time.sleep(), because it will muck up your program. This happened to me. Instead, use root.after() and replace the values for however many seconds, with a milliseconds. For example, time.sleep(1) is equivalent to root.after(1000) in Tkinter.
Otherwise, time.sleep(), which many answers have pointed out, which is the way to go.
Delays are done with the time library, specifically the time.sleep() function.
To just make it wait for a second:
from time import sleep
sleep(1)
This works because by doing:
from time import sleep
You extract the sleep function only from the time library, which means you can just call it with:
sleep(seconds)
Rather than having to type out
time.sleep()
Which is awkwardly long to type.
With this method, you wouldn't get access to the other features of the time library and you can't have a variable called sleep. But you could create a variable called time.
Doing from [library] import [function] (, [function2]) is great if you just want certain parts of a module.
You could equally do it as:
import time
time.sleep(1)
and you would have access to the other features of the time library like time.clock() as long as you type time.[function](), but you couldn't create the variable time because it would overwrite the import. A solution to this to do
import time as t
which would allow you to reference the time library as t, allowing you to do:
t.sleep()
This works on any library.
If you would like to put a time delay in a Python script:
Use time.sleep or Event().wait like this:
from threading import Event
from time import sleep
delay_in_sec = 2
# Use time.sleep like this
sleep(delay_in_sec) # Returns None
print(f'slept for {delay_in_sec} seconds')
# Or use Event().wait like this
Event().wait(delay_in_sec) # Returns False
print(f'waited for {delay_in_sec} seconds')
However, if you want to delay the execution of a function do this:
Use threading.Timer like this:
from threading import Timer
delay_in_sec = 2
def hello(delay_in_sec):
print(f'function called after {delay_in_sec} seconds')
t = Timer(delay_in_sec, hello, [delay_in_sec]) # Hello function will be called 2 seconds later with [delay_in_sec] as the *args parameter
t.start() # Returns None
print("Started")
Outputs:
Started
function called after 2 seconds
Why use the later approach?
It does not stop execution of the whole script (except for the function you pass it).
After starting the timer you can also stop it by doing timer_obj.cancel().
asyncio.sleep
Notice in recent Python versions (Python 3.4 or higher) you can use asyncio.sleep. It's related to asynchronous programming and asyncio. Check out next example:
import asyncio
from datetime import datetime
#asyncio.coroutine
def countdown(iteration_name, countdown_sec):
"""
Just count for some countdown_sec seconds and do nothing else
"""
while countdown_sec > 0:
print(f'{iteration_name} iterates: {countdown_sec} seconds')
yield from asyncio.sleep(1)
countdown_sec -= 1
loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(countdown('First Count', 2)),
asyncio.ensure_future(countdown('Second Count', 3))]
start_time = datetime.utcnow()
# Run both methods. How much time will both run...?
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
print(f'total running time: {datetime.utcnow() - start_time}')
We may think it will "sleep" for 2 seconds for first method and then 3 seconds in the second method, a total of 5 seconds running time of this code. But it will print:
total_running_time: 0:00:03.01286
It is recommended to read asyncio official documentation for more details.
While everyone else has suggested the de facto time module, I thought I'd share a different method using matplotlib's pyplot function, pause.
An example
from matplotlib import pyplot as plt
plt.pause(5) # Pauses the program for 5 seconds
Typically this is used to prevent the plot from disappearing as soon as it is plotted or to make crude animations.
This would save you an import if you already have matplotlib imported.
This is an easy example of a time delay:
import time
def delay(period='5'):
# If the user enters nothing, it'll wait 5 seconds
try:
# If the user not enters a int, I'll just return ''
time.sleep(period)
except:
return ''
Another, in Tkinter:
import tkinter
def tick():
pass
root = Tk()
delay = 100 # Time in milliseconds
root.after(delay, tick)
root.mainloop()
You also can try this:
import time
# The time now
start = time.time()
while time.time() - start < 10: # Run 1- seconds
pass
# Do the job
Now the shell will not crash or not react.

Is there a way to make a Denodo 8 VDP scheduler job WAIT() for a certain amount of time?

I want to make a VDP scheduler job in Denodo 8 wait for a certain amount of time. The wait function in the job creation process is not working as expected so I figured I'd write it into the VQL. However when i try the suggested function from the documentation (https://community.denodo.com/docs/html/browse/8.0/en/vdp/vql/stored_procedures/predefined_stored_procedures/wait) the Denodo 8 VQL shell doesn't recognize the function.
--Not working
SELECT WAIT('10000');
Returns the following error:
Function 'wait' with arity 1 not found
--Not working
WAIT('10000');
Returns the following error:
Error parsing command 'WAIT('10000')'
Any suggestions would be much appreciated.
There are two ways of invoking WAIT:
Option #1
-- Wait for one minute
CALL WAIT(60000);
Option #2:
-- Wait for ten seconds
SELECT timeinmillis
FROM WAIT()
WHERE timeinmillis = 10000;

quit 1000 of QThread - pyqt5

I have python gui app using Qt
I'm using pyqt5
the app should creating 1000 or more Qthreads, each thread will use pycurl to open external URL
I'm using this code to open the threads:
self.__threads = []
# workerClass thread
for i in range(1000):
workerInstance = workerClass()
workerInstance.sig.connect(self.ui.log.append)
thread = QThread()
self.__threads.append((thread, workerInstance))
workerInstance.moveToThread(thread)
thread.started.connect(workerInstance.worker_func)
thread.start()
WorkerClass will use a simple pycurl to visit an external URL and emit signal to add some info on the log, here is the code:
class workerClass(QObject):
x = 1
sig = pyqtSignal(str)
def __init__(self, linksInstance, timeOut):
super().__init__()
self.linksInstance = linksInstance
self.timeOut = timeOut
#pyqtSlot()
def worker_func(self):
while self.x:
# run the pycurl code
self.sig.emit('success!') ## emit success message
Now the problem is with stopping all this, I used the below code for stop button:
def stop_func(self):
workerClass.x = 0
if self.__threads:
for thread, worker in self.__threads:
thread.quit()
thread.wait()
changing the workerClass.x to 0 will stop the while self.x in the class workerClass and quit and wait should close all threads
all this is working properly, BUT only in the low amount of threads, 10 or may be 100
but if I rung 1000 threads, it takes much time, I wait for 10 minutes but it didn't stopped (mean threads not terminated), however the pycurl time out is only 15 seconds
I also used: self.thread.exit() self.thread.quit() self.thread.terminate() but it gives no difference
if any one have any idea about this, this will be really great :)
after some tries, I removed the pycurl code and test to start 10k QThreads and stopped it, and this working perfect without any errors. so I define that the error is with the pycurl code.
as per the advice of ekhumoro in the question comments, I may try to use pycurl multi
but for now I added this line to the pycurl code:
c.setopt(c.TIMEOUT, self.timeOut)
I was using only this:
c.setopt (c.CONNECTTIMEOUT, self.timeOut)
so now with the both 2 parameters all is properly working, starting and stopping bulk qthreads

After first run ,Jupyter notebook with python 3.6.1, using asyncio basic example gives: RuntimeError: Event loop is closed

In Jupyter Notebook (python 3.6.1) I went to run the basic python docs Hello_World in (18.5.3.1.1. Example: Hello World coroutine) and noticed that it was giving me a RuntimeError. After trying a long time to find the problem with the program(my understanding is that the docs may not be totally up to date), I finally noticed that it only does this after the second run and tested in a restarted Kernel. I've since then copied the same small python program in two successive cells(In 1 and 2) and found that it gives the error on the second not the first and gives the error to both there after. This repeats this after restarting the Kernel.
import asyncio
def hello_world(loop):
print('Hello World')
loop.stop()
loop = asyncio.get_event_loop()
# Schedule a call to hello_world()
loop.call_soon(hello_world, loop)
# Blocking call interrupted by loop.stop()
loop.run_forever()
loop.close()
The traceback:
RuntimeError Traceback (most recent call last)
<ipython-input-2-0930271bd896> in <module>()
6 loop = asyncio.get_event_loop()
7 # Blocking call which returns when the hello_world() coroutine
----> 8 loop.run_until_complete(hello_world())
9 loop.close()
/home/pontiac/anaconda3/lib/python3.6/asyncio/base_events.py in run_until_complete(self, future)
441 Return the Future's result, or raise its exception.
442 """
--> 443 self._check_closed()
444
445 new_task = not futures.isfuture(future)
/home/pontiac/anaconda3/lib/python3.6/asyncio/base_events.py in _check_closed(self)
355 def _check_closed(self):
356 if self._closed:
--> 357 raise RuntimeError('Event loop is closed')
358
359 def _asyncgen_finalizer_hook(self, agen):
RuntimeError: Event loop is closed
I don't get this error when running a file in the interpreter with all the Debug settings set. I am running this Notebook in my recently reinstalled Anaconda set up which only has the 3.6.1 python version installed.
the issue is that loop.close() makes the loop unavailable for future use. That is, you can never use a loop again after calling close. The loop stays around as an object, but almost all methods on th eloop will raise an exception once the loop is closed. However, asyncio.get_event_loop() returns the same loop if you call it more than once. You often want this, so that multiple parts of an application get the same event loop.
However if you plan on closing a loop, you are better off calling asyncio.new_event_loop rather than asyncio.get_event_loop. That will give you a fresh event loop. If you call new_event_loop rather than get_event_loop, you're responsible for making sure that the right loop gets used in all parts of the application that run in this thread. If you want to be able to run multiple times to test you could do something like:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
After that, you'll find that asyncio.get_event_loop returns the same thing as loop. So if you do that near the top of your program, you will have a new fresh event loop each run of the code.

Multiprocessing with worker.run() does work in serie instead of parallel?

I'm trying to create a program which in its essence works like this:
import multiprocessing
import time
def worker(numbers):
print(numbers)
time.sleep(2)
return
if __name__ =='__main__':
multiprocessing.set_start_method("spawn")
p1 = multiprocessing.Process(target=worker, args=([0,1,2,3,4],))
p2 = multiprocessing.Process(target=worker, args=([5,6,7,8],))
p1.start()
p2.start()
p1.join()
p2.join()
while(1):
p1.run()
p2.run()
p1.join()
p2.join()
print('Done!')
The first time the processes are called via p#.start(), they are executed in parallel. The second time they are called via the p#.run() method, they are executed in series.
How can I make sure the subsequent method calls are also performed in parallel?
Edit: It is important that the processes start together. It cannot happen that process 1 gets executed twice while process 2 only gets executed once.
Edit: I should also note that this code is running on a raspberry pi v3 model B.
As far as I know, a thread can only be started once. After that when you call the run method, it's just a simple function. That's why it isn't run in parallel.

Resources