I have two components, comp1 and comp2, which form a problem that should be run several times. To do that I found that I could use the UniformDriver (I don't know whether this is the most appropriate one for my purpose). However, I would like to feed an output from comp2 back into comp1: after the first run, comp2 produces an output, which for the next run should be an input to comp1. I think the following example makes it a bit clearer what I would like to do:
from openmdao.api import Component, Group, Problem, UniformDriver

class Times2Plus(Component):
    def __init__(self):
        super(Times2Plus, self).__init__()
        self.add_param('x', 1.0)
        self.add_param('z', 2.0)
        self.add_output('y', shape=1)

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['y'] = params['x'] * 2.0 + params['z']

class Power3(Component):
    def __init__(self):
        super(Power3, self).__init__()
        self.add_param('y', shape=1)
        self.add_output('x', shape=1)  # feedback to params['x'] as input in next run

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['x'] = params['y'] ** 3.0

prob = Problem(root=Group())
prob.driver = UniformDriver(num_samples=5)
prob.root.add('comp1', Times2Plus())
prob.root.add('comp2', Power3())
prob.root.connect('comp1.y', 'comp2.y')
prob.setup()
prob.run()
Basically, the output x of component Power3 from the previous run should be connected to the input x of component Times2Plus. In addition, component Times2Plus has a parameter z, which I know beforehand and which differs for each run. What would be the best way to include this changing parameter and the feedback?
You don't want a driver in this case. You're talking about a solver (distinct from a driver in OpenMDAO 1.0+).
You can see an example of how to set up this kind of thing in our Sellar example, using a Nonlinear Gauss-Seidel solver. Basically you just need to connect the components in a circular fashion and then add an appropriate solver to make sure they are converged.
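To make that concrete, here is a minimal sketch of that setup, assuming the OpenMDAO 1.x API (NLGaussSeidel) and reusing the Times2Plus and Power3 components from the question; treat it as a starting point rather than a drop-in solution:

from openmdao.api import Group, Problem, NLGaussSeidel

# Times2Plus and Power3 as defined in the question above.

prob = Problem(root=Group())
prob.root.add('comp1', Times2Plus())
prob.root.add('comp2', Power3())

# Connect the components in a circular fashion: comp1 feeds comp2,
# and comp2's output x feeds back into comp1's param x.
prob.root.connect('comp1.y', 'comp2.y')
prob.root.connect('comp2.x', 'comp1.x')

# A nonlinear solver on the group iterates the cycle until it converges.
prob.root.nl_solver = NLGaussSeidel()

prob.setup()
prob.run()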
So, I have a couple tasks in a DAG. Let's say I calculate a value and assign it to a variable in the first task. I want to be able to use the variable in the subsequent tasks.
How can I do this?
In a Python program, I can just elevate a variable in a function to global, so that I can use that variable in other functions.
How can I achieve a similar thing with Airflow - use a variable from the first task in the subsequent tasks?
I know I can use XComs. Is there any other way?
I tried elevating the variable to global in the function called by the first task and tried to use it in the subsequent tasks. It did not work.
As you can see in the answer below, from a similar question, unless you use XCom or a persistent file, it is not possible to pass variables between tasks.
https://stackoverflow.com/a/60495564/19969296
You will need to either write the variable to a file that persists (can be a local file, a relational database, a file in a blob storage) or use XCom (see this guide for concrete examples) or Airflow Variables. Is there a reason you don't want to use XCom?
XCom example which is a more common pattern than the Airflow Variable example below:
from airflow.decorators import dag, task
from pendulum import datetime

@dag(
    start_date=datetime(2022, 12, 10),
    schedule=None,
    catchup=False,
)
def write_var():
    @task
    def set_var():
        return "bar"

    @task
    def retrieve_var(my_variable):
        print(my_variable)

    retrieve_var(set_var())

write_var()
The alternative to XCom would be to save the variable as an Airflow variable as shown in this DAG:
from airflow.decorators import dag, task
from pendulum import datetime
from airflow.models import Variable

@dag(
    start_date=datetime(2022, 12, 10),
    schedule=None,
    catchup=False,
)
def write_var():
    @task
    def set_var():
        Variable.set("foo", "bar")

    @task
    def retrieve_var():
        var = Variable.get("foo")
        print(var)

    set_var() >> retrieve_var()

write_var()
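For completeness, here is a minimal sketch of the persistent-file option mentioned above; the DAG name and the path /tmp/my_shared_value.txt are placeholders, and this only works if all tasks run somewhere that can reach the same filesystem:

from airflow.decorators import dag, task
from pendulum import datetime

SHARED_PATH = "/tmp/my_shared_value.txt"  # placeholder path, assumes a shared filesystem

@dag(
    start_date=datetime(2022, 12, 10),
    schedule=None,
    catchup=False,
)
def write_var_to_file():
    @task
    def set_var():
        # Persist the value outside the task process.
        with open(SHARED_PATH, "w") as f:
            f.write("bar")

    @task
    def retrieve_var():
        with open(SHARED_PATH) as f:
            print(f.read())

    set_var() >> retrieve_var()

write_var_to_file()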
I know it is possible to run tests for specific lines and methods in some languages/frameworks, for example Ruby on Rails. I would like to know if there is some specific syntax to do the same in Sage. For example, suppose I have the following code. I would like to run only the EXAMPLES of method1 with something like sage -t module.py:method1
# module.py

def method1():
    """
    EXAMPLES::

        sage: 5+0
        5
    """
    return 1

def method2():
    """
    EXAMPLES::

        sage: 5+2
        7
    """
    return 2
When I run this handler in a simple Tornado app and make two requests to it with curl, it doesn't run in parallel. It prints out "1 2 3 4 5 1 2 3 4 5", when I want it to print "1 1 2 2 3 3 4 4 5 5".
import time

from tornado.web import RequestHandler

class SleepHandler(RequestHandler):
    def get(self):
        for i in range(5):
            print(i)
            time.sleep(1)
What am I doing wrong?
The reason for this is that time.sleep is a blocking function: it doesn’t allow control to return to the IOLoop so that other handlers can be run.
Of course, time.sleep is often just a placeholder in these examples; the point is to show what happens when something in a handler gets slow. No matter what the real code is doing, to achieve concurrency, blocking code must be replaced with non-blocking equivalents. This means one of three things:
Find a coroutine-friendly equivalent. For time.sleep, use tornado.gen.sleep instead:
class CoroutineSleepHandler(RequestHandler):
    @gen.coroutine
    def get(self):
        for i in range(5):
            print(i)
            yield gen.sleep(1)
When this option is available, it is usually the best approach. See the Tornado wiki for links to asynchronous libraries that may be useful.
Find a callback-based equivalent. Similar to the first option, callback-based libraries are available for many tasks, although they are slightly more complicated to use than a library designed for coroutines. These are typically used with tornado.gen.Task as an adapter:
class CoroutineTimeoutHandler(RequestHandler):
    @gen.coroutine
    def get(self):
        io_loop = IOLoop.current()
        for i in range(5):
            print(i)
            yield gen.Task(io_loop.add_timeout, io_loop.time() + 1)
Again, the Tornado wiki can be useful to find suitable libraries.
Run the blocking code on another thread. When asynchronous libraries are not available, concurrent.futures.ThreadPoolExecutor can be used to run any blocking code on another thread. This is a universal solution that can be used for any blocking function whether an asynchronous counterpart exists or not:
executor = concurrent.futures.ThreadPoolExecutor(8)

class ThreadPoolHandler(RequestHandler):
    @gen.coroutine
    def get(self):
        for i in range(5):
            print(i)
            yield executor.submit(time.sleep, 1)
See the Asynchronous I/O chapter of the Tornado user’s guide for more on blocking and asynchronous functions.
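As a side note, on Python 3 with a reasonably recent Tornado (5.0+), the first option can also be written as a native coroutine, which avoids the @gen.coroutine decorator. A minimal sketch (the handler name is made up):

from tornado import gen
from tornado.web import RequestHandler

class NativeSleepHandler(RequestHandler):
    # A native coroutine: Tornado awaits it on the IOLoop, so other
    # handlers can run while this one is sleeping.
    async def get(self):
        for i in range(5):
            print(i)
            await gen.sleep(1)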
I have been researching multiprocessing and came upon an example of it on a website. However, when I try to run that example on my MacBook Retina, nothing happens. This was the example:
import random
import multiprocessing

def list_append(count, id, out_list):
    """
    Creates an empty list and then appends a
    random number to the list 'count' number
    of times. A CPU-heavy operation!
    """
    for i in range(count):
        out_list.append(random.random())

if __name__ == "__main__":
    size = 10000000   # Number of random numbers to add
    procs = 2         # Number of processes to create

    # Create a list of jobs and then iterate through
    # the number of processes appending each process to
    # the job list
    jobs = []
    for i in range(0, procs):
        out_list = list()
        process = multiprocessing.Process(target=list_append,
                                          args=(size, i, out_list))
        jobs.append(process)

    # Start the processes (i.e. calculate the random number lists)
    for j in jobs:
        j.start()

    # Ensure all of the processes have finished
    for j in jobs:
        j.join()

    print("List processing complete.")
As it turns out, after I put a print statement in the 'list_append' function, nothing printed, so the problem is actually not the j.join() part but rather the j.start() part.
When you create a process with multiprocessing.Process, you prepare a sub-function to be run in a different process asynchronously. The computation starts when you call the start method. The join method waits for the computation to be done. So if you just start the process and do not wait for it to complete (join it), nothing will appear to happen, as the process will be killed when your program exits.
Here, one issue is that you are not using an object that can be shared between processes. When you use a plain list(), each process works on a separate copy of the list in its own memory. That local copy is discarded when the worker process exits, so the list in the main process stays empty. If you want to be able to exchange data between processes, you should use a multiprocessing.Queue:
import random
import multiprocessing

def list_append(count, id, out_queue):
    """
    Puts 'count' random numbers on the output
    queue, tagged with the worker's id.
    A CPU-heavy operation!
    """
    for i in range(count):
        out_queue.put((id, random.random()))

if __name__ == "__main__":
    size = 10000   # Number of random numbers to add
    procs = 2      # Number of processes to create

    # Create a list of jobs and then iterate through
    # the number of processes appending each process to
    # the job list
    jobs = []
    q = multiprocessing.Queue()
    for i in range(0, procs):
        process = multiprocessing.Process(target=list_append,
                                          args=(size, i, q))
        process.start()
        jobs.append(process)

    result = []
    for k in range(procs * size):
        result += [q.get()]

    # Wait for all the processes to finish
    for j in jobs:
        j.join()

    print("List processing complete. {}".format(result))
Note that this code can hang quite easily if you do not correctly account for the number of results sent back through out_queue.
If you try to retrieve too many results, q.get will wait for an extra result that will never come. If you do not retrieve all the results from q, your processes can freeze because out_queue fills up and out_queue.put does not return; the processes then never exit and you will not be able to join them.
If your computations are independent, I strongly advise looking at higher-level tools like Pool, or an even more robust third-party library like joblib, as they take care of these aspects for you (see this answer for some insights on Process vs Pool/joblib).
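To illustrate that advice, here is a minimal sketch of the same job written with multiprocessing.Pool; each worker returns its whole batch, and Pool takes care of starting the processes, collecting the results, and joining:

import random
from multiprocessing import Pool

def list_append(count):
    # Build the whole list in the worker and return it; Pool handles
    # sending the result back to the parent process.
    return [random.random() for _ in range(count)]

if __name__ == "__main__":
    size = 10000   # Number of random numbers per worker
    procs = 2      # Number of worker processes

    with Pool(processes=procs) as pool:
        results = pool.map(list_append, [size] * procs)

    print("List processing complete. Got {} lists.".format(len(results)))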
I actually reduced the number size, as the program becomes too slow if you try to put too many objects in a Queue. If you need to pass a lot of small objects, try passing all of them in one batch:
import random
import multiprocessing

def list_append(count, id, out_queue):
    a = [random.random() for i in range(count)]
    out_queue.put((id, a))

if __name__ == "__main__":
    size = 10000   # Number of random numbers to add
    procs = 2      # Number of processes to create

    jobs = []
    q = multiprocessing.Queue()
    for i in range(0, procs):
        process = multiprocessing.Process(target=list_append,
                                          args=(size, i, q))
        process.start()
        jobs.append(process)

    # One result per process: each worker sends its whole batch at once
    result = [q.get() for _ in range(procs)]

    for j in jobs:
        j.join()

    print("List processing complete.")
I have a list comprehension:
thingie=[f(a,x,c) for x in some_list]
which I am parallelising as follows:
from multiprocessing import Pool
pool=Pool(processes=4)
thingie=pool.map(lambda x: f(a,x,c), some_list)
but I get the following error:
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7f60b3b0e9d8>:
attribute lookup <lambda> on __main__ failed
I have tried to install the pathos package which apparently addresses this issue, but when I try to import it I get the error:
ImportError: No module named 'pathos'
OK, so this answer is just for the record; I figured it out with the author of the question during a conversation in the comments.
multiprocessing needs to transport every object between processes, so it uses pickle to serialize it in one process and deserialize it in another. That all works well, but pickle cannot serialize a lambda. AFAIR that is because pickle serializes a function by its name, so it can be looked up again when deserializing, and a lambda has no proper name (hence the "attribute lookup <lambda>" error), but I'm not 100% sure and cannot quote my source.
It won't be a problem if you use map() with a one-argument function - you can pass that function directly instead of a lambda. If you have more arguments, as in your example, you need to define a wrapper with the def keyword:
from multiprocessing import Pool

def f(x, y, z):
    print(x, y, z)

def f_wrapper(y):
    return f(1, y, "a")

pool = Pool(processes=4)
result = pool.map(f_wrapper, [7, 9, 11])
Just before I close this, I found another way to do this with Python 3, using functools.
Say I have a function f with three arguments, f(a, x, c), one of which I want to map over, say x. I can use the following code to do basically what @FilipMalczak suggests:
import functools
from multiprocessing import Pool

# Bind a positionally and c by keyword, leaving x free for map
f1 = functools.partial(f, 10)        # a = 10
f2 = functools.partial(f1, c=10)

pool = Pool(processes=4)
final_answer = pool.map(f2, some_list)
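As for the pathos part: the ImportError just means the package is not installed (pip install pathos). Once installed, pathos serializes with dill rather than pickle, so something along these lines should accept the lambda directly; I am treating the exact pathos API (ProcessingPool) as an assumption, so double-check it against the pathos docs:

# assumes: pip install pathos
from pathos.multiprocessing import ProcessingPool as Pool

def f(a, x, c):
    return a + x + c

a, c = 10, 10
some_list = [7, 9, 11]

pool = Pool(4)
# dill can serialize lambdas, so this avoids the PicklingError
thingie = pool.map(lambda x: f(a, x, c), some_list)
print(thingie)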