I am trying to run a simple mathematical problem in parallel in OpenMDAO 2.5.0. The problem is an adapted version of the example in the OpenMDAO docs found here: http://openmdao.org/twodocs/versions/latest/features/core_features/grouping_components/parallel_group.html. It has some extra components and connections and uses promoted variables instead of explicit connections.
from openmdao.api import Problem, IndepVarComp, ParallelGroup, ExecComp, Group, NonlinearBlockGS
prob = Problem()
model = prob.model
model.add_subsystem('p1', IndepVarComp('x1', 1.0), promotes=['x1'])
model.add_subsystem('p2', IndepVarComp('x2', 1.0), promotes=['x2'])
cycle = model.add_subsystem('cycle', Group(), promotes=['*'])
parallel = cycle.add_subsystem('parallel', ParallelGroup(), promotes=['*'])
parallel.add_subsystem('c1', ExecComp(['y1=(-2.0*x1+z)/3']), promotes=['x1', 'y1', 'z'])
parallel.add_subsystem('c2', ExecComp(['y2=(5.0*x2-z)/6']), promotes=['x2', 'y2', 'z'])
cycle.add_subsystem('c3', ExecComp(['z=(3.0*y1+7.0*y2)/10']), promotes=['y1', 'y2', 'z'])
model.add_subsystem('c4', ExecComp(['z2 = y1+y2']), promotes=['z2', 'y1', 'y2'])
cycle.nonlinear_solver = NonlinearBlockGS()
prob.setup(mode='fwd')
prob.set_solver_print(level=2)
prob.run_model()
print(prob['z2'])
print(prob['z'])
print(prob['y1'])
print(prob['y2'])
When I run this code in series, it works as expected with no errors.
However, when I run this code in parallel with:
mpirun -n 2 python Test.py
I get this error for the first process:
RuntimeError: The promoted name y1 is invalid because it refers to multiple inputs: [cycle.c3.y1 ,c4.y1]. Access the value from the connected output variable cycle.parallel.c1.y1 instead.
and this error for the second process:
RuntimeError: The promoted name y2 is invalid because it refers to multiple inputs: [cycle.c3.y2 ,c4.y2]. Access the value from the connected output variable cycle.parallel.c2.y2 instead.
So my question is: why does this example give an error in the promotes names when running in parallel, while it is running without problems in series? Is it only allowed to use connections when running in parallel or are promoted variables okay as well?
As of OpenMDAO V2.5, you have to be a little careful about how you access problem variables in parallel.
If you look at the full stack trace of your model, you can see that the error is being thrown right at the end when you call
print(prob['y1'])
print(prob['y2'])
What is happening here is that you have set the model up so that y1 only exists on proc 0 and y2 only exists on proc 1. Then you try to get values that don't exist on that proc, and you get an (admittedly not very clear) error.
You can fix this with the following minor modification to your script:
from openmdao.api import Problem, IndepVarComp, ParallelGroup, ExecComp, Group, NonlinearBlockJac
prob = Problem()
model = prob.model
model.add_subsystem('p1', IndepVarComp('x1', 1.0), promotes=['x1'])
model.add_subsystem('p2', IndepVarComp('x2', 1.0), promotes=['x2'])
cycle = model.add_subsystem('cycle', Group(), promotes=['*'])
parallel = cycle.add_subsystem('parallel', ParallelGroup(), promotes=['*'])
parallel.add_subsystem('c1', ExecComp(['y1=(-2.0*x1+z)/3']), promotes=['x1', 'y1', 'z'])
parallel.add_subsystem('c2', ExecComp(['y2=(5.0*x2-z)/6']), promotes=['x2', 'y2', 'z'])
cycle.add_subsystem('c3', ExecComp(['z=(3.0*y1+7.0*y2)/10']), promotes=['y1', 'y2', 'z'])
model.add_subsystem('c4', ExecComp(['z2 = y1+y2']), promotes=['z2', 'y1', 'y2'])
cycle.nonlinear_solver = NonlinearBlockJac()
prob.setup(mode='fwd')
prob.set_solver_print(level=2)
prob.run_model()
print(prob['z2'])
print(prob['z'])
if prob.model.comm.rank == 0:
    print(prob['y1'])
if prob.model.comm.rank == 1:
    print(prob['y2'])
There are a few minor issues with this: 1) it means your script is now different for serial and parallel, and 2) it's annoying. So we're working on a fix that will let things work more cleanly by automatically doing an MPI broadcast when you try to get a value that's not on your proc. That will be released in V2.6.
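In the meantime, if every rank needs the value, you can broadcast it yourself with mpi4py; a minimal sketch, assuming the script above where y1 only lives on proc 0:

comm = prob.model.comm
# only rank 0 owns y1, so read it there and broadcast it to the other ranks
y1_local = prob['y1'] if comm.rank == 0 else None
y1_all = comm.bcast(y1_local, root=0)
print(y1_all)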
One other small note: I changed your NL solver over to NonlinearBlockJac. That is the Block Jacobi solver, which is designed to work in parallel. You could also use the Newton solver in parallel. The Gauss-Seidel solver won't actually allow you to get parallel speedups.
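If you do want to try Newton on the coupled group, a minimal sketch (solver names as of OpenMDAO 2.5; PETScKrylov is chosen here only because the group runs under MPI):

from openmdao.api import NewtonSolver, PETScKrylov

# Newton converges the y1/y2/z coupling using derivatives; it needs a
# linear solver on the same group, and PETScKrylov can run in parallel.
cycle.nonlinear_solver = NewtonSolver()
cycle.linear_solver = PETScKrylov()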
Related
Most of my test files involve the creation of an IndepVarComp that gets connected to a group. When I go to create an XDSM from the test file, it only shows the IndepVarComp Box and the Group Box. Is there a way to get it to expand the group and show what's inside?
This would also be useful when dealing with a top level model that contains many levels of groups where I want to expand one or two levels and leave the rest closed.
There is a recurse option, which controls whether groups are expanded or not. Here is a small example with the Sellar problem to explore this option. The disciplines d1 and d2 are part of a Group called cycle.
import numpy as np
import openmdao.api as om
from openmdao.test_suite.components.sellar import SellarNoDerivatives
from omxdsm import write_xdsm
prob = om.Problem()
prob.model = model = SellarNoDerivatives()
model.add_design_var('z', lower=np.array([-10.0, 0.0]),
upper=np.array([10.0, 10.0]), indices=np.arange(2, dtype=int))
model.add_design_var('x', lower=0.0, upper=10.0)
model.add_objective('obj')
model.add_constraint('con1', equals=np.zeros(1))
model.add_constraint('con2', upper=0.0)
prob.setup()
prob.final_setup()
# Write output. The PDF will only be created if pdflatex is installed.
write_xdsm(prob, filename='sellar_pyxdsm', out_format='pdf', show_browser=True,
quiet=False, output_side='left', recurse=True)
The same code with recurse=False does not show d1 and d2; instead it shows their parent Group cycle:
To enable the recursion from the command line, use the --recurse flag:
openmdao xdsm sellar_pyxdsm.py -f pdf --recurse
With the function, recursion is turned on by default; on the command line you have to include the flag. If this does not work as expected for you, please provide an example.
You can find a lot of examples with different options in the tests of the XDSM plugin. Some of the options, like recurse, include_indepvarcomps, include_solver and model_path control what is included in the XDSM.
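For instance, a sketch only (the keyword names are the options listed above; their exact usage and defaults are documented in the plugin's tests), writing just the cycle subgroup with its solver and without the IndepVarComps might look like:

write_xdsm(prob, filename='sellar_cycle', out_format='pdf', show_browser=False,
           recurse=True, model_path='cycle', include_solver=True,
           include_indepvarcomps=False)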
I want to distinguish between optimizers (gradient-based and gradient-free). If I use the sample optimization from the main OpenMDAO web page, which uses SLSQP, and check whether the optimizer supports gradients, I get "False", as in:
prob.driver.supports['gradients']
is this OpenMDAO or Scipy related issue?
Is there another way to see whether the optimizer will use gradient calculations before the problem is run?
Based on the answer below, I added this at the beginning of my script. Thanks!
prob = om.Problem()
prob.driver = user_driver_object
prob.setup()
prob.final_setup()
grads.append(prob.driver.supports['gradients'])
In the ScipyOptimizeDriver not all optimizers support gradient optimization, hence you cannot determine the correct value until you set up your driver. This is done in final_setup() of your problem (which calls _setup_driver() in your driver). This method is called in run_model() and run_driver(), but you can also call it by itself to get the correct properties of your optimizer.
In the example below I am asking the driver 3 times if it supports gradients. The first time, after the problem setup, it gives a False answer (the default), because the driver has not been touched yet. If I call final_setup(), this will set up the driver, and all the properties of the driver will be correct. If run_model() or run_driver() is called, of course this will also set up the driver.
So my advice is to just use final_setup() before querying anything from your driver that can change during the setup (mostly optimizer-specific properties).
import openmdao.api as om
# build the model
prob = om.Problem()
indeps = prob.model.add_subsystem('indeps', om.IndepVarComp())
indeps.add_output('x', 3.0)
indeps.add_output('y', -4.0)
prob.model.add_subsystem('paraboloid', om.ExecComp('f = (x-3)**2 + x*y + (y+4)**2 - 3'))
prob.model.connect('indeps.x', 'paraboloid.x')
prob.model.connect('indeps.y', 'paraboloid.y')
# setup the optimization
driver = prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
prob.model.add_design_var('indeps.x', lower=-50, upper=50)
prob.model.add_design_var('indeps.y', lower=-50, upper=50)
prob.model.add_objective('paraboloid.f')
prob.setup()
print("\nSupports gradients (after setup)?")
print(prob.driver.supports['gradients'])
prob.final_setup()
print("\nSupports gradients (after final setup)?")
print(prob.driver.supports['gradients'])
prob.run_driver()
print("\nSupports gradients (after run)?")
print(prob.driver.supports['gradients'])
This results in the following output:
Supports gradients (after setup)?
False
Supports gradients (after final setup)?
True
Optimization terminated successfully. (Exit mode 0)
Current function value: -27.33333333333333
Iterations: 5
Function evaluations: 6
Gradient evaluations: 5
Optimization Complete
-----------------------------------
Supports gradients (after run)?
True
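If you need to check several candidate drivers (as in the edit in the question), one way is a small helper built around a throwaway problem. This is a minimal sketch; the helper name and the dummy paraboloid model are mine, not part of the original code:

def driver_uses_gradients(driver):
    # Build a tiny throwaway problem just so the driver can finish its setup.
    prob = om.Problem()
    indeps = prob.model.add_subsystem('indeps', om.IndepVarComp())
    indeps.add_output('x', 3.0)
    prob.model.add_subsystem('comp', om.ExecComp('y = (x - 3)**2'))
    prob.model.connect('indeps.x', 'comp.x')
    prob.model.add_design_var('indeps.x', lower=-10, upper=10)
    prob.model.add_objective('comp.y')
    prob.driver = driver
    prob.setup()
    prob.final_setup()  # populates driver.supports for the chosen optimizer
    return prob.driver.supports['gradients']

For example, passing om.ScipyOptimizeDriver() should report True once its default SLSQP optimizer has been set up.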
Upon trying to calculate precision@k, I get an exception. What follows is a simple piece of code that reproduces the problem.
First the code defines the variable scope:
initializer = tf.random_uniform_initializer(-0.1, 0.1, seed=1234)
with tf.variable_scope("model", reuse=None, initializer=initializer):
Then it runs these lines:
predictions = tf.Variable(tf.ones([2, 10], tf.int64))
labels = tf.Variable(tf.ones([2, 1], tf.int64))
precision = tf.contrib.metrics.streaming_sparse_precision_at_k(predictions, labels, 5)
tf.initialize_all_variables().run()
(I know this code is meaningless, and tries to calculate the precision given 2 fixed matrices...)
Then I get the following exception:
W tensorflow/core/framework/op_kernel.cc:936] Failed precondition:
Attempting to use uninitialized value
model/precision_at_5/false_positive_at_5 [[Node:
model/precision_at_5/false_positive_at_5/read = Identity[T=DT_DOUBLE,
_class=["loc:@model/precision_at_5/false_positive_at_5"], _device="/job:localhost/replica:0/task:0/gpu:0"]]
The same goes when I tried to invoke streaming_sparse_recall_at_k instead of streaming_sparse_precision_at_k.
The installed version is r0.10 on Linux with Python 2.7.
Please help... Thanks in advance :)
Unfortunately, tf.initialize_all_variables() doesn't initialize "local" variables (which tend to be internal implementation details for ops like tf.contrib.metrics.streaming_sparse_precision_at_k() and tf.train.string_input_producer(), as opposed to variables used as model weights).
You'll need to add a line to your program that runs tf.initialize_local_variables() before running the evaluation op:
sess.run(tf.initialize_local_variables()) # or `tf.initialize_local_variables().run()`
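For completeness, a minimal end-to-end sketch (TF r0.10-era API; float predictions are used here because the streaming metric expects scores, and the variable names follow the question):

import tensorflow as tf

predictions = tf.Variable(tf.random_uniform([2, 10], seed=1234))  # float scores
labels = tf.Variable(tf.ones([2, 1], tf.int64))                   # int64 class ids
precision, update_op = tf.contrib.metrics.streaming_sparse_precision_at_k(
    predictions, labels, 5)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())    # model/global variables
    sess.run(tf.initialize_local_variables())  # the metric's internal counters
    sess.run(update_op)                        # accumulate true/false positives
    print(sess.run(precision))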
I'm trying to get a parallel workflow to run in which I'm evaluating over 1000 parallel cases inside a ParallelGroup. If I run on a small number of cores it doesn't crash, but increasing the number of nodes at some point raises an error, which indicates that it relates to how the problem is partitioned.
I'm getting an error from the deep dungeons of OpenMDAO and PETSc, relating to the target indices when setting up the communication tables as far as I can see. Below is a print of the traceback of the error:
File "/home/frza/git/OpenMDAO/openmdao/core/group.py", line 454, in _setup_vectors
impl=self._impl, alloc_derivs=alloc_derivs)
File "/home/frza/git/OpenMDAO/openmdao/core/group.py", line 1456, in _setup_data_transfer
self._setup_data_transfer(my_params, None, alloc_derivs)
File "/home/frza/git/OpenMDAO/openmdao/core/petsc_impl.py", line 125, in create_data_xfer
File "/home/frza/git/OpenMDAO/openmdao/core/petsc_impl.py", line 397, in __init__
tgt_idx_set = PETSc.IS().createGeneral(tgt_idxs, comm=comm)
File "PETSc/IS.pyx", line 74, in petsc4py.PETSc.IS.createGeneral (src/petsc4py.PETSc.c:74696)
tgt_idx_set = PETSc.IS().createGeneral(tgt_idxs, comm=comm)
File "PETSc/arraynpy.pxi", line 121, in petsc4py.PETSc.iarray (src/petsc4py.PETSc.c:8230)
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
this answer:
https://scicomp.stackexchange.com/questions/2355/32bit-64bit-issue-when-working-with-numpy-and-petsc4py/2356#2356
led me to look for where you set up the tgt_idxs vector to see whether it's defined with the correct dtype PETSc.IntType. But so far I only get "Petsc has generated inconsistent data" errors when I try to set the dtype of the arrays I think may be causing the error.
I've not yet tried to reinstall PETSc with --with-64-bit-indices as suggested in the answer I linked to. Do you run PETSc configured this way?
edit:
I've now set up a stripped down version of the problem that replicates the error I get:
import numpy as np
from openmdao.api import Component, Group, Problem, IndepVarComp, \
    ParallelGroup


class Model(Component):
    def __init__(self, nsec, nx, nch):
        super(Model, self).__init__()
        self.add_output('outputs', shape=[nx+1, nch*6*3*nsec])

    def solve_nonlinear(self, params, unknowns, resids):
        pass


class Aggregate(Component):
    def __init__(self, nsec, ncase, nx, nch, nsec_env=12):
        super(Aggregate, self).__init__()
        self.ncase = ncase
        for i in range(ncase):
            self.add_param('outputs_sec%03d' % i, shape=[nx+1, nch*6*3*nsec])
        for i in range(nsec):
            self.add_output('aoutput_sec%03d' % i, shape=[nsec_env, 6])

    def solve_nonlinear(self, params, unknowns, resids):
        pass


class ParModel(Group):
    def __init__(self, nsec, ncase, nx, nch, nsec_env=12):
        super(ParModel, self).__init__()
        pg = self.add('pg', ParallelGroup())
        promotes = ['aoutput_sec%03d' % i for i in range(nsec)]
        self.add('agg', Aggregate(nsec, ncase, nx, nch, nsec_env), promotes=promotes)
        for i in range(ncase):
            pg.add('case%03d' % i, Model(nsec, nx, nch))
            self.connect('pg.case%03d.outputs' % i, 'agg.outputs_sec%03d' % i)


if __name__ == '__main__':
    from openmdao.core.mpi_wrap import MPI
    if MPI:
        from openmdao.core.petsc_impl import PetscImpl as impl
    else:
        from openmdao.core.basic_impl import BasicImpl as impl

    p = Problem(impl=impl, root=Group())
    root = p.root
    root.add('dlb', ParModel(20, 1084, 36, 6))

    import time
    t0 = time.time()
    p.setup()
    print 'setup time', time.time() - t0
Having done that I can also see that the data size ends up becoming enormous due to the many cases we evaluate. I'll see if we can somehow reduce the data sizes. I can't actually get this to run at all now, since it either crashes with an error:
petsc4py.PETSc.Error: error code 75
[77] VecCreateMPIWithArray() line 320 in /home/MET/Python-2.7.10_Intel/opt/petsc-3.6.2/src/vec/vec/impls/mpi/pbvec.c
[77] VecSetSizes() line 1374 in /home/MET/Python-2.7.10_Intel/opt/petsc-3.6.2/src/vec/vec/interface/vector.c
[77] Arguments are incompatible
[77] Local size 86633280 cannot be larger than global size 73393408
or the TypeError.
The data sizes that you're running with are definitely larger than can be expressed by 32-bit indices, so recompiling with --with-64-bit-indices makes sense if you're not able to decrease your data size. OpenMDAO uses PETSc.IntType internally for our indices, so they should become 64-bit in size if you recompile.
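A quick way to check which index width your petsc4py build uses (a sketch; a default build reports int32, while a build configured with --with-64-bit-indices reports int64):

import numpy as np
from petsc4py import PETSc

# PETSc.IntType is the integer dtype PETSc was compiled with
print(np.dtype(PETSc.IntType))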
I've never used that option on PETSc. A while back we did have some problems scaling up to larger numbers of cores, but we determined that the problem for us was with the OpenMPI compilation. Re-compiling OpenMPI fixed our issues.
Since this error shows up on setup, we don't need to run to test the code. If you can provide us with the model that is showing the problem, and we can run it, then we can at least verify if the same problem is happening on our clusters.
It would be good to know how many cores you can successfully run on and at what point it breaks down too.
A PyPy callback that works perfectly (in an infinite loop) when implemented (straightforwardly) as a method of a Python object segfaults after approximately 100 iterations when I move the Python object into a separate multiprocessing process.
In the main code I have:
import multiprocessing as mp
class Task(object):
    def __init__(self, com, lib):
        self.com = com  # communication queue
        self.lib = lib  # ffi library
        self.proc = mp.Process(target=self.spawn, args=(self.com,))
        self.register_callback()

    def spawn(self, com):
        print('%s spawned.' % self.name)
        # loop (keeping 'self' alive) until BREAK:
        while True:
            cmd = com.get()
            if cmd == self.BREAK:
                break
        print("%s stopped." % self.name)

    # ffi.callback("int(void*, Data*)")  # old cffi (ABI mode)
    def callback(self, data):
        # <work on data>
        return 1

    def register_callback(self):
        s = ffi.new_handle(self)
        self.lib.register_callback(s, self.callback)  # C-call
The idea is that multiple tasks should serve an equal number of callbacks concurrently. I have no clue what may cause the segfault, especially since it runs fine for the first ~100 iterations or so. Help much appreciated!
Solution
The handle 's' is garbage collected when returning from 'register_callback()'. Making the handle an attribute of 'self' (and passing that) keeps it alive.
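Concretely, the change is roughly this (a sketch based on the snippet in the question; ffi and lib as defined there):

def register_callback(self):
    # keep a reference on self so the handle outlives this method call
    self._handle = ffi.new_handle(self)
    self.lib.register_callback(self._handle, self.callback)  # C-call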
Standard CPython (cffi 1.6.0) segfaulted at the first iteration (i.e. gc was immediate) and provided me with a crucial, informative error message. PyPy on the other hand segfaulted after approximately 100 iterations without providing a message. Both run fine now.