OpenMDAO generating coloring files twice

I'm wondering if this is correct. Most of the Implicit and Explicit components I have created use the line:
self.declare_coloring(wrt='*', method='cs', tol=1.0E-12, show_sparsity=True)
Then when I get to the file that runs the driver I use:
p.driver.declare_coloring()
And in my /coloring_files directory I have a 'col' and a 'disc' for each component.
coloring_traj_phases_phase0_rhs_col_brakeThrottle.pkl
coloring_traj_phases_phase0_rhs_col_implicitOutputs.pkl
coloring_traj_phases_phase0_rhs_col_powerTrain.pkl
coloring_traj_phases_phase0_rhs_col_spin.pkl
coloring_traj_phases_phase0_rhs_col_timeAdder.pkl
coloring_traj_phases_phase0_rhs_col_timeSpace.pkl
coloring_traj_phases_phase0_rhs_col_tracking.pkl
coloring_traj_phases_phase0_rhs_col_tyreConstraint.pkl
coloring_traj_phases_phase0_rhs_col_tyre.pkl
coloring_traj_phases_phase0_rhs_disc_brakeThrottle.pkl
coloring_traj_phases_phase0_rhs_disc_implicitOutputs.pkl
coloring_traj_phases_phase0_rhs_disc_powerTrain.pkl
coloring_traj_phases_phase0_rhs_disc_spin.pkl
coloring_traj_phases_phase0_rhs_disc_timeAdder.pkl
coloring_traj_phases_phase0_rhs_disc_timeSpace.pkl
coloring_traj_phases_phase0_rhs_disc_tracking.pkl
coloring_traj_phases_phase0_rhs_disc_tyreConstraint.pkl
coloring_traj_phases_phase0_rhs_disc_tyre.pkl
total_coloring.pkl
Are both sets of files needed, or am I repeating an operation twice? I'm also wondering whether declaring coloring with the driver uses a method other than CS. I do intend to use the total_coloring.pkl for static coloring.

Dymos can use one of two methods for transcription: The Radau Pseudospectral Method or the high-order GaussLobatto method.
The GaussLobatto method is a two-step process:
1. The ODE is evaluated at the "discretization" nodes.
2. The values and rates at the discretization nodes are used to interpolate the state and state rates to the "collocation" nodes.
3. The ODE is evaluated a second time at the collocation nodes, using the interpolated state values from step 2.
4. The interpolated rates are compared to the rates output by the ODE at the collocation nodes (these differences are called the defects); if they're tiny, then the physics are assumed to be accurate.
The Radau transcription follows a similar process, except the collocation nodes are a subset of the discretization nodes, so interpolation isn't necessary, and the ODE only needs to be evaluated once.
If you change your transcription from dymos.GaussLobatto to dymos.Radau, then you'll only have one partial-coloring file for each of your ODE components. Otherwise, both need to have their coloring worked out separately.
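For reference, here is a minimal sketch of what that transcription swap looks like when the phase is set up. The ODE component, phase settings, and numbers below are placeholders, not taken from your model; with Radau, only one partial-coloring file per ODE component is generated.
import dymos as dm
import openmdao.api as om

# 'MyODE' is a stand-in for one of your ODE components (brakeThrottle, powerTrain, ...).
class MyODE(om.ExplicitComponent):
    def initialize(self):
        self.options.declare('num_nodes', types=int)

    def setup(self):
        nn = self.options['num_nodes']
        self.add_input('x', shape=(nn,))
        self.add_output('xdot', shape=(nn,))
        # Same partial-coloring declaration as in your components.
        self.declare_coloring(wrt='*', method='cs', tol=1.0E-12, show_sparsity=True)

    def compute(self, inputs, outputs):
        outputs['xdot'] = -0.5 * inputs['x']

# Radau collocation: the collocation nodes coincide with the state-input nodes,
# so only one partial coloring per ODE component is needed.
tx = dm.Radau(num_segments=10, order=3)   # instead of dm.GaussLobatto(num_segments=10, order=3)
phase = dm.Phase(ode_class=MyODE, transcription=tx)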

Related

Error when computing jacobian vector product

I have a group with coupled disciplines that is nested in a model where all other components are uncoupled. I have assigned a Newton nonlinear solver and a direct linear solver to the coupled group.
When I try to run the model with the default "RunOnce" solver everything is OK, but as soon as I try to run an optimization I get the following error, raised from linear_block_gs.py:
File "...\openmdao\core\group.py", line 1790, in _apply_linear scope_out, scope_in)
File "...\openmdao\core\explicitcomponent.py", line 339, in _apply_linear
self.compute_jacvec_product(*args)
File "...\Thermal_Cycle.py", line 51, in compute_jacvec_product
d_inputs['T'] = slope * deff_dT / alp_sc
File "...\openmdao\vectors\vector.py", line 363, in setitem
raise KeyError(msg.format(name)) KeyError: 'Variable name "T" not found.'
Below is the N2 diagram of the model. The variable "T" mentioned in the error comes from the implicit "temp" component and is fed back as an input to the "sc" component (the Thermal_Cycle.py file in the error message).
N2 diagram
The error disappears when I assign a DirectSolver on top of the whole model. My impression was that "RunOnce" would work as long as the groups containing implicit components have appropriate solvers applied to them, as suggested here, which is the case in my model. Why does it not work when computing total derivatives of the model, i.e. why can compute_jacvec_product not find the coupled variable "T"?
The reason I want to use the "RunOnce" solver is that optimization with a DirectSolver on top becomes very slow as my variable vector "T" grows. I suspect it should be much faster with the linear "RunOnce" solver?
I think this example of the compute_jacvec_product method might be helpful.
The problem is that, depending on the solver configuration or the structure of the model, OpenMDAO may only need some of the partials that you provide in this method. For example, your matrix-free component might have two inputs, but only one is connected, so OpenMDAO does not need the derivative with respect to the unconnected input, and in fact, does not allocate space for it in the d_inputs or d_outputs vectors.
So, to fix the problem, you just need to put an if statement before assigning the value, just like in the example.
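For concreteness, here is a hedged sketch of those guards in a made-up matrix-free component; the names 'T', 'q', and 'eff' are placeholders, not taken from Thermal_Cycle.py.
import openmdao.api as om

class ScSketch(om.ExplicitComponent):
    def setup(self):
        self.add_input('T', val=1.0)
        self.add_input('q', val=1.0)
        self.add_output('eff', val=1.0)

    def compute(self, inputs, outputs):
        outputs['eff'] = 2.0 * inputs['T'] + 3.0 * inputs['q']

    def compute_jacvec_product(self, inputs, d_inputs, d_outputs, mode):
        # Only touch a variable if OpenMDAO actually put it in the vectors;
        # depending on the solve, 'T' or 'q' may be absent.
        if mode == 'fwd':
            if 'eff' in d_outputs:
                if 'T' in d_inputs:
                    d_outputs['eff'] += 2.0 * d_inputs['T']
                if 'q' in d_inputs:
                    d_outputs['eff'] += 3.0 * d_inputs['q']
        else:  # 'rev'
            if 'eff' in d_outputs:
                if 'T' in d_inputs:
                    d_inputs['T'] += 2.0 * d_outputs['eff']
                if 'q' in d_inputs:
                    d_inputs['q'] += 3.0 * d_outputs['eff']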
Based on the N2, I think I agree with your strategy of putting the direct solver down around the coupling only. That should work fine; however, it looks like you're implementing a linear operator in your component, based on:
File "...\Thermal_Cycle.py", line 51, in compute_jacvec_product
    d_inputs['T'] = slope * deff_dT / alp_sc
You shouldn't use a DirectSolver with matrix-free partials. The direct solver computes an inverse, which requires the full assembly of the matrix. The only reason it works at all is that OpenMDAO has some fallback functionality to manually assemble the Jacobian by passing columns of the identity matrix through the compute_jacvec_product method.
This fallback mechanism is there to make things work, but it's very slow (you end up calling compute_jacvec_product A LOT).
The error you're getting, and why it works when you put the direct solver higher up in the model, is probably due to a lack of necessary if conditions in your compute_jacvec_product implementation.
See the docs on explicit components for some examples, but the key insight is that not every variable will be present in every jacvec product; it depends on what kind of solve is being done (e.g. a Newton solve vs. total derivatives of the whole model).
So those if-checks are needed to determine whether a variable is relevant. This is done because, for expensive codes (e.g. CFD), some of these operations are quite costly and you don't want to do them unless you need to.
Are your components so big that you can't use the compute_partials function? Have you tried specifying the sparsity in your Jacobian? Usually the matrix-free partial-derivative methods are not needed until you start working with really big PDE solvers with 1e6 or more implicit output variables.
Without seeing some code, it's hard to comment in more detail, but in summary:
You shouldn't use compute_jacvec_product in combination with a DirectSolver. If you really need matrix-free partials, then you need to switch to an iterative linear solver like PETScKrylov.
If you can post the code for the component in Thermal_Cycle.py that has the compute_jacvec_product, I can give a more detailed recommendation on how to handle the partial derivatives in that case.
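On the sparsity point above, here is a minimal sketch of declaring sparse analytic partials instead of going matrix-free; the component and its diagonal Jacobian are hypothetical, not from Thermal_Cycle.py.
import numpy as np
import openmdao.api as om

class DiagComp(om.ExplicitComponent):
    def initialize(self):
        self.options.declare('n', types=int, default=5)

    def setup(self):
        n = self.options['n']
        self.add_input('T', shape=(n,))
        self.add_output('eff', shape=(n,))
        ar = np.arange(n)
        # Declare only the nonzero (diagonal) entries of d(eff)/d(T).
        self.declare_partials('eff', 'T', rows=ar, cols=ar)

    def compute(self, inputs, outputs):
        outputs['eff'] = 2.0 * inputs['T'] ** 2

    def compute_partials(self, inputs, partials):
        # Provide values for the declared diagonal entries only.
        partials['eff', 'T'] = 4.0 * inputs['T']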

In Chainer, how to write a BPTT updater using multiple GPUs?

I can't find an example, because the existing example only extends training.StandardUpdater and thus only uses one GPU.
I assume that you are talking about the BPTTUpdater of the ptb example of Chainer.
It's not straightforward to make the customized updater support training on multiple GPUs. MultiprocessParallelUpdater hard-codes the way the gradient is computed (only the target link implementation is customizable), so you have to copy the overall implementation of MultiprocessParallelUpdater and modify the gradient computation parts. What you have to copy and edit is chainer/training/updaters/multiprocess_parallel_updater.py.
There are two parts in this file that compute gradients: one in _Worker.run, which represents a worker process task, and the other in MultiprocessParallelUpdater.update_core, which represents the master process task. You have to make this code do BPTT by modifying the section from _calc_loss to backward in each of these two parts:
# Change self._master into self.model for _Worker.run code
loss = _calc_loss(self._master, batch)
self._master.cleargrads()
loss.backward()
This block should be modified by inserting the code of BPTTUpdater.update_core; a rough sketch follows.
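As a hedged sketch of what that replacement might look like on the master side (in the _Worker.run copy, use self.model in place of self._master): bprop_len is an assumed attribute holding the truncation length, and the iterator/converter/device handling mirrors the ptb example and your copied updater rather than being a drop-in patch.
# Truncated-BPTT replacement for the _calc_loss / cleargrads / backward lines above.
train_iter = self.get_iterator('main')
loss = 0
for _ in range(self.bprop_len):                    # assumed attribute, as in the ptb BPTTUpdater
    batch = self.converter(train_iter.__next__(), self._devices[0])
    loss += _calc_loss(self._master, batch)        # accumulate loss over the BPTT window

self._master.cleargrads()
loss.backward()
loss.unchain_backward()                            # cut the graph so backprop stops at this window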
You also have to take care of the data iterators. MultiprocessParallelUpdater accepts a set of iterators that will be distributed to the master/worker processes. Since the ptb example uses a customized iterator (ParallelSequentialIterator), you have to make sure that these iterators iterate over different portions of the dataset or use different initial word-position offsets. This may require customizing ParallelSequentialIterator as well.

Finite difference between old and new OpenMDAO

So I am converting a code from the old OpenMDAO to the new OpenMDAO. All the outputs and the partial gradients have been verified as correct. At first the problem would not optimize at all, and then I realized that the old code had some components that did not provide gradients, so they were automatically finite differenced. So I added fd_options['force_fd'] = True to those components, but it still does not optimize to the right value. I checked the total derivative and it was still not correct. It also takes quite a bit longer to do each iteration than the old OpenMDAO. The only way I can get my new code to optimize to the same value as the old OpenMDAO code is to set every component to finite difference, even the components that provide gradients. So I have a few questions about how finite difference works between the old and the new OpenMDAO:
When the old OpenMDAO did automatic finite difference, did it only do it on the inputs and outputs needed for the optimization, or did it calculate the entire Jacobian for all inputs and outputs? Same question for the new OpenMDAO when you set 'force_fd' to True.
Can you provide some parts of the Jacobian of a component and have it finite difference the rest? In the old OpenMDAO, did it finite difference any gradients not provided unless you set missing_deriv_policy = 'assume_zero'?
So, the old OpenMDAO looked for groups of components without derivatives, and bundled them together into a group that could be finite differenced together. New OpenMDAO doesn't do that, so each of those components would be finite differenced separately.
We don't support that yet, and didn't in old OpenMDAO. We do have a story up on our pivotal tracker though, so we will eventually have this feature.
What I suspect might be happening for you is that the finite-difference groupings happened to be better in classic OpenMDAO. Consider one component with one input and 10 outputs connected to a second component with 10 inputs and 1 output. If you finite difference them together, only one execution is required. If you finite difference them individually, you need one execution of component one and 10 executions of component two. This could cause a noticeable or even major performance hit.
Individual FD vs. group FD can also cause accuracy problems if there is an important input whose scaling is vastly different from the other variables, so that the default FD step size of 1.0e-6 is no good. (Note: you can set a step_size when you add a param or output, and it overrides the default for that variable.)
Luckily, new OpenMDAO has a way to recreate what you had in old OpenMDAO, but it is not automatic. What you need to do is take a look at your model, figure out which components can be FD'd together, then create a sub-Group and move those components into it. You can set fd_options['force_fd'] to True on the group, and it will finite difference that group as a unit. So, for example, if you have A -> B -> C with no components in between, and none of them have derivatives, you can move A, B, and C into a new sub-Group with force_fd set to True.
If that doesn't fix things, we may have to look more deeply at your model.
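To illustrate, here is a hedged sketch of that sub-Group approach using the 1.x-era syntax implied by the fd_options flag above; CompA and CompB are made-up stand-ins for your derivative-free components.
from openmdao.api import Component, Group, Problem, IndepVarComp

# Derivative-free stand-ins: no linearize/Jacobian methods are defined.
class CompA(Component):
    def __init__(self):
        super(CompA, self).__init__()
        self.add_param('x', val=0.0)
        self.add_output('y', val=0.0)

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['y'] = 2.0 * params['x']

class CompB(Component):
    def __init__(self):
        super(CompB, self).__init__()
        self.add_param('y', val=0.0)
        self.add_output('z', val=0.0)

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['z'] = params['y'] ** 2

sub = Group()
sub.add('A', CompA())
sub.add('B', CompB())
sub.connect('A.y', 'B.y')
sub.fd_options['force_fd'] = True   # finite difference the whole chain as one block

root = Group()
root.add('indep', IndepVarComp('x', 3.0))
root.add('fd_group', sub)
root.connect('indep.x', 'fd_group.A.x')

prob = Problem(root)
prob.setup(check=False)
prob.run()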

OpenMDAO 1.x relevance reduction

I have a component in OpenMDAO without outputs that serves to provide inputs to the rest of the group. apply_linear in that component is being called despite the fact that its output is not connected. Shouldn't the relevance reduction algorithm in OpenMDAO 1.x figure out that apply_linear for this component never needs to be called?
As it turns out, relevance reduction on a per-variable basis isn't turned on by default. You can turn it on with:
prob.root.ln_solver = LinearGaussSeidel()
prob.root.ln_solver.options['single_voi_relevance_reduction'] = True
This option is set to False by default because it uses more memory, allocating separate vectors for each quantity of interest (each vector is smaller because it only contains relevant variables, but the total size may be larger). Also, relevance reduction is only applicable when using LinearGaussSeidel as the top linear solver.
My reputation isn't high enough yet to leave comments, so I'm just adding another answer instead. I just wanted to mention that if you're not running under MPI, activating single_voi_relevance_reduction is essentially free. The real increase in memory use isn't due to the vectors themselves; it's due to the index arrays that we store in order to transfer the data from source arrays to target arrays. We're forced to use index arrays under MPI, because PETSc requires it, but when we're not using MPI we use Python slice objects to do our data transfers. Slice objects require very little memory.

Solving ODEs on a microcontroller

I would like to solve two first-order ODEs on a microcontroller. They have to be evaluated every 100 ms:
x'=-k_{1}\cdot (x-x_{ref})\cdot e^{-b\cdot ((x-x_{obs})^{2}+(y-y_{obs})^{2})}
y'=-k_{1}\cdot (y-y_{ref})\cdot e^{-b\cdot ((x-x_{obs})^{2}+(y-y_{obs})^{2})}
Basically I thought of using Euler integration (Runge-Kutta I):
y(k+1)=y(k)+f(k,y(k))*dT
I expect the error to be < 0.001. How do I determine how many iterations I should run until I hit that error bound?
I guess that x and y, as well as x_{ref}, y_{ref}, x_{obs}, and y_{obs}, are time dependent. This limits the number of ODE solvers you can use, so it can only be the Euler method or a Runge-Kutta method of order 2 (I forgot the name), which evaluate the right-hand side of your ODE only at the time points t, t+dT, t+2dT, ...
You can use classical step size control with these two methods. That is, you take one step with the Euler method and one step with the RK-II method. The difference between these two results is an indicator of the error and can be used for classical step size control. Have a look at Numerical Recipes for more details.
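Here is a minimal sketch of that Euler / RK-II comparison for the two equations above, written in Python for clarity rather than in C for the target; k1, b and the ref/obs values are placeholders you would supply.
import math

# Right-hand side of the two ODEs.
def f(x, y, k1, b, x_ref, y_ref, x_obs, y_obs):
    w = math.exp(-b * ((x - x_obs) ** 2 + (y - y_obs) ** 2))
    return -k1 * (x - x_ref) * w, -k1 * (y - y_ref) * w

def step(x, y, dT, params):
    fx1, fy1 = f(x, y, *params)
    # Euler (RK-I) step
    x_eu, y_eu = x + dT * fx1, y + dT * fy1
    # RK-II (Heun-type) step: average the slopes at both ends of the interval
    fx2, fy2 = f(x_eu, y_eu, *params)
    x_rk = x + 0.5 * dT * (fx1 + fx2)
    y_rk = y + 0.5 * dT * (fy1 + fy2)
    # The difference between the two results estimates the local error; if it
    # exceeds the tolerance (e.g. 1e-3), halve dT (take more sub-steps inside
    # the 100 ms interval) and repeat.
    err = max(abs(x_rk - x_eu), abs(y_rk - y_eu))
    return x_rk, y_rk, err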

Resources