How can I effectively increase the mini-batch size using the StandardUpdater class in Chainer? - chainer

How can I effectively increase the mini-batch size using the StandardUpdater class in Chainer?
In PyTorch, I can effectively increase the mini-batch size like this:
Execute loss.backward() on every iteration.
Execute optimizer.step() / optimizer.zero_grad() only once every three iterations.
This effectively triples the mini-batch size (gradient accumulation).
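The claim that accumulating gradients mimics a bigger mini-batch can be checked numerically. The sketch below is a framework-agnostic NumPy stand-in, not PyTorch or Chainer code. It assumes a linear model with a mean-squared-error loss, sums the gradients of three mini-batches of 4 samples, and compares the result with the gradient of one batch of 12 samples.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = rng.normal(size=12)
w = rng.normal(size=3)

accum_steps = 3
batch_size = 4

# Accumulate the gradients of 3 mini-batches, like calling backward()
# repeatedly without clearing the gradients in between.
accumulated = np.zeros_like(w)
for k in range(accum_steps):
    sl = slice(k * batch_size, (k + 1) * batch_size)
    accumulated += mse_grad(w, X[sl], y[sl])

# Gradient of the mean loss over one big batch of 12 samples
full_batch = mse_grad(w, X, y)

print(np.allclose(accumulated / accum_steps, full_batch))  # True
```

The division by the number of accumulation steps matters. Summing the gradients of three mean losses gives three times the big-batch gradient, which acts like a larger learning rate unless you scale the loss (or the learning rate) down.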
Question 1.
In Chainer, is it possible to effectively increase the mini-batch size the same way?
Execute loss.backward() on every iteration.
Execute net.cleargrads() / optimizer.update() only once every three iterations.
Does this effectively increase the mini-batch size?
Question 2.
In fact, I'm using the StandardUpdater class.
Is there a hyperparameter that effectively increases the mini-batch size?
Or should I write my own class that inherits from StandardUpdater and change the implementation as above?
I'm sorry if these questions have already been asked.
I would appreciate any advice.

(Question seems quite old, but I stumbled upon it and wanted to share my solution to the question)
You would basically do it the same way as in PyTorch. Unfortunately, StandardUpdater has neither a hyperparameter that supports this nor an implementation of "mini-batch updates". But here is how I implemented it (basically as you suggested in your question: inherit from StandardUpdater and re-implement the update_core method):
from chainer.training import StandardUpdater
from chainer.dataset import convert


class MiniBatchUpdater(StandardUpdater):
    """
    The iterator outputs batches in mini-batch sizes. This updater
    accumulates the gradients of these mini-batches until the
    update_size is reached. Then a parameter update is performed.
    """

    def __init__(self, *args, update_size=32, **kwargs):
        # update_size is keyword-only, so StandardUpdater's positional
        # arguments (iterator, optimizer, ...) are passed through unchanged
        super(MiniBatchUpdater, self).__init__(*args, **kwargs)
        self.update_size = update_size
        self.iteration_counter = 0  # number of samples accumulated so far

    def update_core(self):
        optimizer = self.get_optimizer('main')
        loss_func = self.loss_func or optimizer.target
        it = self.get_iterator('main')
        batch = it.next()
        data = convert._call_converter(self.converter, batch, self.device)

        # Clear the gradients only at the start of a new accumulation cycle
        use_cleargrads = getattr(optimizer, '_use_cleargrads', True)
        if use_cleargrads and self.iteration_counter == 0:
            optimizer.target.cleargrads()

        self.iteration_counter += it.batch_size
        loss = loss_func(*data)
        loss.backward()  # adds this batch's gradients to the existing ones

        if self.iteration_counter >= self.update_size:
            self.iteration_counter = 0
            optimizer.update()  # no loss given: update with the accumulated gradients
The implementation is quite old (I think it was written for Chainer 4 or 5), but it works for me with Chainer 7.8 as well. One could update some lines to match the newer implementation of the update_core method, but as I said, it works for me. Hopefully it helps ;)
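update_core increments the counter by it.batch_size, so it counts samples rather than iterations. An update therefore fires at the first multiple of the iterator's batch size that reaches update_size. The pure-Python sketch below replays that counter logic without Chainer to show which iterations would call optimizer.update(). update_iterations is a hypothetical helper written for illustration.

```python
def update_iterations(batch_size, update_size, n_iterations):
    """Replay MiniBatchUpdater's counter and return the (0-based)
    iterations on which optimizer.update() would be called."""
    counter = 0
    updates = []
    for iteration in range(n_iterations):
        counter += batch_size          # self.iteration_counter += it.batch_size
        if counter >= self.update_size if False else counter >= update_size:
            counter = 0                # reset, so the next iteration clears grads
            updates.append(iteration)
    return updates

# batch_size divides update_size: one update per 32 samples
print(update_iterations(8, 32, 12))   # [3, 7, 11]
# batch_size does not divide update_size: 4 batches of 10 = 40 samples per update
print(update_iterations(10, 32, 12))  # [3, 7, 11]
```

With a batch size of 10 and update_size=32, each update effectively uses 40 samples. Choosing update_size as a multiple of the batch size keeps the effective batch size predictable.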

Related

Is it possible to add an "explicit output" to an implicit component without extra computational effort compared to an explicit component?

While trying to figure out whether the code can be simplified to avoid some duplication, I was wondering if it is possible to add an explicit output to an implicit component without adding extra computational effort compared to an explicit component. "Explicit output" may not be a fully correct term here, though, since it depends on another output that is determined implicitly. Taking the Node implicit component example from the docs:
import openmdao.api as om


class Node(om.ImplicitComponent):
    """Computes voltage residual across a node based on incoming and outgoing current."""

    def initialize(self):
        self.options.declare('n_in', default=1, types=int, desc='number of connections with + assumed in')
        self.options.declare('n_out', default=1, types=int, desc='number of current connections + assumed out')

    def setup(self):
        self.add_output('V', val=5., units='V')

        for i in range(self.options['n_in']):
            i_name = 'I_in:{}'.format(i)
            self.add_input(i_name, units='A')

        for i in range(self.options['n_out']):
            i_name = 'I_out:{}'.format(i)
            self.add_input(i_name, units='A')

    def setup_partials(self):
        # note: we don't declare any partials wrt `V` here,
        #       because the residual doesn't directly depend on it
        self.declare_partials('V', 'I*', method='fd')

    def apply_nonlinear(self, inputs, outputs, residuals):
        residuals['V'] = 0.
        for i_conn in range(self.options['n_in']):
            residuals['V'] += inputs['I_in:{}'.format(i_conn)]
        for i_conn in range(self.options['n_out']):
            residuals['V'] -= inputs['I_out:{}'.format(i_conn)]
If we want to calculate the power going through the node, one option is to create an explicit component that takes the node voltage and each of the node's incoming and outgoing currents as inputs, calculates the power, and is grouped with the implicit component. However, all of these quantities are already available inside the implicit component, and this approach duplicates the current in/out loops between the two components. So I was wondering whether this can be done directly within the implicit component, since the docs example mentions "The solve_nonlinear method provides a way to explicitly define an output within an implicit component":
def solve_nonlinear(self, inputs, outputs):
    total_abs_current = 0
    for i_conn in range(self.options['n_in']):
        total_abs_current += np.abs(inputs['I_in:{}'.format(i_conn)])
    for i_conn in range(self.options['n_out']):
        total_abs_current += np.abs(inputs['I_out:{}'.format(i_conn)])
    outputs['P_total'] = total_abs_current * outputs['V'] / 2
Reading further, the docs say it is still necessary to also add a power residual in the apply_nonlinear() method. Hence, something like:
def apply_nonlinear(self, inputs, outputs, residuals):
    residuals['V'] = 0
    total_abs_current = 0
    for i_conn in range(self.options['n_in']):
        residuals['V'] += inputs['I_in:{}'.format(i_conn)]
        total_abs_current += np.abs(inputs['I_in:{}'.format(i_conn)])
    for i_conn in range(self.options['n_out']):
        residuals['V'] -= inputs['I_out:{}'.format(i_conn)]
        total_abs_current += np.abs(inputs['I_out:{}'.format(i_conn)])
    residuals['P_total'] = outputs['P_total'] - total_abs_current * outputs['V'] / 2
But will the component actually use this function to "solve" for the power, even when solve_nonlinear() already calculates the power explicitly? Will this implementation therefore require more computational resources than the explicit component approach? And when specifying the partials through the linearize() method, should they follow the apply_nonlinear() or the solve_nonlinear() calculation?
I typically call this kind of situation a pseudo-implicit output. You have an analytic expression so you don't really need it to be implicit, but you want to stick the calculation in with a bunch of other implicit stuff. You have the basic layout right. You write a solve_nonlinear method that does the calculation for you, and you add the residual form in the apply_nonlinear.
But will the component actually use this function to "solve" for the
power, even when solve_nonlinear() already calculates the power
explicitly?
Yes... and no :) The simple answer is that (in most cases) the solve_nonlinear method will ultimately provide the value for the pseudo-implicit output as part of the whole global nonlinear solve. The residual form will effectively always return 0 for that particular variable. This holds true if you are using a block Gauss-Seidel solver, or a Newton solver with solve_subsystems turned on.
The subtler situation arises if you use a pure Newton method (without solve_subsystems). In that case the residual form actually drives the entire calculation, and the solve_nonlinear method of any implicit component never gets called. This is not a super common way of running the Newton solver, but it does come up often enough.
I would say that the pseudo-implicit output gives you the flexibility to work either way with no real loss of performance. As I'll discuss below, there isn't any practical difference between this and just breaking it out into an explicit component anyway.
Will this implementation then therefore require more computational
resources compared to the explicit component approach?
The short answer is no, at least not by any meaningful amount. The long answer requires diving into the math of newton solvers and understanding how OpenMDAO really does ExplicitComponents. For all the details, you should check out section 5.1 of the OpenMDAO paper along with the implicit transformation that OpenMDAO does internally for all ExplicitComponents.
In summary, what you wrote in apply_nonlinear is exactly what OpenMDAO does internally for an explicit component whenever it needs to compute a residual. So your implementation doesn't really add anything more or less than what OpenMDAO already does in the background, namely:
residuals['P_total'] = outputs['P_total'] - total_abs_current * outputs['V'] / 2
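To make the "no extra cost" argument concrete: an explicit relation y = f(x) written in residual form R(y) = y - f(x) has dR/dy = 1. For fixed inputs, a single Newton step from any starting guess therefore lands exactly on y = f(x). The plain-Python sketch below shows this. It is not OpenMDAO code, and f is a made-up function. This is roughly why a pseudo-implicit output does not add Newton iterations of its own.

```python
import math

def f(x):
    # hypothetical explicit relation y = f(x)
    return 3.0 * math.sin(x) + x ** 2

def residual(y, x):
    # residual form of the explicit relation
    return y - f(x)

def newton_step(y, x):
    dR_dy = 1.0  # derivative of the residual with respect to its own output
    return y - residual(y, x) / dR_dy

x = 0.7
y0 = -123.0                # arbitrary initial guess
y1 = newton_step(y0, x)
print(abs(y1 - f(x)) < 1e-12)  # True: converged after one step
```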
There is one caveat here, though. I'll exaggerate to make the situation clear. Let's say you had a single scalar implicit relationship in a component, and you then added 1e6 pseudo-implicit outputs to it as well. In that case, you are better off splitting them up, because you're making the Newton system a lot larger and more expensive. But generally, adding a few extra pseudo-implicit outputs won't make much of a difference at all.
When specifying the partials through the linearize() method,
should they follow the apply_nonlinear() or solve_nonlinear()
calculation?
Differentiate the apply_nonlinear. Don't worry about what you did in the solve_nonlinear at all in the context of derivatives for implicit components!
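For the power residual above, that means differentiating residuals['P_total'] = P_total - total_abs_current * V / 2. The NumPy sketch below writes those partials by hand and checks them against central finite differences, which is the same kind of comparison check_partials performs. It is not OpenMDAO code: the currents are stacked into one array, and the numbers are made up. The |I| terms are not differentiable at exactly zero current, so the sign-based partial only holds away from zero.

```python
import numpy as np

def power_residual(P, V, I):
    """Residual of the pseudo-implicit output, as written in apply_nonlinear.
    I holds all connection currents (in and out) stacked into one array."""
    total_abs_current = np.abs(I).sum()
    return P - total_abs_current * V / 2

def power_residual_partials(P, V, I):
    """Hand-derived partials of power_residual (what linearize() would provide)."""
    total_abs_current = np.abs(I).sum()
    return {
        'P': 1.0,                     # dR/dP_total
        'V': -total_abs_current / 2,  # dR/dV
        'I': -np.sign(I) * V / 2,     # dR/dI_k, valid for I_k != 0
    }

def central_fd(func, x, h=1e-6):
    """Central finite difference of a scalar function of a float or 1-D array."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    grad = np.empty_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        grad[k] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

P, V = 2.0, 5.0
I = np.array([0.3, -0.1, 0.15])  # made-up currents, none exactly zero
J = power_residual_partials(P, V, I)

fd_P = central_fd(lambda p: power_residual(p[0], V, I), P)
fd_V = central_fd(lambda v: power_residual(P, v[0], I), V)
fd_I = central_fd(lambda i: power_residual(P, V, i), I)

print(np.allclose(fd_P, J['P']), np.allclose(fd_V, J['V']), np.allclose(fd_I, J['I']))  # True True True
```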

Approximating the whole problem with Mixed Analytical strategy

I have a problem where I have implemented analytical derivatives for some components and I'm using complex step for the rest. There is a cyclic dependency between them, so I also use a solver to converge them. It converges when I use NonlinearBlockGS. But when I use NewtonSolver in combination with a linear solver, the optimization fails (Iteration limit exceeded), even with a high iteration limit. However, I found that it converges easily and works perfectly when I use prob.model.approx_totals(). I read that approx_totals uses FD or CS to compute the model gradients. So I have two questions.
In general, will I lose the benefits of the mixed-analytical approach when I use approx_totals()? Is there a way to find the derivatives of the whole model (or a group) with a mixed-analytical strategy? (In my case the coupled ExplicitComponents use complex step anyway, but I'm just curious about this.)
In general (not in this scenario), will OpenMDAO automatically detect the mixed strategy, or should I specify it somehow?
I would also be grateful if you could point me to some examples where mixed derivatives are used. I didn't have any luck finding them myself.
Edit: Adding an example. I am not able to reproduce the issue in sample code, and I don't want to waste your time with my code (there are more than 30 ExplicitComponents and 7 Groups). So I made the simple structure below to explain it better. It has 7 components, A to G. Only F and G lack analytical derivatives; they use FD.
import openmdao.api as om
import numpy as np

class ComponentA_withDerivatives(om.ExplicitComponent):
    def setup(self):
        # setup inputs and outputs
    def setup_partials(self):
        # partial declaration
    def compute(self, inputs, outputs):
    def compute_partials(self, inputs, J):
        # Partial definition

class ComponentB_withDerivatives(om.ExplicitComponent):
    .....
class ComponentC_withDerivatives(om.ExplicitComponent):
    ......
class ComponentD_withDerivatives(om.ExplicitComponent):
    ......
class ComponentE_withDerivatives(om.ExplicitComponent):
    ......

class ComponentF(om.ExplicitComponent):
    def setup(self):
        # setup inputs and outputs
        self.declare_partials(of='*', wrt='*', method='fd')
    def compute(self, inputs, outputs):
        # Computation

class ComponentG(om.ExplicitComponent):
    def setup(self):
        # setup inputs and outputs
        self.declare_partials(of='*', wrt='*', method='fd')
    def compute(self, inputs, outputs):
        # Computation

class GroupAB(om.Group):
    def setup(self):
        self.add_subsystem('A', ComponentA_withDerivatives(), promotes_inputs=['x','y'], promotes_outputs=['z'])
        self.add_subsystem('B', ComponentB_withDerivatives(), promotes_inputs=['x','y','w','u'], promotes_outputs=['k'])

class GroupCD(om.Group):
    def setup(self):
        self.add_subsystem('C', ComponentC_withDerivatives(), .....)
        self.add_subsystem('D', ComponentD_withDerivatives(), ...)

class Final(om.Group):
    def setup(self):
        cycle1 = self.add_subsystem('cycle1', om.Group(), promotes=['*'])
        cycle1.add_subsystem('GroupAB', GroupAB())
        cycle1.add_subsystem('ComponentF', ComponentF())
        cycle1.linear_solver = om.DirectSolver()
        cycle1.nonlinear_solver = om.NewtonSolver(solve_subsystems=True)

        cycle2 = self.add_subsystem('cycle2', om.Group(), promotes=['*'])
        cycle2.add_subsystem('GroupCD', GroupCD())
        cycle2.add_subsystem('ComponentE_withDerivatives', ComponentE_withDerivatives())
        cycle2.linear_solver = om.DirectSolver()
        cycle2.nonlinear_solver = om.NewtonSolver(solve_subsystems=True)

        self.add_subsystem('ComponentG', ComponentG(), promotes_inputs=['a1','a2','a3'], promotes_outputs=['b1'])

prob = om.Problem()
prob.model = Final()
prob.driver = om.pyOptSparseDriver()
prob.driver.options['optimizer'] = 'SNOPT'
prob.driver.options['print_results'] = True
## Design Variables
## Constraints
## Objectives
# Setup
prob.setup()
##prob.model.approx_totals(method='fd')
prob.run_model()
prob.run_driver()
Here this doesn't work: cycle1 doesn't converge. The code works when I completely remove cycle1, use NonlinearBlockGS instead of Newton, or uncomment prob.model.approx_totals(method='fd'). (There is no problem with cycle2; it works with Newton.)
So if I don't use approx_totals(), I am assuming OpenMDAO uses a mixed strategy. Or should I specify it manually somehow? And when I do use approx_totals(), will I lose the benefits of the analytical derivatives that I do have?
The code example you provided isn't runnable, so I'll have to make a few guesses. You call both run_model() and run_driver(). You bothered to include an optimizer in your sample code, though, and you've shown approx_totals being called at the top of the model hierarchy.
So when you say it does not work, I will assume you mean that the optimizer doesn't converge.
You have understood the behavior of approx_totals correctly. When you set that at the top of your model, then OpenMDAO will FD the relevant variables from the group level. In this case, that means you will also be FD-ing across the solver itself. You say that this seems to work, but the mixed analytic approach does not.
In general, Will I lose the benefits from the mixed-analytical approach when I use approx_totals()?
Yes. You are no longer using a mixed approach. You are just FD-ing across the model monolithically.
Is there a way to find the derivatives of whole model (or group) with mixed analytical strategy ?
OpenMDAO is computing total derivatives with a mixed strategy when you don't use approx_totals. The issue is that for your model, it seems not to be working.
In general (not in this scenario), will OpenMDAO automatically detect the mixed strategy?
It will "detect" it (it doesn't actually detect anything, but the underlying algorithms will use a mixed strategy UNLESS you tell it not to with approx_totals). Again, the issue is not that a mixed strategy is not being used, but that it is not working.
So why isn't the mixed strategy working?
I can only guess, since I can't run the code... so YMMV.
You mention that you are using complex-step for the partials of your explicit components. Complex-step is a much more accurate approximation scheme than FD, but it has its own flaws. Not every computation is complex-safe. Some can be rewritten to be complex-safe; others cannot.
By "complex-safe" I mean that the computation correctly carries the complex part through, so that it yields the correct derivative.
Two commonly used functions that are not complex-safe are np.linalg.norm and np.abs. Both will happily accept complex numbers and give you an answer, but it is not the correct answer when you need derivatives.
Because of this, OpenMDAO ships with a set of custom functions that are cs-safe --- custom norm and abs are provided.
What typically happens with non cs-safe methods is that the complex-part somehow gets dropped off and you get 0 partial derivatives. Wrong partials, wrong totals.
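The abs case can be reproduced in a few lines. The NumPy sketch below computes the complex-step derivative Im(f(x + ih))/h of np.abs and of a sign-flipping abs. cs_derivative and cs_safe_abs are hypothetical helpers written for illustration, not OpenMDAO's own functions.

```python
import numpy as np

def cs_derivative(func, x, h=1e-30):
    """Complex-step derivative: Im(f(x + i*h)) / h."""
    return np.imag(func(x + 1j * h)) / h

def cs_safe_abs(x):
    # Flip the sign of the whole complex number based on its real part,
    # so the imaginary (derivative) part is flipped along with it.
    return np.where(np.real(x) < 0.0, -x, x)

x = -2.0  # the true derivative of |x| here is -1
print(cs_derivative(np.abs, x))       # 0.0  (complex part dropped: wrong partial)
print(cs_derivative(cs_safe_abs, x))  # -1.0 (correct)
```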
To check this, make sure you call check_partials on your components that are being complex-stepped, using a finite-difference check. You'll probably find some discrepancies.
The fixes available to you are:
Switch those components to use FD partials. Less accurate, but will probably work.
Correct whatever problems in your compute are making your code non-cs-safe. Use OpenMDAO's custom functions if that's the problem, or possibly you need to be more careful about how you allocate and use numpy arrays in your compute (if you're allocating your own arrays, then you need to be careful to make sure they are complex too!).
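Here is the array-allocation pitfall from the second fix in practice. In the NumPy sketch below, writing complex-stepped values into a preallocated float64 array drops the imaginary parts (NumPy only emits a ComplexWarning), so the complex-step partials come out as zero. Allocating the buffer with a complex dtype when the input is complex keeps them. The compute functions are hypothetical and not tied to any OpenMDAO component.

```python
import warnings
import numpy as np

def compute_float_buffer(x):
    """Hypothetical compute(): fills a preallocated float64 array."""
    out = np.zeros(2)                    # real-valued buffer
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # NumPy warns that imaginary parts are discarded
        out[:] = np.array([x ** 2, 3.0 * x])
    return out

def compute_matching_buffer(x):
    """Same computation, but the buffer dtype follows the input."""
    dtype = complex if np.iscomplexobj(x) else float
    out = np.zeros(2, dtype=dtype)
    out[:] = np.array([x ** 2, 3.0 * x])
    return out

h = 1e-30
x = 1.5 + 1j * h                  # complex-step perturbation of x = 1.5
exact = np.array([2 * 1.5, 3.0])  # d(x**2)/dx and d(3x)/dx at x = 1.5
print(np.allclose(np.imag(compute_float_buffer(x)) / h, exact))     # False: partials are 0
print(np.allclose(np.imag(compute_matching_buffer(x)) / h, exact))  # True
```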

How to properly connect a scalar to a vector entry?

We're searching a way to connect scalars (as an output) to vector entries (as an input).
In the "Nonlinear Circuit Analysis" example, there is a workaround in the class Node which loops over the number of scalars and adds each scalar as a new input. In the class Circuit, the added inputs are then accessed by their "indices" (e.g. 'I_in:0').
In our case, this loop would have to be implemented in a new component whose only job is to loop over the new inputs. This is why we'd like to avoid loops and directly use vector and matrix operations. In terms of the Circuit example, one way to achieve this would be some kind of target indices (see tgt_indices), which are not implemented (yet 😊).
In this case both classes would look like this:
class Node(om.ImplicitComponent):
    """Computes voltage residual across a node based on incoming and outgoing current."""

    def initialize(self):
        self.options.declare('n_in', default=1, types=int, desc='number of connections with + assumed in')
        self.options.declare('n_out', default=1, types=int, desc='number of current connections + assumed out')

    def setup(self):
        self.add_output('V', val=5., units='V')
        self.add_input('I_in', units='A', shape=self.options['n_in'])
        self.add_input('I_out', units='A', shape=self.options['n_out'])

    def apply_nonlinear(self, inputs, outputs, residuals):
        residuals['V'] = 0.
        residuals['V'] += inputs['I_in'].sum()
        residuals['V'] -= inputs['I_out'].sum()


class Circuit(om.Group):

    def setup(self):
        self.add_subsystem('n1', Node(n_in=1, n_out=2), promotes_inputs=[('I_in')])
        self.add_subsystem('n2', Node())  # leaving defaults

        self.add_subsystem('R1', Resistor(R=100.), promotes_inputs=[('V_out', 'Vg')])
        self.add_subsystem('R2', Resistor(R=10000.))
        self.add_subsystem('D1', Diode(), promotes_inputs=[('V_out', 'Vg')])

        self.connect('n1.V', ['R1.V_in', 'R2.V_in'])
        self.connect('R1.I', 'n1.I_out', tgt_indices=[0])
        self.connect('R2.I', 'n1.I_out', tgt_indices=[1])

        self.connect('n2.V', ['R2.V_out', 'D1.V_in'])
        self.connect('R2.I', 'n2.I_in', tgt_indices=[0])
        self.connect('D1.I', 'n2.I_out', tgt_indices=[0])
        ...
So the main aspect is to connect output scalars to entries of an input vector similar to the src_indices option. Is there a way to do this or a reason against this?
Since we plan to use Dymos, we'd like to use this functionality one dimension higher and connect output vectors to rows of input matrices.
You are correct that there is currently no tgt_indices like feature in OpenMDAO. Though it is technically feasible, it does present some API design and internal practical challenges. If you feel strongly about the need/value for this feature, you could consider submitting a POEM describing your proposed API for the dev-team to consider. You have a start on it with your provided example, but you'd need to think through details such as the following:
What happens if a user gives both src_indices and tgt_indices?
What do error messages look like if there are overlapping tgt_indices?
How does the API extend to the promotes function?
In the meantime, you'll either need to use a MuxComp, or write your own version of that component that takes in array inputs and pushes them into the combined matrix. It's slightly inefficient to add a component like this, but in the grand scheme of things it should not be too bad (as long as you take the time to define analytic derivatives for it; it would be expensive to CS/FD this component).
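If you write your own mux-like component, its partials are constant and very sparse. They can be declared once with row/column indices, since OpenMDAO's declare_partials accepts rows, cols and val arguments. The NumPy sketch below assumes a hypothetical layout matching the Dymos use case: m input vectors of length n are stacked as the rows of an (m, n) output matrix. It builds the row/column index arrays for each input and checks them against the assembled Jacobian; it does not use OpenMDAO itself.

```python
import numpy as np

def mux_partials_pattern(m, n):
    """For input j (a length-n vector) stacked into row j of an (m, n) output,
    return (rows, cols) of the nonzeros of d(out.ravel())/d(input_j).
    Every nonzero value is 1."""
    patterns = []
    for j in range(m):
        rows = j * n + np.arange(n)  # flattened positions of output row j
        cols = np.arange(n)          # entry k of input j
        patterns.append((rows, cols))
    return patterns

m, n = 3, 4
rng = np.random.default_rng(1)
inputs = [rng.normal(size=n) for _ in range(m)]
out = np.stack(inputs)               # the mux "compute": shape (m, n)

# Rebuild the flattened output from the sparse unit partials
reconstructed = np.zeros(m * n)
for (rows, cols), vec in zip(mux_partials_pattern(m, n), inputs):
    J = np.zeros((m * n, n))
    J[rows, cols] = 1.0
    reconstructed += J @ vec

print(np.allclose(reconstructed, out.ravel()))  # True
```

In a component, each (rows, cols) pair would typically go into one declare_partials call with val=1.0. Because the values never change, they would not need to be recomputed in compute_partials.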

Creation of a 'partial objective' in OpenMDAO

I am creating a program that optimizes a set of coupled subcomponents to minimize their total mass. Currently each component is a group with a promoted output for its mass. Another group at the top level takes each of these masses as inputs and computes their sum, and the optimizer uses this sum as the objective.
This program is designed to be operated by a user, with the type and number of subcomponents set at runtime. This is a problem for my statically declared mass-summing group, which would need to change its inputs depending on which components are added at runtime.
I was therefore wondering whether there is a way to declare a 'partial objective', where the partial pieces are summed into the final objective processed by the ScipyOptimizeDriver. Each subsystem could simply declare its own 'partial objective', design variables and constraints, so that once the subsystem is added to the model, they would be ready to fit into the larger optimization.
Another way could be some sort of summer behavior in a group where the inputs to be summed were exclusively created via glob pattern. Something along the lines of
self.add_subsystem('sum', Summer(inputs='mass:*'))
Is there any way to achieve either of these types of functionality in OpenMDAO 3.1.1?
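At its core, the glob-based Summer idea filters variable names with a pattern. The sketch below is plain Python, not an OpenMDAO feature: select_inputs and the list of names are hypothetical. In a real model, the names would have to come from model introspection at setup/configure time.

```python
from fnmatch import fnmatchcase

def select_inputs(names, pattern):
    """Return the variable names that match a glob pattern such as 'mass:*'."""
    return [name for name in names if fnmatchcase(name, pattern)]

# Hypothetical promoted output names
available = ['mass:wing', 'mass:fuselage', 'drag', 'mass:engine', 'cost:wing']
selected = select_inputs(available, 'mass:*')
print(selected)  # ['mass:wing', 'mass:fuselage', 'mass:engine']
```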
In OpenMDAO V3.1, there is a configure method that will let you accomplish what you want --- subject to a few caveats. The first caveat is that in V3.1 you can inspect the I/O of components from within a group configure but you can not inspect the I/O of child groups. This is something we are working to remedy, but as of V3.1 this restriction is present.
Nonetheless, here is some code that accomplishes what I think you were seeking. It's not super clean, but it does achieve the kind of reactive setup that you were going for.
import openmdao.api as om


class Summer(om.ExplicitComponent):

    def setup(self):
        # note: will add inputs via the configure method of parent group
        self.add_output('total_mass')
        self.declare_partials('total_mass', wrt='*', val=1)

    def compute(self, inputs, outputs):
        outputs['total_mass'] = 0
        for inp_name in inputs:
            outputs['total_mass'] += inputs[inp_name]


class TotalMass(om.Group):

    def setup(self):
        # Only add the summing comp, others will be added by users
        self.add_subsystem('sum', Summer())

    def configure(self):
        sum_comp = self.sum

        # NOTE: need to access some private attributes of the group here,
        # so this is a little fragile, but works as of OM V3.1
        for subsys in self._subsystems_myproc:
            s_name = subsys.name
            if s_name == 'sum':
                continue
            i_name = f'{s_name}_mass'
            sum_comp.add_input(i_name)
            self.connect(f'{s_name}.mass', f'sum.{i_name}')


if __name__ == "__main__":
    p = om.Problem()

    tm = p.model.add_subsystem('tm', TotalMass())
    tm.add_subsystem('part_1', om.ExecComp('mass=3+x'))
    tm.add_subsystem('part_2', om.ExecComp('mass=5+x'))

    p.setup()
    p.run_model()
    p.model.list_outputs()
We're planning changes that will make more model introspection possible at setup/configure time. Until those changes are implemented, the typical way of achieving this is similar to what you've implemented. Without introspection, you need to give Summer the names of the inputs it should expect (not wildcard-based).
You can give your systems which compute mass some attribute, for instance 'mass_output_name'.
Then, you could iterate through all such systems:
mass_output_systems = [sys_a, sys_b, sys_c]
mass_names = [sys.mass_output_name for sys in mass_output_systems]
And then feed these to your summing subsystem:
self.add_subsystem('sum', Summer(inputs=mass_names))

How to use the BCELoss in PyTorch?

I want to write a simple autoencoder in PyTorch and use BCELoss, however, I get NaN out, since it expects the targets to be between 0 and 1. Could someone post a simple use case of BCELoss?
Update
The BCELoss function used not to be numerically stable. See this issue https://github.com/pytorch/pytorch/issues/751. However, this issue has been resolved with Pull #1792, which added the numerically stable BCEWithLogitsLoss (it takes logits as input), so a numerically stable option is now available!
Old answer
If you build PyTorch from source, you can use the numerically stable function BCEWithLogitsLoss(contributed in https://github.com/pytorch/pytorch/pull/1792), which takes logits as input.
Otherwise, you can use the following function (contributed by yzgao in the above issue):
import torch.nn as nn


class StableBCELoss(nn.modules.Module):
    def __init__(self):
        super(StableBCELoss, self).__init__()

    def forward(self, input, target):
        # BCE computed from logits: max(x, 0) - x * t + log(1 + exp(-|x|))
        neg_abs = - input.abs()
        loss = input.clamp(min=0) - input * target + (1 + neg_abs.exp()).log()
        return loss.mean()
You might want to use a sigmoid layer at the end of the network. In that way the number would represent probabilities. Also make sure that the targets are binary numbers. If you post your complete code we might help more.
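The StableBCELoss formula computes the same quantity as applying a sigmoid followed by binary cross-entropy, without forming log(sigmoid(x)) explicitly. The NumPy sketch below compares the two on moderate logits and shows the naive version blowing up for a large logit. It is not PyTorch code, and the function names are made up. In current PyTorch, nn.BCEWithLogitsLoss is the built-in loss that takes logits directly.

```python
import numpy as np

def bce_naive(logits, targets):
    """Sigmoid followed by binary cross-entropy; breaks down for large |logits|."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p))

def bce_with_logits(logits, targets):
    """Same loss from logits, using the StableBCELoss formula:
    max(x, 0) - x * t + log(1 + exp(-|x|))."""
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

x = np.array([-3.0, 0.0, 2.5])
t = np.array([0.0, 1.0, 1.0])
print(np.allclose(bce_naive(x, t), bce_with_logits(x, t)))  # True

big = np.array([1000.0])
with np.errstate(all='ignore'):
    print(bce_naive(big, np.array([0.0])))   # [inf]
print(bce_with_logits(big, np.array([0.0])))  # [1000.]
```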
