I'm trying to understand the OpenMDAO error messages
RuntimeError: Singular entry found in '' for column associated with state/residual 'x'.
and
RuntimeError: Singular entry found in '' for row associated with state/residual 'y'.
Can someone explain these? For example, when running the code
from openmdao.api import Problem, Group, IndepVarComp, ImplicitComponent, ScipyOptimizeDriver, NewtonSolver, DirectSolver, view_model, view_connections
class Test1Comp(ImplicitComponent):

    def setup(self):
        self.add_input('x', 0.5)
        self.add_input('design_x', 1.0)
        self.add_output('z', val=0.0)
        self.add_output('obj')
        self.declare_partials(of='*', wrt='*', method='fd', form='central', step=1.0e-4)

    def apply_nonlinear(self, inputs, outputs, resids):
        x = inputs['x']
        design_x = inputs['design_x']
        z = outputs['z']
        resids['z'] = x*z + z - 4
        resids['obj'] = (z/5.833333 - design_x)**2

if __name__ == "__main__":
    prob = Problem()
    model = prob.model = Group()

    model.add_subsystem('p1', IndepVarComp('x', 0.5))
    model.add_subsystem('d1', IndepVarComp('design_x', 1.0))
    model.add_subsystem('comp', Test1Comp())
    model.connect('p1.x', 'comp.x')
    model.connect('d1.design_x', 'comp.design_x')

    prob.driver = ScipyOptimizeDriver()
    prob.driver.options["optimizer"] = 'SLSQP'
    model.add_design_var("d1.design_x", lower=0.5, upper=1.5)
    model.add_objective('comp.obj')

    model.nonlinear_solver = NewtonSolver()
    model.nonlinear_solver.options['iprint'] = 2
    model.nonlinear_solver.options['maxiter'] = 20
    model.linear_solver = DirectSolver()

    prob.setup()
    prob.run_model()
    print(prob['comp.z'])
I get the error message:
File "C:\Scripts/mockup_component3.py", line 46, in <module>
prob.run_model()
File "C:\Python_32\lib\site-packages\openmdao\core\problem.py", line 315, in run_model
return self.model.run_solve_nonlinear()
File "C:\Python_32\lib\site-packages\openmdao\core\system.py", line 2960, in run_solve_nonlinear
result = self._solve_nonlinear()
File "C:\Python_32\lib\site-packages\openmdao\core\group.py", line 1420, in _solve_nonlinear
result = self._nonlinear_solver.solve()
File "C:\Python_32\lib\site-packages\openmdao\solvers\solver.py", line 602, in solve
fail, abs_err, rel_err = self._run_iterator()
File "C:\Python_32\lib\site-packages\openmdao\solvers\solver.py", line 349, in _run_iterator
self._iter_execute()
File "C:\Python_32\lib\site-packages\openmdao\solvers\nonlinear\newton.py", line 234, in _iter_execute
system._linearize()
File "C:\Python_32\lib\site-packages\openmdao\core\group.py", line 1562, in _linearize
self._linear_solver._linearize()
File "C:\Python_32\lib\site-packages\openmdao\solvers\linear\direct.py", line 199, in _linearize
raise RuntimeError(format_singluar_error(err, system, mtx))
RuntimeError: Singular entry found in '' for column associated with state/residual 'comp.obj'.
This error I was able to solve by adding - outputs['obj'] to the equation for resids['obj']. But I still have little understanding of what the two error messages mean. What matrix is it that is singular? And what does it mean to have
1) a singular entry for a column?
2) a singular entry for a row?
I realized that the cause of the singular row was that I had not defined the partial derivatives for the component. I fixed this by adding the declare_partials call to the top-level system; the traceback gave me the clue that the matrix was related to linearization.
The singular column seems related to the fact that I had two equations in apply_nonlinear but only one unknown (z). The matrix being factorized is the Jacobian of residuals with respect to states: a zero row means a residual that no state influences, and a zero column means a state that no residual depends on.
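For reference, here is the corrected apply_nonlinear with the fix described above. Subtracting outputs['obj'] makes the 'obj' residual depend on the 'obj' state itself, so its Jacobian column is no longer identically zero:

def apply_nonlinear(self, inputs, outputs, resids):
    x = inputs['x']
    design_x = inputs['design_x']
    z = outputs['z']
    # the 'z' residual depends on the state z, so its column is nonzero
    resids['z'] = x*z + z - 4
    # subtracting outputs['obj'] couples this residual to its own state;
    # at the solution it enforces obj = (z/5.833333 - design_x)**2
    resids['obj'] = (z/5.833333 - design_x)**2 - outputs['obj']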
I am using three-dimensional convolution links (ConvolutionND) in my chain.
The forward computation runs smoothly (I checked the intermediate result shapes to be sure I understood the meaning of the convolution_nd parameters correctly), but during the backward pass a CuDNNError is raised with the message CUDNN_STATUS_NOT_SUPPORTED.
The cover_all parameter of ConvolutionND has its default value of False, so from the docs I don't see what could be the cause of the error.
Here is how I defined one of the convolution layers:
self.conv1 = chainer.links.ConvolutionND(3, 1, 4, (3, 3, 3)).to_gpu(self.GPU_1_ID)
And the call stack is:
File "chainer/function_node.py", line 548, in backward_accumulate
gxs = self.backward(target_input_indexes, grad_outputs)
File "chainer/functions/connection/convolution_nd.py", line 118, in backward
gy, W, stride=self.stride, pad=self.pad, outsize=x_shape)
File "chainer/functions/connection/deconvolution_nd.py", line 310, in deconvolution_nd
y, = func.apply(args)
File chainer/function_node.py", line 258, in apply
outputs = self.forward(in_data)
File "chainer/functions/connection/deconvolution_nd.py", line 128, in forward
return self._forward_cudnn(x, W, b)
File "chainer/functions/connection/deconvolution_nd.py", line 105, in _forward_cudnn
tensor_core=tensor_core)
File "cupy/cudnn.pyx", line 881, in cupy.cudnn.convolution_backward_data
File "cupy/cuda/cudnn.pyx", line 975, in cupy.cuda.cudnn.convolutionBackwardData_v3
File "cupy/cuda/cudnn.pyx", line 461, in cupy.cuda.cudnn.check_status
cupy.cuda.cudnn.CuDNNError: CUDNN_STATUS_NOT_SUPPORTED
So are there special points to take care of when using ConvolutionND?
A failing example is, for instance:
import chainer
from chainer import functions as F
from chainer import links as L
from chainer.backends import cuda

import numpy as np
import cupy as cp

chainer.global_config.cudnn_deterministic = False

NB_MASKS = 60
NB_FCN = 3
NB_CLASS = 17

class MFEChain(chainer.Chain):
    """docstring for Wavelphasenet."""

    def __init__(self,
                 FCN_Dim,
                 gpu_ids=None):
        super(MFEChain, self).__init__()
        self.GPU_0_ID, self.GPU_1_ID = (0, 1) if gpu_ids is None else gpu_ids
        with self.init_scope():
            self.conv1 = chainer.links.ConvolutionND(3, 1, 4, (3, 3, 3)).to_gpu(
                self.GPU_1_ID
            )

    def __call__(self, inputs):
        ### Pad input ###
        processed_sequences = []
        for convolved in inputs:
            ## Transform to sequences
            copy = convolved if self.GPU_0_ID == self.GPU_1_ID else F.copy(convolved, self.GPU_1_ID)
            processed_sequences.append(copy)

        reprocessed_sequences = []
        with cuda.get_device(self.GPU_1_ID):
            for convolved in processed_sequences:
                convolved = F.expand_dims(convolved, 0)
                convolved = F.expand_dims(convolved, 0)
                convolved = self.conv1(convolved)
                reprocessed_sequences.append(convolved)

            states = F.vstack(reprocessed_sequences)

        logits = states
        ret_logits = logits if self.GPU_0_ID == self.GPU_1_ID else F.copy(logits, self.GPU_0_ID)
        return ret_logits

def mfe_test():
    mfe = MFEChain(150)
    inputs = list(
        chainer.Variable(
            cp.random.randn(
                NB_MASKS,
                11,
                in_len,
                dtype=cp.float32
            )
        ) for in_len in [53248]
    )
    val = mfe(inputs)
    grad = cp.ones(val.shape, dtype=cp.float32)
    val.grad = grad
    val.backward()
    for i in inputs:
        print(i.grad)

if __name__ == "__main__":
    mfe_test()
cupy.cuda.cudnn.convolutionBackwardData_v3 is incompatible with some specific parameters, as described in an issue on the official GitHub.
Unfortunately, that issue only dealt with deconvolution_2d.py (not deconvolution_nd.py), so I guess the decision about whether cuDNN is used or not failed in your case.
You can check your parameters by confirming the following (a snippet for the second point follows this list):
1) whether a dilation parameter (!=1) or a group parameter (!=1) is passed to the convolution;
2) the values of chainer.config.cudnn_deterministic, configuration.config.autotune, and configuration.config.use_cudnn_tensor_core.
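A minimal way to print those configuration values (a sketch, assuming a cuDNN-enabled Chainer installation):

import chainer
from chainer import configuration

# the three flags that influence the cuDNN code path chosen for the convolution
print(chainer.config.cudnn_deterministic)
print(configuration.config.autotune)
print(configuration.config.use_cudnn_tensor_core)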
Further support may be obtained by raising an issue on the official GitHub.
The code you showed is quite complicated.
To clarify the problem, the code below would help.
from chainer import Variable, Chain
from chainer import links as L
from chainer import functions as F
import numpy as np
from six import print_

batch_size = 1
in_channel = 1
out_channel = 1

class MyLink(Chain):
    def __init__(self):
        super(MyLink, self).__init__()
        with self.init_scope():
            # initialW shape is (out_channels, in_channels, k, k, k);
            # float32 so the weights match the float32 input below
            self.conv = L.ConvolutionND(3, 1, 1, (3, 3, 3), nobias=True,
                                        initialW=np.ones((out_channel, in_channel, 3, 3, 3),
                                                         dtype=np.float32))

    def __call__(self, x):
        return F.sum(self.conv(x))

if __name__ == "__main__":
    my_link = MyLink()
    my_link.to_gpu(0)
    batch = Variable(np.ones((batch_size, in_channel, 3, 3, 3), dtype=np.float32))
    batch.to_gpu(0)
    loss = my_link(batch)
    loss.backward()
    print_(batch.grad)
When running the following example code:
from openmdao.api import Problem, Group, IndepVarComp, ImplicitComponent, ScipyOptimizeDriver

class Test1Comp(ImplicitComponent):

    def setup(self):
        self.add_input('x', 0.5)
        self.add_input('design_x', 1.0)
        self.add_output('z', val=0.0)
        self.add_output('obj')
        self.declare_partials(of='*', wrt='*', method='fd', form='central', step=1.0e-4)

    def apply_nonlinear(self, inputs, outputs, resids):
        x = inputs['x']
        design_x = inputs['design_x']
        z = outputs['z']
        resids['z'] = x*z + z - 4
        resids['obj'] = (z/5.833333 - design_x)**2

if __name__ == "__main__":
    prob = Problem()
    model = prob.model = Group()

    model.add_subsystem('p1', IndepVarComp('x', 0.5))
    model.add_subsystem('d1', IndepVarComp('design_x', 1.0))
    model.add_subsystem('comp', Test1Comp())
    model.connect('p1.x', 'comp.x')
    model.connect('d1.design_x', 'comp.design_x')

    prob.driver = ScipyOptimizeDriver()
    prob.driver.options["optimizer"] = 'SLSQP'
    model.add_design_var("d1.design_x", lower=0.5, upper=1.5)
    model.add_objective('comp.obj')

    prob.setup()
    prob.run_model()
    print(prob['comp.z'])
I get:
Traceback (most recent call last):
File "C:/Users/jonat/Desktop/mockup_component3.py", line 40, in <module>
prob.setup()
File "C:\Python\openmdao\core\problem.py", line 409, in setup
model._setup(comm, 'full', mode)
File "C:\Python\openmdao\core\system.py", line 710, in _setup
self._setup_relevance(mode, self._relevant)
File "C:\Python\openmdao\core\system.py", line 1067, in _setup_relevance
self._relevant = relevant = self._init_relevance(mode)
File "C:\Python\openmdao\core\group.py", line 693, in _init_relevance
return get_relevant_vars(self._conn_global_abs_in2out, desvars, responses, mode)
File "C:\Python\openmdao\core\group.py", line 1823, in get_relevant_vars
if 'type_' in nodes[node]:
TypeError: 'instancemethod' object has no attribute '__getitem__'
Can someone explain why? I've successfully run a similar component, but without optimization, so I suspect the error comes from the optimization constructs. For example, do I have to define the objective in an ExplicitComponent?
I get a more descriptive message when I run it:
KeyError: 'Variable name "comp.y" not found.'
This just means that component "comp" doesn't have a variable named "y" (or "z").
The issue seems to have been caused by an incorrect installation of OpenMDAO. I had previously tried to install it by downloading a zip file containing OpenMDAO. I instead installed it using pip, and the error disappeared.
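For reference, the pip route is the standard single command:
pip install openmdao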
I'm trying to create a basic binary classifier in PyTorch that classifies whether my player plays on the right or the left side in the game Pong. The input is a 1x42x42 image and the label is my player's side (right = 1 or left = 2). The code:
import torch
import torch.nn as nn
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

net = Net(42 * 42, 100, 2)

# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer_net = torch.optim.Adam(net.parameters(), 0.001)

net.train()
while True:
    # helpers from the game environment (defined elsewhere)
    state = get_game_img()
    state = torch.from_numpy(state)

    # right = 1, left = 2
    current_side = get_player_side()
    target = torch.LongTensor(current_side)

    x = Variable(state.view(-1, 42 * 42))
    y = Variable(target)

    optimizer_net.zero_grad()
    y_pred = net(x)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer_net.step()
The error I get:
File "train.py", line 109, in train
loss = criterion(y_pred, y)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/modules/loss.py", line 321, in forward
self.weight, self.size_average)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 533, in cross_entropy
return nll_loss(log_softmax(input), target, weight, size_average)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 501, in nll_loss
return f(input, target)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
output, *self.additional_args)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THNN/generic/ClassNLLCriterion.c:57
For most deep learning libraries, the target (or label) should start from 0.
That means your targets should be in the range [0, n) for n classes.
It looks like PyTorch expects to get zero-based labels (0/1 in your case), and you are probably feeding it one-based labels (1/2).
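A minimal sketch of that fix against the question's loop (current_side comes from the asker's get_player_side helper and is 1 for right, 2 for left):

# shift one-based labels (1/2) to zero-based (0/1); the list wrapper makes
# LongTensor hold the value instead of allocating a tensor of that length
current_side = get_player_side()
target = torch.LongTensor([current_side - 1])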
I had the same error in my program, and I just realized that the problem was the number of output nodes in my neural network.
In my program, the number of output nodes of my model was not equal to the number of labels in the dataset:
the number of outputs was 1 and the number of target labels was 10. When I changed the number of outputs to 10, the error went away.
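In terms of the model in the question above, that means the num_classes argument (which sizes the final nn.Linear layer) must equal the number of distinct labels; for a hypothetical 10-class dataset that would be:

net = Net(42 * 42, 100, 10)  # final layer outputs one score per class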
I'm trying to run a DoE in parallel on a distributed code, which doesn't seem to work. Below is a simplified example that raises the same error as for the real code.
import numpy as np

from openmdao.api import IndepVarComp, Group, Problem, Component
from openmdao.core.mpi_wrap import MPI
from openmdao.drivers.latinhypercube_driver import LatinHypercubeDriver

if MPI:
    from openmdao.core.petsc_impl import PetscImpl as impl
    rank = MPI.COMM_WORLD.rank
else:
    from openmdao.api import BasicImpl as impl
    rank = 0

class DistribCompSimple(Component):
    """Uses 2 procs but takes full input vars"""

    def __init__(self, arr_size=2):
        super(DistribCompSimple, self).__init__()
        self._arr_size = arr_size
        self.add_param('invar', 0.)
        self.add_output('outvec', np.ones(arr_size, float))

    def solve_nonlinear(self, params, unknowns, resids):
        if rank == 0:
            unknowns['outvec'] = params['invar'] * np.ones(self._arr_size) * 0.25
        elif rank == 1:
            unknowns['outvec'] = params['invar'] * np.ones(self._arr_size) * 0.5

        print 'hello from rank', rank, unknowns['outvec']

    def get_req_procs(self):
        return (2, 2)

if __name__ == '__main__':
    N_PROCS = 4

    prob = Problem(impl=impl)
    root = prob.root = Group()

    root.add('p1', IndepVarComp('invar', 0.), promotes=['*'])
    root.add('comp', DistribCompSimple(2), promotes=['*'])

    prob.driver = LatinHypercubeDriver(4, num_par_doe=N_PROCS/2)
    prob.driver.add_desvar('invar', lower=-5.0, upper=5.0)
    prob.driver.add_objective('outvec')

    prob.setup(check=False)
    prob.run()
I run this with
mpirun -np 4 python lhc_driver.py
and get this error:
Traceback (most recent call last):
File "lhc_driver.py", line 60, in <module>
prob.run()
File "/Users/frza/git/OpenMDAO/openmdao/core/problem.py", line 1064, in run
self.driver.run(self)
File "/Users/frza/git/OpenMDAO/openmdao/drivers/predeterminedruns_driver.py", line 157, in run
self._run_par_doe(problem.root)
File "/Users/frza/git/OpenMDAO/openmdao/drivers/predeterminedruns_driver.py", line 221, in _run_par_doe
for case in self._get_case_w_nones(self._distrib_build_runlist()):
File "/Users/frza/git/OpenMDAO/openmdao/drivers/predeterminedruns_driver.py", line 283, in _get_case_w_nones
case = next(it)
File "/Users/frza/git/OpenMDAO/openmdao/drivers/latinhypercube_driver.py", line 119, in _distrib_build_runlist
run_list = comm.scatter(job_list, root=0)
File "MPI/Comm.pyx", line 1286, in mpi4py.MPI.Comm.scatter (src/mpi4py.MPI.c:109079)
File "MPI/msgpickle.pxi", line 707, in mpi4py.MPI.PyMPI_scatter (src/mpi4py.MPI.c:48114)
File "MPI/msgpickle.pxi", line 161, in mpi4py.MPI.Pickle.dumpv (src/mpi4py.MPI.c:41605)
ValueError: expecting 4 items, got 2
I don't see a test for this use case in the latest master, so does that mean you don't yet support it or is it a bug?
Thanks for submitting a simple test case for this. I added the parallel DOE stuff fairly recently and forgot to test it with distributed components. I'll add a story to our bug tracker for this and hopefully get it fixed soon.
In relation to my previous question, Scaled paraboloid and derivatives checking, I see that you fixed the issue related to running the problem once. I wanted to try it, but I still have a problem with the derivatives check and finite differences, shown in the following code:
""" Unconstrained optimization of the scaled paraboloid component."""
from __future__ import print_function
import sys
import numpy as np
from openmdao.api import IndepVarComp, Component, Problem, Group, ScipyOptimizer
class Paraboloid(Component):
def __init__(self):
super(Paraboloid, self).__init__()
self.add_param('X', val=np.array([0.0, 0.0]))
self.add_output('f_xy', val=0.0)
def solve_nonlinear(self, params, unknowns, resids):
X = params['X']
x = X[0]
y = X[1]
unknowns['f_xy'] = (1000.*x-3.)**2 + (1000.*x)*(0.01*y) + (0.01*y+4.)**2 - 3.
def linearize(self, params, unknowns, resids):
""" Jacobian for our paraboloid."""
X = params['X']
J = {}
x = X[0]
y = X[1]
J['f_xy', 'X'] = np.array([[ 2000000.0*x - 6000.0 + 10.0*y,
0.0002*y + 0.08 + 10.0*x]])
return J
if __name__ == "__main__":
top = Problem()
root = top.root = Group()
#root.fd_options['force_fd'] = True # Error if uncommented
root.add('p1', IndepVarComp('X', np.array([3.0, -4.0])))
root.add('p', Paraboloid())
root.connect('p1.X', 'p.X')
top.driver = ScipyOptimizer()
top.driver.options['optimizer'] = 'SLSQP'
top.driver.add_desvar('p1.X',
lower=np.array([-1000.0, -1000.0]),
upper=np.array([1000.0, 1000.0]),
scaler=np.array([1000., 0.001]))
top.driver.add_objective('p.f_xy')
top.setup()
top.check_partial_derivatives()
top.run()
top.check_partial_derivatives()
print('\n')
print('Minimum of %f found at (%s)' % (top['p.f_xy'], top['p.X']))
The first check works fine, but the second check_partial_derivatives gives weird results for FD:
[...]
Partial Derivatives Check
----------------
Component: 'p'
----------------
p: 'f_xy' wrt 'X'
Forward Magnitude : 1.771706e-04
Reverse Magnitude : 1.771706e-04
Fd Magnitude : 9.998228e-01
Absolute Error (Jfor - Jfd) : 1.000000e+00
Absolute Error (Jrev - Jfd) : 1.000000e+00
Absolute Error (Jfor - Jrev): 0.000000e+00
Relative Error (Jfor - Jfd) : 1.000177e+00
Relative Error (Jrev - Jfd) : 1.000177e+00
Relative Error (Jfor - Jrev): 0.000000e+00
Raw Forward Derivative (Jfor)
[[ -1.77170624e-04 -8.89040341e-10]]
Raw Reverse Derivative (Jrev)
[[ -1.77170624e-04 -8.89040341e-10]]
Raw FD Derivative (Jfd)
[[ 0.99982282 0. ]]
Minimum of -27.333333 found at ([ 6.66666658e-03 -7.33333333e+02])
And (maybe not related), when I try to set root.fd_options['force_fd'] = True (just to see), I get an error during the first check:
Partial Derivatives Check
----------------
Component: 'p'
----------------
Traceback (most recent call last):
File "C:\Program Files (x86)\Wing IDE 101 5.0\src\debug\tserver\_sandbox.py", line 59, in
File "d:\rlafage\OpenMDAO\OpenMDAO\openmdao\core\problem.py", line 1827, in check_partial_derivatives
u_size = np.size(dunknowns[u_name])
File "d:\rlafage\OpenMDAO\OpenMDAO\openmdao\core\vec_wrapper.py", line 398, in __getitem__
return self._dat[name].get()
File "d:\rlafage\OpenMDAO\OpenMDAO\openmdao\core\vec_wrapper.py", line 223, in _get_scalar
return self.val[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
I work with OpenMDAO HEAD (d1e12d4).
This is just a step-size problem for that finite difference. The second FD occurs at a different point (the optimum), and the function must be more sensitive at that point.
I tried with central difference
top.root.p.fd_options['form'] = 'central'
and got much better results:
----------------
Component: 'p'
----------------
p: 'f_xy' wrt 'X'
Forward Magnitude : 1.771706e-04
Reverse Magnitude : 1.771706e-04
Fd Magnitude : 1.771738e-04
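If central differencing alone is not enough, a relative step, which scales the FD step with the magnitude of the variable, may also help; this is a sketch assuming the OpenMDAO 1.x fd_options keys:

top.root.p.fd_options['step_type'] = 'relative'  # step scales with variable size
top.root.p.fd_options['step_size'] = 1e-6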
The exception when you set 'force_fd' is a real bug related to the scaler on the desvar being an array. Thanks for the report on that; we'll get a story up to fix it.