Torch7 ClassNLLCriterion()

I've been trying for a whole day to get my code to work, but it fails even though the inputs and outputs are consistent.
Someone mentioned somewhere that ClassNLLCriterion does not accept values less than or equal to zero.
How am I supposed to go about training this network?
Here is part of my code. I suppose it fails in backward() here, as the model's output may contain negative values.
However, when I switch to the MSECriterion, the code works just fine.
ninputs = 22; noutputs = 3
hidden = 22
model = nn.Sequential()
model:add(nn.Linear(ninputs, hidden)) -- hidden layer
model:add(nn.Tanh())
model:add(nn.Linear(hidden, noutputs)) -- output layer
model:add(nn.LogSoftMax())
----------------------------------------------------------------------
-- 3. Define a loss function, to be minimized.
-- In this example, we minimize the negative log-likelihood (NLL) between
-- the predictions of our model and the groundtruth available
-- in the dataset.
-- Torch provides many common criterions to train neural networks.
criterion = nn.ClassNLLCriterion()
----------------------------------------------------------------------
-- 4. Train the model
i=1
mean = {}
std = {}
-- To minimize the loss defined above, using the linear model defined
-- in 'model', we follow a stochastic gradient descent procedure (SGD).
-- SGD is a good optimization algorithm when the amount of training data
-- is large, and estimating the gradient of the loss function over the
-- entire training set is too costly.
-- Given an arbitrarily complex model, we can retrieve its trainable
-- parameters, and the gradients of our loss function wrt these
-- parameters by doing so:
x, dl_dx = model:getParameters()
-- In the following code, we define a closure, feval, which computes
-- the value of the loss function at a given point x, and the gradient of
-- that function with respect to x. x is the vector of trainable weights,
-- which, in this example, are all the weights and biases of the two
-- linear layers of our model.
feval = function(x_new)
-- set x to x_new, if different
-- (in this simple example, x_new will typically always point to x,
-- so the copy is really useless)
if x ~= x_new then
x:copy(x_new)
end
-- select a new training sample
_nidx_ = (_nidx_ or 0) + 1
if _nidx_ > (#csv_tensor)[1] then _nidx_ = 1 end
local sample = csv_tensor[_nidx_]
local target = sample[{ {23,25} }]
local inputs = sample[{ {1,22} }] -- slicing of arrays.
-- reset gradients (gradients are always accumulated, to accommodate
-- batch methods)
dl_dx:zero()
-- evaluate the loss function and its derivative wrt x, for that sample
local loss_x = criterion:forward(model:forward(inputs), target)
model:backward(inputs, criterion:backward(model.output, target))
-- return loss(x) and dloss/dx
return loss_x, dl_dx
end
The error received is:
/home/stormy/torch/install/bin/luajit: /home/stormy/torch/install/share/lua/5.1/nn/THNN.lua:110: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /home/stormy/torch/extra/nn/lib/THNN/generic/ClassNLLCriterion.c:45
stack traceback:
[C]: in function 'v'
/home/stormy/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
...rmy/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:43: in function 'forward'
nn.lua:178: in function 'opfunc'
/home/stormy/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
nn.lua:222: in main chunk
[C]: in function 'dofile'
...ormy/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50

The error message results from passing in targets that are out of bounds.
For example:
m = nn.ClassNLLCriterion()
nClasses = 3
nBatch = 10
net_output = torch.randn(nBatch, nClasses)
targets = torch.Tensor(10):random(1,3) -- targets are between 1 and 3
m:forward(net_output, targets)
m:backward(net_output, targets)
Now see a bad example (the one you are suffering from):
targets[5] = 13 -- an out-of-bounds class label
targets[4] = 0 -- an out-of-bounds class label
-- these lines below will error
m:forward(net_output, targets)
m:backward(net_output, targets)
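In your code, the likely culprit is target = sample[{ {23,25} }]: that is a 3-element slice, but ClassNLLCriterion expects a single integer class index in the range 1..nClasses (or a 1D tensor of such indices for batched input). Assuming those three columns are a one-hot encoding of the class, a minimal sketch of a fix inside feval would be:
-- instead of using the raw 3-element slice as the target:
local target_vec = sample[{ {23,25} }]   -- assumed to be a one-hot encoding
local _, idx = torch.max(target_vec, 1)  -- position of the maximum entry (the 1)
local target = idx[1]                    -- a plain class index in {1, 2, 3}
criterion:forward(model:forward(inputs), target) then works just like the good example above.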

Related

compute_totals returns the wrong total derivatives when approx_totals is not used

I noticed that prob.compute_totals() returns a wrong answer when prob.model.approx_totals() has not been called first. Whether the partial derivatives are defined manually or computed by finite differences doesn't change anything: the answer remains wrong without the prior call to prob.model.approx_totals(). Also, the call to compute_totals is actually faster when approx_totals is called first, compared to when it's not. This seems counter-intuitive with manually defined partials, since approx_totals is supposed to add an unnecessary finite-difference computation.
Here is a MWE with the Sellar example taken from the OpenMDAO documentation. I also noticed the same behaviour in OpenAeroStruct, even though the differences are smaller than in this example.
import openmdao.api as om
from openmdao.test_suite.components.sellar_feature import SellarMDA
prob = om.Problem()
prob.model = SellarMDA()
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
prob.driver.options['tol'] = 1e-8
prob.model.add_design_var('x', lower=0, upper=10)
prob.model.add_design_var('z', lower=0, upper=10)
prob.model.add_objective('obj')
prob.model.add_constraint('con1', upper=0)
prob.model.add_constraint('con2', upper=0)
prob.setup()
prob.set_solver_print(level=0)
prob.model.approx_totals() # Commenting this line gives the wrong result
prob.run_driver()
totals = prob.compute_totals(of=['obj'],wrt=['x','z'])
print("""
Obj = {}
x = {}
z = {}
y1 = {}
y2 = {}
Totals = {}""".format(prob['obj'][0],prob['x'][0],prob['z'][0],prob['y1'][0],prob['y2'][0],totals))
The good result, with approx_totals:
Optimization terminated successfully. (Exit mode 0)
Current function value: 3.183393951729169
Iterations: 6
Function evaluations: 6
Gradient evaluations: 6
Optimization Complete
-----------------------------------
Obj = 3.183393951729169
x = 0.0
z = 1.977638883487764
y1 = 3.1600000000897133
y2 = 3.755277766976125
Totals = OrderedDict([(('obj', 'x'), array([[0.94051147]])), (('obj', 'z'), array([[3.50849282, 1.72901602]]))])
The wrong result, without approx_totals:
Optimization terminated successfully. (Exit mode 0)
Current function value: 3.1833939532752136
Iterations: 11
Function evaluations: 12
Gradient evaluations: 11
Optimization Complete
-----------------------------------
Obj = 3.1833939532752136
x = 4.401421628747386e-15
z = 1.9776388839289216
y1 = 3.1600000016563765
y2 = 3.755277767857951
Totals = OrderedDict([(('obj', 'x'), array([[0.99341446]])), (('obj', 'z'), array([[3.90585351, 1.97002055]]))])
In this example, the problem is that you have a cycle in SellarMDA, but the model does not contain a linear solver that can compute the total derivatives across the cycle. One way you can check on this is to run "openmdao check myfilename.py" at the command-line. I ran it on your model, and got the following warnings:
INFO: checking comp_has_no_outputs
INFO: checking dup_inputs
INFO: checking missing_recorders
WARNING: The Problem has no recorder of any kind attached
INFO: checking out_of_order
INFO: checking solvers
WARNING: Group 'cycle' contains cycles [['d1', 'd2']], but does not have an iterative linear solver.
INFO: checking system
There are a couple of remedies for this. You could manually add a different linear solver, such as DirectSolver or PETScKrylov, to the "cycle" group (as sketched below). You could also import SellarMDALinearSolver instead of SellarMDA. SellarMDALinearSolver uses a Newton solver for converging the cycle and a DirectSolver for computing the derivatives. SellarMDA uses NonlinearBlockGS to converge the cycle, but unfortunately does not contain an appropriate linear solver to compute the derivatives. These components are used in a variety of testing roles, but in retrospect we should probably add a LinearBlockGS to SellarMDA in the future so that total derivatives can be computed without modification. For now though, you'll have to use SellarMDALinearSolver or add the solver yourself.
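For reference, a minimal sketch of the first remedy (assuming attribute access to the group named 'cycle', as it is called inside SellarMDA; the solver can be assigned after setup):
import openmdao.api as om
from openmdao.test_suite.components.sellar_feature import SellarMDA

prob = om.Problem()
prob.model = SellarMDA()
prob.setup()
# Give the group that contains the d1 <-> d2 cycle a linear solver
# capable of solving total derivatives across the cycle.
prob.model.cycle.linear_solver = om.DirectSolver()
prob.run_model()
totals = prob.compute_totals(of=['obj'], wrt=['x', 'z'])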
BTW, I suspect the optimization was slower because the derivatives were so bad. It took twice as many iterations, though it still somehow managed to get pretty close to the answer.
You mentioned similar symptoms in your OpenAeroStruct model. I would suspect that either 1) a subcomponent has an error in its analytical derivatives, or 2) the linear solvers are not set up correctly (maybe you have a cycle somewhere without a good linear solver in that group or its parent group). I think Problem.check_partials and Problem.check_totals will give you more insight into where the problem could be. There is more info on these here.
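A sketch of that diagnosis (run on a converged model):
# Compare the analytic total derivatives against finite differences.
prob.check_totals(of=['obj'], wrt=['x', 'z'], compact_print=True)
# check_partials runs the same comparison component by component.
prob.check_partials(compact_print=True)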

How to weight observations in mxnet?

I am new to neural networks and the mxnet package in R. I want to do a logistic regression on my predictors, since my observations are probabilities varying between 0 and 1. I'd like to weight my observations by a vector obsWeights I have, but I'm not sure where to implement the weights. There seems to be a weight= option in mx.symbol.FullyConnected, but if I try weight=obsWeights I get the following error message:
Error in mx.varg.symbol.FullyConnected(list(...)) :
Cannot find argument 'weight', Possible Arguments:
----------------
num_hidden : int, required
Number of hidden nodes of the output.
no_bias : boolean, optional, default=False
Whether to disable bias parameter.
How should I proceed to weight my observations? Here is my code at the moment.
# Prepare data
train.mm = model.matrix(obs ~ . , data = train_data)
train_label = train_data$obs
# Normalize
train.mm = apply(train.mm, 2, function(x) (x-min(x))/(max(x)-min(x)))
# Create MXDataIter compatible iterator
batch_size = 128
train.iter = mx.io.arrayiter(data=t(train.mm), label=train_label,
batch.size=batch_size, shuffle=T)
# Symbolic model definition
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data=data, num.hidden=128, name='fc1')
act1 = mx.symbol.Activation(data=fc1, act.type='relu', name='act1')
final = mx.symbol.FullyConnected(data=act1, num.hidden=1, name='final')
logistic = mx.symbol.LogisticRegressionOutput(data=final, name='logistic')
# Run model
mxnet_train = mx.model.FeedForward.create(
symbol = logistic,
X = train.iter,
initializer = mx.init.Xavier(rnd_type = 'gaussian', factor_type = 'avg', magnitude = 2),
num.round = 25)
Assigning the fully connected weight argument is not what you want to do at any rate. That weight refers to the parameters of the layer itself, i.e., the matrix the inputs are multiplied by to produce the output values. These are the parameter values you're trying to learn.
If you want to make some samples matter more than others, you'll need to adjust the loss function. For example, multiply the usual loss function by your weights so that samples with small weights contribute less to the overall average loss.
I do not believe the standard Mxnet loss functions have a spot for assigning weights (that is, LogisticRegressionOutput won't cover this). However, you can make your own cost function that does. This would involve passing your final layer through a sigmoid activation function first, to generate the usual logistic regression output value. Then pass that into the loss function you define. You could do squared error, but for logistic regression you'll probably want to use the cross-entropy function:
-(l * log(y) + (1 - l) * log(1 - y)),
where l is the label and y is the predicted value.
Ideally, you'd write a symbol with an efficient definition of the gradient (Mxnet has a cross-entropy function, but it's for softmax input, not a binary output; you could translate your output to two outputs with softmax as an alternative, but that seems less easy to work with in this case), but the easiest path would be to let Mxnet do its autodiff on it. Then you multiply that cross-entropy loss by the weights.
I haven't tested this code, but you'd ultimately have something like this (this is what you'd do in python, should be similar in R):
label = mx.sym.Variable('label')
out = mx.sym.Activation(data=final, act_type='sigmoid')
# negative log-likelihood of a Bernoulli label, i.e. binary cross entropy
ce = -(label * mx.sym.log(out) + (1 - label) * mx.sym.log(1 - out))
weights = mx.sym.Variable('weights')
loss = mx.sym.MakeLoss(weights * ce, normalization='batch')
Then you want to input your weight vector into the weights Variable along with your normal input data and labels.
As an added tip, an mxnet network with a custom loss via MakeLoss outputs the loss, not the prediction. You'll probably want both in practice, in which case it's useful to group the loss with a gradient-blocked version of the prediction so that you can get both. You'd do that like this:
pred_loss = mx.sym.Group([mx.sym.BlockGrad(out), loss])
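One way to feed the weights in (again in Python; the array names X, w, and y here are hypothetical stand-ins for your data, weight, and label arrays) is to declare weights as an extra data input when building the iterator and module:
# 'weights' rides along as a second data input; 'label' stays a label.
train_iter = mx.io.NDArrayIter(data={'data': X, 'weights': w},
                               label={'label': y},
                               batch_size=128)
mod = mx.mod.Module(symbol=pred_loss,
                    data_names=['data', 'weights'],
                    label_names=['label'])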

Large Difference in Output: nnGraph based LSTM vs Sequencer LSTM (Torch)

I have implemented a sequence labeler in Torch using rnn from Element Research, and also using the nnGraph-based LSTM code from the Oxford ML Group. The training of the nnGraph-based LSTM is done similarly to the example given by the Oxford ML Group.
I have kept the hyperparameters the same for both modules. When I train both modules on the same dataset, I get good results (around 75 F-measure) with rnn from Element Research, while the nnGraph-based LSTM does far worse (around 5 F-measure).
For simplification, I do backpropagation through time over the entire sequence in both models. For the nnGraph-based LSTM, I clone the network up to the maximum length of the sequence.
Here is the snippet for training using rnn package:
------------------ forward pass -------------------
local embeddings = {} -- input embeddings
local loss = 0
if inputSource:size(1) ~= target:size(1) then
print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
os.exit()
end
-- Send the input sequence through a Lookup Table to obtain its embeddings
for t=1,inputSource:size(1) do
if options.useGPU then
embeddings[t] = embed:forward(inputSource[t])[1]:cuda()
else
embeddings[t] = embed:forward(inputSource[t])[1]
end
end
-- Send the embedding sequence through the labeler to produce a table of NER tags
local predictions = sequenceLabeler:forward(embeddings)
loss = loss + criterion:forward(predictions, target)
local gradOutputs = criterion:backward(predictions, target)
sequenceLabeler:backward(embeddings, gradOutputs)
loss = loss / inputSource:size(1)
and the snippet for training using nnGraph based LSTM is
local embeddings = {} -- input embeddings
local loss = 0
if inputSource:size(1) ~= target:size(1) then
print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
os.exit()
end
-- Send the input sequence through a Lookup Table to obtain its embeddings
for t=1,inputSource:size(1) do
embeddings[t] = embed:forward(inputSource[t])[1]
end
local lstm_c = {[0]=initstate_c} -- internal cell states of LSTM
local lstm_h = {[0]=initstate_h} -- output values of LSTM
local predictions = {} -- softmax outputs
-- For every input word pass through LSTM module and softmax module
for t = 1, inputSource:size(1) do
lstm_c[t], lstm_h[t] = unpack(clones.memory[t]:forward({embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()}))
predictions[t] = clones.softmax[t]:forward(lstm_h[t])
loss = loss + clones.criterion[t]:forward(predictions[t]:float(), target[t])
end
local dlstm_c = {}
local dlstm_h = {}
-- Gradients from higher layers are zero
dlstm_c[inputSource:size(1)]=dfinalstate_c:cuda() -- zero tensors
dlstm_h[inputSource:size(1)]=dfinalstate_h:cuda() -- zero tensors
local dTempSummary = {} -- gradient to be sent to the lookup table (the lookup table itself isn't modified)
for t = inputSource:size(1),1,-1 do
local doutput_t = clones.criterion[t]:backward(predictions[t]:float(), target[t]):clone()
-- Gradient from the output layer. If the token is the last in the sequence, there's no additional gradient coming down;
-- otherwise we must also add the gradient flowing back from the following timestep
if t == inputSource:size(1) then
dlstm_h[t] = clones.softmax[t]:backward(lstm_h[t], doutput_t):clone()
else
dlstm_h[t]:add(clones.softmax[t]:backward(lstm_h[t], doutput_t))
end
-- backprop through LSTM timestep
dTempSummary[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.memory[t]:backward(
{embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()},
{dlstm_c[t]:cuda(), dlstm_h[t]:cuda()}
))
end
loss = loss / inputSource:size(1)
I have shared the complete code snippet here: Complete Code Snippet for both modules
I know that I'm missing something in my nnGraph-based LSTM implementation, but I am unable to figure out the error. Can someone please help me find where I am wrong?

Julia parallel for loop with two reductions

I would like to perform two reductions in a parallel for loop in Julia. I am trying to compute the error in a random forest inside the parallel for loop as each tree is built. Any ideas?
Current:
forest = @parallel (vcat) for i in 1:ntrees
inds = rand(1:Nlabels, Nsamples)
build_tree(labels[inds], features[inds,:], nsubfeatures)
end
What I want, intuitively is to do an addition inside this for loop as well to get the out of bag error. This is how I would wish for it to work:
forest, ooberror = @parallel (vcat, +) for i in 1:ntrees
inds = rand(1:Nlabels, Nsamples)
tree = build_tree(labels[inds], features[inds,:], nsubfeatures)
error = geterror(inds, features, tree)
(tree, error)
end
Using a type might be best in terms of simplicity and clarity, e.g.
type Forest
trees :: Vector
error
end
join(a::Forest, b::Forest) = Forest(vcat(a.trees,b.trees), a.error+b.error)
#...
forest = @parallel (join) for i in 1:ntrees
inds = rand(1:Nlabels, Nsamples)
tree = build_tree(labels[inds], features[inds,:], nsubfeatures)
error = geterror(inds, features, tree)
Forest([tree], error) # wrap the single tree in a Vector to match the trees field
end
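An alternative, if you'd rather not define a type, is to reduce over tuples with a small helper (a sketch in the same pre-1.0 Julia style as the question, assuming build_tree and geterror as in your snippet):
# Reduce (trees, error) pairs: concatenate the trees, sum the errors.
reduce_pair(a, b) = (vcat(a[1], b[1]), a[2] + b[2])
forest, ooberror = @parallel (reduce_pair) for i in 1:ntrees
inds = rand(1:Nlabels, Nsamples)
tree = build_tree(labels[inds], features[inds,:], nsubfeatures)
([tree], geterror(inds, features, tree))
end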

What are x1_step1_xoffset, x1_step1_gain and x1_step1_ymin in a neural network generated by genFunction in Matlab?

I'm working with Matlab's Neural Network toolbox and I have generated a neural network function with genFunction.
I would like to know what the mapminmax_apply function does, what these variables are used for, and what they mean in the neural network:
% Input 1
x1_step1_xoffset = [0.151979470539401;-89.4008362047824;0.387909026651698;0.201508462422352];
x1_step1_gain = [2.67439342164766;0.0112020512930696;3.56055585104964;4.09080417195814];
x1_step1_ymin = -1;
Here is the mapminmax_apply function:
% Map Minimum and Maximum Input Processing Function
function y = mapminmax_apply(x,settings_gain,settings_xoffset,settings_ymin)
y = bsxfun(@minus,x,settings_xoffset);
y = bsxfun(@times,y,settings_gain);
y = bsxfun(@plus,y,settings_ymin);
end
And here is the call to the function with the above variables:
% Input 1
Xp1 = mapminmax_apply(X{1,ts},x1_step1_gain,x1_step1_xoffset,x1_step1_ymin);
I think:
the mapminmax function can also return the settings it uses (amongst others, offset, gain, and ymin). For some reason, in the code spat out by the NN function, these settings are given at the beginning of the file, under Input 1, in the form of x1_step1_xoffset, etc.
mapminmax('apply',X,PS) will apply the settings in PS to the mapminmax algorithm.
So I think the code generated here has more steps than you necessarily need. You could get rid of the Input 1 steps and just use a simple xp1 = mapminmax(x1') instead of the mapminmax_apply.
Cheers
The Matlab NN toolbox automatically normalizes the features of the dataset.
The functions mapminmax_apply and mapminmax_reverse are related to normalizing the features.
The function mapminmax_apply normalizes the input range to [-1, 1].
Since the output also comes out as a normalized vector/value (between -1 and 1), it needs to be reverse-normalized using the function mapminmax_reverse.
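Concretely, mapminmax_apply computes y = (x - xoffset) .* gain + ymin for each feature, where gain is chosen so that each feature's training range lands exactly on [-1, 1]. The reverse mapping is just the algebraic inverse; a sketch in the same style as the generated code:
% Map Minimum and Maximum Output Reverse-Processing Function (sketch)
function x = mapminmax_reverse(y,settings_gain,settings_xoffset,settings_ymin)
x = bsxfun(@minus,y,settings_ymin);
x = bsxfun(@rdivide,x,settings_gain);
x = bsxfun(@plus,x,settings_xoffset);
end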
Cheers
