Large Difference in Output: nnGraph based LSTM vs Sequencer LSTM (Torch) - torch

I have implemented a sequence labeler in Torch twice: once using the rnn package from Element Research and once using the nnGraph based LSTM code from the Oxford ML Group. The nnGraph based LSTM is trained in the same way as in the Oxford ML Group code.
I have kept the hyperparameters the same for both modules. When I train both modules on the same dataset, the rnn version from Element Research scores well (around 75 F-measure) while the nnGraph based LSTM scores very poorly (around 5 F-measure).
For simplicity I do Backpropagation Through Time over the entire sequence in both models. For the nnGraph based LSTM I clone it for the maximum length of the sequence.
Here is the snippet for training using the rnn package:
------------------ forward pass -------------------
local embeddings = {} -- input embeddings
local loss = 0
if inputSource:size(1) ~= target:size(1) then
    print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
    os.exit()
end
-- Send the input sequence through a LookupTable to obtain its embeddings
for t = 1, inputSource:size(1) do
    if options.useGPU then
        embeddings[t] = embed:forward(inputSource[t])[1]:cuda()
    else
        embeddings[t] = embed:forward(inputSource[t])[1]
    end
end
-- Send the embedding sequence through the labeler to produce a table of NER tags
local predictions = sequenceLabeler:forward(embeddings)
loss = loss + criterion:forward(predictions, target)
local gradOutputs = criterion:backward(predictions, target)
sequenceLabeler:backward(embeddings, gradOutputs)
loss = loss / inputSource:size(1)
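For reference, the sequenceLabeler forwarded above is typically built by wrapping the recurrence in nn.Sequencer from the rnn package. The model definition is not part of the snippets here, so the module choices and sizes below (FastLSTM, embeddingSize, hiddenSize, numTags) are placeholders, only a minimal sketch:
-- Minimal sketch of a Sequencer-based labeler (placeholder sizes, not the actual model)
require 'rnn'
local sequenceLabeler = nn.Sequential()
sequenceLabeler:add(nn.Sequencer(nn.FastLSTM(embeddingSize, hiddenSize)))
sequenceLabeler:add(nn.Sequencer(nn.Linear(hiddenSize, numTags)))
sequenceLabeler:add(nn.Sequencer(nn.LogSoftMax()))
-- criterion applied over the table of per-timestep predictions
local criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())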
Here is the snippet for training using the nnGraph based LSTM:
local embeddings = {} -- input embeddings
local loss = 0
if inputSource:size(1) ~= target:size(1) then
    print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
    os.exit()
end
-- Send the input sequence through a LookupTable to obtain its embeddings
for t = 1, inputSource:size(1) do
    embeddings[t] = embed:forward(inputSource[t])[1]
end
local lstm_c = {[0] = initstate_c} -- internal cell states of the LSTM
local lstm_h = {[0] = initstate_h} -- output values of the LSTM
local predictions = {} -- softmax outputs
-- For every input word, pass through the LSTM module and the softmax module
for t = 1, inputSource:size(1) do
    lstm_c[t], lstm_h[t] = unpack(clones.memory[t]:forward({embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()}))
    predictions[t] = clones.softmax[t]:forward(lstm_h[t])
    loss = loss + clones.criterion[t]:forward(predictions[t]:float(), target[t])
end
local dlstm_c = {}
local dlstm_h = {}
-- Gradients from higher layers are zero
dlstm_c[inputSource:size(1)] = dfinalstate_c:cuda() -- zero tensor
dlstm_h[inputSource:size(1)] = dfinalstate_h:cuda() -- zero tensor
local dTempSummary = {} -- gradient to be sent to the lookup table (the lookup table itself isn't updated)
for t = inputSource:size(1), 1, -1 do
    local doutput_t = clones.criterion[t]:backward(predictions[t]:float(), target[t]):clone()
    -- Gradient from the output layer. If the token is the last in the sequence,
    -- there is no additional gradient coming down; otherwise add the gradient
    -- flowing back from the following timestep.
    if t == inputSource:size(1) then
        dlstm_h[t] = clones.softmax[t]:backward(lstm_h[t], doutput_t):clone()
    else
        dlstm_h[t]:add(clones.softmax[t]:backward(lstm_h[t], doutput_t))
    end
    -- backprop through the LSTM timestep
    dTempSummary[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.memory[t]:backward(
        {embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()},
        {dlstm_c[t]:cuda(), dlstm_h[t]:cuda()}
    ))
end
loss = loss / inputSource:size(1)
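The clones.memory table above holds one clone of the LSTM graph per timestep; for BPTT to work, these clones must share their parameters and gradParameters with the prototype network. The exact cloning helper is not shown here, so the following is only a sketch of the usual pattern (cloneManyTimes, lstmProto and maxSequenceLength are placeholder names):
-- Sketch: per-timestep clones that share weights and gradient buffers with the prototype
local function cloneManyTimes(net, T)
    local clones = {}
    for t = 1, T do
        -- clone(...) with share arguments keeps the parameter and gradient
        -- storages of the copy pointing at the original module's storages
        clones[t] = net:clone('weight', 'bias', 'gradWeight', 'gradBias')
    end
    return clones
end
clones.memory = cloneManyTimes(lstmProto, maxSequenceLength)
If the clones do not share their parameter storages this way, every timestep trains an independent copy of the LSTM, which on its own can account for a large drop in F-measure; the same sharing is needed for clones.softmax if it contains trainable layers.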
I have shared the complete code snippet here : Complete Code Snippet for both modules
I know that I'm missing something in my nnGraph based LSTM implementation but I am unable to figure out the error. Can someone please help me find where I am wrong?

Related

Predict Multiple Output using Apriori Algorithm in R

Currently I am working on an item-item based recommendation system in R. The package I have used is arules. I have built my basic models, but I want to modify them according to the following criteria:
With the apriori algorithm we get only one item in the output, not multiple items. I want multiple items on the rhs side. For example:
lhs rhs
{GH DAILY MOONG DAL PREMIUM 1kg,
MDH POW SPICE DEGHI CHILLI 100g,PREM 1kg} => {DAILY OTH PULSE CHANA DAL...
Rice}
My recommendation system is entirely item-item based. Is there any other algorithm or package in R that will give me better business output?
How should I calculate the confidence and support values? In my case I am using the default values.
My code is given below:
#Create Sparse Matrix
dataset = read.transactions('/Users/Nikita/Downloads/Reco_System/market_basket_before_model.csv', sep = ',', rm.duplicates = TRUE)
summary(dataset)
itemFrequencyPlot(dataset, topN = 20, type = 'absolute')
#1st cut
# Training Apriori on the dataset
rules = apriori(data = dataset, parameter = list(support = 0.001, confidence = 0.8))
# Visualising the results
inspect(sort(rules, by = 'lift')[1:30])
Thanks in advance.
Most implementations of association rule mining algorithms restrict the RHS of the rules to a single item to avoid further combinatorial explosion.

TensorFlow: Do preprocessing operations get frozen in a graph as well?

I believe that after training, the model saved to the checkpoint does not contain any of the preprocessing operations: upon examining the checkpoint model, the available operations start from the input of the model (and not the preprocessing operations that precede the model input).
However, when freezing a graph restored from a checkpoint file, where the graph has additional preprocessing operations, do the preprocessing operations get frozen as well? I have included a preprocessing operation for test time in the graph and intend to freeze the graph together with the checkpoint model, but the results vary a lot between these two scenarios:
Put a raw image through a frozen graph with the preprocessing operations included in the frozen graph --> very, very poor accuracy, as if no preprocessing was done.
Preprocess the image first, before putting the preprocessed image through a frozen graph that does not include any preprocessing operations --> works as expected, with very high accuracy.
So my question is: do the preprocessing operations effectively get frozen, or is it advisable to preprocess images only at test time, leaving the frozen graph to perform inference only (and not any preprocessing ops)? My intention was to include the preprocessing ops within the graph for convenience, but it seems that this approach does not work.
What is TensorFlow's take on such a workflow? Should preprocessing be done within the graph and frozen, or should it be a separate task outside of the frozen graph?
Here is how I intended to put the preprocessing ops within a graph and freeze them all:
with tf.Graph().as_default() as graph:
    # image = tf.placeholder(shape=[None, None, 3], dtype=tf.float32, name = 'Placeholder_only')
    # preprocessed_image = inception_preprocessing.preprocess_for_eval(image, 299, 299)
    # preprocessed_image = tf.expand_dims(preprocessed_image, 0)
    img_array = tf.placeholder(dtype=tf.float32, shape=[None,None,3], name='Placeholder_only')
    preprocessed_image = inception_preprocessing.preprocess_for_eval(img_array, 299, 299)
    preprocessed_image = tf.expand_dims(preprocessed_image, 0, name='expand_preprocessed_img')
    with slim.arg_scope(inception_resnet_v2_arg_scope()):
        logits, end_points = inception_resnet_v2(preprocessed_image, num_classes = 5, is_training = False)
    variables_to_restore = slim.get_variables_to_restore()
    saver = tf.train.Saver(variables_to_restore)
    #Setup graph def
    input_graph_def = graph.as_graph_def()
    output_node_names = "InceptionResnetV2/Logits/Predictions"
    output_graph_name = "./frozen_flowers_model_IR2_with_preprocesssing.pb"
    with tf.Session() as sess:
        saver.restore(sess, checkpoint_file)
        # count=0
        # for op in graph.get_operations():
        #     print (op.name)
        #     count+=1
        #     if count==50:
        #         assert False
        #Exporting the graph
        print ("Exporting graph...")
        output_graph_def = graph_util.convert_variables_to_constants(
            sess,
            input_graph_def,
            output_node_names.split(","))
        with tf.gfile.GFile(output_graph_name, "wb") as f:
            f.write(output_graph_def.SerializeToString())

How to weight observations in mxnet?

I am new to neural networks and the mxnet package in R. I want to do a logistic regression on my predictors since my observations are probabilities varying between 0 and 1. I'd like to weight my observations by a vector obsWeights I have, but I'm not sure where to implement the weights. There seems to be a weight= option in mx.symbol.FullyConnected but if I try weight=obsWeights I get the following error message
Error in mx.varg.symbol.FullyConnected(list(...)) :
Cannot find argument 'weight', Possible Arguments:
----------------
num_hidden : int, required
Number of hidden nodes of the output.
no_bias : boolean, optional, default=False
Whether to disable bias parameter.
How should I proceed to weight my observations? Here is my code at the moment.
# Prepare data
train.mm = model.matrix(obs ~ . , data = train_data)
train_label = train_data$obs
# Normalize
train.mm = apply(train.mm, 2, function(x) (x-min(x))/(max(x)-min(x)))
# Create MXDataIter compatible iterator
batch_size = 128
train.iter = mx.io.arrayiter(data=t(train.mm), label=train_label,
batch.size=batch_size, shuffle=T)
# Symbolic model definition
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data=data, num.hidden=128, name='fc1')
act1 = mx.symbol.Activation(data=fc1, act.type='relu', name='act1')
final = mx.symbol.FullyConnected(data=act1, num.hidden=1, name='final')
logistic = mx.symbol.LogisticRegressionOutput(data=final, name='logistic')
# Run model
mxnet_train = mx.model.FeedForward.create(
symbol = logistic,
X = train.iter,
initializer = mx.init.Xavier(rnd_type = 'gaussian', factor_type = 'avg', magnitude = 2),
num.round = 25)
Assigning the fully connected weight argument is not what you want to do in any case. That weight is a reference to the parameters of the layer, i.e. what the inputs are multiplied by to get the output values. These are the parameter values you're trying to learn.
If you want to make some samples matter more than others, you'll need to adjust the loss function. For example, multiply the usual per-sample loss by your weights so that low-weight samples do not contribute as much to the overall average loss.
I do not believe the standard MXNet loss functions have a spot for assigning weights (that is, LogisticRegressionOutput won't cover this). However, you can make your own cost function that does. This would involve passing your final layer through a sigmoid activation function to first generate the usual logistic regression output value, then passing that into the loss function you define. You could use squared error, but for logistic regression you'll probably want the cross entropy loss:
-(l * log(y) + (1 - l) * log(1 - y)),
where l is the label and y is the predicted value.
Ideally, you'd write a symbol with an efficient definition of the gradient (MXNet has a cross entropy function, but it's for softmax input, not a binary output. You could translate your output to two outputs with softmax as an alternative, but that seems less easy to work with in this case), but the easiest path would be to let MXNet do its autodiff on it. Then you multiply that cross entropy loss by the weights.
I haven't tested this code, but you'd ultimately have something like this (this is what you'd do in Python; it should be similar in R):
label = mx.sym.Variable('label')
out = mx.sym.Activation(data=final, act_type='sigmoid')
ce = -(label * mx.sym.log(out) + (1 - label) * mx.sym.log(1 - out))
weights = mx.sym.Variable('weights')
loss = mx.sym.MakeLoss(weights * ce, normalization='batch')
Then you want to input your weight vector into the weights Variable along with your normal input data and labels.
As an added tip: an MXNet network with a custom loss via MakeLoss outputs the loss, not the prediction. You'll probably want both in practice, in which case it's useful to group the loss with a gradient-blocked version of the prediction so that you can get both. You'd do that like this:
pred_loss = mx.sym.Group([mx.sym.BlockGrad(out), loss])

Torch7 ClassNLLCriterion()

I've been trying for a whole day to get my code to work, but it fails even though the inputs and outputs are consistent.
Someone mentioned somewhere that ClassNLLCriterion does not accept values less than or equal to zero.
How am I supposed to go about training this network?
Here is part of my code. I suppose it fails in backward, because the model's output may contain negative values.
However, when I switch to the mean squared error criterion, the code works just fine.
ninputs = 22; noutputs = 3
hidden =22
model = nn.Sequential()
model:add(nn.Linear(ninputs, hidden)) -- define the only module
model:add(nn.Tanh())
model:add(nn.Linear(hidden, noutputs))
model:add(nn.LogSoftMax())
----------------------------------------------------------------------
-- 3. Define a loss function, to be minimized.
-- In that example, we minimize the Mean Square Error (MSE) between
-- the predictions of our linear model and the groundtruth available
-- in the dataset.
-- Torch provides many common criterions to train neural networks.
criterion = nn.ClassNLLCriterion()
----------------------------------------------------------------------
-- 4. Train the model
i=1
mean = {}
std = {}
-- To minimize the loss defined above, using the linear model defined
-- in 'model', we follow a stochastic gradient descent procedure (SGD).
-- SGD is a good optimization algorithm when the amount of training data
-- is large, and estimating the gradient of the loss function over the
-- entire training set is too costly.
-- Given an arbitrarily complex model, we can retrieve its trainable
-- parameters, and the gradients of our loss function wrt these
-- parameters by doing so:
x, dl_dx = model:getParameters()
-- In the following code, we define a closure, feval, which computes
-- the value of the loss function at a given point x, and the gradient of
-- that function with respect to x. x is the vector of trainable weights,
-- which, in this example, are all the weights of the linear matrix of
-- our model, plus one bias.
feval = function(x_new)
    -- set x to x_new, if different
    -- (in this simple example, x_new will typically always point to x,
    -- so the copy is really useless)
    if x ~= x_new then
        x:copy(x_new)
    end
    -- select a new training sample
    _nidx_ = (_nidx_ or 0) + 1
    if _nidx_ > (#csv_tensor)[1] then _nidx_ = 1 end
    local sample = csv_tensor[_nidx_]
    local target = sample[{ {23,25} }]
    local inputs = sample[{ {1,22} }] -- slicing of arrays
    -- reset gradients (gradients are always accumulated, to accommodate
    -- batch methods)
    dl_dx:zero()
    -- evaluate the loss function and its derivative wrt x, for that sample
    local loss_x = criterion:forward(model:forward(inputs), target)
    model:backward(inputs, criterion:backward(model.output, target))
    -- return loss(x) and dloss/dx
    return loss_x, dl_dx
end
The error received is
/home/stormy/torch/install/bin/luajit:
/home/stormy/torch/install/share/lua/5.1/nn/THNN.lua:110: Assertion
`cur_target >= 0 && cur_target < n_classes' failed. at
/home/stormy/torch/extra/nn/lib/THNN/generic/ClassNLLCriterion.c:45
stack traceback: [C]: in function 'v'
/home/stormy/torch/install/share/lua/5.1/nn/THNN.lua:110: in function
'ClassNLLCriterion_updateOutput'
...rmy/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:43: in
function 'forward' nn.lua:178: in function 'opfunc'
/home/stormy/torch/install/share/lua/5.1/optim/sgd.lua:44: in
function 'sgd' nn.lua:222: in main chunk [C]: in function 'dofile'
...ormy/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in
main chunk [C]: at 0x00405d50
The error message results from passing in targets that are out of bounds.
For example:
m = nn.ClassNLLCriterion()
nClasses = 3
nBatch = 10
net_output = torch.randn(nBatch, nClasses)
targets = torch.Tensor(10):random(1,3) -- targets are between 1 and 3
m:forward(net_output, targets)
m:backward(net_output, targets)
Now, see the bad example (the one you are suffering from):
targets[5] = 13 -- an out-of-bounds class (must be <= nClasses)
targets[4] = 0 -- an out-of-bounds class (must be >= 1)
-- these lines below will error
m:forward(net_output, targets)
m:backward(net_output, targets)
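Note that in the question's code the target is a slice of three values (sample[{ {23,25} }]), which looks like a one-hot encoding rather than a class index, while ClassNLLCriterion expects a single class index in 1..nClasses (here 1..3). A minimal sketch of the conversion, assuming that slice really is one-hot:
-- convert a one-hot target vector (e.g. {0, 1, 0}) into the class index
-- expected by ClassNLLCriterion
local targetVec = sample[{ {23, 25} }]
local _, idx = torch.max(targetVec, 1) -- position of the maximum entry
local target = idx[1]                  -- a number in 1..3
local loss_x = criterion:forward(model:forward(inputs), target)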

Torch: Model fast when learning/testing, slow when using it

I have an issue using a learned model with Torch.
I followed this how-to, http://code.cogbits.com/wiki/doku.php?id=tutorial_supervised, to train a model. Everything is fine: my model was trained and I get correct results when I use it. But it's slow!
The testing part during training looks like this:
model:evaluate()
-- test over test data
print('==> testing on test set:')
for t = 1, testData:size() do
    -- disp progress
    xlua.progress(t, testData:size())
    -- get new sample
    local input = testData.data[t]
    if opt.type == 'double' then input = input:double()
    elseif opt.type == 'cuda' then input = input:cuda() end
    local target = testData.labels[t]
    -- test sample
    local pred = model:forward(input)
    confusion:add(pred, target)
end
-- timing
time = sys.clock() - time
time = time / testData:size()
print("\n==> time to test 1 sample = " .. (time*1000) .. 'ms')
I have the following speed recorded during testing:
==> time to test 1 sample = 12.419194088996ms
(Of course it varies, but it's ~12 ms.)
I want to use the learned model on other images, so I did this in a simple new script:
(... requires)
torch.setnumthreads(8)
torch.setdefaulttensortype('torch.FloatTensor')
model = torch.load('results/model.net')
model:evaluate()
(... Image loading, resizing and normalization)
local time = sys.clock()
local result_info = model:forward(cropped_image:double())
print("==> time to test 1 frame = " .. (sys.clock() - time) * 1000 .. "ms")
The time spent is much larger; I get the following output: ==> time to test 1 frame = 212.7647127424ms
I tested with more than one image, always keeping the resizing and normalization outside of the clock's measurements, and I always get > 200 ms per image.
I don't understand what I'm doing wrong or why my code is so much slower than during training/testing.
Thanks !
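One thing worth double-checking: the standalone script sets the default tensor type to torch.FloatTensor but feeds cropped_image:double() to the network, so the model and its input may be running in a different precision than during training/testing, and the single timed call also includes no warm-up. A hedged sketch of a like-for-like timing, assuming the model was trained with float tensors:
-- load the model and force model and input to the same (float) type
model = torch.load('results/model.net')
model:float()
model:evaluate()
local input = cropped_image:float()
model:forward(input) -- one warm-up pass before timing
local time = sys.clock()
local result_info = model:forward(input)
print("==> time to test 1 frame = " .. (sys.clock() - time) * 1000 .. "ms")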
