Torch: Model fast when learning/testing, slow when using it - torch

I have an issue using a trained model with Torch.
I followed this howto http://code.cogbits.com/wiki/doku.php?id=tutorial_supervised to train a model. Everything is fine: the model was trained and I get correct results when I use it. But it's slow!
The testing part of the training looks like this:
model:evaluate()

-- test over test data
print('==> testing on test set:')
local time = sys.clock() -- start the timer
for t = 1,testData:size() do
   -- disp progress
   xlua.progress(t, testData:size())

   -- get new sample
   local input = testData.data[t]
   if opt.type == 'double' then input = input:double()
   elseif opt.type == 'cuda' then input = input:cuda() end
   local target = testData.labels[t]

   -- test sample
   local pred = model:forward(input)
   confusion:add(pred, target)
end

-- timing
time = sys.clock() - time
time = time / testData:size()
print("\n==> time to test 1 sample = " .. (time*1000) .. 'ms')
I have the following speed recorded during testing:
==> time to test 1 sample = 12.419194088996ms
(Of course it varies, but it's around 12 ms.)
I want to use the trained model on other images, so I did this in a simple new script:
(... requires)
torch.setnumthreads(8)
torch.setdefaulttensortype('torch.FloatTensor')
model = torch.load('results/model.net')
model:evaluate()
(... Image loading, resizing and normalization)
local time = sys.clock()
local result_info = model:forward(cropped_image:double())
print("==> time to test 1 frame = " .. (sys.clock() - time) * 1000 .. "ms")
The time spent is much bigger; I get the following output:
==> time to test 1 frame = 212.7647127424ms
I tested with more than one image, always keeping the resizing and normalization outside the clock's measurement, and I always get > 200 ms per image.
I don't understand what I'm doing wrong, or why this code is so much slower than it was during training/testing.
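For comparison, here is a minimal sketch of a timing loop that averages over repeated forward calls (assuming model and cropped_image are already loaded and preprocessed as above), so that any one-off allocation cost is spread out the same way as in the per-sample timing on the test set:
-- average the forward time over N calls instead of timing a single one (N is an arbitrary choice)
local N = 100
local input = cropped_image:double() -- same type conversion as above
model:forward(input)                 -- warm-up call, not timed
local t0 = sys.clock()
for i = 1, N do
   model:forward(input)
end
print('==> average time per forward = ' .. ((sys.clock() - t0) / N) * 1000 .. 'ms')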
Thanks!

Related

bnlearn package: unexpected cpdist (prediction) behaviour

I have run into a problem that goes beyond my understanding.
I have made a simple reproducible example for you to test.
Basically, I create a Bayesian network with two strongly correlated variables that are linked together. One would expect that if one of them is high, the other one should be high as well (since they are directly linked).
library(bnlearn)
Learning.set4 = cbind(c(1,2,1,8,9,9),c(2,0,1,10,10,10))
Learning.set4 = as.data.frame(Learning.set4)
colnames(Learning.set4) = c("Cause","Cons")
b.network = empty.graph(colnames(Learning.set4))
struct.mat = matrix(0,2,2)
colnames(struct.mat) = colnames(Learning.set4)
rownames(struct.mat) = colnames(struct.mat)
struct.mat[1,2] = 1
bnlearn::amat(b.network) = struct.mat
haha = bn.fit(b.network,Learning.set4)
# Here we get a mean that is close to 10
seems_logic_to_me=cpdist(haha, nodes="Cons",
evidence=list("Cause"=10), method="lw")
# Here I get a mean that is close to 5, so a high value
# of Cons wouldn't mean anything for Cause?
very_low_cause_values = cpdist(haha, nodes="Cause",
evidence=list("Cons"=10), method="lw")
Could anyone enlighten me on why it doesn't work with the lw method? (You can try with ls and it seems to work fine.)
lw stands for likelihood weighting.
UPDATE:
Got the solution from the maintainer.
Adding the following at the end will print the expected prediction:
print(sum(very_low_cause_values[, 1] * attr(very_low_cause_values, "weights")) /
      sum(attr(very_low_cause_values, "weights")))

lmRob() on mostly linear data: errors and defense

I am working with a legacy R script that uses robust::lmRob().
The documentation for lmRob() is quite clear, and the example runs fine (for docs and example, see here).
Based on that, I would have thought this simple script would work, but it fails with the error 'msg.UCV' not found:
library(robust)
xx = c(c(2.1111,3.1111,4.1111),seq(1,7,by=0.3))
yy = 0.5+1.5*xx
yy[1] = -21; yy[2] = 0; yy[3] = -10
df = data.frame(x=xx,y=yy)
mf = lmRob(y~x, data=df)
Note that the data is 21 exactly linear rows, plus three outliers.
If instead one uses
yy = 0.5+1.5*xx + 0.01*xx^2
then no errors arise.
I would switch to robustbase::lmrob() but lmRob.fit.compute() seems to do some fairly nontrivial things.
What is a good defensive programming technique to prevent near-linear data from causing errors in my program?

Torch7 ClassNLLCriterion()

I've been trying for a whole day to get my code to work, but it fails even though the inputs and outputs are consistent.
Someone mentioned somewhere that ClassNLLCriterion does not accept values less than or equal to zero.
How am I supposed to go about training this network?
Here is part of my code. I suppose it fails in backward(), since the model's output may contain negative values.
However, when I switch to the mean squared error criterion (nn.MSECriterion), the code works just fine.
ninputs = 22; noutputs = 3
hidden =22
model = nn.Sequential()
model:add(nn.Linear(ninputs, hidden)) -- define the only module
model:add(nn.Tanh())
model:add(nn.Linear(hidden, noutputs))
model:add(nn.LogSoftMax())
----------------------------------------------------------------------
-- 3. Define a loss function, to be minimized.
-- In that example, we minimize the Mean Square Error (MSE) between
-- the predictions of our linear model and the groundtruth available
-- in the dataset.
-- Torch provides many common criterions to train neural networks.
criterion = nn.ClassNLLCriterion()
----------------------------------------------------------------------
-- 4. Train the model
i=1
mean = {}
std = {}
-- To minimize the loss defined above, using the linear model defined
-- in 'model', we follow a stochastic gradient descent procedure (SGD).
-- SGD is a good optimization algorithm when the amount of training data
-- is large, and estimating the gradient of the loss function over the
-- entire training set is too costly.
-- Given an arbitrarily complex model, we can retrieve its trainable
-- parameters, and the gradients of our loss function wrt these
-- parameters by doing so:
x, dl_dx = model:getParameters()
-- In the following code, we define a closure, feval, which computes
-- the value of the loss function at a given point x, and the gradient of
-- that function with respect to x. x is the vector of trainable weights,
-- which, in this example, are all the weights of the linear matrix of
-- our model, plus one bias.
feval = function(x_new)
   -- set x to x_new, if different
   -- (in this simple example, x_new will typically always point to x,
   -- so the copy is really useless)
   if x ~= x_new then
      x:copy(x_new)
   end

   -- select a new training sample
   _nidx_ = (_nidx_ or 0) + 1
   if _nidx_ > (#csv_tensor)[1] then _nidx_ = 1 end

   local sample = csv_tensor[_nidx_]
   local target = sample[{ {23,25} }]
   local inputs = sample[{ {1,22} }] -- slicing of arrays

   -- reset gradients (gradients are always accumulated, to accommodate
   -- batch methods)
   dl_dx:zero()

   -- evaluate the loss function and its derivative wrt x, for that sample
   local loss_x = criterion:forward(model:forward(inputs), target)
   model:backward(inputs, criterion:backward(model.output, target))

   -- return loss(x) and dloss/dx
   return loss_x, dl_dx
end
The error received is:
/home/stormy/torch/install/bin/luajit: /home/stormy/torch/install/share/lua/5.1/nn/THNN.lua:110: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /home/stormy/torch/extra/nn/lib/THNN/generic/ClassNLLCriterion.c:45
stack traceback:
  [C]: in function 'v'
  /home/stormy/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
  ...rmy/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:43: in function 'forward'
  nn.lua:178: in function 'opfunc'
  /home/stormy/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
  nn.lua:222: in main chunk
  [C]: in function 'dofile'
  ...ormy/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
  [C]: at 0x00405d50
The error message results from passing in targets that are out of bounds.
For example:
m = nn.ClassNLLCriterion()
nClasses = 3
nBatch = 10
net_output = torch.randn(nBatch, nClasses)
targets = torch.Tensor(10):random(1,3) -- targets are between 1 and 3
m:forward(net_output, targets)
m:backward(net_output, targets)
Now see a bad example (the one you are suffering from):
targets[5] = 13 -- an out-of-bounds class index (above nClasses)
targets[4] = 0 -- an out-of-bounds class index (below 1)
-- these lines below will error
m:forward(net_output, targets)
m:backward(net_output, targets)
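In the question's feval, the target is the three-element slice sample[{ {23,25} }], while ClassNLLCriterion expects a single class index between 1 and the number of classes. If those three columns are a one-hot encoding of the label (an assumption, since the CSV layout isn't shown), a minimal sketch of the conversion could look like:
-- assumption: columns 23-25 of csv_tensor hold a one-hot label such as 0 1 0
local onehot = sample[{ {23,25} }]
local _, idx = torch.max(onehot, 1) -- position of the 1 along the vector
local target = idx[1]               -- a class index in {1, 2, 3}, which ClassNLLCriterion expects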

makeCluster with parallelSVM in R takes up all Memory and swap

I'm trying to train an SVM model on a large dataset (~110k training points). This is a sample of the code, where I use the parallelSVM package to parallelize the training step on a subset of the training data on my 4-core Linux machine.
numcore = 4
train.time = c()
for(i in 1:5)
{
  cl = makeCluster(4)
  registerDoParallel(cores=numCore)
  getDoParWorkers()
  dummy = train_train[1:10000*i,]
  begin = Sys.time()
  model.svm = parallelSVM(as.factor(target) ~ ., data = dummy,
                          numberCores = detectCores(), probability = T)
  end = Sys.time() - begin
  train.time = c(train.time, end)
  stopCluster(cl)
  registerDoSEQ()
}
The idea of this snippet is to estimate the time it will take to train the model on the entire dataset by gradually increasing the size of the dummy training set. After running the code above for 10,000 and 20,000 training samples, the System Monitor shows the memory and swap usage history. After 4 runs of the for loop, both memory and swap usage are at about 95%, and I get the following error:
Error in summary.connection(connection) : invalid connection
Any ideas on how to manage this problem? Is there a way to deallocate the memory used by a cluster after calling stopCluster()?
Please take into consideration the fact that I am an absolute beginner in this field. A short explanation of the proposed solutions will be greatly appreciated. Thank you.
Your line
registerDoParallel(cores=numCore)
creates a new cluster with a number of nodes equal to numCore (which you haven't actually defined; your script only sets numcore, in lower case). This cluster is never destroyed, so with each iteration of the loop you're starting more new R processes. Since you're already creating a cluster with cl = makeCluster(4), you should use
registerDoParallel(cl)
instead.
(And move the makeCluster, registerDoParallel, stopCluster and registerDoSEQ calls outside the loop.)

Large Difference in Output: nnGraph based LSTM vs Sequencer LSTM (Torch)

I have implemented a sequence labeler in Torch using rnn from Element Research and also using the nnGraph based LSTM code from the Oxford ML Group. The training of the nnGraph based LSTM is done similarly to the example given by the Oxford ML Group.
I have kept the hyperparameters the same for both modules. When I train both modules on the same dataset, I get a lower error (around 75 F-measure) with rnn from Element Research, while the error is much larger (around 5 F-measure) with the nnGraph based LSTM.
For simplicity I do Backpropagation Through Time over the entire sequence in both models. For the nnGraph based LSTM I clone the cell for the maximum length of the sequence.
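One detail that matters when cloning the nnGraph cell once per timestep is that all clones must share their parameters and gradient buffers, so that every timestep updates the same underlying weights. A minimal sketch of how such shared clones are typically created (protoLSTM and maxLength are hypothetical names, not taken from my code):
-- each clone shares weights and gradient buffers with the prototype cell
local clonesMemory = {}
for t = 1, maxLength do
   clonesMemory[t] = protoLSTM:clone('weight', 'bias', 'gradWeight', 'gradBias')
end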
Here is the snippet for training using the rnn package:
------------------ forward pass -------------------
local embeddings = {} -- input embeddings
local loss = 0

if inputSource:size(1) ~= target:size(1) then
   print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
   os.exit()
end

-- Send the input sequence through a LookupTable to obtain its embeddings
for t=1,inputSource:size(1) do
   if options.useGPU then
      embeddings[t] = embed:forward(inputSource[t])[1]:cuda()
   else
      embeddings[t] = embed:forward(inputSource[t])[1]
   end
end

-- Send the embedding sequence through the labeler to produce a table of NER tags
local predictions = sequenceLabeler:forward(embeddings)
loss = loss + criterion:forward(predictions, target)
local gradOutputs = criterion:backward(predictions, target)
sequenceLabeler:backward(embeddings, gradOutputs)
loss = loss / inputSource:size(1)
and here is the snippet for training using the nnGraph based LSTM:
local embeddings = {} -- input embeddings
local loss = 0

if inputSource:size(1) ~= target:size(1) then
   print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
   os.exit()
end

-- Send the input sequence through a LookupTable to obtain its embeddings
for t=1,inputSource:size(1) do
   embeddings[t] = embed:forward(inputSource[t])[1]
end

local lstm_c = {[0]=initstate_c} -- internal cell states of LSTM
local lstm_h = {[0]=initstate_h} -- output values of LSTM
local predictions = {}           -- softmax outputs

-- For every input word, pass through the LSTM module and the softmax module
for t = 1, inputSource:size(1) do
   lstm_c[t], lstm_h[t] = unpack(clones.memory[t]:forward({embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()}))
   predictions[t] = clones.softmax[t]:forward(lstm_h[t])
   loss = loss + clones.criterion[t]:forward(predictions[t]:float(), target[t])
end

local dlstm_c = {}
local dlstm_h = {}

-- Gradients from higher layers are zero
dlstm_c[inputSource:size(1)] = dfinalstate_c:cuda() -- zero tensors
dlstm_h[inputSource:size(1)] = dfinalstate_h:cuda() -- zero tensors

local dTempSummary = {} -- gradient to be sent to the lookup table (but the lookup table isn't modified)

for t = inputSource:size(1),1,-1 do
   local doutput_t = clones.criterion[t]:backward(predictions[t]:float(), target[t]):clone()

   -- Gradient from the output layer. If the token is the last in the sequence,
   -- there is no additional gradient coming down; otherwise the gradient from
   -- later timesteps must be added in.
   if t == inputSource:size(1) then
      dlstm_h[t] = clones.softmax[t]:backward(lstm_h[t], doutput_t):clone()
   else
      dlstm_h[t]:add(clones.softmax[t]:backward(lstm_h[t], doutput_t))
   end

   -- backprop through the LSTM timestep
   dTempSummary[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.memory[t]:backward(
      {embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()},
      {dlstm_c[t]:cuda(), dlstm_h[t]:cuda()}
   ))
end

loss = loss / inputSource:size(1)
I have shared the complete code snippet here: Complete Code Snippet for both modules
I know that I'm missing something in my nnGraph based LSTM implementation, but I am unable to figure out my error. Can someone please help me find where I am going wrong?
