mxnet model does not produce same output for same input with no intermediate gradient backprop

I have some experience with TensorFlow but only about a week with MXNet. I am trying to understand the behavior of some code when I hit a breakpoint in the function below:
def train_and_eval(lr, end_date_str, pred):
    model.collect_params().initialize(mx.init.Xavier(), ctx=ctx, force_reinit=True)
    mgr = ProcessMgr(2, end_date_str)
    for epoch in range(args_epochs):
        for i in range(2):
            if i == TRAIN_MODE:
                mgr.switch_to_train()
            elif epoch == args_epochs - 1 and i == VALIDATE_MODE:
                mgr.switch_to_validate()
            else:
                break
            while True:
                try:
                    data, target, eval_target, date_str = mgr.get_batch()
                    data = gluon.utils.split_and_load(data, ctx)
                    target = gluon.utils.split_and_load(target, ctx)
                    eval_target = gluon.utils.split_and_load(eval_target, ctx)
                    data = [mx.nd.swapaxes(d, 0, 1) for d in data]
                    with autograd.record():
                        losses = [loss(model(X)[-args_batch_size:], Y) for X, Y in zip(data, target)]
                        null_loss_vals = sum([Y.square().sum().asscalar() for Y in target])
                        model_loss_vals = sum([sum(l).asscalar() for l in losses])
                        null_loss[i] += null_loss_vals
                        model_loss[i] += model_loss_vals
                        pdb.set_trace()  # BREAK POINT IS HERE
                    if i == TRAIN_MODE:
                        for l in losses:
                            l.backward()
                        x = 18
                        grads = [i.grad(ctx) for i in model.collect_params().values() if i._grad is not None]
                        gluon.utils.clip_global_norm(grads, args_clip)
                        trainer.step(GPU_COUNT * args_batch_size)
                except:
                    print("completed an epoch")
                    break
I am getting some unexpected values for the losses I am calculating, so I put a break point in to see what was going on. The problem is that when I run the same data through the model, I get different outputs each time. Below I paste some of the outputs I have when I hit the pdb breakpoint and try to run data through the model.
<NDArray 38400x1 @gpu(0)>
(Pdb) model(data[0])
[[ 2.9265028e-01]
[ 9.3701184e-03]
[ 4.3234527e-02]
...
[-5.0668776e-09]
[-2.7628975e-08]
[-1.9340845e-08]]
<NDArray 38400x1 @gpu(0)>
(Pdb) model(data[0])
[[ 1.5275864e-01]
[ 2.0615126e-01]
[ 4.6957955e-02]
...
[-2.6077061e-08]
[-9.2040580e-09]
[-3.2883932e-08]]
<NDArray 38400x1 @gpu(0)>
(Pdb) data[0]
[[[ 0. -4.]
[ 0. -4.]
[ 0. -4.]
...
[ 0. -4.]
[ 0. -4.]
[ 0. -4.]]
[[ 0. -4.]
[ 0. -4.]
[ 0. -4.]
...
[ 0. -4.]
[ 0. -4.]
[ 0. -4.]]
[[ 0. -4.]
[ 0. -4.]
[ 0. -4.]
...
[ 0. -4.]
[ 0. -4.]
[ 0. -4.]]
...
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
...
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
...
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
...
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]]
<NDArray 128x300x2 @gpu(0)>
(Pdb) data[0]
[[[ 0. -4.]
[ 0. -4.]
[ 0. -4.]
...
[ 0. -4.]
[ 0. -4.]
[ 0. -4.]]
[[ 0. -4.]
[ 0. -4.]
[ 0. -4.]
...
[ 0. -4.]
[ 0. -4.]
[ 0. -4.]]
[[ 0. -4.]
[ 0. -4.]
[ 0. -4.]
...
[ 0. -4.]
[ 0. -4.]
[ 0. -4.]]
...
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
...
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
...
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
...
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]]
<NDArray 128x300x2 @gpu(0)>
(Pdb)
I am perplexed as to what is going on here. I do realize my code may not be entirely proper in that I am not running anything in a predict or inference mode (I was planning to tackle that later), but I don't understand how the model itself seems to be changing each time I run input through it, even though I am not calling backward() or trainer.step(). Any insight would be appreciated. Why is this happening?
My only guess is that perhaps the hidden state is preserved between runs. But I thought I had not coded it to do so (I saw an example where this was done and the hidden state had to be explicitly saved and fed back into the RNN). In particular, I have not implemented a begin_state method for my gluon.Block. I am not sure how to verify or disprove this guess.
Here is my gluon.Block as implemented in case that is relevant:
class RNNModel(gluon.Block):
    def __init__(self, mode, num_inputs, num_embed, num_hidden,
                 num_layers, dropout=0.5, tie_weights=False, **kwargs):
        super(RNNModel, self).__init__(**kwargs)
        with self.name_scope():
            self.drop = nn.Dropout(dropout)
            self.rnn = rnn.GRU(num_hidden, num_layers, dropout=dropout,
                               input_size=num_inputs)
            self.decoder = nn.Dense(1, in_units=num_hidden)
            self.num_hidden = num_hidden

    def forward(self, inputs):
        output = self.rnn(inputs)
        output = self.drop(output)
        decoded = self.decoder(output.reshape((-1, self.num_hidden)))
        return decoded

I determined that, within the autograd.record() context, the hidden state must keep evolving, because I did not see this behavior outside of that context. Since my model does not expose the hidden state through any variable, I was not able to verify this explicitly, but it makes the most sense. I was also able to confirm that the weights that are exposed (via trainer._params) were not changing, so it had to be the hidden state.
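For what it's worth, one way to test a guess like this is to expose the GRU state explicitly by passing it in and out of forward(). The class below is only a sketch, not the original model, and its names are made up, but gluon's rnn.GRU does return its state when a state is passed in:

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn, rnn

class RNNModelWithState(gluon.Block):
    def __init__(self, num_inputs, num_hidden, num_layers, dropout=0.5, **kwargs):
        super(RNNModelWithState, self).__init__(**kwargs)
        with self.name_scope():
            self.drop = nn.Dropout(dropout)
            self.rnn = rnn.GRU(num_hidden, num_layers, dropout=dropout,
                               input_size=num_inputs)
            self.decoder = nn.Dense(1, in_units=num_hidden)
            self.num_hidden = num_hidden

    def begin_state(self, *args, **kwargs):
        return self.rnn.begin_state(*args, **kwargs)

    def forward(self, inputs, state):
        # rnn.GRU returns (output, new_state) when a state is passed in,
        # so the state can be inspected between calls
        output, state = self.rnn(inputs, state)
        output = self.drop(output)
        decoded = self.decoder(output.reshape((-1, self.num_hidden)))
        return decoded, state

# At the pdb prompt one could then run the same batch twice from the same
# starting state and compare both the outputs and the returned states:
# state = model.begin_state(batch_size=data[0].shape[1], ctx=data[0].context)
# out1, s1 = model(data[0], state)
# out2, s2 = model(data[0], state)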

Related

numpy array of unexpected dimension

I'm currently switching from Matlab to Python and I have a problem understanding numpy arrays.
The following code (copied from the NumPy documentation) creates a 2x3 array:
np.array([[1, 2, 3], [4, 5, 6]], np.int32)
which behaves as expected.
Now I tried to adapt this to my case and tried
myArray = np.array([\
[-0.000847283, 0.000000000, 0.141182070, 2.750000000],
[ 0.000876414, -0.025855453, 0.270459334, 2.534537894],
[-0.000098373, 0.003388169, -0.021976882, 3,509325279],
[ 0.000077079, -0.004507202, 0.096453685, 2,917172446],
[-0.000049944, 0.003114201, -0.055974372, 3,933359490],
[ 0.000042697, -0.003833862, 0.117727186, 2.485846507],
[-0.000000843, 0.000084733, 0.000169340, 3.661424974],
[ 0.000000676, -0.000074756, 0.005751451, 3.596300338],
[-0.000001860, 0.000229543, -0.006420507, 3.758593109],
[ 0.000006764, -0.000934745, 0.045972458, 2.972698644],
[ 0.000014803, -0.002140505, 0.106260454, 1.967898711],
[-0.000025975, 0.004587858, -0.263799480, 8.752330828],
[ 0.000009098, -0.001725357, 0.114993424, 1.176472749],
[-0.000010418, 0.002080207, -0.132368251, 6.535975709],
[ 0.000032572, -0.006947575, 0.499576502, -8.209401868],
[-0.000039870, 0.009351884, -0.722882956, 22.352084596],
[ 0.000046909, -0.011475011, 0.943268640, -22.078624629],
[-0.000067764, 0.017766572, -1.542265901, 48.344854010],
[ 0.000144148, -0.039449875, 3.607214322,-106.139552662],
[-0.000108830, 0.032648910, -3.242170215, 110.757624352]
])
But, contrary to expectation, the shape is (20,); I expected the shape to be (20, 4).
Question 1: Can anyone tell me why? And how do I create the array correctly?
Question 2: When I add the datatype, dtype=np.float, I get the following error:
TypeError: float() argument must be a string or a number, not 'list'
but the array isn't intended to be a list.
I found the mistake on my own after trying to np.vstack all the vectors.
The resulting error said that the rows at index 2, 3 and 4 did not have the expected size of 4.
Replacing those commas with dots (decimal points) solved the problem.
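A small, hypothetical example (not the original data) shows the same failure mode and how np.vstack surfaces it:

import numpy as np

# Hypothetical 3x4 example of the same mistake: in the second row, "3,5" was
# typed with a comma instead of a dot, so that row has 5 elements instead of 4.
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3, 5, 4.0],   # comma typo: should be 3.5
    [1.0, 2.0, 3.0, 4.0],
]

try:
    np.vstack(rows)
except ValueError as err:
    print(err)               # complains that the row lengths do not match

fixed = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.5, 4.0],    # comma replaced with a dot
    [1.0, 2.0, 3.0, 4.0],
]
print(np.array(fixed).shape)  # (3, 4), as expected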

Breaking out of a recursive function

I'm walking through a set of nested blocks and want to stop the walk when I've found the value I'm looking for.
For reasons that are beyond the scope of this question, I can't use PARSE for this particular problem, nor use FOREACH as the looper:
walk: func [series [block!] criteria [block!]][
    use [value] compose/deep [
        while [not tail? series][
            value: pick series 1
            either block? value [
                walk value criteria
            ][
                (to paren! criteria)
            ]
            series: next series
        ]
    ]
]
I'd like to break out if I find this specific value.
walk [a [b c [d e] f] g] [if value = 'e [return value]]
; returns 'e
However, I'd also like to do operations that don't break out:
walk [a [b c [d e] f] g] [
collect [if find [c e] value [keep value]]
]
; returns [c e]
I'd like to solve this for any of the Rebol flavours, including Red. Any thoughts on efficiency (the reason I use a block instead of a function), etc. would be welcome too.
The function combo I was looking for is CATCH/THROW. Once again, using the given function:
walk: func [series [block!] criteria [block!]][
    use [value] compose/deep [
        while [not tail? series][
            value: pick series 1
            either block? value [
                walk value criteria
            ][
                (to paren! criteria)
            ]
            series: next series
        ]
    ]
]
I can simply wrap it as follows:
catch [walk [a [b c [d e] f] g] [if value = 'e [throw value]]]
; returns 'e
Some Notes
I want the function to return NONE if there are no matches
I'll just have WALK return NONE (am using ALSO just so as not to leave an awkward trailing none):
walk: func [series [block!] criteria [block!]][
    also none use [value] compose/deep [
        while [not tail? series][
            value: pick series 1
            either block? value [
                walk value criteria
            ][
                (to paren! criteria)
            ]
            series: next series
        ]
    ]
]
Red does not have a USE function
This introduces a complication as I only want to bind the block to the word VALUE. If I were to rewrite the function as follows:
walk: func [series [block!] criteria [block!] /local value][
    do bind compose/deep [
        while [not tail? series][
            value: pick series 1
            either block? value [
                walk value criteria
            ][
                (to paren! criteria)
            ]
            series: next series
        ]
    ] 'value
]
Then it also binds that same block to the words SERIES and CRITERIA which would override the binding of any such words from the calling context, e.g.:
walk [some values][series: none probe value] ; results in error
This version avoids binding anything except VALUE and works in Red 0.6.3 and Rebol2:
walk: func [series [block!] criteria [block!]][
    also none do bind compose/deep [
        while [not tail? series] [
            value: pick series 1
            either block? value [
                walk value criteria
            ] [
                (to paren! criteria)
            ]
            series: next series
        ]
    ]
    context [value: none]
]
(Comments on how this implementation differs from what USE does would be welcome.)
And yes, this does not work on Rebol3 Alpha. But neither does the one with the USE. I think it's a THROW issue.

Matrix Inversion Methods

When one has a problem of a matrix inverse multiplied with a vector, such as
x = inv(A) * b (that is, solving A * x = b),
one can take a Cholesky decomposition of A and back-substitute b to find the resulting vector x. However, a matrix inverse is sometimes needed when the problem is not formulated as above. My question is what is the best way to handle such a situation. Below, I have compared various ways (using numpy) of inverting a positive definite matrix:
Firstly, generate the matrix:
>>> A = np.random.rand(5,5)
>>> A
array([[ 0.13516074, 0.2532381 , 0.61169708, 0.99678563, 0.32895589],
[ 0.35303998, 0.8549499 , 0.39071336, 0.32792806, 0.74723177],
[ 0.4016188 , 0.93897663, 0.92574706, 0.93468798, 0.90682809],
[ 0.03181169, 0.35059435, 0.10857948, 0.36422977, 0.54525 ],
[ 0.64871162, 0.37809219, 0.35742865, 0.7154568 , 0.56028468]])
>>> A = np.dot(A.transpose(), A)
>>> A
array([[ 0.72604206, 0.96959581, 0.82773451, 1.10159817, 1.05327233],
[ 0.96959581, 1.94261607, 1.53140854, 1.80864185, 1.9766411 ],
[ 0.82773451, 1.53140854, 1.52338262, 1.89841402, 1.59213299],
[ 1.10159817, 1.80864185, 1.89841402, 2.61930178, 2.01999385],
[ 1.05327233, 1.9766411 , 1.59213299, 2.01999385, 2.10012097]])
The results for the method of direct inversion are as follows:
>>> np.linalg.inv(A)
array([[ 5.49746838, -1.92540877, 2.24730018, -2.20242449,
-0.53025806],
[ -1.92540877, 95.34219156, -67.93144606, 50.16450952,
-85.52146331],
[ 2.24730018, -67.93144606, 57.0739859 , -40.56297863,
58.55694127],
[ -2.20242449, 50.16450952, -40.56297863, 30.6441555 ,
-44.83400183],
[ -0.53025806, -85.52146331, 58.55694127, -44.83400183,
79.96573405]])
When using the Moore-Penrose pseudoinverse, the results are as follows (you might notice that, to the displayed precision, the results are the same as direct inversion):
>>> np.linalg.pinv(A)
array([[ 5.49746838, -1.92540877, 2.24730018, -2.20242449,
-0.53025806],
[ -1.92540877, 95.34219156, -67.93144606, 50.16450952,
-85.52146331],
[ 2.24730018, -67.93144606, 57.0739859 , -40.56297863,
58.55694127],
[ -2.20242449, 50.16450952, -40.56297863, 30.6441555 ,
-44.83400183],
[ -0.53025806, -85.52146331, 58.55694127, -44.83400183,
79.96573405]])
Finally, when solving with the identity matrix:
>>> np.linalg.solve(A, np.eye(5))
array([[ 5.49746838, -1.92540877, 2.24730018, -2.20242449,
-0.53025806],
[ -1.92540877, 95.34219156, -67.93144606, 50.16450952,
-85.52146331],
[ 2.24730018, -67.93144606, 57.0739859 , -40.56297863,
58.55694127],
[ -2.20242449, 50.16450952, -40.56297863, 30.6441555 ,
-44.83400183],
[ -0.53025806, -85.52146331, 58.55694127, -44.83400183,
79.96573405]])
Again, you might notice that on a cursory inspection, the result is the same as the previous two methods.
It is well known that matrix inversion is an ill-posed problem due to numerical instability and should be avoided where possible. However, in situations where it appears unavoidable, what is the preferable approach and why? To clarify, I am referring to the best approach when implementing such equations in software.
An example of such a problem is provided with another of my questions.
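For reference, the Cholesky-plus-back-substitution route mentioned at the top, for a problem that really is posed as A * x = b, would look roughly like this (a sketch only; b is a placeholder right-hand side and A is the positive definite matrix generated above):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

b = np.random.rand(5)                  # placeholder right-hand side
c, low = cho_factor(A)                 # Cholesky factorisation of the SPD matrix A
x = cho_solve((c, low), b)             # solves A * x = b without forming inv(A)
print(np.allclose(np.dot(A, x), b))    # should print True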
The reason for avoiding inverting matrices has only to do with efficiency. It is faster to solve the linear systems directly. If you think of the problem in your linked question a bit differently, you can apply the same principles.
In order to find the matrix inv(K) * Y * T(Y) * inv(K) - D * inv(K) you can solve the following systems of equations:
K * R * K = Y * T(Y)
You can solve it in two parts:
R2 * K = R1
K * R1 = Y * T(Y)
So you first solve for R1 with your usual method, then solve for R2 (recognise that you can solve T(K) * T(R2) = T(R1) if you have to).
However, at this point I don't know if this will be more efficient than computing the inverse explicitly unless K is symmetric. (There may be a way to efficiently get the decomposition of T(K) from K, but I don't know offhand)
If K is symmetric then you can compute your decomposition on K once and reuse it for the two back-substitution steps and it might be more efficient than computing the inverse explicitly.
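To make the two back-substitution steps concrete, here is a sketch in numpy/scipy for the symmetric positive definite case; K, Y and D are placeholder arrays, not taken from the linked question:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

n = 5
K = np.random.rand(n, n)
K = np.dot(K.transpose(), K) + n * np.eye(n)      # placeholder symmetric positive definite K
Y = np.random.rand(n, 3)                          # placeholder Y
D = np.random.rand(n, n)                          # placeholder D

cf = cho_factor(K)                                # factor K once, reuse it for every solve

R1 = cho_solve(cf, np.dot(Y, Y.transpose()))      # solves K * R1 = Y * T(Y)
R2 = cho_solve(cf, R1.transpose()).transpose()    # solves R2 * K = R1 (uses K = T(K))
DKinv = cho_solve(cf, D.transpose()).transpose()  # solves X * K = D, i.e. X = D * inv(K)

result = R2 - DKinv                               # inv(K) * Y * T(Y) * inv(K) - D * inv(K)

# Cross-check against the explicit inverse:
Kinv = np.linalg.inv(K)
expected = np.dot(np.dot(np.dot(Kinv, Y), Y.transpose()), Kinv) - np.dot(D, Kinv)
print(np.allclose(result, expected))              # should print True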

Plotting Data with Basemap

I am trying to plot the following dataset using a Basemap instance:
[-0.126929 -0.127279 -0.127851 ..., -0.14199 -0.142828 -0.14335 ]
The problem is that I can't get the right number of points displayed on the map (which should be the length of my data list above, i.e. 1315 points).
If I define my dataset to be like:
data = np.diag(data)
[[-0.126929 0. 0. ..., 0. 0. 0. ]
[ 0. -0.127279 0. ..., 0. 0. 0. ]
[ 0. 0. -0.127851 ..., 0. 0. 0. ]
...,
[ 0. 0. 0. ..., -0.14199 0. 0. ]
[ 0. 0. 0. ..., 0. -0.142828 0. ]
[ 0. 0. 0. ..., 0. 0. -0.14335 ]]
I have this figure
If I define my dataset to be like:
data = np.ma.masked_where(data==0., data)
[[-0.12692899999999696 -- -- ..., -- -- --]
[-- -0.12727900000000147 -- ..., -- -- --]
[-- -- -0.12785099999999971 ..., -- -- --]
...,
[-- -- -- ..., -0.14198999999999984 -- --]
[-- -- -- ..., -- -0.1428280000000015 --]
[-- -- -- ..., -- -- -0.1433499999999981]]
I have this figure
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

fig = plt.figure()
m = Basemap(projection='mill',
            llcrnrlon=lonmin - 0.02,
            urcrnrlon=lonmax + 0.02,
            llcrnrlat=latmin - 0.02,
            urcrnrlat=latmax + 0.02,
            resolution='f')
cmap = plt.cm.get_cmap('bwr_r')
x, y = m(*np.meshgrid(lon, lat))
y = y.T
cs = m.contourf(x, y, data, cmap=cmap)
m.drawcoastlines(linewidth=1., color='grey')
m.drawcountries(linewidth=1.5, color='white')
m.drawmapboundary(color='k', linewidth=2.0)
m.fillcontinents(color='white')
cb = m.colorbar(cs, location='bottom', size='5%', pad='8%')
cb.set_label('[m]', fontsize=12)
plt.show()

Making turtles wait x number of ticks

Part of what I am trying to do is make a breed of turtles move around, but when one reaches its destination, have that turtle wait a certain number of ticks before continuing. Is it also possible to make turtles wait a different number of ticks depending upon their destination (different patch colors)? Is it a case of making a turtle (breed) variable or a global variable to count the number of ticks? The hopefully relevant code is below.
You are right, this can be done by making the turtles count the number of ticks they have been on a patch. This has to be a turtle variable rather than a global variable, since each turtle will have a different value for it.
The approach I have used is this:
Once the turtle arrives at its destination, record ticks (the global counter of how many ticks have passed so far) into a turtle variable, say ticks-since-here. This works like a timestamp.
On each successive tick, check the difference between the current ticks value and the ticks-since-here turtle variable. If this becomes greater than the number of ticks the turtle is allowed to stay on the patch, let it choose a new destination and move towards it.
breed [visitors visitor]

globals [ number-of-visitors ]

visitors-own [
  ; visitors' own destination
  destination
  ticks-since-here
]

to go
  ask visitors [
    move
  ]
  tick
end

to move
  ; Instructions to move the agents around the environment go here
  ; comparing patch standing on to dest; if at dest then choose random new dest,
  ; then move forward towards new dest
  ifelse ( patch-here = destination )
  [
    if ticks - ticks-since-here > ticks-to-stay-on-patch patch-here
    [
      set ticks-since-here 0
      set destination one-of patches with
      [
        pcolor = 65 or pcolor = 95 or pcolor = 125 or pcolor = 25 or pcolor = 15 or pcolor = 5
      ]
    ]
  ]
  [
    face destination
    forward 1
    if ( patch-here = destination )
    [
      set ticks-since-here ticks
    ]
  ]
end

to-report ticks-to-stay-on-patch [p]
  if [pcolor] of p = 65  [ report 6 ]
  if [pcolor] of p = 95  [ report 5 ]
  if [pcolor] of p = 125 [ report 4 ]
  if [pcolor] of p = 25  [ report 3 ]
  if [pcolor] of p = 15  [ report 2 ]
  if [pcolor] of p = 5   [ report 1 ]
end
to setup-people
  ;;;; added the following lines to facilitate world view creation
  ask patches
  [
    set pcolor one-of [65 95 125 25 15 5]
  ]
  set number-of-visitors 100
  ;;;;
  create-visitors number-of-visitors
  [
    ask visitors
    [
      ; set the shape of the visitor to "person"
      set shape "person"
      ; set the color of the visitor to white
      set color white
      ; give each person a random xy
      setxy (random 50) (random 50)
      ; set the visitor's destination variable
      set destination one-of patches with
      [
        pcolor = 65 or pcolor = 95 or pcolor = 125 or pcolor = 25 or pcolor = 15 or pcolor = 5
      ]
    ]
  ]
end
