Pytorch RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select - runtime-error

I am training a model that takes tokenized strings which are then passed through an embedding layer and an LSTM thereafter. However, there seems to be an error in the input, as it does not pass through the embedding layer.
class DrugModel(nn.Module):
def __init__(self, input_dim, output_dim, hidden_dim, drug_embed_dim,
lstm_layer, lstm_dropout, bi_lstm, linear_dropout, char_vocab_size,
char_embed_dim, char_dropout, dist_fn, learning_rate,
binary, is_mlp, weight_decay, is_graph, g_layer,
g_hidden_dim, g_out_dim, g_dropout):
super(DrugModel, self).__init__()
# Save model configs
self.drug_embed_dim = drug_embed_dim
self.lstm_layer = lstm_layer
self.char_dropout = char_dropout
self.dist_fn = dist_fn
self.binary = binary
self.is_mlp = is_mlp
self.is_graph = is_graph
self.g_layer = g_layer
self.g_dropout = g_dropout
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# For one-hot encoded SMILES
if not is_mlp:
self.char_embed = nn.Embedding(char_vocab_size, char_embed_dim,
padding_idx=0)
self.lstm = nn.LSTM(char_embed_dim, drug_embed_dim, lstm_layer,
bidirectional=False,
batch_first=True, dropout=lstm_dropout)
# Distance function
self.dist_fc = nn.Linear(drug_embed_dim, 1)
if binary:
# Binary Cross Entropy
self.criterion = lambda x, y: y*torch.log(x) + (1-y)*torch.log(1-x)
def init_lstm_h(self, batch_size):
return (Variable(torch.zeros(
self.lstm_layer*1, batch_size, self.drug_embed_dim)).cuda(),
Variable(torch.zeros(
self.lstm_layer*1, batch_size, self.drug_embed_dim)).cuda())
# Set Siamese network as basic LSTM
def siamese_sequence(self, inputs, length):
# Character embedding
inputs = inputs.long()
inputs = inputs.cuda()
self.char_embed = self.char_embed(inputs.to(self.device))
c_embed = self.char_embed(inputs)
# c_embed = F.dropout(c_embed, self.char_dropout)
maxlen = inputs.size(1)
if not self.training:
# Sort c_embed
_, sort_idx = torch.sort(length, dim=0, descending=True)
_, unsort_idx = torch.sort(sort_idx, dim=0)
maxlen = torch.max(length)
# Pack padded sequence
c_embed = c_embed.index_select(0, Variable(sort_idx).cuda())
sorted_len = length.index_select(0, sort_idx).tolist()
c_packed = pack_padded_sequence(c_embed, sorted_len, batch_first=True)
else:
c_packed = c_embed
# Run LSTM
init_lstm_h = self.init_lstm_h(inputs.size(0))
lstm_out, states = self.lstm(c_packed, init_lstm_h)
hidden = torch.transpose(states[0], 0, 1).contiguous().view(
-1, 1 * self.drug_embed_dim)
if not self.training:
# Unsort hidden states
outputs = hidden.index_select(0, Variable(unsort_idx).cuda())
else:
outputs = hidden
return outputs
def forward(self, key1, key2, targets, key1_len, key2_len, status, predict = False):
if not self.is_mlp:
output1 = self.siamese_sequence(key1, key1_len)
output2 = self.siamese_sequence(key2, key2_len)
After instantiating the class I get the following error when passing the input through the embedding layer:
<ipython-input-128-432fcc7a1e39> in forward(self, key1, key2, targets, key1_len, key2_len, status, predict)
129 def forward(self, key1, key2, targets, key1_len, key2_len, status, predict = False):
130 if not self.is_mlp:
--> 131 output1 = self.siamese_sequence(key1, key1_len)
132 output2 = self.siamese_sequence(key2, key2_len)
133 set_trace()
<ipython-input-128-432fcc7a1e39> in siamese_sequence(self, inputs, length)
74 inputs = inputs.cuda()
75
---> 76 self.char_embed = self.char_embed(inputs.to(self.device))
77 set_trace()
78 c_embed = self.char_embed(inputs)
~/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)
~/miniconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py in forward(self, input)
112 return F.embedding(
113 input, self.weight, self.padding_idx, self.max_norm,
--> 114 self.norm_type, self.scale_grad_by_freq, self.sparse)
115
116 def extra_repr(self):
~/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1482 # remove once script supports set_grad_enabled
1483 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1485
1486
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select
despite the fact that the input (e.g. key1) has already been passed to cuda and has been transformed into long format:
tensor([[25, 33, 30, ..., 0, 0, 0],
[25, 7, 7, ..., 0, 0, 0],
[25, 7, 30, ..., 0, 0, 0],
...,
[25, 7, 33, ..., 0, 0, 0],
[25, 33, 41, ..., 0, 0, 0],
[25, 33, 41, ..., 0, 0, 0]], device='cuda:0')

setting model.device to cuda does not change your inner module devices, so self.lstm, self.char_embed, and self.dist_fc are all still on cpu. correct way of doing it is by using DrugModel().to(device)
in general, it's better not to feed a device to your model and write it in a device-agnostic way. to make your init_lstm_h function device-agnostic you can use something like this

in order to make nn work in cuda.
first step: we must set initial model to cuda
device = torch.device('cuda:1')
model = esm.ProteinBertModel(
args,
alphabet,
).to(device)
second step: we should set loaded model to cuda
bert_model = bert_model.to(device)

Related

ValueError: shapes (2,1000) and (2,2,1000) not aligned: 1000 (dim 1) != 2 (dim 1)

I'm implementing a MLP to test a simple NN architecture, hoping to scale up to a bigger network with a larger dataset. My end goal is making a working phone recognizer for TIMIT data, as part of my internship.
To build the MLP, I used the suggestions of this video: https://www.youtube.com/watch?v=Z97XGNUUx9o.
And the proposal of my teacher to use the following inputs:
X = np.random.rand(5,1000)
y = X[4:5,:]
The error message is the following:
ValueError Traceback (most recent call last)
Cell In [63], line 7
5 build_model()
6 mlp = MLP(1000, [1000], 1000)
----> 7 mlp.train(inputs,targets, 50, 0.1)
8 output = mlp.forward_propagate(input)
Cell In [62], line 117, in MLP.train(self, inputs, targets, epochs, learning_rate)
115 output = self.forward_propagate(input)
116 error = target - output
--> 117 self.back_propagate(error)
118 self.gradient_descent(learning_rate=1)
119 sum_error += self._mse(target,output)
Cell In [62], line 96, in MLP.back_propagate(self, error)
94 current_activations = self.activations[i]
95 current_activations_reshaped = current_activations.reshape(current_activations.shape[0], -1)
---> 96 self.derivatives[i] = np.dot(current_activations, delta)
97 error = np.dot(delta, self.weights[i].T)
98 return error
File <__array_function__ internals>:180, in dot(*args, **kwargs)
ValueError: shapes (2,1000) and (2,2,1000) not aligned: 1000 (dim 1) != 2 (dim 1)
This is the relevant code:
class MLP(object):
def __init__(self, num_inputs=3, hidden_layers=[3,3], num_outputs=2):
self.num_inputs = num_inputs
self.hidden_layers = hidden_layers
self.num_outputs = num_outputs
layers = [num_inputs] + hidden_layers + [num_outputs]
weights = []
for i in range(len(layers) - 1):
w = np.random.rand(layers[i], layers[i + 1])
weights.append(w)
self.weights = weights
activations = []
for i in range(len(layers)):
a = np.zeros(layers[i])
activations.append(a)
self.activations = activations
derivatives = []
for i in range(len(layers) - 1):
d = np.zeros((layers[i], layers[i+1]))
derivatives.append(d)
self.derivatives = derivatives
def forward_propagate(self,inputs):
activations = inputs
self.activations[0] = inputs
for i in range(len(self.weights)):
net_inputs = np.dot(activations,self.weights)
activations = self._sigmoid(net_inputs)
self.activations[i+1] = activations
return activations
def back_propagate(self, error):
for i in reversed(range(len(self.derivatives))):
activations = self.activations[i+1]
delta = error * self._sigmoid_derivative(activations)
delta_reshaped = delta.reshape(delta.shape[0], -1).T
current_activations = self.activations[i]
current_activations_reshaped = current_activations.reshape(current_activations.shape[0], -1)
self.derivatives[i] = np.dot(current_activations, delta)
error = np.dot(delta, self.weights[i].T)
return error
def _sigmoid_derivative(self,x):
return x * (1.0 - x)
def _sigmoid(self,x):
y = 1.0 / (1+np.exp(-x))
return y
def gradient_descent(self, learning_rate):
for i in range(len(self.weights)):
weights = self.weights[i]
derivatives = self.derivatives[i]
weights += derivatives + learning_rate
def _mse(self,target,output):
return np.average((target-output)**2)
def train(self,inputs,targets,epochs,learning_rate):
for i in range(epochs):
sum_error = 0
for input,target in zip(inputs,targets):
output = self.forward_propagate(input)
error = target - output
self.back_propagate(error)
self.gradient_descent(learning_rate=1)
sum_error += self._mse(target,output)
print("Error: {} at epoch {}".format(sum_error/len(inputs), i))
And this is how I ran it:
if __name__ == "__main__":
X, y = load_dataset()
inputs = X
targets = y
build_model()
mlp = MLP(1000, [1000], 1000)
mlp.train(inputs,targets, 50, 0.1)
output = mlp.forward_propagate(input)
Thanks in advance!
I tried to do what the video said, to set up an MLP, as was the suggestion of the teacher, but I don't know how to solve the shape error.

Self-coded QuestionAnswering models are not trained effectively

I tried using Transformers' own Albert ForQuestionAnswering, which was able to train effectively. But I define similar code myself and can't train it effectively.
model = AutoModelForQuestionAnswering.from_pretrained(config.bert_model)
my model:The rest of the code is the same. That is, the code for training and testing is the same, but the loaded model is different.
class BertQA(nn.Module):
def __init__(self, config):
super(BertQA, self).__init__()
self.albert = AutoModel.from_pretrained(config.bert_model, add_pooling_layer=False)
# for name, param in self.model.named_parameters():
# print(name, param)
self.qa_outputs = nn.Linear(312, 2, bias=True)
self._init_weights([self.qa_outputs])
def _init_weights(self, blocks, **kwargs):
"""
参数初始化,将 Linear / Embedding / LayerNorm 与 Bert 进行一样的初始化
"""
for block in blocks:
for module in block.modules():
if isinstance(module, nn.Linear):
module.weight.data.normal_(mean=0.0, std=0.02)
if module.bias is not None:
nn.init.zeros_(module.bias)
def forward(self,
input_ids,
attention_mask,
token_type_ids,
start_positions=None,
end_positions=None):
outputs = self.albert(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
sequence_output = outputs[0]
logits = self.qa_outputs(sequence_output)
start_logits, end_logits = logits.split(1, dim=-1)
start_logits = start_logits.squeeze(-1) # (32, 512)
end_logits = end_logits.squeeze(-1) # (32, 512)
total_loss = None
if start_positions is not None and end_positions is not None:
if len(start_positions.size()) > 1:
start_positions = start_positions.squeeze(-1)
if len(end_positions.size()) > 1:
end_positions = end_positions.squeeze(-1)
# end_positions: (32)
# start_position: (32)
loss_fct = CrossEntropyLoss()
start_loss = loss_fct(start_logits, start_positions)
end_loss = loss_fct(end_logits, end_positions)
total_loss = (start_loss + end_loss) / 2
return total_loss
else:
start_logits = torch.argmax(start_logits, 1)
end_logits = torch.argmax(end_logits, 1)
return start_logits, end_logits
model = BertQA(config)

What problems can lead to a CuDNNError with ConvolutionND

I am using three-dimensional convolution links (with ConvolutionND) in my chain.
The forward computation run smoothly (I checked intermediate result shapes to be sure I understood correctly the meaning of the parameters of convolution_nd), but during the backward a CuDNNError is raised with the message CUDNN_STATUS_NOT_SUPPORTED.
The cover_all parameter of ConvolutionND as its default value of False, so from the doc I don't see what can be the cause of the error.
Here is how I defind one of the convolution layers :
self.conv1 = chainer.links.ConvolutionND(3, 1, 4, (3, 3, 3)).to_gpu(self.GPU_1_ID)
And the call stack is
File "chainer/function_node.py", line 548, in backward_accumulate
gxs = self.backward(target_input_indexes, grad_outputs)
File "chainer/functions/connection/convolution_nd.py", line 118, in backward
gy, W, stride=self.stride, pad=self.pad, outsize=x_shape)
File "chainer/functions/connection/deconvolution_nd.py", line 310, in deconvolution_nd
y, = func.apply(args)
File chainer/function_node.py", line 258, in apply
outputs = self.forward(in_data)
File "chainer/functions/connection/deconvolution_nd.py", line 128, in forward
return self._forward_cudnn(x, W, b)
File "chainer/functions/connection/deconvolution_nd.py", line 105, in _forward_cudnn
tensor_core=tensor_core)
File "cupy/cudnn.pyx", line 881, in cupy.cudnn.convolution_backward_data
File "cupy/cuda/cudnn.pyx", line 975, in cupy.cuda.cudnn.convolutionBackwardData_v3
File "cupy/cuda/cudnn.pyx", line 461, in cupy.cuda.cudnn.check_status
cupy.cuda.cudnn.CuDNNError: CUDNN_STATUS_NOT_SUPPORTED
So are there special points to take care of when using ConvolutionND ?
A failing code is for instance :
import chainer
from chainer import functions as F
from chainer import links as L
from chainer.backends import cuda
import numpy as np
import cupy as cp
chainer.global_config.cudnn_deterministic = False
NB_MASKS = 60
NB_FCN = 3
NB_CLASS = 17
class MFEChain(chainer.Chain):
"""docstring for Wavelphasenet."""
def __init__(self,
FCN_Dim,
gpu_ids=None):
super(MFEChain, self).__init__()
self.GPU_0_ID, self.GPU_1_ID = (0, 1) if gpu_ids is None else gpu_ids
with self.init_scope():
self.conv1 = chainer.links.ConvolutionND(3, 1, 4, (3, 3, 3)).to_gpu(
self.GPU_1_ID
)
def __call__(self, inputs):
### Pad input ###
processed_sequences = []
for convolved in inputs:
## Transform to sequences)
copy = convolved if self.GPU_0_ID == self.GPU_1_ID else F.copy(convolved, self.GPU_1_ID)
processed_sequences.append(copy)
reprocessed_sequences = []
with cuda.get_device(self.GPU_1_ID):
for convolved in processed_sequences:
convolved = F.expand_dims(convolved, 0)
convolved = F.expand_dims(convolved, 0)
convolved = self.conv1(convolved)
reprocessed_sequences.append(convolved)
states = F.vstack(reprocessed_sequences)
logits = states
ret_logits = logits if self.GPU_0_ID == self.GPU_1_ID else F.copy(logits, self.GPU_0_ID)
return ret_logits
def mfe_test():
mfe = MFEChain(150)
inputs = list(
chainer.Variable(
cp.random.randn(
NB_MASKS,
11,
in_len,
dtype=cp.float32
)
) for in_len in [53248]
)
val = mfe(inputs)
grad = cp.ones(val.shape, dtype=cp.float32)
val.grad = grad
val.backward()
for i in inputs:
print(i.grad)
if __name__ == "__main__":
mfe_test()
cupy.cuda.cudnn.convolutionBackwardData_v3 is incompatible with some specific parameters, as described in an issue in official github.
Unfortunately, the issue only dealt with deconvolution_2d.py (not deconvolution_nd.py), therefore the decision-making about whether cudnn is used or not failed in your case, I guess.
you can check your parameter by confirming
check whether dilation parameter (!=1) or group parameter (!=1) is passed to the convolution.
print chainer.config.cudnn_deterministic, configuration.config.autotune, and configuration.config.use_cudnn_tensor_core.
Further support may be obtained by raising an issue in the official github.
The code you showed is much complicated.
To clarify the problem, the code below would help.
from chainer import Variable, Chain
from chainer import links as L
from chainer import functions as F
import numpy as np
from six import print_
batch_size = 1
in_channel = 1
out_channel = 1
class MyLink(Chain):
def __init__(self):
super(MyLink, self).__init__()
with self.init_scope():
self.conv = L.ConvolutionND(3, 1, 1, (3, 3, 3), nobias=True, initialW=np.ones((in_channel, out_channel, 3, 3, 3)))
def __call__(self, x):
return F.sum(self.conv(x))
if __name__ == "__main__":
my_link = MyLink()
my_link.to_gpu(0)
batch = Variable(np.ones((batch_size, in_channel, 3, 3, 3)))
batch.to_gpu(0)
loss = my_link(batch)
loss.backward()
print_(batch.grad)

PyTorch RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed

I’m trying to create a basic binary classifier in Pytorch that classifies whether my player plays on the right or the left side in the game Pong. The input is an 1x42x42 image and the label is my player's side (right = 1 or left = 2). The code:
class Net(nn.Module):
def __init__(self, input_size, hidden_size, num_classes):
super(Net, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size)
self.relu = nn.ReLU()
self.fc2 = nn.Linear(hidden_size, num_classes)
def forward(self, x):
out = self.fc1(x)
out = self.relu(out)
out = self.fc2(out)
return out
net = Net(42 * 42, 100, 2)
# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer_net = torch.optim.Adam(net.parameters(), 0.001)
net.train()
while True:
state = get_game_img()
state = torch.from_numpy(state)
# right = 1, left = 2
current_side = get_player_side()
target = torch.LongTensor(current_side)
x = Variable(state.view(-1, 42 * 42))
y = Variable(target)
optimizer_net.zero_grad()
y_pred = net(x)
loss = criterion(y_pred, y)
loss.backward()
optimizer.step()
The error I get:
File "train.py", line 109, in train
loss = criterion(y_pred, y)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/modules/loss.py", line 321, in forward
self.weight, self.size_average)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 533, in cross_entropy
return nll_loss(log_softmax(input), target, weight, size_average)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 501, in nll_loss
return f(input, target)
File "/home/shani/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
output, *self.additional_args)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THNN/generic/ClassNLLCriterion.c:57
For most of deeplearning library, target(or label) should start from 0.
It means that your target should be in the range of [0,n) with n-classes.
It looks like PyTorch expect to get zero-based labels (0/1 in your case) and you probably feed it with one-based labels (1/2)
I had the same error in my program and i just realized that the problem was in the number of output nodes in my neural network
In my program the number of output nodes of my model was not equal to the number of labels in dataset
the number of output was 1 and the number of target labels was 10. then i changed the number of output to 10, there was no error

TensorFlow: dimension error. how to debug?

I'm a beginner with TF
I've tried to adapt a code which is working well with some other data (noMNIST) to some new data, and i have a dimensionality error, and i don't know how to deal with it.
To debug, i'm trying to use tf.shape method but it doesn't give me the info i need...
def reformat(dataset, labels):
#dataset = dataset.reshape((-1, num_var)).astype(np.float32)
# Map 2 to [0.0, 1.0, 0.0 ...], 3 to [0.0, 0.0, 1.0 ...]
labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
type(train_dataset)
Training set (790184, 29) (790184, 39) Validation set (43899, 29)
(43899, 39) Test set (43899, 29) (43899, 39)
# Adding regularization to the 1 hidden layer network
graph1 = tf.Graph()
batch_size = 128
num_steps=3001
import datetime
startTime = datetime.datetime.now()
def define_and_run_batch(beta):
num_RELU =1024
with graph1.as_default():
# Input data. For the training data, we use a placeholder that will be fed
# at run time with a training minibatch.
tf_train_dataset = tf.placeholder(tf.float32,
shape=(batch_size, num_var))
tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
tf_valid_dataset = tf.constant(valid_dataset)
tf_test_dataset = tf.constant(test_dataset)
# Variables.
weights_RELU = tf.Variable(
tf.truncated_normal([num_var, num_RELU]))
print(tf.shape(weights_RELU) )
biases_RELU = tf.Variable(tf.zeros([num_RELU]))
weights_layer1 = tf.Variable(
tf.truncated_normal([num_RELU, num_labels]))
biases_layer1 = tf.Variable(tf.zeros([num_labels]))
# Training computation.
logits_RELU = tf.matmul(tf_train_dataset, weights_RELU) + biases_RELU
RELU_vec = tf.nn.relu(logits_RELU)
logits_layer = tf.matmul(RELU_vec, weights_layer1) + biases_layer1
# loss = tf.reduce_mean(
# tf.nn.softmax_cross_entropy_with_logits(logits_layer, tf_train_labels))
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits_layer, tf_train_labels,name="cross_entropy")
l2reg = tf.reduce_sum(tf.square(weights_RELU))+tf.reduce_sum(tf.square(weights_layer1))
# beta = 0.005
loss = tf.reduce_mean(cross_entropy+beta*l2reg)
# Optimizer.
optimizer = tf.train.GradientDescentOptimizer(0.3).minimize(loss)
# Predictions for the training, validation, and test data.
train_prediction = tf.nn.softmax(logits_layer)
print("ok")
print(tf.shape(weights_RELU) )
valid_prediction = tf.nn.softmax(
tf.matmul(tf.nn.relu((tf.matmul(tf_valid_dataset, weights_RELU) + biases_RELU)),weights_layer1)+biases_layer1)
test_prediction =tf.nn.softmax(
tf.matmul(tf.nn.relu((tf.matmul(tf_test_dataset, weights_RELU) + biases_RELU)),weights_layer1)+biases_layer1)
with tf.Session(graph=graph1) as session:
tf.initialize_all_variables().run()
print("Initialized")
for step in range(num_steps):
# Pick an offset within the training data, which has been randomized.
# Note: we could use better randomization across epochs.
offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
# Generate a minibatch.
batch_data = train_dataset[offset:(offset + batch_size), :]
batch_labels = train_labels[offset:(offset + batch_size), :]
# Prepare a dictionary telling the session where to feed the minibatch.
# The key of the dictionary is the placeholder node of the graph to be fed,
# and the value is the numpy array to feed to it.
feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
#
_, l, predictions, logits = session.run(
[optimizer, loss,train_prediction,logits_RELU], feed_dict=feed_dict)
if (step % 500 == 0):
print("Minibatch loss at step %d: %f" % (step, l))
print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
print("Validation accuracy: %.1f%%" % accuracy(
valid_prediction.eval(), valid_labels))
test_acc = accuracy(test_prediction.eval(), test_labels)
print("Test accuracy: %.1f%%" % test_acc)
print('loss=%s' % l)
x = datetime.datetime.now() - startTime
print(x)
return(test_acc,round(l,5))
define_and_run_batch(0.005)
Tensor("Shape:0", shape=(2,), dtype=int32) ok Tensor("Shape_1:0",
shape=(2,), dtype=int32)
--------------------------------------------------------------------------- ValueError Traceback (most recent call
last) in ()
94 return(test_acc,round(l,5))
95
---> 96 define_and_run_batch(0.005)
in define_and_run_batch(beta)
54 print(tf.shape(weights_RELU) )
55 valid_prediction = tf.nn.softmax(
---> 56 tf.matmul(tf.nn.relu((tf.matmul(tf_valid_dataset, weights_RELU) + biases_RELU)),weights_layer1)+biases_layer1)
57
58
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.pyc
in matmul(a, b, transpose_a, transpose_b, a_is_sparse, b_is_sparse,
name)
949 transpose_a=transpose_a,
950 transpose_b=transpose_b,
--> 951 name=name)
952
953 sparse_matmul = gen_math_ops._sparse_mat_mul
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.pyc in _mat_mul(a, b, transpose_a, transpose_b, name)
684 """
685 return _op_def_lib.apply_op("MatMul", a=a, b=b, transpose_a=transpose_a,
--> 686 transpose_b=transpose_b, name=name)
687
688
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.pyc
in apply_op(self, op_type_name, name, **keywords)
653 op = g.create_op(op_type_name, inputs, output_types, name=scope,
654 input_types=input_types, attrs=attr_protos,
--> 655 op_def=op_def)
656 outputs = op.outputs
657 return _Restructure(ops.convert_n_to_tensor(outputs), output_structure)
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc
in create_op(self, op_type, inputs, dtypes, input_types, name, attrs,
op_def, compute_shapes, compute_device) 2040
original_op=self._default_original_op, op_def=op_def) 2041 if
compute_shapes:
-> 2042 set_shapes_for_outputs(ret) 2043 self._add_op(ret) 2044
self._record_op_seen_by_control_dependencies(ret)
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc
in set_shapes_for_outputs(op) 1526 raise RuntimeError("No
shape function registered for standard op: %s" 1527
% op.type)
-> 1528 shapes = shape_func(op) 1529 if len(op.outputs) != len(shapes): 1530 raise RuntimeError(
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/common_shapes.pyc
in matmul_shape(op)
87 inner_a = a_shape[0] if transpose_a else a_shape[1]
88 inner_b = b_shape[1] if transpose_b else b_shape[0]
---> 89 inner_a.assert_is_compatible_with(inner_b)
90 return [tensor_shape.TensorShape([output_rows, output_cols])]
91
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.pyc
in assert_is_compatible_with(self, other)
92 if not self.is_compatible_with(other):
93 raise ValueError("Dimensions %s and %s are not compatible"
---> 94 % (self, other))
95
96 def merge_with(self, other):
ValueError: Dimensions Dimension(29) and Dimension(30) are not
compatible
the whole code is on my github
https://github.com/FaguiCurtain/Kaggle-SF
the Udacity Assignment 3 file is working
the original data is here
https://www.kaggle.com/c/sf-crime/data
in Udacity, the data were images and each image was a 28x28 matrix which was reformatted into flattened vectors of size 784
in the Kaggle-SF file, i am feeding vectors of size 29, and labels can take 39 different values.
thanks for your help
In debug mode you can check shapes of you Tensors.
by the way you error is valid_prediction assignment. to make it better for debugging and reading it's better to define each step in a separate line. you are using 4 operation in 1 line. BTW in debug mode (for example in Pycharm) you can inspect the element and check what is causing the problem
To check the dimensions, you can directly print the tensors. When you print the tensors, you can view the dimensions. I suggest if you are a beginner, try using the 'tf.layers' package which contains high level wrappers for the various layers one would need to build a CNN in tensorflow. By using this, you can avoid having to deal with the various low level operations like 'matmul' and adding bias for example. The activations can also be directly applied by the layers without having to implement it manually.
As far as debugging is concerned, from the code since you have merged the operations, its hard to see what is going on under the hood unless we can use a proper debugger. If you are not using an IDE, I suggest using 'pudb'.

Resources