Gradient flow stopped on a combined model - torch

I have a problem where the gradient does not backpropagate through a combined network. I checked many answers but could not find a relevant solution, and I would really appreciate help with this.
I want to compute the gradient with respect to the input data in this code:
for i, (input, target, impath) in tqdm(enumerate(data_loader)):
    # print('input.shape:', input.shape)
    input = Variable(input.cuda(), requires_grad=True)
    output = model(input)
    loss = criterion(output, target.cuda())
    loss = Variable(loss, requires_grad=True)
    loss.backward()
    print('input:', input.grad.data)
but I got this error:
    print('input:', input.grad.data)
AttributeError: 'NoneType' object has no attribute 'data'
My model is a combined model whose parameters I loaded from two pretrained models.
I checked requires_grad on the model weights and it is True; however, the gradient of the model weights is None.
Did loading the state dict block the gradient?
How can I deal with this problem?
The model structure is attached below:
class resnet_model(nn.Module):
    def __init__(self, opt):
        super(resnet_model, self).__init__()
        resnet = models.resnet101()
        num_ftrs = resnet.fc.in_features
        resnet.fc = nn.Linear(num_ftrs, 1000)
        if opt.resnet_path != None:
            state_dict = torch.load(opt.resnet_path)
            resnet.load_state_dict(state_dict)
            print("resnet load state dict from {}".format(opt.resnet_path))
        self.model1 = torch.nn.Sequential()
        for chd in resnet.named_children():
            if chd[0] != 'fc':
                self.model1.add_module(chd[0], chd[1])
        self.model2 = torch.nn.Sequential()
        self.classifier = LINEAR_LOGSOFTMAX(input_dim=2048, nclass=200)
        if opt.pretrained != None:
            self.classifier_state_dict = torch.load('../checkpoint/{}_cls.pth'.format(opt.pretrained))
            print("classifier load state dict from ../checkpoint/{}_cls.pth".format(opt.pretrained))
            self.classifier.load_state_dict(self.classifier_state_dict)
        for chd in self.classifier.named_children():
            self.model2.add_module(chd[0], chd[1])

    def forward(self, x):
        x = self.model1(x)
        x = x.view(-1, 2048)
        x = self.model2(x)
        return x

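The problem was solved by this comment:
Why do you have this line: loss = Variable(loss, requires_grad=True)?
Variable should not be used anymore. Wrapping the loss this way creates a new leaf tensor that is cut off from the graph, so backward() stops there and never reaches the input or the model weights.
So the line above should be deleted. To mark a Tensor for which you want gradients, you can use:
input = input.cuda().requires_grad_()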
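A minimal sketch of the loop with that advice applied. The tiny model, the criterion and the random batch are hypothetical stand-ins for the real resnet_model and data loader, and the code falls back to the CPU when CUDA is not available. It only illustrates that the input receives a gradient once the loss is no longer re-wrapped.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real model, criterion and batch.
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5)).to(device)
criterion = nn.CrossEntropyLoss()
batch = torch.randn(4, 3, 8, 8)
target = torch.randint(0, 5, (4,))

# Move the data first, then mark the resulting tensor as requiring gradients,
# so that `inputs` itself is the leaf tensor that receives .grad.
inputs = batch.to(device).requires_grad_()
output = model(inputs)
loss = criterion(output, target.to(device))  # the loss is used as-is, not re-wrapped
loss.backward()

input_grad = inputs.grad
print("input grad shape:", tuple(input_grad.shape))
```

The order matters: calling requires_grad_() before moving the tensor would make the moved copy a non-leaf tensor, and its .grad would stay None.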

Related

detectron2 diffusioninst: oom-kill during training

I tried to run the code for DiffusionInst, which is based on Detectron2 (source code: https://github.com/chenhaoxing/DiffusionInst). During training, my Python process is always killed after 10,000 to 20,000 iterations, which is not enough to train DiffusionInst.
I only rewrote the dataloader code to adapt it to my own dataset.
My new dataloader code:
class DiffusionInstDatasetMapper:
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and map it into a format used by DiffusionInst.
    The callable currently does the following:
    1. Read the image from "file_name"
    2. Applies geometric transforms to the image and annotation
    3. Find and applies suitable cropping to the image and annotation
    4. Prepare image and annotation to Tensors
    """

    def __init__(self, cfg, is_train=True):
        if cfg.INPUT.CROP.ENABLED and is_train:
            self.crop_gen = [
                # T.ResizeShortestEdge([400, 500, 600], sample_style="choice"),
                T.RandomCrop(cfg.INPUT.CROP.TYPE, cfg.INPUT.CROP.SIZE),
            ]
        else:
            self.crop_gen = None
        self.tfm_gens = build_transform_gen(cfg, is_train)
        logging.getLogger(__name__).info(
            "Full TransformGens used in training: {}, crop: {}".format(str(self.tfm_gens), str(self.crop_gen))
        )
        self.img_format = cfg.INPUT.FORMAT
        self.is_train = is_train

    def __call__(self, dataset_dict):
        """
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.
        Returns:
            dict: a format that builtin models in detectron2 accept
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
        # image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
        ## crop roi
        '''lst = dataset_dict['file_name'].split('-')
        image = sitk.ReadImage('-'.join(lst[:-2]))
        image = sitk.GetArrayFromImage(image)
        above, below = int(lst[-2]), int(lst[-1])
        image = image[:, above:below, :]'''
        ## no crop roi
        image = sitk.ReadImage(dataset_dict["file_name"],sitk.sitkFloat32)
        image = sitk.GetArrayFromImage(image)
        # print('**********************',image.shape,'************************')
        image = (image - image.min()) / (image.max() - image.min()) * 255
        #print(image.dtype)
        image = image.transpose(1, 2, 0).astype(np.uint8)
        image = np.repeat(image, 3, axis=2)
        #print(image.dtype)
        utils.check_image_size(dataset_dict, image)
        #origshape = image.shape
        if self.crop_gen is None:
            image, transforms = T.apply_transform_gens(self.tfm_gens, image)
        else:
            image, transforms = T.apply_transform_gens(
                self.tfm_gens + self.crop_gen, image
            )
        #print('orig', origshape, '\t\tresized', image.shape)
        image_shape = image.shape[:2]  # h, w
        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
        del image
        gc.collect()
        if not self.is_train:
            # USER: Modify this if you want to keep them for some reason.
            dataset_dict.pop("annotations", None)
            return dataset_dict
        if "annotations" in dataset_dict:
            # USER: Modify this if you want to keep them for some reason.
            # import pdb;pdb.set_trace()
            for anno in dataset_dict["annotations"]:
                # anno.pop("segmentation", None)
                anno.pop("keypoints", None)
            # USER: Implement additional transformations if you have other types of data
            annos = [
                utils.transform_instance_annotations(obj, transforms, image_shape)
                for obj in dataset_dict.pop("annotations")
                if obj.get("iscrowd", 0) == 0
            ]
            instances = utils.annotations_to_instances(annos, image_shape, mask_format="bitmask")
            dataset_dict["instances"] = utils.filter_empty_instances(instances)
            del instances
            gc.collect()
        return dataset_dict
And the information about the oom-killer:
[2599547.303018] python invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=995
[2599547.303084] [<ffffffff8119bfae>] oom_kill_process+0x1fe/0x3c0
[2599547.303133] Task in /kubepods/burstable/podd09a5032-8b07-11ed-bb60-ac1f6b9ec91e/8b4a8d5c2c1a082f93b1610173beb70bbc19fb1a1c2e28150d2d912ed9b95b10 killed as a result of limit of /kubepods/burstable/podd09a5032-8b07-11ed-bb60-ac1f6b9ec91e
[2599547.305957] Memory cgroup out of memory: Kill process 1041771 (python) score 1198 or sacrifice child
[2599547.307810] Killed process 1041771 (python) total-vm:36436532kB, anon-rss:10288264kB, file-rss:104888kB
[2599718.702250] python invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=995
[2599718.702299] [<ffffffff8119bfae>] oom_kill_process+0x1fe/0x3c0
[2599718.702333] Task in /kubepods/burstable/podd09a5032-8b07-11ed-bb60-ac1f6b9ec91e/8b4a8d5c2c1a082f93b1610173beb70bbc19fb1a1c2e28150d2d912ed9b95b10 killed as a result of limit of /kubepods/burstable/podd09a5032-8b07-11ed-bb60-ac1f6b9ec91e
I set IMS_PER_BATCH to 1 and used a dataset that contains only one image, but the OOM problem still occurred.
What should I do to prevent the OOM problem?

Type error when fine-tuning a bert-large-uncased-whole-word-masking model by Huggingface

I am trying to fine-tune a Huggingface bert-large-uncased-whole-word-masking model, and I get the following type error when training:
"TypeError: only integer tensors of a single element can be converted to an index"
Here is the code:
train_inputs = tokenizer(text_list[0:457], return_tensors='pt', max_length=512, truncation=True, padding='max_length')
train_inputs['labels']= train_inputs.input_ids.detach().clone()
Then I randomly mask about 15% of the tokens in input_ids and define a class for the dataset. The error happens in the training loop:
class MeditationsDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings= encodings
    def __getitem__(self, idx):
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    def __len__(self):
        return self.encodings.input_ids

train_dataset = MeditationsDataset(train_inputs)
train_dataloader = torch.utils.data.DataLoader(dataset= train_dataset, batch_size=8, shuffle=False)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
from transformers import BertModel, AdamW
model = BertModel.from_pretrained("bert-large-uncased-whole-word-masking")
model.to(device)
model.train()
optim = AdamW(model.parameters(), lr=1e-5)
num_epochs = 2
from tqdm.auto import tqdm
for epoch in range(num_epochs):
    loop = tqdm(train_dataloader, leave=True)
    for batch in loop:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
The error is raised at "for batch in loop".
Does anybody understand it and know how to solve it? Thanks in advance for your help.
In the class MeditationsDataset, in the function __getitem__, calling torch.tensor(val[idx]) on an existing tensor is discouraged by PyTorch (it emits a UserWarning); you should use val[idx].clone().detach() instead.
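A minimal sketch of the dataset with that change applied, using a small hand-made dictionary of tensors (toy_encodings, hypothetical) in place of the real tokenizer output. It also makes one assumption that goes beyond the answer above: __len__ in the question returns the input_ids tensor itself, and len() can only convert a single-element integer tensor to an int, which seems the likely source of the TypeError once the DataLoader asks for the dataset length. Here __len__ returns the number of rows instead.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class MeditationsDataset(Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        # Copy each row without re-wrapping an existing tensor in torch.tensor().
        return {key: val[idx].clone().detach() for key, val in self.encodings.items()}

    def __len__(self):
        # Must be a plain int: the number of examples, not the tensor itself.
        return self.encodings["input_ids"].shape[0]


# Toy stand-in for the tokenizer output: 10 sequences of 6 token ids each.
toy_encodings = {
    "input_ids": torch.randint(0, 100, (10, 6)),
    "attention_mask": torch.ones(10, 6, dtype=torch.long),
}
toy_encodings["labels"] = toy_encodings["input_ids"].detach().clone()

train_dataset = MeditationsDataset(toy_encodings)
train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=False)
batch_shapes = [tuple(batch["input_ids"].shape) for batch in train_dataloader]
print(len(train_dataset), batch_shapes)
```

Two further details in the question's training loop may also need attention: the optimizer is created as optim but used as optimizer, and BertModel outputs carry no loss attribute, whereas BertForMaskedLM computes a loss when labels are passed.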

How to calculate similarities between test and train documents

I'm trying to calculate similarities between test and train documents and label them. Here is the code, but it doesn't work. I'd also appreciate it if somebody could explain the idea.
def calculate_similarities(self, vecTestDoc, vectorsOfTrainDocs):
    list_of_similarities = []
    for vector in vectorsOfTrainDocs:
        label = vectorsOfTrainDocs.key()
        list_of_similarities += [(self.calculate_similarities(vector, vecTestDoc), label)]
    return list_of_similarities
Here's the error:
File "..\classification.py", line 98, in calculate_similarities
label = vectorsOfTrainDocs.key()
AttributeError: 'list' object has no attribute 'key'
Edit: I've defined two more functions and have been working on a different solution. Here they are:
def cosine_similarity(self, weightedA, weightedB):
    dotAB = dot(weightedA, weightedB)
    normA = math.sqrt(dot(weightedA, weightedA))
    normB = math.sqrt(dot(weightedB, weightedB))
    return dotAB / (normA * normB)

def fit(self, doc_collection):
    self.doc_collection = doc_collection
    self.vectorsOfDoc_collection = [(doc, self.doc_collection.tfidf(doc.token_counts))
                                    for doc in self.doc_collection.docid_to_doc.values()]
I believe something like this would work but there are still error messages... What should I change?
return [self.doc_collection.cosine_similarity(vecTestDoc) in vectorsOfTrainDocs]
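One way to read the idea: compare the test vector with every training vector and keep each similarity together with that training document's label. The sketch below assumes, hypothetically, that each training entry is a (label, vector) pair and that vectors are dicts mapping a term to its tf-idf weight. The free functions dot and cosine_similarity and the sample data are illustrative and not taken from the original classifier.

```python
import math


def dot(vec_a, vec_b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    return sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())


def cosine_similarity(vec_a, vec_b):
    norm_a = math.sqrt(dot(vec_a, vec_a))
    norm_b = math.sqrt(dot(vec_b, vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot(vec_a, vec_b) / (norm_a * norm_b)


def calculate_similarities(vec_test_doc, vectors_of_train_docs):
    """Return one (similarity, label) pair per training document."""
    return [
        (cosine_similarity(vec_test_doc, vector), label)
        for label, vector in vectors_of_train_docs
    ]


# Hypothetical training data: (label, tf-idf vector) pairs.
train = [
    ("sports", {"ball": 1.0, "goal": 2.0}),
    ("cooking", {"salt": 1.0, "pan": 1.0}),
]
test_doc = {"ball": 2.0, "goal": 4.0}

similarities = calculate_similarities(test_doc, train)
best_similarity, best_label = max(similarities)
print(best_label, round(best_similarity, 3))
```

Two things differ from the first attempt in the question: the label is unpacked from each list element rather than requested from the list itself (a list has no key() method), and the loop calls the similarity function instead of calling calculate_similarities recursively.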

Error when implementing RBF kernel bandwidth differentiation in Pytorch

I'm implementing an RBF network using some beginner examples from the PyTorch website. I have a problem implementing differentiation with respect to the kernel bandwidth. I would also like to know whether my approach to implementing the idea is sound. Here is a code sample that reproduces the issue. Thanks.
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

def kernel_product(x,y, mode = "gaussian", s = 1.):
    x_i = x.unsqueeze(1)
    y_j = y.unsqueeze(0)
    xmy = ((x_i-y_j)**2).sum(2)
    if mode == "gaussian" : K = torch.exp( - xmy/s**2 )
    elif mode == "laplace" : K = torch.exp( - torch.sqrt(xmy + (s**2)))
    elif mode == "energy" : K = torch.pow( xmy + (s**2), -.25 )
    return torch.t(K)

class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

dtype = torch.cuda.FloatTensor
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(H, D_in).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
# I've created this scalar variable (the kernel bandwidth)
s = Variable(torch.randn(1).type(dtype), requires_grad=True)
learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply
    # Forward pass: compute predicted y using operations on Variables; we compute
    # ReLU using our custom autograd operation.
    # y_pred = relu(x.mm(w1)).mm(w2)
    y_pred = relu(kernel_product(w1, x, s)).mm(w2)
    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])
    # Use autograd to compute the backward pass.
    loss.backward()
    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data
    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()
However, I get this error, which disappears when I simply use a fixed scalar as the default parameter of kernel_product():
RuntimeError: eq() received an invalid combination of arguments - got (str), but expected one of:
* (float other)
didn't match because some of the arguments have invalid types: (str)
* (Variable other)
didn't match because some of the arguments have invalid types: (str)
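Well, you are calling kernel_product(w1, x, s), where w1, x and s are torch Variables, while the function is defined as kernel_product(x, y, mode = "gaussian", s = 1.). The third positional argument is bound to mode, which should be a string specifying the kernel, so the check mode == "gaussian" ends up comparing a Variable with a str. Pass the bandwidth by keyword instead: kernel_product(w1, x, s=s).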
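How the three positional arguments in that call bind to the parameters can be checked without PyTorch. The sketch below uses a stub with the same signature as kernel_product (the body is omitted and the argument values are placeholder strings, both assumptions for illustration) and asks inspect.signature how each call style maps arguments to parameters.

```python
import inspect


def kernel_product(x, y, mode="gaussian", s=1.0):
    """Stub with the same signature as in the question; the body is omitted."""


signature = inspect.signature(kernel_product)

# Placeholder values standing in for the real Variables.
w1, x, bandwidth = "w1", "x", "bandwidth"

# Call style used in the question: three positional arguments.
positional = signature.bind(w1, x, bandwidth)
positional.apply_defaults()

# Call style that passes the bandwidth by keyword.
by_keyword = signature.bind(w1, x, s=bandwidth)
by_keyword.apply_defaults()

print(dict(positional.arguments))  # the third positional value lands in `mode`
print(dict(by_keyword.arguments))  # `mode` keeps its default, `s` gets the value
```

With the positional call, s silently keeps its default of 1.0, so even without the type error the bandwidth would not take part in the computation.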

Catching the print of the function

I am using the package fda, in particular the function fRegress. This function calls another function, eigchk, which checks whether the coefficient matrix is singular.
Here is the function as the package authors (J. O. Ramsay, Giles Hooker, and Spencer Graves) wrote it.
eigchk <- function(Cmat) {
  # check Cmat for singularity
  eigval <- eigen(Cmat)$values
  ncoef <- length(eigval)
  if (eigval[ncoef] < 0) {
    neig <- min(length(eigval),10)
    cat("\nSmallest eigenvalues:\n")
    print(eigval[(ncoef-neig+1):ncoef])
    cat("\nLargest eigenvalues:\n")
    print(eigval[1:neig])
    stop("Negative eigenvalue of coefficient matrix.")
  }
  if (eigval[ncoef] == 0) stop("Zero eigenvalue of coefficient matrix.")
  logcondition <- log10(eigval[1]) - log10(eigval[ncoef])
  if (logcondition > 12) {
    warning("Near singularity in coefficient matrix.")
    cat(paste("\nLog10 Eigenvalues range from\n",
              log10(eigval[ncoef])," to ",log10(eigval[1]),"\n"))
  }
}
As you can see, the last if condition checks whether logcondition is greater than 12 and, if so, prints the range of the eigenvalues.
The following code implements regularization with a roughness penalty. The code is taken from the book "Functional data analysis with R and Matlab".
annualprec = log10(apply(daily$precav,2,sum))
tempbasis =create.fourier.basis(c(0,365),65)
tempSmooth=smooth.basis(day.5,daily$tempav,tempbasis)
tempfd =tempSmooth$fd
templist = vector("list",2)
templist[[1]] = rep(1,35)
templist[[2]] = tempfd
conbasis = create.constant.basis(c(0,365))
betalist = vector("list",2)
betalist[[1]] = conbasis
SSE = sum((annualprec - mean(annualprec))^2)
Lcoef = c(0,(2*pi/365)^2,0)
harmaccelLfd = vec2Lfd(Lcoef, c(0,365))
betabasis = create.fourier.basis(c(0, 365), 35)
lambda = 10^12.5
betafdPar = fdPar(betabasis, harmaccelLfd, lambda)
betalist[[2]] = betafdPar
annPrecTemp = fRegress(annualprec, templist, betalist)
betaestlist2 = annPrecTemp$betaestlist
annualprechat2 = annPrecTemp$yhatfdobj
SSE1.2 = sum((annualprec-annualprechat2)^2)
RSQ2 = (SSE - SSE1.2)/SSE
Fratio2 = ((SSE-SSE1.2)/3.7)/(SSE1/30.3)
resid = annualprec - annualprechat2
SigmaE. = sum(resid^2)/(35-annPrecTemp$df)
SigmaE = SigmaE.*diag(rep(1,35))
y2cMap = tempSmooth$y2cMap
stderrList = fRegress.stderr(annPrecTemp, y2cMap, SigmaE)
betafdPar = betaestlist2[[2]]
betafd = betafdPar$fd
betastderrList = stderrList$betastderrlist
betastderrfd = betastderrList[[2]]
As the penalty factor, the authors use a specific lambda.
The following code implements the search for an appropriate lambda.
loglam = seq(5,15,0.5)
nlam = length(loglam)
SSE.CV = matrix(0,nlam,1)
for (ilam in 1:nlam) {
  lambda = 10^loglam[ilam]
  betalisti = betalist
  betafdPar2 = betalisti[[2]]
  betafdPar2$lambda = lambda
  betalisti[[2]] = betafdPar2
  fRegi = fRegress.CV(annualprec, templist, betalisti)
  SSE.CV[ilam] = fRegi$SSE.CV
}
By changing the values of loglam and using cross-validation I expect to obtain the best lambda. However, if loglam is too long, or its values drive the coefficient matrix toward singularity, I receive the following message:
Log10 Eigenvalues range from
-5.44495317739048 to 6.78194912518214
It is created by the function eigchk, as mentioned above.
Now my question: is there any way to catch this so-called warning? By catch I mean some function or method that tells me when this has happened, so that I can adjust the values of loglam. Since the function has no actual warning definition besides this printed message, I have run out of ideas.
Thank you all a lot for your suggestions.
If by "catch the warning" you mean being alerted that there is a potential problem with loglam, then you might want to look at the try and tryCatch functions. With them you can define the behavior you want whenever a warning condition is raised. Note that eigchk, as listed above, calls warning("Near singularity in coefficient matrix.") right before printing the eigenvalue range, so a warning handler should fire in this case.
If you just want to store the printed output (which might be assumed from the question title, but may not be what you want), then try looking into capture.output.

Resources