Cross entropy loss in pytorch nn.CrossEntropyLoss() - torch

Maybe someone is able to help me here. I am trying to compute the cross entropy loss of a given output of my network:
print output
Variable containing:
1.00000e-02 *
-2.2739 2.9964 -7.8353 7.4667 4.6921 0.1391 0.6118 5.2227 6.2540
[torch.FloatTensor of size 1x10]
and the desired label, which is of the form
print lab
Variable containing:
[torch.FloatTensor of size 1]
where x is an integer between 0 and 9.
According to the pytorch documentation (
criterion = nn.CrossEntropyLoss()
loss = criterion(output, lab)
this should work, but unfortunately I get a weird error
TypeError: FloatClassNLLCriterion_updateOutput received an invalid combination of arguments - got (int, torch.FloatTensor, !torch.FloatTensor!, torch.FloatTensor, bool, NoneType, torch.FloatTensor, int), but expected (int state, torch.FloatTensor input, torch.LongTensor target, torch.FloatTensor output, bool sizeAverage, [torch.FloatTensor weights or None], torch.FloatTensor total_weight, int ignore_index)
Can anyone help me? I am really confused and have tried almost everything I could imagine.

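Please check this code. The target must be a LongTensor of class indices, not a FloatTensor, which is what the error message means by "torch.LongTensor target":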
import torch
import torch.nn as nn
from torch.autograd import Variable
output = Variable(torch.rand(1,10))
target = Variable(torch.LongTensor([1]))
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
This will print out the loss nicely:
Variable containing:
[torch.FloatTensor of size 1]
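The sketch below shows why the target has to be an integer. It is plain Python (no PyTorch needed) and computes what the loss does for a single sample, assuming no class weights. The function name cross_entropy is hypothetical and is not part of PyTorch. The target is used as an index into the logits, so a float label makes no sense to the loss.

```python
import math

def cross_entropy(logits, target):
    """Single-sample analogue of nn.CrossEntropyLoss: log-softmax over
    the logits, then the negative log-probability of class `target`."""
    if not isinstance(target, int):
        # The target selects a position in the logits, so it must be an
        # integer class index (a LongTensor in PyTorch), never a float.
        raise TypeError("target must be an integer class index")
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Ten equal logits: every class has probability 1/10, so the loss is log(10).
print(cross_entropy([0.0] * 10, 1))
```

In PyTorch 0.4 and later, Variable is no longer needed. Passing torch.tensor([1]) as the target works directly, because it defaults to int64 (a LongTensor).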


XGBoost Custom Objective Function

This will be a long question. I’m trying to define my own custom objective function.
I want to use XGBClassifier, so I run
from xgboost import XGBClassifier
The XGBoost documentation says:
A custom objective function can be provided for the objective parameter. In this case, it should have the signature
objective(y_true, y_pred) -> grad, hess :
y_true: array_like of shape [n_samples], The target values
y_pred: array_like of shape [n_samples], The predicted values
grad: array_like of shape [n_samples], The value of the gradient for each sample point.
hess: array_like of shape [n_samples], The value of the second derivative for each sample point
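Here is a minimal sketch of a function with that documented signature. It uses ordinary binary log loss as a stand-in for the asker's custom loss, and it assumes that y_pred holds raw margins (log-odds) rather than probabilities. The name logistic_objective is hypothetical. Plain lists are used for readability; with XGBoost you would normally return NumPy arrays. The gradient depends on the continuous value of y_pred. That is probably one reason why casting y_pred to int, as in the code below, causes trouble.

```python
import math

def logistic_objective(y_true, y_pred):
    """objective(y_true, y_pred) -> grad, hess for binary log loss.
    Assumes y_pred holds raw margins (log-odds), not probabilities."""
    grad, hess = [], []
    for label, margin in zip(y_true, y_pred):
        p = 1.0 / (1.0 + math.exp(-margin))  # sigmoid of the margin
        grad.append(p - label)               # first derivative w.r.t. margin
        hess.append(p * (1.0 - p))           # second derivative w.r.t. margin
    return grad, hess

grad, hess = logistic_objective([1, 0], [0.0, 0.0])
print(grad, hess)  # [-0.5, 0.5] [0.25, 0.25]
```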
Now, I’ve coded this custom objective:
def guess_averse_loss(y_true, y_pred):
    y_true = y_true.astype(int)
    y_pred = y_pred.astype(int)
    ... stuffs ...
    return grad, hess
Everything is compatible with the documentation above.
If I run:
classifier.train(X_train, y_train)
(where custom_weighted_accuracy is a custom metric I defined following the scikit-learn documentation)
I get the error:
-> first_term = np.multiply(cost_matrix[y_true, y_pred], np.exp(y_pred - y_true))
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4043,) (4043,5)
So, y_pred enters the function as a matrix (n_samples x n_classes) where the element ij is the probability that the sample i belongs to the class j.
Then, I modify the line as
first_term = np.multiply(cost_matrix[y_true, np.argmax(y_pred, axis=1)],np.exp(np.argmax(y_pred, axis=1) - y_true))
so that it goes from a matrix to an array.
This leads to the error:
unknown custom metric
so it seems that the problem now is the metric.
I tried removing the custom objective function and using the default one instead, and got another error:
XGBoostError: Check failed: in_gpair->Size() % ngroup == 0U (3 vs. 0) : must have exactly ngroup * nrow gpairs
That is what I've tried; I'm hoping for suggestions on how to solve these problems.
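The last check in the question ("must have exactly ngroup * nrow gpairs") says that for a multiclass model the objective must return one gradient/hessian pair per sample per class. Collapsing y_pred with argmax down to n_samples values gives too few pairs. The sketch below illustrates the expected shape using the standard softmax cross-entropy as a stand-in for the asker's cost-matrix loss. It assumes y_pred arrives as raw (untransformed) per-class scores; if your version passes probabilities, drop the softmax step. The name softmax_objective is hypothetical. Whether XGBoost wants the result as a 2-D or a flattened array has varied between versions, so check the documentation for yours.

```python
import math

def softmax_objective(y_true, y_pred):
    """Multiclass softmax cross-entropy objective.

    y_true: list of n_samples integer class labels.
    y_pred: list of n_samples rows, each with n_classes raw scores.
    Returns grad and hess with the same n_samples x n_classes shape,
    i.e. one (grad, hess) pair per sample per class."""
    grad, hess = [], []
    for label, scores in zip(y_true, y_pred):
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Gradient of -log softmax(label) with respect to each class score.
        grad.append([p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)])
        # Only the diagonal of the Hessian, p * (1 - p), is kept.
        hess.append([p * (1.0 - p) for p in probs])
    return grad, hess

grad, hess = softmax_objective([0, 2], [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
n_pairs = sum(len(row) for row in grad)
print(n_pairs)  # 6 = 2 samples * 3 classes
```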

Error in .local(x, ...): x and y don't match

I'm new to R, and I'm trying to fit a model using kernlab with some data that I just loaded in. However, when I try to fit the model I get the error message in the subject line. I assume this means the data types of x and y are not compatible.
Here's some sample code:
data = read.delim("my-sample-file.txt")
model = ksvm(data[, 1:10], data[, 11])
When I call data[, 11] I just get the raw values in the column returned to me, and I notice the typeof function returns integer, which I found strange. I am not using any additional packages; I'm just trying to get something basic to work.
Thank you.
Reading the help page for ksvm shows that the Usage section says that using x and y as the input parameters requires a matrix for x, so this should be more successful (assuming that the data object has all numeric columns; you really should be looking at your data carefully before reaching for analysis tools):
model = ksvm(x = data.matrix(data[, 1:10]), y = data[, 11])
Note that you can get exactly the same error with the iris data.frame:
ksvm(x=iris[-5], y=iris$Species)
Error in .local(x, ...) : x and y don't match.
Whereas converting to matrix results in success:
ksvm(x=data.matrix(iris[-5]), y=iris$Species)
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 1
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 0.484488222038106
Number of Support Vectors : 57
Objective Function Value : -3.7021 -3.8304 -21.7405
Training error : 0.026667
Morals of the story: Pay attention to the 'Usage' section to give guidance on the different forms that generic functions may take. And always assume that the authors of the help page are excruciatingly correct in their description of the arguments in the 'Arguments' sections. If they say matrix, don't assume they mean anything sort of like a matrix. (But if you mutter under your breath that this seems like something that should have been anticipated and a more informative error message emitted, I would not disagree.)

R: Profile-likelihood based confidence intervals

I am using the function plkhci from the Bhat package to construct profile-likelihood based confidence intervals, and I got this warning:
Warning message: In dqstep(list(label = x$label, est = btrf(xt, x$low, x$upp), low = x$low, : oops: unable to find stepsize, use default
when I run
r <- dfp(x,f=nlogf)
Can I ignore this warning as I still can get the output?
Here is the complete code:
deltae<-ifelse(delta==0, 1,0)
deltar<-ifelse(delta==1, 1,0)
deltai<-ifelse(delta==2, 1,0)
dat=data.frame(t,delta, deltae,deltar,deltai,x)
dat$interval[delta==2] <- as.character(cut(dat$t[delta==2], breaks=seq(0, 600, 100)))
labs <- cut(dat$t[delta==2], breaks=seq(0, 600, 100))
dat$lower[delta==2]<-as.numeric( sub("\\((.+),.*", "\\1", labs) )
dat$upper[delta==2]<-as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) )
data0<-dat[which(dat$delta==0),]#uncensored data
data1<-dat[which(dat$delta==1),]#right censored data
data2<-dat[which(dat$delta==2),]#interval censored data
x <- list(label=c("beta0","beta1","gamma"),est=c(-8,0.03,0.0105),low=c(-10,0,0),upp=c(10,1,1))
r <- dfp(x,f=nlogf)
x$est <- r$est
I am giving you a very long answer, but it will help you see that you can chase down your own error messages (most of the time; sometimes this way of looking at functions will not work). It is good to see what is happening inside a method when it throws a warning, because sometimes it is fine and sometimes you need to fix your data.
This function is REALLY involved! You can look at it by typing dfp into the R command line (NO TRAILING PARENTHESES) and it will print out the whole function.
17 lines from the end, you will see an assignment:
del <- dqstep(x, f, sens = 0.01)
You can see that this calls the function dqstep, which is reflected in your warning.
You can see this function by typing dqstep into the command line of R again. In reading through this function, also long but not so tedious, there is this section of boolean logic:
if (r < 0 | | b == 0) {
warning("oops: unable to find stepsize, use default")
cat("problem with ", x$label[i], "\n")
This is the culprit; it produces the message you are getting. The line right above it spells out how r is calculated. You are feeding this function your default x from the prior function plus a sensitivity setting (sens = 0.01; I assume dfp works out the rest, but it is huge and ugly, so I did not untangle all of it). When the previous nested function returns an r value lower than zero, an r value of NA, or a b value of zero, that message is displayed.
The second error tells you that it was likely b == 0, because b is in the denominator and it returned an infinite value, so NO STEP SIZE IS RETURNED FROM THIS NESTED FUNCTION to the variable del in dfp.
The step is fed into THIS equation:
h <- logit.hessian(x, f, del, dapprox = FALSE, nfcn)
which you can look into by typing logit.hessian into the R commandline.
When you do, you see that del is a step size on a logit scale, with a default value of del=rep(0.002, length(x$est)). The function set that value for you because running dqstep returned no value.
So, you now get to decide if using that step size in the calculation of your confidence interval seems right or if there is a problem with your data which needs resolving to make this work better for you.
When I ran it, line by line, I got this message:
Error in if (denom <= 0) { : missing value where TRUE/FALSE needed
at this line of code:
r <- dfp(x,f=nlogf(x))
Which makes me think I was correct.
That is how I chase down issues I have with messages from packages when I get a message like yours.

How to weight observations in mxnet?

I am new to neural networks and the mxnet package in R. I want to do a logistic regression on my predictors since my observations are probabilities varying between 0 and 1. I'd like to weight my observations by a vector obsWeights I have, but I'm not sure where to implement the weights. There seems to be a weight= option in mx.symbol.FullyConnected but if I try weight=obsWeights I get the following error message
Error in mx.varg.symbol.FullyConnected(list(...)) :
Cannot find argument 'weight', Possible Arguments:
num_hidden : int, required
Number of hidden nodes of the output.
no_bias : boolean, optional, default=False
Whether to disable bias parameter.
How should I proceed to weight my observations? Here is my code at the moment.
# Prepare data = model.matrix(obs ~ . , data = train_data)
train_label = train_data$obs
# Normalize = apply(, 2, function(x) (x-min(x))/(max(x)-min(x)))
# Create MXDataIter compatible iterator
batch_size = 128
train.iter =, label=train_label,
batch.size=batch_size, shuffle=T)
# Symbolic model definition
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data=data, num.hidden=128, name='fc1')
act1 = mx.symbol.Activation(data=fc1, act.type='relu', name='act1')
final = mx.symbol.FullyConnected(data=act1, num.hidden=1, name='final')
logistic = mx.symbol.LogisticRegressionOutput(data=final, name='logistic')
# Run model
mxnet_train = mx.model.FeedForward.create(
symbol = logistic,
X = train.iter,
initializer = mx.init.Xavier(rnd_type = 'gaussian', factor_type = 'avg', magnitude = 2),
num.round = 25)
Assigning the fully connected weight argument is not what you want to do in any case. That weight refers to the parameters of the layer, i.e., what you multiply the inputs by to get output values. These are the parameter values you're trying to learn.
If you want to make some samples matter more than others, then you'll need to adjust the loss function. For example, multiply each sample's loss by its weight so that low-weight samples contribute less to the overall average loss and high-weight samples contribute more.
I do not believe the standard Mxnet loss functions have a spot for assigning weights (that is, LogisticRegressionOutput won't cover this). However, you can make your own cost function that does. This would involve passing your final layer through a sigmoid activation function to first generate the usual logistic regression output value. Then pass that into the loss function you define. You could use squared error, but for logistic regression you'll probably want to use the cross entropy function:
-(l * log(y) + (1 - l) * log(1 - y)),
where l is the label and y is the predicted value.
Ideally, you'd write a symbol with an efficient definition of the gradient (Mxnet has a cross entropy function, but it's for softmax input, not a binary output. You could translate your output to two outputs with softmax as an alternative, but that seems less easy to work with in this case). The easiest path, though, would be to let Mxnet do its autodiff on it. Then you multiply that cross entropy loss by the weights.
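The arithmetic that this weighted loss encodes can be checked outside Mxnet. The block below is a plain-Python sketch, not Mxnet code. It assumes the predictions have already been passed through the sigmoid, and it uses one weight per observation (such as obsWeights). The name weighted_bce is hypothetical. Dividing by the number of samples mirrors the idea behind normalization='batch'. Predictions are clipped so that log() stays finite, and soft labels between 0 and 1 work as well as 0/1 labels.

```python
import math

def weighted_bce(labels, preds, weights, eps=1e-12):
    """Batch-averaged weighted binary cross entropy.

    labels: targets in [0, 1]; preds: sigmoid outputs in (0, 1);
    weights: one non-negative weight per observation."""
    total = 0.0
    for label, y, w in zip(labels, preds, weights):
        y = min(max(y, eps), 1.0 - eps)  # clip to keep log() finite
        ce = -(label * math.log(y) + (1 - label) * math.log(1 - y))
        total += w * ce                  # scale each sample's loss by its weight
    return total / len(labels)           # average over the batch

# The second observation has weight 0, so only the first one counts.
print(weighted_bce([1, 0], [0.5, 0.5], [2.0, 0.0]))  # log(2) ≈ 0.6931
```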
I haven't tested this code, but you'd ultimately have something like this (this is what you'd do in python, should be similar in R):
label = mx.sym.Variable('label')
out = mx.sym.Activation(data=final, act_type='sigmoid')
# negative log-likelihood, so that minimizing it via MakeLoss is correct
ce = -(label * mx.sym.log(out) + (1 - label) * mx.sym.log(1 - out))
weights = mx.sym.Variable('weights')
loss = mx.sym.MakeLoss(weights * ce, normalization='batch')
Then you want to input your weight vector into the weights Variable along with your normal input data and labels.
As an added tip, an mxnet network with a custom loss via MakeLoss outputs the loss, not the prediction. You'll probably want both in practice, in which case it's useful to group the loss with a gradient-blocked version of the prediction so that you can get both. You'd do that like this:
pred_loss = mx.sym.Group([mx.sym.BlockGrad(out), loss])

Rstudio - Error in user-created function - Object not found

First things first: my skills in R are somewhat lacking, so there is a chance I may be using something incorrectly in the following. If I go wrong somewhere, please let me know.
I've been having a problem in Rstudio where I try to create 2 functions for formulae, then use nls() to create a model using those, with which I will make a plot. When I try to run the line for creating it, I get an error message saying an object is missing. It is always the last object in the function of the first "formula", in this case, 'p'.
I'll provide my code here then explain what I am trying to do for a little context;
DATA <- read.csv(file.choose(),
formula <- function(m, h, g, p){(2*m)/(m+(sqrt(m^2+1)))*p*g*(h^2/2)}
formula.2 <- function(P, V, g){P*V*g}
m = 0.85
p = 766.42
g = 9.81
P = 0.962
h = DATA$lithothick
V = DATA$Vol
fit.1 <- nls(formula (P, V, g) ~ formula(m, h, g, p), data = DATA)
If I run it how it is shown, I get the error;
Error in (2 * m)/(m + (sqrt(m^2 + 1))) * p : 'p' is missing
However it will show h if I rearrange the objects in the formula to (m,g,p,h)
Error in h^2 : 'h' is missing
Now, what I'm trying to do is this; I have a .csv file with 3 thicknesses (0.002, 0.004, 0.006 meters) and 3 volumes (10, 25, 50 milliliters). I am trying to see how the rates of strength and buoyancy increase (in relation to each other) as the thickness and volume for each object (respectively) increases. I was hoping to come out with a graph showing the upward trend for each property (strength and buoyancy), as I believe them to be unequal (one exponential the other linear). I hope that isn't more confusing than clarifying, but any pointers would be GREATLY appreciated.
You cannot overload functions this way in R. What you can do is provide optional arguments (which is a kind of overloading) with the syntax function(mandatory, optional="")
For what you are trying to do, you have to use formula.2 if you want the 3-argument formula. As written, formula(P, V, g) calls the 4-argument formula with only three arguments (m = P, h = V, g = g), so p is never supplied, which is exactly the "'p' is missing" error.
A workaround could be to use one function with one optional argument and check whether this argument has been used. Something like:
formula = function(m, h, g, p="") {
if (is.numeric(p)) {
} else {
This is ugly and a very bad way to do it (your variables do not really mean the same thing from one call to the other) but it works.
