How to weight observations in mxnet (R)?

I am new to neural networks and the mxnet package in R. I want to do a logistic regression on my predictors, since my observations are probabilities varying between 0 and 1. I'd like to weight my observations by a vector obsWeights I have, but I'm not sure where to implement the weights. There seems to be a weight= option in mx.symbol.FullyConnected, but if I try weight=obsWeights I get the following error message:
Error in mx.varg.symbol.FullyConnected(list(...)) :
Cannot find argument 'weight', Possible Arguments:
----------------
num_hidden : int, required
Number of hidden nodes of the output.
no_bias : boolean, optional, default=False
Whether to disable bias parameter.
How should I proceed to weight my observations? Here is my code at the moment.
# Prepare data
train.mm = model.matrix(obs ~ . , data = train_data)
train_label = train_data$obs
# Normalize
train.mm = apply(train.mm, 2, function(x) (x-min(x))/(max(x)-min(x)))
# Create MXDataIter compatible iterator
batch_size = 128
train.iter = mx.io.arrayiter(data=t(train.mm), label=train_label,
                             batch.size=batch_size, shuffle=TRUE)
# Symbolic model definition
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data=data, num.hidden=128, name='fc1')
act1 = mx.symbol.Activation(data=fc1, act.type='relu', name='act1')
final = mx.symbol.FullyConnected(data=act1, num.hidden=1, name='final')
logistic = mx.symbol.LogisticRegressionOutput(data=final, name='logistic')
# Run model
mxnet_train = mx.model.FeedForward.create(
  symbol = logistic,
  X = train.iter,
  initializer = mx.init.Xavier(rnd_type = 'gaussian', factor_type = 'avg', magnitude = 2),
  num.round = 25)

Assigning the fully connected weight argument is not what you want to do at any rate. That weight is a reference to the parameters of the layer, i.e., what you multiply the inputs by to get output values; these are the parameter values you're trying to learn.
If you want to make some samples matter more than others, you'll need to adjust the loss function. For example, multiply each sample's usual loss by its weight, so that low-weight samples do not contribute as much to the overall average loss.
I do not believe the standard MXNet loss functions have a spot for assigning weights (that is, LogisticRegressionOutput won't cover this). However, you can make your own cost function that does. This involves passing your final layer through a sigmoid activation function to first generate the usual logistic regression output value, then passing that into the loss function you define. You could use squared error, but for logistic regression you'll probably want the cross-entropy loss:
-(l * log(y) + (1 - l) * log(1 - y)),
where l is the label and y is the predicted value.
Ideally, you'd write a symbol with an efficient definition of the gradient (MXNet has a cross-entropy function, but it's for softmax input, not a binary output; you could translate your output to two outputs with softmax as an alternative, but that seems less easy to work with in this case). The easiest path, though, is to let MXNet do its autodiff on it. Then you multiply that cross-entropy loss by the weights.
I haven't tested this code, but you'd ultimately have something like this (this is what you'd do in Python; it should be similar in R):
label = mx.sym.Variable('label')
out = mx.sym.Activation(data=final, act_type='sigmoid')
ce = -(label * mx.sym.log(out) + (1 - label) * mx.sym.log(1 - out))
weights = mx.sym.Variable('weights')
loss = mx.sym.MakeLoss(weights * ce, normalization='batch')
Then you want to input your weight vector into the weights Variable along with your normal input data and labels.
As an added tip: a network with a custom loss via MakeLoss outputs the loss, not the prediction. You'll probably want both in practice, in which case it's useful to group the loss with a gradient-blocked version of the prediction so that you can get both. You'd do that like this:
pred_loss = mx.sym.Group([mx.sym.BlockGrad(out), loss])
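For the R API, an untested sketch along the same lines might look like the following. It assumes mx.symbol.log, mx.symbol.MakeLoss, and mx.symbol.Group mirror their Python counterparts, and that arithmetic operators are overloaded for symbols, as in recent mxnet R releases:
# Untested R sketch of the weighted cross-entropy loss above
label = mx.symbol.Variable('label')
weights = mx.symbol.Variable('weights')
out = mx.symbol.Activation(data=final, act.type='sigmoid')
# Per-sample negative Bernoulli log-likelihood, scaled by the weights
ce = -1 * (label * mx.symbol.log(out) + (1 - label) * mx.symbol.log(1 - out))
loss = mx.symbol.MakeLoss(weights * ce, normalization='batch')
# Group a gradient-blocked prediction with the loss to retrieve both
pred_loss = mx.symbol.Group(c(mx.symbol.BlockGrad(out), loss))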

Related

R is only returning non-zero coefficient estimates when using the "poly" function to generate predictors. How do I get the zero values into a vector?

I'm using regsubsets from the leaps library to perform best subset selection. I need to compare the coefficients it generates to the "true" coefficients I specified when simulating the data (by comparison I mean the square root of the sum of squared differences between them), for each number of predictors.
Since regsubsets generated 16 different models, I use a loop to do this automatically. It would work, except that when I extract the coefficients from the best model with x predictors, it only gives me the non-zero coefficients of the polynomial fit. This makes the coefi vector smaller than the truecoef vector of true coefficients.
If I could somehow force all coefficients to be spat out from the model, I wouldn't have an issue. But after looking extensively, I don't know how to do that.
Alternative ways of solving this problem would also be appreciated.
library(leaps)
regfit.train=regsubsets(y ~ poly(x,25, raw = TRUE), data=mydata[train,], nvmax=25)
truecoef = c(3,0,-7,4,-2,8,0,-5,0,2,0,4,5,6,3,2,2,0,3,1,1)
coef.errors = rep(NA, 16)
for (i in 1:16) {
  coefi = coef(regfit.train, id=i)
  coef.errors[i] = mean((truecoef-coefi)^2)
}
The quantity I'm trying to compute, where j indexes the coefficients and r refers to the best model containing r coefficients, is sqrt( sum_j (beta_j - betahat_j^(r))^2 ).
Thanks!
This is how I ended up solving it (with some help):
The loop indexes which coefficients are available and performs the subtraction for those; the unavailable ones are assumed to be zero.
truecoef = c(3,0,-7,4,-2,8,0,-5,0,2,0,4,5,6,3,2,2,0,3,1,1)
val.errors = rep(NA, 16)
x_cols = colnames(x, do.NULL = FALSE, prefix = "x.")
for (i in 1:16) {
  coefis = coef(regfit.train, id = i)
  val.errors[i] = sqrt(sum((truecoef[x_cols %in% names(coefis)] -
                            coefis[names(coefis) %in% x_cols])^2) +
                       sum(truecoef[!(x_cols %in% names(coefis))]^2))
}
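An equivalent and perhaps clearer approach (an untested sketch; it assumes truecoef is a named vector whose names match those coef() reports for regsubsets) is to zero-fill the unselected coefficients and then compare full-length vectors directly:
# Sketch: expand each model's coefficients to full length, filling the
# unselected ones with zero, then compare against truecoef in one step.
# Assumes names(truecoef) matches the names coef(regfit.train, id=i) uses.
zeros = setNames(rep(0, length(truecoef)), names(truecoef))
val.errors = sapply(1:16, function(i) {
  coefis = coef(regfit.train, id = i)
  est = zeros
  est[names(coefis)] = coefis
  sqrt(sum((truecoef - est)^2))
})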

zfit straight-line fitting for a 2-dim dataset

I would like to fit a 2-dim dataset with a straight line (a*x + b) using zfit.
That was very easy with the probfit package, but probfit has been deprecated by scikit-hep. https://nbviewer.jupyter.org/github/scikit-hep/probfit/blob/master/tutorial/tutorial.ipynb
How can I fit such 2-dim data with an arbitrary function?
I've checked the zfit examples, but they seem to assume some distribution (a histogram), so zfit expects a dataset like a 1-d array, and I couldn't work out how to pass 2-d data to zfit.
There is no direct way in zfit currently to implement this out of the box (in one line), since a corresponding loss simply has not been added yet.
However, SimpleLoss (zfit.loss.SimpleLoss) allows you to construct any loss that you can think of (have a look at the example in its docstring as well). In your case, it would look along these lines:
x = your_data
y = your_targets  # y-values
obs = zfit.Space('x', (lower, upper))
param1 = zfit.Parameter(...)
param2 = zfit.Parameter(...)
...
model = Func(...)  # a function is the way to go here
data = zfit.Data.from_numpy(array=x, obs=obs)

def mse():
    prediction = model.func(data)
    value = tf.reduce_mean((prediction - y) ** 2)  # or whatever you want to have
    return value

loss = zfit.loss.SimpleLoss(mse, [param1, param2])
# etc.
On another note, it would be a good idea to add such a loss to zfit itself. If you're interested in contributing, I recommend getting in contact with the authors; they will gladly help you and guide you through it.
UPDATE
The loss function itself presumably consists of three or four things: x, y, a model, and maybe an uncertainty on y. The chi2 loss looks like this:
def chi2():
    y_pred = model.func(x)
    return tf.reduce_sum(((y_pred - y) / y_error) ** 2)

loss = zfit.loss.SimpleLoss(chi2, model.get_params())
That's all, four lines of code. x is a zfit.Data object; model is in this case a Func.
Does that work?

cpquery function of bnlearn always returns 0 for simple discrete artificial data

I want to calculate conditional probabilities from a Bayesian network fitted to some binary data I create. However, bnlearn::cpquery always returns 0, while bnlearn::bn.fit appears to fit a correct model.
# Create data for binary chain network X1->Y->X2 with zeros and ones.
# All base rates are 0.5; increase in probability due to parent = .5 (from .25 to .75)
X1<-c(rep(1,9),rep(1,3),1,rep(1,3),rep(0,3),0,rep(0,3),rep(0,9))
Y<-c(rep(1,9),rep(1,3),0,rep(0,3),rep(1,3),1,rep(0,3),rep(0,9))
X2<-c(rep(1,9),rep(0,3),1,rep(0,3),rep(1,3),0,rep(1,3),rep(0,9))
dag1<-data.frame(X1,Y,X2)
# Fit bayes net to data.
res <- hc(dag1)
fittedbn <- bn.fit(res, data = dag1)
# Fitting works as expected, as seen by graph structure and coefficients in fittedbn:
plot(res)
print(fittedbn)
# Conditional probability query
cpquery(fittedbn, event = (Y==1), evidence = (X1==1), method = "ls")
# Using LW method
cpquery(fittedbn, event = (Y==1), evidence = list(X1 = 1), method = "lw")
cpquery just returns 0. I have also tried using the predict function; however, this returns an error:
predict(object = fittedbn, node = "Y", data = list(X1=="1"), method = "bayes-lw", prob = True)
# Returns:
# Error in check.data(data, allow.levels = TRUE) :
# the data must be in a data frame.
In the above cpquery the expected result is .75, but 0 is returned. This is not specific to this event or evidence; regardless of what event or evidence I put in (e.g., event = (X2==1), evidence = (X1==0), or event = (X2==1), evidence = (X1==0 & Y==1)), the function returns 0.
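For reference, counting directly in the raw data confirms the expected value (a quick sanity check, independent of bnlearn):
# P(Y = 1 | X1 = 1) computed straight from the data frame
mean(dag1$Y[dag1$X1 == 1])  # 0.75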
One thing I tried, since I thought the small number of observations might be an issue, was simply to increase the number of observations (i.e., vertically concatenating the above data frame with itself a number of times); however, this did not change the output.
I've seen many threads noting that cpquery can be fragile, but none describe this issue. To note: the example in the cpquery documentation works as expected, so the problem does not seem to be due to my environment.
Any help would be greatly appreciated!

h2o in R: model produces a negative predClass value when my target feature is binary. What does this mean?

My model is producing negative class predictions even though my target feature is binary. Does this mean the neural net in h2o.deeplearning thinks negative values are more than 100% likely to be a zero?
my code is as follows:
modeldataset <- h2o.importFile(path = modeldata)
train<- as.h2o(modeldataset)
model <- h2o.deeplearning(x = colnames(train)[1:45], y = "Target", training_frame = train, exact_quantiles = FALSE, score_training_samples = 0)
testdata <- as.h2o(testdata)
modeldataset <- as.h2o(modeldataset)
testdata$predClass = h2o.predict(model, newdata=testdata) # obtain the class (0/1)
testdata$predProb = h2o.predict(model, newdata=testdata)
h2o.exportFile(testdata, 'file/path', parts = 1)
Why would some of my predictions be negative? Apologies in advance for any formatting errors; I am a new user of Stack Overflow.
This question has been asked a few times, and generally the issue is that you forgot to convert your numeric response to a factor (see the as.factor function) and/or to set the distribution parameter of the algorithm to bernoulli. With a numeric response, h2o treats the task as regression, so real-valued (including negative) predictions are expected.
Take a look at the documentation's overview of the distribution parameter; it should help clear things up.
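As a minimal sketch of that fix (assuming the same train frame and column names as above):
# Make the response a factor so h2o treats this as binomial
# classification; predictions then become a 0/1 class plus probabilities
train$Target <- as.factor(train$Target)
model <- h2o.deeplearning(x = colnames(train)[1:45], y = "Target",
                          training_frame = train,
                          distribution = "bernoulli")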

How to simulate the posterior filtered estimates of a Kalman Filter using the DSE package in R

How do I call for the posterior (refined) state estimates from a Kalman Filter simulation in R using the DSE package?
I have added an example below. Assume that I have created a simple random walk state space with the error being a standard normal distribution. The model is created using the SS function, with initialised state and covariance estimates of zero. The theoretical model form is thus:
X(t) = X(t-1) + e(t),  e(t) ~ N(0,1)   (state evolution)
Y(t) = X(t) + w(t),    w(t) ~ N(0,1)   (measurement)
We now implement this in R by following the instructions on page 6 and 7 of the "Kalman Filtering in R" article in the Journal of Statistical Software. First we create the state space model using the SS() function and store it in the variable called kalman.filter:
kalman.filter = dse::SS(F = matrix(1,1,1),
                        Q = matrix(1,1,1),
                        H = matrix(1,1,1),
                        R = matrix(1,1,1),
                        z0 = matrix(0,1,1),
                        P0 = matrix(0,1,1))
Then we simulate 100 observations from the model using simulate() and put them in a variable called simulate.kalman.filter:
simulate.kalman.filter=simulate(kalman.filter, start = 1, freq = 1, sampleT = 100)
Then we run the kalman filter against the measurements using l() and store it under the variable called test:
test=l(kalman.filter, simulate.kalman.filter)
From the outputs, which ones are my filtered estimates?
I have found the answer to this question.
Firstly, the filtered estimates of the model are not given by the l() function; it only gives the one-step-ahead predictions. With kalman.filter and simulate.kalman.filter defined as above, the filter is run against the measurements by:
test=l(kalman.filter, simulate.kalman.filter)
The one-step-ahead predictions are given by:
predictions = test$estimates$pred
A quick way to visualize this is given by:
tfplot(test)
This allows you to quickly plot your one-step-ahead predictions against the actual data. To get your filtered estimates, you need to use the smoother() function from the same dse package. It takes the estimated model object and the data as inputs, in this case test and simulate.kalman.filter respectively. The output is smoothed estimates for all time points; note that these are computed after considering the full data set, not sequentially as each observation comes in. See the code below: the first line gives you your smoothed estimates, and the following lines plot the example:
smooth = smoother(test, simulate.kalman.filter)
plot(test$estimates$pred,
     ylim = c(min(test$estimates$pred, smooth$smooth$state, simulate.kalman.filter$output),
              max(test$estimates$pred, smooth$smooth$state, simulate.kalman.filter$output)))
points(smooth$smooth$state, col = 3)
points(simulate.kalman.filter$output, col = 4)
The plot shows your actual data, the one-step-ahead predictions, and the smoothed estimates against one another.
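As a rough numeric check (a sketch using the same field names as above), you can compare how closely each series tracks the simulated measurements:
# Root-mean-square error of each series against the simulated output
y = simulate.kalman.filter$output
sqrt(mean((y - test$estimates$pred)^2))   # one-step-ahead predictions
sqrt(mean((y - smooth$smooth$state)^2))   # smoothed estimates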
