Custom Weight Regularization in Keras - R

I am attempting to implement a custom regularization method in Keras for R which will discourage negative weightings during training. I have found supporting documentation for this in Python, just not for R.
In this method, I would like to identify negative weightings, and then apply regularization to those weights specifically. I have my current attempt defined as
l1l2_reg <- function(weight_matrix) {
  neg <- which(weight_matrix < 0, arr.ind = TRUE)
  return(0.0001 * sum(sum(weight_matrix[neg]^2)) + sum(sum(abs(weight_matrix[neg]^2))))
}
I am defining the usage of this within my model as
reconstruct <- bottleneck %>%
  layer_dense(units = input_size, activation = "linear",
              kernel_regularizer = l1l2_reg,
              name = "reconstruct")
When the model is run, I am met with the error message
Error: Discrete value supplied to continuous scale
I believe that this is occurring because the function is not correctly locating the weights, but I am unsure how to go about fixing it. Based on the code above, it should be identifying the indices of the negative weights and then returning the regularization based on those, but clearly my implementation is flawed. I primarily use MATLAB, so my implementation may be skewed towards that as well.
What is the correct method of implementing this within R?

For most custom functions passed to Keras (in both Python and R), you generally have to stick to TensorFlow operations. In this case, which() and subsetting with an integer array via [neg] need to be updated to their TensorFlow equivalents: tf$where() and tf$gather_nd(). Or you can take a different approach altogether and use tf$maximum(), like in the example below.
(The [ method for tensors today doesn't yet accept a list of arbitrary integer indices, only slice specs; in R, see ?`[.tensorflow.tensor` for details.)
(sum(), abs(), ^, and * are R generics that automatically dispatch to the TensorFlow methods tf$reduce_sum(), tf$abs(), tf$pow(), and tf$multiply() when called with a tensor.)
You can update your l1l2_reg like this (note, the actual calculation is slightly different from what you wrote, to match the common meaning of "l1" and "l2"):
library(tensorflow)
library(keras)
neg_l1l2_reg <- function(weight_matrix) {
  # Keep the magnitude of each negative weight; positive weights contribute zero.
  x <- tf$maximum(tf$zeros_like(weight_matrix), -weight_matrix)
  l1 <- sum(abs(x)) * 0.0001
  l2 <- sum(x ^ 2) * 0.0001
  l1 + l2
}
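If you prefer the index-based route mentioned above, a rough equivalent using tf$where() and tf$gather_nd() might look like the sketch below. This is an untested illustration rather than part of the original answer; it relies on the same R generic dispatch described earlier.
neg_l1l2_reg_idx <- function(weight_matrix) {
  idx <- tf$where(weight_matrix < 0)       # indices of the negative weights
  x   <- tf$gather_nd(weight_matrix, idx)  # gather just those weights
  0.0001 * (sum(abs(x)) + sum(x ^ 2))      # penalize their l1 and l2 norms
}
Either version can then be passed to layer_dense() exactly as in your original code, e.g. kernel_regularizer = neg_l1l2_reg.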

Related

From cv.glmnet get confusion matrix

Explanation of the Problem
I am comparing a few models, and my dataset is so small that I would much rather use cross-validation than split out a validation set. One of my models is fit using glm ("GLM"), another with cv.glmnet ("GLMNET"). In pseudocode, what I'd like to be able to do is the following:
initialize empty 2x2 matrices GLM_CONFUSION and GLMNET_CONFUSION

# Cross-validation loop
For each data point VAL in my dataset X:
    Let TRAIN be the rest of X (not including VAL)

    Train GLM on TRAIN, use it to predict VAL
    Depending on if it were a true positive, false positive, etc...
        add 1 to the correct entry in GLM_CONFUSION

    Train GLMNET on TRAIN, use it to predict VAL
    Depending on if it were a true positive, false positive, etc...
        add 1 to the correct entry in GLMNET_CONFUSION
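Written out in R, the GLM half of that pseudocode might look something like this (a minimal sketch only; the data frame X, the 0/1-coded outcome column y, and the 0.5 cutoff are just placeholders, not details of my actual task):
glm_confusion <- matrix(0, nrow = 2, ncol = 2,
                        dimnames = list(pred = c("0", "1"), obs = c("0", "1")))
for (i in seq_len(nrow(X))) {
  fit  <- glm(y ~ ., data = X[-i, ], family = binomial())           # train on everything but row i
  prob <- predict(fit, newdata = X[i, , drop = FALSE], type = "response")
  pred <- as.integer(prob > 0.5)                                     # predicted class for VAL
  glm_confusion[pred + 1, X$y[i] + 1] <- glm_confusion[pred + 1, X$y[i] + 1] + 1
}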
This is not hard to do; the problem is that cv.glmnet already uses cross-validation to deduce the best value of the penalty lambda. It would be convenient if I could have cv.glmnet automatically build up the confusion matrix of the best model, i.e. my code would look like:
initialize empty 2x2 matrices GLM_CONFUSION and GLMNET_CONFUSION

Train GLMNET on X using cv.glmnet
Set GLMNET_CONFUSION to be the confusion matrix of lambda.1se (or lambda.min)

# Cross-validation loop
For each data point VAL in my dataset X:
    Let TRAIN be the rest of X (not including VAL)

    Train GLM on TRAIN, use it to predict VAL
    Depending on if it were a true positive, false positive, etc...
        add 1 to the correct entry in GLM_CONFUSION
Not only would it be convenient, it is somewhat of a necessity - there are two alternatives:
1. Use cv.glmnet to find a new lambda.1se on TRAIN at every iteration of the cross-validation loop (i.e. a nested cross-validation).
2. Use cv.glmnet to find lambda.1se on X, then 'fix' that value and treat it like a normal model to train during the cross-validation loop (two parallel cross-validations).
The second one is philosophically incorrect as it means GLMNET would have information on what it is trying to predict in the cross validation loop. The first would take a large chunk of time - I could in theory do it, but it might take half an hour and I feel as if there should be a better way.
What I've Looked At So Far
I've looked at the documentation of cv.glmnet - it does not seem like you can do what I am asking, but I am very new to R and data science in general so it is perfectly possible that I have missed something.
I have also looked on this website and seen some posts that at first glance appeared to be relevant, but in fact are asking for something different - for example, this post: tidy predictions and confusion matrix with glmnet
The above post appears similar to what I want, but it is not quite what I am looking for - it appears they are using predict.cv.glmnet to make new predictions, and then creating the confusion matrix of that - whereas I want the confusion matrix of the predictions made during the cross validation step.
I'm hoping that someone is able to either:
1. Explain if and how it is possible to create the confusion matrix as described,
2. Show that there is a third alternative separate from the two I proposed ("hand-implement cv.glmnet" is not a viable alternative :P), or
3. Conclusively state that what I want is not possible and that I need to do one of the two alternatives I mentioned.
Any one of those would be a perfectly fine answer to this question (although I'm hoping for option 1!)
Apologies if there is something simple I have missed!
Thanks to @missuse's advice, I was able to get a solution that worked for me! It corresponds to option 2 in my post, with the alternative being to use the caret package.
In essence, we need to attach a custom summary function to caret's model trainer. I mostly bumbled about for a couple of hours until I got it to work - there may be better ways to do this, and I encourage others to post alternate answers if they know of any! My code is at the bottom (it's been slightly modified to make it not specific to the task I was working on).
Hopefully if anyone has a similar problem then this will help. Another resource that I found useful in solving this was the following post: https://stats.stackexchange.com/questions/299653/caret-glmnet-vs-cv-glmnet, as in it you can see very clearly how to convert a call to cv.glmnet into a call to caret's train version of glmnet.
library(caret)
# Confusion matrix of model outputs
CM <- function(model) {
  # Find the index of the best tune found by cross-validation
  idx <- 1
  for (i in 1:nrow(model$results)) {
    check <- model$results[i, ]
    foundBest <- TRUE
    for (col in colnames(model$bestTune)) {
      if (check[, col] != model$bestTune[, col]) {
        foundBest <- FALSE
        break
      }
    }
    if (foundBest) {
      idx <- i
      break
    }
  }

  # The counts are averaged w.r.t. the number of folds (ctrl$number),
  # hence the multiplication to recover totals
  c(
    model$results[idx, ]$true_pos,
    model$results[idx, ]$false_pos,
    model$results[idx, ]$false_neg,
    model$results[idx, ]$true_neg
  ) * model$control$number
}
# Summary function used during training to record confusion-matrix counts
SummaryFunc <- function(data, lev = NULL, model = NULL) {
  # This puts our output in the right format
  out <- postResample(data$pred, data$obs)

  # Get the confusion matrix
  cm <- confusionMatrix(
    factor(data$pred, levels = c(0, 1)),
    factor(data$obs, levels = c(0, 1))
  )$table

  # Add those counts to the output
  oldnames <- names(out)
  out <- c(out, cm[1, 1], cm[2, 1], cm[1, 2], cm[2, 2])
  names(out) <- c(oldnames, "true_pos", "false_pos", "false_neg", "true_neg")
  out
}
# 10-fold cross-validation, as in the cv.glmnet implementation
ctrl <- trainControl(
  method = "cv",
  number = 10,
  summaryFunction = SummaryFunc
)
# Example of a standard glm
our.glm <- train(
  your_formula,
  data = your_data,
  method = "glm",
  family = gaussian(link = "identity"),
  trControl = ctrl,
  metric = "RMSE"
)

# Example of what used to be cv.glmnet
our.glmnet <- train(
  your_feature_matrix,
  your_label_matrix,
  method = "glmnet",
  family = gaussian(link = "identity"),
  trControl = ctrl,
  metric = "RMSE",
  tuneGrid = expand.grid(
    alpha = 1,
    lambda = seq(0.001, 0.1, by = 0.001)
  )
)
CM(our.glm)
CM(our.glmnet)

R package bnlearn: cpquery vs predict - different results?

I want to use my Bayesian network as a classifier, first on complete evidence data (predict), but also on incomplete data (bnlearn::cpquery). But it seems that, even when working with the same evidence, the functions give different results (and not just the slight deviations expected from sampling).
With complete data, one can easily use R's predict function:
predict(object = BN,
        node = "TargetVar",
        data = FullEvidence,
        method = "bayes-lw",
        prob = TRUE)
By analyzing the prob attribute, I understood that the predict function simply chooses the factor level with the highest assigned probability.
When it comes to incomplete evidence (only outcomes of some nodes are known), predict doesn't work anymore:
Error in check.fit.vs.data(fitted = fitted, data = data,
  subset = setdiff(names(fitted), :
  required variables [.....] are not present in the data.
So, I want to use bnlearn::cpquery with a list of known evidence:
cpquery(fitted = BN,
        event = TargetVar == "TRUE",
        evidence = evidenceList,
        method = "lw",
        n = 100000)
Again, I simply want to use the factor with the highest probability as prediction. So if the result of cpquery is higher than 0.5, I set the prediction to TRUE, else to FALSE.
I tried to monitor the process by giving the same (complete) data to both functions, but they don't give me back the same results. There are large differences, e.g. predict's "prob" attribute gives me p(false) = 27% whereas cpquery gives me p(false) = 2.2%.
What is the "right" way of doing this? Using only cpquery, also for complete data? Why are there large differences?
Thanks for your help!
As user20650 put it, increasing the number of samples in the predict call was the solution to get very similar results. So just provide the argument n = ... in your function call.
Of course that makes sense, I just didn't know about that argument in the predict() function.
There's no documentation about it in the bn.fit utilities and also none in the quite generic documentation of predict.
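For completeness, the call might look something like this (a sketch only; BN, TargetVar, and FullEvidence are the objects from the question, and n is the number of samples drawn by the "bayes-lw" method):
predict(object = BN,
        node   = "TargetVar",
        data   = FullEvidence,
        method = "bayes-lw",
        prob   = TRUE,
        n      = 100000)  # same number of samples as the cpquery call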

Why do the inverse t-distributions for small values differ in Matlab and R?

I would like to evaluate the inverse Student's t-distribution function for small values, e.g., 1e-18, in Matlab. The degrees of freedom is 2.
Unfortunately, Matlab returns NaN:
tinv(1e-18,2)
NaN
However, if I use R's built-in function:
qt(1e-18,2)
-707106781
The result is sensible. Why can Matlab not evaluate the function for this small value? The Matlab and R results are quite similar down to 1e-15, but for smaller values the difference is considerable:
tinv(1e-16,2)/qt(1e-16,2) = 1.05
Does anyone know what the difference is between the algorithms implemented in Matlab and R, and, if R gives correct results, how I could effectively calculate the inverse t-distribution in Matlab for smaller values?
It appears that R's qt may use a completely different algorithm than Matlab's tinv. I think that you and others should report this deficiency to The MathWorks by filing a service request. By the way, in R2014b and R2015a, -Inf is returned instead of NaN for small values (about eps/8 and less) of the first argument, p. This is more sensible, but I think they should do better.
In the interim, there are several workarounds.
Special Cases
First, in the case of the Student's t-distribution, there are several simple analytic solutions to the inverse CDF or quantile function for certain integer parameters of ν. For your example of ν = 2:
% for v = 2
p = 1e-18;
x = (2*p-1)./sqrt(2*p.*(1-p))
which returns -7.071067811865475e+08. At a minimum, Matlab's tinv should include these special cases (they only do so for ν = 1). It would probably improve the accuracy and speed of these particular solutions as well.
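As a quick sanity check in R (since the question compares against qt), the same closed form reproduces the quoted qt value; this is only an illustrative check, not part of the Matlab workaround:
p <- 1e-18
(2 * p - 1) / sqrt(2 * p * (1 - p))  # -707106781.2
qt(p, df = 2)                        # -707106781.2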
Numeric Inverse
The tinv function is based on the betaincinv function. It appears that it may be this function that is responsible for the loss of precision for small values of the first argument, p. However, as suggested by the OP, one can use the CDF function, tcdf, and root-finding methods to evaluate the inverse CDF numerically. The tcdf function is based on betainc, which doesn't appear to be as sensitive. Using fzero:
p = 1e-18;
v = 2;
x = fzero(@(x) tcdf(x,v) - p, 0)
This returns -7.071067811865468e+08. Note that this method is not very robust for values of p close to 1.
Symbolic Solutions
For more general cases, you can take advantage of symbolic math and variable-precision arithmetic. You can use identities in terms of Gaussian hypergeometric functions, 2F1, as given here for the CDF. Thus, using solve and hypergeom:
% Supposedly only valid for x^2 < v, but appears to work for your example
p = sym('1e-18');
v = sym(2);
syms x
F = 0.5+x*gamma((v+1)/2)*hypergeom([0.5 (v+1)/2],1.5,-x^2/v)/(sqrt(sym('pi')*v)*gamma(v/2));
sol_x = solve(p==F,x);
vpa(sol_x)
As noted above, tinv is based on the betaincinv function. There is no equivalent function, or even an incomplete beta function, in the Symbolic Math Toolbox or MuPAD, but a similar 2F1 relation for the incomplete beta function can be used:
p = sym('1e-18');
v = sym(2);
syms x
a = v/2;
F = 1-x^a*hypergeom([a 0.5],a+1,x)/(a*beta(a,0.5));
sol_x = solve(2*abs(p-0.5)==F,x);
sol_x = sign(p-0.5).*sqrt(v.*(1-sol_x)./sol_x);
vpa(sol_x)
Both symbolic schemes return results that agree to -707106781.186547523340184 when using the default value of digits.
I've not fully validated the two symbolic methods above, so I can't vouch for their correctness in all cases. The code also needs to be vectorized and will be slower than a fully numerical solution.

How can we specify a custom lambda sequence to glmnet

I am new to the glmnet package in R and want to supply a custom lambda sequence, based on a suggestion in a published research paper, to the cv.glmnet function. The documentation says we can supply a decreasing sequence of lambdas as a parameter; however, it contains no examples of how to do this.
I would be very grateful if someone could suggest how to go about doing this. Do I pass a vector of 100-odd values (the default for nlambda) to the function? What restrictions, if any, are there on the min and max values of this vector? Also, are there things to keep in mind regarding nvars, nobs, etc. while specifying the vector?
Thanks in advance.
You can define a grid like this:
grid <- 10^seq(10, -2, length = 100)  # lambda sequence
ridge_mod <- glmnet(x, y, alpha = 0, lambda = grid)
This is fairly easy, though it's not well explained in the original documentation ;)
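If you want the sequence to be used during cross-validation as well, the same grid can, as far as I know, be passed straight to cv.glmnet through its lambda argument; a minimal sketch with placeholder x and y:
cv_fit <- cv.glmnet(x, y, alpha = 0, lambda = grid)
cv_fit$lambda.min   # best lambda found by cross-validation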
In the following I've used the Cox family, but you can change it based on your needs:
my_cvglmnet_fit <- cv.glmnet(x=regression_data, y=glmnet_response, family="cox", maxit = 100000)
Then you can plot the fitted object created by cv.glmnet; in the plot you can easily see where lambda is at its minimum. One of the dotted vertical lines is the minimum lambda and the other is lambda.1se.
plot(my_cvglmnet_fit)
The following lines help you see the non-zero coefficients and their corresponding values:
coef(my_cvglmnet_fit, s = "lambda.min")[which(coef(my_cvglmnet_fit, s = "lambda.min") != 0)] # the non zero coefficients
colnames(regression_data)[which(coef(my_cvglmnet_fit, s = "lambda.min") != 0)] # The features that are selected
Here are some links that may help:
http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
http://blog.revolutionanalytics.com/2013/05/hastie-glmnet.html

Reusing the model from R's forecast package

I have been told that, while using R's forecast package, one can reuse a model. That is, after the code x <- c(1,2,3,4); mod <- ets(x); f <- forecast(mod, h=1), one could append a value (append(x, 5)) and predict the next value without recalculating the model. How does one do that? (As I understand it, using simple exponential smoothing one would only need to know alpha, right?)
Is it like forecast(x, model=mod)? If that is the case, I should say that I am using Java and calling the forecast code programmatically (for many time series), so I don't think I could keep the model object in the R environment all the time. Would there be an easy way to keep the model object in Java and load it into the R environment when needed?
You have two questions here:
A) Can the forecast package "grow" its datasets? I can't speak in great detail to this package and you will have to look at its documentation. However, R models in general obey a structure of
fit <- someModel(formula, data)
estfit <- predict(fit, newdata=someDataFrame)
e.g. you supply updated data given a fit object.
B) Can I serialize a model back and forth to Java? Yes, you can. Rserve is one option; you can also try basic serialize() to a (raw) vector, or even just save(fit, file="someFile.RData").
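A minimal sketch of that last approach, reusing the fit object and someDataFrame placeholder from above (the file name is just an example):
# In the R session that fit the model:
save(fit, file = "someFile.RData")

# In a fresh R session started from Java:
load("someFile.RData")   # restores `fit`
estfit <- predict(fit, newdata = someDataFrame)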
Regarding your first question:
library(forecast)

x <- 1:4
mod <- ets(x)
f1 <- forecast(mod, h = 1)

x <- append(x, 5)
mod <- ets(x, model = mod)  # Reuses old mod without re-estimating parameters.
f2 <- forecast(mod, h = 1)
