One common thing I run into a lot is my model will contain matrices which have NaN values. Is there a common Flux method I can pass my matrices into and detect these NaN's? I know Julia has a built in isnan() function which could be used in some cases but I am not sure if there is a Flux specific version?
No, there is no Flux-specific function. Using any(isnan, A) is probably what you want to do in most cases. One Flux-related "enhancement" would be to use training loop callbacks to stop training if NaNs are detected.
# assumes (x, y) is your training data
# and loss(x, y, mode) will compute the loss of model on (x, y)
cb = () -> isnan(loss(x, y, model)) && Flux.stop()
# basic train loop
# assuming opt is your optimizer
Flux.train!((x, y) -> loss(x, y, model), params(model), [(x, y)], opt; cb = cb)
The above example is the basic idea, and you can extended to checking different arrays for NaN. For example, you could do
cb = () -> any(params(m)) do p
any(isnan, p)
end && Flux.stop()
to check if any parameter is NaN.
Related
I have derived a survival function for a system of components (ignore the details of how this system is setup) and I am trying to maximize its expected, or more specifically, maximizing the expected value of the function:
surv_func = function(x,mu) = {(exp(-(x/(mu))^(1/3))*((1-exp(-(4/3)*x^(3/2)))+exp(-(-(4/3)*x^(3/2)))))*exp(-(x/(3-mu))^(1/3))}
and I am supposed (since the pdf including my tasks gives a hint about it) to use the function
optimize()
and the expected value for a function can be computed with
# Computes expected value of a the function "function"
E <- integrate(function, 0, Inf)
but my function depends on x and mu. The expected value could (obviously) be computed if the integral had no mu but instead only depended on x. For those interested, the mu comes from the fact that one of the components has a Weibull-distribution with parameters (1/3,mu) and the 3-mu comes from that has a Weibull-distribution with parameters (1/3,lambda). In the task there is a constraint mu + lambda = 3, so I tought substituting the lambda-parameter in the second Weibull-distribution with lambda = 3 - mu and trying to maximize this problem would yield not only mu, but also lambda.
If I try to, just for the sake of learing about R, compute the expected value using the code below (in the console window), it just gives me the following:
> E <- integrate(surv_func,0,Inf)
Error in (function (x, mu) : argument "mu" is missing, with no default
I am new to R and seem to be a little bit "slow" at learning. How can I approach this problem?
I am trying to run a Bayesian clustering model where the number of clusters is random with binomial distribution.
This is my Jags model:
model{
for(i in 1:n){
y[ i ,1:M] ~ dmnorm( mu[z[i] , 1:M] , I[1:M, 1:M])
z[i] ~ dcat(omega[1:M])
}
for(j in 1:M){
mu[j,1:M] ~ dmnorm( mu_input[j,1:M] , I[1:M, 1:M] )
}
M ~ dbin(p, Mmax)
omega ~ ddirich(rep(1,Mmax))
}
to run it, we need to define the parameters anche the initial values for the variables, which is done in this R script
Mmax=10
y = matrix(0,100,Mmax)
I = diag(Mmax)
y[1:50,] = mvrnorm(50, rep(0,Mmax), I)
y[51:100,] = mvrnorm(50, rep(5,Mmax), I)
plot(y[,1:2])
z = 1*((1:100)>50) + 1
n = dim(y)[1]
M=2
mu=matrix(rnorm(Mmax^2),nrow=Mmax)
mu_input=matrix(2.5,Mmax,Mmax) ### prior mean
p=0.5
omega=rep(1,Mmax)/Mmax
data = list(y = y, I = I, n = n, mu_input=mu_input, Mmax = Mmax, p = p)
inits = function() {list(mu=mu,
M=M,
omega = omega) }
require(rjags)
modelRegress=jags.model("cluster_variabile.txt",data=data,inits=inits,n.adapt=1000,n.chains=1)
however, running the last command, one gets
Error in jags.model("cluster_variabile.txt", data = data, inits = inits,
: RUNTIME ERROR: Compilation error on line 6.
Unknown variable M Either supply values
for this variable with the data or define it on the left hand side of a relation.
which for me makes no sense, since the error is at line 6 even if M already appears at line 4 of the model! What is the actual problem in running this script?
So JAGS is not like R or other programming procedural languages in that it doesn't actually run line by line, it is a declarative language meaning the order of commands doesn't actually matter at least in terms of how the errors pop up. So just because it didn't throw an error on line 4 doesn't mean something isn't also wrong there. Im not positive, but I believe the error is occuring because JAGS tries to build the array first before inputting values, so M is not actually defined at this stage, but nothing you can do about that on your end.
With that aside, there should be a fairly easy work around for this, it is just less efficient. Instead of looping from 1:M make the loop iterate from 1:MMax that way the dimensions don't actually change, it is always an MMax x MMax. Then line 7 just assigns 1:M of those positions to a value. The downside of this is that it will require you to do some processing after the model is fit. So on each iteration, you will need to pull the sampled M and filter the matrix mu to be M x M, but that shouldn't be too tough. Let me know if you need more help.
So, I think the main problem is that you can't change the dimensionality of the stochastic node you're updating. This seems like a problem for reversible jump MCMC, though I don't think you can do this in JAGS.
I am new to neural networks and the mxnet package in R. I want to do a logistic regression on my predictors since my observations are probabilities varying between 0 and 1. I'd like to weight my observations by a vector obsWeights I have, but I'm not sure where to implement the weights. There seems to be a weight= option in mx.symbol.FullyConnected but if I try weight=obsWeights I get the following error message
Error in mx.varg.symbol.FullyConnected(list(...)) :
Cannot find argument 'weight', Possible Arguments:
----------------
num_hidden : int, required
Number of hidden nodes of the output.
no_bias : boolean, optional, default=False
Whether to disable bias parameter.
How should I proceed to weight my observations? Here is my code at the moment.
# Prepare data
train.mm = model.matrix(obs ~ . , data = train_data)
train_label = train_data$obs
# Normalize
train.mm = apply(train.mm, 2, function(x) (x-min(x))/(max(x)-min(x)))
# Create MXDataIter compatible iterator
batch_size = 128
train.iter = mx.io.arrayiter(data=t(train.mm), label=train_label,
batch.size=batch_size, shuffle=T)
# Symbolic model definition
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data=data, num.hidden=128, name='fc1')
act1 = mx.symbol.Activation(data=fc1, act.type='relu', name='act1')
final = mx.symbol.FullyConnected(data=act1, num.hidden=1, name='final')
logistic = mx.symbol.LogisticRegressionOutput(data=final, name='logistic')
# Run model
mxnet_train = mx.model.FeedForward.create(
symbol = logistic,
X = train.iter,
initializer = mx.init.Xavier(rnd_type = 'gaussian', factor_type = 'avg', magnitude = 2),
num.round = 25)
Assigning the fully connected weight argument is not what you want to do at any rate. That weight is a reference to parameters of the layer; i.e., what you multiply in the inputs by to get output values These are the parameter values you're trying to learn.
If you want to make some samples matter more than others, then you'll need to adjust the loss function. For example, multiply the usual loss function by your weights so that they do not contribute as much to the overall average loss.
I do not believe the standard Mxnet loss functions have a spot for assigning weights (that is LogisticRegressionOutput won't cover this). However, you can make your own cost function that does. This would involve passing your final layer through a sigmoid activation function to first generate the usual logistic regression output value. Then pass that into the loss function you define. You could do squared error, but for logistic regression you'll probably want to use the cross entropy function:
l * log(y) + (1 - l) * log(1 - y),
where l is the label and y is the predicted value.
Ideally, you'd write a symbol with an efficient definition of the gradient (Mxnet has a cross entropy function, but its for softmax input, not a binary output. You could translate your output to two outputs with softmax as an alternative, but that seems less easy to work with in this case), but the easiest path would be to let Mxnet do its autodiff on it. Then you multiply that cross entropy loss by the weights.
I haven't tested this code, but you'd ultimately have something like this (this is what you'd do in python, should be similar in R):
label = mx.sym.Variable('label')
out = mx.sym.Activation(data=final, act_type='sigmoid')
ce = label * mx.sym.log(out) + (1 - label) * mx.sym.log(1 - out)
weights = mx.sym.Variable('weights')
loss = mx.sym.MakeLoss(weigths * ce, normalization='batch')
Then you want to input your weight vector into the weights Variable along with your normal input data and labels.
As an added tip, the output of an mxnet network with a custom loss via MakeLoss outputs the loss, not the prediction. You'll probably want both in practice, in which case its useful to group the loss with a gradient-blocked version of the prediction so that you can get both. You'd do that like this:
pred_loss = mx.sym.Group([mx.sym.BlockGrad(out), loss])
I was developing a new algorithm that generates a modified kernel matrix for training with a SVM and encountered a strange problem.
For testing purposes I was comparing the SVM models learned using kernelMatrix interface and normal kernel interface. For example,
# Model with kernelMatrix computation within ksvm
svp1 <- ksvm(x, y, type="C-svc", kernel=vanilladot(), scaled=F)
# Model with kernelMatrix computed outside ksvm
K <- kernelMatrix(vanilladot(), x)
svp2 <- ksvm(K, y, type="C-svc")
identical(nSV(svp1), nSV(svp2))
Note that I have turned scaling off, as I am not sure how to perform scaling on kernel matrix.
From my understanding both svp1 and svp2 should return the same model. However I observed that this not true for a few datasets, for example glass0 from KEEL.
What am I missing here?
I think this has to do with same issue posted here. kernlab appears to treat the calculation of ksvm differently when explicitly using vanilladot() because it's class is 'vanillakernel' instead of 'kernel'.
if you define your own vanilladot kernel with a class of 'kernel' instead of 'vanillakernel' the code will be equivalent for both:
kfunction.k <- function(){
k <- function (x,y){crossprod(x,y)}
class(k) <- "kernel"
k}
l<-0.1 ; C<-1/(2*l)
svp1 <- ksvm(x, y, type="C-svc", kernel=kfunction.k(), scaled=F)
K <- kernelMatrix(kfunction.k(),x)
svp2 <- ksvm(K, y, type="C-svc", kernel='matrix', scaled=F)
identical(nSV(svp1), nSV(svp2))
It's worth noting that svp1 and svp2 are both different from their values in the original code because of this change.
First thing's first; my skills in R are somewhat lacking, so there is a chance I may be using something incorrectly in the following. If I go wrong somewhere, please let me know.
I've been having a problem in Rstudio where I try to create 2 functions for formulae, then use nls() to create a model using those, with which I will make a plot. When I try to run the line for creating it, I get an error message saying an object is missing. It is always the last object in the function of the first "formula", in this case, 'p'.
I'll provide my code here then explain what I am trying to do for a little context;
DATA <- read.csv(file.choose(), as.is=T)
formula <- function(m, h, g, p){(2*m)/(m+(sqrt(m^2+1)))*p*g*(h^2/2)}
formula.2 <- function(P, V, g){P*V*g}
m = 0.85
p = 766.42
g = 9.81
P = 0.962
h = DATA$lithothick
V = DATA$Vol
fit.1 <- nls(formula (P, V, g) ~ formula(m, h, g, p), data = DATA)
If I run it how it is shown, I get the error;
Error in (2 * m)/(m + (sqrt(m^2 + 1))) * p : 'p' is missing
However it will show h if I rearrange the objects in the formula to (m,g,p,h)
Error in h^2 : 'h' is missing
Now, what I'm trying to do is this; I have a .csv file with 3 thicknesses (0.002, 0.004, 0.006 meters) and 3 volumes (10, 25, 50 milliliters). I am trying to see how the rates of strength and buoyancy increase (in relation to each other) as the thickness and volume for each object (respectively) increases. I was hoping to come out with a graph showing the upward trend for each property (strength and buoyancy), as I believe them to be unequal (one exponential the other linear). I hope that isn't more confusing than clarifying, but any pointers would be GREATLY appreciated.
You cannot overload functions this way in R, what you can do is provide optional arguments (which is a kind of overload) with syntax function(mandatory, optionnal="")
For what you are trying to do, you have to use formula.2 if you want to use the 3-arguments formula.
A workaround could be to use one function with one optionnal argument and check if this argument has been used. Something like :
formula = function(m, h, g, p="") {
if (is.numeric(p)) {
(2*m)/(m+(sqrt(m^2+1)))*p*g*(h^2/2)
} else {
m*h*g
}
}
This is ugly and a very bad way to do it (your variables do not really mean the same thing from one call to the other) but it works.