I don't have a clear idea of how labels for the softmax classifier should be shaped.
What I could understand from my experiments is that a scalar label indicating the index of the class-probability output is one option, while another is a 2D label where the rows are class probabilities, i.e. a one-hot encoded variable like c(1, 0, 0).
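For concreteness, here is a sketch of the two label shapes I mean (values are illustrative):
y_index  <- 2                             # scalar: the index of the class
y_onehot <- matrix(c(1, 0, 0), ncol = 3)  # one row of class probabilities (one-hot)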
What puzzles me though is that:
I can use scalar label values that go beyond the valid index range, like 4 in my
example below, without any warning or error. Why is that?
When my label is a negative scalar or an array with a negative value,
the model converges to a uniform probability distribution over classes.
For example, is it expected that actor_train.y = matrix(c(0, -1, 0), ncol = 1) results in equal probabilities in the softmax output?
I am trying to use the MXNet softmax classifier for policy-gradient
reinforcement learning, and my negative rewards lead to the issue
above: uniform probabilities. Is that expected?
require(mxnet)
actor_initializer <- mx.init.Xavier(rnd_type = "gaussian",
factor_type = "avg",
magnitude = 0.0001)
actor_nn_data <- mx.symbol.Variable('data')
actor_nn_label <- mx.symbol.Variable('label')
device.cpu <- mx.cpu()
# NN architecture
actor_fc3 <- mx.symbol.FullyConnected(
data = actor_nn_data
, num_hidden = 3 )
actor_output <- mx.symbol.SoftmaxOutput(
data = actor_fc3
, label = actor_nn_label
, name = 'actor' )
crossentfunc <- function(label, pred) {
  -sum(label * log(pred))
}
actor_loss <- mx.metric.custom(
feval = crossentfunc
, name = "log-loss"
)
# initialize NN
actor_train.x <- matrix(rnorm(11), nrow = 1)
actor_train.y = 0 #1 #2 #3 #-3 # matrix(c(0, 0, -1), ncol = 1)
if (exists("actor_model")) rm(actor_model)  # clear any previous model
actor_model <- mx.model.FeedForward.create(
symbol = actor_output,
X = actor_train.x,
y = actor_train.y,
ctx = device.cpu,
num.round = 100,
array.batch.size = 1,
optimizer = 'adam',
eval.metric = actor_loss,
clip_gradient = 1,
wd = 0.01,
initializer = actor_initializer,
array.layout = "rowmajor" )
predict(actor_model, actor_train.x, array.layout = "rowmajor")
It seems quite strange to me, but I found a solution.
I changed the optimizer from optimizer = 'adam' to optimizer = 'rmsprop', and the NN started to converge as expected with negative targets. I ran simulations in R using a simple NN and the optim function and got the same result.
It looks like adam (and SGD) may misbehave in this multinomial-classification setup. I also got stuck on the fact that those optimizers did not converge to a perfect solution on just one example, while rmsprop does! Be aware!
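For reference, the only change relative to the code in the question is the optimizer argument (everything else stays the same):
actor_model <- mx.model.FeedForward.create(
  symbol = actor_output,
  X = actor_train.x,
  y = actor_train.y,
  ctx = device.cpu,
  num.round = 100,
  array.batch.size = 1,
  optimizer = 'rmsprop',      # was 'adam'
  eval.metric = actor_loss,
  clip_gradient = 1,
  wd = 0.01,
  initializer = actor_initializer,
  array.layout = "rowmajor")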
I'm trying to fit an ODE model to some data and solve for the values of the parameters in the model.
I know there is a package called FME in R which is designed to solve this kind of problem. However, when I wrote the code following the package manual, the program failed with the following traceback:
Error in lsoda(y, times, func, parms, ...) : illegal input detected before taking any integration steps - see written message
The code is the following:
library(FME)  # provides modCost/modFit and loads deSolve for ode()
x <- c(0.1257,0.2586,0.5091,0.7826,1.311,1.8636,2.7898,3.8773)
y <- c(11.3573,13.0092,15.1907,17.6093,19.7197,22.4207,24.3998,26.2158)
time <- 0:7
# Initial Values of the Parameters
parms <- c(r = 4, b11 = 1, b12 = 0.2, a111 = 0.5, a112 = 0.1, a122 = 0.1)
# Definition of the Derivative Functions
# Parameters in pars; Initial Values in y
derivs <- function(time, y, pars){
with(as.list(c(pars, y)),{
dx <- r + b11*x + b12*y - a111*x^2 - a122*y^2 - a112*x*y
dy <- r + b12*x + b11*y - a122*x^2 - a111*y^2 - a112*x*y
list(c(dx,dy))
})
}
initial <- c(x = x[1], y = y[1])
data <- data.frame(time = time, x = x, y = y)
# Cost Computation, the Object Function to be Minimized
model_cost <- function(pars){
out <- ode(y = initial, times = time, func = derivs, parms = pars)
cost <- modCost(model = out, obs = data, x = "time")
return(cost)
}
# Model Fitting
model_fit <- modFit(f = model_cost, p = parms, lower = c(-Inf,rep(0,5)))
Does anyone know how to use the FME package and fix this problem?
Your code syntax is right, and it works up to the last line.
You can check your code with
model_cost(parms)
This runs fine, and with
model_cost(parms)$model
you can see that your initial guess is far away from the observed data (compare the "obs" and "mod" columns). That may be why the fitting procedure never reaches the observed data.
So much for now ... I also tried different algorithms via the method = ... argument of modFit, but it still does not work.
Best wishes,
Johannes
Edit: if you use
model_fit <- modFit(f = model_cost, p = parms)
without any lower bounds, then you do get a result (albeit with warnings), but then a112 is negative, which is exactly what you wanted to rule out.
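For a quick visual check of how far the fitted trajectories are from the data (a sketch using the objects defined in the question; the plotting choices are mine):
fit <- modFit(f = model_cost, p = parms)                    # unconstrained fit
out <- ode(y = initial, times = time, func = derivs, parms = coef(fit))
matplot(out[, "time"], out[, c("x", "y")], type = "l",
        xlab = "time", ylab = "state")                      # model trajectories
points(data$time, data$x, pch = 16)                         # observed x
points(data$time, data$y, pch = 1)                          # observed y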
I am using nnet for the first time. I have played with the basic examples found on the web, but I cannot make sense of its output on a dummy toy data set: a simple discrimination of two classes (signal and background) using two normally distributed variables.
The following code can be copy-pasted into R (version 3.0):
library(nnet)
## Signal
xs = rnorm( mean=0, sd=1, n=10000)
ys = rnorm( mean=1, sd=1, n=10000)
typs = rep( x=1, times=10000 )
sig = data.frame( typs, xs, ys )
colnames(sig) = c("z","x","y")
sig_train = sig[c(1:5000),]
sig_test = sig[c(5001:10000),]
## Background
xb = rnorm( mean=1, sd=1, n=10000)
yb = rnorm( mean=0, sd=1, n=10000)
typb = rep( x=-1, times=10000 )
bkg = data.frame( typb, xb, yb )
colnames(bkg) = c("z","x","y")
bkg_train = bkg[c(1:5000),]
bkg_test = bkg[c(5001:10000),]
## Training
trainData = rbind( sig_train, bkg_train )
nnRes = nnet( z ~ ., trainData, size = 2, rang = 0.5, maxit = 100)
print(nnRes)
## Testing
sigNNPred = predict(nnRes, sig_test )
bkgNNPred = predict(nnRes, bkg_test )
When looking at sigNNPred I see only zeros!
So either the configuration of my NN performs poorly, or I am looking at the wrong thing.
Any hint is welcome.
Thanks in advance,
Xavier
There is a misconception about the target values (in your case, the column 'z'). If you want to do classification, you either have to convert the target column to a factor or you have to use 0/1 instead of -1/1. Otherwise, the -1 values are far outside the possible range of the activation function (unless you use linout=TRUE, which makes little sense for classification).
I tried your code with z being a factor and, as suggested by Fernando earlier, type='class' when calling predict: works nicely now, though your two classes overlap way too much to allow for a decent classification accuracy.
Cheers, UBod
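A minimal sketch of those two changes (illustrative, building on the code from the question):
trainData$z <- factor(trainData$z)                  # -1/1 become class labels
nnRes <- nnet(z ~ ., trainData, size = 2, rang = 0.5, maxit = 100)
sigNNPred <- predict(nnRes, sig_test, type = "class")
table(sigNNPred)                                    # both classes should now appear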
I need to fit this formula
y ~ 1/(pi*a*(1+((x-2.15646)/a)^2))+1/(pi*b*(1+((x-2.16355)/b)^2))
to the variables x and y
x<- c(2.15011, 2.15035, 2.15060, 2.15084, 2.15109, 2.15133, 2.15157, 2.15182, 2.15206, 2.15231, 2.15255, 2.15280, 2.15304, 2.15329, 2.15353, 2.15377, 2.15402, 2.15426, 2.15451, 2.15475, 2.15500, 2.15524, 2.15549, 2.15573, 2.15597, 2.15622, 2.15646, 2.15671, 2.15695, 2.15720, 2.15744, 2.15769, 2.15793, 2.15817, 2.15842, 2.15866, 2.15891, 2.15915, 2.15940, 2.15964, 2.15989, 2.16013, 2.16037, 2.16062, 2.16086, 2.16111, 2.16135, 2.16160, 2.16184, 2.16209, 2.16233, 2.16257, 2.16282, 2.16306, 2.16331, 2.16355, 2.16380, 2.16404, 2.16429, 2.16453, 2.16477, 2.16502, 2.16526, 2.16551, 2.16575, 2.16600, 2.16624, 2.16649, 2.16673, 2.16697, 2.16722, 2.16746, 2.16771, 2.16795, 2.16820, 2.16844, 2.16869, 2.16893, 2.16917, 2.16942, 2.16966, 2.16991)
y<- c(3.77212, 3.79541, 3.84574, 3.91918, 4.01056, 4.11677, 4.23851, 4.37986, 4.54638, 4.74367, 4.97765, 5.25593, 5.58823, 5.98405, 6.44850, 6.98006, 7.57280, 8.22085, 8.92094, 9.66990, 10.45900, 11.26720, 12.05540, 12.76920, 13.34830, 13.74250, 13.92420, 13.89250, 13.67090, 13.29980, 12.82780, 12.30370, 11.76950, 11.25890, 10.80020, 10.41860, 10.13840, 9.98005, 9.95758, 10.07690, 10.33680, 10.73210, 11.25730, 11.90670, 12.67240, 13.54110, 14.49530, 15.51670, 16.58660, 17.67900, 18.75190, 19.74600, 20.59680, 21.24910, 21.66800, 21.83910, 21.76560, 21.46020, 20.94020, 20.22730, 19.35360, 18.36460, 17.31730, 16.26920, 15.26920, 14.35320, 13.54360, 12.85230, 12.28520, 11.84690, 11.54040, 11.36610, 11.32130, 11.39980, 11.59230, 11.88310, 12.25040, 12.66660, 13.09810, 13.50220, 13.82580, 14.01250)
to estimate 'a' and 'b' from x and y. Both 'a' and 'b' are in the range between 0 and 1.
However, when I used the nls command:
nls(y ~1/(pi*a*(1+((x-2.15646)/a)^2))+1/(pi*b*(1+((x-2.16355)/b)^2)), control = list(maxiter = 500), start=list(a=0.4,b=0.4))
The console reported the following error:
singular gradient
Can anyone explain why the console prints this message?
This gives a better fit:
Before getting into the code (below), there are several issues with your model:
Assuming this is proton NMR, the area under the peaks is proportional to the proton abundance (so, number of protons). Your model does not allow for this, essentially forcing all peaks to have the same area. This is the main reason for the poor fit. We can accommodate this easily by including a "height" factor for each peak.
Your model assumes the peak positions. Why not just let the algorithm find the true peak positions?
Your model does not account for baseline drift, which as you can see is quite severe in your dataset. We can accommodate this by adding a linear drift function to the model.
nls(...) is poor for this type of modeling - the algorithms it uses are not especially robust. The default algorithm, Gauss-Newton, is especially poor when fitting offset parameters, so estimating p1 and p2 in a model with f(x-p1, x-p2) nearly always fails.
A better approach is to use the exceptionally robust Levenberg-Marquardt algorithm implemented in nls.lm(...) in the minpack.lm package. This package is a bit harder to use, but it can deal with problems that are inaccessible with nls(...). If you're going to do a lot of this, you should read the documentation to understand how this example works.
Finally, even with nls.lm(...) the starting points have to be reasonable. In your model a and b are the peak widths. Clearly they must be comparable to or smaller than the difference in peak positions or the peaks will get smeared together. Your estimates of (a,b) = (0.4, 0.4) were much too large.
plot(x,y)
library(minpack.lm)
lorentzian <- function(par, x){
  a       <- par[1]  # width of peak 1
  b       <- par[2]  # width of peak 2
  p1      <- par[3]  # position of peak 1
  p2      <- par[4]  # position of peak 2
  h1      <- par[5]  # height factor of peak 1
  h2      <- par[6]  # height factor of peak 2
  drift.a <- par[7]  # baseline intercept
  drift.b <- par[8]  # baseline slope
  h1/(pi*a*(1+((x-p1)/a)^2)) + h2/(pi*b*(1+((x-p2)/b)^2)) + drift.a + drift.b*x
}
resid <- function(par, obs, xx) obs - lorentzian(par, xx)  # residuals for nls.lm
par=c(a=0.001,b=0.001, p1=2.157, p2=2.163, h1=1, h2=1, drift.a=0, drift.b=0)
lower=c(a=0,b=0,p1=0,p2=0,h1=0, h2=0,drift.a=NA,drift.b=NA)
nls.out <- nls.lm(par=par, lower=lower, fn=resid, obs=y, xx=x,
control = nls.lm.control(maxiter=500))
coef(nls.out)
# a b p1 p2 h1 h2 drift.a drift.b
# 1.679632e-03 1.879690e-03 2.156308e+00 2.163500e+00 4.318793e-02 8.199394e-02 -9.273083e+02 4.323897e+02
lines(x,lorentzian(coef(nls.out), x), col=2, lwd=2)
One last thing: the convention on SO is to wait a day before "accepting" an answer. The reason is that questions with accepted answers rarely get additional attention - once you accept an answer, no one else will look at it.
Try using the reciprocals of a and b:
fm<-nls(y~1/(pi*(1/a)*(1+((x-2.15646)/(1/a))^2))+1/(pi*(1/b)*(1+((x-2.16355)/(1/b))^2)),
lower = 1, alg = "port",
control = list(maxiter = 500),
start = list(a = 1/.4, b = 1/.4))
which gives:
> 1/coef(fm)
a b
1.00000000 0.02366843
Unfortunately the model does not work very well as the graph at the bottom shows.
plot(y ~ x, pch = 20)
lines(x, fitted(fm), col = "red")
ADDED:
In his answer, @jlhoward provided a better-fitting model based on 8 parameters. I will just point out that if we use his model with his starting values for a, b, p1 and p2 (we don't need starting values for the linear parameters if we specify alg = "plinear"), then nls works too.
fo <- y ~ cbind(1/(pi*a*(1+((x-p1)/a)^2)), 1/(pi*b*(1+((x-p2)/b)^2)), 1, x)
start <- c(a = 0.001, b = 0.001, p1 = 2.157, p2 = 2.163)
fm2 <- nls(fo, start = start, alg = "plinear")
giving:
> coef(fm2)
a b p1 p2 .lin1
1.679635e-03 1.879682e-03 2.156308e+00 2.163500e+00 4.318798e-02
.lin2 .lin3 .lin.x
8.199364e-02 -9.273104e+02 4.323907e+02
Graph showing poor fit for fm:
REVISED to add constraints.
Assume A follows an exponential distribution and B follows a gamma distribution.
How can I plot the PDF of 0.5*(A+B)?
This is fairly straightforward using the "distr" package:
library(distr)
A <- Exp(rate=3)
B <- Gammad(shape=2, scale=3)
conv <- 0.5*(A+B)
plot(conv)
plot(conv, to.draw.arg=1)
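If you also want numbers rather than just the plot, distr exposes accessor functions for the density, CDF and random sampler of the derived distribution, e.g.:
d(conv)(1)    # density of 0.5*(A+B) at 1
p(conv)(1)    # P(0.5*(A+B) <= 1)
r(conv)(5)    # five random draws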
Edit by JD Long
Resulting plot looks like this:
If you're just looking for a fast graph, I usually take the quick and dirty simulation approach: do a lot of draws, slam a kernel density on the draws, and plot that bad boy:
numDraws <- 1e6
gammaDraws <- rgamma(numDraws, 2)
expDraws <- rexp(numDraws)
combined <- .5 * (gammaDraws + expDraws)
plot(density(combined))
The output should look a little like this:
Here is an attempt at doing the convolution (which @Jim Lewis refers to) in R. Note that there are probably much more efficient ways of doing this.
lower <- 0
upper <- 20
t <- seq(lower,upper,0.01)
fA <- dexp(t, rate = 0.4)
fB <- dgamma(t,shape = 8, rate = 2)
## C has the same distribution as (A + B)/2
dC <- function(x, lower, upper, exp.rate, gamma.rate, gamma.shape){
  # integrand of the convolution: f_A(y) * f_B(2x - y) * 2
  # (the factor 2 comes from the change of variables for (A + B)/2)
  integrand <- function(Y, X, exp.rate, gamma.rate, gamma.shape){
    dexp(Y, rate = exp.rate) * dgamma(2*X - Y, rate = gamma.rate, shape = gamma.shape) * 2
  }
  out <- numeric(length(x))  # preallocate instead of growing a vector
  for(ix in seq_along(x)){
    out[ix] <-
      integrate(integrand, lower = lower, upper = upper,
                X = x[ix], exp.rate = exp.rate,
                gamma.rate = gamma.rate, gamma.shape = gamma.shape)$value
  }
  out
}
fC <- dC(t, lower=lower, upper=upper, exp.rate=0.4, gamma.rate=2, gamma.shape=8)
## plot the resulting distribution
plot(t,fA,
ylim = range(fA,fB,na.rm=TRUE,finite = TRUE),
xlab = 'x',ylab = 'f(x)',type = 'l')
lines(t,fB,lty = 2)
lines(t,fC,lty = 3)
legend('topright', c('A ~ exp(0.4)','B ~ gamma(8,2)', 'C ~ (A+B)/2'),lty = 1:3)
I'm not an R programmer, but it might be helpful to know that for independent random variables with PDFs f1(x) and f2(x), the PDF of the sum of the two variables is given by the convolution (f1 * f2)(x) of the two input PDFs.
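In symbols, and matching the numerical integration in the R answer above, with C = (A + B)/2:
f_{A+B}(z) = \int_{-\infty}^{\infty} f_1(y)\, f_2(z - y)\, dy,
\qquad
f_C(c) = 2 \int_{-\infty}^{\infty} f_1(y)\, f_2(2c - y)\, dy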