Why does a feedforward neural network not generalize? - r

I made a simple NN to predict x from sin(x), and it failed. The NN was successful in predicting sin(x) from x, but it could not predict x from sin(x). In both cases (sin(x) and arcsin(x)) we have a non-linear mapping, and a NN is supposed to be able to fit any function. So my question is: why did the NN fail? Is this a case of underfitting? Can I figure out at which point in the training process the divergence happens?
library(neuralnet)
set.seed(1234567890)
Var3 <- runif(500, 0, 20)
mydata3 <- data.frame(Sin = sin(Var3), Var = Var3)
set.seed(1234567890)
winit <- runif(5500, -1, 1)
#hidUnit <- c(9,1)
set.seed(1234567890)
nn3 <- neuralnet(formula = Var ~ Sin, data = mydata3,
                 hidden = c(4, 2, 1), startweights = winit,
                 learningrate = 0.01, act.fct = "tanh")
plot(mydata3, cex = 2, main = 'Predicting x from Sin(x)',
     pch = 21, bg = "darkgrey",
     ylab = "X", xlab = "Sin(X)")
points(mydata3[,1], predict(nn3, mydata3), col = "darkred",
       cex = 1, pch = 21, bg = "red")
legend("bottomleft", legend = c("true", "predicted"), pch = c(21, 21),
       col = c("darkgrey", "red"), cex = 0.65, bty = "n")

You are trying to predict infinitely many x values from one sin(x) value. Think about it: what you are trying to learn is not a function. A function maps every input to exactly one output, but in your case there are (theoretically infinitely) many x values for every sin(x) you feed into the network.
The domain of arcsin(x) is only from -1 to 1, and its range is from -pi/2 to pi/2 radians (not from 0 to 20).
Perhaps constraining your x values to -pi/2 to pi/2 would work.
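As a rough illustration of that suggestion, here is a minimal sketch (my code, not the asker's: it reuses the same neuralnet setup but samples x only from [-pi/2, pi/2], where arcsin is single-valued):
library(neuralnet)
set.seed(1234567890)
Var4 <- runif(500, -pi/2, pi/2)                 # x restricted to the range of arcsin
mydata4 <- data.frame(Sin = sin(Var4), Var = Var4)
nn4 <- neuralnet(Var ~ Sin, data = mydata4,
                 hidden = c(4, 2, 1),
                 learningrate = 0.01, act.fct = "tanh")
plot(mydata4$Sin, mydata4$Var, pch = 21, bg = "darkgrey",
     xlab = "Sin(X)", ylab = "X", main = "x restricted to [-pi/2, pi/2]")
points(mydata4$Sin, predict(nn4, mydata4), col = "darkred", pch = 21, bg = "red")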

Determine the break point for the slope change in R

I'm trying to implement a "change point" analysis, or a multiphase regression using nls() in R.
Here's some fake data I've made. The formula I want to use to fit the data is:
$y = \beta_0 + \beta_1x + \beta_2\max(0,x-\delta)$
What this is supposed to do is fit the data up to a certain point with one intercept and slope ($\beta_0$ and $\beta_1$), then, after a certain x value ($\delta$), augment the slope by $\beta_2$. That's what the max term is for: before the $\delta$ point it equals 0, so $\beta_2$ is zeroed out.
So, here's my function to do this:
changePoint <- function(x, b0, slope1, slope2, delta){
  b0 + (x * slope1) + (max(0, x - delta) * slope2)
}
And I try to fit the model this way
nls(y ~ changePoint(x, b0, slope1, slope2, delta),
    data = data,
    start = c(b0 = 50, slope1 = 0, slope2 = 2, delta = 48))
I chose those starting parameters because I know they are the true values; I made the data up.
However, I get this error:
Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Have I just made unfortunate data? I tried fitting this on real data first, and was getting the same error, and I just figured that my initial starting parameters weren't good enough.
At first I thought it could be a problem resulting from the fact that max is not vectorized, but that's not the cause. It does make changePoint a pain to work with, though, hence the following modification:
changePoint <- function(x, b0, slope1, slope2, delta) {
  b0 + (x * slope1) + (sapply(x - delta, function(t) max(0, t)) * slope2)
}
This R-help mailing list post describes one way in which this error may result: the rhs of the formula is overparameterized, such that changing two parameters in tandem gives the same fit to the data. I can't see how that is true of your model, but maybe it is.
In any case, you can write your own objective function and minimize it. The following function gives the squared error for data points (x,y) and a certain value of the parameters (the weird argument structure of the function is to account for how optim works):
sqerror <- function(par, x, y) {
  sum((y - changePoint(x, par[1], par[2], par[3], par[4]))^2)
}
Then we say:
optim(par = c(50, 0, 2, 48), fn = sqerror, x = x, y = data)
And see:
$par
[1] 54.53436800 -0.09283594 2.07356459 48.00000006
Note that for my fake data (x <- 40:60; data <- changePoint(x, 50, 0, 2, 48) + rnorm(21, 0, 0.5)) there are lots of local minima depending on the initial parameter values you give. I suppose if you wanted to take this seriously you'd call the optimizer many times with random initial parameters and examine the distribution of results.
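For concreteness, a minimal sketch of that multi-start idea (it assumes x, data and sqerror as defined above; the ranges for the random starts are arbitrary choices of mine):
set.seed(1)
starts <- replicate(50, c(b0 = runif(1, 40, 60), slope1 = runif(1, -1, 1),
                          slope2 = runif(1, 0, 4), delta = runif(1, 41, 59)))
fits <- apply(starts, 2, function(p) optim(par = p, fn = sqerror, x = x, y = data)$par)
hist(fits[4, ], breaks = 20, xlab = "delta",
     main = "Fitted change points over 50 random starts")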
Just wanted to add that you can do this with many other packages. If you want to get an estimate of uncertainty around the change point (something nls cannot do), try the mcp package.
# Simulate the data
df = data.frame(x = 1:100)
df$y = c(rnorm(20, 50, 5), rnorm(80, 50 + 1.5*(df$x[21:100] - 20), 5))
# Fit the model
model = list(
  y ~ 1,   # Intercept
  ~ 0 + x  # Joined slope
)
library(mcp)
fit = mcp(model, df)
Let's plot it with a prediction interval (green line). The blue density is the posterior distribution for the change point location:
# Plot it
plot(fit, q_predict = T)
You can inspect individual parameters in more detail using plot_pars(fit) and summary(fit).

Numerically solving Lotka-Volterra ODE in R

Disclaimer: cross-posted on Computational Science Stack Exchange.
Aim: I am trying to numerically solve a Lotka-Volterra ODE in R, using the sde.sim() function in the sde package. I would like to use sde.sim() so that I can eventually turn this system into an SDE. So initially, I started with a simple ODE system (the Lotka-Volterra model) without a noise term.
The Lotka-Volterra ODE system:
$\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y$
with initial values x = 10 and y = 10.
The parameter values for alpha, beta, delta and gamma are 1.1, 0.4, 0.1 and 0.4 respectively (mimicking this example).
Attempt to solve problem:
library(sde)
d <- expression((1.1 * x[0] - 0.4 * x[0] * x[1]), (0.1 * x[0] * x[1] - 0.4 * x[1]))
s <- expression(0, 0)
X <- sde.sim(X0=c(10,10), T = 10, drift=d, sigma=s)
plot(X)
However, this does not seem to generate a nice cyclic behavior of the predator and prey population.
Expected Output
I used the deSolve package in R to generate the expected output.
library(deSolve)
alpha <- 1.1
beta  <- 0.4
delta <- 0.1
gamma <- 0.4
yini  <- c(X = 10, Y = 10)
Lot_Vol <- function(t, y, parms) {
  with(as.list(y), {
    dX <- alpha * X - beta * X * Y
    dY <- delta * X * Y - gamma * Y
    list(c(dX, dY))
  })
}
times <- seq(from = 0, to = 100, by = 0.01)
out <- ode(y = yini, times = times, func = Lot_Vol, parms = NULL)
plot(y=out[, "X"], x = out[, "time"], type = 'l', col = "blue", xlab = "Time", ylab = "Animals (#)")
lines(y=out[, "Y"], x = out[, "time"], type = 'l', col = "red")
Question
I think something might be wrong with the drift function; however, I am not sure what. What is going wrong in this attempt to solve the system of ODEs with sde.sim()?
Assuming that not specifying a method takes the first in the list, and that all other non-specified parameters take default values, you are performing the Euler method with step size h=0.1.
As is known, on a system whose exact solutions are convex, concentric closed trajectories, the Euler method produces an outward spiral. As a first-order method, its error should grow to a size of about T*h = 10*0.1 = 1. Or, if one wants the more pessimistic estimate, the error has size (exp(L*T)-1)*h/L; with L = 3 in some adapted norm this gives a scale of about 3.5e11.
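In R, the two scales quoted above are a one-line check (plain arithmetic with the answer's own numbers T = 10, h = 0.1, L = 3):
10 * 0.1                      # linear estimate T*h
(exp(3 * 10) - 1) * 0.1 / 3   # pessimistic estimate (exp(L*T)-1)*h/L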
Exploring the actual error e(t) = c(t)*h of the Euler method, one gets the following plots: on the left the errors of the components, on the right the trajectories for various step sizes. The error coefficient, the function c(t), in the left plots is scaled down by the factor (exp(L*t)-1)/L to get comparable values over large time intervals; the value L = 0.06 gave the best balance.
One can see that the actual error
abs(e(t))<30*h*(exp(L*t)-1)/L
is in-between the linear and exponential error models, but closer to the linear one.
To reduce the error, you have to decrease the step size. In the call to sde.sim, this is achieved by setting the parameter N = 5000 or larger, giving a step size h = 10/5000 = 0.002, so that you can hope to be correct in the first two digits, with an error bound of 30*h*T = 0.6. In the SDE case you accumulate Gaussian noise of size sqrt(h) in every step, so the truncation error of O(h^2) is a rather small perturbation of the random number.
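Concretely, that amounts to re-running the call from the question with a larger N (a sketch only; it assumes the drift and sigma expressions are otherwise accepted by sde.sim as written in the question):
library(sde)
d <- expression((1.1 * x[0] - 0.4 * x[0] * x[1]), (0.1 * x[0] * x[1] - 0.4 * x[1]))
s <- expression(0, 0)
# N = 5000 steps over T = 10 gives step size h = 10/5000 = 0.002
X <- sde.sim(X0 = c(10, 10), T = 10, N = 5000, drift = d, sigma = s)
plot(X)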

Plotting different linear functions

I am trying to optimize a linear function by using a gradient descent method.
At the end of my algorithm, I end up with a vector of a coefficients and a vector of b coefficients of the same length; these are the successive a and b values that were calculated by my algorithm.
For each combination of a and b, I would like to plot a linear function y = a*x + b knowing that I generated x and y.
The aim is to have all the representations of the intermediate linear functions that were calculated through the algorithm. At the end, I want to add the linear regression obtained by lm() to demonstrate how well the method can optimize the a and b coefficients.
It should look like this: [figure: linear functions obtained from the different a and b coefficients calculated by the algorithm]
This is the code that I wrote for plotting the different linear functions:
#a and b obtained with algorithm
h = function(a, b, x) a * x + b
data = matrix(c(a, b, x), ncol = 3, nrow = 358)
# 358 is the length of the vectors
i = 1
for (i in length(a)){
  plot(h(a[i,1], x[i,3], b[i,2]))
  i = i + 1
}
One of the problems that annoys me is that I am not sure I can superimpose the linear functions without using the plot and points functions.
The second is that I am not sure I can plot a linear function if I only give the a and b coefficients.
Would you have a better idea?
The function abline will add straight lines to a plot. It can also be used to plot a line straight from a regression.
You don't give any sample data (next time, include sample data in your question!), but it would look something like this:
set.seed(47)
x = runif(50) - 0.5
y = 4 * x + 1 + rnorm(50)
a_values = seq(0, 1, length.out = 10)
b_values = seq(0, 4, length.out = 10)
plot(x, y)
for (i in seq_along(a_values)) {
  abline(a = a_values[i], b = b_values[i], col = "dodgerblue2")
}
abline(lm(y ~ x), lwd = 2)

How to run a Monte Carlo simulation from a custom distribution in R

I would like to pull 1000 samples from a custom distribution in R.
I have the following custom distribution:
library(gamlss)
mu <- 1
sigma <- 2
tau <- 3
kappa <- 3
rate <- 1
Rmax <- 20
x <- seq(1, 2e1, 0.01)
points <- Rmax * dexGAUS(x, mu = mu, sigma = sigma, nu = tau) * pgamma(x, shape = kappa, rate = rate)
plot(points ~ x)
How can I randomly sample via Monte Carlo simulation from this distribution?
My first attempt was the following code which produced a histogram shape I did not expect.
hist(sample(points, 1000), breaks = 51)
This is not what I was looking for as it does not follow the same distribution as the pdf.
If you want a Monte Carlo simulation, you'll need to sample from the distribution a large number of times, not take a large sample one time.
Your object, points, has values that increase as the index increases up to a threshold around 400, level off, and then decrease. That's what plot(points ~ x) shows. It may describe a distribution, but the actual distribution of the values in points is different: a histogram shows how often values fall within a certain range. You'll notice the x axis of the histogram is similar to the y axis of the plot(points ~ x) plot. The actual distribution of values in the points object is easy enough to see, and it is similar to what you're seeing when sampling 1000 values at random, without replacement, from an object with 1900 values in it. Here's the distribution of values in points (no simulation required):
hist(points, 100)
I used 100 breaks on purpose so you could see some of the fine details.
Notice the little bump in the tail at the top, which you may not be expecting if you want the histogram to look like the plot of the values vs. the index (or some increasing x). It means there are more values in points around 2 than there are around 1. Look at how the curve of plot(points ~ x) flattens when the value is around 2, and how it's very steep between 0.5 and 1.5. Notice also the large hump at the low end of the histogram, and look at the plot(points ~ x) curve again: most of the values (whether they're at the low end or the high end of that curve) are close to 0, or at least less than 0.25. If you look at those details, you may be able to convince yourself that the histogram is, in fact, exactly what you should expect :)
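If it helps, a small sketch (just re-plotting the objects already defined in the question) puts the two views side by side:
op <- par(mfrow = c(1, 2))
plot(points ~ x, main = "points vs x")               # the curve from the question
hist(points, 100, main = "distribution of points")   # what sample() actually draws from
par(op)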
If you want a Monte Carlo simulation of a sample from this object, you might try something like:
samples <- replicate(1000, sample(points, 100, replace = TRUE))
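Each column of samples is then one simulated draw of size 100; a hypothetical follow-up (not part of the original answer) would be to summarise each replicate, for example to look at the sampling distribution of the mean:
hist(colMeans(samples), breaks = 30,
     main = "Means of 1000 resamples of size 100", xlab = "mean")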
If you want to generate data using points as a probability density function, that question has been asked and answered here
Let's define your (not normalized) probability density function as a function:
library(gamlss)
fun <- function(x, mu = 1, sigma = 2, tau = 3, kappa = 3, rate = 1, Rmax = 20)
  Rmax * dexGAUS(x, mu = mu, sigma = sigma, nu = tau) *
    pgamma(x, shape = kappa, rate = rate)
Now one approach is to use some MCMC (Markov chain Monte Carlo) method. For instance,
simMCMC <- function(N, init, fun, ...) {
  # Random-walk Metropolis: propose a normal step from the current value and
  # accept it with probability min(1, fun(proposal) / fun(current)).
  out <- numeric(N)
  out[1] <- init
  for (i in 2:N) {
    pr <- out[i - 1] + rnorm(1, ...)
    r <- fun(pr) / fun(out[i - 1])
    out[i] <- ifelse(runif(1) < r, pr, out[i - 1])
  }
  out
}
It starts from the point init and gives N draws. The approach can be improved in many ways, but I'm simply going to start from init = 5, include a burn-in period of 20000, and select every second draw to reduce the number of repeated values:
d <- tail(simMCMC(20000 + 2000, init = 5, fun = fun), 2000)[c(TRUE, FALSE)]
plot(density(d))
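As a rough sanity check (my addition, assuming integrate() copes with this integrand, as the last answer below suggests it does), you can overlay the normalised target density on that density plot:
Z <- integrate(fun, -Inf, Inf)$value   # normalising constant, roughly 11.5
curve(fun(x) / Z, from = 0, to = 20, add = TRUE, col = "red")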
You invert the ECDF of the distribution:
ecd.points <- ecdf(points)
invecdfpts <- with(environment(ecd.points), approxfun(y, x))
samp.inv.ecd <- function(n = 100) invecdfpts(runif(n))
plot(density(samp.inv.ecd(100)))
plot(density(points))
png(); layout(matrix(1:2, 1)); plot(density(samp.inv.ecd(100)), main = "The Sample")
plot(density(points), main = "The Original"); dev.off()
Here's another way to do it that draws from R: Generate data from a probability density distribution and How to create a distribution function in R?:
x <- seq(1, 2e1, 0.01)
points <- 20 * dexGAUS(x, mu = 1, sigma = 2, nu = 3) * pgamma(x, shape = 3, rate = 1)
f <- function(x) (20 * dexGAUS(x, mu = 1, sigma = 2, nu = 3) * pgamma(x, shape = 3, rate = 1))
C <- integrate(f, -Inf, Inf)
C$value
[1] 11.50361
# normalize by C$value
f <- function(x)
  (20 * dexGAUS(x, mu = 1, sigma = 2, nu = 3) * pgamma(x, shape = 3, rate = 1) / 11.50361)
pdf <- data.frame(x = x, y = f(x))   # grid of x values and normalised density, used by approx() below
random.points <- approx(cumsum(pdf$y) / sum(pdf$y), pdf$x, runif(10000))$y
hist(random.points, 1000)
hist(random.points * 40, 1000) will give scaling like your original function.
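A possible visual check on those draws (my addition; it assumes x, f and random.points as defined above): compare the density-scaled histogram of the draws with the normalised target curve.
hist(random.points, breaks = 100, freq = FALSE, main = "Draws vs target density")
lines(x, f(x), col = "red", lwd = 2)   # normalised f from above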

Difference in 2D KDE produced using kde2d (R) and ksdensity2d (Matlab)

While trying to port some code from Matlab to R I have run into a problem. The gist of the code is to produce a 2D kernel density estimate and then do some simple calculations using the estimate. In Matlab the KDE calculation was done using the function ksdensity2d.m. In R the KDE calculation is done with kde2d from the MASS package. So let's say I want to calculate the KDE and just add up the values (this is not what I intend to do, but it serves the purpose here). In R this can be done by
library(MASS)
set.seed(1009)
x <- sample(seq(1000, 2000), 100, replace=TRUE)
y <- sample(seq(-12, 12), 100, replace=TRUE)
kk <- kde2d(x, y, h=c(30, 1.5), n=100, lims=c(1000, 2000, -12, 12))
sum(kk$z)
which gives the answer 0.3932732. When using ksdensity2d in Matlab with the exact same data and conditions, the answer is 0.3768. From looking at the code for kde2d I noticed that the bandwidth is divided by 4:
kde2d <- function (x, y, h, n = 25, lims = c(range(x), range(y)))
{
    nx <- length(x)
    if (length(y) != nx)
        stop("data vectors must be the same length")
    if (any(!is.finite(x)) || any(!is.finite(y)))
        stop("missing or infinite values in the data are not allowed")
    if (any(!is.finite(lims)))
        stop("only finite values are allowed in 'lims'")
    n <- rep(n, length.out = 2L)
    gx <- seq.int(lims[1L], lims[2L], length.out = n[1L])
    gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
    h <- if (missing(h))
        c(bandwidth.nrd(x), bandwidth.nrd(y))
    else rep(h, length.out = 2L)
    if (any(h <= 0))
        stop("bandwidths must be strictly positive")
    h <- h/4
    ax <- outer(gx, x, "-")/h[1L]
    ay <- outer(gy, y, "-")/h[2L]
    z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay), , nx))/(nx * h[1L] * h[2L])
    list(x = gx, y = gy, z = z)
}
A simple check to see if the difference in bandwidth is the reason for the difference in the results is then
kk <- kde2d(x, y, h=c(30, 1.5)*4, n=100, lims=c(1000, 2000, -12, 12))
sum(kk$z)
which gives 0.3768013 (which is the same as the Matlab answer).
So my question is then: Why does kde2d divide the bandwidth by four? (Or why doesn't ksdensity2d?)
At the mirrored github source, lines 31-35:
if (any(h <= 0))
    stop("bandwidths must be strictly positive")
h <- h/4 # for S's bandwidth scale
ax <- outer(gx, x, "-")/h[1L]
ay <- outer(gy, y, "-")/h[2L]
and the help file for kde2d(), which suggests looking at the help file for bandwidth. That says:
...which are all scaled to the width argument of density and so give
answers four times as large.
But why?
density() says that the width argument exists for the sake of compatibility with S (the precursor to R). The comments in the source for density() read:
## S has width equal to the length of the support of the kernel
## except for the gaussian where it is 4 * sd.
## R has bw a multiple of the sd.
The default is the Gaussian one. When the bw argument is unspecified and width is specified, width is substituted in, e.g.:
library(MASS)
set.seed(1)
x <- rnorm(1000, 10, 2)
all.equal(density(x, bw = 1), density(x, width = 4)) # Only the call is different
However, because kde2d() was apparently written to remain compatible with S (and I suppose it was originally written for S, given that it is in MASS), everything ends up divided by four. After flipping to the relevant section of the MASS book (around p. 126), it seems they may have picked four to strike a balance between smoothness and fidelity to the data.
In conclusion, my guess is that kde2d() divides by four to remain consistent with the rest of MASS (and other things originally written for S), and that the way you're going about things looks fine.
