Using optim to choose initial values for nls - r

One method I have seen in the literature is to use optim() to choose initial values for nonlinear models fitted with nls() or the nlme package. However, I am puzzled by the actual implementation.
Take an example using COVID data from Alachua, FL:
dat=data.frame(x=seq(1,10,1), y=c(27.9,23.1,24.6,33.0,48.0,136.4,243.4,396.7,519.9,602.8))
x are time points and y is the number of people infected per 10,000 people
Now, if I wanted to fit a four-parameter logistic model in nls, I could use
n1 <- nls(y ~ SSfpl(x, A, B, M, S), data = dat)
But now imagine that parameter estimation is highly sensitive to the initial values so I want to optimize my approach. How would this be achieved?
The way I have thought to try is as follows
fun_to_optim <- function(data, guess){
x = data$x
y = data$y
A = guess[1]
B = guess[2]
M = guess[3]
S = guess[4]
y = A + (B-A)/(1+exp((M-x)/S))
return(-sum(y)) }
optim(fn=fun_to_optim, data=dat,
The result from optim() is wrong but I cannot see my error. Thank you for any assistance.

The main issue is that you're not computing/returning the sum of squares from your objective function. However: I think you really have it backwards. Using nls() with SSfpl is about the best you're going to do in terms of optimization: it has sensible heuristics for picking starting values (SS stands for "self-starting"), and it provides a gradient function for the optimizer. It's not impossible that, with a considerable amount of work, you could find better heuristics for picking starting values for a particular system, but in general switching from nls to optim + Nelder-Mead will leave you worse off than when you started (illustration below).
fun_to_optim <- function(data, guess){
  x = data$x
  y = data$y
  A = guess[1]
  B = guess[2]
  M = guess[3]
  S = guess[4]
  y_pred = A + (B-A)/(1+exp((M-x)/S))
  ## return the residual sum of squares (the quantity to minimize)
  sum((y - y_pred)^2)
}
Fit optim() with (1) your suggested starting values; (2) better starting values that are somewhere nearer the correct values (you could get most of these values by knowing the geometry of the function — e.g. A is the left asymptote, B is the right asymptote, M is the midpoint, S is the scale); (3) same as #2 but using BFGS rather than Nelder-Mead.
opt1 <- optim(fn=fun_to_optim, data=dat,
opt2 <- optim(fn=fun_to_optim, data=dat,
method = "Nelder-Mead")
opt3 <- optim(fn=fun_to_optim, data=dat,
method = "BFGS")
xvec <- seq(1,10,length=101)
plot(y~x, data=dat)
lines(xvec, predict(n1, newdata=data.frame(x=xvec)))
p1 <- with(as.list(opt1$par), A + (B-A)/(1+exp((M-xvec)/S)))
lines(xvec, p1, col=2)
p2 <- with(as.list(opt2$par), A + (B-A)/(1+exp((M-xvec)/S)))
lines(xvec, p2, col=4)
p3 <- with(as.list(opt3$par), A + (B-A)/(1+exp((M-xvec)/S)))
lines(xvec, p3, col=6)
legend("topleft", col=c(1,2,4,6), lty=1,
legend=c("nls","NM (bad start)", "NM", "BFGS"))
nls and good starting values + BFGS overlap, and provide a good fit
optim/Nelder-Mead from bad starting values is absolutely terrible — converges on a constant line
optim/N-M from good starting values gets a reasonable fit, but obviously worse; I haven't analyzed why it gets stuck there.
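As a minimal, self-contained sketch of the sum-of-squares approach described above: the data are the ones from the question, but the helper names (fpl, rss) and the starting values are my own assumptions, read roughly off the data, not the values used in the original answer. Whether optim() ends up at the nls() solution depends on those starting values.

```r
# Data from the question
dat <- data.frame(x = 1:10,
                  y = c(27.9, 23.1, 24.6, 33.0, 48.0,
                        136.4, 243.4, 396.7, 519.9, 602.8))

# Four-parameter logistic curve; p = c(A, B, M, S)
fpl <- function(x, p) p[1] + (p[2] - p[1]) / (1 + exp((p[3] - x) / p[4]))

# Objective: residual sum of squares between data and curve
rss <- function(guess, data) sum((data$y - fpl(data$x, guess))^2)

# Hypothetical starting values (assumption): asymptotes from the range of y,
# midpoint and scale guessed by eye
start <- c(A = min(dat$y), B = max(dat$y), M = 8, S = 1)

rss_start <- rss(start, dat)
opt <- optim(par = start, fn = rss, data = dat, method = "BFGS")

opt$par                                  # estimated A, B, M, S
c(start = rss_start, final = opt$value)  # RSS before and after optimization
```

Comparing opt$value with the residual sum of squares of the nls() fit is a quick way to check whether the two approaches agree.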


Determine what is the break point for the slope change in R [migrated]

I'm trying to implement a "change point" analysis, or a multiphase regression using nls() in R.
Here's some fake data I've made. The formula I want to use to fit the data is:
$y = \beta_0 + \beta_1x + \beta_2\max(0,x-\delta)$
What this is supposed to do is fit the data up to a certain point with a certain intercept and slope ($\beta_0$ and $\beta_1$), then, after a certain x value ($\delta$), augment the slope by $\beta_2$. That's what the whole max thing is about. Before the $\delta$ point, it'll equal 0, and $\beta_2$ will be zeroed out.
So, here's my function to do this:
changePoint <- function(x, b0, slope1, slope2, delta){
  b0 + (x*slope1) + (max(0, x-delta) * slope2)
}
And I try to fit the model this way
nls(y ~ changePoint(x, b0, slope1, slope2, delta),
data = data,
start = c(b0 = 50, slope1 = 0, slope2 = 2, delta = 48))
I chose those starting values because I know they are the true parameter values: I made the data up.
However, I get this error:
Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Have I just made unfortunate data? I tried fitting this on real data first, and was getting the same error, and I just figured that my initial starting parameters weren't good enough.
At first I thought it could be a problem resulting from the fact that max is not vectorized, but that's not true. It does make changePoint a pain to work with, hence the following modification (pmax(0, x - delta) would do the same thing):
changePoint <- function(x, b0, slope1, slope2, delta) {
  b0 + (x*slope1) + (sapply(x-delta, function (t) max(0, t)) * slope2)
}
This R-help mailing list post describes one way in which this error may result: the rhs of the formula is overparameterized, such that changing two parameters in tandem gives the same fit to the data. I can't see how that is true of your model, but maybe it is.
In any case, you can write your own objective function and minimize it. The following function gives the squared error for data points (x,y) and a certain value of the parameters (the weird argument structure of the function is to account for how optim works):
sqerror <- function (par, x, y) {
  sum((y - changePoint(x, par[1], par[2], par[3], par[4]))^2)
}
Then we say:
optim(par = c(50, 0, 2, 48), fn = sqerror, x = x, y = data)
And see:
[1] 54.53436800 -0.09283594 2.07356459 48.00000006
Note that for my fake data (x <- 40:60; data <- changePoint(x, 50, 0, 2, 48) + rnorm(21, 0, 0.5)) there are lots of local minima, and which one you reach depends on the initial parameter values you give. I suppose if you wanted to take this seriously you'd call the optimizer many times with random initial parameters and examine the distribution of results.
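The multi-start idea can be sketched roughly as follows. The ranges for the random starting values and the number of restarts are assumptions chosen for this fake data set, and changePoint is rewritten with pmax so that it is vectorized.

```r
set.seed(1)

# Vectorized change-point mean function (pmax is the elementwise maximum)
changePoint <- function(x, b0, slope1, slope2, delta) {
  b0 + x * slope1 + pmax(0, x - delta) * slope2
}

# Fake data in the same style as above
x <- 40:60
y <- changePoint(x, 50, 0, 2, 48) + rnorm(length(x), 0, 0.5)

sqerror <- function(par, x, y) {
  sum((y - changePoint(x, par[1], par[2], par[3], par[4]))^2)
}

# Hypothetical ranges for the random starting values (assumption)
n_starts <- 50
starts <- cbind(b0     = runif(n_starts, 0, 100),
                slope1 = runif(n_starts, -2, 2),
                slope2 = runif(n_starts, 0, 4),
                delta  = runif(n_starts, 40, 60))

# Run optim() (Nelder-Mead by default) once per starting point
fits <- t(apply(starts, 1, function(s) {
  o <- optim(par = s, fn = sqerror, x = x, y = y)
  c(o$par, value = o$value)
}))

best <- fits[which.min(fits[, "value"]), ]  # lowest squared error found
summary(fits[, "delta"])                    # spread of the estimated change point
```

If most restarts land near the same delta with a similar squared error, that is some reassurance that the best fit is not just one of the local minima.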
Just wanted to add that you can do this with many other packages. If you want to get an estimate of uncertainty around the change point (something nls cannot do), try the mcp package.
library(mcp)

# Simulate the data
df = data.frame(x = 1:100)
df$y = c(rnorm(20, 50, 5), rnorm(80, 50 + 1.5*(df$x[21:100] - 20), 5))

# Fit the model
model = list(
  y ~ 1,     # Intercept
  ~ 0 + x    # Joined slope
)
fit = mcp(model, df)
Let's plot it with a prediction interval (green line). The blue density is the posterior distribution for the change point location:
# Plot it
plot(fit, q_predict = T)
You can inspect individual parameters in more detail using plot_pars(fit) and summary(fit).

Using mle2 function

I would like to find the MLE for parameters epsilon and mu in such a model:
$$X \sim \frac{1}{\mu_1}e^{-x/\mu_1}+\frac{1}{\mu_2}e^{-x/\mu_2}$$
epsilon = 0.01
#the real model
X <- rmixexp2(n = 20, prob1 = epsilon, rate1 = 1/mu1, rate2 = 1/mu2)
LL <- function(mu1,mu2, eps){
R = (1-eps)*dexp(X,rate=1/mu1,log=TRUE)+eps*dexp(X,rate=1/mu2,log=TRUE)
fit_norm <- mle2(LL, start = list(eps = 0,mu1=1, mu2 = 1), lower = c(-Inf, 0),
upper = c(Inf, Inf), method = 'L-BFGS-B')
But I get the error
> fn = function (p) ':method 'L-BFGS-B' requires finite values of fn"
There are a bunch of issues here. The primary one is that your likelihood expression is wrong (you can't log the components separately and then add them, you have to add the components and then take the log). Your bounds are also funny: the mixture probability should be [0,1] and the means should be [0, Inf].
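To make the difference between the two likelihood expressions concrete, here is a base-R sketch that needs neither bbmle nor Renext. The simulation settings, the function names (nll_mix, nll_wrong) and the small offsets on the bounds are assumptions for illustration; the mixture is evaluated on the log scale (log-sum-exp) so the objective stays finite for L-BFGS-B.

```r
set.seed(101)

# Simulate a two-component exponential mixture (hypothetical settings)
n <- 200
eps_true <- 0.3; mu1_true <- 1; mu2_true <- 5
from_first <- runif(n) < eps_true
X <- ifelse(from_first,
            rexp(n, rate = 1/mu1_true),
            rexp(n, rate = 1/mu2_true))

# Correct: log of the weighted sum of the densities; par = c(mu1, mu2, eps)
nll_mix <- function(par, x) {
  l1 <- log(par[3])    + dexp(x, rate = 1/par[1], log = TRUE)
  l2 <- log1p(-par[3]) + dexp(x, rate = 1/par[2], log = TRUE)
  m <- pmax(l1, l2)                        # log-sum-exp for numerical stability
  -sum(m + log(exp(l1 - m) + exp(l2 - m)))
}

# Incorrect: weighted sum of the log densities
nll_wrong <- function(par, x) {
  -sum(par[3]       * dexp(x, rate = 1/par[1], log = TRUE) +
       (1 - par[3]) * dexp(x, rate = 1/par[2], log = TRUE))
}

start <- c(mu1 = 1, mu2 = 2, eps = 0.4)
c(correct = nll_mix(start, X), wrong = nll_wrong(start, X))

# Bounds kept slightly inside (0, Inf) and (0, 1) so the objective is finite
fit <- optim(start, nll_mix, x = X, method = "L-BFGS-B",
             lower = c(1e-3, 1e-3, 1e-6),
             upper = c(Inf, Inf, 1 - 1e-6))
fit$par
```

Because the logarithm is concave, the incorrect expression can never give a smaller negative log-likelihood than the correct one at the same parameter values, so the two are not interchangeable.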
The other problem you have is that with the current simulation design (n=20, prob=0.01), you have a high probability of getting no points in the first mixture component (the probability of a point being in the second component is 1-0.01=0.99, so the probability that all of the points are in the second component is 0.99^20 = 82%). In this case the MLE will be degenerate (i.e., you're trying to fit a two-component mixture to a data set that essentially only has one component); in this case any of these solutions will give equivalent likelihoods:
prob=0, mu2=mean of the data, mu1=anything
prob=1, mu1=mean of the data, mu2=anything
mu1=mu2=mean of the data, prob=anything
With all these solutions, where you end up will depend very sensitively on starting conditions and optimization algorithm.
For this problem I would encourage you to use the built-in dmixexp2 function from the Renext package (which correctly implements the log-likelihood as log(p*Prob(X|exp1) + (1-p)*Prob(X|exp2))) and the formula interface to mle2:
fit_norm <- mle2(X ~ dmixexp2(rate1=1/mu1,rate2=1/mu2,prob1=eps),
start = list(mu1=1, mu2 = 2, eps=0.4),
lower = c(mu1=0, mu2=0, eps=0),
upper = c(mu1=Inf, mu2=Inf, eps=1),
method = 'L-BFGS-B')
This gives me estimates of mu1=1.58, mu2=2.702, eps=0. mean(X) in my case equals the value of mu2, so this is the first case in the bulleted list above. You also get a warning:
some parameters are on the boundary: variance-covariance calculations based on Hessian may be unreliable
There are also a variety of more specialized algorithms for fitting mixture models (especially those based on the expectation-maximization algorithm); you can look for packages on CRAN (flexmix is one of them).
This problem is small enough that you can visualize the whole log-likelihood surface by brute force (code below): the colours represent deviations from the minimum negative log-likelihood (the colour gradient is log-scaled, so there's a small offset to avoid log(0)). Dark blue represents parameters that are the best fit to the data, yellow are the worst.
dd <- expand.grid(mu1=seq(0.1,4,length=51),
for (i in 1:nrow(dd)) {
dd$nll[i] <- with(dd[i,],
ggplot(dd,aes(mu1,mu2,fill=nll-min(nll)+1e-4)) +
facet_wrap(~eps, labeller=label_both) +
geom_raster() +
scale_fill_viridis_c(trans="log10") +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)) +
ggsave("fit_norm.png", type="cairo-png")
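The brute-force idea can also be sketched without ggplot2. In the version below only the mu1 range is taken from the code above; the mu2 range, the eps values and the simulated data are assumptions.

```r
set.seed(101)

# Hypothetical data: effectively a single exponential component
X <- rexp(20, rate = 1/2)

# Negative log-likelihood of the two-component exponential mixture
nll_mix <- function(mu1, mu2, eps, x) {
  -sum(log(eps * dexp(x, rate = 1/mu1) + (1 - eps) * dexp(x, rate = 1/mu2)))
}

# Grid over the three parameters (mu2 and eps ranges are assumptions)
dd <- expand.grid(mu1 = seq(0.1, 4, length = 51),
                  mu2 = seq(0.1, 4, length = 51),
                  eps = c(0.1, 0.5, 0.9))

# Evaluate the negative log-likelihood at every grid point
dd$nll <- mapply(nll_mix, dd$mu1, dd$mu2, dd$eps, MoreArgs = list(x = X))

dd[which.min(dd$nll), ]            # best grid point
head(dd[order(dd$nll), ], 10)      # near-ties show how flat the surface is
```

With data that effectively come from one component, many quite different grid points have almost the same negative log-likelihood, which is the degeneracy described above.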

using weights in the simulate() function in R

I want to generate predicted values from a GLM, including stochastic uncertainty. I use two approaches and compare them to make sure the result is correct.
library(MASS)  # for mvrnorm()
n <- 1500
d <- mvrnorm(n=n, mu=c(0,0,0,0), Sigma=matrix(.7, nrow=4, ncol=4) + diag(4)*.3)
d[,1] <- qgamma(p=pnorm(q=d[,1]), shape=2, rate=2) * 1000
m <- glm(formula=d[,1] ~ d[,2] + d[,3] + d[,4], family=gaussian(link="sqrt"))
# approach 1: linear predictor by hand, back-transformed, plus normal noise
p_lin <- m$coef[1] + m$coef[2]*d[,2] + m$coef[3]*d[,3] + m$coef[4]*d[,4]
p1 <- rnorm(n=n, mean=p_lin^2, sd=sd(p_lin^2 - d[,1]))
# approach 2: simulate()
p2 <- simulate(m)$sim_1
par(mfrow=c(1,1), mar=c(4,2,2,1), pch=16, cex=0.8, pty="s")
xylim <- c(min(c(d[,1], p1, p2)), max(c(d[,1], p1, p2)))
plot(x=d[,1], y=p1, xlab="original data", ylab="predicted values", xlim=xylim, ylim=xylim, col=rgb(0,0,0,alpha=0.1))
points(x=d[,1], y=p2, col=rgb(0,1,0,alpha=0.1))
abline(a=0, b=1, col="red")
The predictions differ. The source code of the simulate() method that is used for a Gaussian GLM can be displayed with:
getS3method("simulate", class = "lm")
It shows that simulate() applies a weight-based sd:
if (!is.null(object$weights))
    vars <- vars/object$weights
ftd + rnorm(ntot, sd = sqrt(vars)) # this is the prediction including stochastic uncertainty; ftd is defined as fitted(object)
Looking at the help function I read that "The methods for linear models fitted by lm or glm(family = "gaussian") assume that any weights which have been supplied are inversely proportional to the error variance." However, I assume this is about the prior weights, which I did not apply and are NULL (m$prior.weights). However, the simulate function seems to use the m$weights, which seems identical to 4*m$fitted.values. I googled a lot but can't get to the bottom of this. Why does the simulate() function apply these weights in the sd? Is this correct? How are these weights calculated?
(It's related to the post: microsimulation GLM including stochastic part; hopefully I'm not wrong in starting a new one.)

Prevent a nls-fit from falling below zero

I'm trying to fit a function in R, and therefore I use nls().
Is there a way to prevent the fitted function from falling below zero?
An easy workaround would be to raise the parameter b0 in the target function after the fit, but this is not what I want, because I expect a real fit with the constraint of being positive to lead to a better result.
d=data.frame(seq(1, 10, 1),y=y)
fitFun <- function(x, add, b0, b1) {b0 + (x+add)^b1}
Are you looking for this? Constraining the parameters to make the prediction non-negative can be tricky if the prediction is a hard-to-invert function of the parameters, but in this case we just have to require b0>=0. Using @Roland's example:
fit2 <- nls(y~b0+(x+add)^b1,
In the following, blue is the original unconstrained fit, red is @Roland's fit, and purple is the fit above.
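A bounded fit of this kind can be sketched with the "port" algorithm of nls(), which accepts lower and upper bounds. This is not the original fit2 call: the simulated data follow @Roland's example, while the starting values, the extra bounds on add and b1, and the warnOnly setting are assumptions.

```r
set.seed(42)

# Simulated data in the style of @Roland's example
x <- 1:200 / 100
y <- -10 + (x + 1)^3.5 + rnorm(length(x), sd = 3)
df <- data.frame(x, y)

# Box-constrained fit: b0 >= 0 keeps the curve non-negative because
# (x + add)^b1 >= 0 whenever x + add >= 0
fit_c <- nls(y ~ b0 + (x + add)^b1, data = df,
             start = c(add = 1, b0 = 1, b1 = 3),       # hypothetical start
             algorithm = "port",
             lower = c(add = 0, b0 = 0, b1 = 0),       # same order as start
             control = nls.control(warnOnly = TRUE))   # warn instead of stop

coef(fit_c)
min(fitted(fit_c))   # smallest fitted value, should not be negative
```

Since the data were generated with a negative intercept, the constrained estimate of b0 is expected to sit at or near its lower bound.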
You need to change your model. For that you need to define what should happen if the function values would fall below zero. Here is an example, which sets these values to 0.
x <- 1:200/100
y <- -10+(x+1)^3.5+rnorm(length(x),sd=3)
df <- data.frame(x,y)
fitFun <- function(x, add, b0, b1) {
  res <- b0 + (x+add)^b1
  res[res<0] <- 0   # truncate negative predictions at zero
  res
}
fit <- nls(y~fitFun(x,add,intercept,power),
Thanks a lot for the answers. Maybe I didn't give enough information about my problem, but I'm not yet allowed to post pictures and describing everything would have led to a short story.
@Roland was perfectly right: it's not the optimizer's task to care about the behaviour of the target function, but as I mentioned, I assume the model to be fixed.
@Ben Bolker's suggestion to limit the additive part of the function to positive values led to an unsatisfying result.
What I didn't mention was that m1 to m10 are mean values of a data collection I recorded. I solved my problem by using weights inversely proportional to the variance of each recorded series during the fitting process.
d=data.frame(seq(1, 10, 1),y=y)
vars = c(var(lt1$V1),var(lt2$V1),var(lt3$V1),var(lt4$V1),var(lt5$V1),var(lt6$V1),var(lt7$V1),var(lt8$V1),var(lt9$V1),var(lt10$V1))
weights = rep(max(vars),10)/vars
fitFun <- function(x, add, b0, b1) {b0 + (x+add)^b1}

specifying degrees of freedom for b-spline fit using bs function in splines package

I am using the bs function of the splines package to create a b-spline smoothing curve for graphical purposes. (There is at least one report that Excel uses a third order b-spline for its smooth line graphs, and I would like to be able to duplicate those curves.) I am having trouble understanding the arguments required by the bs function. Representative code follows below, as adapted from the bs documentation:
n <- 10
x <- 1:10
y <- rnorm(n)
d <- data.frame(x=x, y=y)
summary(fm1 <- lm(y ~ bs(x, degree=3)), data=d)
x.spline <- seq(1, 10, length.out=n*10)
spline.data <- data.frame(x=x.spline, y=predict(fm1, data.frame(x=x.spline)))
ggplot(d, aes(x,y)) + geom_point + geom_line(aes(x,y), data=spline.data)
The example code in the bs documentation specifies df=5 in the call to bs, and does not specify degree. I have no idea how many degrees of freedom I have. All I know is that I want a third order b-spline. I have experimented with specifying different values of df instead of, or in addition to degree, and I get dramatically different results. This is why I suspect that a specification of df is the issue here. How would I calculate df in this context?
The help file suggests df = length(knots) + degree. If I treat the interior points as knots, this gives me df=11 for this example, which generates error messages and a nonsensical spline fit.
Thank you in advance.
I was apparently not clear in my intentions. I am trying to do this:
How can I use spline() with ggplot?, but with b-splines.
You should not be trying to fit every point. The goal is to find a summary that is an acceptable fit but which depends on a limited number of knots. There is not much value in increasing the degree of the polynomial above the default of three. With only 10 points you surely do not want df=11. Try df=5 and the results should be reasonably flat. The rms/Hmisc package author, Frank Harrell, prefers restricted cubic splines because the predictions at the extremes are linear and thus less wild than would occur with other polynomial bases.
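The relation df = length(knots) + degree from the help file can be checked directly on the basis matrix. This is a small sketch using the default intercept = FALSE; the object name b is arbitrary.

```r
library(splines)

x <- 1:10

# Cubic B-spline basis with 5 degrees of freedom (no intercept column)
b <- bs(x, df = 5, degree = 3)

ncol(b)                    # number of basis columns, equal to df
attr(b, "knots")           # interior knots, placed at quantiles of x
length(attr(b, "knots"))   # df - degree interior knots
```

So asking for df=5 with a cubic basis means two interior knots, not one knot per data point.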
I corrected a couple of misspellings and added a knots argument to make your code work:
require(splines); require(ggplot2); set.seed(trunc(100000*pi))
n <- 10
x <- 1:10
y <- rnorm(n)
d <- data.frame(x=x, y=y)
summary(fm1 <- lm(y ~ bs(x, degree=3, knots=2), data=d))
x.spline <- seq(1, 10, length.out=n*10)
spline.data <- data.frame(x=x.spline, y=predict(fm1, data.frame(x=x.spline)))
ggplot(d, aes(x,y)) + geom_point() + geom_line(aes(x,y), data=spline.data)
I came away from the exercise of varying the random seed with the opinion that Frank Harrell knows what he is talking about. I don't get the same sort of behavior at the extremes when using his packages.
I did a little more work and came up with the following. First, an apology. What I was looking for was a smoothing spline, rather than a regression spline. I did not have the vocabulary to phrase the question properly. While the example in the help file for bs() appears to provide this, the function does not provide the same behavior for my sample data. There is another function, smooth.spline, in the stats package, which offers what I needed.
n <- 10
x <- 1:n
xx <- seq(1, n, length.out=200)
y <- rnorm(n)
d <- data.frame(x=x, y=y)
spl <- smooth.spline(x, y, spar=0.1)
spline.data <- data.frame(predict(spl, xx))
ggplot(d,aes(x,y)) + geom_point() + geom_line(aes(x,y), spline.data)
spl2 <- smooth.spline(x, y, control=
list(trace=TRUE, tol=1e-6, spar=0.1, low=-1.5, high=0.3))
spline.data2 <- data.frame(predict(spl2,xx))
ggplot(d,aes(x,y)) + geom_point() + geom_line(aes(x,y), spline.data2)
The two calls to smooth.spline represent two approaches. The first specifies the smoothing parameter manually, and the second iterates to an optimal solution. I found that I had to constrain the optimization properly to get the type of solution I was after.
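The two approaches can be compared side by side in a short sketch. Note that the second call here simply lets smooth.spline pick the smoothing parameter by its default criterion (generalized cross-validation) without the control settings used above; the seed and object names are assumptions.

```r
set.seed(1)
n  <- 10
x  <- 1:n
y  <- rnorm(n)
xx <- seq(1, n, length.out = 200)

# Approach 1: smoothing parameter fixed by hand
spl_manual <- smooth.spline(x, y, spar = 0.1)

# Approach 2: smoothing parameter chosen automatically (GCV by default)
spl_auto <- smooth.spline(x, y)

# predict() returns a list with components x and y
pred_manual <- as.data.frame(predict(spl_manual, xx))
pred_auto   <- as.data.frame(predict(spl_auto, xx))

# Equivalent degrees of freedom of each fit
c(manual = spl_manual$df, auto = spl_auto$df)
```

Either data frame can be passed to geom_line(aes(x,y), ...) as in the plots above.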
The result is intended to match the b-spline used by the Excel line plot. I have collaborators who consider Excel graphics to be the standard, and I need to at least match that performance.
