Do a nonlinear least squares fit in R

I have two vectors:
y <- c(0.044924, 0.00564, 0.003848, 0.002385, 0.001448, 0.001138,
0.001025, 0.000983, 0.00079, 0.000765, 0.000721, 0.00061, 0.000606,
0.000699, 0.000883, 0.001069, 0.001226, 0.001433, 0.00162, 0.001685,
0.001604, 0.001674, 0.001706, 0.001683, 0.001505, 0.001497, 0.001416,
0.001449, 0.001494, 0.001544, 0.00142, 0.001458, 0.001544, 0.001279,
0.00159, 0.001756, 0.001749, 0.001909, 0.001885, 0.002063, 0.002265,
0.002137, 0.002391, 0.002619, 0.002733, 0.002957, 0.003244, 0.003407,
0.003563, 0.003889, 0.004312, 0.004459, 0.004946, 0.005248, 0.005302,
0.00574, 0.006141, 0.006977, 0.007386, 0.007843, 0.008473, 0.008949,
0.010164, 0.010625, 0.011279, 0.01191, 0.012762, 0.014539, 0.01477)
x <- 0:68
I am trying to use the non-linear least squares function nls() to fit the data, but I keep getting the error:
Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates
My code is:
a=0.00012
b=0.08436
k=0.21108
fit = nls(y ~ (a*b*exp(b*x)*k)/((k*b)+(a*(exp(b*x)-1))), start=list(a=a,b=b,k=k))
The parameters I have entered are ones that I know are close to the expected values. Does anyone know what I am doing wrong here?
I have tried various initial values for a, b and k, but I always get some kind of error.

Use optim() instead. You have to write a function which takes a, b and k as input (collected in a single vector) and returns the sum of squared errors:
func <- function(pars) {
  a <- pars["a"]
  b <- pars["b"]
  k <- pars["k"]
  fitted <- (a*b*exp(b*x)*k)/((k*b) + (a*(exp(b*x) - 1)))
  sum((y - fitted)^2)  # residual sum of squares
}
Then we run optim() using the initial values:
result <- optim(c(a=0.00012, b=0.08436, k=0.21108), func)
To test the resulting fit:
plot(x, y)
a <- result$par["a"]
b <- result$par["b"]
k <- result$par["k"]
lines(x, (a*b*exp(b*x)*k)/((k*b) + (a*(exp(b*x) - 1))), col = "blue")
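If an nls fit is still wanted (for example, to get standard errors), one hedged option is to feed the optim() estimates back into nls() as starting values. This is a sketch, not part of the original answer, and it can still fail with a singular gradient if the model is poorly identified at those values:
# editor's sketch: restart nls() from the optim() solution
fit2 <- nls(y ~ (a*b*exp(b*x)*k)/((k*b) + (a*(exp(b*x) - 1))),
            start = as.list(result$par))
summary(fit2)  # parameter estimates with standard errors, if it converges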


Fit a dataset that presents an elbow/knee using nlsLM and "force" the coefficients to be near a threshold

I need to fit a dataset related to a two-dimensional diffusion (D2) process with a Sestak-Berggren model (derived from the logistic model), so I need to understand how to use nlsLM in the presence of an elbow/knee, because the following "easy way" did not work:
x=c(1.000000e-05, 1.070144e-05, 1.208082e-05, 1.456624e-05, 1.861581e-05, 2.490437e-05, 3.407681e-05, 4.696710e-05,
6.474653e-05, 8.870800e-05, 1.206194e-04, 1.624442e-04, 2.172716e-04, 2.882747e-04, 3.794489e-04, 4.956619e-04,
6.427156e-04, 8.275095e-04, 1.058201e-03, 1.344372e-03, 1.697222e-03, 2.129762e-03, 2.657035e-03, 3.296215e-03,
4.067301e-03, 4.992831e-03, 6.098367e-03, 7.412836e-03, 8.968747e-03, 1.080251e-02, 1.295471e-02, 1.547045e-02,
1.839960e-02, 2.179713e-02, 2.572334e-02, 3.024414e-02, 3.543131e-02, 4.136262e-02, 4.812205e-02, 5.579985e-02,
6.449256e-02, 7.430297e-02, 8.533991e-02, 9.771803e-02, 1.115573e-01, 1.269824e-01, 1.441219e-01, 1.631074e-01,
1.840718e-01, 2.071477e-01, 2.324656e-01, 2.601509e-01, 2.903210e-01, 3.230812e-01, 3.585200e-01, 3.967033e-01,
4.376671e-01, 4.814084e-01, 5.278744e-01, 5.769469e-01, 6.284244e-01, 6.819947e-01, 7.371982e-01, 7.933704e-01,
8.495444e-01, 9.042616e-01)
ynorm=c(
1.000000e+00, 8.350558e-01, 6.531870e-01, 4.910995e-01, 3.581158e-01, 2.553070e-01, 1.814526e-01, 1.290639e-01,
9.219591e-02, 6.623776e-02, 4.817180e-02, 3.543117e-02, 2.624901e-02, 1.961542e-02, 1.478284e-02, 1.123060e-02,
8.597996e-03, 6.631400e-03, 5.151026e-03, 4.028428e-03, 3.171096e-03, 2.511600e-03, 2.001394e-03, 1.604211e-03,
1.292900e-03, 1.047529e-03, 8.530624e-04, 6.981015e-04, 5.739778e-04, 4.740553e-04, 3.932255e-04, 3.275345e-04,
2.739059e-04, 2.299339e-04, 1.937278e-04, 1.637946e-04, 1.389500e-04, 1.182504e-04, 1.009406e-04, 8.641380e-05,
7.418032e-05, 6.384353e-05, 5.508090e-05, 4.762920e-05, 4.127282e-05, 3.583451e-05, 3.116813e-05, 2.715264e-05,
2.368759e-05, 2.068935e-05, 1.808802e-05, 1.582499e-05, 1.385102e-05, 1.212452e-05, 1.061032e-05, 9.278534e-06,
8.103650e-06, 7.063789e-06, 6.140038e-06, 5.315870e-06, 4.576585e-06, 3.908678e-06, 3.298963e-06, 2.732866e-06,
2.189810e-06, 1.614149e-06)
dfxy <- data.frame(x[1:length(ynorm)],ynorm)
fn=funSel <-"co*((1-x)^m)*(x^n)"
mod_fit <- nlsLM(ynorm~eval(parse(text=fn)),start=c(co=0.5,m=-1,n=0.5),data=dfxy)
plot(dfxy$x,dfxy$y,xlim=c(0,0.001))
plot(dfxy$x,(fitted(mod_fit))[1:length(dfxy$x)],xlim=c(0,0.001))
The only solution I've found is based on https://stackoverflow.com/a/54286595/6483091: first find the "elbow" and then apply the regression only to the reduced dataset. That works, but I was wondering whether there are other solutions (tweaking the regression parameters instead of doing it in two steps, somehow letting nlsLM "recognize" the curve, e.g. via a dynamic first-derivative threshold, while still forcing fn as the regression formula).
Also the "biggest problem is that I alredy know the "range" for the parameters" (i.e.
Applying a regression using "good" starting point (coefficients near the "ground truth" ynorm <- 0.973*(1-x)^(0.425)*x^(-1.008) ) but even if I give them as a starting point there is no way I obtain anything with similar values.
the "ground truth"
plot(x,yrnom) yt <- 0.973*(1-x)^(0.425)*x^(-1.008)
lines(x,yt/max(yt))
Here is a solution using nls and a hyperbolic fit:
# x and ynorm as given in the question above
dfxy <- data.frame(x = x[1:length(ynorm)], y = ynorm)
plot(y ~ x, data = dfxy)
mod <- nls(y ~ a/x + b, data = dfxy, start = list(a = 1, b = 0))
lines(x = dfxy$x, y = predict(mod, newdata = dfxy))
The fit isn't perfect, though. I guess there is no continuous function to fit a right angle...
Depending on what you want to use the regression for, you could also use a loess regression:
dfxy <- data.frame(x[1:length(ynorm)],ynorm)
names(dfxy) <- c("x", "y")
plot(y ~ x, data = dfxy)
mod <- loess(y ~ x, data = dfxy, span = 0.1)
lines(x = dfxy$x, y = predict(mod, newdata = dfxy$x), col = "red")
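A further hedged sketch (an editor's addition, not from the original answers): because the Sestak-Berggren form co*((1-x)^m)*(x^n) is linear in log(co), m and n after taking logs, an ordinary lm() fit on the log scale can supply starting values for nlsLM that sit close to the stated ground truth:
library(minpack.lm)
dfxy <- data.frame(x = x[1:length(ynorm)], y = ynorm)
# log(y) = log(co) + m*log(1 - x) + n*log(x); assumes all x in (0, 1) and y > 0
lin <- lm(log(y) ~ log(1 - x) + log(x), data = dfxy)
cf <- unname(coef(lin))
mod_fit2 <- nlsLM(y ~ co * ((1 - x)^m) * (x^n), data = dfxy,
                  start = list(co = exp(cf[1]), m = cf[2], n = cf[3]))
coef(mod_fit2)
# m and n should come out near 0.425 and -1.008; co absorbs the
# normalisation of ynorm, so it will not equal 0.973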

Error in 'indepTest' in PC algorithm for conditional Independence Test

I am using the pc() function (PC algorithm), which takes a conditional independence test as one of its arguments, and I get an error with the following code. Note that data here is the data set I have been using, and 1, 6, 2 in gaussCItest are the positions of the nodes (x, y and the conditioning variable) in my data.
code:
library(pcalg)
suffstat <- list(C = cor(data), n = nrow(data))
pc.data <- pc(suffstat,
indepTest=gaussCItest(1,6,2,suffstat),
p=ncol(data),alpha=0.01)
Error:
Error in indepTest(x, y, nbrs[S], suffStat) :
could not find function "indepTest"
Below is the code that worked. I removed the arguments from gaussCItest, since it must be passed as a function, which pc() then calls directly.
library(pcalg)
suffstat <- list(C = cor(data), n = nrow(data))
pc.data <- pc(suffstat,indepTest=gaussCItest, p=ncol(data),alpha=0.01)
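pc() expects indepTest to be a function of (x, y, S, suffStat) that it calls internally for many variable pairs. As a small hedged illustration (using the suffstat list above), the test can also be called on its own:
# p-value for the conditional independence of variables 1 and 6 given variable 2
gaussCItest(x = 1, y = 6, S = 2, suffStat = suffstat)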

Error with optim for the Birnbaum-Saunders distribution

I don't know why I get an error message when trying to use optim in R.
I have the following data:
x <- c(6.0401209, 7.2888217, 0.4868070,
1.1699703, 51.5998419, 11.8766734,
2.3873264, 16.9583702, 21.6142835,
0.3133089, 3.4178360, 4.4367427,
2.0205100, 10.5798884, 0.4890031,
1.6734176, 10.2809820, 6.4705424,
5.6801965, 0.9438700)
And the following log-likelihood function:
log.lik.bs <- function(gamma, betha, z){
n <- length(z)
- n * log(gamma) - n * log(2*sqrt(2*2*pi)) - sum(log(z)) + sum(log(sqrt(z/betha)) + sqrt(betha/z)) - (1/2*gamma^2) * sum((sqrt(z/betha) - sqrt(betha/z))^2)
}
What I'm trying to do is the following:
optim(c(2, 6), log.lik.bs, control=list(fnscale=-1), x=x, method="BFGS")$par
But I get an error message:
Error in fn(par, ...) :
unused argument (x = c(6.04012089885228, 7.28882174812723, 0.486806990614708, 1.1699703323488, 51.5998418613029, 11.8766733963947, 2.38732637900487, 16.9583701851951, 21.6142834611592, 0.313308870127425, 3.41783600439905, 4.43674270859797, 2.02051001746263, 10.5798883747597, 0.489003100259996, 1.67341757119939, 10.2809820486722, 6.4705423816332, 5.68019649178721, 0.943869996033357))
It is not quite clear which parameters you are trying to optimize. I assume you want to optimize log.lik.bs with respect to gamma and betha for given z with initial values 2 and 6. In that case you have two errors in your code:
log.lik.bs is expecting an argument named z but you are providing an argument x. That is the error you are getting. Fix: z = x
When using optim your target function must take a single argument for the parameters. From ?optim:
A function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place. It should return a scalar result.
Combining this I get:
# x as given in the question above
log.lik.bs <- function(x, z){
  gamma <- x[1]
  betha <- x[2]
  n <- length(z)
  # keep the operators at the ends of the lines so the whole
  # log-likelihood is a single expression (and the return value)
  - n * log(gamma) - n * log(2*sqrt(2*2*pi)) - sum(log(z)) +
    sum(log(sqrt(z/betha)) + sqrt(betha/z)) -
    (1/2*gamma^2) * sum((sqrt(z/betha) - sqrt(betha/z))^2)
}
optim(c(2, 6), log.lik.bs, control=list(fnscale=-1), z=x, method="BFGS")$par
Unfortunately this still throws an error:
Error in optim(c(2, 6), log.lik.bs, control = list(fnscale = -1), z = x, :
non-finite finite-difference value [1]
In addition there are several warnings that NaNs were introduced by sqrt and log. So my interpretation of your question might be wrong. After all, the function goes to infinity as gamma goes to zero.
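One hedged way around the NaN warnings (an editor's sketch, not part of the original answer) is to keep gamma and betha strictly positive with box constraints; note that (1/2*gamma^2) in R means gamma^2/2 rather than 1/(2*gamma^2), which may also be part of the problem:
# L-BFGS-B supports lower bounds, so log() and sqrt() never see negative values
optim(c(2, 6), log.lik.bs, control = list(fnscale = -1), z = x,
      method = "L-BFGS-B", lower = c(1e-6, 1e-6))$par
# whether the result is meaningful still depends on the log-likelihood
# being specified correctly for the Birnbaum-Saunders model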

R - numerical errors with analytical gradient?

I've got the following code:
theta=0.05
n=1000
m=200
r=rnorm(2000)
#ER check function
nu=Vectorize(function(a,tau){return(abs(tau-(a<0))*a^2)})
#Selecting 10 lowest sum values (lowest10 function returns indices)
lowest10=function(x){
values=sort(x)[1:min(10,length(x))]
indices=match(values,x)
return(indices)
}
sym.expectile=function(beta,e,abs.r){return(beta[1]+beta[2]*e+beta[3]*abs.r)}
ERsum=function(beta,tau,start,end){
y=r[(start+1):end]
X1=rep(1,n-1)
X3=abs(r[start:(end-1)])
X2=c()
X2[1]=e.sym.optimal[start-m]
for (i in 2:(n-1)){
X2[i]=sym.expectile(beta,X2[i-1],X3[i-1])
}
X=matrix(c(X1,X2,X3),ncol=3)
res=y-X%*%beta
sum.nu=mean(nu(res,tau))
return(sum.nu)
}
ERsum.gr=function(beta,tau,start,end){
y=r[(start+1):end]
X1=rep(1,n-1)
X3=abs(r[start:(end-1)])
X2=c()
X2[1]=e.sym.optimal[start-m]
for (i in 2:(n-1)){
X2[i]=sym.expectile(beta,X2[i-1],X3[i-1])
}
X=matrix(c(X1,X2,X3),ncol=3)
partial.beta0=c()
for (i in 1:(n-1)){partial.beta0[i]=-(1-beta[2]^(i))/(1-beta[2])}
gr.beta0=2/T*sum(abs(tau-(y<X%*%beta))*(y-X%*%beta)*partial.beta0)/1000
partial.beta1=c()
partial.beta1[1]=-X2[1]
for (i in 2:(n-1)){partial.beta1[i]=partial.beta1[i-1]*beta[2]-X2[i]}
gr.beta1=2/T*sum(abs(tau-(y<X%*%beta))*(y-X%*%beta)*partial.beta1)/1000
partial.beta2=c()
partial.beta2[1]=-X3[1]
for (i in 2:(n-1)){partial.beta2[i]=partial.beta2[i-1]*beta[2]-X3[i]}
gr.beta2=2/T*sum(abs(tau-(y<X%*%beta))*(y-X%*%beta)*partial.beta2)/1000
c(gr.beta0,gr.beta1,gr.beta2)
}
beta=matrix(nrow=1e4,ncol=3)
beta[,1]=runif(1e4,-1,0)#beta0
beta[,2]=runif(1e4,0,1)#beta1
beta[,3]=runif(1e4,-1,0)#beta2
e.sym.optimal=c()
tau.found.sym.optim=0.02234724
library('expectreg')
e.sym.optimal[1]=expectile(r[1:m],tau.found.sym.optim)
ERsums.sym=c()
for (i in 1:nrow(beta)){
ERsums.sym[i]=ERsum(beta[i,],tau.found.sym.optim,m+1,m+n)
}
initialbeta.esym=beta[lowest10(ERsums.sym),]
intermedietebeta.esym=matrix(ncol=3,nrow=10)
for (i in 1:10){
intermedietebeta.esym[i,]=optim(initialbeta.esym[i,],ERsum,
gr=ERsum.gr,tau=tau.found.sym.optim,
start=m+1,end=m+n,
method="BFGS")$par
}
I tried to replace the optim function with optimx, but got the following error:
Error: Gradient function might be wrong - check it!
To check whether my gradient is OK, I evaluated it both with the grad function from numDeriv and by calling my ERsum.gr function directly. For the sample vector
beta
[1] -0.8256490 0.7146256 -0.4945032
I obtained following results:
>grad(function(beta) ERsum(c(beta[1],beta[2],beta[3]),tau.found.sym.optim,m+1,m+n),beta)
[1] -0.6703170 2.8812666 -0.5573101
> ERsum.gr(beta,tau.found.sym.optim,m+1,m+n)
[1] -0.6696467 2.8783853 -0.5567527
So here is my question: is it possible that these differences are just numerical errors caused by rounding in partial.beta0, partial.beta1 and partial.beta2, which are the components of the sums that make up the gradient? I think so, because if my analytical formula missed something the discrepancies would probably be much larger, but how can I be sure? And if that is the case, is there any other way to obtain more accurate gradient values?
You've got further problems down the line even if you settle whether that really is a proper gradient, which I see as too complex to tackle here. If you take out the gr argument and run optimx instead of optim, you get:
Error in intermedietebeta.esym[i, ] <- optimx(initialbeta.esym[i, ], ERsum, :
number of items to replace is not a multiple of replacement length
This probably relates to the fact that optimx does not return the same structure as is returned by optim:
> optimx(initialbeta.esym[i,],ERsum,
+ tau=tau.found.sym.optim,
+ start=m+1,end=m+n,
+ method="BFGS")$par
NULL
> optimx(initialbeta.esym[i,],ERsum,
+ tau=tau.found.sym.optim,
+ start=m+1,end=m+n,
+ method="BFGS") # leave out `$par`
p1 p2 p3 value fevals gevals niter convcode kkt1 kkt2 xtimes
BFGS -1.0325 0.2978319 0.04921863 0.09326904 102 100 NA 1 TRUE FALSE 3.366
If you disagree with the decision to allow a default gradient estimate, then you need to narrow down your debugging to the function that throws the error:
Error: Gradient function might be wrong - check it!
> traceback()
3: stop("Gradient function might be wrong - check it! \n", call. = FALSE)
2: optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower,
upper, hessian, optcfg$ctrl, have.bounds = optcfg$have.bounds,
usenumDeriv = optcfg$usenumDeriv, ...)
1: optimx(initialbeta.esym[i, ], ERsum, gr = ERsum.gr, tau = tau.found.sym.optim,
start = m + 1, end = m + n, method = "BFGS")
Then look at the code for optimx:::optimx.check (there is no help page for it). This is the section of code that does the checking:
if (!is.null(ugr) && !usenumDeriv) {
    gname <- deparse(substitute(ugr))
    if (ctrl$trace > 0)
        cat("Analytic gradient from function ", gname, "\n\n")
    fval <- ufn(par, ...)
    gn <- grad(func = ufn, x = par, ...)
    ga <- ugr(par, ...)
    teps <- (.Machine$double.eps)^(1/3)
    if (max(abs(gn - ga))/(1 + abs(fval)) >= teps) {
        stop("Gradient function might be wrong - check it! \n",
            call. = FALSE)
        optchk$grbad <- TRUE
    }
}
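The same check can be run by hand (a small sketch using the objects already defined in the question) to see how far the analytic gradient is from the tolerance optimx uses:
library(numDeriv)
beta.test <- c(-0.8256490, 0.7146256, -0.4945032)
fval <- ERsum(beta.test, tau.found.sym.optim, m + 1, m + n)
gn <- grad(function(b) ERsum(b, tau.found.sym.optim, m + 1, m + n), beta.test)
ga <- ERsum.gr(beta.test, tau.found.sym.optim, m + 1, m + n)
max(abs(gn - ga)) / (1 + abs(fval))  # compared against .Machine$double.eps^(1/3), about 6e-6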

Using the beta.select function in R (prior estimation)

I am trying to formulate priors using total counts and a beta distribution.
I have written the following:
quantile(df$row, probs=c(0.00001, 0.5, 0.99999))
quantile1 <- list(p=0.5, x=8)
quantile2 <- list(p=0.99999, x=10)
quantile3 <- list(p=0.00001, x=1)
library("LearnBayes")
findBeta <- function(quantile1,quantile2,quantile3)
quantile1_p <- quantile1[[1]]; quantile1_q <- quantile1[[2]]
quantile2_p <- quantile2[[1]]; quantile2_q <- quantile2[[2]]
quantile3_p <- quantile3[[1]]; quantile3_q <- quantile3[[2]]
priorA <- beta.select(list(p=0.5, x=8), list(p=0.99999, x=10))
and when I try to calculate priorA using the beta.select function I get the following error:
Error in if (p0 < p) m.hi = m0 else m.lo = m0 :
missing value where TRUE/FALSE needed
In addition: Warning message:
In pbeta(x, K * m0, K * (1 - m0)) : NaNs produced
I just can't get rid of the error and do not know how to approach it any more. Urgently need help.
I am guessing (completely out of thin air) that you are dealing with percentages, in which case you want to use x/100:
beta.select(list(p=0.5, x=.08), list(p=0.9, x=.10))
# [1] 28.02 318.74
Either way, while it would be nice of beta.select to throw a more appropriate error message (or rather, to have an error check in there), the root of the issue is that your x values are out of bounds. (As @Didzis noted, the support of a beta distribution is [0, 1].)
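As a quick hedged check of those values, the fitted beta distribution can be compared against the requested quantiles:
# the 50th and 90th percentiles of Beta(28.02, 318.74) should be near 0.08 and 0.10
qbeta(c(0.5, 0.9), 28.02, 318.74)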
