Unknown error message when attempting to find MLE in R

I'm trying to find the MLE of a distribution whose pdf is specified as 'mixture' in the code below. The code gives the error
"Error in optim(start, f, method = method, hessian = TRUE, ...) :
L-BFGS-B needs finite values of 'fn'"
"claims" is the dataset im using. I tried the same code with just the first two values of "claims" and encountered the same problem, so for a reproducible example the first two values are 1536.77007 and 1946.92409.
The limits on the parameters of the distribution is that 0<.p.<1 and a>0 and b>0, hence the lower and upper bounds in the MLE function. Any help is much appreciated.
#create mixture of two exponential distributions
library(stats4)  # mle() lives in stats4
mixture <- function(x, p, a, b) {
  p * a * exp(-a * x) + (1 - p) * b * exp(-b * x)
}
#find MLE of mixture distribution
LL <- function(p, a, b) {
  X <- mixture(claims, p, a, b)
  -sum(log(X))
}
mle(LL, start = list(p = 0.5, a = 1/100, b = 1/100), method = "L-BFGS-B",
    lower = c(0, 0, 0), upper = c(1, Inf, Inf))
Edit: not really sure why dput() was requested, but anyway:
#first two values of claims put into dput() (the actual values are above)
dput(claims[1:2])
c(307522.103, 195633.5205)
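For what it's worth, "L-BFGS-B needs finite values of 'fn'" usually means the objective was evaluated somewhere it returns -Inf or NaN; here that can happen right at the bounds, since at p = 0, a = 0, b = 0 the mixture density is 0 and log(X) is -Inf. A minimal sketch of the usual workaround, reusing mixture() and LL() from above and keeping the bounds strictly inside the parameter space (the offset 1e-6 is an arbitrary choice of mine, not from the original post):
claims <- c(1536.77007, 1946.92409)  # two-value reproducible example from the question
# bounds nudged strictly inside (0,1) and (0,Inf) so log(mixture(...)) stays finite
mle(LL, start = list(p = 0.5, a = 1/100, b = 1/100), method = "L-BFGS-B",
    lower = c(1e-6, 1e-6, 1e-6), upper = c(1 - 1e-6, Inf, Inf))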

Related

Error using fitdist with truncated distributions

I am trying to fit a certain truncated distribution to a data set. For example, for a lognormal distribution, I define the density function of the truncated distribution as:
dtlnorm <- function(x, meanlog, sdlog, low)
  dlnorm(x, meanlog, sdlog) / (1 - plnorm(low, meanlog, sdlog)) * (x >= low)
where low is the truncation point.
My data is the following vector
Data <- c(1068295.00589834, 1406446.49289834, 1540330.78489834, 1152321.94489834,
3108649.66189834, 3718417.97089834, 2981945.18089834, 4552923.31989834,
5747260.98289834, 2105461.57989834, 1044515.95889834, 1133641.75289834,
3847920.72789834, 2536441.02989834, 3073854.15789834, 1591039.28389834,
2592446.73289834, 4989152.55189834, 2426457.45489834, 120265066.499898,
6888222046.1999, 1092811.87089834, 3440123.51689834, 74684298.1398983,
1475038.27689834, 1124226.39489834, 11739544.5798983, 1187688.74489834,
1023193.88789834, 18784663.9698983)
To fit the distribution, I write:
fitdist(Data,distr="tlnorm",method="mle",start = list(meanlog=0,sdlog=0),fix.arg = list(low=100))
But the following error appears:
Error in fitdist(Data, distr = "tlnorm", method = "mle", start = list(meanlog = 0, :
the function mle failed to estimate the parameters,
with the error code 100
I do not know what is happening. Can somebody help me? Thank you!
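For context, error code 100 from fitdist is a generic trap around an optim failure. Here the likely culprit is the starting value sdlog = 0: dlnorm requires sdlog > 0, so the likelihood cannot even be evaluated at the start. A sketch of one possible fix with moment-based starting values (the ptlnorm helper and the starting values are my additions, not from the original post):
library(fitdistrplus)
# CDF of the truncated lognormal, for fitdist's diagnostics (my addition)
ptlnorm <- function(q, meanlog, sdlog, low)
  pmax(0, (plnorm(q, meanlog, sdlog) - plnorm(low, meanlog, sdlog)) /
          (1 - plnorm(low, meanlog, sdlog)))
# moment-based starting values keep sdlog strictly positive
start <- list(meanlog = mean(log(Data)), sdlog = sd(log(Data)))
fitdist(Data, distr = "tlnorm", method = "mle", start = start,
        fix.arg = list(low = 100))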

R non-linear model fitting using fitModel function

I want to fit a non-linear model to real data.
The data consist of two known numerical vectors: thickness as 'x' and fh as 'y'.
thickness=seq(0.15,2.00,by=0.05)
fh = c(5.17641, 4.20461, 3.31091, 2.60899, 2.23541, 1.97771, 1.88141, 1.62821, 1.50138, 1.51075, 1.40850, 1.26222, 1.09432, 1.13202, 1.12918, 1.10355, 1.11867, 1.09740,1.08324, 1.05687, 1.19422, 1.22984, 1.34516, 1.19713,1.25398 ,1.29885, 1.33658, 1.31166, 1.40332, 1.39550,1.37855, 1.41491, 1.59549, 1.56027, 1.63925, 1.72440, 1.74192, 1.82049)
plot(thickness,fh)
This is apparently non-linear, so I am trying to fit the model
y = x*(2/3) + (2 + 2*a)/(3*x)
where a is an unknown constant. I am trying to find the constant a that minimizes the sum of squared errors between the fitted curve and the real data.
I first used a function fitModel that I found on a YouTube video, Fitting Functions to Data in R.
library(TIMP)
f <- fitModel(fh ~ thickness*(2/3) + (2 + 2*A)/(3*thickness)) # it finds the coefficient 'A'
coef(f) # to extract just the coefficient
However, there's an error:
Error in modelspec[[datasetind[i]]] : subscript out of bounds
So, as an alternative, I want to plot 'a' against the sum of squared errors. I am having a hard time finding 'a' and producing this plot. By manual trial I figured out that 'a' is somewhere near 0.2, but this is not a precise value.
It would be helpful if someone could explain either:
why the fitModel function didn't work, or
how to find the value of a and plot the graph.
You could try this instead:
# model prediction for a given a
yf <- function(a, xv) xv*(2/3) + (2 + 2*a)/(3*xv)
yf(2, thickness)
# sum of squared errors for a given a
f <- function(a, y, xv) sum((y - yf(a, xv))^2)
f(2, fh, thickness)
# one-dimensional minimization of the SSE over a in [0, 10]
xmin <- optimize(f, c(0, 10), tol = 0.0001, y = fh, xv = thickness)
xmin
plot(thickness, fh)
lines(thickness, yf(xmin$minimum, thickness), col = 3)
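The same one-parameter fit can also be done with base R's nls, if you prefer a model object over a raw optimizer call (a sketch; the start value 0.2 comes from the manual estimate above):
fit <- nls(fh ~ thickness*(2/3) + (2 + 2*a)/(3*thickness), start = list(a = 0.2))
coef(fit)  # should agree with xmin$minimum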

Power law fitted by `fitdistr()` function in package `fitdistrplus`

I generate some random variables using the rplcon() function in the poweRlaw package:
data <- rplcon(1000,10,2)
Now I want to know which known distribution fits the data best. Lognormal? Exponential? Gamma? Power law? Power law with exponential cutoff?
So I use the fitdist() function in the fitdistrplus package:
fit.lnormdl <- fitdist(data,"lnorm")
fit.gammadl <- fitdist(data, "gamma", lower = c(0, 0))
fit.expdl <- fitdist(data,"exp")
Because the power law and the power law with exponential cutoff are not among the base probability functions listed in the CRAN Task View: Probability Distributions, I wrote the d, p, q functions of the power law based on example 4 of ?fitdist:
dplcon <- function(x, xmin, alpha, log = FALSE) {
  if (log) {
    pdf <- log(alpha - 1) - log(xmin) - alpha * log(x/xmin)
    pdf[x < xmin] <- -Inf
  } else {
    pdf <- (alpha - 1)/xmin * (x/xmin)^(-alpha)
    pdf[x < xmin] <- 0
  }
  pdf
}
pplcon <- function(q, xmin, alpha, lower.tail = TRUE) {
  cdf <- 1 - (q/xmin)^(-alpha + 1)
  if (!lower.tail)
    cdf <- 1 - cdf
  cdf[q < round(xmin)] <- 0
  cdf
}
qplcon <- function(p, xmin, alpha) alpha * p^(1/(1 - xmin))
Finally, I use the code below to estimate the parameters xmin and alpha of the power law:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=1))
But it throws an error:
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(data, "plcon", start = list(xmin = 1, alpha = 1)) :
the function mle failed to estimate the parameters,
with the error code 100
I searched Google and Stack Overflow and found many questions with similar errors, but after reading and trying their solutions, none worked for my problem. What should I do to get the parameters estimated correctly?
Thanks to anyone who can help!
This was an interesting one. I am not entirely happy with what I discovered, but I will tell you what I found and see if it helps.
On calling the fitdist function, by default it uses mledist from the same package. That in turn calls stats::optim, a general-purpose optimization function, whose return value includes a convergence error code; see ?optim for details. The 100 you see is not one of the codes returned by optim, so I pulled apart the code of mledist and fitdist to find where it comes from. Unfortunately it is defined in more than one place and is a general trap error code. Stripping away the various preliminary checks, what fitdist is essentially doing here is the following:
fnobj <- function(par, fix.arg, obs, ddistnam) {
  -sum(do.call(ddistnam, c(list(obs), as.list(par),
                           as.list(fix.arg), log = TRUE)))
}
vstart <- list(xmin = 5, alpha = 5)
ddistname <- dplcon
fix.arg <- NULL
meth <- "Nelder-Mead"
lower <- -Inf
upper <- Inf
optim(par = vstart, fn = fnobj,
      fix.arg = fix.arg, obs = data, ddistnam = ddistname,
      hessian = TRUE, method = meth, lower = lower, upper = upper)
If we run this code we get a more useful error: "function cannot be evaluated at initial parameters". That makes sense if we look at the function definition: having xmin = 0 or alpha = 1 yields a log-likelihood of -Inf. OK, so try different initial values; I tried a few random choices, but all returned a new error, "non-finite finite-difference value 1".
Searching the optim source further for these two errors, they are not part of the R source itself; there is, however, a .External2 call, so I can only assume the errors come from there. The non-finite error implies that one of the function evaluations somewhere gives a non-numeric result. The function dplcon will do so when alpha <= 1 or xmin <= 0. fitdist lets you specify additional arguments that get passed to mledist (or to the other estimators, depending on the method you choose; mle is the default), among them lower, which controls lower bounds on the parameters to be optimized. So I tried imposing these limits and trying again:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=2), lower = c(xmin = 0, alpha = 1))
Annoyingly this still gives error code 100. Tracking it down yields the error "L-BFGS-B needs finite values of 'fn'". The optimization method changed from the default Nelder-Mead because you specified bounds, and somewhere in the external C code this error arises, presumably close to the limits of xmin or alpha, where the numerical stability of the calculation becomes important.
I decided to do quantile matching rather than maximum likelihood to find out more:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=2),
method= "qme",probs = c(1/3,2/3))
fitpl
## Fitting of the distribution ' plcon ' by matching quantiles
## Parameters:
## estimate
## xmin 0.02135157
## alpha 46.65914353
which suggests that the optimum value of xmin is close to 0, its lower limit. The reason I am not satisfied is that I can't get a maximum-likelihood fit of the distribution using fitdist, but hopefully this explanation helps, and the quantile matching gives an alternative.
Edit:
After learning a little more about power-law distributions in general, it makes sense that this does not work as you expect. The power parameter alpha has a likelihood function that can be maximised conditional on a given xmin, but no such expression exists for xmin, since the likelihood is increasing in xmin. Typically xmin is estimated via a Kolmogorov-Smirnov statistic; see this mathoverflow question and the d_jss_paper vignette of the poweRlaw package for more information and associated references.
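To illustrate the "conditional on xmin" part: with xmin held fixed through fix.arg, an mle fit of alpha alone does go through (a sketch; fixing xmin at 10, the value used to simulate the data, and bounding alpha away from 1 are my choices, not from the original answer):
# with xmin fixed, only alpha is estimated, and its MLE is well defined
fitalpha <- fitdist(data, "plcon", start = list(alpha = 2),
                    fix.arg = list(xmin = 10), lower = c(alpha = 1 + 1e-6))
fitalpha
# closed-form check: alphahat = 1 + n / sum(log(x/xmin))
1 + length(data) / sum(log(data / 10))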
There is also functionality to estimate the parameters of the power-law distribution in the poweRlaw package itself:
m <- conpl$new(data)              # continuous power-law object
xminhat <- estimate_xmin(m)$xmin  # xmin via the Kolmogorov-Smirnov approach
m$setXmin(xminhat)
alphahat <- estimate_pars(m)$pars # MLE of alpha conditional on xmin
c(xmin = xminhat, alpha = alphahat)

Maximum Likelihood Estimation by hand for normal distribution in R

I am a newbie in R and have searched several forums without finding an answer so far. We are asked to do maximum likelihood estimation in R for an AR(1) model without using the arima() command. We should estimate the intercept alpha, the coefficient rho and the variance sigma2. The data are assumed to follow a normal distribution, from which I derived the log-likelihood function. I then tried to program the function with the following code:
Y <- data$V2
nlogL <- function(theta, Y) {
  alpha <- theta[1]
  rho <- theta[2]
  sigma2 <- theta[3]
  logl <- -(100/2)*log(2*pi) - (100/2)*log(theta[3]) -
    (0.5*theta[3])*sum(Y - (theta[1]/(1 - theta[2]))**2)
  return(-logl)
}
par0 <- c(0.1, 0.1, 0.1)
opt <- optim(par0, nlogL, hessian = TRUE)
When running this code I always get the error message: Error in Y - (theta[1]/(1 - theta[2]))^2 : 'Y' is missing.
It would also be great if you could have a look at whether the likelihood function is derived correctly.
Thank you very much in advance for your help!
Your nlogL function should take only a single argument, theta. You can fix your immediate problem simply by removing the second argument from the function; the Y variable is then resolved by its definition outside nlogL. Alternatively, you can keep the signature of nlogL as-is and pass Y as an additional argument through optim, like this: optim(par0, nlogL, hessian = TRUE, Y = Y). I would also second chinsoon12's suggestion to review ?optim.
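As for the likelihood itself: the question's expression is not quite the AR(1) log-likelihood (the parentheses put the square in the wrong place, and sigma2 multiplies rather than divides the sum of squares). A sketch of the conditional AR(1) negative log-likelihood, with simulated data standing in for data$V2, which is not available here:
set.seed(1)
Y <- as.numeric(arima.sim(list(ar = 0.5), n = 100)) + 1  # stand-in for data$V2

# conditional negative log-likelihood of Y_t = alpha + rho*Y_{t-1} + e_t,
# e_t ~ N(0, sigma2), conditioning on the first observation
nlogL <- function(theta, Y) {
  alpha <- theta[1]
  rho <- theta[2]
  sigma2 <- theta[3]
  if (sigma2 <= 0) return(Inf)        # keep the search away from invalid variances
  n <- length(Y)
  res <- Y[-1] - alpha - rho * Y[-n]  # one-step-ahead residuals
  ((n - 1)/2)*log(2*pi*sigma2) + sum(res^2)/(2*sigma2)
}

par0 <- c(0.1, 0.1, 0.1)
opt <- optim(par0, nlogL, hessian = TRUE, Y = Y)  # Y forwarded to nlogL via ...
opt$par  # roughly c(alpha, rho, sigma2) = c(0.5, 0.5, 1) for this simulation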

Model fitting with nls.lm in R, "Error: unused argument"

I'm trying to use the nls.lm function in the minpack.lm package to fit a non-linear model to some data from a psychophysics experiment.
I've had a search around and can't find much information about the package, so I've essentially copied the format of the example given on the nls.lm help page. Unfortunately my script still fails to run, and R throws this error:
Error in fn(par, ...) :
unused argument (observed = c(0.1429, 0.2857, 0.375, 0.3846, 0.4667, 0.6154))
It appears that the script thinks the data I want to fit the model to are irrelevant, which is definitely wrong.
I'm expecting it to fit the model and produce a value of 0.5403 for the free parameter (w).
Any help is greatly appreciated.
I'm making the transfer from Matlab over to R, so apologies if my code looks sloppy.
Here's the script.
install.packages("pracma")
require(pracma)
install.packages("minpack.lm")
require(minpack.lm)
# Residual function, uses parameter w (e.g. .23) to predict accuracy error at a given ratio [e.g. 2:1]
residFun <- function(w, n) .5 * erfc(abs(n[,1] - n[,2]) / ((sqrt(2)*w) * sqrt(n[,1]^2 + n[,2]^2)))
# example for residFun
# calculates an error rate of 2.59%
a=matrix(c(2,1),1,byrow=TRUE)
residFun(.23,a)
# Initial guess for parameter to be fitted (w)
parStart=list(w=0.2)
# Recorded accuracies in matrix, 1- gives errors to input into residFun
# i.e. the y-values I want to fit the model
Acc=1-(matrix(c(0.8571,0.7143,0.6250,0.6154,0.5333,0.3846),ncol=6))
# Ratios (converted to proportions) used in testing
# i.e. the points along the x-axis to fit the above data to
Ratios=matrix(c(0.3,0.7,0.4,0.6,0.42,0.58,0.45,0.55,0.47,0.53,0.49,0.51),nrow=6,byrow=TRUE)
# non-linear model fitting, attempting to calculate the value of w using the Levenberg-Marquardt nonlinear least-squares algorithm
output=nls.lm(par=parStart,fn=residFun,observed=Acc,n=Ratios)
# Error message shown after running
# Error in fn(par, ...) :
# unused argument (observed = c(0.1429, 0.2857, 0.375, 0.3846, 0.4667, 0.6154))
The error means you passed a function an argument that it did not expect. ?nls.lm has no argument observed, so it is passed on to the function supplied as fn, in your case residFun. But residFun doesn't expect this argument either, hence the error. You need to redefine the function like this:
# Residual function, uses parameter w (e.g. .23) to predict accuracy error at a given ratio [e.g. 2:1]
residFun <- function(par, observed, n) {
  w <- par$w
  r <- observed - (.5 * erfc(abs(n[,1] - n[,2]) / ((sqrt(2)*w) * sqrt(n[,1]^2 + n[,2]^2))))
  return(r)
}
It gives the following result:
> output = nls.lm(par=parStart,fn=residFun,observed=Acc,n=Ratios)
> output
Nonlinear regression via the Levenberg-Marquardt algorithm
parameter estimates: 0.540285874836135
residual sum-of-squares: 0.02166
reason terminated: Relative error in the sum of squares is at most `ftol'.
Why that happened:
It seems that you were inspired by this example in the documentation:
## residual function
residFun <- function(p, observed, xx) observed - getPred(p, xx)
## starting values for parameters
parStart <- list(a = 3, b = -.001, c = 1)
## perform fit
nls.out <- nls.lm(par = parStart, fn = residFun, observed = simDNoisy,
                  xx = x, control = nls.lm.control(nprint = 1))
Note that observed is an argument of residFun here.
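As a brief usage follow-up (my addition, not part of the original answer): minpack.lm provides a coef() method for nls.lm objects, so the fitted w can be extracted and used to compute the model's predicted error rates.
coef(output)  # fitted w, about 0.5403
# predicted error rates at the fitted w, to compare against the observed errors in Acc
.5 * erfc(abs(Ratios[,1] - Ratios[,2]) /
          ((sqrt(2)*coef(output)["w"]) * sqrt(Ratios[,1]^2 + Ratios[,2]^2)))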
