Error when trying to find MLE for a Geometric Distribution

So I'm given a task in R: find the MLE of p for a random variable X that has a geometric distribution, using the values given in the vector:
x <- c(4, 2, 1, 1, 4, 8, 3, 1, 2, 3, 8, 6, 2, 2, 3)
Here is the log-likelihood function:
log.L <- function(p = 0.3) {
  n <- length(x)
  (sum(x) - n) * log(1 - p) + n * log(p)
}
And here is the call:
library(stats4)
fir <- mle(log.L)
However when I make the call I get the following error:
Error in optim(start, f, method = method, hessian = TRUE, ...) :
non-finite finite-difference value [1]
Any ideas?

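The issue is that the first argument of mle() is
minuslogl: a function to calculate the negative log-likelihood.
Your log.L returns the log-likelihood itself, so mle() minimizes it instead of maximizing it. The optimizer then moves away from the maximum toward p = 0 or p = 1, where log(p) or log(1 - p) stops being finite. So instead we need:
log.L <- function(p = 0.3) {
  n <- length(x)
  -((sum(x) - n) * log(1 - p) + n * log(p))
}
mle(log.L)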
#
# Call:
# mle(minuslogl = log.L)
#
# Coefficients:
# p
# 0.3
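The default starting value p = 0.3 happens to coincide with the MLE here, so the output above can look as if mle() merely echoed its start. Here is a quick cross-check. It assumes, as the log-likelihood above implies, that X counts the trials up to and including the first success. Setting the derivative n / p - (sum(x) - n) / (1 - p) to zero gives p = n / sum(x) = 15 / 50 = 0.3. Base R's dgeom() counts failures before the first success, which is why the sketch uses x - 1.

```r
x <- c(4, 2, 1, 1, 4, 8, 3, 1, 2, 3, 8, 6, 2, 2, 3)

# Closed-form MLE when X counts trials up to and including the first success
p_closed <- length(x) / sum(x)

# Same negative log-likelihood written with dgeom(), which counts failures
nll <- function(p) -sum(dgeom(x - 1, prob = p, log = TRUE))

# Bounded one-dimensional minimisation as an independent numerical check
p_num <- optimize(nll, interval = c(1e-6, 1 - 1e-6))$minimum

c(closed_form = p_closed, numeric = p_num)
```

The optimize() call works on the bounded interval (0, 1), so it never evaluates log() outside the parameter space.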

Related

FME package: "Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix" in fitting using modFit()

I'm trying to fit a differential equation by least squares, using the FME package.
However, I keep getting this error and I don't know how to tackle it.
Here is a reproducible example:
library(FME)  # FME also attaches deSolve, which provides ode()
times = seq(0, 4, by = 0.5)
dat = data.frame(time = seq(1, 4),
                 Tick = c(128, 52.5, 28, 121))
N = 10
tick.model <- function(time, y, params, ...) { # here we begin a function with three arguments
  with(as.list(c(y, params)), {
    dTick <- (30 - s.t*Tick)*Tick*0.3*N - delta.t*Tick
    return(list(c(dTick)))
  })
}
y = c(Tick = 82.375)
cost1 <- function(p) {
  out <- ode(y, times, tick.model, p)
  modCost(out, dat, weight = "none")
}
params <- c(s.t = 0.1, delta.t = 1)
fit = modFit(f = cost1, p = params, lower = rep(0, 2),
             upper = c(10, 5))
summary(fit)
The result comes out like this:
Parameters:
         Estimate Std. Error t value Pr(>|t|)
s.t     0.3641876         NA      NA       NA
delta.t 0.0001417         NA      NA       NA
Residual standard error: 60.92 on 2 degrees of freedom
Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
In addition: Warning message:
In summary.modFit(fit) : Cannot estimate covariance; system is singular
Also, the fitted model doesn't look nice.
I have no idea what I could have done wrong.
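The printed numbers suggest one possible reading. This is a hedged check using plain arithmetic on the reported output, not a re-run of modFit(). Setting dTick = 0 in tick.model gives the non-zero equilibrium Tick* = (30 - delta.t / (0.3 * N)) / s.t. For the reported estimates, that equilibrium lands on 82.375, which is both the initial value y and the mean of dat$Tick. A solution that starts at its equilibrium stays there, so the fitted curve would be a flat line at the data mean. The residual standard error of such a line on 4 - 2 = 2 degrees of freedom is 60.92, the same value the summary reports. If this reading is right, the data pin down only that combination of s.t and delta.t, not each parameter separately, which is consistent with the "system is singular" warning. Note also that a one-dimensional autonomous ODE like this one has monotone solutions, so it cannot follow data that fall and then rise.

```r
N <- 10
dat <- data.frame(time = seq(1, 4), Tick = c(128, 52.5, 28, 121))
y0 <- 82.375                                   # initial value used in the question
est <- c(s.t = 0.3641876, delta.t = 0.0001417) # estimates printed by summary(fit)

# Non-zero equilibrium of dTick = (30 - s.t*Tick)*Tick*0.3*N - delta.t*Tick
equilibrium <- (30 - est[["delta.t"]] / (0.3 * N)) / est[["s.t"]]

# Residual standard error of a flat line at the data mean, on 4 - 2 = 2 df
rse_flat <- sqrt(sum((dat$Tick - mean(dat$Tick))^2) / (nrow(dat) - 2))

round(c(equilibrium = equilibrium, initial = y0,
        data_mean = mean(dat$Tick), rse_flat = rse_flat), 3)
```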

Plotting incomplete elliptic integral of 1st kind

I wanted to set up a small data frame in order to plot some points of the incomplete elliptic integral of the first kind for different values of the amplitude phi and the modulus k. The function to integrate is 1/sqrt(1 - (k*sin(x))^2) between 0 and phi. Here is the code I came up with:
v.phi <- seq(0, 2*pi, 1)
n.phi <- length(v.phi)
v.k <- seq(-1, +1, 0.5)
n.k <- length(v.k)
k <- rep(v.k, each = n.phi, times = 1)
phi <- rep(v.phi, each = 1, times = n.k)
df <- data.frame(k, phi)
func <- function(x, k) 1/sqrt(1 - (k*sin(x))^2)
df$area <- integrate(func,lower=0, upper=df$phi, k=df$k)
But this generates errors; I am obviously making a mistake in constructing the new variable df$area. Could someone point me in the right direction?
You can use mapply:
df$area <- mapply(function(phi, k) {
  integrate(func, lower = 0, upper = phi, k = k)$value
}, df$phi, df$k)
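However, that generates an error, because some values of k are equal to 1 or -1, while the allowed values are -1 < k < 1. You can't evaluate this integral for k = +/- 1 here: the integrand becomes 1/|cos(x)|, which is not integrable across x = pi/2, and most of the phi values exceed pi/2.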
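Here is one way around the error. It is a minimal sketch that assumes dropping the k = +/- 1 rows is acceptable. expand.grid() replaces the two rep() calls. It produces the same phi/k combinations in the same order (phi varies fastest), though its columns come out in the opposite order.

```r
v.phi <- seq(0, 2*pi, 1)
v.k <- seq(-1, +1, 0.5)

# Same combinations as the rep() construction: phi varies fastest, then k
df <- expand.grid(phi = v.phi, k = v.k)

func <- function(x, k) 1/sqrt(1 - (k*sin(x))^2)

# Drop k = -1 and k = 1, where the integrand blows up at x = pi/2
df <- subset(df, abs(k) < 1)

# integrate() computes one integral per call, so call it once per row
df$area <- mapply(function(phi, k) {
  integrate(func, lower = 0, upper = phi, k = k)$value
}, df$phi, df$k)

head(df)
```

For k = 0 the integrand is 1, so those rows should simply return area = phi. That gives a quick sanity check on the result.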
Note that there's a better way to evaluate this integral: the incomplete elliptic integral of the first kind is implemented in the gsl package (ellint_F):
> integrate(func, lower=0, upper=6, k=0.5)$value
[1] 6.458877
> gsl::ellint_F(6, 0.5)
[1] 6.458877
As I said, this function is not defined for k=-1 or k=1:
> gsl::ellint_F(6, 1)
[1] NaN
> gsl::ellint_F(6, -1)
[1] NaN
> integrate(func, lower=0, upper=6, k=1)
Error in integrate(func, lower = 0, upper = 6, k = 1) :
non-finite function value

Maximum likelihood estimation with uniform distribution in R leads to absurd result

I want to use the mle function to get estimates of a and b in a Unif(a,b) distribution. But I get absurd estimates nowhere close to 1 and 3.
library(stats4)
set.seed(20161208)
N <- 100
c <- runif(N, 1, 3)
LL <- function(min, max) {
  R <- runif(100, min, max)
  suppressWarnings((-sum(log(R))))
}
mle(minuslogl = LL, start = list(min = 1, max = 3), method = "BFGS",
    lower = c(-Inf, 0), upper = c(Inf, Inf))
I got:
Call:
mle(minuslogl = LL, start = list(min = 1, max = 3), method = "BFGS")
Coefficients:
min max
150.8114 503.6586
Any ideas of what's going on? Thank you in advance!
First, here is where your code goes wrong.
You need dunif, not runif. You may define:
LL <- function (a, b) -sum(dunif(x, a, b, log = TRUE))
In my code below I did not use dunif: the density is just 1 / (b - a), so I wrote it directly.
You are generating samples inside the objective function. For U[a, b] this is OK, as its density is free of x; but for other distributions the objective function would change at every iteration.
With box constraints, you need method = "L-BFGS-B", not the ordinary "BFGS". And your bounds, lower = c(-Inf, 0) and upper = c(Inf, Inf), do not encode the constraints that are actually needed (a <= min(x) and max(x) <= b; see below).
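Here is a quick sketch that checks the corrected objective. It reuses the seed from the question and stores the sample in x. Inside the support the objective equals n * log(b - a). Outside the support it is Inf. It is smallest at the sample range.

```r
set.seed(20161208)
x <- runif(100, 1, 3)

# Corrected objective: minus the sum of log-densities, via dunif(..., log = TRUE)
LL <- function(a, b) -sum(dunif(x, a, b, log = TRUE))

# Inside the support each log-density is -log(b - a), so LL = n * log(b - a)
c(LL = LL(0, 4), formula = length(x) * log(4 - 0))

# If a > min(x), some observations get density 0, so LL is Inf
LL(1.5, 4)

# Shrinking [a, b] onto the sample range gives the smallest value
LL(min(x), max(x)) < LL(0, 4)
```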
Now in more depth...
For a length-n sample vector x from U[a, b], the likelihood is (b - a) ^ (-n) as long as a <= min(x) and max(x) <= b (and 0 otherwise), so the negative log-likelihood is n * log(b - a). Obviously, the MLEs are a = min(x) and b = max(x).
Numerical optimization is completely unnecessary, and is in fact impossible without constraints. Look at the gradient vector:
( n / (a - b), n / (b - a) )
The partial derivative w.r.t. a / b is always negative / positive and can't be 0.
A numerical approach becomes feasible when we impose box constraints: -Inf < a <= min(x) and max(x) <= b < Inf. We know for sure that the iteration terminates at the boundary.
My code below uses both optim and mle. Note that mle will fail when it inverts the Hessian matrix, as it is singular:
-n / (b - a) ^ 2     n / (b - a) ^ 2
 n / (b - a) ^ 2    -n / (b - a) ^ 2
The two rows are negatives of each other, so the determinant is 0. The code below drops the constant n, i.e. it uses n = 1.
Code:
## 100 samples
set.seed(20161208); x <- runif(100, 1, 3)
# range(x)
# [1] 1.026776 2.984544
## using `optim`
nll <- function (par) log(par[2] - par[1]) ## objective function
gr_nll <- function (par) c(-1, 1) / diff(par) ## gradient function
optim(par = c(0,4), fn = nll, gr = gr_nll, method = "L-BFGS-B",
lower = c(-Inf, max(x)), upper = c(min(x), Inf), hessian = TRUE)
#$par
#[1] 1.026776 2.984544 ## <- reaches boundary!
#
# ...
#
#$hessian ## <- indeed singular!!
# [,1] [,2]
#[1,] -0.2609022 0.2609022
#[2,] 0.2609022 -0.2609022
## using `stats4::mle`
library(stats4)
nll. <- function (a, b) log(b - a)
mle(minuslogl = nll., start = list(a = 0, b = 4), method = "L-BFGS-B",
lower = c(-Inf, max(x)), upper = c(min(x), Inf))
#Error in solve.default(oout$hessian) :
# Lapack routine dgesv: system is exactly singular: U[2,2] = 0

Maximum likelihood in R

I am new to both R and statistics. I am playing with maximum likelihood estimation, and I am getting some incorrect results. I want to model x with a simple linear function:
x<-apply(matrix(seq(1,10,1), nrow=1), 1, function(x) 10*x+runif(10,-3,3))
LL <- function(a, b) {
  R = apply(x, 1, function(y) a*y + b)
  -sum(log(R))
}
mle(LL, start=list(a=10, b=0))
I am getting the following result:
Coefficients:
a b
43571.957 1338.345
instead of a~10, b~0.
I modified the code according to the suggestions of Spacedman:
set.seed(99)
x<-apply(matrix(seq(1,10,1), nrow=1), 1, function(x) 10*x+runif(10,-3,3))
LL <- function(a, b) {
  R = x[, 1] - a*(1:10) + b
  -sum(R^2)
}
library(stats4)
mle(LL, start=list(a=11, b=0.3))
Error in solve.default(oout$hessian) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
I do not know how to get rid of this error. Changing the seed and generating the x values again does not help.
There are a couple of things to notice here. To clarify, we start by changing the distribution of the error term from the uniform distribution runif(10, -3, 3) to the standard normal distribution rnorm(10). We can now easily simulate your data, set up your negative log-likelihood, and maximize the likelihood (i.e. minimize the negative log-likelihood) by:
a <- 10
b <- 0
set.seed(99)
x <- apply(matrix(seq(1, 10, 1), nrow=1), 1, function(x) b + a * x + rnorm(10))
minuslogL <- function(a, b) -sum(dnorm(x[, 1] - (b + a * 1:10), log = TRUE))
library(stats4)
mle(minuslogL, start = list(a = 11, b = 0.3))
Call:
mle(minuslogl = minuslogL, start = list(a = 11, b = 0.3))
Coefficients:
a b
9.8732793 0.5922192
Notice that this works well, since the likelihood is smooth and mle() uses "BFGS" for the optimization, i.e. a quasi-Newton, gradient-based method. Let's try the same with uniform errors:
set.seed(99)
x <- apply(matrix(seq(1, 10, 1), nrow=1), 1, function(x) b + a * x + runif(10, -3, 3))
minuslogL2 <- function(a,b) -sum(dunif(x[, 1] -(a * 1:10 + b), -3, 3, log = TRUE))
mle(minuslogL2, start = list(a = 11, b = 0.3))
Error in optim(start, f, method = method, hessian = TRUE, ...) :
initial value in 'vmmin' is not finite
This fails! Why? Because the uniform errors restrict the parameter space, the likelihood is not smooth. If you move the parameters a, b too far away from the true values, you get Inf. If you stay close enough, you get the same likelihood (i.e. there are many possible minimizers):
> minuslogL2(11, 0.3)
[1] Inf
> minuslogL2(10, 0)
[1] 17.91759
> minuslogL2(10.02, 0.06)
[1] 17.91759
Maximizing this likelihood amounts to finding the set {(a, b) : -logL(a, b) == -logL(10, 0)}, which can be found by a plain search algorithm.
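The plain search can be sketched as a grid search. This is only an illustration: the grid limits and step sizes below are arbitrary choices, not part of the answer. The sketch also shows why the plateau sits at 17.91759. Whenever every residual lies in [-3, 3], each one has density 1/6, so -logL = 10 * log(6).

```r
a <- 10
b <- 0
set.seed(99)
x <- apply(matrix(seq(1, 10, 1), nrow = 1), 1,
           function(x) b + a * x + runif(10, -3, 3))
minuslogL2 <- function(a, b) -sum(dunif(x[, 1] - (a * 1:10 + b), -3, 3, log = TRUE))

# Hypothetical search grid around the true values (limits and steps are arbitrary)
grid <- expand.grid(a = seq(9, 11, by = 0.01), b = seq(-3, 3, by = 0.05))
grid$nll <- mapply(minuslogL2, grid$a, grid$b)

# Keep the points where all residuals fall inside [-3, 3]
feasible <- grid[is.finite(grid$nll), ]
unique(round(feasible$nll, 8))   # a single plateau value, 10 * log(6)
range(feasible$a)
range(feasible$b)
```

Every feasible point is an equally good maximizer, so the ranges above describe a region of the (a, b) plane rather than a single estimate.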

Fitting a 3 parameter Weibull distribution

I have been doing some data analysis in R and I am trying to figure out how to fit my data to a 3-parameter Weibull distribution. I found how to do it with a 2-parameter Weibull but have come up short in finding how to do it with a 3-parameter one.
Here is how I fit the data using the fitdistr function from the MASS package:
y <- fitdistr(x[[6]], 'weibull')
x[[6]] is a subset of my data and y is where I am storing the result of the fitting.
First, you might want to look at the FAdist package. That said, it is not hard to go from rweibull3 to rweibull:
> rweibull3
function (n, shape, scale = 1, thres = 0)
thres + rweibull(n, shape, scale)
<environment: namespace:FAdist>
and similarly from dweibull3 to dweibull
> dweibull3
function (x, shape, scale = 1, thres = 0, log = FALSE)
dweibull(x - thres, shape, scale, log)
<environment: namespace:FAdist>
so we have this
> x <- rweibull3(200, shape = 3, scale = 1, thres = 100)
> fitdistr(x, function(x, shape, scale, thres)
dweibull(x-thres, shape, scale), list(shape = 0.1, scale = 1, thres = 0))
shape scale thres
2.42498383 0.85074556 100.12372297
( 0.26380861) ( 0.07235804) ( 0.06020083)
Edit: As mentioned in the comments, various warnings and errors appear when trying to fit the distribution this way:
Error in optim(x = c(60.7075705026659, 60.6300379017397, 60.7669410153573, :
non-finite finite-difference value [3]
There were 20 warnings (use warnings() to see them)
Error in optim(x = c(60.7075705026659, 60.6300379017397, 60.7669410153573, :
L-BFGS-B needs finite values of 'fn'
In dweibull(x, shape, scale, log) : NaNs produced
For me, at first it was only "NaNs produced". That was not the first time I had seen it, and the estimates were good, so I thought it wasn't very meaningful. After some searching, it seemed to be quite a common problem, and I could find neither the cause nor a solution. One alternative could be the stats4 package and its mle() function, but it seemed to have some problems too. Instead, I can offer you a modified version of code by danielmedic, which I have checked a few times:
thres <- 60
x <- rweibull(200, 3, 1) + thres
EPS = sqrt(.Machine$double.eps) # "epsilon" for very small numbers
llik.weibull <- function(shape, scale, thres, x)
{
  sum(dweibull(x - thres, shape, scale, log = TRUE))
}
thetahat.weibull <- function(x)
{
  if (any(x <= 0)) stop("x values must be positive")
  toptim <- function(theta) -llik.weibull(theta[1], theta[2], theta[3], x)
  mu = mean(log(x))
  sigma2 = var(log(x))
  shape.guess = 1.2 / sqrt(sigma2)
  scale.guess = exp(mu + (0.572 / shape.guess))
  thres.guess = 1
  res = nlminb(c(shape.guess, scale.guess, thres.guess), toptim, lower = EPS)
  c(shape = res$par[1], scale = res$par[2], thres = res$par[3])
}
thetahat.weibull(x)
shape scale thres
3.325556 1.021171 59.975470
An alternative is the package "lmom", which estimates the parameters by the method of L-moments:
library(lmom)
thres <- 60
x <- rweibull(200, 3, 1) + thres
moments = samlmu(x, sort.data = TRUE)
log.moments <- samlmu( log(x), sort.data = TRUE )
weibull_3parml <- pelwei(moments)
weibull_3parml
zeta beta delta
59.993075 1.015128 3.246453
But I don't know how to compute goodness-of-fit statistics with this package or with the solution above; other packages make goodness-of-fit statistics easy. Anyway, you can use alternatives such as ks.test or chisq.test.
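Here is a rough sketch of the ks.test route using only base R. The seed, starting values and bounds are illustrative assumptions, not taken from the answers above. The sketch fits the three parameters with nlminb(), then tests x - thres against the fitted two-parameter Weibull. The shape is bounded below by 1 here, because for shape < 1 the likelihood grows without bound as thres approaches min(x). The parameters are estimated from the same data, so the usual KS p-value is only approximate: it tends to be too large.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
thres <- 60
x <- rweibull(200, shape = 3, scale = 1) + thres

# Negative log-likelihood of the shifted (3-parameter) Weibull
nll <- function(theta) {
  -sum(dweibull(x - theta[3], shape = theta[1], scale = theta[2], log = TRUE))
}

# Bounds: shape >= 1, scale > 0, threshold strictly below the smallest observation
fit <- nlminb(start = c(2, 1, min(x) - 0.5), objective = nll,
              lower = c(1, 1e-8, -Inf), upper = c(Inf, Inf, min(x) - 1e-8))
est <- setNames(fit$par, c("shape", "scale", "thres"))
est

# One-sample Kolmogorov-Smirnov test of the shifted data against the fit
gof <- ks.test(x - est[["thres"]], "pweibull",
               shape = est[["shape"]], scale = est[["scale"]])
gof
```

The same two lines at the end work with estimates from thetahat.weibull() or pelwei(). With pelwei(), map zeta, beta and delta to thres, scale and shape.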
