I first want to sample 100 gamma-distributed numbers with shape = 2 and scale = 1/2. I wrote down the log-likelihood function and negated it, since I'm using a minimization tool to maximize. I also tried using optim, but to no avail: optim and nlm gave me different answers. This is my code so far:
N = 100
shape = 2
scale = 1/2
Data <- rgamma(N, shape, scale)
LogL = function (x){
k = x[1]
gamma = x[2]
(-1)*(N*x[1]*log(x[2])+(x[1]-1)*sum(log(Data))-x[2]*sum(Data))
}
nlm(LogL,c(1.5,1))
logL <- function (x) -sum(dgamma(Data, x[1], x[2], log = TRUE))
N = 100
shape = 2
scale = 1/2
Data <- rgamma(N, shape, scale)
optim(c(1.5, 1), logL)$par
nlm(logL, c(1.5, 1))$estimate
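For comparison with the hand-written function in the question, here is a minimal sketch of the full gamma log-likelihood written out by hand. It assumes the shape/rate parameterisation (the third positional argument of rgamma and dgamma is rate, not scale) and includes the -N * lgamma(k) term, which the question's LogL leaves out. The seed and the names nll_manual and nll_dgamma are illustrative choices, not from the original post.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
N    <- 100
Data <- rgamma(N, shape = 2, rate = 1/2)     # named arguments avoid the rate/scale mix-up

# Negative log-likelihood written out by hand (shape k, rate r)
nll_manual <- function(x) {
  k <- x[1]
  r <- x[2]
  -(N * k * log(r) - N * lgamma(k) + (k - 1) * sum(log(Data)) - r * sum(Data))
}

# Same quantity via the built-in density
nll_dgamma <- function(x) -sum(dgamma(Data, shape = x[1], rate = x[2], log = TRUE))

# The two should agree up to floating-point error
same_value <- isTRUE(all.equal(nll_manual(c(1.5, 1)), nll_dgamma(c(1.5, 1))))

# Bounded optimisation keeps both parameters strictly positive
fit <- optim(c(1.5, 1), nll_manual, method = "L-BFGS-B", lower = c(1e-6, 1e-6))
fit$par
```

If same_value is TRUE, any remaining disagreement between optimizers comes from the optimizers themselves (tolerances, step sizes) rather than from the objective.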
nll <- function(lambda, kappa){
logit=function(x) {log(x/(1-x))}
a=c(1-exp(-(15/lambda)^kappa), 1-exp(-(25/lambda)^kappa), 1-exp(-(35/lambda)^kappa))
a=logit(a)
mu = c(0.1, 0.2, 0.3)
mu = logit(mu)
cov = matrix(c(0.18830690, 0.00235681, 0.00071954, 0.00235681, 0.00736811, 0.00110457, 0.00071954, 0.00110457, 0.00423955), nrow =3)
L1 = dmvnorm(a, mu, cov)
a=c(1-exp(-(25/lambda)^kappa), 1-exp(-(35/lambda)^kappa), 1-exp(-(45/lambda)^kappa))
a=logit(a)
mu = c(0.4, 0.1, 0.9)
mu = logit(mu)
cov = matrix(c(2.7595442, 0.0045178, 0.0010505, 0.0045178, 0.00972309, 0.0015120, 0.0010505, 0.0015120, 0.0088425), nrow =3)
L2 = dmvnorm(a, mu, cov)
-sum(log(L1*L2))
}
> mle(nll, start = list(lambda = 1, kappa = 1))
Error in optim(start, f, method = method, hessian = TRUE, ...) :
initial value in 'vmmin' is not finite
I'm trying to find the lambda and kappa values that maximize the function above.
My original likelihood function returns L1*L2, but because the mle function requires the negative log-likelihood to be passed in, I modified the function to return -sum(log(L1*L2)) instead.
However, I ran into the error above. I also tried specifying dmvnorm(..., log = TRUE), but that didn't solve the problem.
L1 and L2 are both scalars. Assuming we're going to pass log=TRUE to dmvnorm so they are each log-likelihoods, do you mean just -(L1+L2) in the final line?
Calling debug(nll) and then nll(lambda=1, kappa=1), stepping through the code until an infinite value appears, and then backtracking shows that 1-exp(-(45/lambda)^kappa) is exactly 1 (exp(-45) is less than 1e-16, roughly the smallest value x for which 1+x > 1), so the final element of logit(a) is infinite.
So if I make dmvnorm(...,log=TRUE) in both places, change the last line to return(-(L1+L2)), and change the initial value of lambda to 10, I get a finite value for nll(10,1) (4474), and stats4::mle(nll,start=list(lambda=10,kappa=1)) gives:
Coefficients:
lambda kappa
40.622673 4.883857
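The rounding problem described above can be reproduced in isolation. The sketch below also shows one possible way around it, which is not part of the original answer: since p = 1 - exp(-z) gives p / (1 - p) = exp(z) - 1, the logit can be computed as log(expm1(z)) without ever forming 1 - p. The helper name logit_weibull is hypothetical.

```r
logit <- function(p) log(p / (1 - p))

# Starting values from the question: lambda = 1, kappa = 1
z <- (45 / 1)^1
p <- 1 - exp(-z)
p == 1            # TRUE: exp(-45) is lost when subtracted from 1
logit(p)          # Inf, which makes the objective non-finite

# Starting value suggested in the answer: lambda = 10
finite_at_10 <- is.finite(logit(1 - exp(-(45 / 10)^1)))

# Hypothetical helper: logit(1 - exp(-z)) rewritten as log(exp(z) - 1)
logit_weibull <- function(x, lambda, kappa) log(expm1((x / lambda)^kappa))
logit_weibull(45, lambda = 1, kappa = 1)   # finite, close to 45
```

This rewrite only postpones the problem: expm1 itself overflows for very large arguments, so a sensible starting value still matters.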
I want to use the mle function to get estimates of a and b in a Unif(a,b) distribution. But I get absurd estimates nowhere close to 1 and 3.
library(stats4)
set.seed(20161208)
N <- 100
c <- runif(N, 1, 3)
LL <- function(min, max) {
R <- runif(100, min, max)
suppressWarnings((-sum(log(R))))
}
mle(minuslogl = LL, start = list(min = 1, max = 3), method = "BFGS",
lower = c(-Inf, 0), upper = c(Inf, Inf))
I got:
Call:
mle(minuslogl = LL, start = list(min = 1, max = 3), method = "BFGS")
Coefficients:
min max
150.8114 503.6586
Any ideas of what's going on? Thank you in advance!
I would first point out where your code is wrong.
You need dunif not runif. You may define:
LL <- function (a, b) -sum(dunif(x, a, b, log = TRUE))
In my code below I did not use dunif, as the density is just 1 / (b - a) so I wrote it directly.
You are generating samples inside the objective function. For U[a,b] this is OK, as its density is free of x. But for other distributions the objective function would change at each iteration.
With box constraints, you need method = "L-BFGS-B", not the ordinary "BFGS". And you are not using the right constraints.
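To make the first two points concrete, here is a small sketch contrasting an objective that draws random numbers on every call with one based on the density of fixed data. The seed and the names LL_random and LL_fixed are illustrative assumptions.

```r
set.seed(42)                         # arbitrary seed
x <- runif(100, 1, 3)                # observed data, generated once

# Objective in the style of the question: draws new random numbers on every call
LL_random <- function(min, max) -sum(log(runif(100, min, max)))
v1 <- LL_random(1, 3)
v2 <- LL_random(1, 3)
v1 == v2                             # FALSE: same arguments, different values

# Objective based on the density of the fixed data
LL_fixed <- function(a, b) -sum(dunif(x, a, b, log = TRUE))
LL_fixed(1, 3)                       # 100 * log(2)
LL_fixed(min(x) + 0.01, 3)           # Inf: at least one observation falls outside [a, b]
```

The last line also shows why the bounds matter: as soon as a exceeds min(x), the negative log-likelihood is infinite.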
Now in more depth...
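For a length-n sample vector x from U[a, b], the likelihood is (b - a) ^ (-n) provided a <= min(x) and max(x) <= b (and 0 otherwise), so the negative log-likelihood is n * log(b - a). This is smallest when the interval [a, b] is as short as possible while still covering the data, so the MLEs are a = min(x) and b = max(x).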
Numerical optimization is completely unnecessary, and is in fact impossible without constraints. Look at the gradient vector:
( n / (a - b), n / (b - a) )
The partial derivative with respect to a is always negative and the one with respect to b is always positive (for a < b), so neither can be 0.
A numerical approach becomes feasible when we impose box constraints: -Inf < a <= min(x) and max(x) <= b < Inf. We know for sure that the iteration terminates at the boundary.
My code below uses both optim and mle. Note that mle will fail when it inverts the Hessian matrix, because it is singular:
-n / (b - a) ^ 2    n / (b - a) ^ 2
 n / (b - a) ^ 2   -n / (b - a) ^ 2
Code:
## 100 samples
set.seed(20161208); x <- runif(100, 1, 3)
# range(x)
# [1] 1.026776 2.984544
## using `optim`
nll <- function (par) log(par[2] - par[1]) ## objective function (constant factor n dropped)
gr_nll <- function (par) c(-1, 1) / diff(par) ## gradient function
optim(par = c(0,4), fn = nll, gr = gr_nll, method = "L-BFGS-B",
lower = c(-Inf, max(x)), upper = c(min(x), Inf), hessian = TRUE)
#$par
#[1] 1.026776 2.984544 ## <- reaches boundary!
#
# ...
#
#$hessian ## <- indeed singular!!
# [,1] [,2]
#[1,] -0.2609022 0.2609022
#[2,] 0.2609022 -0.2609022
## using `stats4::mle`
library(stats4)
nll. <- function (a, b) log(b - a)
mle(minuslogl = nll., start = list(a = 0, b = 4), method = "L-BFGS-B",
lower = c(-Inf, max(x)), upper = c(min(x), Inf))
#Error in solve.default(oout$hessian) :
# Lapack routine dgesv: system is exactly singular: U[2,2] = 0
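Because the answer is known in closed form, the result can also be obtained without any optimizer. The following sketch computes the MLE as range(x) and checks that the analytic Hessian of n * log(b - a) has zero determinant, which is why inverting it fails. The object names ab_hat and H are illustrative.

```r
set.seed(20161208)
x <- runif(100, 1, 3)
n <- length(x)

# Closed-form MLE: the shortest interval that contains all observations
ab_hat <- range(x)

# Analytic Hessian of n * log(b - a), evaluated at the MLE
H <- n / diff(ab_hat)^2 * matrix(c(-1, 1, 1, -1), nrow = 2)
det(H)    # zero up to rounding, so the matrix has no inverse
```

The two rows of H are negatives of each other, so the determinant vanishes for any a < b, not just at the MLE.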
I am trying to estimate the three parameters a, b0 and b1 with the optim() function. But I always get the error:
Error in optim(par = c(1, 1, 1), fn = logweibull, method = "L-BFGS-B", :
L-BFGS-B needs finite values of 'fn'
t<-c(6,6,6,6,7,9,10,10,11,13,16,17,19,20,22,23,25,32,32,34,35,1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23)
d<-c(0,1,1,1,1,0,0,1,0,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
X<-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
logweibull <- function (a,b0,b1) {a <- v[1];b0 <- v[2]; b1 <- v[3];
sum (d*log(t^a*exp(b0+X*b1)-t^a*exp(b0+X*b1))) + sum (d + log((a*t^(a-1))/t^a)) }
v<-c(1,1,1)
optim( par=c(1,1,1) ,fn = logweibull, method = "L-BFGS-B",lower = c(0.1, 0.1,0.1), upper = c(100, 100,100),control = list(fnscale = -1) )
Can you help me? Do you know what I did wrong?
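One thing that can be checked directly in the function above: inside the first log(), the expression t^a*exp(b0+X*b1) is subtracted from itself, so the argument is 0 for every observation. The sketch below uses a three-observation subset of the data, chosen purely for illustration, and shows that this produces -Inf and NaN, which is the kind of non-finite value L-BFGS-B rejects.

```r
# Small illustrative subset of the data (first, second, and last observations)
t <- c(6, 6, 23)
d <- c(0, 1, 1)
X <- c(1, 1, 0)
a <- 1; b0 <- 1; b1 <- 1              # the starting values c(1, 1, 1)

term <- t^a * exp(b0 + X * b1)
log(term - term)                      # -Inf for every observation
d * log(term - term)                  # NaN where d == 0, -Inf where d == 1
is.finite(sum(d * log(term - term)))  # FALSE
```

Note also that the function in the question overwrites its arguments with the global vector v, so it returns the same value no matter what optim passes in.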
You may also consider
(1) passing the additional data variables to the objective function along with the parameters you want to estimate.
(2) passing an explicit gradient function (added below)
(3) the original objective function can be further simplified (as below)
logweibull <- function (v,t,d,X) {
a <- v[1]
b0 <- v[2]
b1 <- v[3]
sum(d*(1+a*log(t)+b0+X*b1) - t^a*exp(b0+X*b1) + log(a/t)) # simplified function
}
grad.logweibull <- function (v,t,d,X) {
a <- v[1]
b0 <- v[2]
b1 <- v[3]
c(sum(d*log(t) - t^a*log(t)*exp(b0+X*b1) + 1/a),
sum(d-t^a*exp(b0+X*b1)),
sum(d*X - t^a*X*exp(b0+X*b1)))
}
optim(par=c(1,1,1), fn = logweibull, gr = grad.logweibull,
method = "L-BFGS-B",
lower = c(0.1, 0.1,0.1),
upper = c(100, 100,100),
control = list(fnscale = -1),
t=t, d=d, X=X)
with output
$par
[1] 0.2604334 0.1000000 0.1000000
$value
[1] -191.5938
$counts
function gradient
10 10
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Also, comparing convergence with and without the gradient function (the latter using finite differences): with an explicit gradient function it takes 9 iterations to converge to the solution, whereas without it (with finite differences), it takes 126 iterations to converge.
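When supplying an analytic gradient, it is worth confirming that it matches the objective. The sketch below compares grad.logweibull from the answer against central finite differences at the starting point. num_grad is a hypothetical helper written for this check, and the step size 1e-6 is an arbitrary choice.

```r
t <- c(6,6,6,6,7,9,10,10,11,13,16,17,19,20,22,23,25,32,32,34,35,
       1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23)
d <- c(0,1,1,1,1,0,0,1,0,1,1,0,0,0,1,1,0,0,0,0,0,
       1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
X <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
       0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

# Objective and gradient as given in the answer
logweibull <- function(v, t, d, X) {
  a <- v[1]; b0 <- v[2]; b1 <- v[3]
  sum(d * (1 + a * log(t) + b0 + X * b1) - t^a * exp(b0 + X * b1) + log(a / t))
}
grad.logweibull <- function(v, t, d, X) {
  a <- v[1]; b0 <- v[2]; b1 <- v[3]
  c(sum(d * log(t) - t^a * log(t) * exp(b0 + X * b1) + 1 / a),
    sum(d - t^a * exp(b0 + X * b1)),
    sum(d * X - t^a * X * exp(b0 + X * b1)))
}

# Hypothetical helper: central finite-difference gradient
num_grad <- function(f, v, h = 1e-6, ...) {
  sapply(seq_along(v), function(i) {
    e <- replace(numeric(length(v)), i, h)   # step h in coordinate i only
    (f(v + e, ...) - f(v - e, ...)) / (2 * h)
  })
}

v0       <- c(1, 1, 1)
analytic <- grad.logweibull(v0, t, d, X)
numeric_ <- num_grad(logweibull, v0, t = t, d = d, X = X)
all.equal(analytic, numeric_, tolerance = 1e-5)
```

A mismatch here would point to an error in either the objective or the gradient before any time is spent on the optimizer settings.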
I have a simple likelihood function (from a normal distribution with mean = 0) that I want to maximize. optim keeps giving me this error:
Error in optim(par = phi, fn = loglike, estimates = estimates, NULL, hessian = TRUE, : non-finite finite-difference value [1]
Here is my data and likelihood function:
y = [ -0.01472 0.03942 0.03592 0.02776 -0.00090 ]
C = a varcov matrix:
1.66e-03 -0.000120 -6.780e-06 0.000102 -4.000e-05
-1.20e-04 0.001387 7.900e-05 -0.000140 -8.000e-05
-6.78e-06 0.000079 1.416e-03 -0.000070 8.761e-06
1.02e-04 -0.000140 -7.000e-05 0.001339 -6.000e-05
-4.00e-05 -0.000080 8.761e-06 -0.000060 1.291e-03
my log likelihood function is:
lglkl = -.5*(log(det(v)) + (t(y)%*%vi%*%y)), where v = phi*I + C, vi = inverse(v), and I is the 5*5 identity matrix.
I am trying to get the mle estimate for "phi". I thought this would be a simple optimization problem but am struggling. Would really appreciate any help. Thanks in advance. My code is below:
loglike <- function(phi,y) {
v = phi*I + C
vi = solve(v)
loglike = -.5*(log(det(v)) + (t(y)%*%vi%*%y))
return(-loglike)
}
phi = 0
parm <- optim(par=phi,fn=loglike,y=y,NULL,hessian = TRUE, method="L-BFGS-B",lower=0,upper=1000)
The error you ran into occurs because ϕ becomes negative after a certain number of iterations (which indicates that the constraints are not being applied correctly by the algorithm). Also, the solution does not converge to a single value but jumps between a few small values before reaching a point where the updated covariance matrix is no longer positive definite. At that stage det(v) < 0, so log[det(v)] is undefined and optim bails out.
To see what's happening, play with the maxit and ndeps parameters in the code below.
require("matrixcalc")
#-------------------------------------------------
# Log-likelihood function
#-------------------------------------------------
loglike <- function(phi, y) {
# Shift the covariance matrix
print(paste("phi = ", phi))
#v = phi*I + (1 - phi)*C
v = phi*I + C
stopifnot(is.positive.definite(v))
# Invert shifted matrix
vi = solve(v)
# Compute log likelihood
loglike = -.5*(log(det(v)) + (t(y) %*% vi %*% y))
print(paste("L = ", loglike))
return(-loglike)
}
#-------------------------------------------------
# Data
#-------------------------------------------------
y = c(-0.01472, 0.03942, 0.03592, 0.02776, -9e-04)
C = structure(c(0.00166, -0.00012, -6.78e-06, 0.000102, -4e-05, -0.00012,
0.001387, 7.9e-05, -0.00014, -8e-05, -6.78e-06, 7.9e-05,
0.001416, -7e-05, 8.761e-06, 0.000102, -0.00014, -7e-05,
0.001339, -6e-05, -4e-05, -8e-05, 8.761e-06, -6e-05, 0.001291),
.Dim = c(5L, 5L ))
#--------
# Initial parameter
#--------
I = diag(5)
phi = 50
#--------
# Minimize
#--------
parm <- optim(par = phi, fn = loglike, y = y, NULL, hessian = TRUE,
method = "L-BFGS-B", lower = 0.0001, upper = 1000,
control = list(trace = 3,
maxit = 1000,
ndeps = 1e-4) )
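Since there is only one parameter, a one-dimensional search is another option. The sketch below is an alternative to the optim call above, not the answer's method: it uses base R's optimize() on a bounded interval and determinant(..., logarithm = TRUE) to obtain the log-determinant directly. The search interval [0, 1], the tolerance, and the names Id and nll_phi are assumptions made for illustration.

```r
y <- c(-0.01472, 0.03942, 0.03592, 0.02776, -9e-04)
C <- matrix(c(0.00166, -0.00012, -6.78e-06, 0.000102, -4e-05,
              -0.00012, 0.001387, 7.9e-05, -0.00014, -8e-05,
              -6.78e-06, 7.9e-05, 0.001416, -7e-05, 8.761e-06,
              0.000102, -0.00014, -7e-05, 0.001339, -6e-05,
              -4e-05, -8e-05, 8.761e-06, -6e-05, 0.001291), nrow = 5)
Id <- diag(5)   # named Id rather than I to avoid masking base::I

# Negative log-likelihood in phi (same formula as loglike above)
nll_phi <- function(phi, y) {
  v      <- phi * Id + C
  logdet <- as.numeric(determinant(v, logarithm = TRUE)$modulus)
  quad   <- drop(t(y) %*% solve(v, y))   # y' v^{-1} y without forming the inverse
  0.5 * (logdet + quad)
}

# Bounded one-dimensional minimisation; phi never leaves [0, 1]
fit <- optimize(nll_phi, interval = c(0, 1), y = y, tol = 1e-10)
fit$minimum
fit$objective
```

Because optimize() only evaluates the function inside the interval, phi stays non-negative and v stays positive definite throughout the search. If the reported minimum sits at the edge of the interval, the estimate is a boundary solution and a Hessian-based standard error would not be meaningful.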