Using JAGS or Stan when an observed node is the max of latent nodes

I have the following latent variable model: person j has two latent variables, Xj1 and Xj2. The only thing we get to observe is their maximum, Yj = max(Xj1, Xj2). The latent variables are bivariate normal: each has mean mu and variance sigma2, and their correlation is rho. I want to estimate the three parameters (mu, sigma2, rho) using only Yj, with data from n people, j = 1,...,n.
I've tried to fit this model in JAGS (so I'm putting priors on the parameters), but I can't get the code to compile. Here's the R code I'm using to call JAGS. First I generate the data (both latent and observed variables), given some true values of the parameters:
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
Then I define the JAGS model, and write a little function to run the JAGS sampler and return the samples:
# JAGS model code
model.text <- '
model {
for (i in 1:n) {
Y[i] <- max(X[i,1], X[i,2]) # Ack!
X[i,1:2] ~ dmnorm(X_mean, X_prec)
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
}
'
# run JAGS code. If latent=FALSE, remove the line defining Y[i] from the JAGS model
fit.jags <- function(latent=TRUE, data, n.adapt=1000, n.burnin, n.samp) {
require(rjags)
if (!latent)
model.text <- sub('\n *Y.*?\n', '\n', model.text)
textCon <- textConnection(model.text)
fit <- jags.model(textCon, data, n.adapt=n.adapt)
close(textCon)
update(fit, n.iter=n.burnin)
coda.samples(fit, variable.names=c("mu","sigma2","rho"), n.iter=n.samp)[[1]]
}
Finally, I call JAGS, feeding it only the observed data:
samp1 <- fit.jags(latent=TRUE, data=list(n=n, Y=Y), n.burnin=1000, n.samp=2000)
Sadly this results in an error message: "Y[1] is a logical node and cannot be observed". JAGS does not like me using "<-" to assign a value to Y[i] (I denote the offending line with an "Ack!"). I understand the complaint, but I'm not sure how to rewrite the model code to fix this.
Also, to demonstrate that everything else (besides the "Ack!" line) is fine, I run the model again, but this time I feed it the X data, pretending that it's actually observed. This runs perfectly and I get good estimates of the parameters:
samp2 <- fit.jags(latent=FALSE, data=list(n=n, X=X), n.burnin=1000, n.samp=2000)
colMeans(samp2)
If you can find a way to program this model in Stan instead of JAGS, that would be fine with me.

Theoretically you can implement a model like this in JAGS using the dsum distribution, although here it takes a bit of a hack since you are modelling the maximum rather than the sum of the two variables. The following code does compile and run (although it does not 'work' in any real sense - see below):
set.seed(2017-02-08)
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
model.text <- '
model {
for (i in 1:n) {
Y[i] ~ dsum(max_X[i])
max_X[i] <- max(X[i,1], X[i,2])
X[i,1:2] ~ dmnorm(X_mean, X_prec)
ranks[i,1:2] <- rank(X[i,1:2])
chosen[i] <- ranks[i,2]
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
#data# n, Y
#monitor# mu, sigma2, rho, tau, chosen[1:10]
#inits# X
}
'
library('runjags')
results <- run.jags(model.text)
results
plot(results)
Two things to note:
1. JAGS isn't smart enough to initialise the matrix of X while satisfying the dsum(max(X[i,])) constraint on its own - so we have to initialise X for JAGS using sensible values. In this case I'm using the simulated values, which is cheating - the answer you get is highly dependent on the choice of initial values for X, and in the real world you won't have the simulated values to fall back on.
2. The max() constraint causes problems for which I can't see a solution within a general framework: unlike the usual dsum constraint, which allows one parameter to decrease while the other increases so that both parameters are used at all times, the min() value of X[i,] is ignored and the sampler is therefore free to do as it pleases. It will very very rarely (i.e. never) propose values of min(X[i,]) that happen to be identical to Y[i], which is the condition required for the sampler to 'switch' between the two X[i,]. So switching never happens, and the X[] that were chosen at initialisation to be the maxima stay as the maxima - I have added a trace parameter 'chosen' which illustrates this.
As far as I can see, the other potential solutions to the 'how do I code this' question will fall into essentially the same non-mixing trap, which I think is a fundamental problem here (although I might be wrong and would very much welcome working BUGS/JAGS/Stan code that illustrates otherwise).
Solutions to the failure to mix are harder, although something akin to the Carlin & Chib method for model selection may work (force a min(pseudo_X) parameter to be equal to Y to encourage switching). This is likely to be tricky to get working, but if you can get help from someone with a reasonable amount of experience with BUGS/JAGS you could try it - see:
Carlin, B.P., Chib, S., 1995. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B 57, 473–484.
Alternatively, you could try thinking about the problem slightly differently and model X directly as a matrix with the first column all missing and the second column all equal to Y. You could then use dinterval() to set a constraint on the missing values that they must be lower than the corresponding maximum. I'm not sure how well this would work in terms of estimating mu/sigma2/rho but it might be worth a try.
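A minimal (untested) sketch of that idea follows. JAGS will not accept a partially observed dmnorm node, so the bivariate normal is factorised into a marginal for the observed coordinate times a conditional for the latent partner; by symmetry the joint density of (max, min) is just twice the bivariate normal density on the constrained region, so the dropped constant does not affect the posterior:
model.text.int <- '
model {
for (i in 1:n) {
Y[i] ~ dnorm(mu, tau_marg) # marginal for the observed maximum
X_min[i] ~ dnorm(mu + rho * (Y[i] - mu), tau_cond) # conditional for the latent partner
is_below[i] ~ dinterval(X_min[i], Y[i]) # is_below[i] = 0 forces X_min[i] <= Y[i]
}
tau_marg <- 1 / sigma2
tau_cond <- 1 / (sigma2 * (1 - rho^2))
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
}
'
# call with data = list(n = n, Y = Y, is_below = rep(0, n)); initialise X_min below Y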
By the way, I realise that this doesn't necessarily answer your question but I think it is a useful example of the difference between 'is it codeable' and 'is it workable'.
Matt
ps. A much smarter solution would be to consider the distribution of the maximum of two normal variates directly - I am not sure if such a distribution exists, but if it does and you can get a PDF for it then the distribution could be coded directly using the zeros/ones trick without having to consider the value of the minimum at all.
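As it happens, a closed form does exist here: differentiating P(max <= y) = P(X1 <= y, X2 <= y) and using the symmetry of equal means and variances gives f(y) = (2/sigma) * dnorm(z) * pnorm(z * sqrt((1-rho)/(1+rho))) with z = (y - mu)/sigma. A minimal, untested sketch of the zeros trick with this density (keeping the same priors as the question) might be:
model.text.zeros <- '
model {
C <- 10000 # large constant keeping the Poisson means positive
kappa <- sqrt((1-rho)/(1+rho))
for (i in 1:n) {
z[i] <- (Y[i] - mu) / sigma
# log f(Y[i]) = log(2) + log dnorm(z[i]) - log(sigma) + log Phi(z[i]*kappa)
log_f[i] <- log(2) - log(sigma) - 0.5*log(2*3.141593) - 0.5*z[i]^2 + log(phi(z[i]*kappa))
zeros[i] ~ dpois(C - log_f[i])
}
sigma <- sqrt(sigma2)
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
}
'
# call with data = list(n = n, Y = Y, zeros = rep(0, n))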

I believe you can model this in the Stan language by treating the likelihood as a two-component mixture with equal weights. The Stan code could look like
data {
int<lower=1> N;
vector[N] Y;
}
parameters {
vector<upper=0>[2] diff[N];
real mu;
real<lower=0> sigma;
real<lower=-1,upper=1> rho;
}
model {
vector[2] case_1[N];
vector[2] case_2[N];
vector[2] mu_vec;
matrix[2,2] Sigma;
for (n in 1:N) {
case_1[n][1] = Y[n]; case_1[n][2] = Y[n] + diff[n][1];
case_2[n][2] = Y[n]; case_2[n][1] = Y[n] + diff[n][2];
}
mu_vec[1] = mu; mu_vec[2] = mu;
Sigma[1,1] = square(sigma);
Sigma[2,2] = Sigma[1,1];
Sigma[1,2] = Sigma[1,1] * rho;
Sigma[2,1] = Sigma[1,2];
// log-likelihood: mix per observation over which coordinate
// attained the maximum (vectorising the lpdf would sum over all
// observations before mixing, which is a different model)
for (n in 1:N)
target += log_mix(0.5, multi_normal_lpdf(case_1[n] | mu_vec, Sigma),
multi_normal_lpdf(case_2[n] | mu_vec, Sigma));
// insert priors on mu, sigma, and rho
}
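To fit it from R, a hedged usage sketch (assuming the model code above is stored in the string stan_code and proper priors have been added):
library(rstan)
fit <- stan(model_code = stan_code, data = list(N = n, Y = Y), chains = 4, iter = 2000)
print(fit, pars = c("mu", "sigma", "rho"))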

Related

RJAGS - How to pass more complex functions in BUGS file

My goal is basically to migrate this code to R.
All the preprocessing of the datasets has already been done; however, I am now stuck writing the "model" file. As a first attempt, and for the sake of clarity, I wrote the code shown below in the R language.
What I want to do is run an MCMC to obtain an estimate of the parameter R_t, given the daily reported data for Italy.
The main steps that have been pursued are:
1. Sample an array parameter, namely log(R_t), from a Gaussian random walk distribution:
Gauss_RandomWalk <- function(N, x0, mu, variance) {
z <- cumsum(rnorm(n=N, mean=mu, sd=sqrt(variance)))
t <- 1:N
x <- (x0 + t*mu + z)
return(x)
}
log_R_t <- Gauss_RandomWalk(tot_dates, 0., 0., 0.035**2)
R_t_candidate <- exp(log_R_t)
2. Compute some quantities that are functions of this sampled parameter, namely the number of infections. This dependence is quite simple, since it is linear algebra:
infections <- rep(0. , tot_dates)
infections[1] <- exp(seed)
for (t in 2:tot_dates){
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
3. Convolve the array I have just computed with a delay distribution (onset + reporting delay), finally rescaling it by the exposure variable:
test_adjusted_positive <- convolve(infections, delay_distribution_df$density, type = "open")
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- round(test_adjusted_positive*exposure)
4. Compute the likelihood, which is proportional to the probability that a certain set of data (i.e. daily confirmed cases) was observed, given the sampled log(R_t) parameter from which the variable positive is computed:
likelihood <- dnbinom(round(Italian_data$daily_confirmed), mu = positive, size = 1/6)
Finally, here we come to my BUGS model file:
model {
#priors as a Gaussian RW
log_rt[1] ~ dnorm(0, 0.035)
log_rt[2] ~ dnorm(0, 0.035)
for (t in 3:tot_dates) {
log_rt[t] ~ dnorm(log_rt[t-1] + log_rt[t-2], 0.035)
R_t_candidate[t] <- exp(log_rt[t])
}
# data likelihood
for (t in 2:tot_dates) {
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
test_adjusted_positive <- convolve(infections, delay_distribution)
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- test_adjusted_positive*exposure
for (t in 2:tot_dates) {
confirmed[t] ~ dnbinom( obs[t], positive[t], 1/6)
}
}
where gt_to_convolution is a constant matrix, tot_dates is a constant value and exposure is a constant array.
When trying to compile it through:
data <- NULL
data$obs <- round(Italian_data$daily_confirmed)
data$tot_dates <- n_days
data$delay_distribution <- delay_distribution_df$density
data$exposure <- exposure
data$gt_to_convolution <- gt_to_convolution
inits <- NULL
inits$log_rt <- rep(0, tot_dates)
library (rjags)
library (coda)
set.seed(1995)
model <- "MyModel.bug"
jm <- jags.model(model , data, inits)
It raises the following error:
Compiling model graph
Resolving undeclared variables
Allocating nodes
Deleting model
Error in jags.model(model, data, inits) : RUNTIME ERROR:
Compilation error on line 19.
Possible directed cycle involving test_adjusted_positive
Hence I am not even able to debug it, even though I'm pretty sure there is something more generally wrong, but I cannot figure out what or why.
At this point, I think the best choice would be to implement a Metropolis algorithm myself according to the likelihood above, but obviously I would much prefer to use an already tested framework such as BUGS/JAGS, which is why I am asking for help.

Problem with simple numerical estimation for MLE of multinomial in R

I am trying to set up a simple numerical MLE estimation of a multinomial distribution.
The multinomial has one constraint - all the cell probabilities need to add up to one.
Usually the way to impose this constraint is to re-express one of the probabilities as (1 - sum of the others).
When I run this, however, I have a problem: during the optimization procedure, I might end up taking the logarithm of a negative value.
Any thoughts on how to fix this? I tried using another optimization package (Rsolnp) and it worked, but I am trying to make it work with the simple default R optim in order to avoid constrained/nonlinear optimization.
Here is my code (I know that I can get the result analytically in this particular case, but this is a toy example; my actual problem is bigger than this).
set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))
N <- test_data
loglik_function <- function(theta){
output <- -1*(N[1]*log(theta[1]) + N[2]*log(theta[2]) + N[3]*log(theta[3]) + N[4]*log(1- sum(theta)))
return(output)
}
startval <- rep(0.1, 3)
my_optim <- optim(startval, loglik_function, lower = 0.0001, upper = 0.9999, method = "L-BFGS-B")
Any thoughts or help would be very much appreciated. Thanks
Full disclosure: I know you asked about (constrained) ML estimation, but how about doing this the Bayesian way, à la Stan/rstan? I will remove this if it's not useful or misses the point.
The model is only a few lines of code.
library(rstan)
model_code <- "
data {
int<lower=1> K; // number of choices
int<lower=0> y[K]; // observed choices
}
parameters {
simplex[K] theta; // simplex of probabilities, one for every choice
}
model {
// Priors
theta ~ cauchy(0, 2.5); // weakly informative
// Likelihood
y ~ multinomial(theta);
}
generated quantities {
real ratio;
ratio = theta[1] / theta[2];
}
"
You can see how easy it is to implement the simplex constraint on the thetas using the Stan data type simplex. In the Stan language, simplex[K] declares a probability (unit) simplex, i.e. a vector theta in R^K with theta_k >= 0 for all k and sum(theta) = 1, where K denotes the number of parameters (here: choices).
Also note how we use the generated quantities code block, to calculate derived quantities (here ratio) based on the parameters (here theta[1] and theta[2]). Since we have access to the posterior distributions of all parameters, calculating the distribution of derived quantities is trivial.
We then fit the model to your test_data
fit <- stan(model_code = model_code, data = list(K = 4, y = test_data[, 1]))
and show a summary of the parameter estimates
summary(fit)$summary
# mean se_mean sd 2.5% 25%
#theta[1] 0.2379866 0.0002066858 0.01352791 0.2116417 0.2288498
#theta[2] 0.2620013 0.0002208638 0.01365478 0.2358731 0.2526111
#theta[3] 0.2452539 0.0002101333 0.01344665 0.2196868 0.2361817
#theta[4] 0.2547582 0.0002110441 0.01375618 0.2277589 0.2458899
#ratio 0.9116350 0.0012555320 0.08050852 0.7639551 0.8545142
#lp__ -1392.6941655 0.0261794859 1.19050097 -1395.8297494 -1393.2406198
# 50% 75% 97.5% n_eff Rhat
#theta[1] 0.2381541 0.2472830 0.2645305 4283.904 0.9999816
#theta[2] 0.2615782 0.2710044 0.2898404 3822.257 1.0001742
#theta[3] 0.2448304 0.2543389 0.2722152 4094.852 1.0007501
#theta[4] 0.2545946 0.2638733 0.2822803 4248.632 0.9994449
#ratio 0.9078901 0.9648312 1.0764747 4111.764 0.9998184
#lp__ -1392.3914998 -1391.8199477 -1391.3274885 2067.937 1.0013440
as well as a plot showing point estimates and CIs for the theta parameters
plot(fit, pars = "theta")
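Since we also monitor the derived quantity ratio in the generated quantities block, its posterior draws can be summarised directly; a small sketch using rstan's extract():
# posterior draws of the derived quantity ratio
ratio_draws <- rstan::extract(fit, pars = "ratio")$ratio
quantile(ratio_draws, probs = c(0.025, 0.5, 0.975))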
Update: Constrained ML estimation using maxLik
You can in fact implement constrained ML estimation using methods provided by the maxLik library. I found it a bit "fiddly", because convergence seems to be quite sensitive to changes in the starting values and the optimisation method used.
For what it's worth, here is a reproducible example:
library(maxLik)
x <- test_data[, 1]
Define the log-likelihood function for a multinomial distribution; I've included an if statement here to prevent theta < 0 cases from throwing an error.
loglik <- function(theta, x)
if (all(theta > 0)) sum(dmultinom(x, prob = theta, log = TRUE)) else 0
I use the Nelder-Mead optimisation method here to find the maximum of the log-likelihood function. The important bit here is the constraints argument that implements a constraint in the form of the equality A theta + B = 0, see ?maxNM for details and examples.
res <- maxNM(
loglik,
start = rep(0.25, length(x)),
constraints = list(
eqA = matrix(rep(1, length(x)), ncol = length(x)),
eqB = -1),
x = x)
We can inspect the results
summary(res)
--------------------------------------------
Nelder-Mead maximization
Number of iterations: 111
Return code: 0
successful convergence
Function value: -10.34576
Estimates:
estimate gradient
[1,] 0.2380216 -0.014219040
[2,] 0.2620168 0.012664714
[3,] 0.2450181 0.002736670
[4,] 0.2550201 -0.002369234
Constrained optimization based on SUMT
Return code: 1
penalty close to zero
1 outer iterations, barrier value 5.868967e-09
--------------------------------------------
and confirm that indeed the sum of the estimates equals 1 (within accuracy)
sum(res$estimate)
#[1] 1.000077
Sample data
set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))

LaplacesDemon: when should I sum prior density?

I am migrating from JAGS to LaplacesDemon and trying to rewrite some of my code. I have read the LaplacesDemon Tutorial and LaplacesDemon Examples vignettes and am a bit confused about some of the examples in the vignettes.
In the simple example in LaplacesDemon Tutorial (p.5), the model is written as:
Model <- function(parm, Data)
{beta <- parm[Data$pos.beta]
sigma <- interval(parm[Data$pos.sigma], 1e-100, Inf)
parm[Data$pos.sigma] <- sigma
beta.prior <- dnormv(beta, 0, 1000, log=TRUE)
sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
mu <- tcrossprod(beta, Data$X)
LL <- sum(dnorm(Data$y, mu, sigma, log=TRUE))
LP <- LL + sum(beta.prior) + sigma.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
yhat=rnorm(length(mu), mu, sigma), parm=parm)
return(Modelout)}
Here, beta.prior is summed for LP because there is more than one beta parameter.
But I found that the more advanced examples in the LaplacesDemon Examples vignette don't always seem to follow this rule, such as example 87 (p.162):
Model <- function(parm, Data)
{### Log-Prior
beta.prior <- sum(dnormv(beta[,1], 0, 1000, log=TRUE), dnorm(beta[,-1], beta[,-Data$T], matrix(tau, Data$K, Data$T-1), log=TRUE))
zeta.prior <- dmvn(zeta, rep(0,Data$S), Sigma[ , , 1], log=TRUE)
phi.prior <- sum(dhalfnorm(phi[1], sqrt(1000), log=TRUE), dtrunc(phi[-1], "norm", a=0, b=Inf, mean=phi[-Data$T], sd=sigma[2], log=TRUE))
### Log-Posterior
LP <- LL + beta.prior + zeta.prior + sum(phi.prior) + sum(kappa.prior) + sum(lambda.prior) + sigma.prior + tau.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP, yhat=rnorm(prod(dim(mu)), mu, sigma[1]), parm=parm)
return(Modelout)}
(I've included only part of the code owing to the length of the example.)
Here, zeta has more than one element but wasn't summed in either the Log-Prior or the Log-Posterior part; beta has more than one element and was summed in the Log-Prior; and phi also has more than one element but was summed in both the Log-Prior and Log-Posterior parts.
And in the next example on p.167, it seems to be different again.
I was wondering: in what scenarios should we sum the prior density? Many thanks!
Have you tried running the code line by line? You would learn that there is nothing to sum, since dmvn is the density function of the multivariate normal distribution and it returns a single value - the probability density of observing the vector zeta. The reason for all the sums is that to obtain the probability of observing two independent events together, we multiply their marginal probabilities (or sum their logs). So we multiply the prior probabilities together to obtain their joint density.
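To make this concrete, here is a small illustration (assuming the LaplacesDemon density functions): a vectorised univariate density returns one log-density per element and therefore needs sum(), whereas a multivariate density already returns the single joint log-density:
library(LaplacesDemon)
zeta <- c(0.3, -0.1)
dnormv(zeta, 0, 1000, log = TRUE) # length-2 vector: one log-density per element, so sum() it
dmvn(zeta, mu = rep(0, 2), Sigma = diag(2), log = TRUE) # one value: the joint log-density of the whole vector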

Estimating AR(1) coefficients using the Metropolis-Hastings algorithm (MCMC) in R

I am trying to write a program to estimate AR(1) coefficients using the Metropolis-Hastings algorithm. My R code is as follows:
set.seed(101)
#loglikelihood
logl <- function(b,data) {
ly = length(data)
-sum(dnorm(data[-1],b[1]+b[2]*data[1:(ly-1)],(b[3])^2,log=TRUE))
}
#proposal function
proposalfunction <- function(param,s){
return(rnorm(3,mean = param, sd= s))
}
#MH sampler
MCMC <- function(startvalue, iterations,data,s){
i=1
chain = array(dim = c(iterations+1,3))
chain[i,] = startvalue
while (i <= iterations){
proposal = proposalfunction(chain[i,],s)
probab = exp(logl(proposal,data = data) - logl(chain[i,],data = data))
if(!is.na(probab)){
if (runif(1) <= min(1,probab)){
chain[i+1,] = proposal
}else{
chain[i+1,] = chain[i,]
}
i=i+1
}else{
cat('\r !')
}
}
acceptance = round((1-mean(duplicated(chain)))*100,1)
print(acceptance)
return(chain)
}
#example
#generating data
data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 2000,sd = sqrt(1))
r=MCMC(c(0,.7,1),50000,data,s=.00085)
In the example, I should get zero for the mean, 0.7 for the coefficient, and 1 for the error variance, but every time I run this code I get completely different values. I tried to adjust the proposal scale but I still get results that are far from the true values. The figure below shows the results.
You've flipped the sign in your log-likelihood function. This is an easy mistake to make, because maximum likelihood estimation usually proceeds by minimizing the negative log-likelihood, but MCMC needs to work with the log-likelihood itself (not its negative).
Also:
dnorm() takes the standard deviation as its third argument, not the variance. You can also simplify your code slightly by using head(data,-1) to get all but the last element of a vector. So your log-likelihood would be:
sum(dnorm(data[-1],b[1]+b[2]*head(data,-1),b[3],log=TRUE))
you're probably hurting yourself by fixing the candidate distribution to be independent Normal with equal SDs for all variables; consider allowing them to differ (although the posterior SDs are not as different as I thought they might be - about {0.02, 0.016, 0.0077} - so this might not be such a big problem).
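For instance, a minimal sketch: since rnorm() recycles its sd argument, letting the proposal SDs differ only requires passing a vector for s (the values below are rough guesses based on the posterior SDs above):
proposalfunction <- function(param, s){
return(rnorm(3, mean = param, sd = s)) # s may now be a length-3 vector of per-parameter SDs
}
# e.g. r <- MCMC(c(0, .7, 1), 50000, data, s = c(0.02, 0.016, 0.008))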

Wrong Hessian from optim in R

I am doing some extreme value analysis. I don't want to use the fevd package for a variety of reasons (the first being that I want to be able to tweak some things that I cannot otherwise). I wrote my own code. It is mostly very simple, and I thought I had solved everything. But for some parameter combinations, the Hessian coming out of my log-likelihood analysis (based on optim) will not be correct.
Going over it one step at a time, my code - or selected parts of it - looks like this:
# routines for non stationary
Log_lik_GEV <- function(dataIN,scaleIN,shapeIN,locationIN){
# simply calculate the negative log likelihood value for a set of X and parameters, for the GEV
#xi, mu, sigma - xi is the shape parameter, mu the location parameter, and sigma is the scale parameter.
# shape = xi
# location = mu
# scale = beta
library(fExtremes)
#dgev Density of the GEV Distribution, dgev(x, xi = 1, mu = 0, sigma = 1)
LLvalues <- dgev(dataIN, xi = shapeIN, mu = locationIN, beta = scaleIN)
NLL <- -sum(log(LLvalues[is.finite(LLvalues)]))
return(NLL)
}
function_MLE <- function(par , dataIN){
scoreLL <- 0
shape_param <- par[1]
scale_param <- par[2]
location_param <- par[3]
scoreLL <- Log_lik_GEV(dataIN, scale_param, shape_param, location_param)
if (abs(shape_param) > 0.3) scoreLL <- scoreLL*10000000
if ((scale_param) <= 0) {
scale_param <- abs(scale_param)
par[2] <- abs(scale_param)
scoreLL <- scoreLL*1000000000
}
sum(scoreLL)
}
kernel_estimation <- function(dati_AM, shape_o, scale_o, location_o) {
paramOUT <- optim(par = c(shape_o, scale_o, location_o), fn = function_MLE, dataIN = dati_AM, control = list(maxit = 3000, reltol = 0.00000001), hessian = TRUE)
# calculation std errors
covmat <- solve(paramOUT$hessian)
stde <- sqrt(diag(covmat))
print(covmat)
print('')
result <- list(shape_gev =paramOUT$par[1], scale_gev = paramOUT$par[2],location_gev =paramOUT$par[3], var_covar = covmat)
return(result)
}
Everything works great, in some cases. If I run my routines and the fevd routines, I get exactly the same results. In some cases (in my specific case when shape = -0.29, so strongly negative/Weibull), my routine will give negative variances and funky Hessians. It is not always wrong, but some parameter combinations clearly do not give a valid Hessian. (Note: the parameters are still estimated correctly, meaning they are identical to the fevd results, but the covariance matrix is completely off.)
I found this post comparing the Hessian from two procedures, and indeed optim seems to be flaky. However, if I simply substitute maxLik into my routine, it just doesn't converge at all (even in those cases where convergence was happening before).
paramOUT = maxLik(function_MLE, start =c(shape_o, scale_o, location_o),
dataIN=dati_AM, method ='NR' )
I tried to give different initial values - even the correct ones - but it just doesn't converge.
I am not supplying data because I think the optim routine is used correctly in my example; the numerical results are simply not stable for some parameter combinations. My questions are:
1) Am I missing something in the way I use maxLik?
2) Are there other optimization routines, besides maxLik, from which I can extract the hessian?
thanks
