Conducting MLE for multivariate case (bivariate normal) in R - r

The case is that I am trying to construct an MLE algortihm for a bivariate normal case. Yet, I stuck somewhere that seems there is no error, but when I run the script it ends up with a warning.
I have a sample of size n (a fixed constant, trained with 100, but can be anything else) from a bivariate normal distribution with mean vector = (0,0) and covariance matrix = matrix(c(2.2,1.8,1.8,3),2,2)
I've tried several optimization functions (including nlm(), mle(), spg() and optim()) to maximize the likelihood function (,or minimize neg-likelihood), but there are warnings or errors.
require(MASS)
require(tmvtnorm)
require(BB)
require(matrixcalc)
I've defined the first likelihood function as follows;
bvrt_ll = function(mu,sigma,rho,sample)
{
n = nrow(sample)
mu_hat = c(mu[1],mu[2])
p = length(mu)
if(sigma[1]>0 && sigma[2]>0)
{
if(rho<=1 && rho>=-1)
{
sigma_hat = matrix(c(sigma[1]^2
,sigma[1]*sigma[2]*rho
,sigma[1]*sigma[2]*rho
,sigma[2]^2),2,2)
stopifnot(is.positive.definite(sigma_hat))
neg_likelihood = (n*p/2)*log(2*pi) + (n/2)*log(det(sigma_hat)) + 0.5*sum(((sample-mu_hat)%*%solve(sigma_hat)%*%t(sample-mu_hat)))
return(neg_likelihood)
}
}
else NA
}
I prefered this one since I could set the constraints for sigmas and rho, but when I use mle()
> mle(minuslogl = bvrt_ll ,start = list(mu = mu_est,sigma=sigma_est,rho =
rho_est)
+ ,method = "BFGS")
Error in optim(start, f, method = method, hessian = TRUE, ...) :
(list) object cannot be coerced to type 'double'
I also tried nlm and spg in package BB, but they did not help as well. I tried the same function without defining constraints (inside the likelihood, not in optimization function), I could have some results but with warnings, like in nlm and spg both said the process was failed due to covariance matrix being not positive definite while it was, I think that was due to iteration, when iterating covariance matrix might not have been positive definite, and the fact that I did not define the constraints.
Thus, as a result I need to construct an mle algorithm for bivariate normal, where do I do the mistake?
NOTE: I also tried the optimization functions with the following, (I am not sure I did it correct);
neg_likelihood = function(mu,sigma,rho)
{
if(rho>=-1 && rho<=1)
{
-sum(mvtnorm::dmvnorm(x=sample_10,mean=mu
,sigma = matrix(c(sigma[1]^2
,sigma[1]*sigma[2]*rho,sigma[1]*sigma[2]*rho
,sigma[2]^2),2,2),log = T))
}
else NA
}
Any help is appreciated.
Thanks.
EDIT : mu is a vector of length 2 specifying the population means, sigma is a vector of length 2 (specifying population standard deviations of the random variables), and rho is a scalar as correlation coefficient between the bivariate r.v s.

You can do it in closed form so there is no need for numeric optimization. See wiki. Just use colMeans and cov and take note of the method argument in help("cov") and this comment
The denominator n - 1 is used which gives an unbiased estimator of the
(co)variance for i.i.d. observations. These functions return NA when
there is only one observation (whereas S-PLUS has been returning NaN),
and fail if x has length zero.

Related

Finding minimum by optimising a vector in R

I need to find a minimum of an objective function by optimising a vector. The problem is finance related if that helps - the function RC (provided below) computes the sum of squared differences of risk contribution of different assets, where the risk contribution is a product of input Risk Measure (RM, given) and weights.
The goal is to find such weights that the sum is zero, i.e. all assets have equal risk contributions.
RC = function (RM, w){
w = w/sum(w) # normalizing weights so they sum up to 1
nAssets = length(RM)
rc_matrix = matrix(nrow=1,ncol=nAssets)
rc_matrix = RM*w #risk contributions: RM (risk measure multiplied by asset's
#w eight in the portfolio)
rc_sum_squares = numeric(length=1) #placeholder
rc_sum_squares = sum(combn(
seq_along(RM),
2,
FUN = function(x)
(rc_matrix[ , x[1]] - rc_matrix[, x[2]]) ** 2
)) # this function sums the squared differences of the risk contributions
return(rc_sum_squares)
}
I searched and the solution seems to lie in the "optim" function, so I tried:
out <- optim(
par = rep(1 / length(RM), length(RM)), # initial guess
fn = RC,
RM = RM,
method = "L-BFGS-B",
lower = 0.00001,
upper = 1)
However, this returns an error message: "Error in rc_matrix[, x[1]] : incorrect number of dimensions"
I don't know how the optimization algorithm works, so I can't really wrap my head around it. The RC function works though, here is a sample for replicability:
RM <- c(0.06006928, 0.06823795, 0.05716360, 0.08363529, 0.06491009, 0.06673174, 0.03103578, 0.05741140)
w <- matrix(0.125, nrow=1, ncol=1)
I saw also CVXR package, which crashes my RStudio for some reason and nlm(), which is little more complicated and I can't write the function properly.
A solution might be not to do the funky summation of the squared differences, but finding the weights so that the risk contributions (RM*weight) are equal. I will be very glad for your help.
Note: the vector of the weights has to sum up to 1 and the values have to lie between 0 and 1.
Cheers
Daniel

Trying to plot loglikelihood of Cauchy distribution for different values of theta in R

I am trying to plot the log-likelihood function of the Cauchy distribution for varying values of theta (location parameter). These are my observations:
obs<-c(1.77,-0.23,2.76,3.80,3.47,56.75,-1.34,4.24,3.29,3.71,-2.40,4.53,-0.07,-1.05,-13.87,-2.53,-1.74,0.27,43.21)
Here is my log-likelihood function:
ll_c<-function(theta,x_values){
n<-length(x_values)
logl<- -n*log(pi)-sum(log(1+(x_values-theta)^2))
return(logl)
}
and Ive tried making a plot by using this code:
x<-seq(from=-10,to=10,by=0.1);length(x)
theta_null<-NULL
for (i in x){
theta_log<-ll_c(i,counts)
theta_null<-c(theta_null,theta_log)
}
plot(theta_null)
The graph does not look right and for some reason the length of x and theta_null differs.
I am assuming that theta is your location parameter (the scale is set to 1 in my example). You should obtain the same result using a t-distribution with 1 df and shifting the observations by theta. I left some comments in the code as guidance.
obs = c(1.77,-0.23,2.76,3.80,3.47,56.75,-1.34,4.24,3.29,3.71,-2.40,4.53,-0.07,-1.05,-13.87,-2.53,-1.74,0.27,43.21)
ll_c=function(theta, obs)
{
# Compute log-lik for obs and a value of thet (location)
logl= sum(dcauchy(obs, location = theta, scale = 1, log = T))
return(logl)
}
# Loop for possible values of theta(obs given)
x = seq(from=-10,to=10,by=0.1)
ll = NULL
for (i in x)
{
ll = c(ll, ll_c(i, obs))
}
# Plot log-lik vs possible value of theta
plot(x, ll)
It is hard to say exactly what you are experiencing without more info. But I'll make an educated guess.
First of all, we can simplify this a lot by using the *t family of functions for the t distribution, as the cauchy distribution is just the t distribution with df = 1. So your calculations could've been done using
for(i in ncp)
theta_null <- c(theta_null, sum(dt(values, 1, i, log = TRUE)))
Note that multiplying by n doesn't actually matter for any practical purposes. We are usually interested in minimizing/maximizing the likelihood in which case all constants are irrelevant.
Now if we use this approach, we can quite quickly notice something by printing the values:
print(head(theta_null))
[1] -Inf -Inf -Inf -Inf -Inf -Inf
So I am assuming what you are experiencing is that many of your values are "almost" negative infinity, and maybe these are not stored correctly in your outcome vector. I can't see that this should be the case from your code, but this would be my initial guess.

Error in optim(): searching for global minimum for a univariate function

I am trying to optmize a function in R
The function is the Likelihood function of negative binominal when estimating only mu parameter. This should not be a problem since the function clearly has just one point of maximum. But, I am not being able to reach the desirable result.
The function to be optmized is:
EMV <- function(data, par) {
Mi <- par
Phi <- 2
N <- NROW(data)
Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
return(Resultado)
}
Data is a vector of negative binomial variables with parameters 2 and 2
data <- rnegbin(10000, mu = 2, theta = 2)
When I plot the function having mu as variable with the following code:
x <- seq(0.1, 100, 0.02)
z <- EMV(data,0.1)
for (aux in x) {z <- rbind(z, EMV(data,aux))}
z <- z[2:NROW(z)]
plot(x,z)
I get the following curve:
And the maximum value of z is close to parameter value --> 2
x[which.max(z)]
But the optimization is not working with BFGS
Error in optim(par = theta, fn = EMV, data = data, method = "BFGS") :
non-finite finite-difference value [1]
And is not going to right value using SANN, for example:
$par
[1] 5.19767e-05
$value
[1] -211981.8
$counts
function gradient
10000 NA
$convergence
[1] 0
$message
NULL
The questions are:
What am I doing wrong?
Is there a way to tell optim that the param should be bigger than 0?
Is there a way to tell optim that I want to maximize the function? (I am afraid the optim is trying to minimize and is going to a very small value where function returns smallest values)
Minimization or Maximization?
Although ?optim says it can do maximization, but that is in a bracket, so minimization is default:
fn: A function to be minimized (or maximized) ...
Thus, if we want to maximize an objective function, we need to multiply an -1 to it, and then minimize it. This is quite a common situation. In statistics we often want to find maximum log likelihood, so to use optim(), we have no choice but to minimize the negative log likelihood.
Which method to use?
If we only do 1D minimization, we should use method "Brent". This method allows us to specify a lower bound and an upper bound of search region. Searching will start from one bound, and search toward the other, until it hit the minimum, or it reach the boundary. Such specification can help you to constrain your parameters. For example, you don't want mu to be smaller than 0, then just set lower = 0.
When we move to 2D or higher dimension, we should resort to "BFGS". In this case, if we want to constrain one of our parameters, say a, to be positive, we need to take log transform log_a = log(a), and reparameterize our objective function using log_a. Now, log_a is free of constraint. The same goes when we want constrain multiple parameters to be positive.
How to change your code?
EMV <- function(data, par) {
Mi <- par
Phi <- 2
N <- NROW(data)
Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
return(-1 * Resultado)
}
optim(par = theta, fn = EMV, data = data, method = "Brent", lower = 0, upper = 1E5)
The help file for optim says: "By default optim performs minimization, but it will maximize if control$fnscale is negative." So if you either multiply your function output by -1 or change the control object input, you should get the right answer.

Wrong Hessian from optim in R

I am doing some Extreme Values analysis. I don't want to use the fevd package for a variety of reasons (the first I want to be able to tweak some things that I cannot do otherwise). I wrote my own code. It is mostly very simple, and I thought I had solved everything. But for some parameter combinations, the Hessian coming out of my log-likelihood analysis (based on optim ) will not be correct.
Going over one step at the time. My code - or selected part of it - looks like this:
# routines for non stationary
Log_lik_GEV <- function(dataIN,scaleIN,shapeIN,locationIN){
# simply calculate the negative log likelihood value for a set of X and parameters, for the GPD
#xi, mu, sigma - xi is the shape parameter, mu the location parameter, and sigma is the scale parameter.
# shape = xi
# location = mu
# scale = beta
library(fExtremes)
#dgev Density of the GEV Distribution, dgev(x, xi = 1, mu = 0, sigma = 1)
LLvalues <- dgev(dataIN, xi = shapeIN, mu = locationIN, beta = scaleIN)
NLL <- -sum(log(LLvalues[is.finite(LLvalues)]))
return(NLL)
}
function_MLE <- function(par , dataIN){
scoreLL <- 0
shape_param <- par[1]
scale_param <- par[2]
location_param <- par[3]
scoreLL <- Log_lik_GEV(dataIN, scale_param, shape_param, location_param)
if (abs(shape_param) > 0.3) scoreLL <- scoreLL*10000000
if ((scale_param) <= 0) {
scale_param <- abs(scale_param)
par[2] <- abs(scale_param)
scoreLL <- scoreLL*1000000000
}
sum(scoreLL)
}
kernel_estimation <- function(dati_AM, shape_o, scale_o, location_o) {
paramOUT <- optim(par = c(shape_o, scale_o, location_o), fn = function_MLE, dataIN = dati_AM, control = list(maxit = 3000, reltol = 0.00000001), hessian = TRUE)
# calculation std errors
covmat <- solve(paramOUT$hessian)
stde <- sqrt(diag(covmat))
print(covmat)
print('')
result <- list(shape_gev =paramOUT$par[1], scale_gev = paramOUT$par[2],location_gev =paramOUT$par[3], var_covar = covmat)
return(result)
}
Everything works great, in some cases. If I run my routines and the fevd routines, I get exactly the same results. In some cases (in my specific case when shape=-0.29 so strongly negative/weibull), my routine will give negative variances and funky hessians. It is not always wrong, but some parameter combinations are clearly not giving valid hessian (Note: the parameters are still estimated correctly, meaning are identical to the fevd results, but the covariance matrix is completely off).
I found this post that compared the hessian from two procedures, and indeed optim seems to be flaky. However, if I simply substitute maxLik in my routine, it just doesn't converge at all (even in those cases when the convergence was happening).
paramOUT = maxLik(function_MLE, start =c(shape_o, scale_o, location_o),
dataIN=dati_AM, method ='NR' )
I tried to give different initial values - even the correct ones - but it just doesn't converge.
I am not supplying data because I think that the optim routine is used correctly in my example. Simply, the numerical results are not stable for some parameter combination. My question is:
1) Am I missing something in the way I use maxLik?
2) Are there other optimization routines, besides maxLik, from which I can extract the hessian?
thanks

Parameters estimation of a bivariate mixture normal-lognormal model

I have to create a model which is a mixture of a normal and log-normal distribution. To create it, I need to estimate the 2 covariance matrixes and the mixing parameter (total =7 parameters) by maximizing the log-likelihood function. This maximization has to be performed by the nlm routine.
As I use relative data, the means are known and equal to 1.
I’ve already tried to do it in 1 dimension (with 1 set of relative data) and it works well. However, when I introduce the 2nd set of relative data I get illogical results for the correlation and a lot of warnings messages (at all 25).
To estimate these parameters I defined first the log-likelihood function with the 2 commands dmvnorm and dlnorm.plus. Then I assign starting values of the parameters and finally I use the nlm routine to estimate the parameters (see script below).
`P <- read.ascii.grid("d:/Documents/JOINT_FREQUENCY/grid_E727_P-3000.asc", return.header=
FALSE );
V <- read.ascii.grid("d:/Documents/JOINT_FREQUENCY/grid_E727_V-3000.asc", return.header=
FALSE );
p <- c(P); # tranform matrix into a vector
v <- c(V);
p<- p[!is.na(p)] # removing NA values
v<- v[!is.na(v)]
p_rel <- p/mean(p) #Transforming the data to relative values
v_rel <- v/mean(v)
PV <- cbind(p_rel, v_rel) # create a matrix of vectors
L <- function(par,p_rel,v_rel) {
return (-sum(log( (1- par[7])*dmvnorm(PV, mean=c(1,1), sigma= matrix(c(par[1]^2, par[1]*par[2]
*par[3],par[1]*par[2]*par[3], par[2]^2 ),nrow=2, ncol=2))+
par[7]*dlnorm.rplus(PV, meanlog=c(1,1), varlog= matrix(c(par[4]^2,par[4]*par[5]*par[6],par[4]
*par[5]*par[6],par[5]^2), nrow=2,ncol=2)) )))
}
par.start<- c(0.74, 0.66 ,0.40, 1.4, 1.2, 0.4, 0.5) # log-likelihood estimators
result<-nlm(L,par.start,v_rel=v_rel,p_rel=p_rel, hessian=TRUE, iterlim=200, check.analyticals= TRUE)
Messages d'avis :
1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
production de NaN
2: In sqrt(2 * pi * det(varlog)) : production de NaN
3: In nlm(L, par.start, p_rel = p_rel, v_rel = v_rel, hessian = TRUE) :
NA/Inf replaced by maximum positive value
4: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
production de NaN
…. Until 25.
par.hat <- result$estimate
cat("sigN_p =", par[1],"\n","sigN_v =", par[2],"\n","rhoN =", par[3],"\n","sigLN_p =", par [4],"\n","sigLN_v =", par[5],"\n","rhoLN =", par[6],"\n","mixing parameter =", par[7],"\n")
sigN_p = 0.5403361
sigN_v = 0.6667375
rhoN = 0.6260181
sigLN_p = 1.705626
sigLN_v = 1.592832
rhoLN = 0.9735974
mixing parameter = 0.8113369`
Does someone know what is wrong in my model or how should I do to find these parameters in 2 dimensions?
Thank you very much for taking time to look at my questions.
Regards,
Gladys Hertzog
When I do these kind of optimization problems, I find that it's important to make sure that all the variables that I'm optimizing over are constrained to plausible values. For example, standard deviation variables have to be positive, and from knowledge of the situation that I'm modelling I'll probably be able to put an upper bound all my standard deviation variables as well. So if s is one of my standard deviation variables, and if m is the maximum value that I want it to take, instead of working with s I'll solve for the variable z which is related to s via
s = m/(1+e-z)
In that formula, z is unconstrained, but s must lie between 0 and m. This is vital because optimization routines where the variables are not constrained to take plausible values will often try completely implausible values while they're trying to bound the solution. Implausible values often cause problems with e.g. precision, that then results in NaN's etc. The general formula that I use for constraining a single variable x to lie between a and b is
x = a + (b - a)/(1+e-z)
However, regarding your particular problem where you're looking for covariance matrices, a more sophisticated approach is necessary than simply bounding all the individual variables. Covariance matrices must be positive semi-definite, so if you're simply optimizing the individual values in the matrix, the optimization will probably fail (producing NaN's) if a matrix which isn't positive definite is fed into the likelihood function. To get round this problem, one approach is to solve for the Cholesky decomposition of the covariance matrix instead of the covariance matrix itself. My guess is that this is probably what's causing your optimization to fail.

Resources