Coding likelihood and log-likelihood functions in R to perform optimization

I am working on a paper that requires me to find the MLE of Gumbel’s type I bivariate exponential distribution. I have derived the likelihood and log-likelihood functions, but I am struggling to implement them in R to perform optimization with the optim() function. My code generates NA values.
Below is my code.
# likelihood function of x
likelihood.x = function(params, data) {
  lambda1 = params[1]
  lambda2 = params[2]
  theta = params[3]
  A = (1 - theta) * (lambda1 * lambda2)
  B = theta * (lambda1 ^ 2) * lambda2 * data$X1
  C = theta * lambda1 * (lambda2 ^ 2) * data$X2
  D = (theta ^ 2) * (lambda1 ^ 2) * (lambda2 ^ 2) * data$X1 * data$X2
  E = (lambda1 * data$X1) + (lambda2 * data$X2) + (theta * lambda1 * lambda2 * data$X1 * data$X2)
  f = sum(log(A + B + C + D)) - sum(E)
  return(exp(f))
}
# Log-likelihood function of x (returns the negative log-likelihood, for minimization)
log.likelihood.x = function(params, data){
  lambda1 = params[1]
  lambda2 = params[2]
  theta = params[3]
  A = (1 - theta) * (lambda1 * lambda2)
  B = theta * (lambda1 ^ 2) * lambda2 * data$X1
  C = theta * lambda1 * (lambda2 ^ 2) * data$X2
  D = (theta ^ 2) * (lambda1 ^ 2) * (lambda2 ^ 2) * data$X1 * data$X2
  E = (lambda1 * data$X1) + (lambda2 * data$X2) + (theta * lambda1 * lambda2 * data$X1 * data$X2)
  f = sum(log(A + B + C + D)) - sum(E)
  return(-f)
}
Here's the function for generating the data
# Simulating data
rGBVE = function(n, lambda1, lambda2, theta) {
  x1 = rexp(n, lambda1)
  lambda12 = lambda1 * lambda2
  pprod = lambda12 * theta
  C = exp(lambda1 * x1)
  A = (lambda12 - pprod + pprod * lambda1 * x1) / C
  B = (pprod * lambda2 + pprod ^ 2 * x1) / C
  D = lambda2 + pprod * x1
  wExp = A / D
  wGamma = B / D ^ 2
  data.frame(x1, x2 = rgamma(n, (runif(n) > wExp / (wExp + wGamma)) + 1, D))
}
data = rGBVE(n=100, lambda1 = 1.2, lambda2 = 1.4, theta = 0.5)
colnames(data) = c("X1", "X2")
My goal is to find the MLEs of lambda1, lambda2 and theta using optim() in R.
Kindly assist me in implementing my likelihood and log-likelihood functions in R.
Thank you.

Your concern appears to be about the warning message
In log(A+B+C+D): NaNs produced
Such warnings are usually harmless: they just mean that the optimization algorithm tried, somewhere along the way, a set of parameters that violated the condition A+B+C+D ≥ 0. Since these are reasonably complex expressions, it would take some effort to work out how to constrain the parameters (or reparameterize the function, e.g. by fitting some of the parameters on the log scale) so that the warning never occurs. Guessing that keeping the parameters non-negative will help, we can try the L-BFGS-B algorithm, which is the only algorithm available in optim() that allows multidimensional bounded optimization. Below, r1 uses optim()'s default method and r2 uses L-BFGS-B with lower bounds of 0.
r1 <- optim(par = c(1, 2, 1),
            fn = log.likelihood.x,
            data = data)
r2 <- optim(par = c(1, 2, 1),
            fn = log.likelihood.x,
            lower = rep(0, 3),
            method = "L-BFGS-B",
            data = data)
The second does not generate warnings, and the results are close (if not identical):
all.equal(r1$par, r2$par)
## "Mean relative difference: 0.0001451953"
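The reparameterization mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it assumes 0 < theta < 1 is the range of interest, fits lambda1 and lambda2 on the log scale and theta on the logit scale, and uses hypothetical helper names (sim_gbve, nll_gbve, nll_unconstrained) that are not part of the question. The simulator follows the same mixture construction as rGBVE, and the density is written in the factored form lambda1 * lambda2 * ((1 + theta*u)(1 + theta*v) - theta), which expands to A + B + C + D.

```r
# Sketch only: sim_gbve, nll_gbve and nll_unconstrained are hypothetical names.
set.seed(1)

# Simulate data with the same mixture construction as rGBVE in the question
# (the common factor exp(lambda1 * x1) cancels in the mixture weights).
sim_gbve <- function(n, lambda1, lambda2, theta) {
  x1 <- rexp(n, lambda1)
  p <- lambda1 * lambda2 * theta
  rate <- lambda2 + p * x1
  w_exp <- (lambda1 * lambda2 - p + p * lambda1 * x1) / rate  # weight of Exp(rate)
  w_gam <- (p * lambda2 + p^2 * x1) / rate^2                  # weight of Gamma(2, rate)
  shape <- (runif(n) > w_exp / (w_exp + w_gam)) + 1
  data.frame(X1 = x1, X2 = rgamma(n, shape = shape, rate = rate))
}

# Negative log-likelihood on the original parameter scale
nll_gbve <- function(params, data) {
  l1 <- params[1]; l2 <- params[2]; th <- params[3]
  u <- l1 * data$X1
  v <- l2 * data$X2
  dens <- l1 * l2 * ((1 + th * u) * (1 + th * v) - th)  # equals A + B + C + D
  -(sum(log(dens)) - sum(u + v + th * u * v))
}

# Same function on an unconstrained scale: log for the rates, logit for theta
nll_unconstrained <- function(eta, data) {
  nll_gbve(c(exp(eta[1]), exp(eta[2]), plogis(eta[3])), data)
}

dat <- sim_gbve(200, lambda1 = 1.2, lambda2 = 1.4, theta = 0.5)
fit <- optim(par = c(0, 0, 0), fn = nll_unconstrained, data = dat)

# Back-transform the estimates to the original scale
est <- c(lambda1 = exp(fit$par[1]),
         lambda2 = exp(fit$par[2]),
         theta   = plogis(fit$par[3]))
est
```

With positive rates and theta strictly between 0 and 1, the argument of log() is always positive, so the optimizer cannot step into the region that produces NaNs. The price is that theta = 0 and theta = 1 can only be approached, not reached.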
You might want to use bbmle, which has some additional features for likelihood modeling:
library(bbmle)
fwrap <- function(x) log.likelihood.x(x, data = data)
parnames(fwrap) <- c("lambda1", "lambda2", "theta")
m1 <- mle2(fwrap, start = c(lambda1 = 1, lambda2 = 2, theta = 1), vecpar = TRUE,
           method = "L-BFGS-B", lower = c(0, 0, -0.5))
pp <- profile(m1)
plot(pp)
confint(pp)
confint(m1, method = "quad")

Related

Auto-Regression (2) in R

I need help generating an AR(2) model in R and I am new to the software.
I have the following AR(2) process:
y[t] = phi_1 * y[t-1] + phi_2 * y[t-2] + e[t] where e[t] ~ N(0,2)
How can I generate a series of y[t]?
Thanks for the help, much appreciated!

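You could do:
set.seed(123)
n <- 200
# note: with these values phi_1 + phi_2 > 1, so the process is not stationary
phi_1 <- 0.9
phi_2 <- 0.7
# rnorm() takes a standard deviation; use sqrt(2) if N(0,2) denotes a variance of 2
e <- rnorm(n, 0, 2)
y <- vector("numeric", n)
y[1:2] <- c(0, 1)
for (t in 3:n) {
  y[t] <- phi_1 * y[t - 1] + phi_2 * y[t - 2] + e[t]
}
plot(seq(n), y, type = "l")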
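As a complement to the loop above, here is a minimal sketch that checks stationarity of the AR(2) coefficients and simulates with stats::arima.sim(). The helper is_stationary_ar2 and the coefficients 0.5 and 0.3 are assumptions made for illustration; the question does not give values for phi_1 and phi_2, and sd = sqrt(2) assumes that N(0,2) denotes a variance of 2.

```r
set.seed(123)

# Hypothetical helper: an AR(2) process is stationary when all roots of
# 1 - phi_1 * z - phi_2 * z^2 lie outside the unit circle.
is_stationary_ar2 <- function(phi_1, phi_2) {
  all(Mod(polyroot(c(1, -phi_1, -phi_2))) > 1)
}

is_stationary_ar2(0.9, 0.7)  # FALSE: the loop above produces an explosive series
is_stationary_ar2(0.5, 0.3)  # TRUE

# arima.sim() refuses non-stationary AR coefficients and handles burn-in itself;
# extra arguments such as sd are passed on to rnorm().
y <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 200, sd = sqrt(2))
```

The series can then be plotted with plot(y), since arima.sim() returns a time series object.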

Failing to optimise negative binomial model using optim

I am trying to manually optimise a negative binomial regression model using the optim function in R, predicting a count variable y from a matrix of factors X, with the following code:
# generating some fake data
n <- 1000
X <- matrix(NA, ncol = 5, nrow = n)
X[,1] <- 1
X[,2] <- sample(size = n, x = c(0,1), replace = TRUE)
X[,3] <- sample(size = n, x = c(0,1), replace = TRUE)
X[,4] <- sample(size = n, x = c(0,1), replace = TRUE)
X[,5] <- sample(size = n, x = c(0,1), replace = TRUE)
beta0 <- 3
beta1 <- -2
beta2 <- -2
beta3 <- -4
beta4 <- -0.9
k <- 0.9
## draws from negative binomial distribution
mu <- exp(beta0 + beta1 * X[,2] + beta2 * X[,3] + beta3 * X[,4] + beta4 * X[,5])
theta <- mu + mu ^2 / k
# dependent variable (rnegbin() comes from the MASS package)
library(MASS)
y <- rnegbin(n, mu = mu, theta = theta)
# function to be optimised
negbin_ll <- function(y, X, theta){
  beta <- theta[1:ncol(X)]
  alpha <- theta[ncol(X) + 1]
  logll <- y * log(alpha) + y *( beta %*% t(X) ) - (y + (1 / alpha ) ) * log( 1 + alpha * exp(beta %*% t(X))) + lgamma(y + (1 / alpha)) - lgamma ( y + 1) - lgamma ( 1 / alpha)
  logll <- sum( logll )
  return(logll)
}
stval <- rep(0, ncol(X) + 1)
res <-
  optim(
    stval,
    negbin_ll,
    y = y,
    X = X,
    control = list(fnscale = -1),
    hessian = TRUE,
    method = "BFGS"
  )
The code should produce point estimates from the optimisation process, but instead the call to optim fails with the error: Error in optim(stval, negbin_ll, y = y, X = X, control = list(fnscale = -1), : initial value in 'vmmin' is not finite.
I already tried to change log(gamma(...)) to lgamma(...) in the likelihood function and tried many other ways, but I fail to get estimates.
Changing the start values of optim also does not help.
Do you have any idea if there is any particularity to the likelihood function that leads to values being treated in any odd fashion?
Help would be much appreciated.

optim tries several points on its way to the optimum, and in your case it hits some non-positive values in the arguments of the logs. One way around this is to reject any parameter values that make those arguments non-positive by returning a large negative number (negative because you are maximising), like -length(y)*10^6. I rewrote the log-likelihood function this way, and it more or less works:
negbin_ll <- function(y, X, theta){
  beta <- theta[1:ncol(X)]
  alpha <- theta[ncol(X) + 1]
  if(any(alpha<=0)) return(-length(y)*10^6)
  if(any(1 + alpha * exp(beta %*% t(X))<=0)) return(-length(y)*10^6)
  logll <- y * log(alpha) + y *( beta %*% t(X) ) - (y + (1 / alpha ) ) * log( 1 + alpha * exp(beta %*% t(X))) + lgamma(y + (1 / alpha)) - lgamma ( y + 1) - lgamma ( 1 / alpha)
  logll <- sum( logll )
  return(logll)
}
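An alternative to returning a penalty value is to optimise over log(alpha), so that alpha stays positive for every parameter vector optim tries. The sketch below is an assumption-laden illustration rather than the answer's method: it simulates with stats::rnbinom() instead of MASS::rnegbin(), treats alpha = 0.9 as the true dispersion (with size = 1 / alpha), and uses the hypothetical name negbin_ll_log. It relies on dnbinom(..., log = TRUE), which evaluates the same negative binomial log-density as the hand-written expression.

```r
set.seed(42)

# Fake data: intercept plus four binary covariates (values assumed for illustration)
n <- 1000
X <- cbind(1, matrix(rbinom(n * 4, 1, 0.5), ncol = 4))
beta_true <- c(3, -2, -2, -4, -0.9)
alpha_true <- 0.9
mu <- as.vector(exp(X %*% beta_true))
y <- rnbinom(n, mu = mu, size = 1 / alpha_true)

# Log-likelihood with the last parameter interpreted as log(alpha)
negbin_ll_log <- function(par, y, X) {
  beta <- par[seq_len(ncol(X))]
  alpha <- exp(par[ncol(X) + 1])      # always positive
  mu <- as.vector(exp(X %*% beta))
  sum(dnbinom(y, size = 1 / alpha, mu = mu, log = TRUE))
}

# Starting at zero now means alpha = 1, so the initial value is finite
res <- optim(rep(0, ncol(X) + 1), negbin_ll_log, y = y, X = X,
             control = list(fnscale = -1), method = "BFGS")

# Back-transform the dispersion estimate
c(beta = res$par[1:5], alpha = exp(res$par[6]))
```

In the original code the start value of 0 for alpha makes log(alpha) and 1 / alpha non-finite, which is what triggers the 'vmmin' error; on the log scale the same start vector corresponds to alpha = 1.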

Plotting fitted values from regression

I have the following code in R:
S0 = 40
r = log(1 + 0.07)
sigma = 0.3
K = 45
n_steps_per_year = 4
dt = 1 / n_steps_per_year
T = 3
n_steps = n_steps_per_year * T
R = n_paths
Q = 70
P = 72
n_paths = P * Q
d = exp(-r * dt)
N = matrix(rnorm(n_paths * n_steps, mean = 0, sd = 1), n_paths, n_steps)
paths_S = matrix(nrow = n_paths, ncol = n_steps + 1, S0)
for(i in 1:n_paths){
  for(j in 1:n_steps){
    paths_S[i, j + 1] = paths_S[i, j] * exp((r - 0.5 * sigma ^ 2) * dt + sigma * sqrt(dt) * N[i, j])
  }
}
I = apply(K - paths_S, c(1,2), max, 0)
V = matrix(nrow = n_paths, ncol = n_steps + 1)
V[, n_steps + 1] = I[, n_steps + 1]
dV = d * V[, n_steps + 1]
model = lm(dV ~ poly(paths_S[, n_steps], 10))
pred = predict(model, data.frame(x = paths_S[, n_steps]))
plot(paths_S[, n_steps], d * V[, n_steps + 1])
lines(paths_S[, n_steps], pred)
but when I run the last two lines I get a very strange plot (multiple lines instead of one line). What is going on?

n_paths is used (in R = n_paths) before it is defined, so let's assume:
n_paths = 7
set.seed(111)
Then, after running your code, you need to sort by the x values before plotting, because lines() joins the points in the order they are given:
o = order(paths_S[, n_steps])
plot(paths_S[o, n_steps], d * V[o, n_steps + 1], cex = 0.2, pch = 20)
lines(paths_S[o, n_steps], pred[o], col = "blue")
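The effect of the ordering can be seen on a much smaller example. This is a sketch with made-up data (the variables x, y, fit and the degree-5 polynomial are assumptions for illustration, not taken from the question).

```r
set.seed(1)

# Made-up data with x in random order, as in a simulated cross-section
x <- runif(50, 0, 10)
y <- sin(x) + rnorm(50, sd = 0.2)

fit <- lm(y ~ poly(x, 5))
pred <- fitted(fit)

o <- order(x)  # permutation that sorts x from left to right

plot(x, y, pch = 20)
lines(x, pred, col = "grey")        # zig-zag: points joined in data order
lines(x[o], pred[o], col = "blue")  # single curve: points joined left to right
```

Both lines() calls draw exactly the same fitted values; only the order in which the points are connected differs.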

How to compute double integral in r?

Suppose that we have the following density:
bvtnorm <- function(x, y, mu_x = 10, mu_y = 5, sigma_x = 3, sigma_y = 7, rho = 0.4) {
  function(x, y)
    1 / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho ^ 2)) *
      exp(- 1 / (2 * (1 - rho ^ 2)) * (((x - mu_x) / sigma_x) ^ 2 +
        ((y - mu_y) / sigma_y) ^ 2 - 2 * rho * (x - mu_x) * (y - mu_y) /
        (sigma_x * sigma_y)))
}
f2 <- bvtnorm(x, y)
I want to compute the following integral:
integral_1=1-adaptIntegrate(f2, lowerLimit = c(-Inf,0), upperLimit = c(+Inf,+Inf))
Unfortunately, it produces this error:
Error in f(tan(x), ...) : argument "y" is missing, with no default
I don't know how to resolve this.
Thank you in advance for your help!

With the functions hcubature and pcubature from package cubature, the integrand has to be changed a bit. The integrators in that package accept only integrand functions of a single argument, which can be a vector in a multidimensional real space (here, R2). The values of x and y therefore have to be assigned inside the integrand, or replaced by x[1] and x[2] in its expression.
bvtnorm <- function(x, mu_x = 10, mu_y = 5, sigma_x = 3, sigma_y = 7, rho = 0.4) {
  y <- x[2]
  x <- x[1]
  1 / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho ^ 2)) *
    exp(- 1 / (2 * (1 - rho ^ 2)) * (((x - mu_x) / sigma_x) ^ 2 +
      ((y - mu_y) / sigma_y) ^ 2 - 2 * rho * (x - mu_x) * (y - mu_y) /
      (sigma_x * sigma_y)))
}
library(cubature)
eps <- .Machine$double.eps^0.5
hcubature(bvtnorm, lowerLimit = c(-Inf, 0), upperLimit = c(+Inf,+Inf), tol = eps)
pcubature(bvtnorm, lowerLimit = c(-Inf, 0), upperLimit = c(+Inf,+Inf), tol = eps)

If you need to do a double integral, you could just integrate twice:
bvtnorm <- function(y, mu_x = 10, mu_y = 5, sigma_x = 3, sigma_y = 7, rho = 0.4) {
  function(x)
    1 / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho ^ 2)) *
      exp(- 1 / (2 * (1 - rho ^ 2)) *
        (((x - mu_x) / sigma_x) ^ 2 +
          ((y - mu_y) / sigma_y) ^ 2 - 2 * rho * (x - mu_x) * (y - mu_y) /
          (sigma_x * sigma_y)))
}
f3 <- function(y)
{
  f2 <- bvtnorm(y = y)
  integrate(f2, lower = -Inf, upper = Inf)$value
}
integrate(Vectorize(f3), -Inf, Inf)
#> 1.000027 with absolute error < 1.8e-05
This gives an answer that is pleasingly close to 1, as expected.
Created on 2020-09-05 by the reprex package (v0.3.0)
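For the limits in the question (x over the whole real line, y from 0 to infinity) there is a closed form to check against: integrating out x leaves the marginal density of y, which is normal with mean mu_y and standard deviation sigma_y, so the integral equals P(Y > 0). The sketch below applies the same integrate-twice idea to those limits and compares with pnorm(); the names joint_density and inner are assumptions made for illustration.

```r
mu_x <- 10; mu_y <- 5; sigma_x <- 3; sigma_y <- 7; rho <- 0.4

# Bivariate normal density with the parameters from the question
joint_density <- function(x, y) {
  1 / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho^2)) *
    exp(-1 / (2 * (1 - rho^2)) * (((x - mu_x) / sigma_x)^2 +
      ((y - mu_y) / sigma_y)^2 -
      2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)))
}

# Inner integral over x for one fixed value of y
inner <- function(y) {
  integrate(function(x) joint_density(x, y), -Inf, Inf)$value
}

# Outer integral over y from 0 to infinity
numeric_value <- integrate(Vectorize(inner), 0, Inf)$value

# Closed form: P(Y > 0) for Y ~ N(mu_y, sigma_y^2)
exact_value <- pnorm(0, mean = mu_y, sd = sigma_y, lower.tail = FALSE)

c(numeric = numeric_value, exact = exact_value)
```

The quantity in the question, integral_1, would then be 1 minus this value.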

How to implement Euler method in R

I am trying to implement this Euler Method procedure but I am unable to get the required graphs.
solve_logistic <- function(N0, r = 1, delta_t = 0.01, times = 1000) {
  N <- rep(N0, times)
  dN <- function(N) r * N * (1 - N)
  for (i in seq(2, times)) {
    # Euler
    N[i] <- N[i-1] + delta_t * dN(N[i-1])
    # Improved Euler
    # k <- N[i-1] + delta_t * dN(N[i-1])
    # N[i] <- N[i-1] + 1 /2 * delta_t * (dN(N[i-1]) + dN(k))
    # Runge-Kutta 4th order
    # k1 <- dN(N[i-1]) * delta_t
    # k2 <- dN(N[i-1] + k1/2) * delta_t
    # k3 <- dN(N[i-1] + k2/2) * delta_t
    # k4 <- dN(N[i-1] + k3) * delta_t
    #
    # N[i] <- N[i-1] + 1/6 * (k1 + 2*k2 + 2*k3 + k4)
  }
  N
}
This is the graph I want to make:
And you can also view the original source which I am following for this graph

It is good that you are interested in epidemiological models.
To obtain a graph similar to the one you show, you first need to code the analytical solution N(t), which is given on the reference website.
logistic <- function(N0, r, t){
  return(1 / (1 + ((1-N0)/N0) * exp(- r * t)))
}
You should also be careful with the abscissa: the step index has to be converted to time by multiplying by delta_t, and the first value returned by solve_logistic is N0 at time 0.
r <- 1
t <- 0:999
N0 <- 0.03
delta_t <- 0.01
plot(t * delta_t, logistic(N0 = N0, r = r, t = t * delta_t), type = "l",
     ylim = c(0, 1),
     ylab = "N(t)",
     xlab = "times")
lines(t * delta_t, solve_logistic(N0 = N0, times = length(t)),
      col = "red", lty = 2)
This gives you part of the graph; you can now compute the error of the method and test it with other values of delta_t.
The Euler method is a numerical method for solving ODEs based on a first-order Taylor expansion; like the gradient descent algorithm, it follows the local slope with a small step.
solve_logistic <- function(N0, r = 1, delta_t = 0.01, times = 1000) {
  N <- rep(N0, times)
  dN <- function(N) r * N * (1 - N)
  for (i in seq(2, times)) {
    # Euler (you follow the steepest slope with a small step delta)
    N[i] <- N[i-1] + delta_t * dN(N[i-1])
  }
  N
}
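The suggestion to compute the error and try other step sizes could look like the sketch below. It is an illustration under assumptions: euler_logistic and max_error are hypothetical helpers, the time horizon t_end = 10 and the three step sizes are arbitrary choices, and the error is measured as the largest absolute gap between the Euler solution and the analytical one on the same time grid.

```r
# Analytical solution of dN/dt = r * N * (1 - N)
logistic <- function(N0, r, t) 1 / (1 + ((1 - N0) / N0) * exp(-r * t))

# Euler solution on the grid 0, delta_t, 2 * delta_t, ..., t_end
euler_logistic <- function(N0, r = 1, delta_t = 0.01, t_end = 10) {
  steps <- round(t_end / delta_t)
  N <- numeric(steps + 1)
  N[1] <- N0
  for (i in seq_len(steps)) {
    N[i + 1] <- N[i] + delta_t * r * N[i] * (1 - N[i])
  }
  N
}

# Largest absolute difference between the Euler and analytical solutions
max_error <- function(delta_t, N0 = 0.03, r = 1, t_end = 10) {
  t <- seq(0, by = delta_t, length.out = round(t_end / delta_t) + 1)
  max(abs(euler_logistic(N0, r, delta_t, t_end) - logistic(N0, r, t)))
}

errors <- sapply(c(0.1, 0.05, 0.01), max_error)
errors
```

Because the Euler method is first order, the error should shrink roughly in proportion to delta_t as the step gets smaller.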