Multinomial MLE error in R

I am new to R and am trying to do MLE using mle2 from the bbmle package.
R Code:
rm(list = ls())
library(bbmle)
N <- 100
testmat=rmultinom(N, size=3, prob = c(0.1,0.2,0.8))
LL<- function(s, p){-sum(dmultinom(x=testmat, size = s, prob=p, log = TRUE))}
values.start <- list(3, c(0.1,0.2,0.7))
names(values.start) <- parnames(LL) <- paste0("b",0:1)
mle2(LL, start =values.start)
I keep getting this error
"Error in mle2(LL, start = values.start) :
some named arguments in 'start' are not arguments to the specified log-likelihood function"
Since I am using mle2, I thought this was not needed here. At first I was using mle:
N <- 100
testmat=t(rmultinom(3, size=3, prob = c(0.1,0.2,0.8)))
LL<- function(s, p1,p2,p3){prob=unlist(as.list(environment()))[2:4]
-sum(dmultinom(x=testmat, size = s, prob=prob, log = TRUE))}
values.start <- list(s=3, p1=0.1, p2=0.2, p3=0.7)
mle(LL, start =values.start)
which gave this error:
"Error in dmultinom(x = testmat, size = s, prob = prob, log = TRUE) :
x[] and prob[] must be equal length vectors."
I even edited it as follows
N <- 100
testmat=t(rmultinom(3, size=3, prob = c(0.1,0.2,0.8)))
LL<- function(s=3, p1=0.1,p2=0.2,p3=0.7){
prob=unlist(as.list(environment()))[2:4]
s=unlist(as.list(environment()))[1]
-sum(dmultinom(x=testmat, size = s, prob=prob, log = TRUE))}
mle(LL)
The error still persists. Finally I was able to decode the errors, thanks a lot.
library(bbmle)
N <- 1000
X=rmultinom(N,size=3,prob = rep(1/3, 3))
LL <- function( p_1 = 0.1,p_2=0.1,p_3=0.8) {
p <- unlist(as.list(environment()))
-sum(apply(X, MAR = 2, dmultinom, size = NULL, prob = c(p_1,p_2,p_3), log = TRUE))
}
mle(LL,method = "L-BFGS-B", lower = c(-Inf, 0), upper = c(Inf, Inf))
In my current problem, I have 5,000 features, so I would need to write something like this:
function( p_1 = 0.1,p_2=0.1,p_3=0.8...., p_5000=..)
which is not possible. Is there any way around this?
I was able to do it with mle2, this way:
rm(list = ls())
library(bbmle)
N <- 1000
s<-100
X=rmultinom(N,size=s,prob = rep(1/s, s))
LL= function(params){
p <- unlist(as.list(environment()))
minusll = -sum(apply(X, MAR = 2, dmultinom, size = NULL, prob = p, log = TRUE))
return(minusll)
}
values.start<-vector(mode="list", length=s)
values.start <- c(0.02,0.01*rep(98/99,99))
names(values.start) <- parnames(LL)<-paste0("b",1:s)
mle2(LL, start =values.start,vecpar = TRUE, method = "L-BFGS-B", lower = c(rep(0,s)), upper = c(rep(1,s)))
Above, I was doing multinomial MLE parameter estimation for a dimension of 100 with 1000 samples. I was able to solve the problem of vector parameters. Now I am getting this error:
Error in optim(par = c(0.02, 0.0098989898989899, 0.0098989898989899, 0.0098989898989899, :
L-BFGS-B needs finite values of 'fn'
I found out that this error is due to fn = Inf, most likely because one of the probabilities becomes zero, so that fn = -log(0) = Inf. Is there any way to solve this problem?
Thanks for the help.
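One workaround I have seen for this kind of failure (a sketch under my own assumptions, not a definitive fix) is to keep L-BFGS-B from ever proposing a probability of exactly zero by raising the lower bound slightly above 0; dmultinom() normalises prob internally, so the components do not have to sum exactly to 1. Continuing from the last code block above (X and s as defined there):
## Sketch: same setup, but with a strictly positive lower bound so no
## component of p can reach 0 and the negative log-likelihood stays finite.
LL <- function(params) {
  -sum(apply(X, MARGIN = 2, dmultinom, prob = params, log = TRUE))
}
values.start <- rep(1/s, s)
names(values.start) <- parnames(LL) <- paste0("b", 1:s)
mle2(LL, start = values.start, vecpar = TRUE, method = "L-BFGS-B",
     lower = rep(1e-8, s), upper = rep(1, s))
An alternative is to reparameterise (for example, optimise unconstrained values and map them through a softmax) so the probabilities can never hit zero at all.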

Related

R function constrOptim can't return hessian matrix

When I try to return the Hessian matrix using the "BFGS" method, an error comes out. The code and error are below.
square <- function (par, y){
return(- sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE)))
}
ui <- c(1, -1)
ci <- c(0)
d.y <- rnorm(1000, 10, 6)
res <- constrOptim(theta = c(15, 5),square, grad = NULL, ui = ui, ci = ci, method = "BFGS", hessian = T, y = d.y)
error
Error in colSums(ui * gi.old/gi - ui) (constrOPtim.R#18): 'x' must be an array of at least two dimensions
I don't know if the "BFGS" method needs more conditions. How can the program return the Hessian matrix correctly while using the "BFGS" method?
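For what it's worth, here is a sketch of how this is commonly resolved (my assumptions, not a verified answer to this exact question): pass ui as a 1 x 2 constraint matrix rather than a plain vector, and supply an explicit gradient of the objective, which the "BFGS" path of constrOptim() uses; square.grad below is a helper I am adding for illustration.
square <- function(par, y) -sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE))
## Analytic gradient of the negative log-likelihood w.r.t. (mean, sd)
square.grad <- function(par, y) {
  m <- par[1]; s <- par[2]
  c(-sum(y - m) / s^2,                      # d/d(mean)
    length(y) / s - sum((y - m)^2) / s^3)   # d/d(sd)
}
ui <- matrix(c(1, -1), nrow = 1)   # one linear constraint: mean - sd >= 0
ci <- 0
d.y <- rnorm(1000, 10, 6)
res <- constrOptim(theta = c(15, 5), f = square, grad = square.grad,
                   ui = ui, ci = ci, method = "BFGS", hessian = TRUE, y = d.y)
res$hessian
Note that the Hessian returned this way is the Hessian of the barrier-augmented objective, so it only approximates the Hessian of the original negative log-likelihood (usually closely, because the barrier weight mu is small).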

MLE error: initial value in 'vmmin' is not finite

We simulated a data set and created a model.
set.seed(459)
# seed mass
n <- 1000
seed.mass <- round(rnorm(n, mean = 250, sd = 75),digits = 1)
## Setting up the deterministic function
detFunc <- function(a,b,x){
return(exp(a+b*x)) / (1+exp(a+b*x))
}
# logit link function for the binomial
inv.link <- function(z){
p <-1/(1+exp(-z))
return(p)
}
#setting a and b values
a <- -2.109
b <- 0.02
# Simulating data
germination <- (rbinom(n = n, size = 10,
p = inv.link(detFunc(x = seed.mass, a = a, b = b))
))/10
## make data frame
mydata <- data.frame("predictor" = seed.mass, "response" = germination)
# plotting the data
tmp.x <- seq(0,1e3,length.out=500)
plot(germination ~ seed.mass,
xlab = "seed mass (mg)",
ylab = "germination proportion")
lines(tmp.x,inv.link(detFunc(x = tmp.x, a = a, b = b)),col="red",lwd=2)
When we fit the model we created and try to infer the parameters, we get an error:
Error in optim(par = c(a = -2.109, b = 0.02), fn = function (p) : initial value in 'vmmin' is not finite
library(bbmle)
mod1<-mle2(response ~ dbinom(size = 10,
p = inv.link(detFunc(x = predictor, a = a, b = b))
),
data = mydata,
start = list("a"= -2.109 ,"b"= 0.02))
We're stumped and can't figure out why we're getting this error.
Your problem is that you're trying to fit a binomial outcome (which must be an integer) to a proportion.
You can use round(response*10) as your response (to put the proportion back on the count scale; round() is needed because (a/b)*b is not always exactly equal to a in floating-point math ...). Specifically, with your setup
mod1 <- mle2(round(response*10) ~ dbinom(size = 10,
p = inv.link(detFunc(x = predictor, a = a, b = b))
),
data = mydata,
start = list(a = -2.109 ,b = 0.02))
works fine. coef(mod1) is {-1.85, 0.018}, plausibly close to the true values you started with (we don't expect to recover the true values exactly, except as the average of many simulations [and even then MLE is only asymptotically unbiased, i.e. for large data sets ...]).
The proximal problem is that trying to evaluate dbinom() with a non-integer value gives a zero probability, so the log-likelihood is not finite. The full output from your model fit would have been:
Error in optim(par = c(a = -2.109, b = 0.02), fn = function (p) :
initial value in 'vmmin' is not finite
In addition: There were 50 or more warnings (use warnings() to see the first 50)
It's always a good idea to check those additional warnings ... in this case they are all of the form
1: In dbinom(x = c(1, 1, 1, 0.8, 1, 1, 1, 1, 1, 1, 1, 0.8, ... :
non-integer x = 0.800000
which might have given you a clue ...
PS you can use qlogis() and plogis() from base R for your link and inverse-link functions ...
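For example (my sketch of that PS, keeping everything else from the setup above unchanged), plogis() can stand in directly for the hand-rolled inv.link():
mod2 <- mle2(round(response * 10) ~ dbinom(size = 10,
                                           prob = plogis(detFunc(x = predictor, a = a, b = b))),
             data = mydata,
             start = list(a = -2.109, b = 0.02))
coef(mod2)   # should essentially match coef(mod1) above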

MLE - Optimization with constraints as non-linear functions of the variables

I have an issue with the following optimization problem. In particular, I would like to add the constraint (x - location)/scale > 0 to the MLE problem. Without this constraint, the LL is Inf and the L-BFGS-B optimization gives the error shown below.
library(PearsonDS)
x <- rpearsonIII(n=1000, shape = 5, location = 6, scale = 7)
dpearson3 <- function (x, shape, location, scale, log = FALSE)
{
gscale <- abs(scale)
ssgn <- sign(scale)
density <- dgamma(ssgn * (x - location), shape = shape, scale = gscale, log = log)
return(density)
}
LL3 <- function(theta, x, display)
{
shape <- as.numeric(theta[1])
location <- as.numeric(theta[2])
scale <- as.numeric(theta[3])
tmp <- -sum(log(dpearson3(x, shape, location, scale, log = FALSE)))
if (is.na(tmp)) +Inf else tmp
if(display == 1){print(c(tmp, theta))}
return(sum(tmp))
}
control.list <- list(maxit = 100000, factr=1e-12, fnscale = 1)
fit <- optim(par = param,
fn = LL3,
hessian = TRUE,
method = "L-BFGS-B",
lower = c(0,-Inf,-Inf),
upper = c(Inf,Inf,Inf),
control = control.list,
x = x, display = 1)
Assume that I start the search from param <- c(100, 1000, 10); then I get the following error:
Error in optim(par = param, fn = LL3, hessian = TRUE, method = "L-BFGS-B", :
L-BFGS-B needs finite values of 'fn'
How can I solve this issue?
Changing the MLE function to
LL3 <- function(theta, x, display){
shape <- as.numeric(theta[1])
location <- as.numeric(theta[2])
scale <- as.numeric(theta[3])
tmp <- -sum(log(dpearson3(x, shape, location, scale, log = FALSE)))
if(min((x-location)/scale) < 0) tmp = + 100000000000 # I added this line
if (is.na(tmp)) +Inf else tmp
if(display == 1){print(c(tmp, theta))}
return(tmp)
}
is the smartest thing I could come up with; this way I avoid the Inf problem. Is there a better answer?
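A slightly cleaner variant of the same idea (my sketch; still a penalty rather than a true constrained optimizer) is to compute the log-density with log = TRUE and replace any non-finite value of the objective with a large finite penalty, so the support condition does not have to be hand-coded:
LL3 <- function(theta, x, display = 0) {
  shape    <- as.numeric(theta[1])
  location <- as.numeric(theta[2])
  scale    <- as.numeric(theta[3])
  ## outside the support the log-density is -Inf, so nll becomes Inf (or NaN);
  ## cap it at a large finite value so L-BFGS-B always sees a finite objective
  nll <- -sum(dpearson3(x, shape, location, scale, log = TRUE))
  if (!is.finite(nll)) nll <- 1e10
  if (display == 1) print(c(nll, theta))
  nll
}
A more principled alternative would be to reparameterise (for example, optimise over log(scale) and write location relative to min(x) or max(x)) so that (x - location)/scale > 0 holds by construction.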

Error when running mle2 function (bbmle)

I am receiving the following error when running the mle2() function from the bbmle package in R:
some parameters are on the boundary: variance-covariance calculations based on Hessian may be unreliable
I am trying to understand whether this is due to a problem with my data or an issue with how I am calling the function. Unfortunately, I cannot post my real data, so I am using a similar working example with the same sample size.
The custom dAction function I am using is a softmax function. There have to be upper and lower bounds on the optimization so I am using the L-BFGS-B method.
library(bbmle)
set.seed(3939)
### Reproducible data
dat1 <- rnorm(30, mean = 3, sd = 1)
dat2 <- rnorm(30, mean = 3, sd = 1)
dat1[c(1:3, 5:14, 19)] <- 0
dat2[c(4, 15:18, 20:22, 24:30)] <- 0
### Data variables
x <- sample(1:12, 30, replace = TRUE)
pe <- dat1
ne <- dat2
### Likelihood
dAction <- function(x, a, b, t, pe, ne, log = FALSE) {
u <- exp(((x - (a * ne) - (b * pe)) / t))
prob <- u / (1 + u)
if(log) return(prob) else return(-sum(log(prob)))
}
### Fit
fit <- mle2(dAction,
start = list(a = 0.1, b = 0.1, t = 0.1),
data = list(x = x, pe = pe, ne = ne),
method = "L-BFGS-B",
lower = c(a = 0.1, b = 0.1, t = 0.1),
upper = c(a = 10, b = 1, t = 10))
Warning message:
In mle2(dAction, start = list(a = 0.1, b = 0.1, t = 0.1), data = list(x = x, :
some parameters are on the boundary: variance-covariance calculations based on Hessian may be unreliable
Here are the results for summary():
summary(fit)
Maximum likelihood estimation
Call:
mle2(minuslogl = dAction, start = list(a = 0.1, b = 0.1, t = 0.1),
method = "L-BFGS-B", data = list(x = x, pe = pe, ne = ne),
lower = c(a = 0.1, b = 0.1, t = 0.1), upper = c(a = 10, b = 1,
t = 10))
Coefficients:
Estimate Std. Error z value Pr(z)
a 0.1 NA NA NA
b 0.1 NA NA NA
t 0.1 NA NA NA
-2 log L: 0.002048047
Warning message:
In sqrt(diag(object@vcov)) : NaNs produced
And the results for the confidence intervals
confint(fit)
Profiling...
2.5 % 97.5 %
a NA 1.0465358
b NA 0.5258828
t NA 1.1013322
Warning messages:
1: In sqrt(diag(object@vcov)) : NaNs produced
2: In .local(fitted, ...) :
Non-positive-definite Hessian, attempting initial std err estimate from diagonals
I don't entirely understand the context of your problem, but:
The issue (whether it is a real problem or not depends very much on the aforementioned context that I don't understand) has to do with your constraints. If we do the fit without the constraints:
### Fit
fit <- mle2(dAction,
start = list(a = 0.1, b = 0.1, t = 0.1),
data = list(x = x, pe = pe, ne = ne))
## method = "L-BFGS-B",
## lower = c(a = 0.1, b = 0.1, t = 0.1),
## upper = c(a = 10, b = 1, t = 10))
we get coefficients that are below your bounds.
coef(fit)
a b t
0.09629301 0.07724332 0.02405173
If this is correct, at least one of the constraints is going to be active (i.e. when we fit with lower bounds, at least one of our parameters will hit the bounds - in fact, it's all of them). When fits are on the boundary, the simplest machinery for computing confidence intervals (Wald intervals) doesn't work. However, this doesn't affect the profile confidence interval estimates you report above. These are correct - the lower bounds are reported as NA because the lower confidence limit is at the boundary (you can replace these by 0.1 if you like).
If you didn't expect the optimal fit to be on the boundary, then I don't know what's going on; maybe it is a data issue.
Your log-likelihood function is not wrong, but it's a little confusing because you have a log argument that returns the negative log-likelihood when log=FALSE (default) and the likelihood when log=TRUE. Before I realized that, I rewrote the function (I also made it a little more numerically stable by doing computations on the log scale wherever possible).
dAction <- function(x, a, b, t, pe, ne) {
logu <- (x - (a * ne) - (b * pe)) / t
lprob <- logu - log1p(exp(logu))
return(-sum(lprob))
}
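Refitting with this rewritten version (my addition, just to show usage) should reproduce the unconstrained fit above:
fit2 <- mle2(dAction,
             start = list(a = 0.1, b = 0.1, t = 0.1),
             data = list(x = x, pe = pe, ne = ne))
coef(fit2)   # should match the unconstrained coefficients reported above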

Automatically solve an equation of `pt` for `ncp`

I wonder if it is possible to efficiently change ncp in the below code such that x becomes .025 and .975 (within rounding error).
x <- pt(q = 5, df = 19, ncp = ?)
----------
Clarification
q = 5 and df = 19 (above) are just two hypothetical numbers, so q and df could be any other two numbers. What I expect is a function / routine that takes q and df as input.
What is wrong with uniroot?
f <- function (ncp, alpha) pt(q = 5, df = 19, ncp = ncp) - alpha
par(mfrow = c(1,2))
curve(f(ncp, 0.025), from = 5, to = 10, xname = "ncp", main = "0.025")
abline(h = 0)
curve(f(ncp, 0.975), from = 0, to = 5, xname = "ncp", main = "0.975")
abline(h = 0)
So for 0.025 case, the root lies in (7, 8); for 0.975 case, the root lies in (2, 3).
uniroot(f, c(7, 8), alpha = 0.025)$root
#[1] 7.476482
uniroot(f, c(2, 3), alpha = 0.975)$root
#[1] 2.443316
---------
(After some discussion...)
OK, now I see your ultimate goal. You want to implement this equation solver as a function, with input q and df. So they are unknown, but fixed. They might come out of an experiment.
Ideally there would be an analytical solution, i.e., ncp could be written as a formula in terms of q, df and alpha; that would be great. However, this is not possible for the t-distribution.
A numerical solution is the way to go, but uniroot is not a great option for this purpose, as it relies on "plot - view - guess - specify". The answer by loki (further down) is also somewhat crude, but an improvement: it is a grid search with a fixed step size. Start from a value near 0, say 0.001, then increase this value and check the approximation error, stopping once it is small enough.
This naturally leads to the idea of numerical optimization with a Newton or quasi-Newton method. In the 1D case, we can use the function optimize. It uses a variable step size in its search, so it converges faster than a fixed-step-size search.
Let's define our function as:
ncp_solver <- function (alpha, q, df) {
## objective function: we minimize squared approximation error
obj_fun <- function (ncp, alpha = alpha, q = q, df = df) {
(pt(q = q, df = df, ncp = ncp) - alpha) ^ 2
}
## now we call `optimize`
oo <- optimize(obj_fun, interval = c(-37.62, 37.62), alpha = alpha, q = q, df = df)
## post processing
oo <- unlist(oo, use.names = FALSE) ## list to numerical vector
oo[2] <- sqrt(oo[2]) ## squared error to absolute error
## return
setNames(oo, c("ncp", "abs.error"))
}
Note that -37.62 / 37.62 are chosen as the lower / upper bounds for ncp, as this is the maximum supported by the t-distribution in R (see ?dt).
For example, let's try this function. If, as given in your question, you have q = 5 and df = 19:
ncp_solver(alpha = 0.025, q = 5, df = 19)
# ncp abs.error
#7.476472e+00 1.251142e-07
The result is a named vector, with ncp and absolute approximation error.
Similarly we can do:
ncp_solver(alpha = 0.975, q = 5, df = 19)
# ncp abs.error
#2.443347e+00 7.221928e-07
----------
Follow up
Is it possible for alpha in the function ncp_solver() to take c(.025, .975) together?
Why not wrap it up for "vectorization":
sapply(c(0.025, 0.975), ncp_solver, q = 5, df = 19)
# [,1] [,2]
#ncp 7.476472e+00 2.443347e+00
#abs.error 1.251142e-07 7.221928e-07
How come 0.025 gives the upper bound of the confidence interval, while 0.975 gives the lower bound? Shouldn't this relationship be reversed?
No surprise. By default pt computes the lower-tail probability. If you want the "right" relationship, set lower.tail = FALSE in pt:
ncp_solver <- function (alpha, q, df) {
## objective function: we minimize squared approximation error
obj_fun <- function (ncp, alpha = alpha, q = q, df = df) {
(pt(q = q, df = df, ncp = ncp, lower.tail = FALSE) - alpha) ^ 2
}
## now we call `optimize`
oo <- optimize(obj_fun, interval = c(-37.62, 37.62), alpha = alpha, q = q, df = df)
## post processing
oo <- unlist(oo, use.names = FALSE) ## list to numerical vector
oo[2] <- sqrt(oo[2]) ## squared error to absolute error
## return
setNames(oo, c("ncp", "abs.error"))
}
Now you see:
ncp_solver(0.025, 5, 19)[[1]] ## use "[[" not "[" to drop name
#[1] 2.443316
ncp_solver(0.975, 5, 19)[[1]]
#[1] 7.476492
--------
Bug report and fix
It was reported to me that the above ncp_solver is unstable. For example:
ncp_solver(alpha = 0.025, q = 0, df = 98)
# ncp abs.error
#-8.880922 0.025000
But on the other hand, if we double check with uniroot here:
f <- function (ncp, alpha) pt(q = 0, df = 98, ncp = ncp, lower.tail = FALSE) - alpha
curve(f(ncp, 0.025), from = -3, to = 0, xname = "ncp"); abline(h = 0)
uniroot(f, c(-2, -1.5), 0.025)$root
#[1] -1.959961
So there is clearly something wrong with ncp_solver.
Well, it turns out that we cannot use too wide a bound such as c(-37.62, 37.62). If we narrow it to c(-35, 35), it works fine.
Also, to avoid a tolerance problem, we can change the objective function from squared error to absolute error:
ncp_solver <- function (alpha, q, df) {
## objective function: we minimize absolute approximation error
obj_fun <- function (ncp, alpha = alpha, q = q, df = df) {
abs(pt(q = q, df = df, ncp = ncp, lower.tail = FALSE) - alpha)
}
## now we call `optimize`
oo <- optimize(obj_fun, interval = c(-35, 35), alpha = alpha, q = q, df = df)
## post processing and return
oo <- unlist(oo, use.names = FALSE) ## list to numerical vector
setNames(oo, c("ncp", "abs.error"))
}
ncp_solver(alpha = 0.025, q = 0, df = 98)
# ncp abs.error
#-1.959980e+00 9.190327e-07
Damn, this is a pretty annoying bug. But relax now.
Report on getting warning messages from pt
I have also received reports of annoying warning messages from pt:
ncp_solver(0.025, -5, 19)
# ncp abs.error
#-7.476488e+00 5.760562e-07
#Warning message:
#In pt(q = q, df = df, ncp = ncp, lower.tail = FALSE) :
# full precision may not have been achieved in 'pnt{final}'
I am not too sure what is going on here, but in the meantime I have not observed any misleading results. Therefore, I decided to suppress those warnings from pt using suppressWarnings:
ncp_solver <- function (alpha, q, df) {
## objective function: we minimize absolute approximation error
obj_fun <- function (ncp, alpha = alpha, q = q, df = df) {
abs(suppressWarnings(pt(q = q, df = df, ncp = ncp, lower.tail = FALSE)) - alpha)
}
## now we call `optimize`
oo <- optimize(obj_fun, interval = c(-35, 35), alpha = alpha, q = q, df = df)
## post processing and return
oo <- unlist(oo, use.names = FALSE) ## list to numerical vector
setNames(oo, c("ncp", "abs.error"))
}
ncp_solver(0.025, -5, 19)
# ncp abs.error
#-7.476488e+00 5.760562e-07
OK, quiet now.
You could use two while loops like this:
i <- 0.001
lowerFound <- FALSE
while(!lowerFound){
x <- pt(q = 5, df = 19, ncp = i)
if (round(x, 3) == 0.025){
lowerFound <- TRUE
print(paste("Lower is", i))
lower <- i
} else {
i <- i + 0.0005
}
}
i <- 0.001
upperFound <- FALSE
while(!upperFound){
x <- pt(q = 5, df = 19, ncp = i)
if (round(x, 3) == 0.975){
upperFound <- TRUE
print(paste("Upper is ", i))
upper <- i
} else {
i <- i + 0.0005
}
}
c(Lower = lower, Upper = upper)
# Lower Upper
# 7.4655 2.4330
Of course, you can adapt the increment in i <- i + .... or change the check if (round(x,...) == ....) to fit this solution to your specific accuracy needs; see the sketch below.
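For instance, the same fixed-step idea can be wrapped in a small helper with a finer increment and an explicit tolerance instead of round() (find_ncp is a name I am introducing here for illustration, not part of the original answer):
find_ncp <- function(target, q = 5, df = 19, step = 1e-4, tol = 5e-4) {
  i <- 0.001
  ## walk the grid until pt() is within `tol` of the target probability
  while (abs(pt(q = q, df = df, ncp = i) - target) > tol) {
    i <- i + step
  }
  i
}
c(Lower = find_ncp(0.025), Upper = find_ncp(0.975))
This is still a brute-force scan (tens of thousands of pt() calls for the 0.025 case), so the optimize()- and uniroot()-based approaches above remain preferable.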
I know this is an old question, but there is now a one-line solution to this problem using the conf.limits.nct() function in the MBESS package.
install.packages("MBESS")
library(MBESS)
result <- conf.limits.nct(t.value = 5, df = 19)
result
$Lower.Limit
[1] 2.443332
$Prob.Less.Lower
[1] 0.025
$Upper.Limit
[1] 7.476475
$Prob.Greater.Upper
[1] 0.025
$Lower.Limit is the result where pt = 0.975
$Upper.Limit is the result where pt = 0.025
pt(q=5,df=19,ncp=result$Lower.Limit)
[1] 0.975
pt(q=5,df=19,ncp=result$Upper.Limit)
[1] 0.025
