MLE - Optimization with constraints as non-linear functions of the variables - r

I am stuck on the following optimization problem. In particular, I would like to add the constraint (x - location)/scale > 0 to the MLE problem below. Without this constraint the log-likelihood is Inf, and the L-BFGS-B optimization fails with the error shown further down.
library(PearsonDS)
x <- rpearsonIII(n = 1000, shape = 5, location = 6, scale = 7)

dpearson3 <- function(x, shape, location, scale, log = FALSE) {
  gscale <- abs(scale)
  ssgn <- sign(scale)
  density <- dgamma(ssgn * (x - location), shape = shape, scale = gscale, log = log)
  return(density)
}

LL3 <- function(theta, x, display) {
  shape <- as.numeric(theta[1])
  location <- as.numeric(theta[2])
  scale <- as.numeric(theta[3])
  tmp <- -sum(log(dpearson3(x, shape, location, scale, log = FALSE)))
  if (is.na(tmp)) +Inf else tmp
  if (display == 1) { print(c(tmp, theta)) }
  return(sum(tmp))
}

control.list <- list(maxit = 100000, factr = 1e-12, fnscale = 1)
fit <- optim(par = param,
             fn = LL3,
             hessian = TRUE,
             method = "L-BFGS-B",
             lower = c(0, -Inf, -Inf),
             upper = c(Inf, Inf, Inf),
             control = control.list,
             x = x, display = 1)
Assume that I start the search from param <- c(100, 1000, 10); I then get the following error:
Error in optim(par = param, fn = LL3, hessian = TRUE, method = "L-BFGS-B", :
L-BFGS-B needs finite values of 'fn'
How can I solve this issue?

Changing the MLE function to
LL3 <- function(theta, x, display) {
  shape <- as.numeric(theta[1])
  location <- as.numeric(theta[2])
  scale <- as.numeric(theta[3])
  tmp <- -sum(log(dpearson3(x, shape, location, scale, log = FALSE)))
  if (min((x - location) / scale) < 0) tmp <- 1e11  # I added this line
  if (is.na(tmp)) +Inf else tmp
  if (display == 1) { print(c(tmp, theta)) }
  return(tmp)
}
is the best workaround I could find. This way I avoid the Inf problem. Is there a better answer?
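A slightly cleaner variant of the same penalty idea (just a sketch, not a definitive answer): ask dpearson3 for the log-density directly and map any non-finite objective value to a large finite penalty, so L-BFGS-B never sees Inf no matter which constraint is violated:
LL3 <- function(theta, x, display = 0) {
  shape <- as.numeric(theta[1])
  location <- as.numeric(theta[2])
  scale <- as.numeric(theta[3])
  # work on the log scale; points outside the support give -Inf here
  tmp <- -sum(dpearson3(x, shape, location, scale, log = TRUE))
  if (!is.finite(tmp)) tmp <- 1e10  # finite penalty whenever (x - location)/scale <= 0
  if (display == 1) print(c(tmp, theta))
  tmp
}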

Related

How can I reproduce the dist function

I simulated a data frame of points, with x and y values, for various calculations. The dist function works well for calculating the distances between every possible combination of rows. I've been trying to reproduce a simplified version that does only that (computing the Euclidean distance matrix of a data frame), but it hasn't been working so far.
If I were passing in the two columns separately, I would do something like this, but I'm trying to use just one input, the data frame:
dist <- function(x, y) {
  distance <- sqrt(sum((x - y)^2))
  return(distance)
}
I've tried using the source code for dist, but I can't figure out how to strip away all the parts I don't want without breaking it.
function (x, method = "euclidean", diag = FALSE, upper = FALSE,
p = 2)
{
if (!is.na(pmatch(method, "euclidian")))
method <- "euclidean"
METHODS <- c("euclidean", "maximum", "manhattan", "canberra",
"binary", "minkowski")
method <- pmatch(method, METHODS)
if (is.na(method))
stop("invalid distance method")
if (method == -1)
stop("ambiguous distance method")
x <- as.matrix(x)
N <- nrow(x)
attrs <- if (method == 6L)
list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag,
Upper = upper, method = METHODS[method], p = p,
call = match.call(), class = "dist")
else list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag,
Upper = upper, method = METHODS[method], call = match.call(),
class = "dist")
.Call(C_Cdist, x, method, attrs, p)
}
Is anyone able to point me to a viable first step? I'm really trying to learn how to program without always relying on pre-packaged functions.
You could use outer:
df <- data.frame(x = rnorm(100), y = rnorm(100))
outer(df$x, df$y, function(x, y) sqrt((x - y)^2))
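If you want the full pairwise matrix over the rows of the data frame (the matrix that dist itself returns), the same outer idea can be applied to row indices. This is only a sketch that assumes all columns of df are numeric, and euclid_mat is just an illustrative name:
euclid_mat <- function(df) {
  m <- as.matrix(df)   # rows are the points
  n <- nrow(m)
  outer(seq_len(n), seq_len(n),
        Vectorize(function(i, j) sqrt(sum((m[i, ] - m[j, ])^2))))
}
# Check against the built-in:
# all.equal(euclid_mat(df), as.matrix(dist(df)), check.attributes = FALSE)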

R function constrOptim can't return hessian matrix

When I try to return the Hessian matrix while using the "BFGS" method, an error comes up. The code and the error are below.
square <- function(par, y) {
  return(-sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE)))
}
ui <- c(1, -1)
ci <- c(0)
d.y <- rnorm(1000, 10, 6)
res <- constrOptim(theta = c(15, 5), square, grad = NULL, ui = ui, ci = ci,
                   method = "BFGS", hessian = TRUE, y = d.y)
The error:
Error in colSums(ui * gi.old/gi - ui) (constrOPtim.R#18): 'x' must be an array of at least two dimensions
I don't know whether the "BFGS" method needs additional inputs. How can the program correctly return the Hessian matrix while using the "BFGS" method?
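One thing that may be worth trying (a sketch only, not verified on every R version): constrOptim expects ui to be a constraint matrix with one row per constraint, and the "BFGS" path differentiates a log-barrier term, so it also needs an explicit gradient; with both supplied, hessian = TRUE is forwarded to optim. The gradient below is the standard one for the normal negative log-likelihood:
square <- function(par, y) {
  -sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE))
}
square.grad <- function(par, y) {
  m <- par[1]; s <- par[2]; n <- length(y)
  c(-sum(y - m) / s^2,              # d(-logL)/d mean
    n / s - sum((y - m)^2) / s^3)   # d(-logL)/d sd
}
ui <- matrix(c(1, -1), nrow = 1)    # one row per constraint: mean - sd >= 0
ci <- 0
d.y <- rnorm(1000, 10, 6)
res <- constrOptim(theta = c(15, 5), f = square, grad = square.grad,
                   ui = ui, ci = ci, method = "BFGS", hessian = TRUE, y = d.y)
res$hessian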

multinomial MLE error in R

I am new to R and am trying to do MLE using mle2 in the bbmle package.
R Code:
rm(list = ls())
library(bbmle)
N <- 100
testmat=rmultinom(N, size=3, prob = c(0.1,0.2,0.8))
LL<- function(s, p){-sum(dmultinom(x=testmat, size = s, prob=p, log = TRUE))}
values.start <- list(3, c(0.1,0.2,0.7))
names(values.start) <- parnames(LL) <- paste0("b",0:1)
mle2(LL, start =values.start)
I keep getting this error
"Error in mle2(LL, start = values.start) :
some named arguments in 'start' are not arguments to the specified log-likelihood function"
I am using mle2; I thought that was not needed here. At first I was using mle:
N <- 100
testmat <- t(rmultinom(3, size = 3, prob = c(0.1, 0.2, 0.8)))
LL <- function(s, p1, p2, p3) {
  prob <- unlist(as.list(environment()))[2:4]
  -sum(dmultinom(x = testmat, size = s, prob = prob, log = TRUE))
}
values.start <- list(s = 3, p1 = 0.1, p2 = 0.2, p3 = 7)
mle(LL, start = values.start)
which gave this error:
"Error in dmultinom(x = testmat, size = s, prob = prob, log = TRUE) :
x[] and prob[] must be equal length vectors."
I even edited it as follows
N <- 100
testmat <- t(rmultinom(3, size = 3, prob = c(0.1, 0.2, 0.8)))
LL <- function(s = 3, p1 = 0.1, p2 = 0.2, p3 = 0.7) {
  prob <- unlist(as.list(environment()))[2:4]
  s <- unlist(as.list(environment()))[1]
  -sum(dmultinom(x = testmat, size = s, prob = prob, log = TRUE))
}
mle(LL)
The error still persisted. I was eventually able to work out the errors, thanks a lot:
library(bbmle)
N <- 1000
X <- rmultinom(N, size = 3, prob = rep(1/3, 3))
LL <- function(p_1 = 0.1, p_2 = 0.1, p_3 = 0.8) {
  p <- unlist(as.list(environment()))
  -sum(apply(X, MAR = 2, dmultinom, size = NULL, prob = c(p_1, p_2, p_3), log = TRUE))
}
mle(LL, method = "L-BFGS-B", lower = c(-Inf, 0), upper = c(Inf, Inf))
In my current problem I have 5k features, so I would need to write something like
function(p_1 = 0.1, p_2 = 0.1, p_3 = 0.8, ..., p_5000 = ..)
which is not possible. Is there any way around it?
I was able to do it with mle2, this way:
rm(list = ls())
library(bbmle)
N <- 1000
s <- 100
X <- rmultinom(N, size = s, prob = rep(1/s, s))
LL <- function(params) {
  p <- unlist(as.list(environment()))
  minusll <- -sum(apply(X, MAR = 2, dmultinom, size = NULL, prob = p, log = TRUE))
  return(minusll)
}
values.start <- vector(mode = "list", length = s)
values.start <- c(0.02, 0.01 * rep(98/99, 99))
names(values.start) <- parnames(LL) <- paste0("b", 1:s)
mle2(LL, start = values.start, vecpar = TRUE, method = "L-BFGS-B",
     lower = c(rep(0, s)), upper = c(rep(1, s)))
Above, I was doing multinomial MLE parameter estimation for a dimension of 100 with 1000 samples, and I was able to solve the problem of vector parameters. Now I am getting this error:
Error in optim(par = c(0.02, 0.0098989898989899, 0.0098989898989899, 0.0098989898989899, :
L-BFGS-B needs finite values of 'fn'
I found out that this error is due to 'fn = Inf', probably because one of the probabilities becomes zero, so that fn = -log(0) = Inf. Is there any way to solve this problem?
Thanks for the help.
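One way to keep fn finite (a sketch built on the vecpar code above, not a complete fix): keep the box constraints strictly inside (0, 1) and fall back to a large finite penalty whenever the log-likelihood is non-finite, so L-BFGS-B never evaluates Inf:
LL <- function(params) {
  p <- unlist(as.list(environment()))
  minusll <- -sum(apply(X, MARGIN = 2, dmultinom, size = NULL, prob = p, log = TRUE))
  if (!is.finite(minusll)) minusll <- 1e10  # large finite penalty instead of Inf
  return(minusll)
}
names(values.start) <- parnames(LL) <- paste0("b", 1:s)  # re-attach names to the new LL
eps <- 1e-8  # keep the bounds strictly inside (0, 1)
mle2(LL, start = values.start, vecpar = TRUE, method = "L-BFGS-B",
     lower = rep(eps, s), upper = rep(1 - eps, s))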

Finite mixture of tweedie

I'm trying to estimate a finite mixture of tweedie (or compound Poisson-gamma) distributions. I have scoured every resource I can think of without finding anything on how to do this.
I am currently trying to use the flexmix package in R writing a different M-step driver, as outlined in the flexmix vignette on pages 12-14. Here is my code, which relies on the cplm package:
tweedieClust <- function(formula = . ~ ., offset = NULL) {
  require(tweedie)
  require(cplm)
  require(plyr)
  require(dplyr)
  retval <- new("FLXMC", weighted = TRUE, formula = formula, dist = "tweedie",
                name = "Compound Poisson Clustering")
  retval@defineComponent <- expression({
    predict <- function(x, ...) {
      pr <- mu
    }
    logLik <- function(x, y, ...) {
      dtweedie(y, xi = p, mu = mu, phi = phi) %>%
        log
    }
    new("FLXcomponent",
        parameters = list(coef = coef),
        logLik = logLik, predict = predict,
        df = df)
  })
  retval@fit <- function(x, y, w, component) {
    fit <- cpglm(formula = y ~ x, link = "log", weights = w, offset = offset)
    with(list(coef = coef(fit), df = ncol(x), mu = fit$fitted.values,
              p = fit$p, phi = fit$phi),
         eval(retval@defineComponent))
  }
  retval
}
However, this results in the following error:
Error in dtweedie(y, xi = p, mu = mu, phi = phi) :
binary operation on non-conformable arrays
Has anyone done or seen a finite mixture of tweedie distributions? Can you point me in the right direction to accomplish this, using flexmix or otherwise?
The problem is somewhere in the weights part; if you remove it, it works:
tweedieClust <- function(formula = . ~ ., offset = NULL) {
  require(tweedie)
  require(statmod)
  require(cplm)
  require(plyr)
  require(dplyr)
  retval <- new("FLXMC", weighted = FALSE, formula = formula, dist = "tweedie",
                name = "Compound Poisson Clustering")
  retval@defineComponent <- expression({
    predict <- function(x, ...) {
      pr <- mu
    }
    logLik <- function(x, y, ...) {
      dtweedie(y, xi = p, mu = mu, phi = phi) %>%
        log
    }
    new("FLXcomponent",
        parameters = list(mu = mu, xi = p, phi = phi),
        logLik = logLik, predict = predict,
        df = df)
  })
  retval@fit <- function(x, y, w, component) {
    fit <- cpglm(formula = End ~ ., data = dmft, link = "log")
    with(list(df = ncol(x), mu = fit$fitted.values,
              p = fit$p, phi = fit$phi),
         eval(retval@defineComponent))
  }
  retval
}
example:
library(flexmix)
data("dmft", package = "flexmix")
m1 <- flexmix(End ~ .,data=dmft, k = 4, model = tweedieClust())

Fixing an error in a function that uses `replicate()` in R?

I have an R function called RR. I'm wondering how to fix the following error:
Error in rbinom(1, size = n, prob = p) :
promise already under evaluation: recursive default argument reference or
earlier problems?
RR <- function(n, p, n.sim) {
  fun <- function(n = n, p = p) {
    x <- rbinom(1, size = n, prob = p)
    res <- binom.test(x, n, p)[[4]]
    c(Lower = res[1], Upper = res[2])
  }
  sim <- t(replicate(n.sim, fun()))
  mean(sim[, 1] <= p & p <= sim[, 2])
}
# Example of use:
RR(n = 15, p = .5, n.sim = 5)
R throws this error message when you define defaults for a function that have the same names as the function's own parameters and then call that function from within another function that uses the same parameter names. So function(x = x) is generally not a good idea. If you just change fun to
fun <- function(n2 = n, p2 = p), your code runs without issues.
I do not completely understand myself why this happens, but it is easy to avoid.
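For completeness, here is the question's function with the inner defaults renamed as suggested (a sketch; the behaviour is otherwise unchanged). The error text itself points at the cause: a default like n = n is a self-referential promise, hence the "recursive default argument reference":
RR <- function(n, p, n.sim) {
  fun <- function(n2 = n, p2 = p) {
    x <- rbinom(1, size = n2, prob = p2)
    res <- binom.test(x, n2, p2)[[4]]  # the confidence interval
    c(Lower = res[1], Upper = res[2])
  }
  sim <- t(replicate(n.sim, fun()))
  mean(sim[, 1] <= p & p <= sim[, 2])  # empirical coverage
}
RR(n = 15, p = .5, n.sim = 5)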
