"non-finite value supplied by optim" when using fitCopula - r

When I try to do an AIC test on different copulas, R keeps giving me this error message:
Error in optim(start, logL, lower = lower, upper = upper, method = optim.method, :
non-finite value supplied by optim
But in my code I never call the function optim directly, and some copulas give this warning instead:
Warning in fitCopula.ml(copula, u = data, method = method, start = start, : possible convergence problem: optim() gave code=52
The error produces an NA result, while the warning produces a number that seems to be on the right track.
Here is my code:
AIC.result <- function(EC, copulafunction){
  AIC <- matrix(nrow = length(colnames(EC)), ncol = length(colnames(EC)), byrow = T)
  for (i in 1:length(colnames(EC))) {
    for (j in 1:length(colnames(EC))) {
      if (i == j) {
        AIC[i, j] <- 1
      } else {
        u <- pobs(as.matrix(EC[, i]))
        v <- pobs(as.matrix(EC[, j]))
        fit <- fitCopula(copulafunction, cbind(u, v), method = "ml")
        AIC[i, j] <- AIC(fit)
      }
    }
  }
  mean((AIC - length(colnames(EC))) / 2)
}
EC contains the returns of different countries, and copulafunction is the type of copula to fit. The Clayton and rotated Clayton copulas give the error message, while the rest give the warning message. The weirdest thing is that EC, which contains 7 countries, worked smoothly, but when I applied the function to DC, which has 6 countries, the errors and warnings appeared. Does anyone know why?

First of all, if you only want the AIC for your model, I think the fitCopula function returns it to you by default. If not, the easy and direct way is to use the BiCopEst function from the R package VineCopula; it returns the AIC and BIC. The error message may be due to fitting a wrong copula to your data, which sometimes keeps the function from converging, hence the error or warning. So you should try another copula family. The best way to select the most appropriate copula for your data is to apply the BiCopSelect() function from the VineCopula package. It selects the best bivariate copula among a wide list of families based on AIC, so it works for your case. You can also set another selection criterion supplied in the function.
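For example, a minimal sketch of AIC-based family selection on one pair of columns (assuming EC holds the return series and i and j index two countries, as in the question; pobs() as used there):
library(VineCopula)
u <- pobs(EC[, i])  # pseudo-observations for country i
v <- pobs(EC[, j])  # pseudo-observations for country j
# Pick the best-fitting bivariate family by AIC over all available families
sel <- BiCopSelect(u, v, familyset = NA, selectioncrit = "AIC")
sel$familyname  # selected copula family
sel$AIC         # its AIC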

Related

evir::gev() optim non-finite finite-difference error with certain data type

I am trying to fit a distribution to my max scores to get a significance threshold, using evir::gev(). When I pull my values directly from the object I have, which contains the extreme values, the fitting method throws an error. If I instead pass the values as a vector that I have defined by hand, there is no error. From what I can tell, the data object is virtually the same in both runs, but it is clearly being handled differently (both are vectors of doubles).
This code works:
data <- c(5.401319, 6.580631, 6.120880, 5.686255, 6.640302, 6.990672, 5.797920, 6.902248, 5.694203, 6.853788)
print(data)
typeof(data)
fit <- evir::gev(data)
This does not:
data <- permuted_scans$max.statistics$LOD
print(data)
typeof(data)
fit <- evir::gev(data)
Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = data) : non-finite finite-difference value 1
I'm not sure exactly where the error was stemming from, but when I simulated more data points for my permuted scans, the error disappeared. I assume the gev() function was fitting poorly with the small sample size (only 10 values) and was producing negative numbers somewhere. I'm not sure why the error did not appear with the vector entered by hand.
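A rough sketch of that idea, using evir's own simulator (the original permuted_scans data is not reproducible here):
library(evir)
set.seed(1)
small <- rgev(10)      # only 10 block maxima: optim may hit non-finite values
large <- rgev(500)     # a larger sample is usually stable
try(evir::gev(small))  # may fail or converge poorly
evir::gev(large)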

MLE using nlminb in R - understand/debug certain errors

This is my first question here, so I will try to make it as well written as possible. Please bear with me should I make a silly mistake.
Briefly, I am trying to do a maximum likelihood estimation where I need to estimate 5 parameters. The general form of the problem I want to solve is as follows: A weighted average of three copulas, each with one parameter to be estimated, where the weights are nonnegative and sum to 1 and also need to be estimated.
There are packages in R for doing MLE on single copulas or on a weighted average of copulas with fixed weights. However, to the best of my knowledge, no packages exist to directly solve the problem I outlined above. Therefore I am trying to code the problem myself. There is one particular type of error I am having trouble tracing to its source. Below I have tried to give a minimal reproducible example where only one parameter needs to be estimated.
library(copula)
set.seed(150)
x <- rCopula(100, claytonCopula(250))
# Copula density
clayton_density <- function(x, theta){
  dCopula(x, claytonCopula(theta))
}
# Negative log-likelihood function
nll.clayton <- function(theta){
  theta_trans <- -1 + exp(theta)  # admissible theta values for Clayton copula
  nll <- -sum(log(clayton_density(x, theta_trans)))
  return(nll)
}
# Initial guess for optimization
guess <- function(x){
  init <- rep(NA, 1)
  tau.n <- cor(x[, 1], x[, 2], method = "kendall")
  # Guess using method of moments
  itau <- iTau(claytonCopula(), tau = tau.n)
  # In case itau is negative, we need a conditional statement
  # Use log because it is (almost) the inverse of the theta transformation above
  if (itau <= 0) {
    init[1] <- log(0.1)  # ensures a positive initial guess
  } else {
    init[1] <- log(itau)
  }
}
estimate <- nlminb(guess(x), nll.clayton)
(parameter <- -1 + exp(estimate$par))  # Retrieve estimated parameter
fitCopula(claytonCopula(), x)  # Compare with fitCopula function
This works great when simulating data with small values of the copula parameter, and gives almost exactly the same answer as fitCopula() every time.
For large values of the copula parameter, such as 250, the following error shows up when I run the line with nlminb():
Error in .local(u, copula, log, ...) : parameter is NA
Called from: .local(u, copula, log, ...)
Error during wrapup: unimplemented type (29) in 'eval'
When I run fitCopula(), the optimization finishes, but this message pops up:
Warning message:
In dlogcdtheta(copula, u) :
dlogcdtheta() returned NaN in column(s) 1 for this explicit copula; falling back to numeric derivative for those columns
I have been able to find out using debug() that somewhere in the optimization process of nlminb, the parameter of interest is assigned the value NaN, which then yields this error when dCopula() is called. However, I do not know at which iteration it happens, and what nlminb() is doing when it happens. I suspect that perhaps at some iteration, the objective function is evaluated at Inf/-Inf, but I do not know what nlminb() does next. Also, something similar seems to happen with fitCopula(), but the optimization is still carried out to the end, only with the abovementioned warning.
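A minimal sketch of that kind of tracing, so each evaluation (and the first non-finite one) becomes visible; nll.debug is a hypothetical wrapper around the functions defined above, not part of nlminb:
nll.debug <- function(theta){
  val <- nll.clayton(theta)
  cat("theta =", theta, " nll =", val, "\n")
  if (!is.finite(val)) browser()  # inspect the offending iterate interactively
  val
}
estimate <- nlminb(guess(x), nll.debug)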
I would really appreciate any help in understanding what is going on, how I might debug it myself and/or how I can deal with the problem. As might be evident from the question, I do not have a strong background in coding. Thank you so much in advance to anyone that takes the time to consider this problem.
Update:
When I run dCopula(x, claytonCopula(-1+exp(guess(x)))) or, equivalently, clayton_density(x, -1+exp(guess(x))), it becomes apparent that the density evaluates to 0 at several data points. Unfortunately, creating pseudo-observations with x <- pobs(x) does not solve the problem, which can be seen by repeating dCopula(x, claytonCopula(-1+exp(guess(x)))). The result is that when applying the logarithm, we get several -Inf evaluations, which of course implies that the whole negative log-likelihood function evaluates to Inf, as can be seen by running nll.clayton(guess(x)). Hence, in addition to the above queries, any tips on handling log(0) when doing MLE numerically are welcome and appreciated.
Second update:
Editing the second line in nll.clayton as follows seems to work okay:
nll <- -sum(log(clayton_density(x, theta_trans) + 1e-8))
However, I do not know if this is a "good" way to circumvent the problem, in the sense that it does not introduce potential for large errors (though it would surprise me if it did).
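A possibly cleaner alternative to adding an epsilon is to have dCopula() return the log-density directly (log = TRUE) and replace any non-finite terms with a large finite penalty, so the optimizer never sees Inf. A sketch under those assumptions (nll.clayton2 is a hypothetical variant of the function above):
nll.clayton2 <- function(theta){
  theta_trans <- -1 + exp(theta)
  ll <- dCopula(x, claytonCopula(theta_trans), log = TRUE)
  ll[!is.finite(ll)] <- -1e10  # penalize instead of propagating -Inf
  -sum(ll)
}
estimate2 <- nlminb(guess(x), nll.clayton2)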

R: Using fitdistrplus to fit curve over histogram of discrete data

So I have this discrete set of data, my_dat, that I am trying to fit a curve over so I can generate random variables based on my_dat. I had great success using fitdistrplus on continuous data, but I get many errors when attempting to use it for discrete data.
Data setup:
library(fitdistrplus)
my_dat <- c(2,5,3,3,3,1,1,2,4,6,
            3,2,2,8,3,4,3,3,4,4,
            2,1,5,3,1,2,2,4,3,4,
            2,4,1,6,2,3,2,1,2,4,
            5,1,2,3,2)
I take a look at the histogram of the data first:
hist(my_dat)
Since the data is discrete, I decided to try fitting a binomial distribution or a negative binomial distribution, and this is where I run into trouble. Here I try to fit each:
fitNB3 <- fitdist(my_dat, discrete = T, distr = "nbinom" ) #NaNs Produced
fitB3 <- fitdist(my_dat, discrete = T, distr = "binom")
I receive two errors:
fitNB3 seems to run but notes "NaNs produced" - can anyone let me know why this is the case?
fitB3 doesn't run at all and gives me the error: "Error in start.arg.default(data10, distr = distname) : Unknown starting values for distribution binom." - can anyone point out why this won't work here? I am unclear about providing starting values given that the data is discrete. (I attempted start = 1 in the fitdist call, but I received another error: "Error in fitdist(my_dat, discrete = T, distr = "binom", start = 1) : the function mle failed to estimate the parameters, with the error code 100".)
I've been spinning my wheels for a while on this, but I would appreciate any feedback regarding these errors.
Don't use hist on discrete data, because it doesn't do what you think it's doing.
Compare plot(table(my_dat)) with hist(my_dat)... and then ponder how many wrong impressions you've gotten doing this before. If you must use hist, make sure you specify the breaks; don't rely on defaults designed for continuous variables.
hist(my_dat)
lines(table(my_dat),col=4,lwd=6,lend=1)
Neither of your models is likely to be suitable, as both of these distributions start from 0, not 1, and with values of the size you have, p(0) will not be ignorably small.
I don't get any errors fitting the negative binomial when I run your code.
The issue you had with fitting the binomial is that you need to supply starting values for the parameters, which are called size (n) and prob (p), so you'd need to say something like:
fitdist(my_dat, distr = "binom", start=list(size=15, prob=0.2))
However, you will then get a new problem: the optimizer assumes that the parameters are continuous and will fail on size.
On the other hand, this is probably a good thing, because with unknown n the MLE is not well behaved, particularly when p is small.
Typically, with the binomial it would be expected that you know n. In that case, estimation of p could be done as follows:
fitdist(my_dat, distr = "binom", fix.arg=list(size=20), start=list(prob=0.15))
However, with fixed n, maximum likelihood estimation is straightforward in any case -- you don't need an optimizer for that.
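For instance, with n trials known, the maximum likelihood estimate of prob has the closed form mean(x)/n, so a sketch (with size = 20 assumed, as above) is just:
n <- 20                    # assumed known number of trials
p_hat <- mean(my_dat) / n  # closed-form MLE of prob, no optimizer needed
p_hat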
If you really don't know n, there are a number of better-behaved estimators than the MLE to be found, but that's outside the scope of this question.

Forward procedure with BIC

I'm trying to select variables for a linear model with a forward stepwise algorithm and the BIC criterion. As the help file indicates and as I have always done it, I wrote the following:
model.forward <- lm(y ~ 1, data = donnees)
model.forward.BIC <- step(model.forward, direction = "forward", k = log(n),
                          scope = list(lower = ~1, upper = ~x1 + x2 + x3), data = donnees)
with k=log(n) indicating I'm using BIC. But R returns:
Error in extractAIC.lm(fit, scale, k = k, ...) : object 'n' not found
I never really asked myself the question before, but I think that n is supposed to be defined inside step() (it's the number of observations used at each iteration)... Anyway, the issue never happened to me before! Restarting R doesn't change anything, and I admit I have no idea what can cause this error.
Here is some code to test:
y <- runif(20, 0, 10)
x1 <- runif(20, 0, 1)
x2 <- y + runif(20, 0, 5)
x3 <- runif(20, 0, 1) - runif(20, 0, 1) * y
donnees <- data.frame(x1, x2, x3, y)
Any ideas?
step(model.forward, direction = "forward",
     k = log(nrow(donnees)), scope = list(lower = ~1, upper = ~x1 + x2 + x3),
     data = donnees)
or more generally ...
... k=log(nobs(model.forward)) ...
(for example, if there are NA values in your data, then nobs(model.forward) will be different from nrow(donnees). On the other hand, if you have NA values in your predictors, you're going to run into trouble when doing model selection anyway.)
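Putting it together with the test data above, a minimal sketch:
model.forward <- lm(y ~ 1, data = donnees)
model.forward.BIC <- step(model.forward, direction = "forward",
                          k = log(nobs(model.forward)),  # robust to rows dropped for NAs
                          scope = list(lower = ~1, upper = ~x1 + x2 + x3))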

Estimating Weibull density parameters (error: "...initial value in 'vmmin' is not finite")

I am trying to estimate the shape and scale parameters of a data set.
I tried two different ways, and for both I got an error message.
First, I tried survreg() from the survival package:
survreg(Surv(all.ws)~1, dist="weibull")
I got the error message:
invalid survival times for this distribution
Second, I tried the fitdistr() function:
fitdistr(all.ws, densfun=dweibull, start=list(scale=1, shape=2))
I got an error message:
Error in optim(x=c(2.2, 2.1,1.9....:
initial value in 'vmmin' is not finite
What is wrong with what I am doing?
A Google search for "fitdistr Weibull Error" shows this exact question was discussed a year ago on the R-help mailing list: http://r.789695.n4.nabble.com/Problems-with-fitdistr-td1334772.html
Some points from that link:
zeros in your data will cause problems (see the sketch below)
use the pelwei() function from package lmom
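On the zeros point, a minimal sketch (all.ws as in the question): the Weibull density has support x > 0, so log-likelihood terms at zero are -Inf, and dropping non-positive values before fitting avoids the non-finite initial value:
library(MASS)
ws <- all.ws[all.ws > 0]           # Weibull support is x > 0
fitdistr(ws, densfun = "weibull")  # the string form picks its own starting values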
I had a similar problem when using fitdistr() with a beta distribution. In that case, both ones and zeros in the data produced this error.
Additionally, I found that when the lower argument is used in the fitdistr() call, a different error is produced:
e.g. (where x is a vector of samples containing a 1.0 or a 0):
fitdistr(x, "beta", list(shape1 = 1, shape2 = 0.2),lower=0.001)
`Error in stats::optim: L-BFGS-B needs finite values of 'fn'`
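A common workaround in that situation is to squeeze the data off the {0, 1} boundary before fitting; the (x*(N-1)+0.5)/N adjustment below is the Smithson and Verkuilen transform, not something fitdistr provides itself. A sketch:
library(MASS)
N <- length(x)
x_adj <- (x * (N - 1) + 0.5) / N  # maps exact 0s and 1s into the open interval (0, 1)
fitdistr(x_adj, "beta", start = list(shape1 = 1, shape2 = 0.2))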
