I have a data frame with a column that is a mix of positive and negative numbers, and its first entry is NA. I'm trying to run the shape function as
shape(data$col, models = 30, start = 30, end = 400, ci=.90,reverse = TRUE,auto.scale = TRUE)
where the data in 'col' is [NA, -0.2663194135, -3.7665034719, -0.2072122334, 1.5721742718, -9.142419, -8.954330, -5.167314, 11.805930, 9.533830, 7.065835]
but I get an error that says
Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
non-finite value supplied by optim
Can someone help me figure out what it means? I've googled it but haven't found anything concrete.
It's not clear what you are trying to do here. Calling shape allows you to see how altering the threshold or nextremes parameters in the gpd function will alter the xi parameter of the resulting generalised Pareto distribution model.
There are a few reasons why the example you supplied doesn't work. Let's first of all show an example of what does work. The exponential distribution is a special case of a GPD with mu = 0 and xi = 0, so a sample drawn from the exponential distribution should do the trick:
library(evir) # For the shape() function
set.seed(69) # Makes this example reproducible
x <- rexp(300) # Random sample of 300 elements drawn from exponential distribution
shape(x)
Fine.
However, your sample contains an NA. What happens if we make a single value NA in our sample?
x[1] <- NA
shape(x)
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite value supplied by optim
So, no NAs allowed.
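For completeness, removing the NA is a one-liner. A minimal sketch, reusing the x from above (na.omit() leaves 299 values, which is plenty):
shape(na.omit(x))  # runs without error once the NA is dropped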
Unfortunately, you will find that you still get the same error even after removing the NA from your own data. There are two reasons for this. Firstly, you have only 10 non-NA values. What happens if we try a similarly small exponential sample?
shape(rexp(9))
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite finite-difference value [1]
We will find that the model will fail to fit with fewer than about 16 data points.
But that's not the only problem. What if we try to get a plot for data that wasn't drawn from anything like a generalised Pareto distribution?
# Maybe a uniform distribution?
shape(runif(300, 1, 10))
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite finite-difference value [1]
#> In addition: Warning message:
#> In sqrt(diag(varcov)) : NaNs produced
So in effect, you need a bigger sample with no NAs, and it needs to conform approximately to a GPD, otherwise the gpd function will throw an error.
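Once those conditions are met, you can also fit a single GPD directly with gpd() and inspect the estimated parameters for a chosen number of exceedances. A sketch, reusing the x from above with its NA removed (the choice of 50 exceedances is arbitrary here):
fit <- gpd(na.omit(x), nextremes = 50)  # fit a GPD to the 50 largest values
fit$par.ests                            # estimated xi and beta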
I might be able to help if you let us know the bigger picture of what you are trying to do.
Related
I need to estimate the parameters of a Fréchet distribution. I am using the R packages fitdistrplus and evd, but I don't know what values to use to initialize the parameters.
library(fitdistrplus)
library(evd)
# Data
x <- c(19.1, 20.2, 14.3, 19.0, 18.8, 18.5, 20.0, 18.6, 11.4, 15.6, 17.4, 16.2,
       15.7, 14.3, 14.9, 14.0, 20.2, 17.4, 18.6, 17.0, 16.0, 12.2, 10.8, 12.4,
       10.2, 19.8, 23.4)
fit.frechet <- fitdist(x, "frechet")
which generates the following error:
Error in computing default starting values.
Error in manageparam(start.arg = start, fix.arg = fix.arg, obs = data, :
Error in start.arg.default(obs, distname) :
Unknown starting values for distribution frechet.
When I supply starting values for the parameters:
fit.frechet2<-fitdist(x,"frechet", start = list(loc=0,scale=1, shape=1))
Output:
Warning messages:
1: In fitdist(x, "frechet", start = list(loc = 0, scale = 1, shape = 1)) :
The dfrechet function should return a vector of with NaN values when input has inconsistent parameters and not raise an error
2: In fitdist(x, "frechet", start = list(loc = 0, scale = 1, shape = 1)) :
The pfrechet function should return a vector of with NaN values when input has inconsistent parameters and not raise an error
3: In sqrt(diag(varcovar)) : NaNs produced
4: In sqrt(1/diag(V)) : NaNs produced
5: In cov2cor(varcovar) :
diag(.) had 0 or NA entries; non-finite result is doubtful
Fitting of the distribution ' frechet ' by maximum likelihood
Parameters:
estimate Std. Error
loc -12128345 40.10705
scale 12128360 40.10705
shape 3493998 NaN
How can I estimate the parameters of the Fréchet distribution in R?
Well, you could try limiting your parameter values and starting with some reasonable estimates.
For example,
fit.frechet<-fitdist(x, "frechet", method = "mle", lower = c(0, 0, 0), start = list(loc=1,scale=12, shape=4))
will produce a couple of expected warnings, and
print(fit.frechet)
will print somewhat reasonable values
loc 2.146861e-07
scale 1.449643e+01
shape 4.533351e+00
along with a plot of the fit versus the empirical distribution:
plot(fit.frechet,demp=TRUE)
UPDATE
I would say that the Fréchet distribution might not be a good fit for your data. I tried a Weibull and it looks a lot better; check it yourself:
fit.weibull<-fitdist(x, "weibull", method = "mle", lower = c(0, 0))
print(fit.weibull)
plot(fit.weibull, demp=TRUE)
Output is
shape 5.865337
scale 17.837188
Note that the scale parameter is similar in both fits and could have been guessed just from the histogram. Given the data, the plot of the Weibull fit looks quite good.
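If you want something more formal than eyeballing the plots, fitdistrplus also provides gofstat(), which reports AIC, BIC, and goodness-of-fit statistics for several fitted models at once. A sketch, assuming the fit.frechet and fit.weibull objects from above:
gofstat(list(fit.frechet, fit.weibull), fitnames = c("frechet", "weibull"))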
I am trying to fit a spatial panel model in R using the spml function from the splm package. I first define the NxN weight matrix as follows:
library(spdep)  # for dnearneigh(), nbdists(), nb2listw(); coordinates() is from sp
# Neighbours: all observations within 0-50 km (great-circle distances)
neib <- dnearneigh(coordinates(coord), 0, 50, longlat = TRUE)
# Inverse-distance weights for each neighbour pair
dlist <- nbdists(neib, coordinates(coord))
idlist <- lapply(dlist, function(x) 1/x)
# Row-standardised (style = "W"); zero.policy = TRUE permits empty neighbour sets
w50 <- nb2listw(neib, zero.policy = TRUE, glist = idlist, style = "W")
Thus two observations are defined as neighbours if they lie at most 50 km apart. The weight attached to each pair of neighbours is the inverse of their distance, so closer neighbours receive higher weights. I also use the option zero.policy = TRUE so that observations with no neighbours are associated with a vector of zero weights.
Once I do this, I try to fit the spatial panel model in the following way:
mod <- spml(y ~ x, data = data_p, listw = w50, na.action = na.fail, lag = FALSE, spatial.error = "b", model = "within", effect = "twoways", zero.policy = TRUE)
but I get the following error and warning messages
Error in lag.listw(listw, u) : Variable contains non-finite values
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
...
50: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
I believe this is related to the observations without neighbours. Can anyone please help me with this? Is there any way to deal with neighbourless observations besides the zero.policy option?
Many many thanks for helping me.
You should check two things:
1) Make sure that the weight matrix is row-normalised.
2) Handle any NA values properly, both in the dataset and in the W matrix.
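A minimal sketch of both checks, assuming the neib, w50, and data_p objects from the question (the variable names y and x are taken from your model formula):
library(spdep)  # for card()
# Observations with no neighbours have all-zero weight rows, which is what
# feeds non-finite values into lag.listw(); find them:
which(card(neib) == 0)
# With style = "W", nb2listw() row-normalises: rows with neighbours should sum to 1
summary(sapply(w50$weights, sum))
# Look for NAs in the variables entering the model
sum(is.na(data_p$y)); sum(is.na(data_p$x))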
I'm trying to find the MLE of a distribution whose pdf is specified as mixture in the code. I've provided the code below, which gives the error:
"Error in optim(start, f, method = method, hessian = TRUE, ...) :
L-BFGS-B needs finite values of 'fn'"
"claims" is the dataset im using. I tried the same code with just the first two values of "claims" and encountered the same problem, so for a reproducible example the first two values are 1536.77007 and 1946.92409.
The limits on the parameters of the distribution are 0 < p < 1, a > 0, and b > 0, hence the lower and upper bounds in the mle() call. Any help is much appreciated.
# Density of a mixture of two exponential distributions
mixture <- function(x, p, a, b) {
  p * a * exp(-a * x) + (1 - p) * b * exp(-b * x)
}
# Find the MLE of the mixture distribution
library(stats4)  # for mle()
LL <- function(p, a, b) {
  X <- mixture(claims, p, a, b)
  -sum(log(X))  # negative log-likelihood
}
mle(LL, start = list(p = 0.5, a = 1/100, b = 1/100),
    method = "L-BFGS-B", lower = c(0, 0, 0), upper = c(1, Inf, Inf))
edit: Not really sure why dput(), but anyway,
#first two values of claims put into dput() (the actual values are above)
dput(claims[1:2])
c(307522.103, 195633.5205)
I want to estimate the coefficients for an AR process based on weekly data where the lags occur at t-1, t-52, and t-53. I will naturally lose a year of data to do this.
So far I have tried:
# fixed: 53 AR coefficients plus the intercept (54 entries);
# NA = estimate freely, 0 = constrain to zero
lags <- rep(0, 54)
lags[c(1, 52, 53)] <- NA  # free the AR terms at lags 1, 52, and 53
testResults <- arima(data, order = c(53, 0, 0), fixed = lags)
Basically, I tried using an ARIMA and shutting off the MA and differencing parts. I used 0s for the terms I wanted to exclude (plus the intercept) and NAs for the terms I wanted to estimate.
I get the following error:
Error in optim(init[mask], armafn, method = optim.method, hessian =TRUE, :
non-finite finite-difference value [1]
In addition: Warning message:
In arima(data, order = c(53, 0, 0), fixed = lags) :
some AR parameters were fixed: setting transform.pars = FALSE
I'm hoping there is an easier method or potential solution to this error. I want to avoid creating columns with the lagged variables and simply running a regression. Thanks!
I'm trying to find the parameters of a rank-ordered logit model, but optim always fails with a "non-finite finite-difference value" error. If I change b0 <- rep(0,5) to b0 <- rep(-1,5), the index after "non-finite finite-difference value" changes from 2 to 1.
If you need the dataset, I will send it to you by email.
# Conditional log-likelihood of the rank-ordered logit model
cjll <- function(b) {
  U <- X %*% b                          # systematic utilities
  lSU <- csm %*% exp(U)                 # per-choice-set sums of exp(utility)
  lSU <- (lSU != 0) * lSU + (lSU == 0)  # replace zero sums with 1 so log() is finite
  LL <- sum(Ccsm %*% U - log(lSU))
  return(LL)
}
b0 <- rep(0,5)
res <- optim(b0,cjll,method="BFGS",hessian=TRUE,control=list(fnscale=-1))
#Error in optim(b0, cjll, method = "BFGS", hessian = TRUE, control = list(fnscale = -1)) :
# non-finite finite-difference value [2]
b <- res$par
#Error: object 'res' not found
BFGS requires the gradient of the function being minimised. If you don't pass one, it will try to use finite differences to estimate it. Looking at your likelihood function, it could be that the way you "split" it into elements equal to zero and not equal to zero creates a discontinuity that prevents the numerical gradient from being formed properly. Try using method = "Nelder-Mead" and setting hessian = FALSE and see if that works. If it does, you can then use the numDeriv package to estimate the gradient and Hessian at the point of convergence if you need them.
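A sketch of that two-step approach, assuming the cjll function and b0 from the question:
# Step 1: derivative-free optimisation (fnscale = -1 turns minimisation into maximisation)
res <- optim(b0, cjll, method = "Nelder-Mead", control = list(fnscale = -1))
# Step 2: numerical gradient and Hessian at the optimum
library(numDeriv)
g <- grad(cjll, res$par)     # should be close to zero at the maximum
H <- hessian(cjll, res$par)  # standard errors: sqrt(diag(solve(-H)))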