mle failed to estimate the parameters with the error code 7 - r

I'm trying to estimate the Weibull-Gamma Distribution parameters, but I'm encountering the following error:
"the function mle failed to estimate the parameters, with the error
code 7"
What do I do?
The Weibull-Gamma Distribution
Density Function
dWeibullGamma <- function(x, alpha, beta, lambda)
{
((alpha*beta)/(lambda))*(x^(alpha-1))*(1+(1/lambda)*x^(alpha))^(-(beta+1))
}
Cumulative Distribution Function
pWeibullGamma <- function(x, alpha, beta, lambda)
{
1-(1+(1/lambda)*x^(alpha))^(-(beta))
}
Hazard Function
hWeibullGamma <- function(x, alpha, beta, lambda)
{
((alpha*beta)/(lambda))*(x^(alpha-1))*(1+(1/lambda)*x^(alpha))^(-(beta+1))/(1+(1/lambda)*x^(alpha))^(-(beta))
}
Survival Function
sWeibullGamma <- function(x,alpha,beta,lambda)
{
(1+(1/lambda)*x^(alpha))^(-(beta))
}
Estimation
paramWG = fitdist(data = dadosp, distr = 'WeibullGamma', start = c(alpha=1.5,beta=1,lambda=1.5), lower= c(0, 0))
summary(paramWG)
Sample:
dadosp = c(240.3,71.9,271.3, 186.3,241,253,287.4,138.3,206.9,176,270.4,73.3,118.9,203.1,139.7,31,269.6,140.2,205.1,133.2,107,354.6,277,27.6,186,260.9,350.4,242.6,292.5, 112.3,242.8,310.7,309.9,53.1,326.5,145.7,271.5, 117.5,264.7,243.9,182,136.7,103.8,188.3,236,419.8,338.6,357.7)

For your sample, the algorithm does not converge when computing the maximum likelihood estimate. Fitting a Weibull-Gamma distribution to these data requires an extremely large lambda value. You can work around this by estimating log10(lambda) instead of lambda.
To do so, add lambda <- 10^lambda as the first line of each of your 4 functions, e.g.
dWeibullGamma <- function(x, alpha, beta, lambda)
{
lambda <- 10^lambda
((alpha*beta)/(lambda))*(x^(alpha-1))*(1+(1/lambda)*x^(alpha))^(-(beta+1))
}
Then, the algorithm seems to converge:
library(fitdistrplus)
paramWG = fitdist(data = dadosp, distr = 'WeibullGamma',
start = list(alpha=1, beta=1, lambda=1), lower = c(0, 0, 0))
summary(paramWG)$estimate
Output:
alpha beta lambda
2.432939 799.631852 8.680802
We see that the estimate of lambda is 10^8.68, hence the convergence problem when not taking the log.
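A quick way to convince yourself that the reparameterization changes only the scale of the third parameter, and not the model itself, is to compare the two versions numerically. This is a minimal sketch: the names dWG, pWG and dWG_log10 are hypothetical stand-ins for the functions above, and the parameter values are simply rounded versions of the estimates in the output.

```r
# Density and CDF in the original parameterization (lambda on its natural scale)
dWG <- function(x, alpha, beta, lambda) {
  (alpha * beta / lambda) * x^(alpha - 1) * (1 + x^alpha / lambda)^(-(beta + 1))
}
pWG <- function(x, alpha, beta, lambda) {
  1 - (1 + x^alpha / lambda)^(-beta)
}

# Same density, but the third argument is log10(lambda)
dWG_log10 <- function(x, alpha, beta, loglambda) {
  dWG(x, alpha, beta, lambda = 10^loglambda)
}

# Both parameterizations should give identical density values
x <- c(50, 150, 300)
same_density <- all.equal(dWG(x, 2.43, 800, 10^8.68),
                          dWG_log10(x, 2.43, 800, 8.68))

# The density integrated over [0, 300] should match the CDF at 300
area <- integrate(dWG_log10, lower = 0, upper = 300,
                  alpha = 2.43, beta = 800, loglambda = 8.68)$value
cdf <- pWG(300, 2.43, 800, 10^8.68)
```

If same_density is TRUE and area agrees with cdf, the optimizer is fitting the same distribution and only searches over a better-scaled parameter.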
You can also have a look at the fit as follows:
newx <- 0:500
pars <- summary(paramWG)$estimate
pred <- dWeibullGamma(newx, pars["alpha"], pars["beta"], pars["lambda"])
hist(dadosp, freq = FALSE)
lines(newx, pred, lwd = 2)
Note: maybe fitting another distribution would make more sense?
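One candidate is suggested by the estimates themselves. When beta and lambda are both very large, (1 + x^alpha/lambda)^(-beta) is close to exp(-(beta/lambda) * x^alpha), which is the survival function of an ordinary Weibull distribution with shape alpha and scale (lambda/beta)^(1/alpha). The sketch below checks this numerically with the estimates from the output above; sWG is a hypothetical name for the survival function on the original lambda scale.

```r
# Weibull-Gamma survival function, lambda on its natural scale
sWG <- function(x, alpha, beta, lambda) (1 + x^alpha / lambda)^(-beta)

# Estimates reported above (lambda was estimated on the log10 scale)
alpha  <- 2.432939
beta   <- 799.631852
lambda <- 10^8.680802

x <- seq(1, 500, by = 1)

# Scale of the limiting Weibull distribution
weibull_scale <- (lambda / beta)^(1 / alpha)

surv_wg      <- sWG(x, alpha, beta, lambda)
surv_weibull <- pweibull(x, shape = alpha, scale = weibull_scale,
                         lower.tail = FALSE)

# Largest absolute difference between the two survival curves
max_gap <- max(abs(surv_wg - surv_weibull))
```

If max_gap is negligible over the range of the data, a plain two-parameter Weibull fit, for example fitdist(dadosp, "weibull"), is a simpler alternative worth comparing.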

Related

How to fit normal distribution with respect to frequency and intensity in R?

I have a list of data
frequency x1,x2,...,xn
i.e. 10,20,...,5000.
Intensity y1,y2,...,yn
0,0,...,50,60,50,...,0
where I want to fit a normal distribution to the data.
I found some website online such as (http://www.di.fc.ul.pt/~jpn/r/distributions/fitting.html) through the procedure like,
my_data <- rnorm(250, mean=1, sd=0.45) # unknown distribution parameters
fit <- fitdistr(my_data, densfun="normal")
but obviously, those methods won't work.
How to fit the above data to a normal distribution?
You can use the maximum likelihood function, mle (from the stats4 package), to solve this problem. Here is how you would do that:
library(stats4)
my_data <- rnorm(250, mean=1, sd=0.45) # unknown distribution parameters
logLik <- function(sigma, mu){
ll <- vapply(my_data,
function(x) dnorm(x, mean = mu, sd = sigma),
FUN.VALUE = numeric(1))
-sum(log(ll))
}
mle(logLik, start = list(sigma = 1, mu = 1))
mle requires the negative log-likelihood as a function of the parameters (which in the case of a normal distribution are mu (mean) and sigma (standard deviation)). The function above returns -sum(log(ll)), which mle minimizes numerically to find the best parameters for the distribution. It then returns the estimated parameters:
Call:
mle(minuslogl = logLik, start = list(sigma = 1, mu = 1))
Coefficients:
sigma mu
0.4595003 0.9724402
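If the data really are (frequency, intensity) pairs rather than raw draws, one simple option is to treat the intensities as weights and compute a weighted mean and standard deviation. The following is only a sketch under that assumption: the x and y vectors are made up for illustration, and the approach presumes the intensities are non-negative and roughly bell-shaped.

```r
# Hypothetical data: a bell-shaped intensity profile over a frequency grid
x <- seq(10, 5000, by = 10)
y <- 60 * exp(-(x - 2500)^2 / (2 * 300^2))

# Normalize the intensities so they act as weights summing to 1
w <- y / sum(y)

# Weighted mean and weighted standard deviation of the frequencies
mu_hat    <- sum(w * x)
sigma_hat <- sqrt(sum(w * (x - mu_hat)^2))

# Normal-shaped curve rescaled to the peak intensity, for overlaying on the data
fitted_curve <- max(y) * exp(-(x - mu_hat)^2 / (2 * sigma_hat^2))
```

Plotting y against x with lines(x, fitted_curve) added shows how well a normal shape describes the profile.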

type I error simulation in R

I am trying to calculate the type I error rate and the power of the correlation test for bivariate normal data using Monte Carlo simulation.
But I am getting unexpected values for the type I error rate and for the power (a type I error rate of 0.864).
I need to know whether I have made a mistake. Can anyone help me?
set.seed(160230)
library("mvtnorm", lib.loc="~/R/win-library/3.4")
sigma= matrix(c(1,0.8,0.8,1),2,2)
mu <- c(0,0)
#bivariate normal data
sim=replicate(n=1000 , rmvnorm(10,mean=mu , sigma = sigma))
pval1=c()
for(i in 1:1000)
{
pval1[i]=cor.test(sim[,1,i],sim[,2,i],method = c("pearson"))$p.value
}
#type1 error rate
mean(pval1<0.05)
#power
mean(pval3>0.05)
Your code is okay, but you have set up your simulation wrong.
In your code, you
1. Simulate bivariate data with a strong correlation, rho=0.8.
2. Test the null hypothesis H0: rho=0.
Thus, you are simulating data under the alternative hypothesis, which is why you get the result of 0.864. This is essentially your power for that particular alternative. You could do the following instead:
First simulate data under the null hypothesis:
sigma <- matrix(c(1,0,0,1),2,2)
mu <- c(0,0)
#bivariate normal data under H0
sim <- replicate(n=1000, rmvnorm(10, mean=mu, sigma = sigma))
# Test the actual level under H0
result <- sapply(1:1000, function(i) {
cor.test(sim[,1,i],sim[,2,i],method = c("pearson"))$p.value})
mean(result < 0.05)
which gives a value around 0.05. Under the alternative you can use your code with the correlation 0.8 (or some other number). You can generalise this with the following code to easily get the power for several correlations.
rho <- seq(0, .9, .1)
pwr <- sapply(rho, function(r) {
sigma <- matrix(c(1,r,r,1),2,2)
mu <- c(0,0)
#bivariate normal data
sim <- replicate(n=1000, rmvnorm(10, mean=mu, sigma = sigma))
# Rejection rate at this rho (the level when rho = 0, the power otherwise)
result <- sapply(1:1000, function(i) {
cor.test(sim[,1,i],sim[,2,i],method = c("pearson"))$p.value})
mean(result < 0.05)
})
Then you can see the impact of the correlation on the power by plotting the relationship:
plot(rho, pwr, type="l", xlab=expression(rho), ylab="Power")
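Because the rejection rate is itself a Monte Carlo estimate, it helps to attach a standard error before deciding whether a value such as 0.043 or 0.058 is "around 0.05". The sketch below is a dependency-free variant of the simulation under H0; the number of replicates and the seed are arbitrary choices.

```r
set.seed(1)
n_sim <- 2000   # number of simulated data sets
n_obs <- 10     # observations per data set

# Under H0 (rho = 0) the two components are independent standard normals,
# so rnorm() is enough here and mvtnorm is not needed
p_values <- replicate(n_sim, cor.test(rnorm(n_obs), rnorm(n_obs))$p.value)

# Estimated type I error rate with its Monte Carlo standard error
rate  <- mean(p_values < 0.05)
mc_se <- sqrt(rate * (1 - rate) / n_sim)

# Approximate 95% interval for the true rejection rate
ci <- rate + c(-1.96, 1.96) * mc_se
```

If the interval ci covers 0.05, the simulation gives no evidence that the test fails to hold its nominal level.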

Computing the marginal likelihood of a Gaussian model in R with integrate()

I am trying to compute the marginal likelihood of a Gaussian model in R. More precisely, I am trying to integrate the likelihood over both a Gaussian prior on mu and a Gaussian prior on sigma, with some observations yi.
In other words, I am trying to compute:
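Judging from the integrand in the code that follows, the intended quantity is presumably the double integral below. The integration limits are an assumption, and N(. | m, s) denotes a normal density with mean m and standard deviation s.

```latex
p(y) = \int_{0}^{\infty} \int_{-\infty}^{\infty}
  \left[ \prod_{i=1}^{n} \mathcal{N}(y_i \mid \mu, \sigma) \right]
  \mathcal{N}(\mu \mid 110, 1)\, \mathcal{N}(\sigma \mid 10, 1)
  \, d\mu \, d\sigma
```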
I tried to write this in R using the following function (following a similar SA question here: Quadrature to approximate a transformed beta distribution in R):
marglik <- function(data) {
integrand <-
Vectorize(function(data, mu, sigma) {
prod(dnorm(data, mu, sigma) ) * dnorm(mu, 110, 1) * dnorm(sigma, 10, 1)
} )
integrate(integrand, lower = 0, upper = Inf, mu = 100, sigma = 10)$value
}
Using this function, I can compute the marginal likelihood of the above model for a set of observations:
set.seed(666)
d <- rnorm(100, mean = 107.5, sd = 2.5)
marglik(data = d)
[1] 9.704133e-24
However, the results I obtain with this procedure are quite different from results I obtain with grid approximation, or using other packages/softwares.
My question is then: is it possible to do this double integration with integrate ? If it is, how would you do that ?
integrate() only takes in univariate functions. That is, the function you put in must be one-dimensional.
In general, such a problem is better tackled with specialised tools: bridge sampling (through the bridgesampling package) if you have MCMC output, or the cubature package for more general multivariate integration problems.
However, if we absolutely must do this using integrate() twice, we can make it work, but some errors need to be taken out of the code. Something like the following would work, although numerically the result seems to be zero most of the time, which is why you would generally try to obtain the log marginal likelihood instead.
marglik <- function(data) {
# Function that integrates over mu for given sigma.
mu_integrand <- Vectorize(function(sigma) {
mu_given_sigma_fun <- Vectorize(function(mu) {
prod(dnorm(data, mu, sigma) ) * dnorm(mu, 110, 1) * dnorm(sigma, 10, 1)
})
integrate(mu_given_sigma_fun, lower = -Inf, upper = Inf)$value
})
integrate(mu_integrand, lower = 0, upper = Inf)$value
}
set.seed(666)
d <- rnorm(100, mean = 110, sd = 10)
marglik(data = d)
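Since the result tends to underflow, one way to get at the log marginal likelihood directly is a grid approximation combined with the log-sum-exp trick. Note that this swaps integrate() for a plain Riemann sum; it is a sketch, and the grid limits and resolution are assumptions chosen to cover the two priors out to several standard deviations.

```r
set.seed(666)
d <- rnorm(100, mean = 110, sd = 10)

# Log of the integrand: log-likelihood plus the two log-priors
log_integrand <- function(mu, sigma) {
  sum(dnorm(d, mu, sigma, log = TRUE)) +
    dnorm(mu, 110, 1, log = TRUE) +
    dnorm(sigma, 10, 1, log = TRUE)
}

# Assumed grid: prior mean +/- 6 sd for mu, +/- 5 sd for sigma
mu_grid    <- seq(104, 116, length.out = 201)
sigma_grid <- seq(5, 15, length.out = 201)
d_mu    <- diff(mu_grid)[1]
d_sigma <- diff(sigma_grid)[1]

# Evaluate the log-integrand on every (mu, sigma) pair of the grid
log_vals <- outer(mu_grid, sigma_grid, Vectorize(log_integrand))

# Log-sum-exp: subtract the maximum before exponentiating to avoid underflow
m <- max(log_vals)
log_marglik <- m + log(sum(exp(log_vals - m))) + log(d_mu) + log(d_sigma)
```

Working on the log scale keeps the computation stable even when the likelihood itself is far too small to represent, which is what happens with larger samples.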

How to optimize parameters using genetic algorithms

I'd like to optimize three parameters (gamma, cost and epsilon) in eps-regression (SVR) using GA in R. Here's what I've done.
library(e1071)
data(Ozone, package="mlbench")
a<-na.omit(Ozone)
index<-sample(1:nrow(a), trunc(nrow(a)/3))
trainset<-a[index,]
testset<-a[-index,]
model<-svm(V4 ~ .,data=trainset, cost=0.1, gamma=0.1, epsilon=0.1, type="eps-regression", kernel="radial")
error<-model$residuals
rmse <- function(error) #root mean squared error
{
sqrt(mean(error^2))
}
rmse(error)
Here, I set cost, gamma and epsilon to 0.1 each, but I don't think these are the best values. So, I'd like to employ a genetic algorithm to optimize these parameters.
GA <- ga(type = "real-valued", fitness = rmse,
min = c(0.1,3), max = c(0.1,3),
popSize = 50, maxiter = 100)
Here, I used RMSE as the fitness function, but I think the fitness function has to include the parameters that are to be optimized. However, in SVR the objective function is too complicated to write out in R code; I tried to find it for a long time, to no avail. Could someone who knows both SVR and GA, or who has experience optimizing SVR parameters using GA, please help me?
In such an application, one passes the parameters whose values are to be optimized (in your case, cost, gamma and epsilon) as parameters of the fitness function, which then runs the model fitting + evaluation function and uses a measure of model performance as a measure of fitness. Therefore, the explicit form of the objective function is not directly relevant.
In the implementation below, I used 5-fold cross-validation to estimate the RMSE for a given set of parameters. In particular, since package GA maximizes the fitness function, I have written the fitness value for a given value of the parameters as minus the average rmse over the cross-validation datasets. Hence, the maximum fitness that can be attained is zero.
Here it is:
library(e1071)
library(GA)
data(Ozone, package="mlbench")
Data <- na.omit(Ozone)
# Setup the data for cross-validation
K = 5 # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
train_data = Data[fold_inds != i, , drop = FALSE],
test_data = Data[fold_inds == i, , drop = FALSE]))
# Given the values of parameters 'cost', 'gamma' and 'epsilon', return the rmse of the model over the test data
evalParams <- function(train_data, test_data, cost, gamma, epsilon) {
# Train
model <- svm(V4 ~ ., data = train_data, cost = cost, gamma = gamma, epsilon = epsilon, type = "eps-regression", kernel = "radial")
# Test
# Note: no square root is taken here, so despite its name this value is the mean squared error
rmse <- mean((predict(model, newdata = test_data) - test_data$V4) ^ 2)
return (rmse)
}
# Fitness function (to be maximized)
# Parameter vector x is: (cost, gamma, epsilon)
fitnessFunc <- function(x, Lst_CV_Data) {
# Retrieve the SVM parameters
cost_val <- x[1]
gamma_val <- x[2]
epsilon_val <- x[3]
# Use cross-validation to estimate the RMSE for each split of the dataset
rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data,
evalParams(train_data, test_data, cost_val, gamma_val, epsilon_val)))
# As fitness measure, return minus the average rmse (over the cross-validation folds),
# so that by maximizing fitness we are minimizing the rmse
return (-mean(rmse_vals))
}
# Range of the parameter values to be tested
# Parameters are: (cost, gamma, epsilon)
theta_min <- c(cost = 1e-4, gamma = 1e-3, epsilon = 1e-2)
theta_max <- c(cost = 10, gamma = 2, epsilon = 2)
# Run the genetic algorithm
results <- ga(type = "real-valued", fitness = fitnessFunc, Lst_CV_Data = lst_CV_data,
names = names(theta_min),
min = theta_min, max = theta_max,
popSize = 50, maxiter = 100)
summary(results)
which produces the results (for the range of parameter values that I specified, which may require fine-tuning based on the data):
GA results:
Iterations = 100
Fitness function value = -14.66315
Solution =
cost gamma epsilon
[1,] 2.643109 0.07910103 0.09864132
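The pattern described above (tuning parameters go in, minus a cross-validated error comes out) does not depend on SVR or on the GA package. The following dependency-free sketch illustrates it on a different, much smaller problem: it tunes the smoothing parameter of smooth.spline() on the built-in cars data and uses optimize() in place of a genetic algorithm. The model, the data and the search interval are all stand-ins chosen for illustration.

```r
set.seed(1)
K <- 5

# Assign each row of the built-in 'cars' data to one of K balanced folds
fold <- sample(rep(1:K, length.out = nrow(cars)))

# Fitness: minus the average cross-validated RMSE for a given smoothing parameter
fitness <- function(spar) {
  rmse_vals <- sapply(1:K, function(k) {
    train <- cars[fold != k, ]
    test  <- cars[fold == k, ]
    fit   <- smooth.spline(train$speed, train$dist, spar = spar)
    pred  <- predict(fit, test$speed)$y
    sqrt(mean((pred - test$dist)^2))
  })
  -mean(rmse_vals)
}

# One-dimensional search standing in for the genetic algorithm
best <- optimize(fitness, interval = c(0.2, 1.5), maximum = TRUE)
best_spar    <- best$maximum
best_fitness <- best$objective
```

The same fitness function could be passed as the fitness argument of ga(); only the search strategy would change, not the way the parameters enter the objective.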

Manual Maximum-Likelihood Estimation of an AR-Model in R

I am trying to estimate a simple AR(1) model in R of the form y[t] = alpha + beta * y[t-1] + u[t] with u[t] being normally distributed with mean zero and standard deviation sigma.
I have simulated an AR(1) model with alpha = 10 and beta = 0.1:
library(stats)
data<-arima.sim(n=1000,list(ar=0.1),mean=10)
First check: OLS yields the following results:
lm(data~c(NA,data[1:length(data)-1]))
Call:
lm(formula = data ~ c(NA, data[1:length(data) - 1]))
Coefficients:
(Intercept) c(NA, data[1:length(data) - 1])
10.02253 0.09669
But my goal is to estimate the coefficients with ML. My negative log-likelihood function is:
logl<-function(sigma,alpha,beta){
-sum(log((1/(sqrt(2*pi)*sigma)) * exp(-((data-alpha-beta*c(NA,data[1:length(data)-1]))^2)/(2*sigma^2))))
}
that is, the sum of all log-single observation normal distributions, that are transformed by u[t] = y[t] - alpha - beta*y[t-1]. The lag has been created (just like in the OLS estimation above) by c(NA,data[1:length(data)-1]).
When I try to put it at work I get the following error:
library(stats4)
mle(logl,start=list(sigma=1,alpha=5,beta=0.05),method="L-BFGS-B")
Error in optim(start, f, method = method, hessian = TRUE, ...) :
L-BFGS-B needs finite values of 'fn'
My log-likelihood function must be correct, when I try to estimate a linear model of the form y[t] = alpha + beta * x[t] + u[t] it works perfectly.
I just do not see how my initial values lead to a non-finite result? Trying any other initial values does not solve the problem.
Any help is highly appreciated!
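This works for me. It is basically what you've done, but it leaves out the first element of the response, since we can't predict it with an AR model anyway. In your version the lagged vector starts with NA, so the first term of the sum is NA and the whole negative log-likelihood is NA, which is the non-finite value that L-BFGS-B complains about.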
Simulate:
library(stats)
set.seed(101)
data <- arima.sim(n=1000,list(ar=0.1),mean=10)
Negative log-likelihood:
logl <- function(sigma,alpha,beta) {
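# data[-length(data)] is the lagged series; data[1:length(data)-1] selects the same
# elements only because 1:length(data)-1 starts at index 0, which R silently drops
-sum(dnorm(data[-1],alpha+beta*data[-length(data)],sigma,log=TRUE))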
}
Fit:
library(stats4)
mle(logl,start=list(sigma=1,alpha=5,beta=0.05),method="L-BFGS-B")
## Call:
## mle(minuslogl = logl, start = list(sigma = 1, alpha = 5, beta = 0.05),
## method = "L-BFGS-B")
##
## Coefficients:
##       sigma       alpha        beta
##  0.96150573 10.02658632  0.09437847
Alternatively:
df <- data.frame(y=data[-1],ylag1=head(data,-1))
library(bbmle)
mle2(y~dnorm(alpha+beta*ylag1,sigma),
start=list(sigma=1,alpha=5,beta=0.05),
data=df,method="L-BFGS-B")
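A useful cross-check: for Gaussian errors, the conditional maximum likelihood estimates of alpha and beta coincide with the OLS estimates, and the ML estimate of sigma is the root mean squared residual (dividing by n rather than n - 2). The sketch below verifies this on simulated data and also reproduces the NA problem from the question; the variable names are hypothetical.

```r
set.seed(101)
y <- as.numeric(arima.sim(n = 1000, list(ar = 0.1), mean = 10))
y_t   <- y[-1]           # response: y[2], ..., y[n]
y_lag <- y[-length(y)]   # lag:      y[1], ..., y[n-1]

# Conditional negative log-likelihood of the AR(1) model
negll <- function(sigma, alpha, beta) {
  -sum(dnorm(y_t, alpha + beta * y_lag, sigma, log = TRUE))
}

# Closed-form ML estimates via least squares
ols <- lm(y_t ~ y_lag)
alpha_hat <- unname(coef(ols)[1])
beta_hat  <- unname(coef(ols)[2])
sigma_hat <- sqrt(mean(residuals(ols)^2))   # ML divides by n, not n - 2

# The negative log-likelihood is smaller at the OLS solution than nearby
at_ols <- negll(sigma_hat, alpha_hat, beta_hat)
nearby <- negll(sigma_hat * 1.05, alpha_hat + 0.1, beta_hat - 0.01)

# Padding the lag with NA, as in the question, makes the whole sum NA
with_na <- sum(dnorm(y, alpha_hat + beta_hat * c(NA, y_lag), sigma_hat, log = TRUE))
```

The values alpha_hat, beta_hat and sigma_hat should agree closely with what mle() or mle2() report, which makes them handy as starting values or as a sanity check on the optimizer.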
