I have been doing some data analysis in R and I am trying to figure out how to fit my data to a 3 parameter Weibull distribution. I found how to do it with a 2 parameter Weibull but have come up short in finding how to do it with a 3 parameter.
Here is how I fit the data using the fitdistr function from the MASS package:
y <- fitdistr(x[[6]], 'weibull')
x[[6]] is a subset of my data and y is where I am storing the result of the fitting.
First, you might want to look at FAdist package. However, that is not so hard to go from rweibull3 to rweibull:
> rweibull3
function (n, shape, scale = 1, thres = 0)
thres + rweibull(n, shape, scale)
<environment: namespace:FAdist>
and similarly from dweibull3 to dweibull
> dweibull3
function (x, shape, scale = 1, thres = 0, log = FALSE)
dweibull(x - thres, shape, scale, log)
<environment: namespace:FAdist>
so we have this
> x <- rweibull3(200, shape = 3, scale = 1, thres = 100)
> fitdistr(x, function(x, shape, scale, thres)
dweibull(x-thres, shape, scale), list(shape = 0.1, scale = 1, thres = 0))
shape scale thres
2.42498383 0.85074556 100.12372297
( 0.26380861) ( 0.07235804) ( 0.06020083)
Edit: As mentioned in the comment, there appears various warnings when trying to fit the distribution in this way
Error in optim(x = c(60.7075705026659, 60.6300379017397, 60.7669410153573, :
non-finite finite-difference value [3]
There were 20 warnings (use warnings() to see them)
Error in optim(x = c(60.7075705026659, 60.6300379017397, 60.7669410153573, :
L-BFGS-B needs finite values of 'fn'
In dweibull(x, shape, scale, log) : NaNs produced
For me at first it was only NaNs produced, and that is not the first time when I see it so I thought that it isn't so meaningful since estimates were good. After some searching it seemed to be quite popular problem and I couldn't find neither cause nor solution. One alternative could be using stats4 package and mle() function, but it seemed to have some problems too. But I can offer you to use a modified version of code by danielmedic which I have checked a few times:
thres <- 60
x <- rweibull(200, 3, 1) + thres
EPS = sqrt(.Machine$double.eps) # "epsilon" for very small numbers
llik.weibull <- function(shape, scale, thres, x)
{
sum(dweibull(x - thres, shape, scale, log=T))
}
thetahat.weibull <- function(x)
{
if(any(x <= 0)) stop("x values must be positive")
toptim <- function(theta) -llik.weibull(theta[1], theta[2], theta[3], x)
mu = mean(log(x))
sigma2 = var(log(x))
shape.guess = 1.2 / sqrt(sigma2)
scale.guess = exp(mu + (0.572 / shape.guess))
thres.guess = 1
res = nlminb(c(shape.guess, scale.guess, thres.guess), toptim, lower=EPS)
c(shape=res$par[1], scale=res$par[2], thres=res$par[3])
}
thetahat.weibull(x)
shape scale thres
3.325556 1.021171 59.975470
An alternative: package "lmom". The estimative by L-moments technique
library(lmom)
thres <- 60
x <- rweibull(200, 3, 1) + thres
moments = samlmu(x, sort.data = TRUE)
log.moments <- samlmu( log(x), sort.data = TRUE )
weibull_3parml <- pelwei(moments)
weibull_3parml
zeta beta delta
59.993075 1.015128 3.246453
But I don´t know how to do some Goodness-of-fit statistics in this package or in the solution above. Others packages you can do Goodness-of-fit statistics easily. Anyway, you can use alternatives like: ks.test or chisq.test
Related
I am trying to fit my photosynthesis data to a nls function, which is a nonrectangular hyperbola function. So far, I have some issues with getting the right start value for nls and, therefore, I am getting a lot of errors such as 'singular gradient ', 'NaNs produced', or 'step factor 0.000488281 reduced below 'minFactor' of 0.000976562'. Would you please give some suggestions for finding the best starting values? Thanks in advance!
The codes and data are below:
#Dataframe
PPFD <- c(0,0,0,50,50,50,100,100,100,200,200,200,400,400,400,700,700,700,1000,1000,1000,1500,1500,1500)
Cultivar <- c(-0.7,-0.8,-0.6,0.6,0.5,0.8,2.0,2.0,2.3,3.6,3.7,3.7,5.7,5.5,5.8,9.7,9.6,10.0,14.7,14.4,14.9,20.4,20.6,20.9)
NLRC <-data.frame(PPFD,Cultivar)
#nls regression
reg_nrh <- nls(Cultivar ~ (1/(2*Theta))*(AQY*PPFD+Am-sqrt((AQY*PPFD+Am)^2-4*AQY*Theta*Am*PPFD))-Rd, data = NLRC, start=list(Am = max(NLRC$Cultivar)-min(NLRC$Cultivar), AQY = 0.05, Rd=-min(NLRC$Cultivar), Theta = 1))
#estimated parameters for plotting
Amnrh <- coef(reg_nrh)[1]
AQYnrh <- coef(reg_nrh)[2]
Rdnrh <- coef(reg_nrh)[3]
Theta <- coef(reg_nrh)[4]
#plot
plot(NLRC$PPFD, NLRC$Cultivar, main = c("Cultivar"), xlab="", ylab="", ylim=c(-2,40),cex.lab=1.2,cex.axis=1.5,cex=2)+mtext(expression("PPFD ("*mu*"mol photons "*m^-2*s^-1*")"),side=1,line=3.3,cex=1.5)+mtext(expression(P[net]*" ("*mu*"mol "*CO[2]*" "*m^-2*s^-1*")"),side=2,line=2.5,cex=1.5)
#simulated value
ppfd = seq(from = 0, to = 1500)
pnnrh <- (1/(2*Theta))*(AQYnrh*ppfd+Amnrh-sqrt((AQYnrh*ppfd+Amnrh)^2-4*AQYnrh*Theta*Amnrh*ppfd))- Rdnrh
lines(ppfd, pnnrh, col="Green")
If we
take the maximum of 0 and the expression within the sqrt to avoid taking negative square roots
fix Theta at 0.8
use lm to get starting values for AQY and Am
then it converges
Theta <- 0.8
fm <- lm(Cultivar ~ PPFD, NLRC)
st <- list(AQY = coef(fm)[[2]], Rd = -min(NLRC$Cultivar), Am = coef(fm)[[1]])
fo <- Cultivar ~
(1/(2*Theta))*(AQY*PPFD+Am-sqrt(pmax(0, (AQY*PPFD+Am)^2-4*AQY*Theta*Am*PPFD)))-Rd
reg <- nls(fo, data = NLRC, start = st)
deviance(reg) # residual sum of squares
## [1] 5.607943
plot(Cultivar ~ PPFD, NLRC)
lines(fitted(reg) ~ PPFD, NLRC, col = "red")
(continued after image)
Note that the first model below has only two parameters yet has lower residual sum of squares (lower is better).
reg2 <- nls(Cultivar ~ a * PPFD^b, NLRC, start = list(a = 1, b = 1))
deviance(reg2)
## [1] 5.098796
These have higher residual sum of squares but do have the advantage that they are very simple.
deviance(fm) # fm defined above
## [1] 6.938648
fm0 <- lm(Cultivar ~ PPFD + 0, NLRC) # same as fm except no intercept
deviance(fm0)
## [1] 7.381632
I am trying to fit a Weibull distribution using method of moments to my data in RStudio.
I don't know about the necessary commands and packages one needs to fit distributions such as Weibull or Pareto. Specifically I am trying to estimate the shape parameter k and the
scale λ.
I use this code to generate my data:
a <- rweibull(100, 10, 1)
Here is a function to estimate the Weibull distribution parameters with the method of moments.
weibull_mom <- function(x, interval){
mom <- function(shape, x, xbar){
s2 <- var(x, na.rm = TRUE)
lgamma(1 + 2/shape) - 2*lgamma(1 + 1/shape) - log(xbar^2 + s2) + 2*log(xbar)
}
xbar <- mean(x, na.rm = TRUE)
shape <- uniroot(mom, interval = interval, x = x, xbar = xbar)$root
scale <- xbar/gamma(1 + 1/shape)
list(shape = shape, scale = scale)
}
set.seed(2021) # Make the results reproducible
a <- rweibull(100, 10, 1)
weibull_mom(a, interval = c(1, 1e6))
#$shape
#[1] 9.006623
#
#$scale
#[1] 0.9818155
The maximum likelihood estimates are
MASS::fitdistr(a, "weibull")
# shape scale
# 8.89326148 0.98265852
# (0.69944224) (0.01165359)
#Warning messages:
#1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
#2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
I wanted to set a small dataframe in order to plot myself some points of the incomplete elliptic integral of 1st kind for different values of amplitude phi and modulus k. The function to integrate is 1/sqrt(1 - (k*sin(x))^2) between 0 and phi.Here is the code I imagined:
v.phi <- seq(0, 2*pi, 1)
n.phi <- length(v.phi)
v.k <- seq(-1, +1, 0.5)
n.k <- length(v.k)
k <- rep(v.k, each = n.phi, times = 1)
phi <- rep(v.phi, each = 1, times = n.k)
df <- data.frame(k, phi)
func <- function(x, k) 1/sqrt(1 - (k*sin(x))^2)
df$area <- integrate(func,lower=0, upper=df$phi, k=df$k)
But this generates errors and I am obviously mistaking in constructing the new variable df$area... Could someone put me in the right way?
You can use mapply:
df$area <- mapply(function(phi,k){
integrate(func, lower=0, upper=phi, k=k)$value
}, df$phi, df$k)
However that generates an error because there are some values of k equal to 1 or -1, while the allowed values are -1 < k < 1. You can't evaluate this integral for k = +/- 1.
Note that there's a better way to evaluate this integral: the incomplete elliptic function of the first kind is implemented in the gsl package:
> integrate(func, lower=0, upper=6, k=0.5)$value
[1] 6.458877
> gsl::ellint_F(6, 0.5)
[1] 6.458877
As I said, this function is not defined for k=-1 or k=1:
> gsl::ellint_F(6, 1)
[1] NaN
> gsl::ellint_F(6, -1)
[1] NaN
> integrate(func, lower=0, upper=6, k=1)
Error in integrate(func, lower = 0, upper = 6, k = 1) :
non-finite function value
I am trying to estimate the parameters of the three-parametric Weibull distribution with ML for censored data.
I've worked it out by using the package flexsurv where I've defined an "own" density function.
I've also followed the instructions given in the documentation of the function flexsurv::flexsurvregto build the list with all required information to do the MLE with a customer density function.
In the following you can see what I've done so far.
library(FAdist)
library(flexsurv)
set.seed(1)
thres <- 3500
data <- rweibull(n = 1000, shape = 2.2, scale = 25000) + thres
y <- sample(c(0, 1), size = 1000, replace = TRUE)
df1 <- data.frame(x = data, status = y)
dweib3 <- function(x, shape, scale, thres, log = FALSE) {
dweibull(x - thres, shape, scale, log = log)
}
pweib3 <- function(q, shape, scale, thres, log_p = FALSE) {
pweibull(q - thres, shape, scale, log.p = log_p)
}
# Not required
#qweib3 <- function(p, shape, scale, thres, log.p = FALSE) {
# if (log.p == TRUE) {
# p <- exp(p)
# }
# qwei3 <- thres + qweibull(p, shape, scale)
# return(qwei3)
#}
dweib3 <- Vectorize(dweib3)
pweib3 <- Vectorize(pweib3)
custom.weibull <- list(name = "weib3",
pars = c('shape', 'scale', 'thres'), location = 'scale',
transforms = c(log, log, log),
inv.transforms = c(exp, exp, exp),
inits = function(t) {
c(1.2 / sqrt((var(log(t)))), exp(mean(log(t)) + (.572 / (1.2 / sqrt((var(log(t))))))), .5 * min(t))
}
)
ml <- flexsurvreg(Surv(df1$x, df1$status) ~ 1, data = df1, dist = custom.weibull)
The variable y should represent the status of a unit where 1 is a failure and 0 is an unfailed unit until censoring.
The initial values for shape and scale are taken from the moments which are also defined in the fitdistrpluspackage.
For the threshold parameter there must be a constraining since the threshold must be really smaller than the minimum of the data. Therefore a constraint of threshold is at its max .99 * t_min would be satisfactory (this is something which I haven't implement until now).
The output of the above MLE is the following:
> ml
Call:
flexsurvreg(formula = Surv(df1$x, df1$status) ~ 1, data = df1,
dist = custom.weibull)
Estimates:
est L95% U95% se
shape 2.37e+00 2.12e+00 2.65e+00 1.33e-01
scale 3.52e+04 3.32e+04 3.74e+04 1.08e+03
thres 2.75e+03 1.51e+03 5.02e+03 8.44e+02
N = 1000, Events: 481, Censored: 519
Total time at risk: 25558684
Log-likelihood = -5462.027, df = 3
AIC = 10930.05
The estimated parameters aren't fine even if there is censoring.
I've did this procedure a few times with other randomly generated data... the estimates are always pretty far away from the "truth".
Therefore I need an improvement of my code or another possibility to estimate the parameters of a three-parameter Weibull with MLE.
I am new both to R and statistics. I am playing with maximum likelihood estimation, and I am getting some incorrect results. I want to model x with a simple linear function:
x<-apply(matrix(seq(1,10,1), nrow=1), 1, function(x) 10*x+runif(10,-3,3))
LL<-function(a,b){
R=apply(x,1,function(y) a*y+b)
-sum(log(R))
}
mle(LL, start=list(a=10, b=0))
I am getting the following result:
Coefficients:
a b
43571.957 1338.345
instead of a~10, b~0.
I modified the code according to the suggestions of Spacedman:
set.seed(99)
x<-apply(matrix(seq(1,10,1), nrow=1), 1, function(x) 10*x+runif(10,-3,3))
LL<-function(a,b){
R = x[,1] - a*(1:10) + b
-sum(R^2)
}
library(stats4)
mle(LL, start=list(a=11, b=0.3))
Error in solve.default(oout$hessian) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
I do not know how to get rid of this error. Changing the sees and generating the x values again does not help.
There are a couple of things to notice here. To clarify we start by changing the distribution of the error-term from a uniform distribution runif(x, -3, 3) to the std. normal distribution: rnorm(x). We can now easily simulate your data, then set up your (minus) loglikelihood and maximize (minizime) by:
a <- 10
b <- 0
set.seed(99)
x <- apply(matrix(seq(1, 10, 1), nrow=1), 1, function(x) b + a * x + rnorm(10))
minuslogL <- function(a, b) -sum(dnorm(x[, 1] - (b + a * 1:10), log = TRUE))
library(stats4)
mle(minuslogL, start = list(a = 11, b = 0.3))
Call:
mle(minuslogl = minuslogL, start = list(a = 11, b = 0.3))
Coefficients:
a b
9.8732793 0.5922192
Notice that this works well, since the likelihood is smooth and mle() uses "BFGS" for the optimization, eg. a quasi-Newton, gradient approach. Lets try the same with uniform errors:
set.seed(99)
x <- apply(matrix(seq(1, 10, 1), nrow=1), 1, function(x) b + a * x + runif(10, -3, 3))
minuslogL2 <- function(a,b) -sum(dunif(x[, 1] -(a * 1:10 + b), -3, 3, log = TRUE))
mle(minuslogL2, start = list(a = 11, b = 0.3))
Error in optim(start, f, method = method, hessian = TRUE, ...) :
initial value in 'vmmin' is not finite
This fails! Why? Since the uniform-errors restrict the parameter space, you will not get a smooth likelihood. If you move your parameters a,b too far away from the true values, you will get Inf. If you move close enough, you will get the same likelihood (eg. many possible min. values):
> minuslogL2(11, 0.3)
[1] Inf
> minuslogL2(10, 0)
[1] 17.91759
> minuslogL2(10.02, 0.06)
[1] 17.91759
Maximizing this likelihood compares to finding the set: {a,b}: -logL(a, b) == -logL(10, 0), which can be found by a plain search algorithm.