Fit a Weibull cumulative distribution to mass passing data in R

I have some particle-size, mass-passing cumulative data for crushed rock material to which I would like to fit a Weibull distribution using R. I have managed to do this in Excel using the WEIBULL.DIST() function with the cumulative switch set to TRUE.
I then used Excel's Solver to derive the alpha and beta parameters, minimising the RMSE to get the best fit. I would like to reproduce the result in R.
(see attached spreadsheet here)
The particle data and cumulative mass-passing % are the following vectors:
d.mm <- c(20.001,6.964,4.595,2.297,1.741,1.149,
0.871,0.574,0.287,0.082,0.062,0.020)
m.pct <- c(1.00,0.97,0.78,0.49,0.27,0.20,0.14,
0.11,0.07,0.03,0.025,0.00)
This is the plot to which I would like to fit the Weibull result:
plot(log10(d.mm),m.pct)
... computing the function for a vector of diameter values as per the spreadsheet
d.wei <- c(seq(0.01,0.1,0.01),seq(0.2,1,0.1),seq(2,30,1))
The best-fit values I've obtained for the Weibull alpha and beta in Excel using Solver are 1.41 and 3.31 respectively.
So my question is: how do I reproduce this analysis in R (not necessarily the Solver part), i.e. how do I fit the Weibull distribution to this dataset?

The nonlinear least squares function nls is the R equivalent of Excel's Solver.
pweibull computes the cumulative distribution function of the Weibull distribution, which matches the cumulative mass-passing data. The comments in the code explain the solution step by step:
d.mm <- c(20.001,6.964,4.595,2.297,1.741,1.149,
0.871,0.574,0.287,0.082,0.062,0.020)
m.pct <- c(1.00,0.97,0.78,0.49,0.27,0.20,0.14,
0.11,0.07,0.03,0.025,0.00)
# create a data frame to hold the data
df <- data.frame(m.pct, d.mm)
#data for prediction
d.wei <- c(seq(0.01,0.1,0.01),seq(0.2,1,0.1),seq(2,30,1))
# nonlinear least-squares fit (starting values must be supplied)
# alpha is used for the shape and beta for the scale
fit <- nls(m.pct ~ pweibull(d.mm, shape = alpha, scale = beta), data = df,
           start = list(alpha = 1, beta = 2))
print(summary(fit))
# extract the fitted shape and scale
print(summary(fit)$parameters[, 1])
# predict new values based on the model
y <- predict(fit, newdata = data.frame(d.mm = d.wei))
#Plot comparison
plot(log10(d.mm),m.pct)
lines(log10(d.wei),y, col="blue")
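coef() is a more direct way to pull the fitted parameters out of the model object; because nls minimises the same sum of squared errors that the Excel Solver run did, the shape and scale should come out close to the alpha = 1.41 and beta = 3.31 quoted in the question.
coef(fit)  # fitted shape (alpha) and scale (beta)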

Related

R code to find the inverse of Cumulative Distribution Function of a Multivariate Joint Distribution (Copula)

I am new to R and I am working with copulas.
I have read the R documentation and so far I have understood how to create a copula and how to calculate the PDF and CDF.
library(copula)
#Generate Normal Copula
coef_ <- 0.7
mycopula <- normalCopula(coef_, dim = 2)
v <- rCopula(4000, mycopula)
# Compute the density
pdf_ <- dCopula(v, mycopula)
# Compute the CDF
cdf <- pCopula(v, mycopula)
However, I need a function to retrieve the inverse of the CDF of the multivariate normal distribution, as I need to find the 99th percentile.
Does anyone know how to do that? Thanks!
I am not sure if you are still interested. However, you can just use the qCopula function, or simply qnorm(v). This will transform your copula data (uniform margins) back to data with standard normal margins.
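A minimal sketch of the qnorm() route (the copula calls mirror the question; the 99th-percentile line is an illustration added here, not part of the original answer):
library(copula)
set.seed(1)
mycopula <- normalCopula(0.7, dim = 2)
v <- rCopula(4000, mycopula)   # copula sample with uniform margins on [0, 1]
z <- qnorm(v)                  # transform to standard normal margins
# 99th percentile of each (now standard normal) margin
apply(z, 2, quantile, probs = 0.99)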

How does one extract hat values and Cook's Distance from an `nlsLM` model object in R?

I'm using the nlsLM function to fit a nonlinear regression. How does one extract the hat values and Cook's Distance from an nlsLM model object?
With objects created using the nls or nlreg functions, I know how to extract the hat values and the Cook's Distance of the observations, but I can't figure out how to get them from nlsLM.
Can anyone help me out on this? Thanks!
So it's not Cook's Distance or hat values, but you can use the function nlsJack in the nlstools package to jackknife your nls model: it removes each observation, one at a time, and refits the model to see, roughly speaking, how much the model coefficients change with and without a given observation.
Reproducible example:
xs = rep(1:10, times = 10)
ys = 3 + 2*exp(-0.5*xs)
for (i in 1:100) {
  xs[i] = rnorm(1, xs[i], 2)
}
df1 = data.frame(xs, ys)
nls1 = nls(ys ~ a + b*exp(d*xs), data=df1, start=c(a=3, b=2, d=-0.5))
require(nlstools)
plot(nlsJack(nls1))
The plot shows the percentage change in each model coefficient as each individual observation is removed, and it marks influential points above a certain threshold as "influential" in the resulting plot. The documentation for nlsJack describes how this threshold is determined:
An observation is empirically defined as influential for one parameter if the difference between the estimate of this parameter with and without the observation exceeds twice the standard error of the estimate divided by sqrt(n). This empirical method assumes a small curvature of the nonlinear model.
My impression so far is that this is a fairly liberal criterion; it tends to mark a lot of points as influential.
nlstools is a pretty useful package overall for diagnosing nls model fits though.
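If you do specifically want hat values, one rough workaround (not part of the answer above, and only a tangent-plane approximation) is to build them from the Jacobian stored in the fitted object's m component; nlsLM returns an object of class "nls", so the same accessor should apply:
# approximate leverages from the model's gradient (Jacobian) at the estimates
J <- nls1$m$gradient()                    # n x p Jacobian
H <- J %*% solve(crossprod(J)) %*% t(J)   # approximate hat matrix
hat_vals <- diag(H)
# rough Cook's-distance analogue built from these approximate leverages
r <- residuals(nls1)
p <- length(coef(nls1))
s2 <- sum(r^2) / (length(r) - p)
cooks <- (r^2 / (p * s2)) * hat_vals / (1 - hat_vals)^2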

Maximum likelihood lognormal R and SAS

I am converting SAS code to R, and one feature I need to reproduce is fitting a lognormal distribution in the SAS UNIVARIATE procedure using histograms and midpoints. The result is a table containing the following variables:
EXPPCT - estimated percent of population in histogram interval determined from optional fitted distribution (here it is lognormal)
OBSPCT - percent of variable values in histogram interval
VAR - variable name
MIDPT - midpoint of histogram interval
There is an option in SAS to consider the MLE of the zeta, theta and sigma parameters while applying the distribution.
I was able to figure out how to do most of this in R. My only problem arises in the likelihood estimation: when the three parameters are estimated in SAS, R gives me different values.
I am using the following for MLE in R.
library(fitdistrplus)
set.seed(0)
cd <- rlnorm(40,4)
pars <- coef(fitdist(cd, "lnorm"))
pars
#   meanlog     sdlog
# 4.0549354 0.8620153
I am using the following for MLE in SAS (the est option):
proc univariate data = testing;
histogram cd /lognormal (theta = est zeta=est sigma=est)
midpoints = 1 to &maxx. by 100
outhistogram = this;
run;
&maxx denotes the maximum of the input. The results of the run from SAS can be found here.
I am new to statistics and unable to find the method used for the MLE in SAS and have no clue as to how to estimate the same in R.
Thanks in advance.
I found the packages EnvStats and FAdist, which let me estimate the threshold parameter and then use these parameters to fit the 3-parameter lognormal distribution. Backlin was right about the parameters. Right now the parameters are not an exact match, but the end result is the same as SAS. Thank you very much.
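For later readers, a minimal sketch of that route using EnvStats::elnorm3(), which estimates meanlog, sdlog and the threshold of a three-parameter lognormal (FAdist then provides dlnorm3/plnorm3 and friends if you need to evaluate the fitted distribution); the names of the returned components are as I recall them from the package documentation, so check ?elnorm3:
library(EnvStats)
set.seed(0)
cd <- rlnorm(40, 4)
est <- elnorm3(cd)   # three-parameter (threshold) lognormal fit
est$parameters       # meanlog, sdlog and threshold estimates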

hurdle models using continuous data and covariates

I was wondering if I get some advice about fitting hurdle models using continuous data and covariates.
I have some continuous data that are generally well fit by a right-skewed distribution such as a Pareto, Gamma, or Weibull distribution. However, there are several zeros in my data which are important to my analysis. In addition, I have some categorical (two-level) covariates and would like to model the parameters of a distribution as a function of these covariates in order to formally evaluate their importance (e.g., using AIC).
I have seen examples of hurdle models fit using continuous data but have not yet found any examples of how to incorporate covariates and a model-selection framework. Does anyone have any suggestions as to how to proceed or know of any R packages that allow this procedure? I have included some code below to reproduce the type of data I am working with. The non-zero data are generated via a generalized Pareto distribution from the package texmex. The parameters were estimated directly from my non-zero data. I have also included the code to plot the data in a histogram to see their distribution.
library("texmex")
set.seed(101)
zeros <- rep(0,8)
non_zeros <- rgpd(17, sigma=exp(-10.4856), xi=0.1030, u = 0)
all.data <- c(zeros,non_zeros)
hist(non_zeros,breaks=50,xlim=c(0,0.00015),ylim=c(0,9),main="",xlab="",
col="gray")
hist(zeros,add=TRUE,col="black",breaks=100,xlim=c(0,0.00015),ylim=c(0,9))
legend("topright",legend=c("zeros"),col="black",lwd=8)

R: How to obtain the fitted values from a distribution fit?

I fit a gamma distribution to an empirical distribution using the fitdist function:
fit = fitdist(data=empdistr, distr="gamma")
I then use the denscomp function to compare the data to the fitted values:
dc = denscomp(fit)
But I would like to extract from fit or from dc the actual fitted values, i.e. the points of the gamma density (with the fitted parameters) that denscomp displays.
Does anybody have an idea of how I can do that?
Thanks in advance!
Use dgamma to compute the fitted density at any quantiles you like (for a gamma fit from fitdistrplus, coef(fit)[1] is the shape and coef(fit)[2] the rate):
dgamma(x, coef(fit)[1], coef(fit)[2])
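For example, to recover the curve that denscomp draws, evaluate dgamma on a grid spanning the data (a sketch; empdistr is the poster's data vector):
x <- seq(min(empdistr), max(empdistr), length.out = 100)
y <- dgamma(x, shape = coef(fit)["shape"], rate = coef(fit)["rate"])
plot(x, y, type = "l")   # or lines(x, y) on top of hist(empdistr, freq = FALSE)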
