Maximum likelihood lognormal in R and SAS

I am converting SAS code to R. The SAS UNIVARIATE procedure can fit a lognormal distribution to a histogram with specified midpoints. The result is a table containing the following variables:
EXPPCT - estimated percent of the population in each histogram interval, determined from the optional fitted distribution (here lognormal)
OBSPCT - percent of variable values in each histogram interval
VAR - variable name
MIDPT - midpoint of the histogram interval
There is an option in SAS to estimate the zeta, theta and sigma parameters by maximum likelihood when fitting the distribution.
I have figured out how to do most of this in R. My only problem is the maximum likelihood estimation: when the three parameters are estimated in SAS, R gives me different values.
I am using the following for MLE in R.
library(fitdistrplus)
set.seed(0)
cd <- rlnorm(40,4)
pars <- coef(fitdist(cd, "lnorm"))
pars
#   meanlog     sdlog
# 4.0549354 0.8620153
I am using the following for MLE in SAS (the est options):
proc univariate data = testing;
histogram cd /lognormal (theta = est zeta=est sigma=est)
midpoints = 1 to &maxx. by 100
outhistogram = this;
run;
&maxx denotes the maximum of the input. The results of the run from SAS can be found here.
I am new to statistics, and I cannot find out which method SAS uses for the MLE, so I have no idea how to reproduce it in R.
Thanks in advance.

I found the packages EnvStats and FAdist, which let me estimate the threshold parameter and use it to fit the three-parameter lognormal distribution. Backlin was right about the parameters. The parameters are not an exact match, but the end result is the same as SAS. Thank you very much.
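For anyone landing here later, here is a minimal sketch of how the two packages can be combined. The mapping theta = threshold, zeta = meanlog, sigma = sdlog and the parameter names below (from EnvStats::elnorm3 and FAdist's lnorm3 functions) are assumptions, not output from this thread:
library(EnvStats)       # elnorm3(): estimation for the 3-parameter lognormal
library(FAdist)         # dlnorm3()/plnorm3(): 3-parameter lognormal density/CDF
library(fitdistrplus)

set.seed(0)
cd <- rlnorm(40, 4)

# rough estimates of meanlog, sdlog and the threshold (theta)
est <- elnorm3(cd)
est$parameters

# optionally refine by MLE via fitdist(); FAdist's lnorm3 uses
# shape = sdlog, scale = meanlog, thres = threshold
start <- list(shape = unname(est$parameters["sdlog"]),
              scale = unname(est$parameters["meanlog"]),
              thres = unname(est$parameters["threshold"]))
fit3 <- fitdist(cd, "lnorm3", start = start)
coef(fit3)
Note that the likelihood of the three-parameter lognormal is unbounded as the threshold approaches the sample minimum, so the optimisation can be fragile and may not reproduce the SAS numbers exactly.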

Related

R code to find the inverse of the Cumulative Distribution Function of a Multivariate Joint Distribution (Copula)

I am new to R and I am working with copulas.
I have read the R documentation, and so far I understand how to create a copula and how to calculate the PDF and CDF.
library(copula)
# Generate a normal copula
coef_ <- 0.7
mycopula <- normalCopula(coef_, dim = 2)
v <- rCopula(4000, mycopula)
# Compute the density
pdf_ <- dCopula(v, mycopula)
# Compute the CDF
cdf <- pCopula(v, mycopula)
However, I need a function to retrieve the inverse of the CDF of the multivariate normal distribution, as I need to find the 99th percentile.
Does anyone know how to do that? Thanks!
I am not sure if you are still interested, but you can just use the qCopula function, or simply qnorm(v). This will transform your copula data back to data with standard normal margins.
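For completeness, a minimal sketch of the qnorm() route (using the copula package as above); the 99th percentile is then read off each margin of the transformed sample:
library(copula)

set.seed(1)
mycopula <- normalCopula(0.7, dim = 2)
v <- rCopula(4000, mycopula)           # copula sample, uniform margins

z <- qnorm(v)                          # back-transform to standard normal margins
apply(z, 2, quantile, probs = 0.99)    # empirical 99th percentile of each margin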

Threshold for fitting a generalized Pareto model

I need the R code for setting a threshold while fitting a generalized Pareto distribution.
I've just made an R package that serves exactly this purpose, namely gfiExtremes (now also on CRAN).
remotes::install_github("stla/gfiExtremes", build_vignettes = TRUE)
It allows you to perform inference on the quantiles of a generalized Pareto distribution model and on the parameters of the Pareto exceedance distribution, with or without assuming that the exceedance threshold is known.
Usage
The data must be given as a numeric vector, say x.
library(gfiExtremes)
gf <- gfigpd2(x, beta = c(0.99, 0.995, 0.999))
summary(gf) # provides estimates and confidence intervals of the beta-quantiles
thresholdEstimate(gf) # an estimate of the threshold
See the examples in the package documentation and the vignette for more info.

Fit a Weibull cumulative distribution to mass passing data in R

I have some particle size mass-passing cumulative data for crushed rock material to which I would like to fit a Weibull distribution using R. I have managed to do this in Excel with the WEIBULL.DIST() function, with the cumulative switch set to TRUE.
I then used Excel Solver to derive the alpha and beta parameters, using RMSE to get the best fit. I would like to reproduce the result in R.
(see attached spreadsheet here)
The particle size data and cumulative mass-passing percentages are the following vectors:
d.mm <- c(20.001,6.964,4.595,2.297,1.741,1.149,
0.871,0.574,0.287,0.082,0.062,0.020)
m.pct <- c(1.00,0.97,0.78,0.49,0.27,0.20,0.14,
0.11,0.07,0.03,0.025,0.00)
This is the plot to which I would like to fit the Weibull result:
plot(log10(d.mm),m.pct)
... computing the function for a vector of diameter values as per the spreadsheet
d.wei <- c(seq(0.01,0.1,0.01),seq(0.2,1,0.1),seq(2,30,1))
The best values I've determined for the Weibull alpha and beta in Excel using Solver are 1.41 and 3.31 respectively.
So my question is how to reproduce this analysis in R (not necessarily the Solver part) but fitting the Weibull to this dataset?
The nonlinear least-squares function nls is the R equivalent of Excel's Solver.
pweibull computes the cumulative distribution function of the Weibull distribution. The comments in the code explain the step-by-step solution:
d.mm <- c(20.001,6.964,4.595,2.297,1.741,1.149,
0.871,0.574,0.287,0.082,0.062,0.020)
m.pct <- c(1.00,0.97,0.78,0.49,0.27,0.20,0.14,
0.11,0.07,0.03,0.025,0.00)
# create a data frame to hold the data
df <- data.frame(m.pct, d.mm)
# diameters at which to predict
d.wei <- c(seq(0.01, 0.1, 0.01), seq(0.2, 1, 0.1), seq(2, 30, 1))
# fit (starting values supplied for the solver);
# alpha is used for the shape and beta for the scale
fit <- nls(m.pct ~ pweibull(d.mm, shape = alpha, scale = beta),
           data = df, start = list(alpha = 1, beta = 2))
print(summary(fit))
# extract the shape and scale estimates
print(summary(fit)$parameters[, 1])
# predict new values based on the model
y <- predict(fit, newdata = data.frame(d.mm = d.wei))
# plot comparison
plot(log10(d.mm), m.pct)
lines(log10(d.wei), y, col = "blue")

nls error during parameter estimation of a power law with exponential cutoff distribution in R

I want to fit my data with several known distributions; a power law with exponential cutoff is one of the candidates.
The fitdist function in the fitdistrplus package is a good way to estimate parameters by MLE, MME, or QME.
But a power law with exponential cutoff is not a base probability distribution according to the CRAN Task View: Probability Distributions, so I am trying the nls function instead.
The pdf of the power law with exponential cutoff is f(x; α, λ) = C * x^(−α) * exp(−λ*x), where a, b and c in the code below stand for α, λ and C.
First, I generate some random values to replace my real data:
data <- rlnorm(1000,0.6,1.23)
h <- hist(data,breaks=1000,plot=FALSE)
x <- h$mids
y <- h$density
Then, I use the nls function for the parameter estimation:
nls(y~c*x^(-a)*exp(-b*x),start=list(a=1,b=1,c=1))
But it does not work and always throws one of these two errors:
Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model
Or: singular gradient matrix at initial parameter estimates
Before posting, I read almost all the previous posts and searched Google; there are several possible reasons for these errors:
bad starting values for nls. I tried a lot, but it does not work.
negative values, values less than 1, or values equal to Inf may be generated. I tried cleaning the data, but that does not work either.
What should I do now? Are there better methods for estimating the parameters of a power law with exponential cutoff? I need your help, thank you!
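One possible workaround (not from the original thread, just a sketch): drop the empty histogram bins before taking logs, and obtain starting values by linearising log(y) = log(C) − α*log(x) − λ*x with lm(). With sensible starting values, nls() has a much better chance of converging, though this is not guaranteed:
set.seed(1)
data <- rlnorm(1000, 0.6, 1.23)
h <- hist(data, breaks = 1000, plot = FALSE)
x <- h$mids
y <- h$density

keep <- y > 0                          # empty bins (density 0) break log()
x <- x[keep]; y <- y[keep]

# a linear fit on the log scale supplies starting values for a, b, c
lin <- lm(log(y) ~ log(x) + x)
start <- list(a = -coef(lin)[["log(x)"]],
              b = -coef(lin)[["x"]],
              c = exp(coef(lin)[["(Intercept)"]]))

fit <- nls(y ~ c * x^(-a) * exp(-b * x), start = start)
summary(fit)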

R: How to obtain the fitted values from a distribution fit?

I fit a gamma distribution to an empirical sample using the fitdist function:
library(fitdistrplus)
fit <- fitdist(data = empdistr, distr = "gamma")
I then use the denscomp function to compare the data to the fitted values:
dc <- denscomp(fit)
But I would like to extract, from fit or from dc, the actual fitted values, i.e. the points of the gamma density (with the fitted parameters) that denscomp displays.
Does anybody have an idea of how I can do that?
Thanks in advance!
Use dgamma to evaluate the fitted density at given values:
dgamma(x, coef(fit)[1], coef(fit)[2])
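A self-contained sketch, with simulated data standing in for empdistr (with fitdistrplus the gamma coefficients are named "shape" and "rate"):
library(fitdistrplus)

set.seed(1)
empdistr <- rgamma(200, shape = 2, rate = 0.5)   # stand-in for the real data

fit <- fitdist(empdistr, "gamma")
xs <- seq(min(empdistr), max(empdistr), length.out = 200)
ys <- dgamma(xs, shape = coef(fit)["shape"], rate = coef(fit)["rate"])   # fitted density values

hist(empdistr, freq = FALSE)        # same density scale that denscomp() draws on
lines(xs, ys, col = "red")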
