finding p-value from goodness of fit test in fitdistrplus package in r

finding p-value from goodness of fit test in fitdistrplus package in r - r

I want to fit a distribution to my data. I use fitdistrplus package in r to find the distribution. I can compare the goodness of fit results for different distributions to see which one is more fitted to my data but I don't know how to check the pvalue for goodness of fit test for each of the distributions. The results might show that among gamma, lognormal and exponential, exponential distribution has the lower statistics for anderson darling test but I don't know how to check if pvalue for these tests does not reject the null hypothesis. Is there any built in function in R which gives the pvalues?
Here is a piece of code I used as an example:
d <- sample(100,50)
library(fitdistrplus)
descdist(d)
fitg <- fitdist(d,"gamma")
fitg2 <- fitdist(d,"exp")
gofstat(list(fitg,fitg2))
This code makes 50 random numbers from 0 to 100 and tries to find best fitted model to these data. If descdist(d) shows that gamma and exponential are the two candidates as the best fitted model, fitg and fitg2 finds their related models. the last line compares Ks and anderson darling statistics to show which distribution is most fitted. Distribution with lower value for these tests is the best. However, I dont know how to find p-values for fitg and fitg2 before comparying them. If pvalues show that none of these distributions are not fitted to these data, there is no point to comparing their goodness of fit statistics to my knowledge.
Any help is appreciated.
Thanks

Related

Using GAMLSS, the difference between fitDist() and gamlss()

When using the GAMLSS package in R, there are many different ways to fit a distribution to a set of data. My data is a single vector of values, and I am fitting a distribution over these values.
My question is this: what is the main difference between using fitDist() and gamlss() since they give similar but different answers for parameter values, and different worm plots?
Also, using the function confint() works for gamlss() fitted objects but not for objects fitted with fitDist(). Is there any way to produce confidence intervals for parameters fitted with the fitDist() function? Is there an accuracy difference between the two procedures? Thanks!

m1 <- fitDist()
fits many distributions and chooses the best according to a
generalized Akaike information criterion, GAIC(k), wit penalty k for each
fitted parameter in the distribution, where k is specified by the user,
e.g. k=2 for AIC,
k = log(n) for BIC,
k=4 for a Chi-squared test (rounded from 3.84, the 5% critical value of a Chi-squared distribution with 1 degree of fereedom), which is my preference.
m1$fits
gives the full results from the best to worst distribution according to GAIC(k).

confidence interval of estimates in a fitted hybrid model by spatstat

hybrid Gibbs models are flexible for fitting spatial pattern data, however, I am confused on how to get the confidence interval for the fitted model's estimate. for instance, I fitted a hybrid geyer model including a hardcore and a geyer saturation components, got the estimates:
Mo.hybrid<-Hybrid(H=Hardcore(), G=Geyer(81,1))
my.hybrid<-ppm(my.X~1,Mo.hybrid, correction="bord")
#beta = 1.629279e-06
#Hard core distance: 31.85573
#Fitted G interaction parameter gamma: 10.241487
what I interested is the gamma, which present the aggregation of points. obviously, the data X is a sample, i.e., of cells in a anatomical image. in order to report statistical result, a confidence interval for gamma is needed. however, i do not have replicates for the image data.
can i simlate 10 time of the fitted hybrid model, then refitted them to get confidence interval of the estimate? something like:
mo.Y<-rmhmodel(cif=c("hardcore","geyer"),
par=list(list(beta=1.629279e-06,hc=31.85573),
list(beta=1, gamma=10.241487,r=81,sat=1)), w=my.X)
Y1<-rmh(model=mo.Y, control = list(nrep=1e6,p=1, fixall=TRUE),
start=list(n.start=c(npoint(my.X))))
Y1.fit<-ppm(Y1~1, Mo.hybrid,rbord=0.1)
# simulate and fit Y2,Y3,...Y10 in same way
or:
Y10<-simulate(my.hybrid,nsim=10)
Y1.fit<-ppm(Y10[1]~1, Mo.hybrid,rbord=0.1)
# fit Y2,Y3,...Y10 in same way
certainly, the algorithms is different, the rmh() can control simulated intensity while the simulate() does not.
now the questions are:
is it right to use simualtion to get confidence interval of estimate?
or the fitted model can provide estimate interval that could be extracted?
if simulation is ok, which algorithm is better in my case?

The function confint calculates confidence intervals for the canonical parameters of a statistical model. It is defined in the standard stats package. You can apply it to fitted point process models in spatstat: in your example just type confint(my.hybrid).
You wanted a confidence interval for the non-canonical parameter gamma. The canonical parameter is theta = log(gamma) so if you do exp(confint(my.hybrid) you can read off the confidence interval for gamma.
Confidence intervals and other forms of inference for fitted point process models are discussed in detail in the spatstat book chapters 9, 10 and 13.
The confidence intervals described above are the asymptotic ones (based on the asymptotic variance matrix using the central limit theorem).
If you really wanted to estimate the variance-covariance matrix by simulation, it would be safer and easier to fit the model using method='ho' (which performs the simulation) and then apply confint as before (which would then use the variance of the simulations rather than the asymptotic variance).
rmh.ppm and simulate.ppm are essentially the same algorithm, apart from some book-keeping. The differences observed in your example occur because you passed different arguments. You could have passed the same arguments to either of these functions.

Can I test autocorrelation from the generalized least squares model?

I am trying to use a generalized least square model (gls in R) on my panel data to deal with autocorrelation problem.
I do not want to have any lags for any variables.
I am trying to use Durbin-Watson test (dwtest in R) to check the autocorrelation problem from my generalized least square model (gls).
However, I find that the dwtest is not applicable over gls function while it is applicable to other functions such as lm.
Is there a way to check the autocorrelation problem from my gls model?

Durbin-Watson test is designed to check for presence of autocorrelation in standard least-squares models (such as one fitted by lm). If autocorrelation is detected, one can then capture it explicitly in the model using, for example, generalized least squares (gls in R). My understanding is that Durbin-Watson is not appropriate to then test for "goodness of fit" in the resulting models, as gls residuals may no longer follow the same distribution as residuals from the standard lm model. (Somebody with deeper knowledge of statistics should correct me, if I'm wrong).
With that said, function durbinWatsonTest from the car package will accept arbitrary residuals and return the associated test statistic. You can therefore do something like this:
v <- gls( ... )$residuals
attr(v,"std") <- NULL # get rid of the additional attribute
car::durbinWatsonTest( v )
Note that durbinWatsonTest will compute p-values only for lm models (likely due to the considerations described above), but you can estimate them empirically by permuting your data / residuals.

Q: How to calculate p-values in Cox PH model with maximal test statistic in R?

I am interested in calculating p-values within a Cox PH model based upon the maximal test statistic, to get very robust estimates. Does anyone have experience with this?
I have played around a bit with the R package 'coxphf' that incorporates Firth's penalized likelihood, but it seems to be giving me different coefficients and p-values if I chose firth=FALSE vs. use the standard coxph function in 'survival'.
I do not discount being completely lost on this, so any advice would be useful.
Thanks!

What's the difference between ks test and bootstrap_p for power law fitting?

I want to know the goodness of fit while fitting a power law distribution in R using poweRlaw package.
After estimate_xmin() , I had a p-value 0.04614726. But the bootstrap_p() returns another p-value 0.
So why do these two p-value differ? And how can I judge if it is a power law distribution?
here is the plot when using poweRlaw for fittingpoweRlaw fitting result

You're getting a bit confused. One of the statistics that estimate_xmin returns is the Kolmogorov-Smirnoff statistic (as described in Clauset, Shalizi, Newman (2009)). This statistic is used to estimate the best cut-off value for your model, i.e. xmin. However, this doesn't tell you anything about the model fit.
To assess model suitability is where the bootstrap function comes in.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

finding p-value from goodness of fit test in fitdistrplus package in r - r

Related

Using GAMLSS, the difference between fitDist() and gamlss()

confidence interval of estimates in a fitted hybrid model by spatstat

Can I test autocorrelation from the generalized least squares model?

Q: How to calculate p-values in Cox PH model with maximal test statistic in R?

What's the difference between ks test and bootstrap_p for power law fitting?

Categories

Resources