R code for a Poisson-Gamma mixture distribution

I have already estimated my parameters, namely mu, the variance power, the dispersion, and the shape and scale parameters of the Gamma. I have claims data and I want to fit a compound Poisson-Gamma distribution in R; how do I proceed from here? I have done a little research and found the tweedie package, more precisely the ptweedie.inversion and ptweedie.series functions. Any help and/or guidance will be appreciated. Thanks
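A minimal sketch of one way to proceed with the tweedie package, assuming the estimates are stored in mu, phi (the dispersion) and p (the variance power, with 1 < p < 2 giving a compound Poisson-Gamma); the claims vector below is simulated as a stand-in for real data:

library(tweedie)
mu  <- 1200   # hypothetical estimated mean
phi <- 2.5    # hypothetical estimated dispersion
p   <- 1.5    # hypothetical estimated variance power
claims <- rtweedie(500, xi = p, mu = mu, phi = phi)   # stand-in for real claims data
dens <- dtweedie(claims, xi = p, mu = mu, phi = phi)  # fitted density at each claim
cdf  <- ptweedie(claims, xi = p, mu = mu, phi = phi)  # fitted CDF at each claim
plot(ecdf(claims))                                    # informal check: empirical vs fitted CDF
curve(ptweedie(x, xi = p, mu = mu, phi = phi), add = TRUE, col = "red")

ptweedie.series and ptweedie.inversion can be called directly, but the ptweedie wrapper normally chooses between them for you.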

Related

SE for logistic regression predictions

I have been tasked with calculating the SE for logistic regression point estimates (where all my predictor variables are factors). I typically use ggpredict to estimate my predictions, which provides CIs. However, we are comparing our results to estimates from program MARK, and we find readers have a better grasp of our plots with SEs rather than 95% CIs.
Based on reading the package notes, it appears I can simply calculate (conf.high - predicted)/1.96. Am I correct? Or am I missing something, and that is not the correct way to calculate the SE for the predicted estimates? If I am wrong, any ideas on how I can do this, or do I need to just use CIs?
Thank you very much for your help.
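A hedged sketch of the back-calculation being proposed, using mtcars as stand-in data. One caveat: for a GLM, ggpredict builds the confidence interval on the link scale and back-transforms it, so the interval on the response scale is not symmetric and this recovers only an approximate SE:

library(ggeffects)
mtcars$cyl_f <- factor(mtcars$cyl)              # factor predictor, as in the question
fit  <- glm(am ~ cyl_f, data = mtcars, family = binomial)
pred <- ggpredict(fit, terms = "cyl_f")
(pred$conf.high - pred$predicted) / 1.96        # proposed approximate SE for each level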

Estimating a probability distribution and sampling from it in Julia

I am trying to use Julia to estimate a continuous univariate distribution using N observed data points (stored as an array of Float64 numbers), and then sample from this estimated distribution. I have no prior knowledge restricting attention to some family of distributions.
I was thinking of using the KernelDensity package to estimate the distribution, but I'm not sure how to sample from the resulting output.
Any help/tips would be much appreciated.
Without any restrictions on the estimated distribution, a natural candidate would be the empirical distribution function (see Wikipedia). For this distribution there are very nice theorems about convergence to the actual distribution (see the Dvoretzky–Kiefer–Wolfowitz inequality).
With this choice, sampling is especially simple. If dataset is a vector of existing samples, then dataset[rand(1:length(dataset), sample_size)] is a set of new samples from the empirical distribution. With the Distributions package, it can be more readable, like so:
using Distributions
new_sample = sample(dataset, sample_size)
Finally, kernel density estimation is also good, but it requires choosing parameters (the kernel and its bandwidth), which amounts to a preference for a certain family of distributions. Sampling from a kernel density estimate is surprisingly similar to sampling from the empirical distribution: 1. draw a sample from the empirical distribution; 2. perturb each point with a draw from the kernel.
For example, if the kernel function is a Normal distribution of width w, then the perturbed sample could be calculated as:
new_sample = dataset[rand(1:length(dataset), sample_size)] + w * randn(sample_size)

What's the difference between the KS test and bootstrap_p for power-law fitting?

I want to know the goodness of fit when fitting a power-law distribution in R with the poweRlaw package.
After estimate_xmin(), I had a p-value of 0.04614726. But bootstrap_p() returns another p-value, 0.
So why do these two p-values differ? And how can I judge whether it is a power-law distribution?
Here is the plot produced when using poweRlaw for the fit: [figure: poweRlaw fitting result]
You're getting a bit confused. One of the statistics that estimate_xmin returns is the Kolmogorov-Smirnov statistic (as described in Clauset, Shalizi & Newman (2009)). This statistic is used to estimate the best cut-off value for your model, i.e. xmin. However, it doesn't tell you anything about the model fit.
Assessing model suitability is where the bootstrap_p() function comes in; a sketch follows.
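A hedged sketch of the full workflow, using the moby word-frequency data that ships with poweRlaw as a stand-in for your own data:

library(poweRlaw)
data("moby", package = "poweRlaw")   # example word-frequency counts
m <- displ$new(moby)                 # discrete power-law model
m$setXmin(estimate_xmin(m))          # KS-based estimate of xmin and the exponent
bs <- bootstrap_p(m, no_of_sims = 100, threads = 2)
bs$p                                 # large (say > 0.1): power law plausible; near 0: poor fit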

Finding the p-value from a goodness-of-fit test in the fitdistrplus package in R

I want to fit a distribution to my data. I use the fitdistrplus package in R to find the distribution. I can compare the goodness-of-fit results for different distributions to see which one fits my data better, but I don't know how to check the p-value of the goodness-of-fit test for each distribution. The results might show that, among the gamma, lognormal and exponential, the exponential distribution has the lowest statistic for the Anderson-Darling test, but I don't know how to check whether the p-values for these tests reject the null hypothesis. Is there a built-in function in R which gives the p-values?
Here is a piece of code I used as an example:
d <- sample(100, 50)                 # 50 distinct integers from 1 to 100
library(fitdistrplus)
descdist(d)                          # Cullen-Frey graph suggesting candidate families
fitg <- fitdist(d, "gamma")
fitg2 <- fitdist(d, "exp")
gofstat(list(fitg, fitg2))           # KS, Cramer-von Mises and Anderson-Darling statistics
This code draws 50 random numbers from 1 to 100 and tries to find the best-fitting model for these data. If descdist(d) shows that gamma and exponential are the two candidates for the best-fitting model, fitg and fitg2 fit the corresponding models. The last line compares the KS and Anderson-Darling statistics to show which distribution fits best; the distribution with the lower value on these tests is the better fit. However, I don't know how to find p-values for fitg and fitg2 before comparing them. If the p-values show that neither of these distributions fits the data, there is no point in comparing their goodness-of-fit statistics, to my knowledge.
Any help is appreciated.
Thanks
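One possible route, sketched under assumptions: gofstat() reports a chi-squared p-value for discrete fits, but for continuous fits it returns only the test statistics. You can get an approximate KS p-value from ks.test() against the fitted distribution, with the caveat that the classical p-value is not valid when the parameters were estimated from the same data (a parametric bootstrap is the usual remedy):

set.seed(1)
library(fitdistrplus)
d <- rgamma(50, shape = 2, rate = 0.1)   # positive stand-in data
fitg <- fitdist(d, "gamma")
ks.test(d, "pgamma",
        shape = fitg$estimate["shape"],
        rate = fitg$estimate["rate"])    # approximate only: parameters came from d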

How do I designate a negative binomial error distribution in a GLM using R?

I'm constructing a model using the glm() function in R. Let's say that I know that my data have an error distribution that fits a negative binomial distribution.
When I search the R manual for the various families, family=binomial is offered as an option, but negative binomial is not.
In the same section of the R manual (family), NegBinomial is linked in the "See also" section, but it is presented in the context of binomial coefficients (and I'm not even sure what this is referring to).
So, to summarize, I'm hoping to find syntax that would be analogous to glm(y ~ x, family = negbinomial, data = d, na.action = na.omit).
With an unknown overdispersion parameter, the negative binomial is not part of the exponential family, so it can't be fitted as a standard GLM (or by glm()). There is a glm.nb() function in the MASS package that can help you ...
library(MASS)
glm.nb(y~x, ...)
If you happen to have a known/fixed overdispersion parameter (e.g. if you want to fit a geometric distribution model, which has theta=1), you can use the negative.binomial family from MASS:
glm(y ~ x, family = negative.binomial(theta = 1), ...)
It might not hurt if MASS::glm.nb were in the "See Also" section of ?glm ...
I don't believe theta is the overdispersion parameter. Theta is a shape parameter for the distribution, and the overdispersion is the same as k, as discussed in The R Book (Crawley 2007). The output from a glm.nb() model implies that theta does not equal the dispersion parameter:
Dispersion parameter for Negative Binomial(0.493) family taken to be 0.4623841
The dispersion parameter is a different value than theta.
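For concreteness, a small sketch contrasting the two approaches, using the quine dataset that ships with MASS (the variables are from that dataset, not the question):

library(MASS)
fit1 <- glm.nb(Days ~ Sex + Age, data = quine)       # theta estimated by ML alongside the coefficients
summary(fit1)                                        # reports the estimated theta and its SE
fit2 <- glm(Days ~ Sex + Age, data = quine,
            family = negative.binomial(theta = 1))   # theta fixed in advance: geometric special case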
