How to transform data after fitting a distribution with gamlss? - r

I have a data set where observations come from highly distinct groups. Each group may have a wildly different distribution, so I am trying to find the best distribution using fitdist from fitdistrplus, then use gamlssML from the gamlss package to find the best parameters.
My issue is with transforming the data after this step. For some of the distributions, like the Box-Cox t, I can find the equation for normalizing the data using the BCT coefficients, but for many of these distributions I cannot.
Does gamlss have a function that normalizes the data after fitting? Their documentation only provides the transformations for a small number of distributions https://www.gamlss.com/wp-content/uploads/2018/01/DistributionsForModellingLocationScaleandShape.pdf
Thanks a lot

The normalised data values (for any distribution) are exactly the residuals from a gamlss fit, i.e. the normalised quantile residuals. For a fitted model
m1 <- gamlss()
they can be accessed with either
residuals(m1)
or
m1$residuals
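
For a continuous distribution, this normalisation amounts to passing each observation through the fitted CDF and then through the standard normal quantile function. The following is a minimal base-R sketch of that idea, not the gamlss implementation: it assumes a gamma model fitted by maximum likelihood with optim, and the simulated data and the helper name nll are illustrative.

```r
set.seed(1)
y <- rgamma(500, shape = 2, rate = 0.5)   # simulated skewed data (illustrative)

# Negative log-likelihood of a gamma model; parameters are on the log scale
# so that the optimiser cannot step to non-positive values.
nll <- function(p) -sum(dgamma(y, shape = exp(p[1]), rate = exp(p[2]), log = TRUE))
fit <- optim(c(0, 0), nll)
shape_hat <- exp(fit$par[1])
rate_hat  <- exp(fit$par[2])

# Normalised quantile residuals: fitted CDF first, then the standard
# normal quantile function.
z <- qnorm(pgamma(y, shape = shape_hat, rate = rate_hat))

# Close to 0 and 1 when the fitted distribution is adequate.
c(mean = mean(z), sd = sd(z))
```

With a gamlss fit the same calculation would use the p function of the fitted family (for example pBCT) evaluated at the fitted parameters, which is what the stored residuals already contain for continuous responses.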

Related

Weighted mixture model of two distributions where weight depends on the value of the distribution?

I'm trying to replicate the precipitation mixture model from this paper: http://dx.doi.org/10.1029/2006WR005308
f(r) is the gamma PDF, g(r) is the generalized Pareto PDF, and w(r) is the weighting function, which depends on the value r being considered. I've looked at R packages like distr and mixtools that handle mixture models, but I only see examples where w is a constant, and I haven't found any implementations where the mixture is a function of the value. I'm struggling to create valid custom functions to represent h(r) so if someone could point me to a package that would be super helpful.

Using GAMLSS, the difference between fitDist() and gamlss()

When using the GAMLSS package in R, there are many different ways to fit a distribution to a set of data. My data is a single vector of values, and I am fitting a distribution over these values.
My question is this: what is the main difference between using fitDist() and gamlss() since they give similar but different answers for parameter values, and different worm plots?
Also, using the function confint() works for gamlss() fitted objects but not for objects fitted with fitDist(). Is there any way to produce confidence intervals for parameters fitted with the fitDist() function? Is there an accuracy difference between the two procedures? Thanks!
m1 <- fitDist()
fits many distributions and chooses the best according to a generalized Akaike information criterion, GAIC(k), with penalty k for each fitted parameter in the distribution, where k is specified by the user, e.g.
k = 2 for AIC,
k = log(n) for BIC,
k = 4 for a Chi-squared test (rounded from 3.84, the 5% critical value of a Chi-squared distribution with 1 degree of freedom), which is my preference.
m1$fits
gives the full results from the best to worst distribution according to GAIC(k).
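
To make the role of the penalty k concrete, here is a minimal base-R sketch that computes GAIC(k) = -2 * logLik + k * (number of fitted parameters) by hand for two candidate distributions. It does not call fitDist; the simulated data, the two candidates, and the helper name gaic are illustrative assumptions.

```r
set.seed(2)
y <- rlnorm(300, meanlog = 1, sdlog = 0.6)   # simulated positive data (illustrative)
n <- length(y)

# Closed-form maximum-likelihood fits of two candidate distributions.
ll_exp   <- sum(dexp(y, rate = 1 / mean(y), log = TRUE))      # 1 fitted parameter
sd_ml    <- sqrt(mean((log(y) - mean(log(y)))^2))             # ML estimate (divides by n)
ll_lnorm <- sum(dlnorm(y, mean(log(y)), sd_ml, log = TRUE))   # 2 fitted parameters

# Generalized AIC: deviance plus k per fitted parameter.
gaic <- function(loglik, df, k) -2 * loglik + k * df

# Penalties for AIC, BIC and the 5% Chi-squared critical value (about 3.84).
penalties <- c(AIC = 2, BIC = log(n), chisq = qchisq(0.95, df = 1))

# Rows are candidate distributions, columns are penalties; smaller is better.
tab <- sapply(penalties, function(k)
  c(exponential = gaic(ll_exp, 1, k), lognormal = gaic(ll_lnorm, 2, k)))
tab
```

A larger k favours distributions with fewer parameters, so the ranking of candidates can change with the choice of penalty.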

Extract sample variance from svykm (survey package by Lumley) for complex survey analysis

In order to compare two survival curves at a fixed point in time and perform basically a two sample test, I need to extract the sample variance of the estimate at a given point in time.
For an object created with the svykm function from Thomas Lumley's survey package in R, this should be accessible in the varlog list. Do the entries in this list constitute the transformed variances on the log scale or the untransformed variances?
I have read the documentation provided for the survey package, but did not fully come to a conclusion. I note that confidence intervals are computed on the log(survival) scale, following the default in survival package and their bounds are given as exp(log(x$surv)+1.96*sqrt(x$varlog)) and exp(log(x$surv)-1.96*sqrt(x$varlog)) in the R package documentation.
They are variances on the log scale.
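
Because varlog is the variance of log(survival), it can be used directly on the log scale or converted to the survival scale with the delta method, Var(S) ≈ S^2 * Var(log S). A minimal base-R sketch follows; the numbers standing in for the surv and varlog entries at one time point are hypothetical, and the two-sample z statistic assumes the two groups' estimates are independent, which may not hold under a complex survey design.

```r
# Hypothetical estimates at a fixed time for two groups
# (stand-ins for the surv and varlog entries of two svykm curves).
surv   <- c(g1 = 0.80, g2 = 0.70)
varlog <- c(g1 = 0.0009, g2 = 0.0016)

# 95% confidence limits built on the log(survival) scale.
lower <- exp(log(surv) - 1.96 * sqrt(varlog))
upper <- exp(log(surv) + 1.96 * sqrt(varlog))

# Delta-method variance on the untransformed survival scale.
var_surv <- surv^2 * varlog

# Two-sample comparison on the log scale, treating the groups as independent.
z <- unname((log(surv["g1"]) - log(surv["g2"])) / sqrt(sum(varlog)))
p_value <- 2 * pnorm(-abs(z))
```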

R: functions to determine distance of multivariate data to normal distribution

I have multivariate data and I want to compute how far the complete data set is from a multivariate normal distribution, using R. I have seen functions such as the Shapiro-Wilk test, but from those I can only tell that the data do not follow a normal distribution when the p-value is below 0.05. I want to know how far the data are from the normal distribution. Can anyone please point me to functions I could use?
Use the mqqnorm function from the RVAideMemoire package. It shows, among other things, Mahalanobis distances. From the function example:
library(RVAideMemoire)
x <- 1:30+rnorm(30)
y <- 1:30+rnorm(30,1,3)
mqqnorm(cbind(x,y))
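
The squared Mahalanobis distances behind this kind of plot can also be computed in base R with mahalanobis() and compared with Chi-squared quantiles, which gives a numeric sense of the departure from multivariate normality. A minimal sketch, assuming the same simulated x and y pattern; summarising the gap by a mean absolute deviation is an illustrative choice, not something mqqnorm reports.

```r
set.seed(3)
x <- 1:30 + rnorm(30)
y <- 1:30 + rnorm(30, 1, 3)
X <- cbind(x, y)

# Squared Mahalanobis distance of each row from the sample centroid.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Under multivariate normality, d2 is approximately Chi-squared with
# ncol(X) degrees of freedom; compare sorted distances with those quantiles.
q <- qchisq(ppoints(nrow(X)), df = ncol(X))
gap <- mean(abs(sort(d2) - q))   # illustrative summary of the departure

plot(q, sort(d2), xlab = "Chi-squared quantiles",
     ylab = "Squared Mahalanobis distance")
abline(0, 1)   # points near this line suggest approximate normality
```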

hurdle models using continuous data and covariates

I was wondering if I get some advice about fitting hurdle models using continuous data and covariates.
I have some continuous data that are generally well fit using a right-skewed distribution such as a Pareto, Gamma, or Weibull distribution. However, there are several zeros in my data which are important to my analysis. In addition, I have some categorical (two-level) covariates and would like to model the parameters of a distribution as a function of these covariates in order to formally evaluate their importance (e.g., using AIC).
I have seen examples of hurdle models fit using continuous data but have not yet found any examples of how to incorporate covariates and a model-selection framework. Does anyone have any suggestions as to how to proceed or know of any R packages that allow this procedure? I have included some code below to reproduce the type of data I am working with. The non-zero data are generated via a generalized Pareto distribution from the package texmex. The parameters were estimated directly from my non-zero data. I have also included the code to plot the data in a histogram to see their distribution.
library("texmex")
set.seed(101)
zeros <- rep(0,8)
non_zeros <- rgpd(17, sigma=exp(-10.4856), xi=0.1030, u = 0)
all.data <- c(zeros,non_zeros)
hist(non_zeros,breaks=50,xlim=c(0,0.00015),ylim=c(0,9),main="",xlab="",
col="gray")
hist(zeros,add=TRUE,col="black",breaks=100,xlim=c(0,0.00015),ylim=c(0,9))
legend("topright",legend=c("zeros"),col="black",lwd=8)

Resources