How to perform survival analysis using a mixture distribution - r

I want to use a mixture of Gamma distributions as a parametric model for survival analysis on censored data using R. The "flexsurv" package offers several distributions, but I couldn't find a Gamma mixture. Its documentation states:
"Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard."
https://cran.r-project.org/web/packages/flexsurv/flexsurv.pdf
Is there a way to directly define a Gamma mixture distribution (with a pre-specified number of components) in parametric form so that I can use this package for maximum likelihood estimation?
data <- Surv(ages, censored)
fit_gammamixture <- flexsurvreg(data~1, dist=???)
I've found this paper on survival analysis with a mixture of Gamma distributions, but the algorithm presented there is hard to understand and implement:
Modeling Censored Lifetime Data Using a Mixture of Gammas Baseline
https://projecteuclid.org/download/pdf_1/euclid.ba/1340371053
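Whatever tool does the fitting, the objective it maximises is the censored mixture log-likelihood: each observed event contributes log f(t) and each right-censored time contributes log S(t). Below is a minimal, language-agnostic sketch of that likelihood in stdlib Python, using a two-component *exponential* mixture for brevity (the exponential density and survival function have closed forms; the gamma version swaps in the gamma pdf and CDF, i.e. R's dgamma/pgamma). All names and parameter values here are illustrative, not from flexsurv:

```python
import math

def mix_pdf(t, p, l1, l2):
    # density of a two-component exponential mixture; for the
    # gamma mixture in the question, replace the exponential
    # density with dgamma and the survival below with 1 - pgamma
    return p * l1 * math.exp(-l1 * t) + (1 - p) * l2 * math.exp(-l2 * t)

def mix_surv(t, p, l1, l2):
    # survival function S(t) = 1 - F(t) of the mixture
    return p * math.exp(-l1 * t) + (1 - p) * math.exp(-l2 * t)

def censored_loglik(times, events, p, l1, l2):
    # event = 1: observed failure -> contributes log f(t)
    # event = 0: right-censored   -> contributes log S(t)
    ll = 0.0
    for t, d in zip(times, events):
        ll += math.log(mix_pdf(t, p, l1, l2) if d else mix_surv(t, p, l1, l2))
    return ll
```

Maximising censored_loglik over (p, l1, l2), e.g. with optim in R, is essentially what flexsurvreg does internally once you hand it a density for a custom distribution.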

Related

Weighted mixture model of two distributions where weight depends on the value of the distribution?

I'm trying to replicate the precipitation mixture model from this paper: http://dx.doi.org/10.1029/2006WR005308
f(r) is the gamma PDF, g(r) is the generalized Pareto PDF, and w(r) is the weighting function, which depends on the value r being considered. I've looked at R packages like distr and mixtools that handle mixture models, but I only see examples where w is a constant; I haven't found any implementation where the mixture weight is a function of the value. I'm struggling to create valid custom functions to represent h(r), so if someone could point me to a package, that would be super helpful.
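For intuition, the value-dependent mixture density itself is straightforward to write down: choose a weight function w(r) (a logistic form is assumed below; it is not taken from the paper), combine the gamma and generalized Pareto densities, and normalise numerically, since a weight that varies with r breaks the usual "weights sum to one" normalisation. A stdlib-Python sketch with purely illustrative parameter values:

```python
import math

def gamma_pdf(r, k, theta):
    # gamma density f(r) with shape k and scale theta
    return r ** (k - 1) * math.exp(-r / theta) / (math.gamma(k) * theta ** k)

def gpd_pdf(r, sigma, xi):
    # generalized Pareto density g(r) with scale sigma, shape xi > 0
    return (1.0 / sigma) * (1 + xi * r / sigma) ** (-1.0 / xi - 1)

def weight(r, m, tau):
    # assumed logistic weight w(r): near 0 for small r (gamma body
    # dominates), near 1 for large r (Pareto tail dominates)
    return 1.0 / (1.0 + math.exp(-(r - m) / tau))

def h_unnorm(r, k, theta, sigma, xi, m, tau):
    # unnormalised dynamic mixture (1 - w(r)) f(r) + w(r) g(r)
    w = weight(r, m, tau)
    return (1 - w) * gamma_pdf(r, k, theta) + w * gpd_pdf(r, sigma, xi)

def normalizer(k, theta, sigma, xi, m, tau, upper=200.0, n=20000):
    # trapezoidal integral of the unnormalised density over (0, upper)
    step = upper / n
    total = 0.0
    for i in range(n + 1):
        r = max(i * step, 1e-9)  # avoid r = 0 when k < 1
        val = h_unnorm(r, k, theta, sigma, xi, m, tau)
        total += val if 0 < i < n else val / 2
    return total * step

def h(r, k, theta, sigma, xi, m, tau, Z):
    # the proper density: unnormalised mixture divided by its integral Z
    return h_unnorm(r, k, theta, sigma, xi, m, tau) / Z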

What is the difference between psm and cph in RMS package in R

I have survival data, but I'm not sure what the difference is between psm and cph, or how to choose between them. Different models produce different nomograms, so which model should I use?
library(rms)
f2 <- psm(Surv(follow_time_5y, DEATH_5y) ~ age + ID_SEX + MH_CCI_total_score,
          data = sci_20190505, dist = 'lognormal')
f3 <- cph(Surv(follow_time_5y, DEATH_5y) ~ age + ID_SEX + MH_CCI_total_score,
          data = sci_20190505, x = TRUE, y = TRUE, surv = TRUE, time.inc = 1825)
It depends on what you want.
PSM:
psm is a modification of Therneau's survreg function for fitting the accelerated failure time family of parametric survival models. psm uses the rms class for automatic anova, fastbw, calibrate, validate, and other functions. Hazard.psm, Survival.psm, Quantile.psm, and Mean.psm create S functions that evaluate the hazard, survival, quantile, and mean (expected value) functions analytically, as functions of time or probabilities and the linear predictor values.
CPH:
Modification of Therneau coxph function to fit the Cox model and its
extension, the Andersen-Gill model. The latter allows for interval
time-dependent covariables, time-dependent strata, and repeated
events. The Survival method for an object created by cph returns
an S function for computing estimates of the survival function.
The Quantile method for cph returns an S function for computing quantiles
of survival time (median, by default).
So, to answer your question "What's the difference?": the difference is in the underlying model.
psm (parametric survival model) fits a fully parametric survival model, i.e. a named distribution (such as Weibull or log-normal) whose parameters are estimated from the data.
cph (Cox Proportional Hazards Model and Extensions) uses the Cox model (and the Andersen-Gill extension), which is semiparametric and models the hazard function directly, leaving the baseline hazard unspecified.
See the Wikipedia article on the proportional hazards model for more background.

Estimating a probability distribution and sampling from it in Julia

I am trying to use Julia to estimate a continuous univariate distribution using N observed data points (stored as an array of Float64 numbers), and then sample from this estimated distribution. I have no prior knowledge restricting attention to some family of distributions.
I was thinking of using the KernelDensity package to estimate the distribution, but I'm not sure how to sample from the resulting output.
Any help/tips would be much appreciated.
Without any restrictions on the estimated distribution, a natural candidate would be the empirical distribution function (see Wikipedia). For this distribution there are very nice theorems about convergence to the actual distribution (see the Dvoretzky–Kiefer–Wolfowitz inequality).
With this choice, sampling is especially simple. If dataset is an array of the observed samples, then dataset[rand(1:length(dataset), sample_size)] is a set of new samples from the empirical distribution. With the StatsBase package (which is where sample lives in current Julia; older versions of Distributions re-exported it), it could be more readable, like so:
using StatsBase
new_sample = sample(dataset, sample_size)
Finally, kernel density estimation is also good, but it requires choosing a kernel and its bandwidth, which expresses a preference for a certain family of distributions. Sampling from a kernel density estimate is surprisingly similar to sampling from the empirical distribution: 1. draw a sample from the empirical distribution; 2. perturb each point with a draw from the kernel function.
For example, if the kernel function is a Normal distribution of width w, then the perturbed sample could be calculated as:
new_sample = dataset[rand(1:length(dataset), sample_size)] + w*randn(sample_size)
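The same two-step recipe ports directly to other languages. A stdlib-Python sketch (the function names are mine) of empirical resampling and Gaussian-kernel sampling with width w:

```python
import random

def sample_empirical(dataset, size):
    # draw with replacement from the observed points,
    # i.e. sample from the empirical distribution function
    return [random.choice(dataset) for _ in range(size)]

def sample_kde(dataset, size, w):
    # two-step kernel sampling with a Gaussian kernel of width w:
    # pick an observed point, then perturb it with N(0, w^2) noise
    return [random.choice(dataset) + random.gauss(0.0, w) for _ in range(size)]
```
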

What's the difference between ks test and bootstrap_p for power law fitting?

I want to know the goodness of fit while fitting a power law distribution in R using poweRlaw package.
After estimate_xmin(), I had a p-value of 0.04614726, but bootstrap_p() returns a different p-value, 0.
So why do these two p-value differ? And how can I judge if it is a power law distribution?
Here is the plot produced when fitting with poweRlaw: [image: poweRlaw fitting result]
You're getting a bit confused. One of the statistics that estimate_xmin returns is the Kolmogorov-Smirnov statistic (as described in Clauset, Shalizi & Newman (2009)). This statistic is used to estimate the best cut-off value for your model, i.e. xmin; it is not a p-value and doesn't tell you anything about the model fit.
Assessing model suitability is where the bootstrap_p function comes in: it uses a parametric bootstrap to produce a p-value for the hypothesis that your data were drawn from a power law. A small p-value, like your 0, suggests the power-law model is not plausible for these data.
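To make the distinction concrete, here is the parametric-bootstrap recipe behind this kind of p-value, sketched in stdlib Python for a simple one-parameter exponential model rather than a power law (estimating xmin is omitted to keep the sketch short; names and defaults are mine, not poweRlaw's). The p-value is the fraction of simulated datasets whose KS statistic is at least as large as the observed one:

```python
import math
import random

def ks_stat(data, cdf):
    # Kolmogorov-Smirnov distance between the empirical CDF of
    # `data` and a model CDF
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        F = cdf(x)
        d = max(d, abs((i + 1) / n - F), abs(i / n - F))
    return d

def bootstrap_p(data, n_boot=200, rng=random):
    # fit an exponential by ML (rate = 1/mean), then repeatedly:
    # simulate a same-sized dataset from the fitted model, refit it,
    # and recompute the KS statistic; the p-value is the fraction of
    # simulated statistics at least as large as the observed one
    rate = 1.0 / (sum(data) / len(data))
    d_obs = ks_stat(data, lambda x: 1 - math.exp(-rate * x))
    hits = 0
    for _ in range(n_boot):
        sim = [rng.expovariate(rate) for _ in data]
        r = 1.0 / (sum(sim) / len(sim))
        if ks_stat(sim, lambda x, r=r: 1 - math.exp(-r * x)) >= d_obs:
            hits += 1
    return hits / n_boot
```

The KS statistic alone only ranks candidate fits; the bootstrap turns it into an actual goodness-of-fit test.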

finding p-value from goodness of fit test in fitdistrplus package in r

I want to fit a distribution to my data using the fitdistrplus package in R. I can compare goodness-of-fit results for different distributions to see which one fits best, but I don't know how to obtain a p-value for the goodness-of-fit test of each distribution. For example, the results might show that among gamma, lognormal, and exponential, the exponential distribution has the lowest Anderson-Darling statistic, but I don't know how to check whether the p-value for each test rejects the null hypothesis. Is there a built-in function in R that gives the p-values?
Here is a piece of code I used as an example:
d <- sample(100,50)
library(fitdistrplus)
descdist(d)
fitg <- fitdist(d,"gamma")
fitg2 <- fitdist(d,"exp")
gofstat(list(fitg,fitg2))
This code draws 50 random numbers from 1 to 100 and tries to find the best-fitting model for them. If descdist(d) suggests that gamma and exponential are the two best candidates, fitg and fitg2 fit the corresponding models, and the last line compares the KS and Anderson-Darling statistics to show which distribution fits best (lower values are better). However, I don't know how to find p-values for fitg and fitg2 before comparing them. If the p-values showed that neither distribution fits the data, there would be no point in comparing their goodness-of-fit statistics.
Any help is appreciated.
Thanks
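fitdistrplus's gofstat reports the Anderson-Darling statistic but not a p-value, and the standard route to one is a parametric bootstrap of the statistic. A stdlib-Python sketch for the exponential candidate (function names are mine; in R you would more likely reach for a package such as goftest rather than hand-roll this):

```python
import math
import random

def ad_stat(data, cdf):
    # Anderson-Darling statistic A^2 comparing data to a model CDF;
    # assumes cdf(x) lies strictly inside (0, 1) for every data point
    xs = sorted(data)
    n = len(xs)
    s = 0.0
    for i in range(n):
        Fi = cdf(xs[i])
        Fj = cdf(xs[n - 1 - i])
        s += (2 * i + 1) * (math.log(Fi) + math.log(1 - Fj))
    return -n - s / n

def ad_pvalue_exp(data, n_boot=200, rng=random):
    # parametric bootstrap p-value for an ML-fitted exponential:
    # simulate from the fit, refit, recompute A^2, and count how
    # often the simulated statistic reaches the observed one
    rate = len(data) / sum(data)
    a_obs = ad_stat(data, lambda x: 1 - math.exp(-rate * x))
    hits = 0
    for _ in range(n_boot):
        sim = [rng.expovariate(rate) for _ in data]
        r = len(sim) / sum(sim)
        if ad_stat(sim, lambda x, r=r: 1 - math.exp(-r * x)) >= a_obs:
            hits += 1
    return hits / n_boot
```

A large p-value means the data are consistent with the fitted family, which is the missing piece before comparing statistics across candidate distributions.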
