MLE of Gamma Distribution from mgcv::GAM and fitdistrplus - r

I've fitted a GAM to some data. However, I'm having trouble understanding how the information about the returned distribution from mgcv::GAM relates to that fitted by fitdistrplus.
From the output of summary(GAMObject), I deduce that the (Dispersion?) 'Scale est': 0.0020408 is approximately the inverse of Alpha.
>GAMObject$family
Family: Gamma
Link function: identity
>MASS::gamma.shape(GAMObject)
Alpha=486.660679
SE = 3.060256
>fitdistrplus::fitdist(Data,"gamma","mle")
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
estimate Std. Error
shape 13.823388 0.08592579
rate 1.796375 0.01137114
>plot(fit.gamma)
Here's the output from the GAM.
>mgcv::gam.check(GAMObject)
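A minimal sketch of why the two shape estimates can differ so much, assuming the GAM models a mean that varies with covariates. For a Gamma family the dispersion equals 1/shape of the conditional distribution (the response given the covariates), whereas fitting a gamma distribution to the raw data describes the marginal distribution, which also absorbs the variation in the mean. The simulated data, the use of glm() in place of mgcv::gam(), and the method-of-moments marginal shape (not the MLE that fitdist computes) are all assumptions made to keep the example self-contained.

```r
set.seed(1)

# Simulated data: Gamma response whose mean depends on x (identity link)
n <- 2000
x <- runif(n, 0, 4)
mu <- 2 + 3 * x
true_shape <- 50
y <- rgamma(n, shape = true_shape, rate = true_shape / mu)

# Gamma regression; glm() stands in for mgcv::gam() here
fit <- glm(y ~ x, family = Gamma(link = "identity"))

# Dispersion ("Scale est." in mgcv output) is roughly 1 / conditional shape
dispersion <- summary(fit)$dispersion
conditional_shape <- 1 / dispersion

# Method-of-moments shape of the marginal distribution of y, ignoring x
marginal_shape <- mean(y)^2 / var(y)

c(conditional = conditional_shape, marginal = marginal_shape)
```

In this setup the conditional shape comes out near the simulated value, while the marginal shape is far smaller because the spread of the mean across x is counted as extra variance.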

Related

error flexible calibration curve with val.prob.ci.2 in LASSO logistic regression model (internal calibration) in R

I want to calculate a flexible calibration curve after developing a logistic regression model with the cv.glmnet function using the LASSO in R.
Here's part of my code:
install.packages("glmnet")
install.packages("CalibrationCurves")
library(glmnet) # needed for cv.glmnet() below
#building model with 4 predictors, binomial outcome, 10 fold cross-validation for identifying lambda.min
cv.fit <-cv.glmnet(x=data.matrix(data[,2:5]),y=data$outcome, alpha=1,standardize=TRUE,intercept=TRUE,type.measure="deviance",nfolds=10,weights=WeightsTest2)
#calculating predicted probabilities of model in original dataset
pred_original_logodds <- c(predict(cv.fit, newx=data.matrix(data[,2:5]), s="lambda.min",type="response"))
#calculating probabilities
predict_original <- exp(pred_original_logodds)/(1+exp(pred_original_logodds))
#Fit calibration plot using val.prob.ci.2 function
CalibrationCurves::val.prob.ci.2(p = predict_original, y = data$outcome)
I get the following warning message: Warning: collapsing to unique 'x' values.
#Output
A 95% confidence interval is given for the calibration intercept, calibration slope and c-statistic.
Dxy           C (ROC)       R2            D
0.5737520     0.7868760     0.2907194     0.2128636
D:Chi-sq      D:p           U             U:Chi-sq
127.6538494   0.0000000     0.5822965     348.4664350
U:p           Q             Brier         Intercept
0.0000000     -0.3694329    0.2689491     -1.4232385
Slope         Emax          Brier scaled  Eavg
6.7630170     0.4582670     -0.4947364    0.3431554
ECI
13.4254419
The flexible calibration curve is not properly fitted, I think due to the "unique x values". What does this warning message mean?
Regards,
Max
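One thing worth checking in the code above, offered as a hedged sketch rather than a diagnosis: for a binomial model, predict(..., type = "response") already returns probabilities, so applying exp(x)/(1+exp(x)) to that result transforms twice and squeezes every prediction into roughly (0.5, 0.73). The example uses base R glm() on simulated data as a stand-in for cv.glmnet, and it assumes the model is fitted with family = "binomial" (the call shown in the question does not pass a family argument).

```r
set.seed(2)

# Simulated binary outcome with one predictor
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))
fit <- glm(y ~ x, family = binomial)

# Probabilities directly, and via the linear predictor (log-odds)
p_response <- predict(fit, type = "response")
p_from_link <- plogis(predict(fit, type = "link"))

# Applying the inverse logit to values that are already probabilities
p_double <- plogis(p_response)

range(p_response)
range(p_double)   # confined between plogis(0) = 0.5 and plogis(1)
```

Predictions compressed into such a narrow band could plausibly produce a very steep calibration slope and many tied values, so it may be worth requesting type = "link" if the logistic transform is applied by hand.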

How to use the gamma distribution equation

I am using R to fit a GLM with a Gamma distribution and inverse link. I would like to use the fitted model equation to solve for values of my predictors, given a known response value. I know that the equation of a Gamma GLM with inverse link is 1/μ = b0 + b1x1i. Can you confirm that I should substitute the mean of my response value for μ?
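A minimal sketch of the inversion, assuming a single predictor. In a GLM, μ is the expected response at given predictor values, so solving 1/μ = b0 + b1 x1 for x1 gives the predictor value at which the model's mean response equals a chosen target. The simulated data and the names mu_target and x1_star are hypothetical.

```r
set.seed(3)

# Simulated data from a Gamma model with inverse link: 1/mu = 0.5 + 0.2 * x1
n <- 400
x1 <- runif(n, 1, 5)
mu <- 1 / (0.5 + 0.2 * x1)
y <- rgamma(n, shape = 20, rate = 20 / mu)

fit <- glm(y ~ x1, family = Gamma(link = "inverse"))
b <- coef(fit)

# Solve 1/mu = b0 + b1 * x1 for x1 at a target mean response
mu_target <- 0.8
x1_star <- (1 / mu_target - b[["(Intercept)"]]) / b[["x1"]]

# Check: the model's predicted mean at x1_star reproduces the target
mu_check <- predict(fit, newdata = data.frame(x1 = x1_star), type = "response")
c(x1_star = x1_star, mu_check = unname(mu_check))
```

With more than one predictor the equation has infinitely many solutions, so the other predictors would need to be fixed at chosen values first.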

How do I specify the dispersion parameter when computing the confidence interval for a GLM?

I have a model of exponential decay in the form Y = exp{a + bX + cW}. In R, I represent this as a generalized linear model (GLM) using a gamma random component with log link function.
fitted <- glm(Y ~ X + W, family=Gamma(link='log'))
I know from this post that for the standard errors to really represent an exponential rather than gamma random component, I need to specify the dispersion parameter as being 1 when I call summary.
summary(fitted, dispersion=1)
summary(fitted) # not the same!
Now, I want to find the 95% confidence intervals for my estimates of a, b, c. However, there seems to be no way to pass the dispersion parameter to confint, even though I know it should affect the confidence interval (because it affects the standard error).
confint(fitted)
confint(fitted, dispersion=1) # same as the last confint :(
So, in order to get the confidence intervals corresponding to an exponential rather than gamma random component, how do I specify the dispersion parameter when computing the confidence interval for a GLM?
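One possible workaround, sketched under assumptions: build Wald intervals by hand from the standard errors that summary() reports when the dispersion is fixed at 1. These are normal-approximation intervals, not the profile-likelihood intervals that confint() computes for a glm, so the two will generally not match exactly. The simulated data and the object name fit are hypothetical.

```r
set.seed(4)

# Simulated exponential response with log-linear mean
n <- 300
X <- runif(n)
W <- runif(n)
Y <- rexp(n, rate = 1 / exp(1 - 2 * X + 0.5 * W))

fit <- glm(Y ~ X + W, family = Gamma(link = "log"))

# Coefficient table with the dispersion fixed at 1 (exponential case)
s1 <- summary(fit, dispersion = 1)
est <- s1$coefficients[, "Estimate"]
se <- s1$coefficients[, "Std. Error"]

# 95% Wald intervals: estimate +/- z * standard error
z <- qnorm(0.975)
wald_ci <- cbind(lower = est - z * se, upper = est + z * se)
wald_ci
```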

auto.arima produces non-gaussian residual

I'm using R's auto.arima function, but it does not always seem to produce Gaussian residuals. I cannot find any documentation saying whether it bootstraps the prediction error when the errors are not Gaussian, or what else it does in that case.
Estimation does not require Gaussian errors, even when a Gaussian likelihood is being used. A Gaussian likelihood is almost the same as least squares and will give consistent estimates for any error distribution with finite variance.
The only time that the distribution of residuals really matters is when producing prediction intervals. If the residuals are not Gaussian, the default prediction intervals will not necessarily have the correct coverage. But then you can set bootstrap=TRUE when calling forecast() and get bootstrapped prediction intervals, which are based on the empirical distribution of the residuals.
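To illustrate the idea behind residual-based bootstrapped prediction intervals, here is a hand-rolled sketch for an AR(1) model using only base R. It is not the forecast package's implementation; the simulated series, the fixed AR(1) order, and the names paths and pi95 are assumptions chosen for the example.

```r
set.seed(5)

# Simulated AR(1) series with skewed (non-Gaussian), mean-zero innovations
e <- rexp(300) - 1
y <- as.numeric(stats::filter(e, 0.6, method = "recursive")) + 10

fit <- arima(y, order = c(1, 0, 0))
phi <- coef(fit)[["ar1"]]
m <- coef(fit)[["intercept"]]   # arima() reports the series mean here
res <- as.numeric(residuals(fit))

# Simulate future paths by resampling the fitted residuals
h <- 10
B <- 2000
paths <- matrix(NA_real_, nrow = B, ncol = h)
for (b in seq_len(B)) {
  prev <- y[length(y)]
  for (k in seq_len(h)) {
    prev <- m + phi * (prev - m) + sample(res, 1)
    paths[b, k] <- prev
  }
}

# Empirical 95% prediction interval at each horizon
pi95 <- apply(paths, 2, quantile, probs = c(0.025, 0.975))
pi95
```

Because the intervals come from quantiles of simulated paths, they can be asymmetric around the point forecast when the residuals are skewed.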

Estimating mean time to failure with survreg/flexsurvreg with standard error

I am trying to estimate the mean time to failure for a Weibull distribution fitted to some survival data with flexsurvreg from the flexsurv package. I need to be able to estimate the standard error for use in a simulation model.
Using flexsurvreg with the lung data as an example:
require(flexsurv)
lungS <- Surv(lung$time,lung$status)
lungfit <- flexsurvreg(lungS~1,dist="weibull")
lungfit
Call:
flexsurvreg(formula = lungS ~ 1, dist = "weibull")
Maximum likelihood estimates:
       est     L95%    U95%
shape    1.32    1.14    1.52
scale  418.00  372.00  469.00
N = 228, Events: 165, Censored: 63
Total time at risk: 69593
Log-likelihood = -1153.851, df = 2
AIC = 2311.702
Now, calculating the mean is just a case of plugging in the estimated parameter values into the standard formula, but is there an easy way of getting out the standard error of this estimate? Can survreg do this?
In flexsurv version 0.2, if x is the fitted model object, then x$cov is the covariance matrix of the parameter estimates, with positive parameters on the log scale. You could then use the asymptotic normal property of maximum likelihood estimators. Simulate a large number of multivariate normal vectors, with the estimates as means, and this covariance matrix (using e.g. rmvnorm from the mvtnorm package). This gives you replicates of the parameter estimates under sampling uncertainty. Calculate the corresponding mean survival for each replicate, then take the SD or quantiles of the resulting sample to get the standard error or a confidence interval.
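The simulation approach described above can be sketched as follows. To stay self-contained, this example fits the Weibull by maximum likelihood with optim() on simulated, uncensored data instead of calling flexsurvreg; with a flexsurv fit, the log-scale estimates and x$cov would take the place of opt$par and V. The data, the helper weibull_mean, and all object names are hypothetical.

```r
set.seed(6)

# Simulated, uncensored failure times (an assumption for this sketch)
t_obs <- rweibull(300, shape = 1.3, scale = 400)

# Negative log-likelihood with both parameters on the log scale
negloglik <- function(p) {
  -sum(dweibull(t_obs, shape = exp(p[1]), scale = exp(p[2]), log = TRUE))
}
opt <- optim(c(0, log(mean(t_obs))), negloglik, method = "BFGS", hessian = TRUE)

# Asymptotic covariance of the log-scale estimates
V <- solve(opt$hessian)

# Weibull mean: scale * gamma(1 + 1 / shape)
weibull_mean <- function(log_shape, log_scale) {
  exp(log_scale) * gamma(1 + 1 / exp(log_shape))
}
mttf_hat <- weibull_mean(opt$par[1], opt$par[2])

# Draw multivariate normal replicates of the estimates via a Cholesky factor
B <- 10000
Z <- matrix(rnorm(B * 2), nrow = B, ncol = 2)
draws <- sweep(Z %*% chol(V), 2, opt$par, "+")

# Mean time to failure for each replicate, then its SD and quantiles
mttf_draws <- weibull_mean(draws[, 1], draws[, 2])
mttf_se <- sd(mttf_draws)
mttf_ci <- quantile(mttf_draws, c(0.025, 0.975))
```

Working on the log scale keeps every simulated shape and scale positive, which is why the answer stresses that the covariance matrix is for the log-transformed parameters.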
