Estimating mean time to failure with survreg/flexsurvreg with standard error - r

I am trying to estimate the mean time to failure for a Weibull distribution fitted to some survival data with flexsurvreg from the flexsurv package. I need to be able to estimate the standard error for use in a simulation model.
Using flexsurvreg with the lung data as an example:
require(flexsurv)
lungS <- Surv(lung$time, lung$status)
lungfit <- flexsurvreg(lungS ~ 1, dist = "weibull")
lungfit
Call:
flexsurvreg(formula = lungS ~ 1, dist = "weibull")
Maximum likelihood estimates:
         est    L95%    U95%
shape   1.32    1.14    1.52
scale 418.00  372.00  469.00
N = 228, Events: 165, Censored: 63
Total time at risk: 69593
Log-likelihood = -1153.851, df = 2
AIC = 2311.702
Now, calculating the mean is just a case of plugging the estimated parameter values into the standard formula, but is there an easy way of getting the standard error of this estimate? Can survreg do this?
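For reference, the standard formula here is the Weibull mean, scale * gamma(1 + 1/shape), so the point estimate itself is easy:
418 * gamma(1 + 1/1.32)  # about 385 for the estimates above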

In flexsurv version 0.2, if x is the fitted model object, then x$cov is the covariance matrix of the parameter estimates, with positive parameters on the log scale. You could then use the asymptotic normal property of maximum likelihood estimators. Simulate a large number of multivariate normal vectors, with the estimates as means, and this covariance matrix (using e.g. rmvnorm from the mvtnorm package). This gives you replicates of the parameter estimates under sampling uncertainty. Calculate the corresponding mean survival for each replicate, then take the SD or quantiles of the resulting sample to get the standard error or a confidence interval.
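A minimal sketch of that recipe, assuming coef(lungfit) returns the estimates on the same log scale as lungfit$cov (worth checking against your flexsurv version):
require(mvtnorm)
set.seed(1)
# Draw replicate parameter vectors from the asymptotic normal distribution
sims <- rmvnorm(10000, mean = coef(lungfit), sigma = lungfit$cov)
shape <- exp(sims[, 1])  # columns follow coef(lungfit): shape, then scale
scale <- exp(sims[, 2])  # back-transformed from the log scale
mtf <- scale * gamma(1 + 1/shape)  # Weibull mean for each replicate
sd(mtf)  # standard error of the estimated mean time to failure
quantile(mtf, c(0.025, 0.975))  # 95% confidence interval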

Related

fitdistrplus vs MASS - difference in standard error outputs of estimates

I am trying to fit a Gamma distribution to data. Since the data vector is huge, I am not able to copy-paste it here, but here are some summary statistics:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  11.96  170.41  792.28 1983.93 2511.30 42039.76
I tried to fit the Gamma distribution using the fitdistrplus package:
fitdist(df %>% pull(x) / 100, "gamma", start = list(shape = 1, rate = 0.1), lower = 0.01)
I get parameter estimates, but the standard errors are NA (as is the correlation matrix between the parameters):
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
        estimate Std. Error
shape 0.56172991         NA
rate  0.02846644         NA
Loglikelihood: -1582.601   AIC: 3169.202   BIC: 3177.244
Correlation matrix:
[1] NA
However, when I do the same with fitdistr from MASS, it does return standard errors:
fitdistr(df$x/100, "gamma", start = list(shape = 1, rate = 0.1), lower = 0.01)
      shape          rate
  0.561910739   0.028481215
 (0.032652615) (0.002494628)
My questions are the following:
Obviously the estimates come out the same, but why does one give standard errors while the other cannot?
The fitdist function from fitdistrplus also reports a correlation matrix between the parameters. What inferences about the quality of the fit can I draw from it?
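For what it's worth, a common cause of NA standard errors from fitdist is a missing or non-invertible Hessian at the constrained (lower = 0.01) solution. A hedged sketch of two things to try, with simulated data standing in for the real vector:
require(fitdistrplus)
set.seed(1)
x <- rgamma(500, shape = 0.56, rate = 0.0285)  # stand-in for df$x/100
# 1. Refit without the box constraint and see if summary() reports SEs:
fit <- fitdist(x, "gamma", start = list(shape = 1, rate = 0.1))
summary(fit)
# 2. If the fitted object carries a variance-covariance matrix, the SEs
#    are the square roots of its diagonal:
sqrt(diag(fit$vcov))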

standard error & variance of model predictions using merTools::predictInterval

I would like to estimate the standard error and variance of predictions from a linear mixed model. I'm using merTools::predictInterval to estimate prediction intervals because I want to include some of the uncertainty in the random effects (in addition to the uncertainty in the fixed effects). Is it acceptable to use the simulations from merTools::predictInterval to estimate the SE and variance of predictions? If so, how should I calculate them? I can think of two ways:
To get variance corresponding to the prediction interval (i.e. including residual variance), I would first get the simulated predictions:
predictions <- merTools::predictInterval(...,
                                         include.resid.var = TRUE,
                                         returnSims = TRUE)
1. Then I could estimate variance using the normal approximation (calculate the distance between the fit and the upper/lower interval and then divide that by 1.96):
var1 <- ((predictions$upr - predictions$lwr)/2/1.96)^2
2. Or I could just take the variance of the simulated values:
var2 <- apply(X = attr(x = predictions, which = 'sim.results'), MARGIN = 1, FUN = var)
The SE would then be the square root of the variance. To get the SE and/or variance relating to the confidence interval, I could repeat this with include.resid.var = FALSE in the merTools::predictInterval call.
Is either of these methods acceptable? Is one preferable to the other?
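As a self-contained sketch of both calculations (the sleepstudy example from lme4 and all settings here are purely illustrative):
require(lme4)
require(merTools)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
predictions <- predictInterval(fit, n.sims = 1000, level = 0.95,
                               include.resid.var = TRUE, returnSims = TRUE)
# Method 1: normal approximation from the interval half-width
var1 <- ((predictions$upr - predictions$lwr) / 2 / 1.96)^2
# Method 2: row-wise variance of the simulated predictions
var2 <- apply(attr(predictions, "sim.results"), 1, var)
se1 <- sqrt(var1)  # the SE is the square root of either variance
se2 <- sqrt(var2)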

How to estimate the odds ratio with CI for X in a logistic regression containing the square of X using R?

I am trying to calculate odds ratios in R for variables with not only linear but also quadratic terms in a logistic regression. Let's say there are X and X^2 in the model. I know how to get the odds ratio (for a unit change of X) when X takes a specific value, but I don't know how to calculate a confidence interval for this estimate. I found this reference showing how it is done in SAS: http://support.sas.com/kb/35/189.html, but I would like to do it in R. Any suggestions?
@BenBolker
Here is an example:
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata <- transform(mydata, gpaSquared = gpa^2, greSquared = gre^2)
model <- glm(admit ~ gpa + gpaSquared + gre, family = binomial(logit), data = mydata)
In this example the odds ratio for gpa depends on the actual value of gpa (e.g. the effect of a unit change when gpa = 4). I can calculate the log odds at gpa = 5 and gpa = 4 and get the odds ratio from those, but I don't know how to get a CI for the OR. (Please ignore that in the example the squared term is not statistically significant.)
m <- glm(x ~ X1 + I(X1^2) + X2, data = data, family = binomial(link = "logit"))  # I() is needed to get a quadratic term in a formula
summary(m)
confint(m)  # 95% CI for the coefficients using the profiled log-likelihood
confint.default(m)  # CIs using standard errors
exp(coef(m))  # exponentiated coefficients
exp(confint(m))  # 95% CI for exponentiated coefficients
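To get a CI for the value-dependent OR itself, one option (a sketch, not the only approach) is the delta method on the linear contrast, using the admissions model fitted above; gpa = 3 is an arbitrary example value:
# Log-OR for a one-unit change in gpa at gpa = g is
# beta_gpa + beta_gpaSquared * (2*g + 1), since (g + 1)^2 - g^2 = 2*g + 1
g <- 3
cc <- c(0, 1, 2 * g + 1, 0)  # coefficient order: (Intercept), gpa, gpaSquared, gre
est <- sum(cc * coef(model))
se <- sqrt(drop(t(cc) %*% vcov(model) %*% cc))
exp(est + c(-1.96, 0, 1.96) * se)  # lower 95% limit, OR, upper 95% limit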

residuals in R using auto.arima and forecast package

I am fitting a model using the auto.arima function in the forecast package. I get a model that is AR(1), for example. I then extract residuals from this model. How does this give the same number of residuals as the original vector? If this is an AR(1) model then the number of residuals should be one less than the length of the original time series. What am I missing?
Example:
require(forecast)
arprocess = as.numeric(arima.sim(model = list(ar=.5), n=100))
#auto.arima(arprocess, d=0, D=0, ic="bic", stationary=T)
# Series: arprocess
# ARIMA(1,0,0) with zero mean
# Coefficients:
# ar1
# 0.5198
# s.e. 0.0867
# sigma^2 estimated as 1.403: log likelihood=-158.99
# AIC=321.97 AICc=322.1 BIC=327.18
r = resid(auto.arima(arprocess, d=0, D=0, ic="bic", stationary=T))
> length(r)
[1] 100
Update: Digging into the code of auto.arima, I see that it uses Arima which in turn uses stats:::arima. Therefore the question is really how does stats:::arima compute residuals for the very first observation?
The residuals are the actual values minus the fitted values. For the first observation, the fitted value is the estimated mean of the process (zero here, since the model was fitted with zero mean). For subsequent observations, the fitted value is $\phi$ times the previous observation (measured from that mean), assuming an AR(1) process has been estimated.
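A quick numerical check of that explanation, reusing the zero-mean AR(1) setup from the question (the seed and the direct Arima call are just for reproducibility):
require(forecast)
set.seed(1)
arprocess <- as.numeric(arima.sim(model = list(ar = .5), n = 100))
fit <- Arima(arprocess, order = c(1, 0, 0), include.mean = FALSE)
phi <- coef(fit)["ar1"]
# Fitted value is 0 at t = 1 (the zero mean) and phi * y[t-1] afterwards
manual <- arprocess - c(0, phi * arprocess[-100])
all.equal(as.numeric(resid(fit)), as.numeric(manual))  # should be TRUE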

Any simple way to get regression prediction intervals in R?

I am working on a big data set with over 300K elements, running a regression analysis to estimate a parameter called Rate using the predictor variable Distance. I have the regression equation. Now I want to get the confidence and prediction intervals. I can easily get the confidence intervals for the coefficients with the command:
> confint(W1500.LR1, level = 0.95)
                  2.5 %      97.5 %
(Intercept) 666.2817393 668.0216072
Distance      0.3934499   0.3946572
which gives me the upper and lower bounds for the CI of the coefficients. Now I want the same upper and lower bounds for the prediction intervals. The only thing I have learnt so far is that I can get prediction intervals for specific values of Distance (say 200, 500, etc.) using the code:
predict(W1500.LR1, newdata, interval="predict")
This is not useful for me because I have over 300K distinct distance values, which would require running this code for each of them. Is there a simple way to get the prediction intervals, like the confint command I showed above?
Had to make up my own data, but here you go:
# Simulated data standing in for the real Rate/Distance variables
x <- rnorm(300000)
y <- jitter(3 * x, 1000)
fit <- lm(y ~ x)
# Prediction intervals for every observation at once
pred.int <- predict(fit, interval = "prediction")
# Confidence intervals for the fitted mean at every observation
conf.int <- predict(fit, interval = "confidence")
fitted.values <- pred.int[, 1]
pred.lower <- pred.int[, 2]
pred.upper <- pred.int[, 3]
# Plot a subset, ordering by x so the interval lines draw cleanly
ord <- order(x[1:1000])
plot(x[1:1000], y[1:1000])
lines(x[1:1000][ord], fitted.values[1:1000][ord], col = "red", lwd = 2)
lines(x[1:1000][ord], pred.lower[1:1000][ord], lwd = 2, col = "blue")
lines(x[1:1000][ord], pred.upper[1:1000][ord], lwd = 2, col = "blue")
So, as you can see, interval = "prediction" is for predicting new data values, not for constructing intervals around the beta coefficients; the confidence intervals you actually want would be obtained in the same fashion from conf.int.
