Find R-square value of Weibull fit (Survival model) in R - r

I have a survival object (S) for which I am doing a weibull fit using the survreg function and weibull distribution in R.
S = Surv(data$ValueX, data$ValueY)
W = Survreg(S ~ 1, data=data, dist="weibull")
How do I extract the R-square value of the Weibull fit which is essentially a linear line? Or is there a function to calculate the correlation coefficient value Rho?
Basically, I want to calculate the goodness of fit.

Look at pam.censor in the PAmeasures package which produces an R^2 like statistic. Using the ovarian dataset from the survival package:
library(PAmeasures)
library(survival)
fit.s <- survreg(Surv(futime, fustat) ~ age, data = ovarian, dist="weibull" )
p <- predict(fit.s, type = "response")
with(ovarian, pam.censor(futime, p, fustat))
For the ovarian data with an age regressor we get a value of only 0.0915 .
Another idea is that for a Weibull model with no covariates we have S(t) = exp(- (lambda * t)^p) so log(-log(S(t))) is linear in log(t) hence we could use the R squared of the corresponding regression to measure how well the model fits to a Weibull.
library(survival)
fit1 <- survfit(Surv(futime, fustat) ~ 1, data = ovarian)
sum1 <- summary(fit1, times = ovarian$futime)
fo <- log(-log(surv)) ~ log(time)
d <- as.data.frame(sum1[c("time", "surv")])
fit.lm <- lm(fo, d)
summary(fit.lm)$r.sq
plot(fo, d)
abline(fit.lm)
For the ovarian data without covariates the R^2 at 93% is high but the plot does suggest systematic departures from linearity so it may not really be Weibull.
Other
Not sure if this is of interest but the eha package has the check.dist function which can be used for a visual comparison of a parametric baseline hazard model to a cox proportional hazard model. See the documentation as well as:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5233524/
Using the ovarian dataset from survival:
library(eha)
library(surival)
fit.c <- coxreg(Surv(futime, fustat) ~ age, data = ovarian)
fit.p <- phreg(Surv(futime, fustat) ~ age, data = ovarian, dist = "weibull")
check.dist(fit.c, fit.p)
The survAUC package has three functions that provide r squared type statistics for cox proportional hazard models (OXS, Nagelk and XO).

Related

Obtaining individual slopes from an lme4 object in R

I'm new to lme4 package in R. In my example below, I was wondering if it might be possible to obtain the gender slopes (i.e., differences) for each dep after fitting my glmer model?
dat <- data.frame(dep = rep(LETTERS[1:6],each=2), gender = rep(c("Ma","Fe"),6),
admit=c(512,89,353,17,120,202,138,131,53,94,22,24),
reject=c(313,19,207,8,205,391,279,244,138,299,351,317))
lme4::glmer(cbind(admit,reject) ~ gender+dep + (gender|dep), data=dat, family=binomial)
In lme4 you can get the estimated slopes from ranef, but in your model you will need to sum the global and unit specific terms, as in the example below.
library(lme4)
dat <- data.frame(dep = rep(LETTERS[1:6],each=2), gender = rep(c("Ma","Fe"),6),
admit=c(512,89,353,17,120,202,138,131,53,94,22,24),
reject=c(313,19,207,8,205,391,279,244,138,299,351,317))
mod1 <- glmer(cbind(admit,reject) ~ gender+dep + (gender|dep), data=dat, family=binomial)
summary(mod1)
ran_gender <- ranef(mod1)$dep
fe_mod1 <- fixef(mod1)
slopes <- fe_mod1[[2]] + ran_gender[,2]
slopes

Inter- or extrapolate from a cubic spline cox-proportional hazard model using RMS package

I am using the cph function in the RMS package to assess the association of WEIGHT with an EVENT outcome. I am using cubic spline with 3 knots.
How can I use the model to predict the hazard ratio for weights that were not included in the original data? i.e. How can I use the model to inter- or extrapolate?
Here is an example:
set.seed(123)
library(rms)
WT <- rnorm(10, 30, 10)
EVENT <- sample(c(0,1), replace=TRUE, size=10)
TIME <- c(seq(1,10,1))
df <- as.data.frame(cbind(TIME,EVENT,WT))
fit <- cph(Surv(TIME, EVENT==1) ~ rcs(WT, 3), data = df)
fit
d = datadist(df)
options(datadist = 'd')
Predict(fit, WT) # this predicts hazard for WT included in the data
How can I use the model fit to predict hazrad at WT==70 for example?
I am using R studio.

Getting estimated means after multiple imputation using the mitml, nlme & geepack R packages

I'm running multilevel multiple imputation through the package mitml (using the panimpute() function) and am fitting linear mixed models and marginal models through the packages nlme and geepack and the mitml:with() function.
I can get the estimates, p-values etc for those through the testEstimates() function but I'm also looking to get estimated means across my model predictors. I've tried the emmeans package, which I normally use for getting estimated means when running nlme & geepack without multiple imputation but doing so emmeans tell me "Can't handle an object of class “mitml.result”".
I'm wondering is there a way to get pooled estimated means from the multiple imputation analyses I've run?
The data frames I'm analyzing are longitudinal/repeated measures and in long format. In the linear mixed model I want to get the estimated means for a 2x2 interaction effect and in the marginal model I'm trying to get estimated means for the 6 levels of 'time' variable. The outcome in all models is continuous.
Here's my code
# mixed model
fml <- Dep + time ~ 1 + (1|id)
imp <- panImpute(data=Data, formula=fml, n.burn=50000, n.iter=5000, m=100, group = "treatment")
summary(imp)
plot(imp, trace="all")
implist <- mitmlComplete(imp, "all", force.list = TRUE)
fit <- with(implist, lme(Dep ~ time*treatment, random = ~ 1|id, method = "ML", na.action = na.exclude, control = list(opt = "optim")))
testEstimates(fit, var.comp = TRUE)
confint.mitml.testEstimates(testEstimates(fit, var.comp = TRUE))
# marginal model
fml <- Dep + time ~ 1 + (1|id)
imp <- panImpute(data=Data, formula=fml, n.burn=50000, n.iter=5000, m=100)
summary(imp)
plot(imp, trace="all")
implist <- mitmlComplete(imp, "all", force.list = TRUE)
fit <- with(implist, geeglm(Dep ~ time, id = id, corstr ="unstructured"))
testEstimates(fit, var.comp = TRUE)
confint.mitml.testEstimates(testEstimates(fit, var.comp = TRUE))
is there a way to get pooled estimated means from the multiple imputation analyses I've run?
This is not a reprex without Data, so I can't verify this works for you. But emmeans provides support for mira-class (lists of) models in the mice package. So if you fit your model in with() using the mids rather than mitml.list class object, then you can use that to obtain marginal means of your outcome (and any contrasts or pairwise comparisons afterward).
Using example data found here, which uncomfortably loads an external workspace:
con <- url("https://www.gerkovink.com/mimp/popular.RData")
load(con)
## imputation
library(mice)
ini <- mice(popNCR, maxit = 0)
meth <- ini$meth
meth[c(3, 5, 6, 7)] <- "norm"
pred <- ini$pred
pred[, "pupil"] <- 0
imp <- mice(popNCR, meth = meth, pred = pred, print = FALSE)
## analysis
library(lme4) # fit multilevel model
mod <- with(imp, lmer(popular ~ sex + (1|class)))
library(emmeans) # obtain pooled estimates of means
(em <- emmeans(mod, specs = ~ sex) )
pairs(em) # test comparison

How to check for overdispersion in a GAM with negative binomial distribution?

I fit a Generalized Additive Model in the Negative Binomial family using gam from the mgcv package. I have a data frame containing my dependent variable y, an independent variable x, a factor fac and a random variable ran. I fit the following model
gam1 <- gam(y ~ fac + s(x) + s(ran, bs = 're'), data = dt, family = "nb"
I have read in Negative Binomial Regression book that it is still possible for the model to be overdisperesed. I have found code to check for overdispersion in glm but I am failing to find it for a gam. I have also encountered suggestions to just check the QQ plot and standardised residuals vs. predicted residuals, but I can not decide from my plots if the data is still overdisperesed. Therefore, I am looking for an equation that would solve my problem.
A good way to check how well the model compares with the observed data (and hence check for overdispersion in the data relative to the conditional distribution implied by the model) is via a rootogram.
I have a blog post showing how to do this for glm() models using the countreg package, but this works for GAMs too.
The salient parts of the post applied to a GAM version of the model are:
library("coenocliner")
library('mgcv')
## parameters for simulating
set.seed(1)
locs <- runif(100, min = 1, max = 10) # environmental locations
A0 <- 90 # maximal abundance
mu <- 3 # position on gradient of optima
alpha <- 1.5 # parameter of beta response
gamma <- 4 # parameter of beta response
r <- 6 # range on gradient species is present
pars <- list(m = mu, r = r, alpha = alpha, gamma = gamma, A0 = A0)
nb.alpha <- 1.5 # overdispersion parameter 1/theta
zprobs <- 0.3 # prob(y == 0) in binomial model
## simulate some negative binomial data from this response model
nb <- coenocline(locs, responseModel = "beta", params = pars,
countModel = "negbin",
countParams = list(alpha = nb.alpha))
df <- setNames(cbind.data.frame(locs, nb), c("x", "yNegBin"))
OK, so we have a sample of data drawn from a negative binomial sampling distribution and we will now fit two models to these data:
A Poisson GAM
m_pois <- gam(yNegBin ~ s(x), data = df, family = poisson())
A negative binomial GAM
m_nb <- gam(yNegBin ~ s(x), data = df, family = nb())
The countreg package is not yet on CRAN but it can be installed from R-Forge:
install.packages("countreg", repos="http://R-Forge.R-project.org")
Then load the packages and plot the rootograms:
library("countreg")
library("ggplot2")
root_pois <- rootogram(m_pois, style = "hanging", plot = FALSE)
root_nb <- rootogram(m_nb, style = "hanging", plot = FALSE)
Now plot the rootograms for each model:
autoplot(root_pois)
autoplot(root_nb)
This is what we get (after plotting both using cowplot::plot_grid() to arrange the two rootograms on the same plot)
We can see that the negative binomial model does a bit better here than the Poisson GAM for these data — the bottom of the bars are closer to zero throughout the range of the observed counts.
The countreg package has details on how you can add an uncertain band around the zero line as a form of goodness of fit test.
You can also compute the Pearson estimate for the dispersion parameter using the Pearson residuals of each model:
r$> sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
[1] 28.61546
r$> sum(residuals(m_nb, type = "pearson")^2) / df.residual(m_nb)
[1] 0.5918471
In both cases, these should be 1; we see substantial overdispersion in the Poisson GAM, and some under-dispersion in the Negative Binomial GAM.

How to plot the survival curve generated by survreg (package survival of R)?

I’m trying to fit and plot a Weibull model to a survival data. The data has just one covariate, cohort, which runs from 2006 to 2010. So, any ideas on what to add to the two lines of code that follows to plot the survival curve of the cohort of 2010?
library(survival)
s <- Surv(subSetCdm$dur,subSetCdm$event)
sWei <- survreg(s ~ cohort,dist='weibull',data=subSetCdm)
Accomplishing the same with the Cox PH model is rather straightforward, with the following lines. The problem is that survfit() doesn’t accept objects of type survreg.
sCox <- coxph(s ~ cohort,data=subSetCdm)
cohort <- factor(c(2010),levels=2006:2010)
sfCox <- survfit(sCox,newdata=data.frame(cohort))
plot(sfCox,col='green')
Using the data lung (from the survival package), here is what I'm trying to accomplish.
#create a Surv object
s <- with(lung,Surv(time,status))
#plot kaplan-meier estimate, per sex
fKM <- survfit(s ~ sex,data=lung)
plot(fKM)
#plot Cox PH survival curves, per sex
sCox <- coxph(s ~ as.factor(sex),data=lung)
lines(survfit(sCox,newdata=data.frame(sex=1)),col='green')
lines(survfit(sCox,newdata=data.frame(sex=2)),col='green')
#plot weibull survival curves, per sex, DOES NOT RUN
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
lines(survfit(sWei,newdata=data.frame(sex=1)),col='red')
lines(survfit(sWei,newdata=data.frame(sex=2)),col='red')
Hope this helps and I haven't made some misleading mistake:
copied from above:
#create a Surv object
s <- with(lung,Surv(time,status))
#plot kaplan-meier estimate, per sex
fKM <- survfit(s ~ sex,data=lung)
plot(fKM)
#plot Cox PH survival curves, per sex
sCox <- coxph(s ~ as.factor(sex),data=lung)
lines(survfit(sCox,newdata=data.frame(sex=1)),col='green')
lines(survfit(sCox,newdata=data.frame(sex=2)),col='green')
for Weibull, use predict, re the comment from Vincent:
#plot weibull survival curves, per sex,
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
The trick here was reversing the quantile orders for plotting vs predicting. There is likely a better way to do this, but it works here. Good luck!
An alternative option is to make use of the package flexsurv. This offers some additional functionality over the survival package - including that the parametric regression function flexsurvreg() has a nice plot method which does what you ask.
Using lung as above;
#create a Surv object
s <- with(lung,Surv(time,status))
require(flexsurv)
sWei <- flexsurvreg(s ~ as.factor(sex),dist='weibull',data=lung)
sLno <- flexsurvreg(s ~ as.factor(sex),dist='lnorm',data=lung)
plot(sWei)
lines(sLno, col="blue")
You can plot on the cumulative hazard or hazard scale using the type argument, and add confidence intervals with the ci argument.
This is just a note clarifying Tim Riffe's answer, which uses the following code:
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
The reason for the two mirror-image sequences, seq(.01,.99,by=.01) and seq(.99,.01,by=-.01), is because the predict() method is giving quantiles for the event distribution f(t) - that is, values of the inverse CDF of f(t) - while a survival curve is plotting 1-(CDF of f) versus t. In other words, if you plot p versus predict(p), you'll get the CDF, and if you plot 1-p versus predict(p) you'll get the survival curve, which is 1-CDF. The following code is more transparent and generalizes to arbitrary vectors of p values:
pct <- seq(.01,.99,by=.01)
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=pct),1-pct,col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=pct),1-pct,col="red")
In case someone wants to add a Weibull distribution to the Kaplan-Meyer curve in the ggplot2 ecosystem, we can do the following:
library(survminer)
library(tidyr)
s <- with(lung,Surv(time,status))
fKM <- survfit(s ~ sex,data=lung)
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
pred.sex1 = predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01))
pred.sex2 = predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01))
df = data.frame(y=seq(.99,.01,by=-.01), sex1=pred.sex1, sex2=pred.sex2)
df_long = gather(df, key= "sex", value="time", -y)
p = ggsurvplot(fKM, data = lung, risk.table = T)
p$plot = p$plot + geom_line(data=df_long, aes(x=time, y=y, group=sex))
In case you'd like to use the survival function itself S(t) (instead of the inverse survival function S^{-1}(p) used in other answers here) I've written a function to implement that for the case of the Weibull distribution (following the same inputs as the pec::predictSurvProb family of functions:
survreg.predictSurvProb <- function(object, newdata, times){
shape <- 1/object$scale # also equals 1/exp(fit$icoef[2])
lps <- predict(object, newdata = newdata, type = "lp")
surv <- t(sapply(lps, function(lp){
sapply(times, function(t) 1 - pweibull(t, shape = shape, scale = exp(lp)))
}))
return(surv)
}
You can then do:
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
times <- seq(min(lung$time), max(lung$time), length.out = 1000)
new_dat <- data.frame(sex = c(1,2))
surv <- survreg.predictSurvProb(sWei, newdata = new_dat, times = times)
lines(times, surv[1, ],col='red')
lines(times, surv[2, ],col='red')

Resources