We can use survminer to plot the survival function or cumulative hazard function, but I cannot see a way to use it to plot the hazard function.
For example,
library(survival)
library(tidyverse)
library(survminer)
data(lung)
# Run Kaplan-Meier on the data
mod.lung <- survfit(Surv(time, status) ~ 1, data = lung)
# Kaplan-Meier Survival Curve
ggsurvplot(mod.lung)
# Cumulative Hazard
ggsurvplot(mod.lung, fun = function(y) -log(y))
Since the cumulative hazard function is H(t) = -log(S(t)), I just need to add fun = function(y) -log(y) to get the cumulative hazard plot.
The hazard function is h(t) = -d/dt log(S(t)), and so I am unsure how to use this to get the hazard function in a survminer plot.
An alternative definition of the hazard function is h(t) = f(t)/S(t), however, I'm unsure how to use this to get the plot.
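For reference, the two definitions quoted above describe the same quantity. A short derivation, assuming S is differentiable with density f(t) = -S'(t) and S(0) = 1:

```latex
h(t) \;=\; \frac{f(t)}{S(t)} \;=\; \frac{-S'(t)}{S(t)} \;=\; -\frac{d}{dt}\log S(t),
\qquad
H(t) \;=\; \int_0^t h(u)\,du \;=\; -\log S(t).
```

So the hazard is the slope of the cumulative hazard, which is why a step-function estimate of S(t) has to be differenced or smoothed before a hazard can be plotted.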
I have found ways to get a hazard plot using ggplot2, for example
survival.table1 <- broom::tidy(mod.lung) %>% filter(n.event > 0)
survival.table1 <- survival.table1 %>% mutate(hazard = n.event / (n.risk * (lead(time) - time)))
ggplot() +
geom_step(data = survival.table1, aes(x = time, y = hazard)) +
labs(x = "Time", y = "Hazard")
However, I mainly wish to find a way with the survminer package, partly to have some consistency.
Thanks
In the rms package, you can use the survplot function with the "what" parameter set to "hazard" to plot the hazard function versus time.
https://rdrr.io/cran/rms/man/survplot.html
A couple of years late, but here for others.
survplot can only be used to plot the hazard if the estimate was created by the psm function. The psm function of the rms library fits the accelerated failure time family of parametric survival models (defaulting to a Weibull distribution). Other available distributions are listed in the documentation for the survreg function of the survival package:
These include "weibull", "exponential", "gaussian", "logistic","lognormal" and "loglogistic". Otherwise, it is assumed to be a user defined list conforming to the format described in survreg.distributions
library(rms)
mod.lung <- psm(Surv(time, status) ~ 1, data = lung)
survplot(mod.lung, what="hazard")
For nonparametric survival models, the muhaz library might be more useful. This example uses the default "epanechnikov" boundary kernel function. You may wish to explore different bandwidth options; see the muhaz package documentation.
library(muhaz)
mod.lung <- muhaz(lung$time, lung$status - 1) # status must be 1 for failure and 0 for censored
plot(mod.lung)
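To give a feel for what kernel smoothing of the hazard means, here is a minimal sketch that smooths the Nelson-Aalen increments with an Epanechnikov kernel by hand. This is a simplified illustration, not what muhaz does internally: there is no boundary correction or bandwidth selection, and the bandwidth b = 100 days and the helper epan are arbitrary choices of mine.

```r
library(survival)

# Kaplan-Meier fit; n.event / n.risk at each event time gives the
# Nelson-Aalen increments of the cumulative hazard
fit  <- survfit(Surv(time, status) ~ 1, data = lung)
ev   <- fit$n.event > 0
t_ev <- fit$time[ev]
dH   <- fit$n.event[ev] / fit$n.risk[ev]

# Epanechnikov kernel (hypothetical helper name)
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

b    <- 100                                      # bandwidth in days, assumed
grid <- seq(b, max(t_ev) - b, length.out = 200)  # stay away from the boundaries

# Smoothed hazard: kernel-weighted sum of the increments around each grid point
haz <- sapply(grid, function(t0) sum(epan((t0 - t_ev) / b) * dH) / b)

plot(grid, haz, type = "l", xlab = "Time", ylab = "Smoothed hazard")
```

A larger b gives a smoother but more biased curve, which is the trade-off the bandwidth options in muhaz control.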
Alternatively, to apply B-splines instead of kernel density smoothing to the hazard, have a look at the bshazard library.
library(bshazard)
mod.lung <- bshazard(Surv(time, status) ~ 1, data = lung)
plot(mod.lung)
I want to add at-risk table and 95% confidence intervals to adjusted survival curves. Something like survival curves below (Link). I saw some relevant code here but it doesn't mention what I need.
Another question: would it be correct to use the weight variable obtained from the iptw package in the adjustment (I have a column named weight in my actual dataset)?
Here is my code:
library(survival);library(survminer)
data(lung);names(lung)
#fit <- coxph( Surv(time, status==2) ~ ph.karno + strata(sex), data = lung )
lung$sex <- ifelse(lung$sex == 1, "Male", "Female")
fit <- coxph(Surv(time, status) ~ ph.ecog + age +strata(sex), data = lung)
ggadjustedcurves(fit,
variable = "sex",
data = lung,
method = "average",
palette = c("#E69F00", "#56B4E9"),
size = 1.3,
legend = "right",
legend.title = expression(bold("Legend title")),
xlab = "Time",font.legend = 12) + theme(legend.text.align = 0.5)
Short Answer: Currently there is no R package that allows you to directly plot confounder-adjusted survival curves with confidence intervals. However, there is a function called ate in the riskRegression R package which can be used to calculate adjusted survival probability estimates at some points in time with 95% confidence intervals. With some coding you can definitely use that one to get what you want.
Alternatively you can wait a little longer. I am currently working on an R-Package that implements various methods of confounder-adjusted survival curves (and cumulative incidence functions when there are competing risks) complete with confidence intervals, hypothesis tests, risk tables and so on.
UPDATE (09.05.2022):
The adjustedCurves package can now be used to obtain adjusted survival curves with confidence intervals. For your example:
library(survival)
library(devtools)
# install adjustedCurves from github, load it
devtools::install_github("RobinDenz1/adjustedCurves")
library(adjustedCurves)
# fit required model
lung$sex <- as.factor(ifelse(lung$sex == 1, "Male", "Female"))
lung$status <- lung$status - 1
fit <- coxph(Surv(time, status) ~ ph.ecog + age + strata(sex), data=lung,
x=TRUE)
# calculate and plot curves
adj <- adjustedsurv(data=lung, variable="sex", ev_time="time",
event="status", method="direct",
outcome_model=fit, conf_int=TRUE)
plot(adj, conf_int=TRUE)
I am a beginner, trying to do survival analysis using machine learning on the lung cancer dataset. I know how to do the survival analysis using the Cox proportional hazards model. The Cox proportional hazards model gives us the hazard ratios, which are nothing but the exponentials of the regression coefficients. I wonder if we can do the same thing using machine learning. As a beginner, I am trying survivalsvm from the R language. Please see the link for this. I am using the inbuilt cancer data for doing survival analysis. Following is the R code, given at this link.
library(survival)
library(survivalsvm)
set.seed(123)
n <- nrow(veteran)
train.index <- sample(1:n, 0.7 * n, replace = FALSE)
test.index <- setdiff(1:n, train.index)
survsvm.reg <- survivalsvm(Surv(diagtime, status) ~ .,
subset = train.index, data = veteran,
type = "regression", gamma.mu = 1,
opt.meth = "quadprog", kernel = "add_kernel")
print(survsvm.reg)
pred.survsvm.reg <- predict(object = survsvm.reg,
newdata = veteran, subset = test.index)
print(pred.survsvm.reg)
Can anyone help me to get the hazard ratios or survival curve for this dataset? Also, how do I interpret the output of this function?
This question is kind of old now but I'm going to answer anyway because this is a difficult problem and I struggled with {survivalsvm} when I first used it.
The output depends on the type argument. In your case, type = "regression" means you are fitting Shivaswamy's (I hope I spelt that correctly) SVCR, which predicts the time until an event takes place, so the predictions are survival times.
In order to convert this to a survival curve you have to make some assumptions about the shape of the survival distribution. So for example, let's say you think the survival time is Normally distributed with N(mu, sigma). Then you can use your predicted survival time as mu and either predict or make an assumption about sigma.
Below is an example using your code and my {distr6} package, which enables quick computation of many distributions and printing and plotting of functions:
library(survival)
library(survivalsvm)
set.seed(123)
n <- nrow(veteran)
train.index <- sample(1:n, 0.7 * n, replace = FALSE)
test.index <- setdiff(1:n, train.index)
survsvm.reg <- survivalsvm(Surv(diagtime, status) ~ .,
subset = train.index, data = veteran,
type = "regression", gamma.mu = 1,
opt.meth = "quadprog", kernel = "add_kernel")
print(survsvm.reg)
pred.survsvm.reg <- predict(object = survsvm.reg,
newdata = veteran, subset = test.index)
# load distr6
library(distr6)
# create a vector of normal distributions each with
# mean as the predicted time and with variance 1
# `decorators = "ExoticStatistics"` adds survival function
v = VectorDistribution$new(distribution = "Normal",
params = data.frame(mean = as.numeric(pred.survsvm.reg$predicted)),
shared_params = list(var = 1),
decorators = "ExoticStatistics")
# survival function evaluated at times = 1:10
v$survival(1:10)
# plot survival function for first individual
plot(v[1], fun = "survival")
# plot hazard function for first individual
plot(v[1], fun = "hazard")
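If you would rather not depend on {distr6}, the same idea can be sketched in base R. This assumes, as above, that survival times are Normal with the predicted time as the mean and a standard deviation of 1; the two values in mu are made-up stand-ins for pred.survsvm.reg$predicted.

```r
# Hypothetical predicted survival times for two individuals
mu    <- c(5, 8)
sigma <- 1          # assumed standard deviation
times <- 1:10

# Survival function S(t) = 1 - F(t), one column per individual
surv <- sapply(mu, function(m) pnorm(times, mean = m, sd = sigma, lower.tail = FALSE))

# Hazard function h(t) = f(t) / S(t)
haz <- sapply(mu, function(m) {
  dnorm(times, mean = m, sd = sigma) /
    pnorm(times, mean = m, sd = sigma, lower.tail = FALSE)
})

matplot(times, surv, type = "l", lty = 1, xlab = "Time", ylab = "Survival probability")
```

The shape of these curves is entirely driven by the distributional assumption, so they should be read as an illustration rather than as something the SVM itself estimated.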
I am attempting to create an adjusted survival curve (from a Cox model) and would like to display this information as cumulative events.
I have attempted this:
library(survival)
data("ovarian")
library(survminer)
model<-coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian)
gplot<-ggadjustedcurves(model) ## Expected plot of adjusted survival curve
Because the "fun=" still has not been implemented in ggadjustedcurves I took the advice of a user on this page and extracted the elements into plotdata and created a new column as shown below.
library(dplyr)
library(magrittr) # provides the %<>% assignment pipe
plotdata <- gplot$data
plotdata %<>%
mutate(new = 1 - surv) ## 1 - survival probability
I am new to the R environment and ggplot, so how can I plot the new adjusted survival curve using the newly created column while keeping the theme of the original plot (contained in gplot)?
Thanks!
Edit:
My current solution is as follows.
library(rms)
model<-coxph(Surv(futime, fustat) ~ age+ strata(rx), data=ovarian)
survfit(model, conf.type = "plain", conf.int = 1)
plot(survfit(model), conf.int = T,col = c(1,2), fun='event')
This achieves the survival curve I wanted however I am not sure if the confidence bars are really the standard errors (+/-1). I supplied 1 to the conf.int argument and believe this to create the standard errors in this way since conf.type is specified as plain.
How can I further customize this plot as the base graph looks rather bland! How do I get a display as close as possible to the survminer curves?
You can use the adjustedCurves package instead, which allows both plotting confidence intervals and naturally includes an option to display cumulative incidence functions. First, install it using:
devtools::install_github("https://github.com/RobinDenz1/adjustedCurves")
Now you can use:
library(adjustedCurves)
library(survival)
library(riskRegression)
# needs to be a factor
ovarian$rx <- factor(ovarian$rx)
# needs to include x=TRUE
model <- coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian, x=TRUE)
adj <- adjustedsurv(data=ovarian,
event="fustat",
ev_time="futime",
variable="rx",
method="direct",
outcome_model=model,
conf_int=TRUE)
plot(adj, cif=TRUE, conf_int=TRUE)
I would probably not use this method here, though. Simulation studies have shown that the Cox-regression-based method performs badly in small samples. You might want to take a look at method="iptw" or method="aiptw" inside the adjustedCurves package instead.
My apologies if I'm missing something obvious. I've been thoroughly enjoying working with h2o in the last few days using R interface. I would like to evaluate my model, say a random forest, by plotting an ROC. The documentation seems to suggest that there is a straightforward way to do that:
Interpreting a DRF Model
By default, the following output displays:
Model parameters (hidden)
A graph of the scoring history (number of trees vs. training MSE)
A graph of the ROC curve (TPR vs. FPR)
A graph of the variable importances
...
I've also seen that in Python you can apply the roc function here. But I can't seem to find a way to do the same in the R interface. Currently I'm extracting predictions from the model using h2o.cross_validation_holdout_predictions and then using the pROC package from R to plot the ROC. But I would like to be able to do it directly from the H2O model object, or, perhaps, an H2OModelMetrics object.
Many thanks!
A naive solution is to use the plot() generic function to plot an H2OMetrics object:
logit_fit <- h2o.glm(colnames(training)[-1], 'y', training_frame = training.hex,
validation_frame = validation.hex, family = 'binomial')
plot(h2o.performance(logit_fit, valid = TRUE), type = 'roc')
This will give us a plot:
But it is hard to customize, especially to change the line type, since the type parameter is already taken as 'roc'. Also, I have not found a way to plot multiple models' ROC curves together on one plot. I have come up with a method to extract the true positive rate and false positive rate from the H2OMetrics object and use ggplot2 to plot the ROC curves on one plot myself. Here is the example code (it uses a lot of tidyverse syntax):
# for example I have 4 H2OModels
list(logit_fit,dt_fit,rf_fit,xgb_fit) %>%
# map a function to each element in the list
map(function(x) x %>% h2o.performance(valid=T) %>%
# from all these 'paths' in the object
.@metrics %>% .$thresholds_and_metric_scores %>%
# extracting true positive rate and false positive rate
.[c('tpr','fpr')] %>%
# add (0,0) as the first row and (1,1) as the last row of the ROC curve
add_row(tpr=0,fpr=0,.before=1) %>%
add_row(tpr=1,fpr=1)) %>%
# add a column of model name for future grouping in ggplot2
map2(c('Logistic Regression','Decision Tree','Random Forest','Gradient Boosting'),
function(x,y) x %>% add_column(model=y)) %>%
# reduce four data.frame to one
reduce(rbind) %>%
# plot fpr and tpr, map model to color as grouping
ggplot(aes(fpr,tpr,col=model))+
geom_line()+
geom_segment(aes(x=0,y=0,xend = 1, yend = 1),linetype = 2,col='grey')+
xlab('False Positive Rate')+
ylab('True Positive Rate')+
ggtitle('ROC Curve for Four Models')
You can get the ROC curve by passing the model performance metrics to H2O's plot function.
Shortened code snippet, which assumes you created a model, called glm here, and split your dataset into train and validation sets:
perf <- h2o.performance(glm, newdata = validation)
h2o.plot(perf)
Full code snippet below:
h2o.init()
# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prostatePath, destination_frame = "prostate.hex")
glm = h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"), training_frame = prostate.hex, family = "binomial", nfolds = 0, alpha = 0.5, lambda_search = FALSE)
perf <- h2o.performance(glm, newdata = prostate.hex)
h2o.plot(perf)
There is not currently a function in the H2O R or Python client to plot the ROC curve directly. The roc method in Python returns the data necessary to plot the ROC curve, but does not plot the curve itself. ROC curve plotting directly from R and Python seems like a useful thing to add, so I've created a JIRA ticket for it here: https://0xdata.atlassian.net/browse/PUBDEV-4449
The reference to the ROC curve in the docs refers to the H2O Flow GUI, which will automatically plot a ROC curve for any binary classification model in your H2O cluster. All the other items in that list are in fact available directly in R and Python, however.
If you train a model in R, you can visit the Flow interface (e.g. localhost:54321) and click on a binomial model to see its ROC curves (training, validation and cross-validated versions). It will look like this:
Building off @Lauren's example, after you run h2o.performance you can extract all the information ggplot needs from perf@metrics$thresholds_and_metric_scores. This code produces the ROC curve, but you can also add precision and recall to the selected variables to plot the PR curve.
Here is some example code using the same model as above.
library(h2o)
library(dplyr)
library(ggplot2)
h2o.init()
# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(
path = prostatePath,
destination_frame = "prostate.hex"
)
glm <- h2o.glm(
y = "CAPSULE",
x = c("AGE", "RACE", "PSA", "DCAPS"),
training_frame = prostate.hex,
family = "binomial",
nfolds = 0,
alpha = 0.5,
lambda_search = FALSE
)
# Model performance
perf <- h2o.performance(glm, newdata = prostate.hex)
# Extract info for ROC curve
curve_dat <- data.frame(perf@metrics$thresholds_and_metric_scores) %>%
select(c(tpr, fpr))
# Plot ROC curve
ggplot(curve_dat, aes(x = fpr, y = tpr)) +
geom_point() +
geom_line() +
geom_segment(
aes(x = 0, y = 0, xend = 1, yend = 1),
linetype = "dotted",
color = "grey50"
) +
xlab("False Positive Rate") +
ylab("True Positive Rate") +
ggtitle("ROC Curve") +
theme_bw()
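As a small aside on working with an extracted tpr/fpr table, here is a sketch that anchors the curve at (0,0) and (1,1) and computes the area under it with the trapezoid rule. The data frame roc_tab is made up for illustration; with H2O it would be the tpr and fpr columns of thresholds_and_metric_scores, ordered by decreasing threshold.

```r
# Hypothetical tpr/fpr table, ordered by decreasing threshold
roc_tab <- data.frame(
  tpr = c(0.2, 0.5, 0.8, 0.9),
  fpr = c(0.0, 0.1, 0.3, 0.6)
)

# Anchor the curve at (0,0) and (1,1) so it spans the whole unit square
roc_full <- rbind(
  data.frame(tpr = 0, fpr = 0),
  roc_tab,
  data.frame(tpr = 1, fpr = 1)
)

# Trapezoid rule: width in fpr times the average height in tpr
auc <- sum(diff(roc_full$fpr) *
             (head(roc_full$tpr, -1) + tail(roc_full$tpr, -1)) / 2)

plot(roc_full$fpr, roc_full$tpr, type = "l",
     xlab = "False Positive Rate", ylab = "True Positive Rate")
abline(0, 1, lty = 2, col = "grey50")
```

Without the two anchor rows the plotted line can stop short of the corners, and a trapezoid estimate of the area would be too small.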
I’m trying to fit and plot a Weibull model to survival data. The data has just one covariate, cohort, which runs from 2006 to 2010. So, any ideas on what to add to the two lines of code that follow to plot the survival curve of the 2010 cohort?
library(survival)
s <- Surv(subSetCdm$dur,subSetCdm$event)
sWei <- survreg(s ~ cohort,dist='weibull',data=subSetCdm)
Accomplishing the same with the Cox PH model is rather straightforward, with the following lines. The problem is that survfit() doesn’t accept objects of type survreg.
sCox <- coxph(s ~ cohort,data=subSetCdm)
cohort <- factor(c(2010),levels=2006:2010)
sfCox <- survfit(sCox,newdata=data.frame(cohort))
plot(sfCox,col='green')
Using the data lung (from the survival package), here is what I'm trying to accomplish.
#create a Surv object
s <- with(lung,Surv(time,status))
#plot kaplan-meier estimate, per sex
fKM <- survfit(s ~ sex,data=lung)
plot(fKM)
#plot Cox PH survival curves, per sex
sCox <- coxph(s ~ as.factor(sex),data=lung)
lines(survfit(sCox,newdata=data.frame(sex=1)),col='green')
lines(survfit(sCox,newdata=data.frame(sex=2)),col='green')
#plot weibull survival curves, per sex, DOES NOT RUN
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
lines(survfit(sWei,newdata=data.frame(sex=1)),col='red')
lines(survfit(sWei,newdata=data.frame(sex=2)),col='red')
Hope this helps and I haven't made some misleading mistake:
copied from above:
#create a Surv object
s <- with(lung,Surv(time,status))
#plot kaplan-meier estimate, per sex
fKM <- survfit(s ~ sex,data=lung)
plot(fKM)
#plot Cox PH survival curves, per sex
sCox <- coxph(s ~ as.factor(sex),data=lung)
lines(survfit(sCox,newdata=data.frame(sex=1)),col='green')
lines(survfit(sCox,newdata=data.frame(sex=2)),col='green')
for Weibull, use predict, re the comment from Vincent:
#plot weibull survival curves, per sex,
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
The trick here was reversing the quantile orders for plotting vs predicting. There is likely a better way to do this, but it works here. Good luck!
An alternative option is to make use of the package flexsurv. This offers some additional functionality over the survival package - including that the parametric regression function flexsurvreg() has a nice plot method which does what you ask.
Using lung as above;
#create a Surv object
s <- with(lung,Surv(time,status))
require(flexsurv)
sWei <- flexsurvreg(s ~ as.factor(sex),dist='weibull',data=lung)
sLno <- flexsurvreg(s ~ as.factor(sex),dist='lnorm',data=lung)
plot(sWei)
lines(sLno, col="blue")
You can plot on the cumulative hazard or hazard scale using the type argument, and add confidence intervals with the ci argument.
This is just a note clarifying Tim Riffe's answer, which uses the following code:
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
The reason for the two mirror-image sequences, seq(.01,.99,by=.01) and seq(.99,.01,by=-.01), is because the predict() method is giving quantiles for the event distribution f(t) - that is, values of the inverse CDF of f(t) - while a survival curve is plotting 1-(CDF of f) versus t. In other words, if you plot p versus predict(p), you'll get the CDF, and if you plot 1-p versus predict(p) you'll get the survival curve, which is 1-CDF. The following code is more transparent and generalizes to arbitrary vectors of p values:
pct <- seq(.01,.99,by=.01)
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=pct),1-pct,col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=pct),1-pct,col="red")
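The relationship between quantiles and the survival curve can be checked with a plain Weibull distribution, without any fitted model. In this sketch the shape and scale values are arbitrary assumptions chosen only for illustration:

```r
# Hypothetical Weibull parameters
shape <- 1.3
scale <- 400

pct <- seq(0.01, 0.99, by = 0.01)

# Quantiles of the event-time distribution (inverse CDF)
q <- qweibull(pct, shape = shape, scale = scale)

# Survival function evaluated at those quantiles
surv_at_q <- pweibull(q, shape = shape, scale = scale, lower.tail = FALSE)

# S(q_p) = 1 - p, so plotting 1 - pct against q traces the survival curve
print(all.equal(surv_at_q, 1 - pct))

plot(q, 1 - pct, type = "l", xlab = "Time", ylab = "Survival probability")
```

This is the same inversion the predict(..., type="quantile") calls rely on: the x-values are quantiles and the y-values are one minus the probabilities that generated them.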
In case someone wants to add a Weibull distribution to the Kaplan-Meier curve in the ggplot2 ecosystem, we can do the following:
library(survival)
library(survminer)
library(tidyr)
s <- with(lung,Surv(time,status))
fKM <- survfit(s ~ sex,data=lung)
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
pred.sex1 = predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01))
pred.sex2 = predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01))
df = data.frame(y=seq(.99,.01,by=-.01), sex1=pred.sex1, sex2=pred.sex2)
df_long = gather(df, key= "sex", value="time", -y)
p = ggsurvplot(fKM, data = lung, risk.table = T)
p$plot = p$plot + geom_line(data=df_long, aes(x=time, y=y, group=sex))
In case you'd like to use the survival function S(t) itself (instead of the inverse survival function S^{-1}(p) used in other answers here), I've written a function to implement that for the case of the Weibull distribution (following the same inputs as the pec::predictSurvProb family of functions):
survreg.predictSurvProb <- function(object, newdata, times){
shape <- 1/object$scale # also equals 1/exp(object$icoef[2])
lps <- predict(object, newdata = newdata, type = "lp")
surv <- t(sapply(lps, function(lp){
sapply(times, function(t) 1 - pweibull(t, shape = shape, scale = exp(lp)))
}))
return(surv)
}
You can then do:
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
times <- seq(min(lung$time), max(lung$time), length.out = 1000)
new_dat <- data.frame(sex = c(1,2))
surv <- survreg.predictSurvProb(sWei, newdata = new_dat, times = times)
lines(times, surv[1, ],col='red')
lines(times, surv[2, ],col='red')
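The function above relies on the mapping between survreg's parameterization and that of pweibull (shape = 1/scale, scale = exp(linear predictor)). Here is a minimal sketch that checks this mapping on the lung data by comparing medians; the model formula and the object names are my own choices for illustration.

```r
library(survival)

# Weibull AFT model with sex as the only covariate
fit <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")
nd  <- data.frame(sex = c(1, 2))

# Map survreg's parameters to those of the *weibull functions
lp    <- predict(fit, newdata = nd, type = "lp")
shape <- 1 / fit$scale

# Median survival time two ways: from predict() and from qweibull()
med_predict <- predict(fit, newdata = nd, type = "quantile", p = 0.5)
med_qweib   <- qweibull(0.5, shape = shape, scale = exp(lp))

print(all.equal(as.numeric(med_predict), as.numeric(med_qweib)))
```

If the two medians agree, then 1 - pweibull(t, shape, exp(lp)) is the survival function of the same fitted model that predict() describes through its quantiles.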