Produce a facet with an overall group in ggsurvplot in R - r

In R, I am using ggsurvplot_facet to produce survival curves by sex, faceted by a variable ecog. However, I would also like an overall group in the same facet layout. Is this possible?
ggsurvplot_add_all did not help.
Here is some example data:
library(survival)  # provides the lung data and Surv()
library(survminer)
lung$ecog <- ifelse(lung$ph.ecog == 0, 0, 1)
fit <- surv_fit(Surv(time, status) ~ sex, data = lung)
fig_os <- ggsurvplot_facet(fit, data = lung, facet.by = 'ecog')
I need a survival curve for the whole population, independent of ecog.

Cheat a little: make a copy of lung, set ecog to a constant (2) in the copy, row-bind the copy to the original, and convert ecog in the row-bound dataset to a factor.
lung2 <- lung  # plain assignment copies a data frame; copy() would need data.table
lung2$ecog <- 2
lung2 <- rbind(lung,lung2)
lung2$ecog <- factor(lung2$ecog,labels = c("0", "1", "Overall"))
Then use your code above, but using lung2 as the dataset.
fit <- surv_fit(Surv(time, status) ~ sex, data = lung2)
fig_os <- ggsurvplot_facet(fit, data = lung2, facet.by = 'ecog')
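Before plotting, it can help to check that the stacked data looks as intended: twice as many rows as lung, with every patient appearing once under the "Overall" level. This is a minimal sketch using only the survival package; the names lung_all and lung_stacked are made up for illustration.

```r
library(survival)

# Binary ECOG indicator, as in the question
lung$ecog <- ifelse(lung$ph.ecog == 0, 0, 1)

# Copy with ecog set to a constant that will become the "Overall" panel
lung_all <- lung
lung_all$ecog <- 2

# Stack original and copy; explicit levels keep the label order unambiguous
lung_stacked <- rbind(lung, lung_all)
lung_stacked$ecog <- factor(lung_stacked$ecog, levels = c(0, 1, 2),
                            labels = c("0", "1", "Overall"))

# Each patient is counted once in panel 0 or 1 (or NA) and once in "Overall"
table(lung_stacked$ecog, useNA = "ifany")
```

Note that patients with a missing ph.ecog keep an NA in the original half of the data but still contribute to the "Overall" panel.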

Related

Survminer - include subset of variables in plot

Let's say I want to plot the survival curves using a model of the lung data, that controls for sex and a median split of the age variable (I could also control linearly for age and that would make my problem even worse).
I would like to make a plot of this model only showing the stratification between the levels of the sex factor. If I do what seems to be the standard, however, I get 4 instead of two survival curves.
library(survival)
library(survminer)
library(dplyr)  # for %>% and mutate()
reg_lung <- lung %>% mutate(age_cat = ifelse(age > 63, "old", "young"))
lung_fit <- survfit(Surv(time, status) ~ age_cat + sex, data = reg_lung)
ggsurvplot(lung_fit, data = reg_lung)
resulting survival plot
That is to say, I would like to see the difference sex makes while holding the influence of age fixed (either as a factor old/young or linearly).
You can fit your model with coxph and define sex as strata:
lung_fit <- coxph(Surv(time, status) ~ age_cat + strata(sex), data = reg_lung)
ggsurvplot(survfit(lung_fit), data = reg_lung)
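To see why this gives two curves rather than four, one can inspect the survfit object directly. The sketch below uses only the survival package and base R's transform() in place of dplyr; cox_fit and curves are illustrative names.

```r
library(survival)

# Age split at 63, as in the question (base R instead of dplyr)
reg_lung <- transform(lung, age_cat = ifelse(age > 63, "old", "young"))

# age_cat enters as a covariate; sex only defines the strata
cox_fit <- coxph(Surv(time, status) ~ age_cat + strata(sex), data = reg_lung)

# Without newdata, survfit() returns one curve per stratum
curves <- survfit(cox_fit)
length(curves$strata)
```

Without a newdata argument the curves are evaluated at an average covariate value, which may not correspond to a real patient; passing newdata with a chosen age_cat is one way to make the reference group explicit.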

(R) Adding Confidence Intervals To Plots

I am using R. I am following this tutorial over here (https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/ ) and I am trying to adapt the code for a similar problem.
In this tutorial, a statistical model is fitted to a dataset and then used to predict 3 new observations. We then plot the results for these 3 observations:
#load libraries
library(survival)
library(dplyr)
library(ranger)
library(data.table)
library(ggplot2)
#use the built in "lung" data set
#remove missing values (dataset is called "a")
a = na.omit(lung)
#create id variable
a$ID <- seq_along(a[,1])
#create test set with only the first 3 rows
new = a[1:3,]
#create a training set by removing first three rows
a = a[-c(1:3),]
#fit survival model (random survival forest)
r_fit <- ranger(Surv(time,status) ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss, data = a, mtry = 4, importance = "permutation", splitrule = "extratrees", verbose = TRUE)
#create new intermediate variables required for the survival curves
death_times <- r_fit$unique.death.times
surv_prob <-data.frame(r_fit$survival)
avg_prob <- sapply(surv_prob, mean)
#use survival model to produce estimated survival curves for the first three observations
pred <- predict(r_fit, new, type = 'response')$survival
pred <- data.table(pred)
colnames(pred) <- as.character(r_fit$unique.death.times)
#plot the results for these 3 patients
#(each row of pred is one test patient; unlist() turns the row into a numeric vector)
plot(r_fit$unique.death.times, unlist(pred[1, ]), type = "l", col = "red")
lines(r_fit$unique.death.times, unlist(pred[2, ]), type = "l", col = "green")
lines(r_fit$unique.death.times, unlist(pred[3, ]), type = "l", col = "blue")
From here, I would like to add confidence intervals (confidence regions) to each of these 3 curves, so that each curve is surrounded by a shaded band.
I found a previous stackoverflow post (survfit() Shade 95% confidence interval survival plot ) that shows how to do something similar, but I am not sure how to extend the results from this post to each individual observation.
Does anyone know if there is a direct way to add these confidence intervals?
Thanks
If you create your plot using ggplot, you can use the geom_ribbon function to draw confidence intervals as follows:
ggplot(data=...)+
geom_line(aes(x=..., y=...),color=...)+
geom_ribbon(aes(x=.. ,ymin =.., ymax =..), fill=.. , alpha =.. )+
geom_line(aes(x=..., y=...),color=...)+
geom_ribbon(aes(x=.. ,ymin =.., ymax =..), fill=.. , alpha =.. )
You can put + after geom_line and repeat the same steps for each observation.
You can also check:
Having trouble plotting multiple data sets and their confidence intervals on the same GGplot. Data Frame included and
https://bookdown.org/ripberjt/labbook/appendix-guide-to-data-visualization.html
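As a concrete instance of the geom_ribbon pattern, here is a minimal sketch. It assumes you already have lower and upper bounds for each curve; since the ranger predictions in the question do not come with intervals as far as I know, the example substitutes Kaplan-Meier curves from survfit(), which do return them. The names km, km_df and p are illustrative.

```r
library(survival)
library(ggplot2)

# Kaplan-Meier fit by sex; survfit() stores pointwise 95% limits
km <- survfit(Surv(time, status) ~ sex, data = lung)

# One row per (group, time) with the estimate and its bounds
km_df <- data.frame(
  time  = km$time,
  surv  = km$surv,
  lower = km$lower,
  upper = km$upper,
  group = rep(names(km$strata), times = km$strata)
)

# One ribbon and one line per group, driven by the colour/fill mapping
p <- ggplot(km_df, aes(x = time, y = surv, colour = group, fill = group)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
  geom_line()
p
```

Keeping the data in long format with a group column avoids repeating geom_line and geom_ribbon once per observation.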

R: Plot Individual Predictions

I am using the R programming language. I am trying to follow this tutorial :https://rdrr.io/cran/randomForestSRC/man/plot.competing.risk.rfsrc.html
This tutorial shows how to use the "survival random forest" algorithm, which is used to analyze survival data. In this example, the "follic" data set is used, and the survival random forest algorithm estimates the hazard of an observation experiencing "status 1" vs "status 2" (this is called "competing risks").
In the code below, the survival random forest model is trained on the follic data set using all observations except the last two observations. Then, this model is used to predict the hazards of the last two observations:
#load library
library(randomForestSRC)
#load data
data(follic, package = "randomForestSRC")
#train model on all observations except the last 2 observations
follic.obj <- rfsrc(Surv(time, status) ~ ., follic[c(1:539),], nsplit = 3, ntree = 100)
#use model to predict the last two observations
f <- predict(follic.obj, follic[540:541, ])
#plot individual curves - does not work
plot.competing.risk(f)
However, this seems to produce the average hazards for the last two observations experiencing "status 1 vs status 2".
Is there a way to plot the individual hazards of the first observation and the second observation?
Thanks
EDIT1:
I know how to do this for other functions in this package, e.g. here you can plot these curves for 7 observations at once:
data(veteran, package = "randomForestSRC")
plot.survival(rfsrc(Surv(time, status)~ ., veteran), cens.model = "rfsrc")
## pbc data
data(pbc, package = "randomForestSRC")
pbc.obj <- rfsrc(Surv(days, status) ~ ., pbc)
## use subset to focus on specific individuals
plot.survival(pbc.obj, subset = c(3, 10))
This example seems to show the predicted survival curves for 7 observations (plus the confidence intervals - the red line is the average) at once. But I still do not know how to do this for the "plot.competing.risk" function.
EDIT2:
I think there might be an indirect way to solve this - you can predict each observation individually:
#use model to predict the last two observations individually
f1 <- predict(follic.obj, follic[540, ])
f2 <- predict(follic.obj, follic[541, ])
#plot individual curves
plot.competing.risk(f1)
plot.competing.risk(f2)
But I was hoping there was a more straightforward way to do this. Does anyone know how?
One possible way is to adapt the body of plot.competing.risk so that it uses each individual's values instead of the group mean, and then loop over individuals to overlay their lines, as shown below.
#use model to predict the last three observations
f <- predict(follic.obj, follic[539:541, ])
x <- f
par(mfrow = c(2, 2))
for (k in 1:3) { #k for type of plot
  for (i in 1:dim(x$chf)[1]) { #i for all individuals in x
    #cschf <- apply(x$chf, c(2, 3), mean, na.rm = TRUE) #original group mean
    cschf = x$chf[i,,] #individual values
    #cif <- apply(x$cif, c(2, 3), mean, na.rm = TRUE) #original group mean
    cif = x$cif[i,,] #individual values
    cpc <- do.call(cbind, lapply(1:ncol(cif), function(j) {
      cif[, j]/(1 - rowSums(cif[, -j, drop = FALSE]))
    }))
    if (k==1) {
      matx = cschf
      range = range(x$chf)
    }
    if (k==2) {
      matx = cif
      range = range(x$cif)
    }
    if (k==3) {
      matx = cpc
      range = c(0,1) #manually assign, for now
    }
    ylab = c("Cause-Specific CHF","Probability (%)","Probability (%)")[k]
    matplot(x$time.interest, matx, type='l', lty=1, lwd=3, col=1:2,
            add=ifelse(i==1,F,T), ylim=range, xlab="Time", ylab=ylab) #ADD tag for overlapping individual lines
  }
  legend <- paste(c("CSCHF","CIF","CPC")[k], 1:2, " ")
  legend("bottomright", legend = legend, col = (1:2), lty = 1, lwd = 3)
}
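The cpc step in the loop is the least obvious part: for each event type j it divides that event's cumulative incidence by one minus the cumulative incidence of all other event types. A small worked example with made-up numbers (the cif matrix below is hypothetical, not output from rfsrc) shows the arithmetic:

```r
# Hypothetical CIF values: rows are time points, columns are the two event types
cif <- matrix(c(0.10, 0.20, 0.30,
                0.05, 0.10, 0.20), ncol = 2)

# Same expression as in the loop: CIF_j / (1 - sum of the other CIFs)
cpc <- do.call(cbind, lapply(1:ncol(cif), function(j) {
  cif[, j] / (1 - rowSums(cif[, -j, drop = FALSE]))
}))
cpc
```

For instance, the first entry is 0.10 / (1 - 0.05), i.e. the probability of event 1 among those who have not yet experienced event 2.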

How to plot survival relative to general population with age on the X-axis (left-truncated data)?

I am trying to compare the survival in my study cohort with the survival in the Dutch general population (matched for age and sex). I created a rate table of the Dutch population.
library(relsurv)
setwd("")
nldpop <- transrate.hmd("mltper_1x1.txt","fltper_1x1.txt")
Then, I wanted to create a plot of the survival of my cohort (observed) and the survival of the population (expected) with age on the X-axis. However, the 'survexp' function does not seem to support a (start, stop, event) format. It only works with the usual (futime, event) format, shown below, but that puts follow-up time on the X-axis. Does anyone know how to get age on the X-axis instead of follow-up time?
# Observed and expected survival with time on X-axis
fit <- survfit(Surv(futime, event)~1)
efit <- survexp(futime ~ 1, rmap = list(year=(date_entry), age=(age_entry), sex=(sex)),
ratetable=nldpop)
plot(fit)
lines(efit)
You didn't provide example data, so I used the survival::mgus data for this. Your problem may be due to incorrectly specified variable names in the rmap option.
library(relsurv)
library(dplyr)  # for %>% and mutate()
nldpop <- transrate.hmd("mltper_1x1.txt", "fltper_1x1.txt")
mgus2 <- mgus %>% mutate(date_year = dxyr + 1900)
fit <- survfit(Surv(futime, death) ~ 1, data = mgus2)
efit <- survexp(Surv(futime, death) ~ 1, data = mgus2,
ratetable = nldpop, rmap = list(age = age*365.25, year = date_year, sex = sex))
plot(fit)
lines(efit)

Plotting predicted survival curves for continuous covariates in ggplot

How can I plot survival curves for representative values of a continuous covariate in a cox proportional hazards model? Specifically, I would like to do this in ggplot using a "survfit.cox" "survfit" object.
This may seem like a question that has already been answered, but I have searched through everything in SO with the terms 'survfit' and 'newdata' (plus many other search terms). This is the thread that comes closest to answering my question so far: Plot Kaplan-Meier for Cox regression
In keeping with the reproducible example offered in one of the answers to that post:
url <- "http://socserv.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
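# stringsAsFactors = TRUE so that levels(fin) and levels(race) work below (R >= 4.0 reads strings as character by default)
df <- read.table(url, header = TRUE, stringsAsFactors = TRUE)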
library(dplyr)
library(ggplot2)
library(survival)
library(magrittr)
library(broom)
# Identifying the 25th and 75th percentiles for prio (continuous covariate)
summary(df$prio)
# Cox proportional hazards model with other covariates
# 'prio' is our explanatory variable of interest
m1 <- coxph(Surv(week, arrest) ~
fin + age + race + prio,
data = df)
# Creating new df to get survival predictions
# Want separate curves for the different 'fin' and 'race'
# groups as well as the 25th and 75th percentile of prio
newdf <- df %$%
expand.grid(fin = levels(fin),
age = 30,
race = levels(race),
prio = c(1,4))
# Obtain the fitted survival curve, then tidy
# into a dataframe that can be used in ggplot
survcurv <- survfit(m1, newdata = newdf) %>%
tidy()
The problem is, that once I have this dataframe called survcurv, I cannot tell which of the 'estimate' variables belongs to which pattern because none of the original variables are retained. For example, which of the 'estimate' variables represents the fitted curve for 30 year old, race = 'other', prio = '4', fin = 'no'?
In all other examples I've seen, one usually puts the survfit object into a generic plot() function and does not add a legend. I want to use ggplot and add a legend for each of the predicted curves.
In my own dataset, the model is a lot more complex and there are a lot more curves than I show here, so as you can imagine seeing 40 different 'estimate.1'..'estimate.40' variables makes it hard to understand what is what.
Thanks for providing a well-phrased question and a good example. I'm a little surprised that tidy does a relatively poor job here of creating sensible output. Please see below for my attempt at creating some plottable data:
library(tidyr)
newdf$group <- as.character(1:nrow(newdf))
survcurv <- survfit(m1, newdata = newdf) %>%
tidy() %>%
gather('key', 'value', -time, -n.risk, -n.event, -n.censor) %>%
mutate(group = substr(key, nchar(key), nchar(key)),
key = substr(key, 1, nchar(key) - 2)) %>%
left_join(newdf, 'group') %>%
spread(key, value)
And then create a plot (perhaps you'd like to use geom_step instead, but unfortunately there is no step-shaped ribbon):
ggplot(survcurv, aes(x = time, y = estimate, ymin = conf.low, ymax = conf.high,
col = race, fill = race)) +
geom_line(size = 1) +
geom_ribbon(alpha = 0.2, col = NA) +
facet_grid(prio ~ fin)
Try defining your survcurv like this:
survcurv <-
lapply(1:nrow(newdf),
function(x, m1, newdata){
cbind(newdata[x, ], survfit(m1, newdata[x, ]) %>% tidy)
},
m1,
newdf) %>%
bind_rows()
This will include all of the predictor values as columns with the predicted estimates.
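Both answers rely on the fact that survfit() on a Cox model returns one curve per row of newdata, in row order. A minimal sketch of that mapping, using the lung data from the survival package instead of the Rossi data so that it does not depend on the download, might look like this; m, nd, sf and long are illustrative names.

```r
library(survival)

# Small Cox model and a grid of covariate patterns
m  <- coxph(Surv(time, status) ~ age + sex, data = lung)
nd <- expand.grid(age = c(55, 70), sex = c(1, 2))

# sf$surv is a matrix: rows are time points, columns follow the rows of nd
sf <- survfit(m, newdata = nd)

# Stack the columns and repeat each covariate pattern alongside its curve
long <- data.frame(
  nd[rep(seq_len(nrow(nd)), each = length(sf$time)), , drop = FALSE],
  time     = rep(sf$time, times = nrow(nd)),
  estimate = as.vector(sf$surv),
  row.names = NULL
)
head(long)
```

Because the covariate columns travel with each estimate, they can be mapped directly to colour, linetype, or facets in ggplot without decoding names such as estimate.1.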
