I am trying to draw a survival plot and have run into an issue when moving my survfit call into a function that I can call for different data sets. When I run this code:
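library(survival)
library(survminer)
library(scales)  # provides percent, used for the y-axis labels below
fit <- survfit(Surv(time, status) ~ sex, data = lung)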
allsurv <- function(fit){
ggsurvplot(
fit,
pval = TRUE,
pval.coord = c(200, 0.10),
conf.int = TRUE,
xlab = "Days",
ggtheme = theme_light(),
surv.median.line = "hv",
legend.labs = c("Male", "Female"),  # lung codes sex as 1 = male, 2 = female
legend.title = "",
palette = c("#8C3F4D","#3E606F")) +
scale_y_continuous(expand = c(0.02, 0.02),breaks = seq(from = 0, to = 1, by = 0.1),labels=percent) +
scale_x_continuous(expand = c(0.006, 0.006),
limits = c(0,366*12), breaks = seq(0, 4392, 100))
}
allsurv(fit)
The plot is drawn normally.
However, when I call survfit from inside a function:
fit_all <- function(x){
survfit(Surv(time, status) ~ sex, data = x)
}
allsurv(fit_all(lung))
I receive an error: "Error in eval(fit$call$data) : object 'x' not found"
Any ideas as to what I am doing wrong?
Survminer includes a function surv_fit that acts as a wrapper around survfit. If you use surv_fit instead of survfit, the "call" of the returned object will include the whole data frame instead of just data = x. That works better when calling ggsurvplot inside a function:
https://www.rdocumentation.org/packages/survminer/versions/0.4.6/topics/surv_fit
allsurv <- function(fit){
ggsurvplot(
fit,
pval = TRUE,
pval.coord = c(200, 0.10),
conf.int = TRUE,
xlab = "Days",
ggtheme = theme_light(),
surv.median.line = "hv",
legend.labs = c("Male", "Female"),  # lung codes sex as 1 = male, 2 = female
legend.title = "",
palette = c("#8C3F4D","#3E606F"))
}
fit_all <- function(x){
surv_fit(Surv(time, status) ~ sex, data = x)
}
allsurv(fit_all(lung))
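To illustrate what "the call will include the whole data frame" means, here is a minimal sketch using only the survival package. It is not how surv_fit is implemented; fit_all_embedded is a hypothetical helper that simply overwrites the data entry of the stored call with the data frame itself.

```r
library(survival)

# Hypothetical helper (not part of survminer): fit inside a function, then
# replace the symbol `x` in the stored call with the actual data frame.
fit_all_embedded <- function(x) {
  fit <- survfit(Surv(time, status) ~ sex, data = x)
  fit$call$data <- x
  fit
}

fit <- fit_all_embedded(lung)

# The data can now be recovered from the call even where `x` does not exist.
recovered <- eval(fit$call$data, envir = new.env(parent = baseenv()))
```

One trade-off of this approach is that printing the fit now shows the whole data frame in the Call line.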
I managed to get it to work. For anyone else with the same issue: the problem seems to come from the call stored in the fit, which differs depending on whether survfit is called directly or from inside another function.
fit_all(lung)
returns Call: survfit(formula = Surv(time, status) ~ sex, data = x), whereas
fit
returns Call: survfit(formula = Surv(time, status) ~ sex, data = lung)
It seems that ggsurvplot looks up the data named in the call, and when the fit was created inside another function that name is 'x', which does not exist where ggsurvplot runs. The way around this is to pass the data to ggsurvplot explicitly through its data argument.
allsurv <- function(fit, x){
ggsurvplot(
data = x,
fit,
pval = TRUE,
pval.coord = c(200, 0.10),
conf.int = TRUE,
xlab = "Days",
ggtheme = theme_light(),
surv.median.line = "hv",
legend.labs = c("Male", "Female"),  # lung codes sex as 1 = male, 2 = female
legend.title = "",
palette = c("#8C3F4D","#3E606F")) +
scale_y_continuous(expand = c(0.02, 0.02),breaks = seq(from = 0, to = 1, by = 0.1),labels=percent) +
scale_x_continuous(expand = c(0.006, 0.006),
limits = c(0,366*12), breaks = seq(0, 4392, 100))
}
allsurv(fit_all(lung), lung)
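The difference between the two stored calls can be checked directly. This is a small sketch using only the survival package; the variable names (fit_direct, fit_wrapped, lookup) are made up for illustration.

```r
library(survival)

fit_all <- function(x) {
  survfit(Surv(time, status) ~ sex, data = x)
}

fit_direct  <- survfit(Surv(time, status) ~ sex, data = lung)
fit_wrapped <- fit_all(lung)

# The call stores the unevaluated argument, i.e. the name that was typed.
direct_arg  <- deparse(fit_direct$call$data)   # "lung"
wrapped_arg <- deparse(fit_wrapped$call$data)  # "x"

# Evaluating that name outside fit_all() fails, because `x` only existed
# inside the function while it was running.
lookup <- tryCatch(
  eval(fit_wrapped$call$data, envir = new.env(parent = baseenv())),
  error = function(e) e
)
```

Here lookup ends up holding an error condition, which matches the "object 'x' not found" message from the question.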
Given a data frame in R with different columns that could work as dependent variables, I'm trying to create a function that receives the data frame 'df', list or vector with dependent variables 'vars', a time variable 'time' and a status variable 'status' that returns both survival results using 'survfit' and a kaplan-meier curve using ggsurvplot.
The intention is to avoid copying and pasting the same code repeatedly.
Take the data below as an example:
library(ggplot2)
library(survival)
library(survminer)  # provides ggsurvplot()
library("dplyr")
df <- lung %>%
transmute(time,
status, # censoring status 1=censored, 2=dead
Age = age,
Sex = factor(sex, labels = c("Male", "Female")),
ECOG = factor(ph.ecog),
`Meal Cal` = as.numeric(meal.cal))
# help(lung)
# Turn status into (0=censored, 1=dead)
df$status <- ifelse(df$status == 2, 1, 0)
I certainly can do survival analyses like this:
fit <- survfit(Surv(time, status) ~ ECOG, data = df)
ggsurvplot(fit,
pval = TRUE, pval.coord = c(750, 0.3),
conf.int = FALSE,
surv.median.line = "hv",
legend = c(0.8, 0.6),
legend.title = "",
risk.table = "absolute",
risk.table.y.text = FALSE,
xlab = "Time (days)", ylab = "Survival",
palette="jco",
title="Overall Survival", font.title = c(16, "bold", "black")
)
However, I'd have to copy and paste everything again if I want to do the same with Sex. So I'd like to create a function in R that takes as inputs a data frame 'df', a list of dependent variables 'vars', a time variable 'time', and a status variable 'status' and returns both survival results using 'survfit' and a Kaplan-Meier curve using 'ggsurvplot', like the following:
vars <- c("ECOG", "Sex")
surv_plot_func <- function(df, vars, time, status) {
results_list <- lapply(vars, function(var, time, status) {
# Fit a survival model
fit <- survfit(Surv(as.numeric(df[[time]]), as.logical(df[[status]])) ~ as.factor(df[[var]]), data = df)
# Plot the Kaplan-Meier curve using ggsurvplot
ggsurv <- ggsurvplot(fit, pval = TRUE, conf.int = TRUE,
risk.table = TRUE, legend.title = "",
surv.median.line = "hv", xlab = "Time", ylab = "Survival Probability")
# Return the fit and ggsurv as a list
list(fit = fit, ggsurv = ggsurv)
})
# Return the list of results
results_list
}
res_list <- surv_plot_func(df, vars, "time", "status")
However, it didn't work. Any ideas?
The code below works for me. The formula is built as a string from the column names and then converted with as.formula.
surv_plot_func <- function(df, vars, time, status) {
  results_list <- lapply(vars, function(var) {
    # Create the formula as a string, e.g. "Surv(time, status) ~ ECOG";
    # time and status are taken from the enclosing function's arguments
    form <- paste0("Surv(", time, ", ", status, ") ~ ", var)
    # Fit a survival model
    fit <- survfit(as.formula(form), data = df)
    # Plot the Kaplan-Meier curve using ggsurvplot
    ggsurv <- ggsurvplot(fit, data = df, pval = TRUE, conf.int = TRUE,
                         risk.table = TRUE, legend.title = "",
                         surv.median.line = "hv", xlab = "Time", ylab = "Survival Probability")
    # Return the fit and ggsurv as a list
    list(fit = fit, ggsurv = ggsurv)
  })
  # Return the list of results
  return(results_list)
}
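The formula-building step can be tried on its own, without any plotting. Below is a minimal sketch that uses the lung data shipped with the survival package; make_surv_formula is a hypothetical helper name, and the column names are those of lung rather than the df from the question.

```r
library(survival)

# Hypothetical helper: build "Surv(<time>, <status>) ~ <var>" from strings.
make_surv_formula <- function(var, time = "time", status = "status") {
  as.formula(paste0("Surv(", time, ", ", status, ") ~ ", var))
}

vars <- c("sex", "ph.ecog")

# One survfit object per grouping variable, named after the variable.
fits <- lapply(vars, function(v) survfit(make_surv_formula(v), data = lung))
names(fits) <- vars

formula_text <- deparse(make_surv_formula("sex"))
```

Naming the list elements makes it easier to pull out a single result later, for example fits[["sex"]].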
I am plotting two survival curves together using ggsurvplot_combine: one for overall survival and one for survival by a specific variable. I would also like to show the p-values for each pairwise survival comparison in the same plot. I managed to get the p-values from pairwise_survdiff, but printing the table from $p.value on the combined survival plot has been challenging. I got what I want using grid and gridExtra, but placing the table in the bottom left of the survival plot is tedious because I have to set the position manually. Is there a better way to do this using survminer?
This is an example of the figure that I am attempting to generate (it does not have the overall survival though):
Here is a reprex of what I am attempting to produce:
require(survminer)
require(survival)
require(grid)
require(gridExtra)
data(myeloma)
#Create color object
mycolors1<-c('red3','blue3','green4','darkmagenta','goldenrod4','darkorange','deeppink',
'gray60','darkcyan','darkturquoise')
#Create survival plots
f1<-survfit(Surv(time,event)~1,data=myeloma)
f2<-survfit(Surv(time,event)~myeloma$chr1q21_status,data=myeloma)
fit<-list(Overall = f1, Treatment = f2)
print(ggsurvplot_combine(fit,data=myeloma,pal=c('black',mycolors1[1:nlevels(myeloma$chr1q21_status)])
,legend.title=" ",legend.labs=c('Overall',levels(myeloma$chr1q21_status))
,conf.int=F,title= 'Survival by molecular group',xlab='Time'
,font.main = 20,font.x = 15,font.y = 15,ylab='Cumulative Survival probability'
,risk.table=T,tables.col = "strata"
,risk.table.height = 0.25,ggtheme = theme_bw(),size = 0.75))
#Add pairwise comparison table for survival
pushViewport(viewport(x = 0.25, y = 0.36,just = c("left", "top"),height = 0.05, width = 0.1))
grid.draw(grid.table(symnum(pairwise_survdiff(Surv(time, event) ~ chr1q21_status, data = myeloma)$p.value
,cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, 0.1, 1)
,symbols = c("****", "***", "**", "*", "+", "ns ")
,abbr.colnames = F, na = 'N/A')
,theme=ttheme_minimal(
core=list(bg_params = list(fill = 'white', col='black')),
colhead=list(fg_params=list(col='white',fontface=2),
bg_params = list(fill = mycolors1[1:(nlevels(myeloma$molecular_group)-1)], col=NA)),
rowhead=list(fg_params=list(col='white',fontface=2),
bg_params = list(fill = c('white',mycolors1[2:nlevels(myeloma$molecular_group)]), col=NA)))
))
Here is some test code that should help.
library(survival)
library(survminer)
library(ggpmisc)  # makes geom = "table" available to annotate() below
data(aml)
aml$x <- as.character(aml$x)
aml[10,3] <- 'SuperMaintained'
aml[11,3] <- 'SuperMaintained'
aml[22,3] <- 'SuperMaintained'
aml[23,3] <- 'SuperMaintained'
aml$x <- factor(aml$x, levels = c('Nonmaintained','Maintained','SuperMaintained'))
fit <- survfit(Surv(time, status) ~ x, data = aml)
res=pairwise_survdiff(Surv(time, status) ~ x, data = aml)
table <- res$p.value
p1 <- ggsurvplot(fit, conf.int = FALSE, surv.median.line = c('hv'), data = aml, pval = TRUE, risk.table = FALSE)
p1$plot +
annotate(geom = "table", x = 140, y = 0.9, label = list(as.data.frame(table)))
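If significance codes are preferred over raw p-values in that table, the symnum() step from the question can be applied to the p-value matrix first. The sketch below uses base R only; the p-values are made up purely for illustration and are laid out the way a pairwise p-value matrix for three groups would be.

```r
# Made-up p-values (NOT real results), shaped like a pairwise comparison
# matrix: rows are groups 2..k, columns are groups 1..k-1.
pvals <- matrix(
  c(0.03, 0.0005, NA, 0.4),
  nrow = 2,
  dimnames = list(c("Maintained", "SuperMaintained"),
                  c("Nonmaintained", "Maintained"))
)

# Map each p-value to a significance code; NA cells become empty strings.
stars <- symnum(pvals,
                cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("****", "***", "**", "*", "+", "ns"),
                abbr.colnames = FALSE, na = "")

# Plain data frame that can be passed on as a table label.
stars_df <- as.data.frame(unclass(stars))
```

stars_df could then take the place of as.data.frame(table) in the annotate() call above.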
I would like to attach labels using geom_dl to a Kaplan–Meier (KM) plot produced by ggsurvplot. It works fine if I use the plot part of ggsurvplot.
library(survival)
library(survminer)
library(directlabels)
fit <- survfit(Surv(time, status) ~ disease, data = kidney)
surv_km <- ggsurvplot(fit, risk.table = TRUE)
surv_km$plot + geom_dl(aes(label = gsub('disease=','',strata)),
method = list(dl.trans(x = x + .2), "last.points"))
However, I get an error if I use the complete surv_km object:
surv_km + geom_dl(aes(label = gsub('disease=','',strata)),
method = list(dl.trans(x = x + .2), "last.points"))
>Error in surv_km + geom_dl(aes(label = gsub("disease=", "", strata)), :
non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("+.ggsurv", "+.gg") for "+"
Use %++% instead of +. See ?add_ggsurvplot for more details.
fit <- survfit(Surv(time, status) ~ disease, data = kidney)
surv_km <- ggsurvplot(fit, risk.table = TRUE, legend = "none")
surv_km %++% geom_dl(aes(label = gsub('disease=','',strata)),
method = list(dl.trans(x = x + .2), "last.points"))
To remove the labels at the end of the table, we can assign NULL to surv_km[["table"]][["layers"]][[2]], e.g.
surv_km <- surv_km %++%
geom_dl(aes(label = gsub('disease=','',strata)),
method = list(dl.trans(x = x + .2), "last.points"))
surv_km[["table"]][["layers"]][[2]]<-NULL
Finally, to remove disease= from the legend labels of the graph and/or the table, use gsub on names(fit$strata), like so:
attr(fit$strata, "names") = gsub("disease=","",attr(fit$strata, "names"))
#then repeat above steps
I have used 'predict' to find a fitted line for a linear model (lm) I have created. Because the lm was built on only 2 data points and needs to have a positive slope, I have forced it to go through the origin (0,0). I have also weighted the model by the number of observations underlying each data point.
Question 1: (SOLVED, see comment by @Gregor)
Why does the predicted line lie so much closer to my second data point (B) than to my first data point (A), when B has fewer underlying observations? Did I code something wrong when weighting the model?
Question 2:
I am now plotting a GLM (link = logit), but how can I still force it through 0,0? I've tried adding formula = y~0+x in several places, none of which seems to work.
M <- data.frame("rate" = c(0.4643,0.2143), "conc" = c(300,6000), "nr_dead" = c(13,3), "nr_surv" = c(15,11), "region" = c("A","B"))
M$tot_obsv <- (M$nr_dead+M$nr_surv)
M_conc <- M$conc
M_rate <- M$rate
M_tot_obsv <- M$tot_obsv
# Linear model of the data, forced through the origin (0,0) and weighted by the number of observations behind each data point
M_lm <- lm(rate ~ 0 + conc, data = M, weights = tot_obsv)
# Plot the fitted line using predict()
x_conc <- c(600, 6700)
y_rate <- predict(M_lm, list(conc = x_conc))  # the weights were used in the fit; predict() only needs the new conc values
plot(x = M$conc, y = M$rate, pch = 16, ylim = c(0, 0.5), xlim = c(0,7000), xlab = "conc", ylab = "death rate")
lines(x_conc, y_rate, col = "red", lwd = 2)
# EDIT 1: binomial GLM (logit link) on the dead/survived counts
library(dplyr)    # tibble(), left_join(), %>%
library(ggplot2)
M_glm <- glm(cbind(nr_dead, nr_surv) ~ 0 + conc, data = M, family = "binomial")
# Plot using predict()
binomial_smooth <- function(formula = y ~ 0 + x, ...) {
  geom_smooth(method = "glm", method.args = list(family = "binomial"), formula = formula, ...)
}
tibble(x_conc = c(seq(300, 7000, 1), M$conc), y_rate = predict.glm(M_glm, list(conc = x_conc), type = "response")) %>% left_join(M, by = c('x_conc' = 'conc')) %>%
  ggplot(aes(x = x_conc, y = y_rate)) + xlab("concentration") + ylab("death rate") +
  geom_point(aes(y = rate, size = tot_obsv)) + binomial_smooth(formula = y ~ 0 + x) + theme_bw()
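A possible way to look at both questions numerically, offered as a sketch and not as the comment referred to above. For a weighted line through the origin, the least-squares slope is sum(w * x * y) / sum(w * x^2), so each point's pull on the slope grows with w * x^2 and a large conc can outweigh a larger observation count. For a logit link, removing the intercept fixes the linear predictor at 0 when conc = 0, which corresponds to a probability of 0.5, not 0. The names slope_manual, pull_share and p_at_zero are made up for this example.

```r
# Data from the question
M <- data.frame(rate = c(0.4643, 0.2143), conc = c(300, 6000),
                nr_dead = c(13, 3), nr_surv = c(15, 11))
M$tot_obsv <- M$nr_dead + M$nr_surv

# Weighted linear model through the origin, as in the question
M_lm <- lm(rate ~ 0 + conc, data = M, weights = tot_obsv)

# Closed-form weighted least-squares slope for a line through the origin
slope_manual <- with(M, sum(tot_obsv * conc * rate) / sum(tot_obsv * conc^2))

# Share of the total pull on the slope contributed by each point (A, then B)
pull <- with(M, tot_obsv * conc^2)
pull_share <- pull / sum(pull)

# Logit-link GLM without an intercept: predicted probability at conc = 0
M_glm <- glm(cbind(nr_dead, nr_surv) ~ 0 + conc, data = M, family = binomial)
p_at_zero <- predict(M_glm, data.frame(conc = 0), type = "response")
```

With these numbers point B carries almost all of the pull, which would explain why the line sits so close to it. The value of p_at_zero suggests that y ~ 0 + x on the logit scale cannot by itself send the curve through (0,0).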
I would like to know how to plot lift curves in MLR especially for a Benchmark experiment with multiple algorithms and tasks. Help with ROC curve plotting will also be appreciated.
Thanks.
I am not an mlr user, but here is a general way.
First, some data (a two-class problem):
iris2 = iris[iris$Species!="setosa",]
iris2$Species = factor(iris2$Species)
1st model:
log_model = glm(Species~., data = iris2, family = "binomial")
prob = predict(log_model, iris2, type = "response") #get the logistic regression prob
2nd model:
library(e1071)
svm_model = svm(Species~., data = iris2, probability = TRUE)
prob_svm = predict(svm_model, iris2, probability = TRUE)
prob_svm = attr(prob_svm , "probabilities")[,2] #get the probability for svm model
Make a data frame from the classes (1/0 coding), with additional columns for each model's predicted probabilities:
for_lift = data.frame(Class = as.factor(ifelse(iris2$Species == "versicolor", 1, 0)), glm = prob, svm = prob_svm)
Make a lift object:
library(caret)
lift_obj = lift(Class ~ glm+svm, data = for_lift)
xyplot(lift_obj, auto.key = list(columns = 2,
lines = TRUE,
points = FALSE))
You can use the same data frame to plot ROC curves:
library(pROC)
plot(pROC::roc(response = for_lift$Class,
predictor = for_lift$glm,
levels=c(0, 1)),
lwd=1.5)
plot(
pROC::roc(response = for_lift$Class,
predictor = for_lift$svm ,
levels=c(0, 1)),
add=T, lty=2, lwd=1.5)
legend(0.9, 0.9, c("logistic", "svm"), lty = c(1,2))
You can also check the ROCR package: https://cran.r-project.org/web/packages/ROCR/ROCR.pdf. It has methods for both types of plot.
Additionally, if you are a ggplot2 user, you can use lift_obj to plot both the lift and ROC curves:
library(ggplot2)
p1 = ggplot(lift_obj$data)+
geom_line(aes(CumTestedPct, CumEventPct, color = liftModelVar))+
xlab("% Samples tested")+
ylab("% Samples found")+
scale_color_discrete(guide = guide_legend(title = "method"))+
geom_polygon(data = data.frame(x = c(0, lift_obj$pct, 100, 0),
y = c(0, 100, 100, 0)),
aes(x = x, y = y), alpha = 0.1)
p2 = ggplot(lift_obj$data)+
geom_line(aes(1-Sp , Sn, color = liftModelVar))+
scale_color_discrete(guide = guide_legend(title = "method"))
library(cowplot)
plot_grid(p1, p2, labels=c("lift", "ROC"))
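For anyone who wants to see what these curves are made of, the coordinates can also be computed by hand in base R. This is a rough sketch that treats "virginica" as the event of interest; the variable names are made up, and it is meant to show the idea, not to reproduce caret's or pROC's exact output.

```r
# Two-class data and logistic model, as above
iris2 <- droplevels(iris[iris$Species != "setosa", ])
log_model <- glm(Species ~ ., data = iris2, family = "binomial")
prob <- predict(log_model, iris2, type = "response")  # P(virginica)
is_event <- iris2$Species == "virginica"

# Rank cases from highest to lowest predicted probability
ord <- order(prob, decreasing = TRUE)
hits <- is_event[ord]

# Cumulative gain (lift chart) coordinates:
# % of samples tested vs. % of events found so far
cum_tested_pct <- 100 * seq_along(hits) / length(hits)
cum_event_pct <- 100 * cumsum(hits) / sum(hits)

# ROC coordinates over the same ranking
tpr <- cumsum(hits) / sum(hits)    # sensitivity
fpr <- cumsum(!hits) / sum(!hits)  # 1 - specificity
```

These vectors can be passed to plot(..., type = "l") or to ggplot2, and the same steps can be repeated for each model's probabilities.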