Dynamic legend labels for ggsurvplot - r

I have a general question related to dynamic labels and titles for ggsurvplot.
Is it possible to provide dynamic legend labels and a dynamic legend title to the ggsurvplot function when using it on a nested list with map from the purrr package?
The dataset I used is the NCCTG Lung Cancer Data, lung{survival}, slightly adapted to include only ECOG scores below 3.
The code below dynamically creates survival plots for each of the prognostic factors (sex and ECOG score, defined in varnames). However, I can't figure out how to pass a dynamic title, legend.labs and legend.title to ggsurvplot, as they need to be provided as character vectors that change with each varname (sex and ph.ecog).
Any help would be greatly appreciated.
Thanks!
HTJ
varnames <- c("sex", "ph.ecog")
lung <- lung %>% filter(ph.ecog<3)
res.tabs <- lung %>%
gather(key=VAR, value, varnames) %>%
nest( data = -VAR) %>% mutate(
fit = map(data, ~ coxph(Surv(time, status) ~ value, data = .x)),
survfit = map(data, ~ survfit(Surv(time, status) ~ value , data = .x, conf.type ="log")),
plots = ggsurvplot(survfit, data, conf.int = TRUE, title = "???", legend.labs = "???", legend.title = "???"))
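One way to approach this, as a minimal sketch rather than a definitive answer: iterate over the fit, the nested data and the variable name together with purrr::pmap, and build the title and legend strings from the variable name inside the mapped function. The column name km, the helper make_plot and the wording of the labels are assumptions introduced here for illustration; pivot_longer is used in place of gather. It assumes ggsurvplot accepts the explicitly passed data when the fit was created inside map.

```r
library(survival)
library(survminer)
library(dplyr)
library(tidyr)
library(purrr)

varnames <- c("sex", "ph.ecog")

# Hypothetical helper: builds one plot whose labels depend on the variable name
make_plot <- function(fit, d, var) {
  ggsurvplot(
    fit, data = d, conf.int = TRUE,
    title        = paste("Survival by", var),
    legend.title = var,
    # one label per stratum, in the sorted order that survfit uses
    legend.labs  = paste(var, "=", sort(unique(d$value)))
  )
}

res.tabs <- survival::lung %>%
  filter(ph.ecog < 3) %>%
  pivot_longer(all_of(varnames), names_to = "VAR", values_to = "value") %>%
  nest(data = -VAR) %>%
  mutate(
    km    = map(data, ~ survfit(Surv(time, status) ~ value, data = .x, conf.type = "log")),
    plots = pmap(list(km, data, VAR), make_plot)
  )

res.tabs$plots[[1]]  # plot for the first variable in VAR
```

legend.labs must have exactly one entry per stratum, so the labels are derived from the values actually present in each nested data frame.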

Related

Equal sign changes rendering of legend labels in autoplot of a survfit object

I am using the survival package to make Kaplan-Meier estimates of survival curves by group, and then I plot the curves using the ggfortify and survminer packages. Everything works fine except the legend labels. I want to show the group sizes (N) in the legend labels. I thought that adding the N size to the grouping variable itself using paste0 was a good way to go. In my case that is easier than using something like scale_fill_discrete("", labels = legend_labeller_for_plot).
library(dplyr)
library(ggplot2)
library(survival)
library(survminer)
library(ggfortify)
set.seed(100)
data <- data.frame(
time = rlnorm(20),
event = as.integer(runif(20) < 0.5),
group = ifelse(runif(20) > 0.5,
"group A",
"group B")
)
# Plotting survival curves without N sizes in the legend
fit <- survfit(
with(data, Surv(time, event)) ~ group,
data)
autoplot(fit)
# Adding N sizes to the data and plotting
data_new <- data %>%
group_by(group) %>% mutate(N = n()) %>%
ungroup() %>%
mutate(group_with_N = paste0(group, ", N = ", N))
fit_new <- survfit(
with(data, Surv(time, event)) ~ group_with_N,
data_new)
autoplot(fit_new)
When I try to add N sizes to the groups variable, the part with "N =" in the grouping variable disappears, i.e. the group variable isn't displayed on the legend labels as expected.
For comparison, what I expect is something like the following using Iris data:
What is more, I found that the culprit is the equals sign =. When I remove the = sign, the legend labels correspond to the grouping variable values.
My question is, why does the equal sign cause this?
An option could be using ggsurvplot where you can specify the legend.labs so you can show your size in the legend like this:
library(dplyr)
library(ggplot2)
library(survival)
library(survminer)
library(ggfortify)
set.seed(100)
data <- data.frame(
time = rlnorm(20),
event = as.integer(runif(20) < 0.5),
group = ifelse(runif(20) > 0.5,
"group A",
"group B")
)
# Adding N sizes to the data and plotting
data_new <- data %>%
group_by(group) %>% mutate(N = n()) %>%
ungroup() %>%
mutate(group_with_N = paste0(group, ", N = ", N))
fit_new <- survfit(
with(data, Surv(time, event)) ~ group_with_N,
data_new)
p <- autoplot(fit_new)
p
# ggsurvplot
ggsurvplot(fit_new, data_new,
legend.labs = unique(sort(data_new$group_with_N)),
conf.int = TRUE)
Created on 2022-08-18 with reprex v2.0.2

Cumulative hazard curve fitting

How is it technically possible to extend the cumulative hazard curves until day 80 if my original data only has follow-up until day 50? The cumulative hazard estimates will remain the same; both lines would simply be carried forward unchanged until day 80.
I created the survival object with
surv = survfit(Surv(Tstart, Tstop, outcome==1)~T, data = data.long, ctype=1, id=id)
and then created a plot:
palette = c("#FF9E29", "#86AA00"),
risk.table = FALSE,
ylim=c(0,2),
xlim=c(0,70),
fun = "cumhaz")
You can transform the fit into a tibble for manual plotting using ggplot. By adding new rows at the maximal time point with the maximal value, geom_step will be extended as desired:
library(tidyverse)
library(survival)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
max_time <- 3000
data <-
tibble(
cumhaz = fit$cumhaz,
stratum = {
fit$strata %>%
as.numeric() %>%
enframe() %>%
mutate(vec = name %>% map2(value, ~ rep(.x, .y))) %>%
pull(vec) %>%
simplify()
},
time = fit$time
)
data %>%
bind_rows(
data %>% group_by(stratum) %>% summarise(cumhaz = max(cumhaz), time = max_time)
) %>%
mutate(stratum = stratum %>% factor()) %>%
ggplot(aes(time, cumhaz, color = stratum)) +
geom_step() +
scale_x_continuous(limits = c(0, max_time))
Created on 2022-04-14 by the reprex package (v2.0.0)
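The same idea can be written more compactly, as a sketch under the same assumptions (the lung data and an arbitrary max_time): the per-row stratum label can be built with rep(names(fit$strata), fit$strata), since fit$strata holds the number of time points in each stratum. The names curves and extended are introduced here for illustration only.

```r
library(survival)
library(dplyr)
library(ggplot2)

fit <- survfit(Surv(time, status) ~ sex, data = lung)
max_time <- 3000  # assumed end of the x axis, beyond the last follow-up time

# One row per time point, labelled with the stratum it belongs to
curves <- tibble(
  time    = fit$time,
  cumhaz  = fit$cumhaz,
  stratum = rep(names(fit$strata), fit$strata)
)

# Append one row per stratum that carries the last value forward to max_time
extended <- curves %>%
  group_by(stratum) %>%
  summarise(cumhaz = max(cumhaz), time = max_time) %>%
  bind_rows(curves, .)

p <- ggplot(extended, aes(time, cumhaz, color = stratum)) +
  geom_step() +
  scale_x_continuous(limits = c(0, max_time))
p
```

Because the cumulative hazard never decreases, max(cumhaz) is the last estimated value in each stratum, so the appended row only draws a flat segment.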

Fail to plot by group on a phyloseq object generated by R package Divnet

I intend to plot an alpha diversity plot of my samples, which consist of three groups.
Alpha diversity is calculated by the R package DivNet, which uses phyloseq as a dependency.
The group of each sample is specified in the sam_data column "type" of the phyloseq object "df_family".
"dv" is a list composed of diversity estimates and standard errors.
Here is the code I used:
dv$shannon %>% plot(df_family, color = "type", group = df_family@sam_data$type) +
xlab("sample type") +
ylab("Shannon diversity estimate (family level)") +
coord_cartesian(ylim = c(0,5))
Here is what I get:
The samples were shown independently, instead of clustered as a group
Here is what I intend to get:
Once you already have your phyloseq object df_family, you can use the function estimate_richness from phyloseq.
You can then join the sample meta data to this data frame of alpha diversities.
Finally, you can use ggplot2 directly to customize your plot accordingly, e.g. to put different sample groups (here SampleType) at your x axis:
library(tidyverse)
library(phyloseq)
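# get example phy object
otufile <- system.file("extdata", "GP_otu_table_rand_short.txt.gz", package = "phyloseq")
mapfile <- system.file("extdata", "master_map.txt", package = "phyloseq")
# tree file from the phyloseq example data (it was not defined in the original snippet)
trefile <- system.file("extdata", "GP_tree_rand_short.newick.gz", package = "phyloseq")
df_family <- import_qiime(otufile, mapfile, trefile, showProgress = FALSE)
# compute alpha diversity per sample and join the sample metadata
alphadiv <-
df_family %>%
estimate_richness() %>%
as_tibble(rownames = "sample_id") %>%
left_join(df_family@sam_data %>% as_tibble(rownames = "sample_id"))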
alphadiv
alphadiv %>%
ggplot(aes(x = SampleType, y = Shannon)) +
geom_boxplot()
You can also add additional sample properties in aes e.g. aes(x = SampleType, y = Shannon, color = my_group) if my_group is a sample property column with cells containing f, li, or si.

ggforest (survminer) only selected covariates

I would like to create a forest plot after fitting Cox survival models. However, I would like to display only some of the covariates in the graph. Does someone know if that is possible? Maybe using ggforest2?
Thanks
library(survival)
library(survminer)
model <- coxph(Surv(time, status) ~ sex + rx + adhere,
data = colon )
ggforest(model)
colon <- within(colon, {
sex <- factor(sex, labels = c("female", "male"))
differ <- factor(differ, labels = c("well", "moderate", "poor"))
extent <- factor(extent, labels = c("submuc.", "muscle", "serosa", "contig."))
})
bigmodel <-
coxph(Surv(time, status) ~ sex + rx + adhere + differ + extent + node4,
data = colon )
ggforest(bigmodel)
The current version of ggforest on my machine does not allow me to select variables to be presented in the plot. However, another package forestmodel::forest_model has covariates = which is supposed to allow users to select variables. However, the current version of forestmodel may not perform this correctly, as you can see from the following graph:
library(forestmodel)
colon <- within(colon, {
sex <- factor(sex, labels = c("female", "male"))
differ <- factor(differ, labels = c("well", "moderate", "poor"))
extent <- factor(extent, labels = c("submuc.", "muscle", "serosa", "contig."))
})
bigmodel <-
coxph(Surv(time, status) ~ sex + rx + adhere + differ + extent + node4,
data = colon )
forest_model(bigmodel, covariates = c("sex", "rx"))
It might be something the original contributor could fix. At some stage, I was able to generate something like this with some minor modification of previous version of the function. However, after I reinstalled the updated package, it no longer works.
EDIT
Another, more flexible approach takes two steps. First, collect the model information (I use broom::tidy here, but you can use other methods). Second, use forestplot::forestplot to produce the graph. Again, you can also use other meta-analysis packages for this.
Let's continue with the above bigmodel
library(forestplot)
library(tidyverse)
# Save model information
# conf.int = TRUE is needed so that conf.low and conf.high are returned
df <- broom::tidy(bigmodel, exponentiate = TRUE, conf.int = TRUE)
# pick up the first 4 values
df1 <- df[1:4, ] %>%
transmute(
HR = round(estimate, 2),
low = conf.low,
high = conf.high)
row_names <- cbind(c("Name", "Sex", "Lev", "Lev + 5FU", "adhere"),
c("HR", df1$HR))
df1 <- rbind(rep(NA, 4), df1)
forestplot(labeltext = row_names,
df1[,c("HR", "low", "high")],
is.summary=c(FALSE, FALSE, FALSE),
zero = 1,
xlog = TRUE)
This produces the following graph. It may take a little bit more learning to generate a satisfactory graph, but you are in control, relatively.
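As an alternative sketch of the same two-step idea, the tidied model can be filtered to the covariates of interest and drawn directly with ggplot2. This is not a feature of ggforest; the vector keep, the prefix-matching rule and the use of the unmodified survival::colon data are assumptions made for illustration.

```r
library(survival)
library(broom)
library(dplyr)
library(ggplot2)

# Fit on the unmodified colon data so the block is self-contained
bigmodel <- coxph(Surv(time, status) ~ sex + rx + adhere + differ + extent + node4,
                  data = survival::colon)

keep <- c("sex", "rx")  # hypothetical choice of covariates to display

# Hazard ratios with confidence limits, restricted to terms starting with a kept name
hr <- tidy(bigmodel, exponentiate = TRUE, conf.int = TRUE) %>%
  filter(grepl(paste0("^(", paste(keep, collapse = "|"), ")"), term))

p <- ggplot(hr, aes(x = estimate, y = term)) +
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  scale_x_log10() +
  labs(x = "Hazard ratio (95% CI)", y = NULL)
p
```

Matching on the term prefix keeps every level of a factor such as rx, but it would also pick up any other covariate whose name begins with the same letters, so the pattern may need tightening for other models.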

Predicted vs. Actual plot

I'm new to R and statistics and haven't been able to figure out how to plot predicted values vs. actual values after running a multiple linear regression. I have come across similar questions but haven't been able to understand the code. I would greatly appreciate it if you could explain the code.
This is what I have done so far:
# Attach file containing variables and responses
q <- read.csv("C:/Users/A/Documents/Design.csv")
attach(q)
# Run a linear regression
model <- lm(qo~P+P1+P4+I)
# Summary of linear regression results
summary(model)
I want the predicted vs. actual plot so that I can see graphically how well my regression fits my actual data.
It would be better if you provided a reproducible example, but here's an example I made up:
set.seed(101)
dd <- data.frame(x=rnorm(100),y=rnorm(100),
z=rnorm(100))
dd$w <- with(dd,
rnorm(100,mean=x+2*y+z,sd=0.5))
It's (much) better to use the data argument; you should almost never use attach().
m <- lm(w~x+y+z,dd)
plot(predict(m),dd$w,
xlab="predicted",ylab="actual")
abline(a=0,b=1)
Besides predicted vs actual plot, you can get an additional set of plots which help you to visually assess the goodness of fit.
# --- execute previous code by Ben Bolker ---
par(mfrow = c(2, 2))
plot(m)
A tidy way of doing this would be to use broom::augment():
library(tidyverse)
library(cowplot)
library(broom)
set.seed(101)
# Using Ben's data above:
dd <- data.frame(x=rnorm(100),y=rnorm(100),
z=rnorm(100))
dd$w <- with(dd,rnorm(100,mean=x+2*y+z,sd=0.5))
m <- lm(w~x+y+z,dd)
m %>% augment() %>%
ggplot() +
geom_point(aes(.fitted, w)) +
geom_smooth(aes(.fitted, w), method = "lm", se = FALSE, color = "lightgrey") +
labs(x = "Fitted", y = "Actual") +
theme_bw()
This will work nicely for deep nested regression lists especially.
To illustrate this, consider some nested list of regressions:
Reglist <- list()
Reglist$Reg1 <- dd %>% do(reg = lm(as.formula("w~x*y*z"), data = .)) %>% mutate( Name = "Type 1")
Reglist$Reg2 <- dd %>% do(reg = lm(as.formula("w~x+y*z"), data = .)) %>% mutate( Name = "Type 2")
Reglist$Reg3 <- dd %>% do(reg = lm(as.formula("w~x"), data = .)) %>% mutate( Name = "Type 3")
Reglist$Reg4 <- dd %>% do(reg = lm(as.formula("w~x+z"), data = .)) %>% mutate( Name = "Type 4")
Now is where the power of the above tidy plotting framework comes to life...:
Graph_Creator <- function(Reglist){
Reglist %>% pull(reg) %>% .[[1]] %>% augment() %>%
ggplot() +
geom_point(aes(.fitted, w)) +
geom_smooth(aes(.fitted, w), method = "lm", se = FALSE, color = "lightgrey") +
labs(x = "Fitted", y = "Actual",
title = paste0("Regression Type: ", Reglist$Name) ) +
theme_bw()
}
Reglist %>% map(~Graph_Creator(.)) %>%
cowplot::plot_grid(plotlist = ., ncol = 1)
Same as @Ben Bolker's solution, but producing a ggplot object instead of using base R
#first generate the dd data set using the code in Ben's solution, then...
require(ggpubr)
m <- lm(w~x+y+z,dd)
ggscatter(x = "prediction",
y = "actual",
data = data.frame(prediction = predict(m),
actual = dd$w)) +
geom_abline(intercept = 0,
slope = 1)
