Solved: packeges needed to be updated
Please find My Data below.
I have produced a combined Kaplan-Meier plot of progression-free and overall survival.
I have used the following script
pfs <- survfit(Surv(resp.time, response) ~ 1, conf.type="log", data=w)
os <- survfit(Surv(Follow.up.death, Death) ~ 1, conf.type="log", data=w)
fit <- list(PFS = pfs, OS = os)
ggsurvplot(fit, data = w, combine = TRUE,
risk.table = TRUE,
conf.int = TRUE,
conf.int.style = "ribbon",
censor = TRUE,
tables.theme = theme,
ggtheme = theme,
xlim = c(0,36),
ylim = c(0.25,1),
alpha=0.8,
size=0.7,
conf.int.alpha=c(0.1),
break.x.by = 6,
palette = c("#1C73C2","red"))
Which gave
As you can see, I have designed my own "theme" given by:
theme <- theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_line(colour = "gray98"),
panel.grid.minor = element_line(colour = "gray98"),
panel.border = element_blank(),
panel.background = element_blank())
It seems that when I print the plot, theme is only added to table and not the plot itself although ggtheme=theme.
Can I add the theme somehow to the plot as in the table?
A sample of My Data
w <- structure(list(WHO = c(1L, 3L, 2L, 2L, 2L, 3L, 2L, 3L, 1L, 2L,
3L, 3L, 3L, 1L, 2L, 1L, 2L, 3L, 3L, 1L, 2L, 1L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 4L, 4L, 1L, 4L, 1L, 2L, 1L, 4L, 1L, 4L, 4L, 4L,
4L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 2L, 1L, 2L, 2L, 4L, 4L,
4L, 2L, 4L, 1L, 4L, 4L, 2L, 4L, 4L, 3L, 4L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L
), Death = c(0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), Follow.up.death = c(18,
2, 14, 17, 31, 4, 20, 15, 12, 19, 10, 17, 27, 22, 3, 43, 24,
14, 13, 5, 12, 137, 22, 87, 48, 24, 72, 32, 14, 83, 68, 56, 57,
18, 16, 70, 1.9, 69.2, 126.3, 41.6, 17.9, 1.3, 87.4, 4.4, 137.4,
17.5, 95.8, 65.2, 14.8, 98.5, 16.6, 74.9, 10.3, 43.4, 32.5, 4.8,
7.3, 107.8, 6.8, 18.3, 33, 25.2, 49.2, 15.9, 1.2, 42.7, 1, 9,
1.8, 15.6, 8.9, 15, 16.4, 7.7, 75.5, 12.2, 54.8, 22.2, 9.7, 14.3,
5.2, 64.5, 21.8, 0.2, 7.3, 18.7, 5.1, 17.3, 27.4, 16, 24.2, 9.7,
8.2, 5.7, 41.8, 10.6, 22.8, 4.8, 6, 4, 50, 21, 30, 5, 11, 12),
response = c(0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L,
1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L), resp.time = c(18, 2, 13, 17, 22, 2,
6, 5, 12, 8, 3, 2, 1, 21, 2, 43, 4, 2, 4, 5, 0.1, 137, 4,
87, 17, 24, 72, 19, 14, 83, 68, 56, 57, 18, 14, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 15.6, 8.9, 15, 16.4, 7.7, 75.5, 3, 54.8, 22.2, 12, 14.3,
6, 12, 21.8, 6, 3, 12, 3, 6, 3, 3, 12, 9.7, 3, 3, 12, 3,
6, 3, 6, 4, 50, 21, 30, 5, 11, 12)), .Names = c("WHO", "Death",
"Follow.up.death", "response", "resp.time"), class = "data.frame", row.names = c(NA,
-106L))
I think the major and minor gridlines are there, it's just incredibly faint and/or the something in the export settings means the exported image isn't detailed enough to display them.
With your code, modifying some of the parameters:
theme <- theme(axis.line = element_line(colour = "green"),
panel.grid.major = element_line(colour = "red"),
panel.grid.minor = element_line(colour = "blue"),
panel.border = element_blank(),
panel.background = element_blank())
With:
theme <- theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_line(colour = "gray98"),
panel.grid.minor = element_line(colour = "gray98"),
panel.border = element_blank(),
panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1"))
That is really weird.
This is what I get
theme <- theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_line(colour = "red"),
panel.grid.minor = element_line(colour = "red"),
panel.border = element_blank(),
panel.background = element_blank())
But I do get the following: Warning messages:
1: Removed 1 rows containing missing values (geom_path).
2: Removed 1 rows containing missing values (geom_path).
Related
I started with this answer as a base for my ggsurvplot graph, and I'm using their reproducible code for my question as well.
My problem is I can't figure out how to change the font size for the cumulative event table. Based on this documentation, it looks like I should be able to specify cumevents.fontsize = x, but no matter what value I put in for that, the font size does not change. I'm trying to set it to teh same size as the risk.table.fontsize, but you can see that they are different in the graph below.
The graph was computed with this
library(survival)
library(survminer)
library(ggplot2)
fit <- survfit(Surv(p$time.recur.months, p$recurrence) ~ p$simpson.grade, conf.type="log", data=p)
ggsurvplot(
fit,
data = p,
risk.table = TRUE,
risk.table.fontsize = 2,
cumevents = TRUE,
cumevents.fontsize =2,
pval = TRUE,
pval.coord = c(0, 0.25),
conf.int = F,
legend.labs=c("Simpson Grade 1" ,"Simpson Grade 2", "Simpson Grade 3",
"Simpson Grade 4"),
size=c(0.7,0.7,0.7,0.7),
xlim = c(0,100),
alpha=c(0.7),
break.time.by = 10,
xlab="Time in months",
#ylab="Survival probability",
ggtheme = theme_gray(),
risk.table.y.text.col = T,
risk.table.y.text = TRUE,
ylim=c(0,0.5),
palette="Set1"
)
My Data
p <- structure(list(recurrence = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L), time.recur.months = c(NA, NA,
NA, NA, NA, NA, 92L, NA, NA, NA, 74L, NA, NA, NA, 2L, 8L, NA,
NA, NA, NA, 58L, NA, NA, NA, NA, NA, 3L, NA, 4L, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 39L, NA, NA, NA, NA, 15L, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 12L, 56L, 57L, NA, NA, 49L, 17L,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5L,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 9L, NA,
89L, NA, NA, NA, 8L, 6L, 8L, 4L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 60L, NA, NA, 38L, NA, NA, NA, NA, NA, 90L,
NA, 58L, 54L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 53L, NA, NA, 124L, NA, NA,
NA, NA, NA, NA, 7L, NA), simpson.grade = c(3L, 1L, 1L, 2L, 4L,
1L, 1L, 1L, 2L, 1L, 4L, 1L, 1L, 2L, 1L, 2L, 1L, 4L, 2L, 3L, 2L,
1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 4L, 3L, 1L, 1L, 4L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 1L,
3L, 4L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 4L, 1L, 1L, 1L, 4L, 1L, 1L,
1L, 2L, 1L, 2L, 4L, 4L, 1L, 4L, 4L, 1L, 2L, 1L, 1L, 4L, 4L, 4L,
4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 1L, 2L, 1L, 4L, 1L, 1L, 4L,
4L, 1L, 3L, 1L, 1L, 1L, 3L, 2L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 3L, 1L, 1L,
1L, 4L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 2L, 2L, 2L, 4L, 1L, 4L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 1L)), .Names = c("recurrence",
"time.recur.months", "simpson.grade"), class = "data.frame", row.names
= c(NA,
-176L))
I can't see the documentation for a cumevents.fontsize argument, but a straightforward way to change it is to store the plot and change it directly:
mygg <- ggsurvplot(
fit,
data = p,
risk.table = TRUE,
risk.table.fontsize = 2,
cumevents = TRUE,
cumevents.fontsize =2,
pval = TRUE,
pval.coord = c(0, 0.25),
conf.int = F,
legend.labs=c("Simpson Grade 1" ,"Simpson Grade 2", "Simpson Grade 3",
"Simpson Grade 4"),
size=c(0.7,0.7,0.7,0.7),
xlim = c(0,100),
alpha=c(0.7),
break.time.by = 10,
xlab="Time in months",
#ylab="Survival probability",
ggtheme = theme_gray(),
risk.table.y.text.col = T,
risk.table.y.text = TRUE,
ylim=c(0,0.5),
palette="Set1"
)
So now you can do:
mygg$cumevents$layers[[1]]$aes_params$size <- 2
mygg
or
mygg$cumevents$layers[[1]]$aes_params$size <- 8
mygg
Here is a suggestion how to get a clean table: See here https://github.com/kassambara/survminer/issues/117
p1 <- ggsurvplot(
fit,
data = p,
risk.table = TRUE,
cumevents = TRUE,
risk.table.title = "No. at Risk",
cumevents.title = "Cumulative No. of Events",
risk.table.height = 0.15, cumevents.height = 0.15,
risk.table.y.text = TRUE,
cumevents.y.text = TRUE,
pval = TRUE,
pval.coord = c(0, 0.25),
conf.int = F,
legend.labs=c("Simpson Grade 1" ,"Simpson Grade 2", "Simpson Grade 3", "Simpson Grade 4"),
size=c(0.7,0.7,0.7,0.7),
xlim = c(0,100),
alpha=c(0.7),
break.time.by = 10,
xlab="Time in months",
ylab="Survival probability",
ggtheme = theme_gray(),
risk.table.y.text.col = T,
# risk.table.y.text = TRUE,
ylim=c(0,0.5),
palette="Set1"
)
p1$table <- p1$table + theme_cleantable()
p1$cumevents <- p1$cumevents + theme_cleantable()
p1
I am trying to plot a median time denoted ee$rfs per ee$Ki67, which is marker of many cells that proliferates in a tumor sample, ie. a continuous covariate too.
I have attached my data ee below. I am searching for a solution in either dplyr or ggplot. Obviously, I have sought for help, such as here, but without luck.
My current plot:
With the following
ggplot(ee, (aes(x=Ki67,y=rfs))) +
geom_point(aes(color=as.factor(WHO)),size=6,shape=20,alpha=0.5) +
facet_wrap(.~EOR)
I have tried variations of mutate, group_by, filter and geom_line. I tried geom_smooth but I am concerned that this draws the best fit (?) and not the median.
ee <- structure(list(rfs = c(26.4, 84, 42, 13.2, 18, 33.6, 39.6, 9.6,
16.8, 19.2, 10.8, 7.2, 10.8, 76.8, 58.8, 31.2, 18, 182.4, 20.4,
13.2, 8.4, 2.4, 123.6, 60, 100.8, 82.8, 12, 60, 18, 29.8, 68.3,
27.2, 18.7, 64.9, 6.5, 50.3, 46.4, 29.9, 31.4, 42.7, 31.1, 98.1,
80.9, 24.1, 49.2, 12.2, 20.5, 62.8, 9, 69, 30, 91.79, 8.57, 60.88,
11.5, 56.87, 49.05, 16.95, 4.5, 8.74, 60.06, 37.85, 90.12, 123.76,
47.41, 55.92, 3.09, 27.34, 4.99, 28.06, 26.71, 23.03, 6.34, 79.34,
2.5, 19.32, 9.23, 2.6, 4.34, 45.9, 29.34, 8.58, 29.41, 30.72,
15.97, 37.06, 17.05, 14.29, 5.95, 3.42, 60.58, 19.81, 72.91,
16.99, 7.29, 74.32, 3.35, 39.95, 4.4, 15.44, 2.5, 28.32, 40.15,
57.69, 27.86, 21.59, 10.09, 8.18, 21.59, 3.19, 3.12, 8.25, 14,
14, 2, 23, 15, 9, 9, 28, 14, 23, 21, 26, 24, 63, 25, 34, 26.83333333,
32.4, 28.76666667, 32.93333333, 32.16666667, 10.06666667, 46.66666667,
58.06666667, 29.06666667, 30.33333333, 26.56666667, 24.23333333,
36.5, 31.73333333, 5.733333333, 44.16666667, 46.93333333, 48.5,
64.7, 37.16666667, 21.56666667, 14.8, 53.83333333, 59.06666667,
8.7, 13.43333333, 12.56666667, 65.73333333, 54.83333333, 30.63333333,
5, 65, 7, 12, 14, 6, 15, 36, 99, 16, 87, 6, 33, 3, 3, 11, 24,
24, 15, 10, 28, 18, 14, 29, 20, 12, 42, 31, 14, 18, 29, 39, 62,
62, 46), Ki67 = c(25, 15, 8, 15, 18, 5, 2, 18, 6, 12, 12, 13,
13, 15, 20, 3, 30, 10, 18, 20, 7, 17, 5, 3, 20, 5, 20, 10, 2,
5, 4, 7, 8, 12, 40, 17, 3, 5, 20, 5, 22, 6, 6, 18, 15, 12, 15,
5, 15, 15, 3, 4, 10, 5, 2, 4, 3, 5, 7, 7, 4, 2, 4, 3, 20, 15,
25, 20, 10, 15, 15, 8, 15, 8, 8, 10, 22, 18, 50, 30, 30, 45,
50, 30, 8, 25, 25, 10, 25, 20, 15, 10, 8, 55, 10, 10, 10, 20,
30, 5, 20, 8, 30, 10, 15, 25, 30, 38, 15, 30, 25, 15, 5, 8, 35,
9, 14, 2, 1, 1, 20, 30, 2, 8, 2, 16, 20, 23, 4.5, 2.2, 9.43,
8.95, 6.47, 1.81, 7.27, 12.4, 7.97, 21.99, 8.98, 17.3, 8, 15,
15, 20, 6, 5, 12.5, 3, 20, 20, 11.5, 2.66, 14.7, 9.13, 5, 5,
12, 11, 2, 8, 20, 50, 10, 15, 30, 8, 10, 20, 10, 10, 30, 10,
10, 13, 10, 15, 10, 10, 40, 10, 5, 15, 15, 15, 25, 15, 30, 30,
8, 30, 15, 20, 13), EOR = c(1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L,
1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L), WHO = c(2L, 2L, 2L, 3L, 3L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 1L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-193L))
If you need to know how the median of rfs varies with Ki67 (instead of the mean in linear regression), you need to use a quantile regression.
Luckily this is implemented in ggplot2, I first make a plot without the facet to show that the regression line is very similar for the two facets:
library(quantreg)
library(ggplot2)
ggplot(ee,aes(x=Ki67,y=rfs)) +
geom_point(aes(color=as.factor(WHO)),alpha=0.5) +
geom_quantile(aes(linetype=factor(EOR)),quantiles = 0.5,col="black")
And having the facet (you don't need to specify group):
ggplot(ee,aes(x=Ki67,y=rfs)) +
geom_point(aes(color=as.factor(WHO)),alpha=0.5) +
geom_quantile(quantiles = 0.5,lty=8,col="black")+
facet_wrap(.~EOR)
Or you can derive the predictions per group, and plot:
func = function(df,X,id){
fit = rq(rfs ~ Ki67,data=df)
data.frame(Ki67=X,rfs=predict(fit,data.frame(Ki67=X)))
}
Xrange= 1:max(ee$Ki67)
pred = ee %>% group_by(EOR) %>% group_map(~func(.x,Xrange)) %>% bind_rows()
pred$EOR = rep(unique(ee$EOR),each=length(Xrange))
ggplot(ee,aes(x=Ki67,y=rfs)) +
geom_point(aes(color=as.factor(WHO)),alpha=0.5) +
geom_line(data=pred)+
facet_wrap(.~EOR)
If you just want to plot the median rfs value by WHO, Ki67, and EOR, then you can use group_by and summarise, and feed the summarised data to ggplot.
library(tidyverse)
ee %>%
group_by(WHO, Ki67, EOR) %>%
summarise(rfs = median(rfs)) %>%
ggplot(aes(x = Ki67, y = rfs, color = factor(WHO))) +
geom_point() +
facet_wrap( ~ EOR)
I know that there are countless of questions on how to combine plots. However, I have not found a solution that works for my request.
Please, find a Data sample below.
I have four plots, which all are produced by the same script (but loaded from different data). I cannot get it to work with par() when using ggpar for my ggsurvplot.
pfs <- survfit(Surv(resp.time, response) ~ 1, data=w)
os <- survfit(Surv(Follow.up.death, Death) ~ 1, data=w)
fit <- list(PFS = pfs, OS = os)
j <- ggsurvplot(fit, data = w, combine = TRUE,
risk.table = TRUE,
conf.int = TRUE,
conf.int.style = "ribbon",
censor = TRUE,
tables.theme = theme,
ggtheme = theme,
xlim = c(0,24),
ylim = c(0.4,1),
legend.labs=c("PFS (All)","OS (All)"),
alpha=0.8,
size=0.7,
conf.int.alpha=c(0.1),
xlab="Months",
break.x.by = 3,
surv.scale="percent",
palette = c("#1C73C2","red"))
ggpar(j, font.x = c(11, "bold", "black"), font.y = c(11, "bold", "black"),font.tickslab = c(11), font.legend = c(12))
So, I produce four different ggpar-plots and I would like the presented graphically like the demonstrated below
I have tried
ggarrange(aa, bb, cc, dd,
labels = c("A", "B", "C", "D"),
ncol = 2, nrow = 2)
When aa <- ggpar()1, bb <- ggpar()2, cc <- ggpar()3 and dd <- ggpar()4
But I receive this warning:
Argument needs to be of class "ggplot", "gtable", "grob",
"recordedplot", or a function that plots to an R graphicsdevice when
called, but is a ggsurvplotggsurvlist
Here is a sample of my data
# Data sample
w <- structure(list(resp.time = c(18, 2, 13, 17, 22, 2, 6, 5, 12,
8, 3, 2, 1, 21, 2, 43, 4, 2, 4, 5, 0.1, 137, 4, 87, 17, 24, 72,
19, 14, 83, 68, 56, 57, 18, 14, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 15.6, 8.9, 15, 16.4,
7.7, 75.5, 3, 54.8, 22.2, 12, 14.3, 6, 12, 21.8, 6, 3, 12, 3,
6, 3, 3, 12, 9.7, 3, 3, 12, 3, 6, 3, 6, 4, 50, 21, 30, 5, 11,
12, 4, 18, 6, NA, 3), response = c(0L, 1L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 1L, NA, 1L), Death = c(0L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L,
1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, NA, 1L), Follow.up.death = c(18,
2, 14, 17, 31, 4, 20, 15, 12, 19, 10, 17, 27, 22, 3, 43, 24,
14, 13, 5, 12, 137, 22, 87, 48, 24, 72, 32, 14, 83, 68, 56, 57,
18, 16, 70, 1.9, 69.2, 126.3, 41.6, 17.9, 1.3, 87.4, 4.4, 137.4,
17.5, 95.8, 65.2, 14.8, 98.5, 16.6, 74.9, 10.3, 43.4, 32.5, 4.8,
7.3, 107.8, 6.8, 18.3, 33, 25.2, 49.2, 15.9, 1.2, 42.7, 1, 9,
1.8, 15.6, 8.9, 15, 16.4, 7.7, 75.5, 12.2, 54.8, 22.2, 9.7, 14.3,
5.2, 64.5, 21.8, 0.2, 7.3, 18.7, 5.1, 17.3, 27.4, 16, 24.2, 9.7,
8.2, 5.7, 41.8, 10.6, 22.8, 4.8, 6, 4, 50, 21, 30, 5, 11, 12,
4, 18, 6, NA, 3)), class = "data.frame", row.names = c(NA, -111L
))
Thank you in advance.
You can use ggpubr::ggarrange. However, you first need to "build" the plots, i.e., create grobs from them.
library(survival)
library(survminer)
pfs <- survfit(Surv(resp.time, response) ~ 1, data=w)
os <- survfit(Surv(Follow.up.death, Death) ~ 1, data=w)
fit <- list(PFS = pfs, OS = os)
j <- ggsurvplot(fit, data = w, combine = TRUE,
risk.table = TRUE,
conf.int = TRUE,
conf.int.style = "ribbon",
censor = TRUE,
#tables.theme = theme,
#ggtheme = theme,
xlim = c(0,24),
ylim = c(0.4,1),
legend.labs=c("PFS (All)","OS (All)"),
alpha=0.8,
size=0.7,
conf.int.alpha=c(0.1),
xlab="Months",
break.x.by = 3,
surv.scale="percent",
palette = c("#1C73C2","red"))
gg <- survminer:::.build_ggsurvplot(ggpar(j, font.x = c(11, "bold", "black"), font.y = c(11, "bold", "black"),font.tickslab = c(11), font.legend = c(12)))
library(ggpubr)
ggarrange(gg, gg, gg, gg, ncol = 2, nrow = 2)
Please find My Data below.
I wish to produce a combined boxplot as this one:
I have four different patient categories in w$WHO==1,2,3,4 each corresponding to WHO-I, WHO-II, WHO-III and Unknown tumors.
I wish to produce two boxplots per w$WHO
The first boxplot should show time to death as in w$Follow.up.death[w$Death==1]. I would like to color/outline + fill to be blue but the fill should have an alpha=0.2.
The second boxplot should show time to progression as in w$resp.time[w$response==1]. I would like to color/outline + fill to be red but the fill should have an alpha=0.2.
w$Death and w$response does not have the same length.
The corresponding legend should read "Overall survival" and "Progression-free survival".
I tried several manuals but I can't figure how to do this specifically - I have only managed to produce single plots.
Any ideas how to solve this??
A sample of My data
w <- structure(list(WHO = c(1L, 3L, 2L, 2L, 2L, 3L, 2L, 3L, 1L, 2L,
3L, 3L, 3L, 1L, 2L, 1L, 2L, 3L, 3L, 1L, 2L, 1L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 4L, 4L, 1L, 4L, 1L, 2L, 1L, 4L, 1L, 4L, 4L, 4L,
4L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 2L, 1L, 2L, 2L, 4L, 4L,
4L, 2L, 4L, 1L, 4L, 4L, 2L, 4L, 4L, 3L, 4L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L
), Death = c(0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), Follow.up.death = c(18,
2, 14, 17, 31, 4, 20, 15, 12, 19, 10, 17, 27, 22, 3, 43, 24,
14, 13, 5, 12, 137, 22, 87, 48, 24, 72, 32, 14, 83, 68, 56, 57,
18, 16, 70, 1.9, 69.2, 126.3, 41.6, 17.9, 1.3, 87.4, 4.4, 137.4,
17.5, 95.8, 65.2, 14.8, 98.5, 16.6, 74.9, 10.3, 43.4, 32.5, 4.8,
7.3, 107.8, 6.8, 18.3, 33, 25.2, 49.2, 15.9, 1.2, 42.7, 1, 9,
1.8, 15.6, 8.9, 15, 16.4, 7.7, 75.5, 12.2, 54.8, 22.2, 9.7, 14.3,
5.2, 64.5, 21.8, 0.2, 7.3, 18.7, 5.1, 17.3, 27.4, 16, 24.2, 9.7,
8.2, 5.7, 41.8, 10.6, 22.8, 4.8, 6, 4, 50, 21, 30, 5, 11, 12),
response = c(0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L,
1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L), resp.time = c(18, 2, 13, 17, 22, 2,
6, 5, 12, 8, 3, 2, 1, 21, 2, 43, 4, 2, 4, 5, 0.1, 137, 4,
87, 17, 24, 72, 19, 14, 83, 68, 56, 57, 18, 14, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 15.6, 8.9, 15, 16.4, 7.7, 75.5, 3, 54.8, 22.2, 12, 14.3,
6, 12, 21.8, 6, 3, 12, 3, 6, 3, 3, 12, 9.7, 3, 3, 12, 3,
6, 3, 6, 4, 50, 21, 30, 5, 11, 12)), .Names = c("WHO", "Death",
"Follow.up.death", "response", "resp.time"), class = "data.frame", row.names = c(NA,
-106L))
Does this come close?
# Wrangling data into a digestable format
v <- w[w$Death == 1,]
vv <- w[w$response == 1,]
df <- data.frame(x = as.factor(c(v$WHO, vv$WHO)),
y = c(v$Follow.up.death, vv$resp.time),
f = rep(c("Death", "Response"), c(nrow(v), nrow(vv))))
df <- df[!is.na(df$x),]
And then making the plot:
ggplot(df) +
geom_boxplot(aes(x, y, fill = f, colour = f)) +
scale_x_discrete(name = "WHO") +
scale_y_continuous(name = "Time") +
# Alpha encoded as hex values as to force to apply on fill only
# Not sure if alpha = 0.2 corresponds to '33'
scale_fill_manual(values = c("#FF000033", "#0000FF33"), name = "Survival",
labels = c("Overall", "Progression-free")) +
scale_colour_manual(values = c("red", "blue"), name = "Survival",
labels = c("Overall", "Progression-free"))
Similar to #tuenbrand's approach, but using dplyr/tidyr to wrangle the data:
library(tidyverse)
df <- w %>%
rename(Overall = "Follow.up.death", `Progression-free` = "resp.time") %>%
gather(key = Survival, value = Time, Overall, `Progression-free`) %>%
filter((Death == 1 & Survival == "Overall") |
(response == 1 & Survival == "Progression-free")) %>%
mutate(WHO = paste("WHO:", WHO))
And then facet on WHO to space the boxplots:
ggplot(df, aes(x = Survival, y = Time, fill = Survival)) +
geom_boxplot() + facet_wrap(~WHO, nrow = 1) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.x = element_blank(),
legend.position = "bottom")
I'm trying to create a subset of data that contains only the rows with missing data in one of my columns.
The data:
data<-structure(list(ID = c(1, 2, 3, 4, 7, 9, 10, 12, 13, 14, 15, 16,
17, 18, 20, 21, 22, 23, 24, 25, 27, 28, 29, 31, 34, 37, 38, 39,
40, 41), QnSinV1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), QnSinV2 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), QnSinV3 = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), QnSize = c(0.032140423, 0.017620319,
NA, -0.093448167, -0.051090375, 0.001188913, NA, -0.144868599,
-0.000260992, 0.008502255, -0.00346349, 0.017208373, 0.004301855,
0.004420431, -0.007564124, NA, 0.174388101, -0.142412328, 0.064935852,
-0.052174354, NA, 0.005180317, 0.05728222, 0.041215822, -0.002449455,
-0.040942923, -0.082284946, -0.173656321, 0.022723036, -0.061326436
), QnWt = c(15.8, 16.5, 11.9, 13.7, 15, 15.3, 13.7, 15.8, 16.3,
15.9, 15.1, 14.5, 14.4, 15.7, 14.4, 13.3, 14.8, 15.1, 15.1, 14.7,
15.8, 17.8, 16.4, 13.4, 15.1, 14.8, 14.2, 12.7, 17.9, 16.2),
QnWtLsCL = c(NA, 0.503030303, 0.596638655, NA, 0.446666667,
0.509803922, 0.408759124, 0.462025316, 0.552147239, 0.509433962,
0.456953642, 0.455172414, 0.506944444, NA, 0.486111111, 0.473684211,
0.513513514, 0.516556291, 0.582781457, 0.537414966, 0.474683544,
0.43258427, 0.432926829, NA, 0.569536424, 0.445945946, 0.485915493,
0.543307087, NA, 0.543209877), ClaustPer = c(NA, 1L, 2L,
NA, 3L, 0L, 2L, 0L, 1L, 0L, 0L, 0L, 1L, NA, 0L, 7L, 1L, 0L,
1L, 0L, 1L, 2L, 2L, NA, 2L, 3L, 2L, 2L, NA, 0L), QnSurvCL = c(0L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L),
ColWtCL = c(NA, 11.7, 7.3, NA, 9.1, 11.1, 9.6, 11.2, 9, 11.2,
12, 11, 10.9, NA, 9.9, 8.6, 10.8, 10.9, 8.7, 10.8, 11.6,
13.7, 10.8, NA, 9.3, 9.6, 9.8, 8.7, NA, 11.1), ColWtCL_6 = c(NA,
57.1, 45, NA, 73.6, NA, NA, NA, 43.8, NA, NA, 71.1, NA, NA,
53.7, NA, 84.4, NA, NA, NA, 56, 56.1, NA, NA, 59.4, NA, 45.7,
NA, NA, NA), ColGrowthCL_6 = c(NA, 4.88034188, 6.164383562,
NA, 8.087912088, NA, NA, NA, 4.866666667, NA, NA, 6.463636364,
NA, NA, 5.424242424, NA, 7.814814815, NA, NA, NA, 4.827586207,
4.094890511, NA, NA, 6.387096774, NA, 4.663265306, NA, NA,
NA), QnSurvCL_6 = c(NA, 1L, NA, NA, 1L, NA, NA, NA, 1L, NA,
NA, 1L, NA, NA, 1L, 0L, 1L, NA, NA, NA, 1L, 1L, NA, NA, 1L,
NA, 1L, NA, NA, NA), IR = c(-0.1919695, 0.0214441, NA, 0.0886954,
0.4221713, 0.0869788, 0.2716466, 0.0289674, -0.0291414, -0.1739616,
-0.0215773, -0.1473209, 0.0370336, 0.254584, 0.0332632, -0.0203844,
0.1524175, -0.051451, -0.0612144, 0.1617955, 0.0354173, 0.0904954,
0.3344705, 0.0990583, 0.1985931, 0.0419539, -0.0159598, 0.1159526,
-0.0057495, -0.1811458), SH = c(1.2064, 1.1093, NA, 0.922,
0.643, 0.9284, 0.7225, 0.9866, 1.0804, 1.2226, 1.0315, 1.1953,
1.007, 0.6991, 1.0264, 1.0265, 0.8865, 1.1184, 1.094, 0.829,
1.0142, 0.9824, 0.6793, 0.9188, 0.7853, 1.0352, 1.0648, 0.9654,
1.0366, 1.2044), HL = c(0.3774, 0.4349, NA, 0.5091, 0.6187,
0.5168, 0.6405, 0.4691, 0.4555, 0.3444, 0.4908, 0.3819, 0.4846,
0.6256, 0.4638, 0.4778, 0.5219, 0.433, 0.447, 0.564, 0.4899,
0.4612, 0.6542, 0.5162, 0.5549, 0.4928, 0.4471, 0.4959, 0.4523,
0.3511), MLH = c(0.534090909090909, 0.5, NA, 0.40506329113924,
0.298507462686567, 0.410958904109589, 0.293103448275862,
0.442105263157895, 0.48, 0.554347826086957, 0.453488372093023,
0.535353535353535, 0.443298969072165, 0.304878048780488,
0.457446808510638, 0.455555555555556, 0.397849462365591,
0.494252873563218, 0.48314606741573, 0.377777777777778, 0.457446808510638,
0.445652173913043, 0.3, 0.412371134020619, 0.354838709677419,
0.464646464646465, 0.474226804123711, 0.43010752688172, 0.46078431372549,
0.541666666666667)), .Names = c("ID", "QnSinV1", "QnSinV2",
"QnSinV3", "QnSize", "QnWt", "QnWtLsCL", "ClaustPer", "QnSurvCL",
"ColWtCL", "ColWtCL_6", "ColGrowthCL_6", "QnSurvCL_6", "IR",
"SH", "HL", "MLH"), row.names = c(1L, 2L, 3L, 4L, 7L, 9L, 10L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 20L, 21L, 22L, 23L, 24L, 25L,
27L, 28L, 29L, 31L, 34L, 37L, 38L, 39L, 40L, 41L), class = "data.frame")
My guess (which doesn't work):
test<-subset(data, data$ColWtCL_6=='NA')
test
You can do it also without subset(). To select NA values you should use function is.na().
data[is.na(data$ColWtCL_6),]
Or with subset()
subset(data,is.na(ColWtCL_6))
A tidyverse approach (package dplyr):
test <-
data %>%
filter(is.na(ColWtCL_6))
If you want to filter based on NAs in multiple columns, please consider using function filter_at() in combinations with a valid function to select the columns to apply the filtering condition and the filtering condition itself.
Example 1: select rows of data with NA in all columns starting with Col:
test <-
data %>%
filter_at(vars(starts_with("Col")), all_vars(is.na(.)))
Example 2: select rows of data with NA in one of the columns starting with Col:
test <-
data %>%
filter_at(vars(starts_with("Col")), any_vars(is.na(.)))
This link from tidyverse documentation is very inspiring: https://dplyr.tidyverse.org/reference/filter_all.html
Here's another solution to find ǸA's across all columns in a dataframe using dplyr:
library(dplyr)
# get column names
colnms <- colnames(df)
# filter
df %>%
filter_at(vars(all_of(colnms)), any_vars(is.na(.)))