Customizing a competing risks plot in R with package "cmprsk" - r

I am trying to customize a competing risks plot using R and the package cmprsk. Specifically, I want to override the default, in which colors distinguish the competing events and linetypes distinguish the groups.
Here is my reproducible example:
library(ggplot2)
library(cmprsk)
library(survminer)
# some simulated data to get started
comp.risk.data <- data.frame("tfs.days" = rweibull(n = 100, shape = 1, scale = 1)*100,
"status.tfs" = c(sample(c(0,1,1,1,1,2), size=100, replace=T)),
"Typing" = sample(c("A","B","C","D"), size=100, replace=T))
# fitting a competing risks model
CR <- cuminc(ftime = comp.risk.data$tfs.days,
fstatus = comp.risk.data$status.tfs,
cencode = 0,
group = comp.risk.data$Typing)
# the default plot makes it impossible to identify the groups
ggcompetingrisks(fit = CR, multiple_panels = F, xlab = "Days", ylab = "Cumulative incidence of event",title = "Competing Risks Analysis")+
scale_color_manual(name="", values=c("blue","red"), labels=c("Tumor", "Death without tumor"))
Using ggplot_build() I managed to change the default regarding linetype and color, but I cannot find a way to add a legend.
p2 <- ggcompetingrisks(fit = CR, multiple_panels = FALSE, xlab = "Days", ylab = "Cumulative incidence of event",title = "Death by TCR", ylim = c(0, 1)) +
scale_color_manual(name="", values=c("blue","red"), labels=c("Tumor", "Death without tumor"))
q <- ggplot_build(p2)
q$data[[1]]$colour2 <- ifelse(q$data[[1]]$linetype=="solid","blue", ifelse(q$data[[1]]$linetype==22,"red", ifelse(q$data[[1]]$linetype==42,"green", ifelse(q$data[[1]]$linetype==44,"black", NA))))
q$data[[1]]$linetype <- ifelse(q$data[[1]]$colour=="blue","solid", ifelse(q$data[[1]]$colour=="red","dashed", NA))
q$data[[1]]$colour <- q$data[[1]]$colour2
q$plot <- q$plot + ggtitle("Competing Risks Analysis") + guides(col = guide_legend()) + theme(legend.position = "right")
p2 <- ggplot_gtable(q)
plot(p2)
Does anyone know how to add the legend to a plot manipulated by ggplot_build()? Or an alternative way to plot the competing risks such that color indicates group and linetype indicates event?

You don't need to go down the ggplot_build route. The function ggcompetingrisks returns a ggplot object, which itself contains the aesthetic mappings. You can overwrite these with aes:
p <- ggcompetingrisks(fit = CR,
multiple_panels = F,
xlab = "Days",
ylab = "Cumulative incidence of event",
title = "Competing Risks Analysis")
p$mapping <- aes(x = time, y = est, colour = group, linetype = event)
Now that we have reversed the linetype and color aesthetic mappings, we just need to swap the legend labels and we're good to go:
p + labs(linetype = "event", colour = "group")
Note that you can also add color scales, themes, coordinate transforms to p like any other ggplot object.
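As a rough sketch of that last point, the block below rebuilds the plot from the question's simulated data and then adds manual scales and a theme. It assumes that ggcompetingrisks() labels the two events "1" and "2" (taken from the status codes) and that the swapped mapping works in your installed versions of survminer and ggplot2; the seed, palette and theme are arbitrary choices, and p_custom is just an illustrative name.

```r
library(ggplot2)
library(cmprsk)
library(survminer)

# Simulated data as in the question (the seed is an arbitrary choice)
set.seed(42)
n <- 100
comp.risk.data <- data.frame(
  tfs.days   = rweibull(n, shape = 1, scale = 1) * 100,
  status.tfs = sample(c(0, 1, 1, 1, 1, 2), size = n, replace = TRUE),
  Typing     = sample(c("A", "B", "C", "D"), size = n, replace = TRUE)
)

CR <- cuminc(ftime   = comp.risk.data$tfs.days,
             fstatus = comp.risk.data$status.tfs,
             group   = comp.risk.data$Typing,
             cencode = 0)

p <- ggcompetingrisks(fit = CR,
                      multiple_panels = FALSE,
                      xlab = "Days",
                      ylab = "Cumulative incidence of event",
                      title = "Competing Risks Analysis")

# Swap the aesthetics: colour for group, linetype for event
p$mapping <- aes(x = time, y = est, colour = group, linetype = event)

# Assumption: the events are labelled "1" and "2" in the plot data
p_custom <- p +
  scale_colour_brewer(palette = "Dark2") +
  scale_linetype_manual(values = c("1" = "solid", "2" = "dashed"),
                        breaks = c("1", "2"),
                        labels = c("Tumor", "Death without tumor")) +
  labs(colour = "Group", linetype = "Event") +
  theme_bw()

p_custom
```

If the event labels in your fit differ, check unique(p$data$event) and adjust the breaks accordingly.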

Related

How to edit ticks and tick labels in ggsurvplot, like in ggplot2?

I want labels only at every second number, but small ticks at every number in my graph. As you can see in the figure I added, the labels are at every second tick on the x-axis.
But I want to achieve the result that's on the y-axis:
With ggplot2, this is possible with ggh4x and if_else, but I can't find a way to do this in ggsurvplot. This is my code for the first picture. The code for the second picture is found here: Code 2
ggsurvplot(fit, data = d,
conf.int = F,
censor = F,
palette = c("green", "purple", "red"),
legend.labs = c("Reference water (pH 7.3)\n(N = 66)",
"Acidic al-poor (pH 5.8)\n(N = 66)",
"Acidic al-rich (pH 5.8)\n(N = 66)"),
legend.title = "Water quality",
xlab = "Days",
xlim = c(1,23),
break.time.by = 2
)
Thank you in advance for your help.
As ggsurvplot returns a list containing the plot as a ggplot2 object, you can achieve the desired result with ggh4x by overriding the x scale, as in the example code by @tjebo from Adding minor tick marks to the x axis in ggplot2 (with no labels).
Making use of the default example from ?ggsurvplot:
library(survminer)
library(survival)
library(ggh4x)
fit<- survfit(Surv(time, status) ~ sex, data = lung)
p <- ggsurvplot(fit, data = lung, main = "Survival curve",
xlab = "Days",
xlim = c(1,23))
p$plot +
scale_x_continuous(minor_breaks = seq(0, 24, 1), breaks = seq(0, 24, 2), guide = "axis_minor") +
theme(ggh4x.axis.ticks.length.minor = rel(1))
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
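If ggh4x is not available, a plain ggplot2 approximation is to place a break at every day and blank out every second label. This is only a sketch: it gives all ticks the same length (unlike the minor-tick guide above), and the names day_breaks and day_labels are made up for illustration.

```r
# Breaks at every integer day, labels only at every second one.
day_breaks <- 0:24
day_labels <- ifelse(day_breaks %% 2 == 0, as.character(day_breaks), "")

# The two vectors can then be passed to the x scale of the survival plot, e.g.
#   p$plot + scale_x_continuous(breaks = day_breaks, labels = day_labels)
print(day_labels)
```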

Is there a way to use R to break chart axis and break linear regression line?

I'm trying to figure out how to modify a scatter-plot that contains two groups of data along a continuum separated by a large gap. The graph needs a break on the x-axis as well as on the regression line.
This R code using the ggplot2 library accurately presents the data, but is unsightly due to the vast amount of empty space on the graph. Pearson's correlation is -0.1380438.
library(ggplot2)
p <- ggplot(, aes(x = dis, y = result[, 1])) + geom_point(shape = 1) +
xlab("X-axis") +
ylab("Y-axis") + geom_smooth(color = "red", method = "lm", se = F) + theme_classic()
p + theme(plot.title = element_text(hjust = 0.5, size = 14))
This R code uses gap.plot to produce the breaks needed, but the regression line doesn't contain a break and doesn't reflect the slope properly. As you can see, the slope of the regression line isn't as sharp as the graph above and there needs to be a visible distinction in the slope of the line between those disparate groups.
library(plotrix)
gap.plot(
x = dis,
y = result[, 1],
gap = c(700, 4700),
gap.axis = "x",
xlab = "X-Axis",
ylab = "Y-Axis",
xtics = seq(0, 5575, by = 200)
)
abline(v = seq(700, 733) , col = "white")
abline(lm(result[, 1] ~ dis), col = "red", lwd = 2)
axis.break(1, 716, style = "slash")
Using MS Paint, I created an approximation of what the graph should look like. Notice the break marks at the top as well as the discontinuity in the regression line between the two groups.
One solution is to plot the regression line in two pieces, using ablineclip to limit what's plotted each time. (Similar to @tung's suggestion, although it's clear that you want the appearance of a single graph rather than the appearance of facets.) Here's how that would work:
library(plotrix)
# Simulate some data that looks roughly like the original graph.
dis = c(rnorm(100, 300, 50), rnorm(100, 5000, 100))
result = c(rnorm(100, 0.6, 0.1), rnorm(100, 0.5, 0.1))
# Store the location of the gap so we can refer to it later.
x.axis.gap = c(700, 4700)
# gap.plot() works internally by shifting the points above the gap to the left
# by the size of the gap, and then adjusting the axis labels accordingly. We'll
# shift the x-values in the same way before re-computing the second half of the
# regression line; these are the new values for the x-axis.
dis.alt = dis - diff(x.axis.gap)
# Plot (same as before).
gap.plot(
x = dis,
y = result,
gap = x.axis.gap,
gap.axis = "x",
xlab = "X-Axis",
ylab = "Y-Axis",
xtics = seq(0, 5575, by = 200)
)
abline(v = seq(700, 733), col = "white")
axis.break(1, 716, style = "slash")
# Add regression line in two pieces: from 0 to the start of the gap, and from
# the end of the gap to infinity.
ablineclip(lm(result ~ dis), col = "red", lwd = 2, x2 = x.axis.gap[1])
ablineclip(lm(result ~ dis.alt), col = "red", lwd = 2, x1 = x.axis.gap[1] + 33)
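Refitting on shifted x-values draws the same regression line in the shifted coordinates because subtracting a constant from the predictor leaves the slope unchanged and only moves the intercept by slope times shift. A small base-R check on simulated data like the above (the seed and the variable shift are illustrative assumptions):

```r
set.seed(1)
dis    <- c(rnorm(100, 300, 50), rnorm(100, 5000, 100))
result <- c(rnorm(100, 0.6, 0.1), rnorm(100, 0.5, 0.1))

shift   <- 4000                          # any constant offset applied to x
fit     <- lm(result ~ dis)              # fit on the original x-values
fit.alt <- lm(result ~ I(dis - shift))   # fit on the shifted x-values

slope <- unname(coef(fit)[2])

# The slope is identical in both fits ...
same_slope <- all.equal(slope, unname(coef(fit.alt)[2]))
# ... and the intercept moves by slope * shift.
shifted_intercept <- all.equal(unname(coef(fit.alt)[1]),
                               unname(coef(fit)[1]) + slope * shift)
print(c(same_slope = isTRUE(same_slope),
        shifted_intercept = isTRUE(shifted_intercept)))
```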

ggplot2: solid line for one group, points for the other

I have four series that I would like to plot.
There are 2 models : xg and algo30.
There are two types of data: predicted and observed.
This means we have the following 4 series: "predicted xg","observed xg", "predicted 30", "observed 30".
I want "xg" to be blue, "algo30" to be red.
I also want predicted to be a solid line and observed to be points.
Here is what I mean, using base plot:
library(magrittr)
library(ggplot2)
library(dplyr)
set.seed(123)
gr <- 1:10
obs.xg <- sort(runif(10, 0.5, 1))
obs.30 <- sort(runif(10, 0.5, 1))
pred.xg <- lm(obs.xg~gr) %>% predict() %>% add(rnorm(10,0,.01))
pred.30 <- lm(obs.30~gr) %>% predict() %>% add(rnorm(10,0,.01))
plot(gr, obs.xg, col="darkblue", ylim=range(c(obs.xg,obs.30)), pch=20)
lines(gr, pred.xg, col="darkblue", lwd=2)
points(gr, obs.30, col="firebrick", pch=20)
lines(gr, pred.30, col="firebrick", lwd=2)
legend("bottomright",
pch=c(20,NA,NA,NA,NA),
lty=c(NA,1,NA,1,1),
lwd=c(NA,1,NA,2,2),
col = c("black","black",NA, "darkblue","firebrick"),
legend=c("observé","prédit",NA,"xgboost","algo30"),
bty='n')
Here is my best attempt using ggplot. Notice that the legend doesn't work as I want.
xg.data <- data.frame(model= "xg", decile = seq(1:10), observed = obs.xg, predicted = pred.xg)
algo30.data <- data.frame(model = "algo30",decile = seq(1:10), observed = obs.30, predicted = pred.30)
ggplotdata <- bind_rows(xg.data, algo30.data)
ggplotdata %>%
ggplot( aes(x=decile, y= predicted, color= model))+ geom_line()+
geom_point(aes(x=decile, y= observed, color = model))
Most of the time when making a legend like this I look to override.aes in guide_legend().
The idea here is to make a legend using an additional aesthetic that you don't want mapped onto the plot itself and then using constants instead of a variable for that aesthetic. I used alpha, since both points and lines use that aesthetic.
Then the heavy lifting is done in scale_alpha_manual: removing the legend name, making sure the plot still looks right by setting the values, and then, finally, picking the correct point type and lines along with blanks for the legend.
ggplot(ggplotdata, aes(x=decile, y= predicted, color= model))+
geom_line( aes(alpha = "prédit") )+
geom_point(aes(x=decile, y= observed, alpha = "observé")) +
scale_alpha_manual(name = NULL, values = c(1, 1),
guide = guide_legend(override.aes = list(linetype = c(0, 1), shape = c(16, NA)))) +
scale_color_manual(name = NULL, values = c("firebrick", "darkblue"))

How to plot a formula with a given range?

I am looking to plot the following:
L<-((2*pi*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T)-1))))
all variables except l are constant:
T<-6000
h<-6.626070040*10^-34
c<-2.99792458*10^8
k<-1.38064852*10^-23
l has a range of 20*10^-9 to 2000*10^-9.
I have tried l<-seq(20*10^-9,2000*10^-9,by=1*10^-9), however this does not give me the results I expect.
Is there a simple solution for this in R, or do I have to try in another language?
Thank you.
Looking at the Wikipedia page for the spectral radiance equation, it seems that your formula is a bit off. It multiplies by an additional pi (not sure if intended), and the -1 is inside the exp instead of outside:
L <- ((2*pi*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T)-1))))
Below is the corrected formula. Also notice I have converted it into a function with parameter l since this is a variable:
T <- 6000 # Absolute temperature
h <- 6.626070040*10^-34 # Planck's constant
c <- 2.99792458*10^8 # Speed of light in the medium
k <- 1.38064852*10^-23 # Boltzmann constant
L <- function(l){((2*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T))-1)))}
# Plotting
plot(L, xlim = c(20*10^-9,2000*10^-9),
xlab = "Wavelength (nm)",
ylab = bquote("Spectral Radiance" ~(kW*sr^-1*m^-2*nm^-1)),
main = "Planck's Law",
xaxt = "n", yaxt = "n")
xtick <- seq(20*10^-9, 2000*10^-9,by=220*10^-9)
ytick <- seq(0, 4*10^13,by=5*10^12)
axis(side=1, at=xtick, labels = (1*10^9)*seq(20*10^-9,2000*10^-9,by=220*10^-9))
axis(side=2, at=ytick, labels = (1*10^-12)*seq(0, 4*10^13,by=5*10^12))
The plot above is not bad, but I think we can do better with ggplot2:
h <- 6.626070040*10^-34 # Planck's constant
c <- 2.99792458*10^8 # Speed of light in the medium
k <- 1.38064852*10^-23 # Boltzmann constant
L2 <- function(l, T){((2*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T))-1)))} # Planck's Law
classical_L <- function(l, T){(2*c*k*T)/l^4} # Rayleigh-Jeans Law
library(ggplot2)
ggplot(data.frame(l = c(20*10^-9,2000*10^-9)), aes(l)) +
geom_rect(aes(xmin=390*10^-9, xmax=700*10^-9, ymin=0, ymax=Inf),
alpha = 0.3, fill = "lightblue") +
stat_function(fun=L2, color = "red", size = 1, args = list(T = 3000)) +
stat_function(fun=L2, color = "green", size = 1, args = list(T = 4000)) +
stat_function(fun=L2, color = "blue", size = 1, args = list(T = 5000)) +
stat_function(fun=L2, color = "purple", size = 1, args = list(T = 6000)) +
stat_function(fun=classical_L, color = "black", size = 1, args = list(T = 5000)) +
theme_bw() +
scale_x_continuous(breaks = seq(20*10^-9, 2000*10^-9,by=220*10^-9),
labels = (1*10^9)*seq(20*10^-9,2000*10^-9,by=220*10^-9),
sec.axis = dup_axis(labels = (1*10^6)*seq(20*10^-9,2000*10^-9,by=220*10^-9),
name = "Wavelength (\U003BCm)")) +
scale_y_continuous(breaks = seq(0, 4*10^13,by=5*10^12),
labels = (1*10^-12)*seq(0, 4*10^13,by=5*10^12),
limits = c(0, 3.5*10^13)) +
labs(title = "Black Body Radiation described by Planck's Law",
x = "Wavelength (nm)",
y = expression("Spectral Radiance" ~(kWsr^-1*m^-2*nm^-1)),
caption = expression(''^'\U02020' ~'Spectral Radiance described by Rayleigh-Jeans Law, which demonstrates the ultraviolet catastrophe.')) +
annotate("text",
x = c(640*10^-9, 640*10^-9, 640*10^-9, 640*10^-9,
150*10^-9, (((700-390)/2)+390)*10^-9, 1340*10^-9),
y = c(2*10^12, 5*10^12, 14*10^12, 31*10^12,
35*10^12, 35*10^12, 35*10^12),
label = c("3000 K", "4000 K", "5000 K", "6000 K",
"UV", "VISIBLE", "INFRARED"),
color = c(rep("black", 4), "purple", "blue", "red"),
alpha = c(rep(1, 4), rep(0.6, 3)),
size = 4.5) +
annotate("text", x = 1350*10^-9, y = 23*10^12,
label = deparse(bquote("Classical theory (5000 K)"^"\U02020")),
color = "black", parse = TRUE)
Notes:
I created L2 by also making absolute temperature T a variable
For each T, I plot the function L2 using different colors for representation. I've also added a classical_L function to demonstrate classical theory of spectral radiance
geom_rect creates the light blue shaded area for "VISIBLE" light wavelength range
scale_x_continuous sets the breaks of the x axis, while labels sets the axis tick labels. Notice I have multiplied the seq by (1*10^9) to convert the units to nanometer (nm). A second x-axis is added to display the micrometer scale
Analogously, scale_y_continuous sets the breaks and tick labels for y axis. Here I multiplied by (1*10^-12) or (1*10^(-3-9)) to convert from watts (W) to kilowatts (kW), and from inverse meter (m^-1) to inverse nanometer (nm^-1)
bquote displays superscripts correctly in the y axis label
annotate sets the coordinates and text for curve labels. I've also added the labels for "UV", "VISIBLE" and "INFRARED" light wavelengths
Plot from wikipedia:
Image source: https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Black_body.svg/600px-Black_body.svg.png
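The seq() approach from the question also works once the formula is wrapped in a vectorized function. As a hedged sanity check of the unit conversion described in the notes, the sketch below evaluates L2 on a 1 nm grid at 6000 K and reports the peak; the grid step and the names radiance, peak_nm and peak_kW are illustrative choices. The peak should land close to the roughly 483 nm given by Wien's displacement law.

```r
h <- 6.626070040e-34   # Planck's constant (J s)
c <- 2.99792458e8      # Speed of light (m/s)
k <- 1.38064852e-23    # Boltzmann constant (J/K)

# Planck's law as a function of wavelength l (m) and temperature T (K)
L2 <- function(l, T) (2 * h * c^2 / l^5) / (exp(h * c / (l * k * T)) - 1)

# Evaluate on a grid of wavelengths from 20 nm to 2000 nm in 1 nm steps
l        <- seq(20e-9, 2000e-9, by = 1e-9)
radiance <- L2(l, T = 6000)

# Wavelength of the maximum, converted from m to nm
peak_nm <- l[which.max(radiance)] * 1e9
# Peak radiance, converted from W sr^-1 m^-3 to kW sr^-1 m^-2 nm^-1
peak_kW <- max(radiance) * 1e-12

print(peak_nm)
print(peak_kW)
```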

Plot multipoints and a best fit line

I want to create a single plot with the Roundrobin and Prediction points, without colors, where the two series use different point types, and with a legend. I also want to add a best-fit line for the results.
I am having trouble combining all these features in one graph that shows both sets of points. I am used to Gnuplot, but I don't know how to do this in R. How can I do this?
[1] Input data
Inputdata,Roundrobin,Prediction
1,178,188
2,159,185
3,140,175
[2] Script to generate data
no_faults_data <- read.csv("testresults.csv", header=TRUE, sep=",")
# Graph 1
plot(no_faults_data$Inputdata, no_faults_data$Roundrobin,ylim = range(c(no_faults_data$Roundrobin,no_faults_data$Prediction)),xlab="Input data size (MB)", ylab="Makespan (seconds)")
points(no_faults_data$Inputdata, no_faults_data$Prediction)
abline(no_faults_data$Inputdata, no_faults_data$Roundrobin, untf = FALSE, \dots)
abline(no_faults_data$Inputdata, no_faults_data$Prediction, untf = FALSE, \dots)
legend("top", notitle, c("Round-robin","Prediction"), fill=terrain.colors(2), horiz=TRUE)
In base R you will have to create a fitted model first:
robin <- lm(Roundrobin ~ Inputdata, data = no_faults_data)
pred <- lm(Prediction ~ Inputdata, data = no_faults_data)
plot(no_faults_data$Inputdata, no_faults_data$Roundrobin,
ylim = range(c(no_faults_data$Roundrobin,no_faults_data$Prediction)),
xlab = "Input data size (MB)", ylab = "Makespan (seconds)",
col = "green", pch = 19, cex = 1.5)
points(no_faults_data$Inputdata, no_faults_data$Prediction, pch = 22, cex = 1.5)
abline(robin, lty = 1)
abline(pred, lty = 5)
legend(1.1, 155, legend = c("Round-robin","Prediction"), pch = c(19,22), col = c("green","black"),
bty = "n", cex = 1.2)
which gives:
For further customization of the base R plot, see ?par and ?legend.
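abline() takes the intercept and slope directly from the fitted model, so it can help to inspect the coefficients before plotting. A minimal check using the three sample rows from the question (the data are re-entered inline so the snippet runs on its own):

```r
no_faults_data <- read.csv(text = "Inputdata,Roundrobin,Prediction
1,178,188
2,159,185
3,140,175", header = TRUE)

robin <- lm(Roundrobin ~ Inputdata, data = no_faults_data)
pred  <- lm(Prediction ~ Inputdata, data = no_faults_data)

# Intercept and slope of each best-fit line, as used by abline()
print(coef(robin))
print(coef(pred))
```

For these three rows the Roundrobin values fall by exactly 19 per unit of input, so that fit has intercept 197 and slope -19.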
With ggplot2 you will need to reshape your data into long format:
library(reshape2)
library(ggplot2)
ggplot(melt(no_faults_data, id="Inputdata"),
aes(x=Inputdata, y=value, shape=variable, color=variable)) +
geom_point(size=4) +
geom_smooth(method = "lm", se = FALSE) +
theme_minimal()
which gives:
Used data:
no_faults_data <- read.csv(text="Inputdata,Roundrobin,Prediction
1,178,188
2,159,185
3,140,175", header=TRUE)
You should look into the ggplot2 package for plotting. It may not be needed for the three data points you provided, but it makes much nicer plots than the default.
library(ggplot2)
df <- data.frame("Inputdata" = c(1,2,3,1,2,3), "score" = c(178,159,140,188,185,175), "scoreType" = c(rep("Roundrobin",3), rep("Prediction",3)))
p <- ggplot(data=df, aes(x=Inputdata, y=score, group=scoreType, shape = scoreType)) + geom_point(size=5)
p <- p + ggtitle("My Title")
p+stat_smooth(method="lm",se = FALSE)
Here you group by the type of score and let ggplot2 make the legend for you. stat_smooth fits the lines with lm here.

Resources