How to use ggrepel with a survival plot (ggsurvplot)?

I would like to add the label of each survival curve at the end of its line.
I am using ggsurvplot() from the survminer package (which is built on ggplot2).
I don't know how to do this with ggrepel, and I couldn't find any example with survival data:
library(survival)
library(survminer)
library(ggrepel)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
p <- ggsurvplot(fit, data = lung)$plot
p + geom_label_repel()
The code above throws an error, because geom_label_repel() is given no label aesthetic to draw.

The object p you have created already contains enough information to generate the labels. p$data is a data frame with a column called strata, and you need to map the label aesthetic to this column. You also need to pass the geom_label_repel() layer a filtered copy of the data that keeps only the row with the maximum time in each stratum, so that each label sits at the end of its curve:
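library(dplyr)  # for %>%, group_by() and filter()
p + geom_label_repel(aes(label = strata),
                     # one row per stratum: the last time point of each curve
                     data = p$data %>%
                       group_by(strata) %>%
                       filter(time == max(time)))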
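If you would rather not depend on the internal layout of p$data, the label positions can also be computed from the survfit object itself. The sketch below uses only base R and the survival package; curve_df and ends are hypothetical names, and it assumes the stratum names stored in fit$strata (here "sex=1" and "sex=2") are the same strata values ggsurvplot() puts in its data, which should hold as long as you don't pass custom legend labels.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# One row per distinct time in the fit, tagged with the stratum it belongs to.
# fit$strata holds the number of rows per stratum, named "sex=1", "sex=2".
curve_df <- data.frame(
  time   = fit$time,
  surv   = fit$surv,
  strata = rep(names(fit$strata), times = fit$strata)
)

# Keep the last point (largest time) of each stratum: that is where a label goes
ends <- do.call(rbind, lapply(split(curve_df, curve_df$strata), function(d) {
  d[which.max(d$time), ]
}))
ends
```

With the ggsurvplot object from the question, something like p + geom_label_repel(aes(label = strata), data = ends) should then place one label per curve, since the layer inherits the time and surv mappings from p.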

Related

Scatterplot using ggplot

I need to create a scatterplot of count vs. depth of 12 species using ggplot.
This is what I have so far:
library(ggplot2)
ggplot(data = ReefFish, mapping = aes(count, depth))
However, how do I use geom_point(), geom_smooth(), and facet_wrap() to add a smoother and to include only the 12 species I want from the data (ReefFish)? I believe what I have right now includes all species in the data.
Here is an example of part of my data:
Since I don't have access to the ReefFish data set, here's an example using the built-in mpg data set about cars. To make it work with your data set, just edit this code to replace manufacturers with species.
Filter the data
First we filter the data so that it only includes the species/manufacturers we're interested in.
# load our packages
library(ggplot2)
library(magrittr)
library(dplyr)
# set up a character vector of the manufacturers we're interested in
manufacturers <- c("audi", "nissan", "toyota")
# filter our data set to only include the manufacturers we care about
mpg_filtered <- mpg %>%
filter(manufacturer %in% manufacturers)
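The filtering step does not depend on dplyr. As a base-R sketch, with the built-in iris data standing in for ReefFish (its Species column plays the role of the species column, and the two species kept are an arbitrary, hypothetical choice):

```r
# iris stands in for ReefFish; `wanted` is a hypothetical list of species to keep
wanted <- c("setosa", "virginica")

# %in% is TRUE for rows whose species is one of the wanted ones
iris_subset <- iris[iris$Species %in% wanted, ]

# Drop the now-empty factor level so it does not show up as an
# empty category in table() and similar summaries
iris_subset$Species <- droplevels(iris_subset$Species)
table(iris_subset$Species)
```

The filtered data frame can then be passed to ggplot() exactly like mpg_filtered.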
Plot the data
Now we plot. Your code was almost there! You just needed to add the plot elements you wanted, like so:
mpg_filtered %>%
ggplot(mapping = aes(x = cty,
y = hwy)) +
geom_point() +
geom_smooth() +
facet_wrap(~manufacturer)
Hope that helps, and let me know if you have any issues.

Add scatter points on top of ggsurvplot

After calling ggsurvplot(...) I want to superimpose some points from another data frame df containing two columns time and survival. I'm looking for tips on accomplishing this.
Edit: some code as an example
require("survival")
require("survminer")
fit<- survfit(Surv(time, status) ~ sex, data = lung)
# Basic survival curves
ggsurvplot(fit, data = lung)
# Example points
x <- fit$time
y <- fit$n.risk
How would I superimpose points(x, y) on the ggsurvplot plot?
The ggplot object is part of the list returned by ggsurvplot() and can be accessed as $plot:
ggplot1 <- ggsurvplot(fit, data = lung)$plot
You can work with it like any other ggplot object and add layers to it. For your specific example, however, it is not clear how you want to define the y coordinate of your points: fit$n.risk is a count between 1 and 138 (the number of patients at risk), while the survival axis runs from 0 to 1. Here is one option, which rescales n.risk by its maximum:
ggplot1 <- ggsurvplot(fit, data = lung)$plot
df1 <- data.frame(time=fit$time, nRisk=fit$n.risk, nRiskRel=fit$n.risk/max(fit$n.risk))
ggplot1 + geom_point(aes(x=time, y=nRiskRel), data = df1, alpha=0.5, size=3)
You may want to add colors etc.
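Following up on adding colors: in a stratified fit, fit$n.risk restarts in each stratum, so it may make more sense to scale it within each stratum and color the points by curve. A minimal sketch with base R and the survival package; df2 and stratum are hypothetical names, and scaling by each stratum's own maximum (its starting size) is just one possible choice.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Stratum label for every row of fit$time / fit$n.risk
stratum <- rep(names(fit$strata), times = fit$strata)

# Divide n.risk by the largest n.risk of its own stratum, so each
# stratum starts at 1 and the points share the 0..1 survival axis
df2 <- data.frame(
  time     = fit$time,
  nRiskRel = fit$n.risk / ave(fit$n.risk, stratum, FUN = max),
  stratum  = stratum
)
head(df2)
```

Something like ggplot1 + geom_point(aes(x = time, y = nRiskRel, color = stratum), data = df2, alpha = 0.5, size = 3) should then draw the points; whether their colors match the curve colors depends on the stratum labels ("sex=1" and "sex=2" here) matching those in ggplot1's color scale.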

Indexing separate survival curves

I would like to plot Kaplan-Meier survival estimates for each of two groups in ggplot.
To do so requires getting a separate survival curve for each group. The survfit function in the survival package splits them nicely, but I don't know how to index the separate curves to work on them.
Here is sample data:
library(survival)
rearrest <- read.table("http://stats.idre.ucla.edu/stat/examples/alda/rearrest.csv", sep = ",", header = TRUE)
This is the ungrouped curve:
(sCurve <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~1, data = rearrest)))
It is easy to index elements within this, for example:
sCurve$n.event
When I fit the same thing except this time grouped according to the value of the personal variable I get two nice survival curve objects ready to go.
(sCurveA <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~personal, data = rearrest)))
One object is labelled personal=0 and the other personal=1. I have tried indexing with $, [], and [[]], using both numeric and named indices, all to no avail.
Can anyone help?
sCurveA$strata gives the grouping variable as a vector (a factor with one entry per row). You can pull out the key pieces and put them into a data frame for ggplot:
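library(ggplot2)
# One row per event time, tagged with the curve (stratum) it belongs to
df = data.frame(Time = sCurveA$time,
Survival = sCurveA$surv,
Strata = sCurveA$strata)
# geom_step() would draw the usual Kaplan-Meier staircase instead of straight segments
ggplot(df, aes(Time, Survival, col = Strata)) +
geom_line()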
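To index the curves directly: because summary()$strata has one entry per row, split() returns one data frame per curve, named after the strata, and survfit objects themselves can be subscripted by stratum (fit[1] is the first curve). A sketch using the built-in lung data as a stand-in for rearrest.csv, which has to be downloaded; curves and fit_first are hypothetical names.

```r
library(survival)

# lung stands in for rearrest; the grouping variable is sex instead of personal
fit <- survfit(Surv(time, status) ~ sex, data = lung)
s <- summary(fit)

# One data frame per curve, named "sex=1" and "sex=2"
curves <- split(
  data.frame(time = s$time, surv = s$surv, n.event = s$n.event),
  s$strata
)
names(curves)
head(curves[["sex=1"]])

# A survfit object can also be subscripted by stratum before summarizing;
# fit_first is the first curve on its own
fit_first <- fit[1]
summary(fit_first)
```

With the question's data, the same pattern would give list elements named "personal=0" and "personal=1".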

bacterial growth curve (logistic/sigmoid) with multiple explanatory variables in R

Goal: I want to obtain regression (ggplot curves and model parameters) for growth curves with multiple treatments.
I have data for bacterial cultures C={a,b,c,d} growing on nutrient sources N={x,y}.
Their idealized growth curves (measuring turbidity of cell culture every hour) look something like this:
There are 8 different curves to obtain coefficients and curves for. How can I do it in one go for my data frame, feeding the different treatments as different groups for the nonlinear regression?
Thanks!!!
This question is similar to an unanswered question posted here.
(source code for the idealized data; sorry it's not elegant, as I'm not a computer scientist):
a<-1:20
a[1]<-0.01
for(i in c(1:19)){
a[i+1]<-1.3*a[i]*(1-a[i])
}
b<-1:20
b[1]<-0.01
for(i in c(1:19)){
b[i+1]<-1.4*b[i]*(1-b[i])
}
c<-1:20
c[1]<-0.01
for(i in c(1:19)){
c[i+1]<-1.5*c[i]*(1-c[i])
}
d<-1:20
d[1]<-0.01
for(i in c(1:19)){
d[i+1]<-1.6*d[i]*(1-d[i])
}
sub.data<-cbind(a,b,c,d)
library(reshape2)
library(ggplot2)
data<-melt(sub.data, value.name = "OD600")
data$nutrition<-rep(c("x", "y"), each=5, times=4)
colnames(data)[1:2]<-c("Time", "Culture")
ggplot(data, aes(x = Time, y = OD600, color = Culture, group=nutrition)) +
theme_bw() + xlab("Time/hr") + ylab("OD600") +
geom_point() + facet_wrap(~nutrition, scales = "free")
If you are familiar with the group_by() function from dplyr (part of the tidyverse), you can group your data by Culture and nutrition and fit a model for each group using broom. I think this vignette covers exactly what you are trying to accomplish. Here is the code all in one go:
library(tidyverse)
library(broom)
library(mgcv) #For the gam model
data %>%
group_by(Culture, nutrition) %>%
do(fit = gam(OD600 ~ s(Time), data = ., family=gaussian())) %>% # Change this to whatever model you want (e.g., non-linear regression, sigmoid)
#do(fit = lm(OD600 ~ Time, data = .)) %>% # Example using linear regression
augment(fit) %>%
ggplot(aes(x = Time, y = OD600, color = Culture)) + # No need to group by nutrition because that is broken out in the facet_wrap
theme_bw() + xlab("Time/hr") + ylab("OD600") +
geom_point() + facet_wrap(~nutrition, scales = "free") +
geom_line(aes(y = .fitted, group = Culture))
If you don't need everything in one go, break the pipeline apart at each %>% to see what each step does. I used a GAM, which overfits here, but you could replace it with whatever model you want, including a sigmoid.
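For the model parameters the question actually asks about, a logistic curve can be fitted per group with base R's nls() and the self-starting SSlogis() model, which estimates Asym (the plateau), xmid (the time of the inflection point) and scal (the time scale). This is a swapped-in technique, not the answer's method, and the sketch makes several assumptions: it regenerates the question's logistic-map curves with a hypothetical helper logistic_curve(), groups by Culture only (the question's nutrition labels alternate in blocks of five time points within each curve, so they are left out here), and assumes each group is sigmoidal enough for nls() to converge.

```r
# Hypothetical helper: the question's recurrence x[t+1] = r * x[t] * (1 - x[t])
logistic_curve <- function(r, n = 20, x0 = 0.01) {
  x <- numeric(n)
  x[1] <- x0
  for (i in seq_len(n - 1)) x[i + 1] <- r * x[i] * (1 - x[i])
  x
}

rates <- c(a = 1.3, b = 1.4, c = 1.5, d = 1.6)
growth <- data.frame(
  Time    = rep(1:20, times = length(rates)),
  Culture = rep(names(rates), each = 20),
  OD600   = unlist(lapply(rates, logistic_curve), use.names = FALSE)
)

# Fit a 3-parameter logistic separately for each culture.
# SSlogis() is self-starting, so nls() needs no start values.
fits <- lapply(split(growth, growth$Culture), function(d) {
  nls(OD600 ~ SSlogis(Time, Asym, xmid, scal), data = d)
})

# One row of coefficients per culture
params <- do.call(rbind, lapply(fits, coef))
params

# Fitted values in the same row order as growth (cultures a, b, c, d),
# ready for geom_line(aes(y = .fitted)) or lines()
growth$.fitted <- unlist(lapply(fits, fitted), use.names = FALSE)
```

With real data and a full Culture by nutrition design, splitting on interaction(Culture, nutrition) should give one fit per treatment; groups where nls() fails to converge would need manual start values or a tryCatch() around the fit.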

Graphing a histogram overlaid with a fitted 2 parameter Weibull function

I would like to plot both a histogram and a fitted Weibull density on the same graph. The code to plot the histogram is:
hist(data$grddia2, prob=TRUE,breaks=5)
The code for fitting the Weibull distribution is (it needs the MASS package):
fitdistr(data$grddia2,densfun=dweibull,start=list(scale=1,shape=2))
How do I plot both together on the same graph? I've attached the data set.
Also, a bonus to anyone who can provide code that does the same thing but creates a graph for each column of data; the data set has many columns. It would be nice to have all the graphs on the same page.
https://www.dropbox.com/s/ra9c2kkk49vyyyc/Diameter%20Distribution.csv?dl=0
Here is the code
library("ggplot2")
library("dplyr")
library("tidyr")
library("MASS")
# Import dataset and filter the column "treeno"
# Use namespace dplyr:: explicitly because of conflict with MASS:: for function "select"
data <- read.csv("Diameter Distribution.csv") %>%
dplyr::select(-treeno)
# Function to provide the Weibull distribution for each column
# The distribution is calculated based on the estimated scale and shape parameters of the input
fitweibull <- function(column) {
x <- seq(0,7,by=0.01)
fitparam <- column %>%
unlist %>%
fitdistr(densfun=dweibull,start=list(scale=1,shape=2))
return(dweibull(x, scale=fitparam$estimate[1], shape=fitparam$estimate[2]))
}
# Apply function for each column then consolidate all in a data.frame
fitdata <-data %>%
apply(2, as.list) %>%
lapply(FUN = fitweibull) %>%
data.frame()
# Display graphs
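# y = after_stat(density) puts each histogram on the density scale (ggplot2 >= 3.3.0;
# older versions use ..density..), so the fitted Weibull densities need no rescaling factor.
# position = "identity" overlays the histograms instead of stacking them.
ggplot() +
geom_histogram(data=gather(data), aes(x=value, y=after_stat(density), group=key, fill=key), alpha=0.2, position="identity") +
geom_line(data=gather(fitdata), aes(x=rep(seq(0,7,by=0.01),ncol(fitdata)), y=value, group=key, color=key))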
And the output figure
Variant: thanks to the wonderful ggplot2 package, you can also draw the graphs separately just by adding this final line of code:
+ facet_wrap(~ key) + theme(legend.position = "none")
Which gives you this other figure:
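For the original base-graphics question, and the bonus of one graph per column, here is a sketch without ggplot2. hist(prob = TRUE) already draws densities, so the fitted Weibull pdf can be added with curve() and no multiplying factor. The data frame diameters and its columns are hypothetical simulated stand-ins for the Dropbox CSV, and fitdistr() is called with densfun = "weibull", which lets MASS compute its own starting values instead of the list(scale = 1, shape = 2) used above.

```r
library(MASS)  # fitdistr()

# Hypothetical stand-in for the CSV: three diameter columns simulated
# from Weibull distributions with known parameters (not the real data)
set.seed(42)
diameters <- data.frame(
  grddia1 = rweibull(300, shape = 1.5, scale = 2),
  grddia2 = rweibull(300, shape = 2.0, scale = 3),
  grddia3 = rweibull(300, shape = 3.0, scale = 4)
)

# Maximum-likelihood fit of the 2-parameter Weibull for each column;
# result: one row per column, with "shape" and "scale" columns
estimates <- t(sapply(diameters, function(col) {
  fitdistr(col, densfun = "weibull")$estimate
}))
estimates

# One panel per column, all on the same page
op <- par(mfrow = c(1, ncol(diameters)))
for (nm in names(diameters)) {
  # prob = TRUE draws densities, so the fitted pdf needs no rescaling
  hist(diameters[[nm]], prob = TRUE, breaks = 10, main = nm, xlab = "Diameter")
  curve(dweibull(x, shape = estimates[nm, "shape"], scale = estimates[nm, "scale"]),
        add = TRUE, lwd = 2)
}
par(op)
```

For many columns, par(mfrow = c(rows, cols)) with a grid sized to ncol(diameters) keeps all panels on one page.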
