ggsurvplot and ggplot lattice?! Plotting a Kaplan-Meier curve with a cumulative incidence function in R

I would like to plot a Kaplan-Meier (KM) curve and the cumulative events, or cumulative incidence function (CIF), in one plot as a lattice.
I switched recently from SAS to R. In SAS you can do it all in one step using a macro (see this image), but I haven't found anything similar in R yet.
Currently, I run code for two separate graphs. The first plots a survfit object using ggsurvplot, which results in a KM curve, while the second plots a cuminc object with ggplot after a number of transformations. ggcompetingrisks was not very customizable, so I don't use it. Also, I am interested in plotting one specific competing risk, for example death from cancer, and not all competing risks.
Here is an example of my current code using the BMT data frame from the survminer package:
library(survival)  # Surv(), survfit()
library(survminer) # ggsurvplot(), BMT data; also attaches ggplot2
library(cmprsk)    # cuminc()
library(dplyr)     # mutate(), filter(), %>%
library(purrr)     # list_modify(), map_df()
data(BMT)
# I'll add the variable death to plot overall survival.
BMT <- mutate(BMT, death = ifelse(status == 1, 1, 0))
# KM plot:
figKM <- ggsurvplot(survfit(Surv(ftime, death) ~ dis, data = BMT))
figKM
# CIF plot:
cif <- cuminc(ftime = BMT$ftime, fstatus = BMT$status, group = BMT$dis, cencode = 0)
cifDT <- cif %>%
  list_modify("Tests" = NULL) %>%               # drop the test statistics
  map_df(`[`, c("time", "est"), .id = "id") %>% # stack time/est of every curve; id = "group cause"
  filter(id %in% c("0 1", "1 1"))               # to keep the incident I want
figCIF <- ggplot(cifDT, aes(x = time, y = est, color = id)) + geom_step(lwd = 1.2)
figCIF
Is there a way to put figKM and figCIF together in a lattice plot? Maybe by plotting them differently?
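For readers who don't use purrr: the pipeline drops the Tests element of the cuminc result, stacks the time and est vectors of each remaining curve into one long data frame whose id column holds the curve name ("group cause"), and keeps the two curves for cause 1. Below is a rough base-R equivalent, sketched on a small hand-built list that only mimics the shape of a cuminc object; the values are hypothetical, not taken from BMT.

```r
# Hypothetical object shaped like a cmprsk::cuminc() result: one element per
# curve named "group cause", each a list of time/est/var, plus a Tests matrix
cif <- list(
  "0 1" = list(time = c(0, 5, 10), est = c(0, 0.10, 0.20), var = c(0, 0.001, 0.002)),
  "1 1" = list(time = c(0, 4, 8),  est = c(0, 0.15, 0.30), var = c(0, 0.001, 0.002)),
  "0 2" = list(time = c(0, 6),     est = c(0, 0.05),       var = c(0, 0.001)),
  Tests = matrix(0, nrow = 2, ncol = 3)
)

# list_modify("Tests" = NULL): drop the test statistics
curves <- cif[names(cif) != "Tests"]

# map_df(`[`, c("time", "est"), .id = "id"): one long data frame,
# with each element's name stored in an id column
cifDT <- do.call(rbind, Map(function(curve, id) {
  data.frame(id = id, time = curve$time, est = curve$est,
             stringsAsFactors = FALSE)
}, curves, names(curves)))

# filter(id %in% c("0 1", "1 1")): keep cause 1 in both groups
cifDT <- cifDT[cifDT$id %in% c("0 1", "1 1"), ]
rownames(cifDT) <- NULL
cifDT
```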

If you look at the contents of your figKM object with class and str, you see that the first item in that list is a "plot", so this seems to do what you asked for in your comment:
library(cowplot)
# figKM[[1]] is figKM$plot, the plain ggplot object inside the ggsurvplot result
plot_grid(figKM[[1]], figCIF, nrow = 2)
I'm not a tidyverse user, so map_df may be some counterpart of the base functions Reduce or Map, but I don't have enough experience to a) know which package to load or b) figure out what is being done in your piped expressions. Commented code would have been easier to understand. I am quite experienced with the survival package.
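Since the survival package came up: it can also estimate the cumulative incidence itself, without cmprsk. When the status passed to Surv() is a factor whose first level is censoring, survfit() returns Aalen-Johansen estimates for each state, and a single cause can be picked out of the pstate matrix by matching its name in the states component. The sketch below is an assumption-laden illustration, not the asker's setup: it uses simulated data in the shape of BMT, made-up level labels ("cause1", "cause2"), and stacks the KM and CIF panels with base graphics via par(mfrow).

```r
library(survival)

# Hypothetical simulated data in the shape of BMT: ftime, status, dis
# status: 0 = censored, 1 = event of interest, 2 = competing event
set.seed(2024)
n <- 200
sim <- data.frame(
  ftime  = rexp(n, rate = 0.1),
  status = sample(0:2, n, replace = TRUE, prob = c(0.3, 0.4, 0.3)),
  dis    = sample(0:1, n, replace = TRUE)
)
sim$death <- as.integer(sim$status == 1)

# Kaplan-Meier fit, as in figKM
km <- survfit(Surv(ftime, death) ~ dis, data = sim)

# Aalen-Johansen cumulative incidence: a factor status whose first
# level is censoring makes survfit() fit a competing-risks model
sim$event <- factor(sim$status, levels = 0:2,
                    labels = c("censor", "cause1", "cause2"))
cif <- survfit(Surv(ftime, event) ~ dis, data = sim)

# Keep the CIF for cause 1 only; time is stacked stratum by stratum
cause1 <- data.frame(
  time   = cif$time,
  est    = cif$pstate[, match("cause1", cif$states)],
  strata = rep(names(cif$strata), cif$strata)
)

# Two panels stacked vertically, lattice-style
op <- par(mfrow = c(2, 1))
plot(km, col = 1:2, xlab = "Time", ylab = "Survival probability")
plot(est ~ time, data = cause1, type = "n",
     xlab = "Time", ylab = "Cumulative incidence (cause 1)")
groups <- unique(cause1$strata)
for (g in groups) {
  d <- cause1[cause1$strata == g, ]
  lines(d$time, d$est, type = "s", col = match(g, groups))
}
par(op)
```

The cause1 data frame has the same long layout as cifDT, so it could also be passed to the existing ggplot call (aes(time, est, color = strata) with geom_step()) if the ggplot look is preferred.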

Related

Create and plot a cumulative probability density function with a custom number and size of bins for stock price ROC in R

I want to import daily stock market price data into R for any ticker and examine one historical time segment of it. From this segment, I want to convert the prices into daily ROC (rate of change) percentages. Next, I want to take this ROC series and create a cumulative probability density function that lets me set any custom number of bins and any width for each bin, for example 22 bins of 0.3% each. Then I want to plot this CPDF as either a histogram or a scatterplot. The final step would be to do this for two different segments of the same stock and plot them next to each other for visual inspection. I have started writing code for the ticker SPY, but I cannot get it to work.
library(quantmod)
library(tidyquant)
library(tidyverse)
# using tidyquant to import a ticker
spy <- tq_get("spy")
spy010422 <- tq_get("spy", get ="stock.prices", from ='2022-01-04', to = '2022-01-24')
str(spy010422)
# getting ROC between prices in the series
spy010422.rtn = ROC(spy010422$close, n = 1, type = c("discrete"), na.pad = TRUE)
str(spy010422.rtn)
# trying to use ggplot and tibble to create an ECDF function
spy010422.rtn %>%
tibble() %>%
ggplot() +
stat_ecdf(aes(.))
# another attempt at running ECDF on the ROC series
spy010422.rtn %>%
ggplot(spy010422.rtn) +
stat_ecdf(aes(close))
# trying to set the number of bins and bin size for the ECDF
spy010422.rtn %>%
mutate(rounded = round(close/.3, 0) *.3,
bin = min_rank(rounded)) %>%
ggplot(aes(close, bin)) +
geom_line()
# next time segment of the ticker spy to compare this to
spy020222 <- tq_get("spy", get ="stock.prices", from ='2022-02-02', to = '2022-02-24')
I couldn't understand exactly what you wanted to plot. Normally a CPDF is just a continuous line and doesn't have bins to customize. Also, "plot this CPDF as either a histogram or a scatterplot" is a weird phrase to me, as one normally plots the histogram/scatterplot of the variable, not of its CPDF. Given that, I made a function that plots the histogram of the ROC of the ticker; you can comment on whether that is what you wanted.
The function takes a list of dates in the format list(c(from1, to1), c(from2, to2), ...) (you can add as many intervals as you want) and loops over each interval in this list (with the purrr::map function). For each iteration, it creates the histogram, customizing the bins argument. After the loop, the graphs are combined into one figure using the ggpubr::ggarrange function (run install.packages("ggpubr") if you don't have the package installed).
library(quantmod)
library(tidyquant)
library(tidyverse)
gg.roc.hist = function(ticker, dates, bins = 30){
  map(dates, function(dates){ #loop for each interval in the 'dates' list
    df = tq_get(ticker, get = "stock.prices", from = dates[1], to = dates[2]) #get the prices
    df$roc = ROC(df$close, n = 1, type = c("discrete"), na.pad = TRUE) #add a column with the ROC
    ggplot(df, aes(x = roc)) +
      geom_histogram(bins = bins) + #create a histogram changing the bins
      labs(title = paste0(dates[1], " to ", dates[2]))}) %>%
    ggpubr::ggarrange(plotlist = .) #bind the graphs together
}
Running:
gg.roc.hist('spy', list(c('2022-01-04','2022-01-24'), c('2022-02-02', '2022-02-24')), 22)
Yields this graph:
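If a binned cumulative distribution with a custom number and width of bins is still what's wanted, it can be computed directly: bin the daily returns with cut(), count each bin, and take the running share. The sketch below is base R only and rests on assumptions: the prices are simulated stand-ins for tq_get() output, binned_cdf() is a hypothetical helper, and the discrete ROC is computed as close[t] / close[t - 1] - 1, the same definition ROC(type = "discrete") uses. Returns falling outside the outermost bins are dropped by cut(), so widen n_bins or bin_width if that matters.

```r
# Hypothetical helper: binned empirical CDF of daily discrete returns,
# using n_bins bins of width bin_width centred on a return of zero
binned_cdf <- function(close, n_bins = 22, bin_width = 0.003) {
  roc <- diff(close) / head(close, -1)          # close[t] / close[t - 1] - 1
  breaks <- (seq_len(n_bins + 1) - 1 - n_bins / 2) * bin_width
  counts <- table(cut(roc, breaks = breaks, include.lowest = TRUE))
  cumsum(counts) / sum(counts)                  # cumulative share per bin
}

# Simulated closing prices for two time segments (not real SPY data)
set.seed(7)
seg1 <- 470 * cumprod(1 + rnorm(40, sd = 0.008))
seg2 <- 450 * cumprod(1 + rnorm(40, sd = 0.012))

cdf1 <- binned_cdf(seg1)
cdf2 <- binned_cdf(seg2)

# Side-by-side panels for visual comparison
op <- par(mfrow = c(1, 2), mar = c(7, 4, 2, 1))
barplot(cdf1, las = 2, cex.names = 0.6, main = "Segment 1", ylab = "Cumulative share")
barplot(cdf2, las = 2, cex.names = 0.6, main = "Segment 2", ylab = "Cumulative share")
par(op)
```

With the defaults, 22 bins of 0.3% cover returns from -3.3% to +3.3%; ecdf(roc) gives the continuous, unbinned version for comparison.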

Plotting Chi-square Distribution with ggplot2 in R

I would like to use R to draw 100 random observations from a chi-square distribution with 5 degrees of freedom. After that, I want to calculate the mean of those observations and use ggplot2 to plot the chi-square distribution as a bar chart. The following is my code:
rm(list = ls())
library(ggplot2)
set.seed(9487)
###Step_1###
x_100 <-data.frame(rchisq(100, 5, ncp = FALSE))
###Step_2###
mean_x <- mean(x_100[,1])
class(x_100)
###Step_3###
plot_x_100 <- ggplot(data = x_100, aes(x = x_100)) +
geom_bar()
plot_x_100
First, I construct a data frame of 100 random draws from a chi-square distribution with df = 5.
Second, I calculate the mean of these draws.
Finally, I plot the graph with the ggplot2 package.
However, I get the following output:
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error in is.finite(x) : default method not implemented for type 'list'
I have been stuck on this problem for several hours and cannot find any list in my global environment. I would appreciate any help or suggestions.
The problem is that inside the ggplot call you use the same data frame (x_100) both as the data and as the x variable inside aes. A data frame is a list of columns, so ggplot2 receives a list where it expects a vector, hence the error about type 'list'. Remember that inside aes you should give the name of the column you wish to map. Additionally, if you want to plot the chi-square distribution, I think geom_histogram is a better choice than geom_bar, as the former groups the observations into bins.
library(ggplot2)
# Rename the only column of your data frame as "value"
colnames(x_100) <- "value"
# Map the column name, not the data frame, inside aes()
plot_x_100 <- ggplot(data = x_100, aes(x = value)) +
  geom_histogram(bins = 20)
plot_x_100
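As a sanity check on Step 2: a chi-square variable with k degrees of freedom has mean k and variance 2k, so the sample mean of 100 draws with k = 5 should land near 5 (its standard error is about sqrt(2 * 5 / 100), roughly 0.32). The base-R sketch below is an alternative view, not part of the ggplot2 answer: it draws the histogram on the density scale and overlays the theoretical dchisq() curve and the sample mean.

```r
set.seed(9487)
x <- rchisq(100, df = 5)

# Sample mean; the theoretical mean of a chi-square with k df is k (here 5)
mean_x <- mean(x)

hist(x, breaks = 20, freq = FALSE, main = "100 draws from chi-square(5)")
curve(dchisq(x, df = 5), add = TRUE, lwd = 2)  # theoretical density
abline(v = mean_x, lty = 2)                    # sample mean
```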

ggplot2 vs sm package density plot output (and statistical analysis)

Consider the following example data frame:
library('ggplot2')
library('sm')
original<-c(1:100,1)
a<-sample(original,100)
b<-rep(1:4,25)
lala<-data.frame(a,b)
My aim is to produce density plots for values in lala$a, according to each group (1,2,3,4) defined in lala$b.
To do so in ggplot2, I could do the following:
plotDensityggplot<-ggplot()+
geom_density(data = lala, aes(a, colour=factor(b)))+
theme_classic()
print(plotDensityggplot)
producing this:
However, when I plot the same data with the 'sm' package, to make a formal comparison of the densities, using the following code:
sm.density.compare(lala$a,as.numeric(lala$b),model = "equal")
the density curves extend below zero on the x-axis, even though there are no values below zero in lala$a.
What's going on? Note that this affects the densities reported on the y-axis.
Is the p-value from the permutation test of equality obtained from sm.density.compare a reliable estimate? Thank you!
For what it's worth, you can (more or less) reproduce the sm output in ggplot by pre-computing the densities with base R's density (I'm not familiar with sm, but I imagine that sm.density calls base R's density at some point as well).
library(tidyverse)
lala %>%
  group_by(b) %>%
  # one density per group, stored as a list-column of x/y data frames
  summarise(tmp = list(map_dfc(c("x", "y"), ~density(a)[.x]))) %>%
  unnest(cols = tmp) %>%
  ggplot(aes(x, y, colour = as.factor(b))) +
  geom_line()
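I'm not sure how geom_density (or stat_density) tunes the kernel density estimation parameters; it does accept bw, adjust and kernel arguments, but it evaluates the estimate only over the range of the data, so you seem to have less control over it than with base R's density.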
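The curves running below zero come from how the estimate is evaluated rather than from the data: a Gaussian kernel spreads every observation over the whole real line, and base R's density() by default (cut = 3) evaluates the estimate from three bandwidths below the smallest value to three above the largest. The sketch below uses the question's toy data with a fixed seed (an assumption, since the question sets none) to show where the grid starts and how much estimated mass lies below zero. Passing from = 0 only hides that tail; it does not move the mass back into the data range.

```r
# Toy data from the question, with a fixed seed for reproducibility
set.seed(1)
original <- c(1:100, 1)
a <- sample(original, 100)

# Defaults: Gaussian kernel, bw = bw.nrd0(a), cut = 3
d <- density(a)
range(a)     # smallest value is >= 1
range(d$x)   # grid runs from min(a) - 3 * bw to max(a) + 3 * bw
d$bw

# Approximate share of the estimated density lying below zero
step <- diff(d$x[1:2])
mass_below_zero <- sum(d$y[d$x < 0]) * step
mass_below_zero

# Cutting the grid at the data range hides the tails; the plotted
# curve then integrates to noticeably less than 1
d0 <- density(a, from = 0, to = max(a))
sum(d0$y) * diff(d0$x[1:2])
```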

Indexing separate survival curves

I would like to plot Kaplan-Meier survival estimates for each of two groups in ggplot.
To do so, I need a separate survival curve for each group. The survfit function in the survival package splits the data nicely, but I don't know how to index the separate curves to work on them.
Here is sample data:
library(survival)
rearrest <- read.table("http://stats.idre.ucla.edu/stat/examples/alda/rearrest.csv", sep = ",", header = TRUE)
This is the ungrouped curve:
(sCurve <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~1, data = rearrest)))
It is easy to index elements within this, for example
sCurve$n.event
When I fit the same model, this time grouped by the value of the personal variable, I get two nice survival curve objects ready to go.
(sCurveA <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~personal, data = rearrest)))
One object is labelled personal=0 and the other personal=1. I have tried indexing with $, [] and [[]], with both numeric and named indexes, all to no avail.
Can anyone help?
sCurveA$strata provides the grouping variable as a vector. You can pull out the key pieces and put them into a data.frame for ggplot:
library(ggplot2)
df = data.frame(Time = sCurveA$time,
                Survival = sCurveA$surv,
                Strata = sCurveA$strata)
# Kaplan-Meier estimates are step functions, so geom_step draws them faithfully
ggplot(df, aes(Time, Survival, col = Strata)) +
  geom_step()
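On the original indexing question: a survfit object fitted with strata can be subscripted directly with a stratum name (as listed in names(fit$strata)) or a position, and the grouped summary can be split into one data frame per group. The sketch uses simulated data standing in for rearrest, so the numbers are illustrative only.

```r
library(survival)

# Simulated stand-in for the rearrest data (hypothetical values)
set.seed(42)
sim <- data.frame(
  months   = rexp(100, rate = 0.05),
  censor   = rbinom(100, 1, 0.3),
  personal = sample(0:1, 100, replace = TRUE)
)

# Same model as in the question: the event indicator is 1 - censor
fit <- survfit(Surv(months, abs(censor - 1)) ~ personal, data = sim)
names(fit$strata)

# Option 1: subscript the fit by stratum name (a position, fit[2], also works)
fit1 <- fit["personal=1"]
s1 <- summary(fit1)
head(cbind(time = s1$time, surv = s1$surv))

# Option 2: split the grouped summary into one data frame per stratum
s <- summary(fit)
curves <- split(data.frame(time = s$time, surv = s$surv), s$strata)
names(curves)
```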

Graphing a histogram overlaid with a fitted 2 parameter Weibull function

I would like to plot a histogram and a fitted Weibull density on the same graph. The code to plot the histogram is:
hist(data$grddia2, prob=TRUE,breaks=5)
The code to fit the Weibull distribution is (requires the MASS package):
fitdistr(data$grddia2,densfun=dweibull,start=list(scale=1,shape=2))
How do I plot both together on the same graph? I've attached the data set.
Also, a bonus to anyone who can provide code that achieves the same thing but creates a graph for each column of data, since the data set has many columns. It would be nice to have all graphs on the same page.
https://www.dropbox.com/s/ra9c2kkk49vyyyc/Diameter%20Distribution.csv?dl=0
Here is the code:
library("ggplot2")
library("dplyr")
library("tidyr")
library("MASS")
# Import dataset and filter the column "treeno"
# Use namespace dplyr:: explicitly because of conflict with MASS:: for function "select"
data <- read.csv("Diameter Distribution.csv") %>%
dplyr::select(-treeno)
# Function to provide the Weibull distribution for each column
# The distribution is calculated based on the estimated scale and shape parameters of the input
fitweibull <- function(column) {
x <- seq(0,7,by=0.01)
fitparam <- column %>%
unlist %>%
fitdistr(densfun=dweibull,start=list(scale=1,shape=2))
return(dweibull(x, scale=fitparam$estimate[1], shape=fitparam$estimate[2]))
}
# Apply function for each column then consolidate all in a data.frame
fitdata <-data %>%
apply(2, as.list) %>%
lapply(FUN = fitweibull) %>%
data.frame()
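# Display graphs
# multiplyingFactor rescales the fitted density to the histogram's count axis; 10 is a hand-picked value
multiplyingFactor <- 10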
ggplot() +
geom_histogram(data=gather(data), aes(x=value, group=key, fill=key), alpha=0.2) +
geom_line(data=gather(fitdata), aes(x=rep(seq(0,7,by=0.01),ncol(fitdata)), y=multiplyingFactor*value, group=key, color=key))
And the output figure
Variant: thanks to the wonderful ggplot2 package, you can also have the graphs separate just by adding this final line of code:
+ facet_wrap(~ key) + theme(legend.position = "none")
Which gives you this other figure:
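On the multiplyingFactor: for a histogram of counts with bin width w over n observations, the expected bar height around x is n × w × f(x), where f is the fitted density, so the factor that matches the curve to the bars is n × w rather than a hand-picked constant (in recent ggplot2, mapping y = after_stat(density) in geom_histogram avoids the rescaling altogether). The base-R sketch below applies that to every column and puts one panel per column on the same page. It assumes simulated Weibull columns in place of the Dropbox file, uses fitdistr()'s built-in "weibull" option instead of dweibull with start values, and plot_weibull_fit() is a hypothetical helper.

```r
library(MASS)

# Hypothetical data standing in for the Dropbox CSV: two diameter columns
set.seed(123)
dat <- data.frame(
  plot1 = rweibull(500, shape = 2, scale = 3),
  plot2 = rweibull(500, shape = 3, scale = 2)
)

# Histogram on the count scale, with the fitted Weibull density
# rescaled by n * binwidth so it lines up with the bars
plot_weibull_fit <- function(x, main = "") {
  fit <- fitdistr(x, densfun = "weibull")
  h <- hist(x, breaks = 20, plot = FALSE)
  scale_factor <- length(x) * diff(h$breaks)[1]
  plot(h, main = main, xlab = "Diameter")
  curve(scale_factor * dweibull(x, shape = fit$estimate["shape"],
                                scale = fit$estimate["scale"]),
        add = TRUE, col = "red", lwd = 2)
  invisible(fit)
}

# One panel per column, all on the same page
op <- par(mfrow = c(1, ncol(dat)))
fits <- Map(plot_weibull_fit, dat, names(dat))
par(op)
```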
