How to format multiple ggsurvplots when using arrange_ggsurvplots? - r

I'm trying to create a figure in R that includes multiple Kaplan-Meier curves, with each plot labelled (e.g. A, B, C). I've been able to make the figure using arrange_ggsurvplots, but the output looks very cramped. How can I format the plots (e.g. change the plot dimensions or font size) so that the plots in the output are appropriately sized and readable?
Appreciate your time and help.
segst_g_survfit <- survfit(Surv(time/30.4375, statuslife) ~ segst_g, data = good_data)
ehdz_survfit <- survfit(Surv(time/30.4375, statuslife) ~ Extrahepaticdz, data = good_data)
prioranyldt_survfit <- survfit(Surv(time/30.4375, statuslife) ~ prioranyldt, data = good_data)
hepaticresect <- survfit(Surv(time/30.4375, statuslife) ~ hepaticresection, data = good_data)
tsize_g_survfit <- survfit(Surv(time/30.4375, statuslife) ~ tsize_g, data = good_data)
segstx_g_survfit <- survfit(Surv(time/30.4375, statuslife) ~ segstx_g, data = good_data)
splots <- list()
splots[[1]] <- ggsurvplot(
fit = hepaticresect,
legend = c(0.75,0.8),
legend.labs=c("No Resection","Prior Resection"),
title ="OS vs Prior Hepatic Resection",
pval=TRUE,
pval.coord = c(4,0),
xlab = "Months",
ylab = "Survival Probability", conf.int=FALSE) + labs(tag='A')
splots[[2]] <- ggsurvplot(
fit = prioranyldt_survfit,
legend = c(0.75,0.8),
legend.labs=c("Prior Liver Directed Therapy","No Prior Liver Directed Therapy"),
title ="OS vs Prior Liver Directed Therapy",
pval=TRUE,
pval.coord = c(4,0),
xlab = "Months",
ylab = "Survival probability", conf.int=FALSE) + labs(tag='B')
splots[[3]] <- ggsurvplot(
fit = segst_g_survfit,
legend = c(0.75,0.8),
legend.labs=c("Greater than 2 segments","Less than or equal to 2 segments"),
title ="OS vs Number Segments with Tumor",
pval=TRUE,
pval.coord = c(4,0),
xlab = "Months",
ylab = "Survival Probability", conf.int=FALSE) + labs(tag='C')
splots[[4]] <- ggsurvplot(
fit = segstx_g_survfit,
legend = c(0.75,0.8),
legend.labs=c("Greater than 3 segments","Less than or equal to 3 segments"),
title ="OS vs Hepatic Segments Treated",
pval=TRUE,
pval.coord = c(4,0),
xlab = "Months",
ylab = "Survival Probability", conf.int=FALSE) + labs(tag='D')
splots[[5]] <- ggsurvplot(
fit = tsize_g_survfit,
pval=TRUE,
pval.coord = c(4,0),
title ="OS vs Largest Tumor Size",
legend = c(0.75,0.8),
legend.labs=c("Greater than 4 cm","Less than or equal to 4 cm"),
xlab = "Months",
ylab = "Survival probability", conf.int=FALSE) + labs(tag='E')
splots[[6]] <- ggsurvplot(
fit = ehdz_survfit,
legend = c(0.75,0.8),
legend.labs=c("No Extrahepatic Disease","Extrahepatic Disease"),
title ="OS vs Extrahepatic Disease",
pval=TRUE,
pval.coord = c(4,0),
xlab = "Months",
ylab = "Survival Probability", conf.int=FALSE) + labs(tag='F')
arrange_ggsurvplots(splots,print=TRUE,ncol=2,nrow=3)
Cramped output:

Try saving the arranged plot with ggsave, and set the width and height to the proportions you like:
myplot <- arrange_ggsurvplots(splots, print = FALSE, ncol = 2, nrow = 3)
ggsave(filename = "Figure.png", plot = myplot, device = "png", width = 15, height = 5, units = "in")
I think what you are experiencing is simply the plot pane in RStudio being too small for six panels.
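As a rough sketch of the ggsave approach described above, the example below arranges two plots and writes them to a file with explicit dimensions. It assumes the lung data set from the survival package as a stand-in for good_data, and the older column, the titles and the output path are hypothetical names chosen for illustration. The font.* arguments are passed through ggsurvplot to ggpubr::ggpar and are one way to shrink text so that more fits in each panel.

```r
library(survival)
library(survminer)

# Assumption: lung (shipped with survival) stands in for good_data.
dat <- lung
dat$older <- dat$age > 65  # hypothetical grouping variable, illustration only

fit_sex <- survfit(Surv(time / 30.4375, status) ~ sex, data = dat)
fit_age <- survfit(Surv(time / 30.4375, status) ~ older, data = dat)

splots <- list()
splots[[1]] <- ggsurvplot(
  fit_sex, data = dat, title = "OS vs Sex", pval = TRUE,
  xlab = "Months", legend.labs = c("Male", "Female"),
  # smaller fonts leave more room for the curves in each panel
  font.main = c(12, "plain"), font.x = c(10, "plain"), font.y = c(10, "plain"),
  font.tickslab = c(9, "plain"), font.legend = c(9, "plain")
)
splots[[2]] <- ggsurvplot(
  fit_age, data = dat, title = "OS vs Age", pval = TRUE,
  xlab = "Months", legend.labs = c("65 or younger", "Older than 65"),
  font.main = c(12, "plain"), font.x = c(10, "plain"), font.y = c(10, "plain"),
  font.tickslab = c(9, "plain"), font.legend = c(9, "plain")
)

# print = FALSE returns the arranged object instead of drawing it
combined <- arrange_ggsurvplots(splots, print = FALSE, ncol = 2, nrow = 1)

# The file size, not the RStudio pane, now decides how much room each panel gets
out_file <- file.path(tempdir(), "Figure.png")
ggsave(out_file, plot = combined, width = 12, height = 5, units = "in", dpi = 300)
```

For a 2 x 3 grid like the one in the question, a taller file (for example a height of roughly 4 to 5 inches per row) is likely to be a better starting point than a wide, short one.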

Related

Non-numeric argument to binary operator in R: Survival Analysis

I'm not sure why I get this error. I get the graph after, sure, but I don't know what is causing the error.
plot(survfit(Surv(time,DEATH_EVENT) ~ hypertension, data=HF), main = "Hypertension Survival Distributions", xlab = "Length of Survival",ylab="Probability of Survival",col=c("blue","red")) +
legend("topright", legend=c("Absent", "Present"),fill=c("blue","red"),bty="n")
Error in plot(survfit(Surv(time, DEATH_EVENT) ~ hypertension, data = HF), :
non-numeric argument to binary operator
This, however, works wonders:
ggsurvplot(survfit(Surv(time,DEATH_EVENT) ~ hypertension, data=HF),
data = HF,
censor.shape="|",
conf.int = FALSE,
ggtheme = theme_bw())
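Call plot() and legend() as two separate statements instead of joining them with +. The + is ggplot2 syntax; in base graphics it tries to add the values returned by plot() and legend(), which are lists rather than numbers, hence the error (both calls are evaluated first, which is why the graph still appears). With the + removed, it should work as expected: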
plot(survfit(Surv(time, DEATH_EVENT) ~ hypertension, data = HF), main = "Hypertension Survival Distributions", xlab = "Length of Survival", ylab = "Probability of Survival", col = c("blue","red"))
legend(x = 1, y = 1, legend = c("Absent", "Present"), col = c("blue","red"), lty = 1)
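NB. legend(x, y) places the top-left corner of the legend box at (x, y). To move the legend to the top right, increase x toward the right end of the x axis while leaving room for the width of the box, or pass a keyword instead, e.g. legend("topright", ...).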
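A small sketch of what goes wrong and of the fix, assuming the lung data set from the survival package in place of HF (which is not shown in the question):

```r
library(survival)

# Assumption: lung stands in for the HF data; sex stands in for hypertension.
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Joining the two base-graphics calls with + reproduces the error:
# both calls run (so the plot appears), then + fails on their return values.
msg <- tryCatch(
  {
    plot(fit, col = c("blue", "red")) +
      legend("topright", legend = c("Male", "Female"),
             col = c("blue", "red"), lty = 1)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Separate statements draw the same plot and legend without an error
plot(fit, col = c("blue", "red"),
     main = "Survival Distributions by Sex",
     xlab = "Length of Survival", ylab = "Probability of Survival")
legend("topright", legend = c("Male", "Female"),
       col = c("blue", "red"), lty = 1, bty = "n")
```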

Is there a way to find the equation of the density curve that you plot in R?

For context, my code looks like the following right now:
library(readr)
library(fitdistrplus)
newdata <- read_csv("Downloads/newctdata - Sheet1.csv")
hist(newdata$Mean, prob = TRUE, xlab = "Mean Duration of Asymptomatic Infection in Women", ylab = "Frequency", main = "Histogram of Mean Duration of Asymptomatic Infection", col = "steelblue", breaks = 12)
lines(density(newdata$Mean), col = 5, lwd = 4)
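One possible way to work with the plotted curve, sketched under assumptions: density() returns the x and y coordinates of a kernel density estimate, which has no short closed-form equation but can be wrapped in a function by interpolation, or written out as an average of Gaussian kernels (the default kernel). The gamma sample below is synthetic and only stands in for newdata$Mean.

```r
set.seed(1)
# Assumption: synthetic data standing in for newdata$Mean
x <- rgamma(200, shape = 2, rate = 0.5)

d <- density(x)  # default Gaussian kernel; d$bw is the bandwidth

# Option 1: interpolate the plotted curve so it can be evaluated anywhere
dens_fun <- approxfun(d$x, d$y, yleft = 0, yright = 0)

# Option 2: the kernel density formula itself,
# f(t) = (1 / n) * sum_i dnorm(t, mean = x_i, sd = bw)
kde <- function(t) sapply(t, function(ti) mean(dnorm(ti, mean = x, sd = d$bw)))

# Both give (nearly) the same height at a chosen point
dens_fun(3)
kde(3)

# Sanity check: the area under the curve is close to 1 (trapezoid rule)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area
```

If a named parametric curve is wanted instead, fitting a distribution (for example with the fitdistrplus package already loaded in the question) would be a different technique from the kernel estimate drawn by lines(density(...)).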

Boxplot disappears when adjusting y lim

I want the y axis of my boxplot to go from 0 to 20,000, but when I add the argument ylim = c(0,20000), my entire boxplot disappears.
Here is my code:
bp.gender <- boxplot(
music_data.frame$Income ~ music_data.frame$Gender,
xlab = "Gender",
ylab = "Income",
main = "Income distribution for Gender",
col = "red",
ylim = c(0,20000)
)

How can I make the circles of my plot smaller in R?

This is the code I used:
resources <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/resources.csv")
res <- subset(resources, select = c("cty_name", "year", "regime",
"oil", "logGDPcp", "illit"))
resNoNA <- na.omit(res)
resNoNAS <- scale(resNoNA[, 3:6])
colMeans(resNoNA[, 3:6])
apply(resNoNA[, 3:6], 2, sd)
cluster2 <- kmeans(resNoNAS, centers = 2)
table(cluster2$cluster)
## this gives standardized answer, which is hard to interpret
cluster2$centers
## better to subset the original data and then compute means
g1 <- resNoNA[cluster2$cluster == 1, ]
colMeans(g1[, 3:6])
g2 <- resNoNA[cluster2$cluster == 2, ]
colMeans(g2[, 3:6])
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita", ylab = "Illiteracy",
col = cluster2$cluster, cex = resNoNA$oil)
but I wanted to make the circles smaller in order to fit within the limits of the graph
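The circle size is controlled by cex=, which here is set to the oil column. Divide it by a constant to shrink every circle proportionally; the three calls below use the original value, oil/3 and oil/5.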
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita", ylab = "Illiteracy",
col = cluster2$cluster, cex = resNoNA$oil)
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita", ylab = "Illiteracy",
col = cluster2$cluster, cex = resNoNA$oil/3)
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita", ylab = "Illiteracy",
col = cluster2$cluster, cex = resNoNA$oil/5)
Realize, however, that if you are using this in some automated report generator (e.g., rmarkdown, shiny), the plot dimensions may differ from what you see interactively. In that case you may need to approach the problem from the other side and widen xlim and ylim so that the circles fit inside the plot region.
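A minimal sketch of both ideas, using synthetic data in place of the resources data set (the variable names and the 10% padding are assumptions for illustration). It also shows a rescaling alternative that maps the sizes onto a fixed range, which is a different technique from dividing by a constant.

```r
set.seed(42)
# Assumption: synthetic stand-ins for logGDPcp, illit, oil and the cluster labels
n     <- 50
gdp   <- runif(n, min = 5, max = 11)
illit <- runif(n, min = 0, max = 80)
oil   <- rexp(n, rate = 0.2)
cl    <- sample(1:2, n, replace = TRUE)

# Option 1: divide by a constant, as in the answer above
cex_small <- oil / 3

# Option 2 (alternative): rescale linearly to a fixed range, here 0.5 to 3
cex_scaled <- 0.5 + 2.5 * (oil - min(oil)) / (max(oil) - min(oil))

# Pad the axis limits by 10% on each side so large circles are not clipped
pad  <- function(v, frac = 0.1) range(v) + c(-1, 1) * frac * diff(range(v))
xlim <- pad(gdp)
ylim <- pad(illit)

plot(x = gdp, y = illit, main = "Illiteracy v GDP",
     xlab = "GDP per Capita", ylab = "Illiteracy",
     col = cl, cex = cex_scaled, xlim = xlim, ylim = ylim)
```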

How to add legend in a 3D scatterplot

I have a 3D scatter plot that looks like this
and the code associated with it is as follows
nr = c(114,114,1820,100,100)
acc = c(70.00,45.00,98.89,82.00,74.90)
ti = c(25.00,87.50,0.25,41.40,51.30)
label = c(1, 2, 3, 4, 5)
data = data.frame(nr, acc, ti, label)
library(scatterplot3d)
scatterplot3d(data$nr, data$acc, data$ti, main = "3D Plot - Requirements, Accuracy & Time", xlab = "Number of requirements", ylab = "Accuracy", zlab = "Time", pch = data$label, angle = 45)
Now, I want to add a legend to the bottom right to indicate what those symbols mean
tech <- c('BPL','W','RT','S','WSM')
For instance, the triangle stands for BPL, + for RT and so on
You can try this:
library(scatterplot3d)
# define a plot
s3d <- scatterplot3d(data$nr, data$acc, data$ti, main = "3D Plot - Requirements, Accuracy & Time",
xlab = "Number of requirements", ylab = "Accuracy", zlab = "Time", pch = data$label, angle = 45)
# add a legend; "topright" is a position keyword, so no coordinates are needed
# (use "bottomright" to place it in the bottom right corner instead)
legend("topright", pch = data$label,
# here you define the labels in the legend
legend = c('BPL','W','RT','S','WSM'), cex = 1.1
)
