I am involved in a project where we are plotting survival curves for an event with a pretty low incidence, and the Kaplan-Meier curves (plotted using survminer) are pretty flat. I do not want to simply zoom in on the Y-axis, as I think the incidence rates may then be misinterpreted by the reader. One way to show both the 'true' rate and zoom in on any small differences is to do it as NEJM does it:
https://www.nejm.org/na101/home/literatum/publisher/mms/journals/content/nejm/2011/nejm_2011.364.issue-9/nejmoa1007432/production/images/img_large/nejmoa1007432_f1.jpeg.
I have, however, not found a way to do this directly in survminer. For reproducibility's sake, I would like to avoid involving any Adobe software.
Does anyone know a way to get a small, zoomed in version included on top of the original graph? I would like to accomplish this with survminer but tips on any other good ggplot-based KM packages are appreciated.
Small example:
library(survival)
library(survminer)
df <- genfan
df$treat<-sample(c(0,1),nrow(df),replace=TRUE)
fit <- survfit(Surv(hours, status) ~ treat, data = df)
p <- ggsurvplot(fit, data = df, risk.table = TRUE, fun = 'event', ylim = c(0, 1))
p # Normal flat, singular graph
There are a few ways to do this, but one suggestion is to make the two plots you have and arrange them with grid.arrange. First make the two plots. Then pull out the risk table and the plot separately from the first ggsurvplot object (you cannot put a ggsurvplot object in a grid.arrange). Nest the second plot in plot one with an annotation_custom. Finally, use layout_matrix to specify the dimensions of your plot and put it back together with grid.arrange.
library(survival)
library(survminer)
library(grid)
library(gridExtra)
df <- genfan
df$treat<-sample(c(0,1),nrow(df),replace=TRUE)
fit <- survfit(Surv(hours, status) ~ treat, data = df)
p <- ggsurvplot(fit, data = df, risk.table = TRUE, fun = 'event', ylim = c(0, 1))
#zoomed plot and remove risk table
g <- ggsurvplot(fit, data = df, risk.table = FALSE, fun = 'event', ylim = c(0, .5))
risktab <- p$table
justplot <- p$plot
p2 <- justplot +
  annotation_custom(grob = ggplotGrob(g$plot +
                                        theme(legend.position = "none")),
                    xmin = 60, xmax = Inf, ymin = .5, ymax = Inf)
lay <- rbind(c(1, 1),
             c(1, 1),
             c(2, 2))
gridExtra::grid.arrange(p2, risktab,
                        # use layout matrix to set sizes
                        layout_matrix = lay
)
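If it helps to see the nesting step in isolation, here is a minimal sketch in plain ggplot2 rather than survminer. The data are simulated (a hypothetical event rate of roughly 5%), the cumulative incidence is taken as 1 - surv from the survfit object, and the inset coordinates are arbitrary choices, so treat it as an illustration of annotation_custom rather than a drop-in replacement.

```r
library(survival)
library(ggplot2)

set.seed(1)
# Hypothetical low-incidence data: about 5% of subjects have the event
n <- 400
d <- data.frame(
  time   = runif(n, 1, 100),
  status = rbinom(n, 1, 0.05),
  treat  = rep(c(0, 1), each = n / 2)
)
fit <- survfit(Surv(time, status) ~ treat, data = d)

# Cumulative incidence per arm, read off the survfit object
ci <- data.frame(
  time  = fit$time,
  event = 1 - fit$surv,
  treat = rep(names(fit$strata), fit$strata)
)

base <- ggplot(ci, aes(time, event, colour = treat)) + geom_step()

# Full-scale plot and a zoomed copy without legend
full  <- base + coord_cartesian(ylim = c(0, 1))
inset <- base + coord_cartesian(ylim = c(0, 0.1)) +
  theme(legend.position = "none")

# Nest the zoomed copy inside the full plot (positions in data units)
combined <- full +
  annotation_custom(ggplotGrob(inset),
                    xmin = 5, xmax = 70, ymin = 0.3, ymax = 0.95)
print(combined)
```

The main panel keeps the honest 0 to 1 axis, while the inset carries its own axis so the reader can see the scale of the zoom.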
We are studying different types of clustering in R, and for the final task we need to show different graphs together. The ones I chose seem to be of different formats, and I tried to look around for possible solutions to show them together, but came up short, probably due to my lack of experience (I tried a bunch of options I found in similar questions, but nothing quite worked for me, or maybe I did something wrong).
I understand that there are roundabout ways to do this (which, according to some people, are way less troublesome too), and the prof is actually fine with me choosing only the graphs that are easily joined together, but at this point I want to satisfy my idle curiosity and ask whether there's a way to combine my graphs in R. I apologize if this was answered before, but I would be very grateful for assistance.
The graphs I randomly chose are:
clust1 = autoplot(fanny(xo, 5))
clust2 = autoplot(x_pca, data = x, colour = "region", loadings = TRUE, loadings.label = TRUE, frame = TRUE, frame.type = "norm")
clust3 = rpart.plot(x_tree)
clust4 = plot(as.phylo(x_clust), type = "fan", tip.color = colors(ct))
The first two I easily combined with grid.arrange, but both trees are giving me trouble. Thank you in advance for any help!
You could use the package cowplot, which is very handy for combining multiple plots from base R and ggplot2. Call the recordPlot function after your base R plot and use the plot_grid function to combine all the plots. Here is an example using the mtcars dataset:
library(ggplot2)
library(cowplot)
library(gridGraphics)
plot(mtcars$cyl, mtcars$disp)
p1 <- recordPlot()
p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
p3 <- ggplot(mtcars) + geom_smooth(aes(disp, qsec))
p4 <- ggplot(mtcars) + geom_bar(aes(carb))
plot_grid(p1, p4, p2, p3, labels = 'AUTO')
Here is a way to combine base R graphics with ggplot2 graphics. It is directly inspired by an answer from user Ricardo Saporta.
It uses grid graphics to create 2 view ports and prints the ggplot2 plots in those view ports.
library(ggplot2)
library(ggfortify)
library(rpart.plot)
#> Loading required package: rpart
library(cluster)
library(ape)
library(grid)
data(ruspini, package = "cluster")
data(ptitanic, package = "rpart.plot")
data(bird.orders, package = "ape")
old_par <- par(mfrow = c(2, 2))
vp.BottomLeft <- viewport(height=unit(.5, "npc"), width=unit(0.5, "npc"),
just=c("right","top"),
y=0.5, x=0.5)
vp.BottomRight <- viewport(height=unit(.5, "npc"), width=unit(0.5, "npc"),
just=c("left","top"),
y=0.5, x=0.5)
clust1 <- autoplot(fanny(ruspini, 4))
x_pca <- prcomp(iris[1:4], scale. = TRUE)
clust2 <- autoplot(x_pca, data = iris, loadings = TRUE, loadings.label = TRUE, frame = TRUE, frame.type = "norm")
binary.model <- rpart(survived ~ ., data = ptitanic, cp = .02)
clust3 <- rpart.plot(binary.model)
hc <- as.hclust(bird.orders)
ct <- length(hc$labels)
clust4 <- plot(as.phylo(hc), type = "fan", tip.color = colors(ct))
print(clust1, vp = vp.BottomLeft)
print(clust2, vp = vp.BottomRight)
par(old_par)
Created on 2022-03-19 by the reprex package (v2.0.1)
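To make the viewport arithmetic less opaque, here is a small sketch that uses only the grid package. quadrant_vp() is a hypothetical helper, not part of grid: it returns the viewport for one cell of a 2 x 2 page, numbered the same way as par(mfrow = c(2, 2)), with row 1 at the top. The loop just draws a labelled rectangle in each cell on an off-screen device.

```r
library(grid)

# Hypothetical helper: viewport covering cell (row, col) of a 2 x 2 page,
# with row 1 at the top, mirroring par(mfrow = c(2, 2)).
quadrant_vp <- function(row, col) {
  viewport(
    x      = unit((col - 1) * 0.5, "npc"),      # left edge of the cell
    y      = unit(1 - (row - 1) * 0.5, "npc"),  # top edge of the cell
    width  = unit(0.5, "npc"),
    height = unit(0.5, "npc"),
    just   = c("left", "top")                   # anchor at the top-left corner
  )
}

pdf(NULL)  # off-screen device so the sketch runs anywhere
grid.newpage()
for (r in 1:2) {
  for (cc in 1:2) {
    pushViewport(quadrant_vp(r, cc))
    grid.rect(gp = gpar(col = "grey40"))
    grid.text(sprintf("row %d, col %d", r, cc))
    popViewport()
  }
}
invisible(dev.off())
```

With such a helper, the two print calls above could be written as print(clust1, vp = quadrant_vp(2, 1)) and print(clust2, vp = quadrant_vp(2, 2)).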
Consider the code below to fit a GAM:
library(mgcv)
x1=runif(200)
x2=runif(200)
y=sin(x1)+x2^2+rnorm(200)
m = gam(y~s(x1)+s(x2))
Now plot(m) draws the smooth terms as separate plots, so to show them together I've found this code:
par(mfrow=c(1,2))
plot(m)
The plotted graph looks like this:
However, I cannot change the options of each plot individually; e.g. setting main="plot" changes both plots' titles, and I need to title each plot differently. How can I set the options of each plot separately?
You can use the select argument of plot.gam:
select: Allows the plot for a single model term to be selected for
printing. e.g. if you just want the plot for the second
smooth term set select=2.
For example:
library(mgcv)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df <- transform(df, y = sin(x1) + x2^2 + rnorm(200))
m <- gam(y~s(x1)+s(x2), data = df)
layout(matrix(1:2, ncol = 2))
plot(m, select = 1, main = "First smooth")
plot(m, select = 2, main = "Second smooth")
layout(1)
The resulting plot is shown below
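With more smooths, the same select trick can go in a loop. This is only a sketch: the titles vector is made up, and it assumes the number of smooth terms can be read from length(m$smooth), which holds for a model fitted as above.

```r
library(mgcv)

set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- sin(df$x1) + df$x2^2 + rnorm(200)
m <- gam(y ~ s(x1) + s(x2), data = df)

# One title per smooth, in the order the terms appear in the formula
titles <- c("Effect of x1", "Effect of x2")
n_smooth <- length(m$smooth)

op <- par(mfrow = c(1, n_smooth))  # one panel per smooth, side by side
for (i in seq_len(n_smooth)) {
  plot(m, select = i, main = titles[i])
}
par(op)                            # restore the previous layout
```

Any other per-panel argument (ylab, ylim, shade and so on) can be indexed the same way as the titles.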
I am working with a data frame called d in R. I want to plot a scatter plot using two of the columns, include a best-fit regression line, and also plot binned means.
I have calculated the centers of the bins and binned means, and included those as columns in the data frame.
I can make the scatter plot and regression line work, but cannot get the binned means to show up. Using the code below I get no errors, but the points drawn by panel.points do not show up.
scatter.Epsilon <- xyplot(Epsilon ~ data.subset.UpdatedVS30.091015,
data = d,
grid = TRUE,
scales = list(x = list(log = 10)),
xlab = "Vs30 (m/s)",
ylab = "Epsilon",
ylim = c(-4, 3),
xlim = c(10^2,10^3.4),
subscripts = TRUE,
panel=function(x,y,subscripts,...) {
panel.xyplot(x,y)
panel.abline(mod <- lm(y ~ x), col = 'black')
panel.points(d$bin.ep[subscripts], d$means.ep[subscripts],
col = 'red')})
scatter.Epsilon
A simplified data set would be:
dist <- rnorm(10,4,100)
x <- seq(1,100)
bin <-rep(50,100)
mean <- rep(mean(dist),100)
d <- data.frame(x,dist,bin,mean)
where dist ~ x is the scatterplot component, and mean represents the binned mean for data points between 1-100, and bin is the bin's center (at 50). I want to add one point at (bin, mean) on top of dist ~ x. My real data set has multiple bins and means based on data.subset.UpdatedVS30.091015 that I want to add on top of Epsilon ~ data.subset.UpdatedVS30.091015.
I think you might be trying to do too much work in the call to panel.points. Using your example data, this code works fine:
scatter.Epsilon <- xyplot(dist ~ x,
data = d,
grid = TRUE,
subscripts = TRUE,
panel=function(x,y,subscripts,...) {
panel.xyplot(x,y)
panel.abline(mod <- lm(y ~ x), col = 'black')
panel.points(bin,mean,col = 'red')})
and plots a red point right where it should be. Have you tried just
panel.points(bin.ep,means.ep,col='red')
There is no grouping variable in your formula, so no need for subscripts.
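One thing the simplified example does not exercise is the log x axis from the original call. With scales = list(x = list(log = 10)), lattice passes already log10-transformed x values to the panel function, so extra points have to be transformed the same way or they land far outside the plotting region. A sketch with simulated data (the bin breaks and column names are made up for illustration):

```r
library(lattice)

set.seed(1)
# Hypothetical data spanning roughly the same x range as the question
d <- data.frame(x = 10^runif(200, 2, 3.4))
d$y <- -1 + 0.5 * log10(d$x) + rnorm(200)

# Binned means: geometric bin centres and the mean of y within each bin
breaks <- c(100, 200, 400, 800, 1600, 3200)
bins <- data.frame(
  centre = sqrt(head(breaks, -1) * tail(breaks, -1)),
  mean_y = as.numeric(tapply(d$y, cut(d$x, breaks), mean))
)

p <- xyplot(y ~ x, data = d, grid = TRUE,
  scales = list(x = list(log = 10)),
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)                 # x is already log10(x) here
    panel.abline(lm(y ~ x), col = "black")  # fit on the transformed scale
    # The bin centres are in original units, so transform them to match
    panel.points(log10(bins$centre), bins$mean_y,
                 col = "red", pch = 19, cex = 1.3)
  })
print(p)
```

Keeping the summaries in their own small data frame also avoids repeating each bin value once per row of d.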
I am quite new to R programming and have been given the task of representing some data in a boxplot. We were only provided the five-number summary of the data, i.e. the lowest value, lower quartile, median, upper quartile and highest value. We are also told the number of samples (n).
I read bxp was a function similar to boxplot but drew the boxplot based upon this five figure summary.
However, I know varwidth can be used to change the width of boxes proportionate to N, yet it does not seem to work here as all boxes are the same length. This is what I need help with.
MORSEYear1 <- c(18.2,58.5,64.4,73.4,91.1)
MORSEYear2 <- c(22.3,56.4,64.3,75.7,97.4)
MORSEYear3 <- c(29.1,57.9,66.6,73.4,86.0)
MathStatYear1 <- c(46.8,54.8,66.1,71.4,84.1)
MathStatYear2 <- c(35.1,47.8,57.8,65.7,82.8)
MathStatYear3 <- c(32.6,56.3,61.1,75.6,89.4)
# stats must be a 5-row matrix with one column per box
MORSE1 <- list(stats = matrix(MORSEYear1, nrow = 5, ncol = 1), n = 139)
MORSE2 <- list(stats = matrix(MORSEYear2, nrow = 5, ncol = 1), n = 132)
MORSE3 <- list(stats = matrix(MORSEYear3, nrow = 5, ncol = 1), n = 131)
MS1 <- list(stats = matrix(MathStatYear1, nrow = 5, ncol = 1), n = 21)
MS2 <- list(stats = matrix(MathStatYear2, nrow = 5, ncol = 1), n = 20)
MS3 <- list(stats = matrix(MathStatYear3, nrow = 5, ncol = 1), n = 14)
bxp(MORSE1, xlim = c(0.5,6.5),ylim = c(0,100),varwidth= TRUE, main = "Graph comparing distribution of marks across different years of MORSE and MathStat",ylab = "Marks", xlab = "Course and year of study (Course,Year)", axes = FALSE)
par(new=T)
bxp(MORSE2, xlim = c(-0.5,5.5), ylim = c(0,100),axes= TRUE, varwidth=TRUE)
par(new=T)
bxp(MORSE3, xlim = c(-1.5,4.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS1, xlim = c(-2.5,3.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS2, xlim = c(-3.5,2.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS3, xlim = c(-4.5,1.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
NOTE: My supervisor said to use par(new=T) and change the xlim to plot multiple graphs using bxp(), if someone could verify if this is the best method or not that would be great!
Thanks
Stumbled upon the same problem, without much experience with R.
The varwidth argument of the bxp() function requires multiple boxplots being plotted at once. Adding to an initial plot does not count, as no readjustment is possible after the fact.
The question is how to construct a multidimensional z argument for bxp(). To answer this, a look at the result of something like boxplot(c(c(1,1),c(2,2))~c(c(11,11),c(22,22))) helps.
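For instance, asking boxplot() for its summary without drawing anything shows the layout that bxp() expects (a sketch using the same toy values as above):

```r
# Compute the boxplot summary only; plot = FALSE suppresses the drawing
b <- boxplot(c(1, 1, 2, 2) ~ c(11, 11, 22, 22), plot = FALSE)

# The components bxp() relies on
str(b[c("stats", "n", "names")])
dim(b$stats)
```

Here stats is a matrix with five rows and one column per group, and n and names each have one entry per group.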
First, a generic example with made-up data to aid anyone that lands here:
# data
d1 <- c(1,2,3,4,5)
d2 <- c(1,2,3,5,8,13,21,34)
# summaries (generated with quantile and structured accordingly)
z1 <- list(
stats=matrix(quantile(d1, c(0.05,0.25,0.5,0.75,0.85))),
n=length(d1)
)
z2 <- list(
stats=matrix(quantile(d2, c(0.05,0.25,0.5,0.75,0.85))),
n=length(d2)
)
# merging the summaries appropriately
z <- list(
stats=cbind(z1$stats,z2$stats),
n=c(z1$n,z2$n)
)
# check result
print(z)
# call bxp with needed parameters ("at" can/should also be used here)
bxp(z=z,varwidth=TRUE)
In the case of the original question, one should merge MORSE# and MS#. The code is far from optimal - there might be a better way to merge and a function for this can be written, but the aim is ugly clarity and simplicity:
z <- list(
stats=cbind(MORSE1$stats, MORSE2$stats, MORSE3$stats, MS1$stats, MS2$stats, MS3$stats),
n=c(MORSE1$n, MORSE2$n, MORSE3$n, MS1$n, MS2$n, MS3$n)
)
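Put together for the numbers in the question, a self-contained version might look like the sketch below. It builds the stats matrix directly with cbind() so that every box is a five-row column; the names component and the axis labels are my own additions for readability.

```r
# Five-number summaries, one column per course and year
stats <- cbind(
  MORSE1 = c(18.2, 58.5, 64.4, 73.4, 91.1),
  MORSE2 = c(22.3, 56.4, 64.3, 75.7, 97.4),
  MORSE3 = c(29.1, 57.9, 66.6, 73.4, 86.0),
  MS1    = c(46.8, 54.8, 66.1, 71.4, 84.1),
  MS2    = c(35.1, 47.8, 57.8, 65.7, 82.8),
  MS3    = c(32.6, 56.3, 61.1, 75.6, 89.4)
)

z <- list(
  stats = stats,
  n     = c(139, 132, 131, 21, 20, 14),  # sample sizes, same order as columns
  names = colnames(stats)                # labels under each box
)

# One call draws all six boxes, so varwidth can compare the sample sizes
at <- bxp(z, varwidth = TRUE, ylim = c(0, 100),
          ylab = "Marks", xlab = "Course and year of study")

# Box widths are proportional to the square root of n
round(sqrt(z$n / max(z$n)), 2)
```

Because everything is drawn in a single call, there is no need for par(new = TRUE) or shifted xlim values.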
I am using following commands to produce a scatterplot with jitter:
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5],500,replace=T))
library(lattice)
stripplot(NUMS~GRP,data=ddf, jitter.data=T)
I want to add boxplots over these points (one for every group). I tried searching, but I am not able to find code that plots all points (and not just outliers) with jitter. How can I solve this? Thanks for your help.
Here's one way using base graphics.
boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = 'NUMS')
stripchart(NUMS ~ GRP, vertical = TRUE, data = ddf,
method = "jitter", add = TRUE, pch = 20, col = 'blue')
To do this in ggplot2, try:
ggplot(ddf, aes(x=GRP, y=NUMS)) +
geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
geom_jitter(position=position_jitter(width=.1, height=0))
Obviously you can adjust the width and height arguments of position_jitter() to your liking (although I'd recommend height=0 since height jittering will make your plot inaccurate).
I've written an R function called spreadPoints() within a package called basicPlotteR. The package can be installed directly into your R library using the following code:
install.packages("devtools")
library("devtools")
install_github("JosephCrispell/basicPlotteR")
For the example provided, I used the following code to generate the example figure below.
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5],500,replace=T))
boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = 'NUMS')
spreadPointsMultiple(data=ddf, responseColumn="NUMS", categoriesColumn="GRP",
col="blue", plotOutliers=TRUE)
It is a work in progress (the lack of formula as input is clunky!) but it provides a non-random method to spread points on the X axis that doubles as a violin like summary of the data. Take a look at the source code, if you're interested.
For a lattice solution:
library(lattice)
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5], 500, replace = T))
bwplot(NUMS ~ GRP, ddf, panel = function(...) {
panel.bwplot(..., pch = "|")
panel.xyplot(..., jitter.x = TRUE)})
The default median dot symbol was changed to a line with pch = "|". Other properties of the box and whiskers can be adjusted with box.umbrella and box.rectangle through the trellis.par.set() function. The amount of jitter can be adjusted through a variable named factor where factor = 1.5 increases it by 50%.
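As a sketch of those adjustments, the version below styles the boxes through the par.settings argument, which applies to this plot only (trellis.par.set() would change the settings for the whole device), and passes factor = 1.5 to panel.xyplot. The colours and line widths are arbitrary, and do.out = FALSE is my own addition so that outliers are not drawn twice.

```r
library(lattice)

set.seed(1)
ddf <- data.frame(NUMS = rnorm(500),
                  GRP  = sample(LETTERS[1:5], 500, replace = TRUE))

p <- bwplot(NUMS ~ GRP, data = ddf,
  # Per-plot graphical settings for the box and its whiskers
  par.settings = list(
    box.rectangle = list(col = "black", lwd = 2),
    box.umbrella  = list(col = "black", lty = 1)
  ),
  panel = function(...) {
    # Median drawn as a line; outliers left to the jittered points
    panel.bwplot(..., pch = "|", do.out = FALSE)
    # All points, jittered horizontally with 50% more spread than default
    panel.xyplot(..., jitter.x = TRUE, factor = 1.5)
  })
print(p)
```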