annotate boxplot in ggplot2 - r

I've created a side-by-side boxplot using ggplot2.
library(ggplot2)
p <- ggplot(mtcars, aes(x=factor(cyl), y=mpg))
p + geom_boxplot(aes(fill=factor(cyl)))
I want to annotate the plot with the min, max, 1st quartile, median and 3rd quartile. I know geom_text() can do this, and maybe fivenum() is useful, but I can't figure out exactly how to do it. These values should be displayed in my plot.

The most succinct way I can think of is to use stat_summary. I've also mapped the labels to a color aesthetic, but you can, of course, set the labels to a single color if you wish:
library(ggplot2)
ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) +
geom_boxplot(width=0.6) +
stat_summary(geom="text", fun=quantile,
aes(label=sprintf("%1.1f", after_stat(y)), color=factor(cyl)),
position=position_nudge(x=0.33), size=3.5) +
theme_bw()
In the code above we use quantile as the summary function to get the label values. after_stat(y) refers back to the output of the quantile function. In general, after_stat() is how ggplot2 lets you use values calculated within a stat. Older code writes this as ..y.. and passes the summary function as fun.y; both spellings are deprecated in current ggplot2 in favour of after_stat() and fun.
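Keep in mind that quantile() at probabilities 0 and 1 gives the minimum and maximum, which are not always where geom_boxplot() ends the whiskers. The whiskers stop at the most extreme points within 1.5 times the IQR of the box, and anything beyond is drawn as an outlier point. If you want labels that sit exactly on the drawn elements, one option (a sketch, not part of the answer above) is to pull the statistics ggplot2 computed with layer_data() and label those. The name `label_df` and the 0.45 offset are arbitrary choices.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(aes(fill = factor(cyl)))

# Statistics ggplot2 computed for the boxplot layer, one row per box:
# ymin/ymax are the whisker ends, lower/middle/upper the hinges and median
box_stats <- layer_data(p, 1)

# One row per label: the five drawn values for each box, at that box's x position
label_df <- data.frame(
  x = rep(as.numeric(unclass(box_stats$x)), times = 5),
  y = c(box_stats$ymin, box_stats$lower, box_stats$middle,
        box_stats$upper, box_stats$ymax)
)

# Numeric x positions are allowed on a discrete axis, so we can offset the labels
p + geom_text(data = label_df,
              aes(x = x + 0.45, y = y, label = sprintf("%.1f", y)),
              inherit.aes = FALSE, size = 3.5)
```

For the 8-cylinder group this labels the whisker ends rather than the outlying points, which is the main practical difference from the quantile-based labels.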

One way is to simply make the data.frame you need, and pass it to geom_text or geom_label:
library(ggplot2)
library(dplyr)
cyl_fivenum <- mtcars %>%
group_by(cyl) %>%
summarise(five = list(fivenum(mpg))) %>%
tidyr::unnest(cols = five)
ggplot(mtcars, aes(x=factor(cyl), y=mpg)) +
geom_boxplot(aes(fill=factor(cyl))) +
geom_text(data = cyl_fivenum,
aes(x = factor(cyl), y = five, label = five),
nudge_x = .5)
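One thing to check when choosing between fivenum() and quantile(): fivenum() returns Tukey's hinges, while geom_boxplot() draws its box from quantile() with R's default (type 7) definition. For some group sizes the two disagree, so fivenum() labels can sit slightly off the box edges. A quick base R comparison on the 8-cylinder cars:

```r
# mpg values for the 8-cylinder cars in mtcars (n = 14)
x8 <- mtcars$mpg[mtcars$cyl == 8]

# Tukey's five-number summary: hinges are the 4th and 11th sorted values here
tukey <- fivenum(x8)

# Type-7 quantiles, the definition used for the boxplot's box
q7 <- unname(quantile(x8, probs = c(0, 0.25, 0.5, 0.75, 1)))

rbind(fivenum = tukey, quantile = q7)
# lower hinge: 14.3 vs 14.4; upper hinge: 16.4 vs 16.25
```

If the labels should line up with the box, summarise with quantile() instead of fivenum().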

In case anyone is dealing with large ranges and has to log10-transform the y-axis: add scale_y_log10() and back-transform the label with 10^after_stat(y) (written 10^..y.. in older code). The scale transformation is applied before the statistic is computed, so without the 10^ the labels show the log10-transformed quantile values instead of the original ones.
Without the back-transformation (labels show log10 values):
library(ggplot2)
ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) +
geom_boxplot(width=0.6) +
stat_summary(geom="text", fun=quantile,
aes(label=sprintf("%1.1f", after_stat(y)), color=factor(cyl)),
position=position_nudge(x=0.45), size=3.5) +
scale_y_log10()+
theme_bw()
With the back-transformation (labels show the original values):
ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) +
geom_boxplot(width=0.6) +
stat_summary(geom="text", fun=quantile,
aes(label=sprintf("%1.1f", 10^after_stat(y)), color=factor(cyl)),
position=position_nudge(x=0.45), size=3.5) +
scale_y_log10()+
theme_bw()
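A caveat on the log-scale version: because the statistic runs on log10 values, quantile() interpolates between observations on the log scale. The back-transformed labels (10^ applied to the computed y) can therefore differ slightly from quantile() on the raw data. The minimum and maximum always match; interpolated values such as the quartiles below do not. This is a base R check with the 8-cylinder cars, not a statement about every dataset:

```r
x8 <- mtcars$mpg[mtcars$cyl == 8]
probs <- c(0, 0.25, 0.5, 0.75, 1)

# What the labels show on a linear y-axis
raw <- unname(quantile(x8, probs))

# What the back-transformed labels show with scale_y_log10():
# quantiles of the log10 values, transformed back with 10^
back <- 10^unname(quantile(log10(x8), probs))

round(rbind(raw = raw, back_transformed = back), 3)
```

Here the differences are in the third decimal place, so they rarely matter at "%1.1f" precision, but they can for skewed data with large gaps between neighbouring values.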

Related

add geom vline and ggplotly to a facet grid

Is there any way I could add a vertical line at x = 15 in both panels of the plot, and also convert it to a plotly plot? I tried, but it doesn't seem to work. Thanks
sbucks_new %>%
ggplot(aes(x= category, y= bad_fat, color= category)) +
geom_boxplot() +
coord_flip() +
facet_grid(~ milk_dummy)+
labs(title= "Unhealthy Fats in Milk drinks by Category",
x= "Drinks Category",
y="Bad Fats (g)") +
theme_bw()
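Use geom_hline to draw the vertical line. This is confusing because of coord_flip(): the line is horizontal in data terms (a constant value of the y aesthetic), and coord_flip() then draws it vertically.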
Here's an example with mtcars:
library(ggplot2)
p <- ggplot(mtcars, aes(x=factor(carb), y=disp)) +
geom_boxplot() +
facet_wrap(~am) +
geom_hline(yintercept=300) +
coord_flip()
Not sure about your other question, but you can quickly convert a ggplot object into a plotly object with the ggplotly() function:
plotly::ggplotly(p)
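Applied to the plot in the question, that means geom_hline(yintercept = 15), since bad_fat is the y aesthetic that coord_flip() draws horizontally. The question's sbucks_new data isn't shown, so this sketch uses a small hypothetical stand-in data frame (`sbucks_demo`, with made-up values) that has the same column names; replace it with your own data. A yintercept given as a constant rather than mapped from a column has no facet variable, so ggplot2 repeats the line in every panel.

```r
library(ggplot2)

# Hypothetical stand-in for sbucks_new (the real data is not shown in the question)
sbucks_demo <- data.frame(
  category   = rep(c("coffee", "tea", "frappuccino"), times = 4),
  bad_fat    = c(5, 8, 20, 12, 3, 18, 9, 14, 25, 6, 11, 16),
  milk_dummy = rep(c("dairy", "non-dairy"), each = 6)
)

p <- ggplot(sbucks_demo, aes(x = category, y = bad_fat, color = category)) +
  geom_boxplot() +
  geom_hline(yintercept = 15, linetype = "dashed") +  # drawn vertically after coord_flip()
  coord_flip() +
  facet_grid(~ milk_dummy) +
  labs(title = "Unhealthy Fats in Milk drinks by Category",
       x = "Drinks Category", y = "Bad Fats (g)") +
  theme_bw()

# Interactive version; only attempted if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  plotly::ggplotly(p)
}
```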

Labelling plots arranged with grid.arrange

I have arranged multiple plots on one page using grid.arrange.
Is there a way to label each plot with "(a)", "(b)", etc.?
I have tried using geom_text, but it interacts strangely with my legend symbols.
Below is an example using the mtcars data of what I am trying to achieve. The alternative to geom_text I have found is annotate, which does not interact with my legend symbols. However, with annotate it is not easy to label only one facet.
library(ggplot2)
library(gridExtra)
q1=ggplot(mtcars, aes(x=mpg, y=wt)) +
geom_line() +
geom_point()+
facet_grid(~cyl)+
annotate(geom="text", x=15, y=12, label="(a)",size=8,family="serif")
q2=ggplot(mtcars, aes(x=mpg, y=wt)) +
geom_line() +
geom_point()+
facet_grid(~cyl)+
annotate(geom="text", x=15, y=12, label="(b)",size=8,family="serif")
# Earlier attempt, not added to q2: geom_text(x=15, y=5, size=8, label="(b)")
gt1 <- ggplotGrob(q1)
gt2 <- ggplotGrob(q2)
grid.arrange(gt1,gt2, ncol=1)
So my question is: is there a way to label plots arranged with grid.arrange so that the first facet in each plot is labelled (a), (b), (c), etc.?
You can use ggarrange() from the ggpubr package and set a label for each plot with its labels argument:
library(ggplot2)
library(ggpubr)
q1=ggplot(mtcars, aes(x=mpg, y=wt)) +
geom_line() +
geom_point()+
facet_grid(~cyl)+
annotate(geom="text", x=15, y=12, label="(a)",size=8,family="serif")
q2=ggplot(mtcars, aes(x=mpg, y=wt)) +
geom_line() +
geom_point()+
facet_grid(~cyl)+
annotate(geom="text", x=15, y=12, label="(b)",size=8,family="serif")
ggarrange(q1,q2, ncol = 1, labels = c("a)","b)"))
Is this what you are looking for?
If you set inherit.aes=FALSE in geom_text, the text layer no longer inherits the colour mapping, so it stops interfering with the legend:
ggplot(mtcars, aes(x=mpg, y=wt,col=factor(cyl))) +
geom_line() +
geom_point()+
geom_text(inherit.aes=FALSE,aes(x=15,y=12,label="(a)"),
size=8,family="serif")+
facet_grid(~cyl)
If you only want to label the first facet (hope I understood you correctly), I think the easiest way is to supply a small data frame that includes the facet variable, e.g. to place the label only in the first facet:
#place it in the first facet only
lvl_data = data.frame(
x=15,y=12,label="(a)",
cyl=sort(unique(mtcars$cyl))[1]  # numeric, matching the type of the facet variable
)
ggplot(mtcars, aes(x=mpg, y=wt,col=factor(cyl))) +
geom_line() +
geom_point()+
geom_text(data=lvl_data,inherit.aes=FALSE,
aes(x=x,y=y,label=label),size=8,family="serif")+
facet_grid(~cyl)
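If you would rather keep grid.arrange() as in the question, another option (a sketch, not from the answers above) is to give each plot a title strip when arranging it: gridExtra::arrangeGrob() accepts a top argument, and a left-aligned grid::textGrob() there puts "(a)", "(b)" above each plot's top-left corner rather than inside a facet. The helper name `tag`, the font size, and the simplified q1/q2 (without the annotate() calls) are illustrative choices.

```r
library(ggplot2)
library(gridExtra)
library(grid)

q1 <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_line() + geom_point() + facet_grid(~cyl)
q2 <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point() + facet_grid(~cyl)

# Hypothetical helper: a left-aligned text grob to use as a plot's top strip
tag <- function(label) {
  textGrob(label, x = unit(0, "npc"), hjust = 0,
           gp = gpar(fontsize = 16, fontfamily = "serif"))
}

# Wrap each plot with its own label, then stack them as before
g1 <- arrangeGrob(q1, top = tag("(a)"))
g2 <- arrangeGrob(q2, top = tag("(b)"))

grid.arrange(g1, g2, ncol = 1)
```

Because the label lives outside the plot panel, it cannot interact with the legend keys.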

Calculating means with stat_summary for two different groupings and plotting in one plot

I am having issues with plotting two calculated means using stat_summary in the same figure.
I am using ggplot and stat_summary to plot means of a dataset grouped by variable A, which can take the values 1, 2, 3, 4. The same data also have a variable B that can take the values 1, 2.
So I can make a plot with the means of the data grouped by variable A, and I get 4 lines.
I can also make a plot with the means of the data grouped by variable B, and I get 2 lines.
But how can I plot them in the same figure, so that I get 6 lines? I have made a similar example using the mtcars dataset:
library(ggplot2)
# mean_cl_normal() wraps Hmisc::smean.cl.normal, so the Hmisc package must be installed
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
plot1 <- ggplot(mtcars, aes(x=gear, y=hp, color=cyl, fill=cyl)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun = mean, linewidth=1)
plot1
plot2 <- ggplot(mtcars, aes(x=gear, y=hp, color=vs, fill=vs)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun = mean, linewidth=1)
plot2
So far I have the impression that, since I start with ggplot(xxx), where xxx defines the data and grouping, I can't combine it with another grouping. If I could call ggplot() without any arguments and define the data and grouping only in the stat_summary calls, I feel like that would be the solution. But I can't figure out how to use stat_summary like that, if it is even possible.
You can just add more layers, defining the aes for each separately:
library(ggplot2)
ggplot(mtcars) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl), fill = paste('cyl:', cyl)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl)), geom='line', fun = mean, linewidth=1) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs), fill=paste('vs:', vs)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs)), geom='line', fun = mean, linewidth=1)
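An alternative that avoids duplicating layers is to reshape first, so that a single colour/fill mapping covers both groupings. This is a sketch assuming tidyr (1.0 or later, for pivot_longer()). Each car appears once per grouping in the long data, so each group's mean is unchanged. It uses mean_se (mean plus or minus one standard error, shipped with ggplot2) instead of the question's mean_cl_normal to avoid the Hmisc dependency; swap it back if you want confidence intervals.

```r
library(ggplot2)
library(tidyr)

# Both grouping columns must share a type before they can be stacked
d <- mtcars
d$cyl <- as.character(d$cyl)
d$vs  <- as.character(d$vs)

# One row per (car, grouping): grouping is "cyl" or "vs", level is its value
long <- pivot_longer(d, cols = c(cyl, vs), names_to = "grouping", values_to = "level")
long$group_id <- paste0(long$grouping, ": ", long$level)

p <- ggplot(long, aes(x = gear, y = hp, colour = group_id, fill = group_id)) +
  stat_summary(geom = "ribbon", fun.data = mean_se, alpha = 0.3, colour = NA) +
  stat_summary(geom = "line", fun = mean)
p
```

Groups with a single car at some gear value have no standard error there, so ggplot2 may warn about removed ribbon rows.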

How to place multiple boxplots in the same column with ggplot(geom_boxplot)

I would like to build a boxplot in which the four levels of N (N0 to N3) are overlaid in the same column. For example, with the following data:
Q<-c("C1","C1","C2","C3","C3","C1","C1","C2","C2","C3","C3","Q1","Q1","Q1","Q1","Q3","Q3","Q4","Q4","Q1","Q1","Q1","Q1","Q3","Q3","Q4","Q4")
N<-c("N2","N3","N3","N2","N3","N2","N3","N2","N3","N2","N3","N0","N1","N2","N3","N1","N3","N0","N1","N0","N1","N2","N3","N1","N3","N0","N1")
Value<-c(4.7,8.61,8.34,5.89,8.36,1.76,2.4,5.01,2.12,1.88,3.01,2.4,7.28,4.34,5.39,11.61,10.14,3.02,9.45,8.8,7.4,6.93,8.44,7.37,7.81,6.74,8.5)
df<-data.frame(N=N,Value=Value)
With the following (usual) code, the output is 4 boxplots displayed in 4 columns, one for each level of N:
library(ggplot2)
ggplot(df, aes(x=N, y=Value,color=N)) + theme_bw(base_size = 20)+ geom_boxplot()
many thanks
Updated Answer
Based on your comment, here's a way to add marginal boxplots. We'll use the built-in mtcars data frame.
First, some set-up:
library(ggplot2)
library(cowplot)
# Common theme elements
thm = list(theme_bw(),
guides(colour="none", fill="none"),
theme(plot.margin=unit(rep(0,4),"lines")))
Now, create the three plots:
# Main plot
p1 = ggplot(mtcars, aes(wt, mpg, colour=factor(cyl), fill=factor(cyl))) +
geom_smooth(method="lm") + labs(colour="Cyl", fill="Cyl") +
scale_y_continuous(limits=c(10,35)) +
thm[-2] +
theme(legend.position = c(0.85,0.8))
# Top margin plot
p2 = ggplot(mtcars, aes(factor(cyl), wt, colour=factor(cyl))) +
geom_boxplot() + thm + coord_flip() + labs(x="Cyl", y="")
# Right margin plot
p3 = ggplot(mtcars, aes(factor(cyl), mpg, colour=factor(cyl))) +
geom_boxplot() + thm + labs(x="Cyl", y="") +
scale_y_continuous(limits=c(10,35))
Lay out the plots (the legend is already placed inside the main plot, p1):
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(5,1), rel_heights=c(1,5), align="hv")
Original Answer
You can overlay all four boxplots in a single column, but the plot will be unreadable. The first example below removes N as the x coordinate but keeps it as the colour aesthetic, so the four levels of N are plotted at a single tick mark (which I've removed by setting breaks to NULL). The boxes are still dodged, though. To plot them one on top of the other, set the dodge width to zero, as in the second example; as expected, the overlaid boxes are hard to read.
ggplot(df, aes(x="", y=Value,color=N)) +
theme_bw(base_size = 20) +
geom_boxplot() +
scale_x_discrete(breaks=NULL) +
labs(x="")
ggplot(df, aes(x="", y=Value,color=N)) +
theme_bw(base_size = 20) +
geom_boxplot(position=position_dodge(0)) +
scale_x_discrete(breaks=NULL) +
labs(x="")
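As a variation on the second example (a sketch, not from the original answer), position = "identity" also draws all boxes at the same x, and a semi-transparent fill keeps the overlapping boxes partly visible. The alpha of 0.2 is an arbitrary choice, and readability still depends on how much the boxes overlap. The data vectors are the ones from the question, defined before the data frame is built.

```r
library(ggplot2)

# Data from the question
N <- c("N2","N3","N3","N2","N3","N2","N3","N2","N3","N2","N3","N0","N1","N2",
       "N3","N1","N3","N0","N1","N0","N1","N2","N3","N1","N3","N0","N1")
Value <- c(4.7,8.61,8.34,5.89,8.36,1.76,2.4,5.01,2.12,1.88,3.01,2.4,7.28,4.34,
           5.39,11.61,10.14,3.02,9.45,8.8,7.4,6.93,8.44,7.37,7.81,6.74,8.5)
df <- data.frame(N = N, Value = Value)

p <- ggplot(df, aes(x = "", y = Value, colour = N, fill = N)) +
  theme_bw(base_size = 20) +
  # identity position: no dodging, every box shares the same x extent
  geom_boxplot(position = "identity", alpha = 0.2) +
  scale_x_discrete(breaks = NULL) +
  labs(x = "")
p
```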

geom_point plot with only number without circles

In ggplot in R, is it possible to plot each point as a unique number, without the surrounding circles? I tried using the color "white", but it doesn't work.
I would recommend geom_text.
set.seed(101)
dd <- data.frame(x=rnorm(50),y=rnorm(50),id=1:50)
library(ggplot2)
ggplot(dd,aes(x,y))+geom_text(aes(label=id))
I'll show how to do it with geom_text and/or geom_point.
Using geom_text (recommended)
For this example I'll use the built-in dataset mtcars and pretend that the numbers you want to display are the weight (wt) variable:
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg))
p + geom_text(aes(label = wt))
or if you want an example with truly unique numbers, we can just make up an index with seq_len:
p <- ggplot(mtcars, aes(wt, mpg))
p + geom_text(aes(label = seq_len(nrow(mtcars))))
Using geom_point
While it would require more work, it actually is possible to do this with geom_point.
In R, numeric shape codes 32 to 127 draw the corresponding ASCII characters (see ?points), so shapes 48 to 57 are the digits 0 to 9. You can use these shapes with geom_point like this. Each point can show only a single digit, though, so multi-digit numbers would have to be assembled from several offset points, which is why geom_text is the easier route:
library(ggplot2)
d=data.frame(p=c(48:57))
ggplot() +
scale_y_continuous(name="") +
scale_x_continuous(name="") +
scale_shape_identity() +
# fill has no effect on character shapes, so colour is used instead
geom_point(data=d, mapping=aes(x=p%%16, y=p%/%16, shape=p), size=5, colour="red")
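The shape-to-digit correspondence is just the character table, which you can confirm in base R. This sketch also shows the reverse lookup, from a digit to the shape code you would pass to scale_shape_identity():

```r
# Shape codes 48 to 57 are the character codes of "0" to "9"
codes <- 48:57
digits <- intToUtf8(codes, multiple = TRUE)
digits

# Reverse lookup: the shape code that draws a given digit
shape_for_7 <- utf8ToInt("7")
shape_for_7
```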
Finally, a trivial example using mtcars + geom_point with arbitrary numbers:
# Store the shape codes as a column of a copy of mtcars instead of using attach(mtcars)
d <- mtcars
d$p <- c(48:57,48:57,48:57,48,49)
ggplot(d) +
scale_y_continuous(name="") +
scale_x_continuous(name="") +
scale_shape_identity() +
geom_point(mapping=aes(x=wt, y=mpg, shape=p), size=5, colour="red")

Resources