Drawing rectangles on a boxplot (ggplot2 in R)

Using the mtcars data as an example, I generated a boxplot and would like to add rectangles. Here is my full code.
library(ggplot2)
d=data.frame(x1=c(1,3,1,5,4), x2=c(2,4,3,6,6), y1=c(10,10,20,14,30), y2=c(15,20,25,18,35), t=c('a','a','a','b','b'))
ggplot(mtcars, aes(x = as.factor(mtcars$carb), y = mpg)) + geom_boxplot() + geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=t), color="black", alpha=0.5)
However, this does not work because of an aesthetics error. I do not understand why, because each of the two parts above works on its own:
#part 1 (works)
ggplot(mtcars, aes(x = as.factor(mtcars$carb), y = mpg)) + geom_boxplot()
#part 2 (works)
ggplot() + geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=t), color="black", alpha=0.5)
I would appreciate any suggestions. Thank you.

Here's an example of how this could work. The important thing is that ggplot generally expects the x-axis to be either continuous or discrete across all layers, not a mix. (The same goes for the y-axis.)
In your example, the boxplot x-axis is on a discrete scale, as if you had one position for "orange" and another for "pineapple", whereas the rectangles are defined on a continuous scale (1, 2, 3, ...). ggplot typically requires you to pick one kind or the other. If necessary you can coerce one into the other, but you need to define how: is "2" to the left or to the right of "pineapple"?
So for this to work, you can't feed the geom_boxplot layer a factor for the x-axis, at least not without converting it to numeric somehow. Here I leave carb as the number it started as and add group = carb, so that we get one boxplot per carb value rather than a single box for all the data.
ggplot(mtcars) +
geom_boxplot(aes(x = carb, y = mpg, group = carb)) +
geom_rect(data=d, aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=t), color="black", alpha=0.5)
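A different route is to keep the factor x-axis and switch off inheritance with inherit.aes = FALSE. This sketch assumes the original error comes from geom_rect inheriting the x and y mappings of the ggplot() call, which refer to mtcars columns that d does not have. On a discrete position scale the levels sit at integer positions 1, 2, 3, ..., so the numeric rectangle coordinates are read as positions along the levels, not as carb values.

```r
library(ggplot2)

# Same rectangle data as in the question
d <- data.frame(x1 = c(1, 3, 1, 5, 4), x2 = c(2, 4, 3, 6, 6),
                y1 = c(10, 10, 20, 14, 30), y2 = c(15, 20, 25, 18, 35),
                t = c("a", "a", "a", "b", "b"))

# Boxplot on a discrete x-axis; refer to the column by name, not mtcars$carb
p <- ggplot(mtcars, aes(x = factor(carb), y = mpg)) +
  geom_boxplot() +
  # inherit.aes = FALSE stops this layer from picking up x and y from ggplot()
  geom_rect(data = d,
            aes(xmin = x1, xmax = x2, ymin = y1, ymax = y2, fill = t),
            colour = "black", alpha = 0.5, inherit.aes = FALSE)
p
```

Because the carb levels are 1, 2, 3, 4, 6 and 8, position 5 on this axis is the box for carb = 6, so the rectangle coordinates may need adjusting if they were meant as carb values.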

Related

Calculating means with stat_summary for two different groupings and plotting in one plot

I am having issues with plotting two calculated means using stat_summary in the same figure.
I am using ggplot and stat_summary to plot the means of a dataset grouped by variable A, which can take the values 1, 2, 3 and 4. The same data also have a variable B, which can take the values 1 and 2.
So I can make a plot of the means grouped by variable A, which gives 4 lines.
I can also make a plot of the means grouped by variable B, which gives 2 lines.
But how can I plot them in the same figure, so that I get 6 lines? I have made a somewhat similar example using the mtcars dataset:
library(ggplot2)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars
plot1 <- ggplot(mtcars, aes(x=gear, y=hp, color=cyl, fill=cyl)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun.y = mean, size=1)
plot1
plot2 <- ggplot(mtcars, aes(x=gear, y=hp, color=vs, fill=vs)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun.y = mean, size=1)
plot2
So far my impression is that, because I start with ggplot(xxx), where xxx defines the data and grouping, I can't combine it with another ggplot that uses a different grouping. If I could call ggplot() with no arguments and define the data and grouping only inside stat_summary, I feel that would be the solution. But I can't figure out how to use stat_summary like that, or whether it is even possible.
You can just add more layers, defining the aes for each separately:
ggplot(mtcars) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl), fill = paste('cyl:', cyl)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl)), geom='line', fun.y = mean, size=1) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs), fill=paste('vs:', vs)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs)), geom='line', fun.y = mean, size=1)
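As an alternative sketch (not part of the answer above), the two groupings can be stacked into one long data frame so that a single layer draws every line. The column name grouping is made up for illustration, and the fun argument assumes ggplot2 3.3.0 or later, where it replaces fun.y.

```r
library(ggplot2)

# Stack two copies of the data, one labelled by cyl and one by vs
by_cyl <- transform(mtcars, grouping = paste("cyl:", cyl))
by_vs  <- transform(mtcars, grouping = paste("vs:", vs))
stacked <- rbind(by_cyl, by_vs)

# One layer now draws a mean line per label
p <- ggplot(stacked, aes(x = gear, y = hp, colour = grouping)) +
  stat_summary(geom = "line", fun = mean)
p
```

With mtcars this gives five lines (three cyl values plus two vs values). A ribbon layer can be added in the same way as in the answer.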

Draw circles around points belonging to a factor level in ggplot

A previous post describes how to draw red circles around points which exceed a given value in ggplot. I would like to do the same for anomaly detection results, but instead have the circles drawn around points belonging to a given factor level.
How could I change this code to allow circles to be drawn around a given factor level?
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_point(data=mtcars[mtcars$mpg>30,],
pch=21, fill=NA, size=4, colour="red", stroke=1) +
theme_bw()
All you need is to first plot all points and then plot only the circles for the data reduced to the factor levels you want to highlight. Does this solve your problem?
ggplot() +
geom_point(data=iris, aes(Sepal.Length, Sepal.Width)) +
geom_point(data=iris[iris$Species %in% c("setosa"),], aes(Sepal.Length, Sepal.Width),
pch=21, fill=NA, size=4, colour="red", stroke=1) +
theme_bw()
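A variation on the same idea, as a minimal sketch: a layer's data argument can also be a function, which ggplot2 calls with the plot data. That keeps the data set named in one place only.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  # `data` may be a function; it receives the plot data and returns the subset
  geom_point(data = function(d) d[d$Species == "setosa", ],
             pch = 21, fill = NA, size = 4, colour = "red", stroke = 1) +
  theme_bw()
p
```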
Please note that I changed the dataset, as I needed a factor in the data to show you how it works.
Let's suppose that the "factor level" you are interested in is the value 10.4 for mtcars$mpg. mtcars$mpg is a numerical vector, so you first have to convert it into a factor.
mtcars$mpg <- as.factor(mtcars$mpg)
Then you can use the same code you used previously for values greater than a limit, except that this time the condition is to belong to the factor level 10.4:
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_point(data=mtcars[mtcars$mpg %in% 10.4, ],
pch=21, fill=NA, size=4, colour="red", stroke=1) +
theme_bw()
Note that the conversion of mtcars$mpg to factor is not necessary and that the code will run on the numerical vector in the same way. I converted it since your question was about "factor level".
Note also that if you are not dealing with factor levels but simply with values matching a certain number, you can use:
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_point(data=mtcars[mtcars$mpg == 10.4, ],
pch=21, fill=NA, size=4, colour="red", stroke=1) +
theme_bw()
since you are now testing for equality rather than set membership.
I recently tried to use the above methods to highlight a subset of points with a factored axis. Unfortunately, the inclusion of the second subset geom_point call seemed to reorder the axis. I was able to avoid this problem by using the gghighlight package.
library(gghighlight)
ggplot(mtcars, aes(x = cyl, y = mpg, color = as.factor(carb))) +
geom_point() +
gghighlight(carb == 2, use_direct_label = FALSE, unhighlighted_colour = NULL) +
geom_point(pch=21, fill=NA, size=4, colour="black", stroke=0.5)

annotate boxplot in ggplot2

I've created a side-by-side boxplot using ggplot2.
p <- ggplot(mtcars, aes(x=factor(cyl), y=mpg))
p + geom_boxplot(aes(fill=factor(cyl)))
I want to annotate the plot with the minimum, maximum, first quartile, median and third quartile. I know geom_text() can do this, and maybe fivenum() is useful, but I cannot figure out exactly how to do it. These values should be displayed in my plot.
The most succinct way I can think of is to use stat_summary. I've also mapped the labels to a color aesthetic, but you can, of course, set the labels to a single color if you wish:
ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) +
geom_boxplot(width=0.6) +
stat_summary(geom="text", fun.y=quantile,
aes(label=sprintf("%1.1f", ..y..), color=factor(cyl)),
position=position_nudge(x=0.33), size=3.5) +
theme_bw()
In the code above we use quantile as the summary function to get the label values. ..y.. refers back to the output of the quantile function (in general, the ..name.. notation is a ggplot2 construct for using values computed by the stat; ggplot2 3.3.0 and later also offer after_stat(y) for the same purpose).
One way is to simply make the data.frame you need, and pass it to geom_text or geom_label:
library(dplyr)
cyl_fivenum <- mtcars %>%
group_by(cyl) %>%
summarise(five = list(fivenum(mpg))) %>%
tidyr::unnest(cols = five)
ggplot(mtcars, aes(x=factor(cyl), y=mpg)) +
geom_boxplot(aes(fill=factor(cyl))) +
geom_text(data = cyl_fivenum,
aes(x = factor(cyl), y = five, label = five),
nudge_x = .5)
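The same label table can be sketched in base R, without dplyr or tidyr. Note that fivenum() returns Tukey's hinges, which can differ slightly from the quartiles that quantile() reports.

```r
# fivenum() returns min, lower hinge, median, upper hinge, max
per_cyl <- lapply(split(mtcars$mpg, mtcars$cyl), fivenum)

# Long format: one row per cylinder group and statistic
cyl_fivenum <- data.frame(
  cyl  = rep(names(per_cyl), each = 5),
  five = unlist(per_cyl, use.names = FALSE)
)
cyl_fivenum
```

The resulting cyl_fivenum has the same two columns as the dplyr version and can be passed to geom_text(data = cyl_fivenum, ...) in the same way.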
If you are dealing with a large range and have to log10-transform the y-axis, use 10^..y.. in the label together with scale_y_log10(). Without the 10^ in front of ..y.., the labels show the log-transformed quantile values instead of the values in the original units.
Does not work
ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) +
geom_boxplot(width=0.6) +
stat_summary(geom="text", fun.y=quantile,
aes(label=sprintf("%1.1f", ..y..), color=factor(cyl)),
position=position_nudge(x=0.45), size=3.5) +
scale_y_log10()+
theme_bw()
Works great
ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) +
geom_boxplot(width=0.6) +
stat_summary(geom="text", fun.y=quantile,
aes(label=sprintf("%1.1f", 10^..y..), color=factor(cyl)),
position=position_nudge(x=0.45), size=3.5) +
scale_y_log10()+
theme_bw()
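The reason the back-transformation is needed can be seen outside ggplot. The sketch below relies on the fact that scale transformations are applied before the stat runs, so quantile effectively receives log10(mpg).

```r
# With scale_y_log10(), the stat sees log10(mpg) rather than mpg
q_log <- quantile(log10(mtcars$mpg))

# Labels built from the raw stat output would show these log10 values
round(q_log, 2)

# Raising 10 to that power returns to the original units
q_back <- 10^q_log
round(q_back, 1)

# For comparison: quantiles computed directly on the untransformed data
q_raw <- quantile(mtcars$mpg)
round(q_raw, 1)
```

The minimum and maximum in q_back match q_raw exactly. The interpolated quartiles can differ slightly, because the interpolation between neighbouring observations happens on the log scale.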

How to use free scales but keep a fixed reference point in ggplot?

I am trying to create a plot with facets. Each facet should have its own scale, but for ease of visualization I would like each facet to show a fixed y point. Is this possible with ggplot?
This is an example using the mtcars dataset. I plot the weight (wt) as a function of miles per gallon (mpg). The facets represent the number of cylinders of each car. As you can see, I would like the y scales to vary across facets, but still have a reference point (3, in the example) at the same height across facets. Any suggestions?
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
[EDIT: in my actual data, the fixed reference point should be at y = 0. I used y = 3 in the example above because 0 didn't make sense for the range of the data points in the example]
It's unclear where the line should be; let's assume in the middle. You could compute the limits outside ggplot and add a dummy layer to set the scales:
library(ggplot2)
library(plyr)
# data frame where 3 is the middle
# 3 = (min + max) /2
dummy <- ddply(mtcars, "cyl", summarise,
min = 6 - max(wt),
max = 6 - min(wt))
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_blank(data=dummy, aes(y=min, x=Inf)) +
geom_blank(data=dummy, aes(y=max, x=Inf)) +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
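The same trick can be sketched for an arbitrary reference value (for example 0, as in the question's edit) without plyr. The helper mirror_limits and its ref argument are hypothetical names introduced here: each facet's range is mirrored around ref, so that ref ends up at mid-height.

```r
library(ggplot2)

# Hypothetical helper: mirror each facet's wt range around `ref`
mirror_limits <- function(data, ref) {
  parts <- lapply(split(data, data$cyl), function(d) {
    data.frame(cyl = d$cyl[1],
               wt  = c(2 * ref - max(d$wt), 2 * ref - min(d$wt)))
  })
  do.call(rbind, parts)
}

dummy <- mirror_limits(mtcars, ref = 3)

p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  # Invisible layer: only extends each facet's y range; x = Inf leaves x limits alone
  geom_blank(data = dummy, aes(x = Inf, y = wt)) +
  geom_hline(yintercept = 3, colour = "red", linetype = 6) +
  facet_wrap(~ cyl, scales = "free_y")
p
```

The midpoint holds exactly for the data range; the default axis expansion is symmetric, so the line should stay centred in each panel.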

Seemingly incorrect guide color when geom_line() and geom_segment() are used together

I'm drawing some line segments on top of a plot that uses geom_line(). Surprisingly, the guide (legend) colors for geom_line() are drawn as the color of the last element I add to the plot - even if it is not the geom_line(). This looks like a bug to me, but it could be expected behavior for some reason I don't understand.
#Set up the data
require(ggplot2)
x <- rep(1:10, 2)
y <- c(1:10, 1:10+5)
fac <- gl(2, 10)
df <- data.frame(x=x, y=y, fac=fac)
#Draw the plot with geom_segment second, and the guide is the color of the segment
ggplot(df, aes(x=x, y=y, linetype=fac)) +
geom_line() +
geom_segment(aes(x=2, y=7, xend=7, yend=7), colour="red")
Whereas if I add the geom_segment first, the colors on the guide are black as I would expect:
ggplot(df, aes(x=x, y=y, linetype=fac)) +
geom_segment(aes(x=2, y=7, xend=7, yend=7), colour="red") +
geom_line()
Feature or bug? If the first, can someone explain what's happening?
Feature(ish). The guide that is drawn is a guide for linetype. But, it has to be drawn in some color to be seen. When the color is not specified by an aesthetic mapping, ggplot2 draws it in a color that is consistent with the plot. I'm speculating that the default is whatever last color was used. That is why you are seeing differences when you plot them in a different order.
However, you can control these details of the legend.
ggplot(df, aes(x=x, y=y, linetype=fac)) +
geom_line() +
geom_segment(aes(x=2, y=7, xend=7, yend=7), colour="red") +
scale_linetype_discrete(guide=guide_legend(override.aes=list(colour="blue")))
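Another option is to keep the segment out of the legend altogether. This sketch assumes the red keys appear because the geom_segment layer inherits linetype = fac and so draws its own red key on top of the geom_line key. annotate() creates a layer that does not inherit the plot aesthetics and is not shown in the legend.

```r
library(ggplot2)

df <- data.frame(x = rep(1:10, 2),
                 y = c(1:10, 1:10 + 5),
                 fac = gl(2, 10))

p <- ggplot(df, aes(x = x, y = y, linetype = fac)) +
  geom_line() +
  # annotate() draws a single segment and does not take part in the legend
  annotate("segment", x = 2, y = 7, xend = 7, yend = 7, colour = "red")
p
```

Passing show.legend = FALSE to the original geom_segment call should have a similar effect on the legend, although that layer still draws one segment per data row.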
