Draw circles around points belonging to a factor level in ggplot (R)

A previous post describes how to draw red circles around points which exceed a given value in ggplot. I would like to do the same for anomaly detection results, but instead have the circles drawn around points belonging to a given factor level.
How could I change this code to allow circles to be drawn around a given factor level?
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_point(data=mtcars[mtcars$mpg>30,],
pch=21, fill=NA, size=4, colour="red", stroke=1) +
theme_bw()

All you need to do is first plot all the points, then plot circles only for the subset of the data belonging to the factor levels you want to highlight. Does this solve your problem?
ggplot() +
geom_point(data=iris, aes(Sepal.Length, Sepal.Width)) +
geom_point(data=iris[iris$Species %in% c("setosa"),], aes(Sepal.Length, Sepal.Width),
pch=21, fill=NA, size=4, colour="red", stroke=1) +
theme_bw()
Please note that I changed the dataset, as I needed a factor in the data to show you how it works.
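The same idea can be wrapped in a small function so that any data frame, factor column and set of levels can be passed in. This is a minimal sketch, assuming ggplot2 3.0 or later for the `.data` pronoun; `highlight_level()` is a hypothetical helper, not part of ggplot2:

```r
library(ggplot2)

# Hypothetical helper: circle every point whose value in column `col`
# is one of `levels_to_circle`.
highlight_level <- function(data, x, y, col, levels_to_circle) {
  circled <- data[data[[col]] %in% levels_to_circle, , drop = FALSE]
  ggplot(data, aes(.data[[x]], .data[[y]])) +
    geom_point() +
    # Second layer uses only the subset; the x/y mapping is inherited
    geom_point(data = circled, pch = 21, fill = NA, size = 4,
               colour = "red", stroke = 1) +
    theme_bw()
}

p <- highlight_level(iris, "Sepal.Length", "Sepal.Width", "Species", "setosa")
p
```

Passing several levels, e.g. `c("setosa", "virginica")`, circles all of them because the subset uses `%in%`.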

Let's suppose that the "factor level" you are interested in is the value 10.4 for mtcars$mpg. mtcars$mpg is a numerical vector, so you first have to convert it into a factor.
mtcars$mpg <- as.factor(mtcars$mpg)
Then you can use the same code you used previously for values greater than a limit, except that this time the condition is to belong to the factor level 10.4:
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_point(data=mtcars[mtcars$mpg %in% 10.4, ],
pch=21, fill=NA, size=4, colour="red", stroke=1) +
theme_bw()
Note that the conversion of mtcars$mpg to factor is not necessary and that the code will run on the numerical vector in the same way. I converted it since your question was about "factor level".
Note also that if you are not dealing with factor levels but simply with values matching a certain number, you can use:
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_point(data=mtcars[mtcars$mpg == 10.4, ],
pch=21, fill=NA, size=4, colour="red", stroke=1) +
theme_bw()
since you are now testing only for equality, not for set membership.
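The difference between `%in%` and `==` matters as soon as you want to highlight more than one value. A short sketch on a fresh copy of mtcars (so mpg is still numeric, unaffected by the factor conversion above); the values 10.4 and 15 are just examples:

```r
library(ggplot2)

mt <- datasets::mtcars   # fresh copy with mpg still numeric

# Membership test: TRUE where mpg equals 10.4 OR 15
in_set <- mt$mpg %in% c(10.4, 15)

# Pitfall: == compares element by element, recycling c(10.4, 15) along the
# column, so odd rows are compared with 10.4 and even rows with 15
recycled <- mt$mpg == c(10.4, 15)

sum(in_set)     # 3
sum(recycled)   # 1 (one of the two 10.4 rows and the 15.0 row are missed)

ggplot(mt, aes(wt, mpg)) +
  geom_point() +
  geom_point(data = mt[in_set, ],
             pch = 21, fill = NA, size = 4, colour = "red", stroke = 1) +
  theme_bw()
```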

I recently tried to use the above methods to highlight a subset of points with a factored axis. Unfortunately, the inclusion of the second subset geom_point call seemed to reorder the axis. I was able to avoid this problem by using the gghighlight package.
library(gghighlight)
ggplot(mtcars, aes(x = cyl, y = mpg, color = as.factor(carb))) +
geom_point() +
gghighlight(carb == 2, use_direct_label = FALSE, unhighlighted_colour = NULL) +
geom_point(pch=21, fill=NA, size=4, colour="black", stroke=0.5)
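If you prefer to stay with plain ggplot2, one possible cause of the reordering (an assumption, since the original code is not shown) is the axis variable being converted separately in each layer, so the subset layer ends up with different levels. A sketch under that assumption: convert once in the data, subset the converted data frame (subsetting keeps all factor levels), and pin the axis order with `limits`. The level order 8, 4, 6 is arbitrary, chosen only to show a non-alphabetical order:

```r
library(ggplot2)

mt <- datasets::mtcars
# Convert once, in the data, with an explicit level order
mt$cyl <- factor(mt$cyl, levels = c("8", "4", "6"))

# Row subsetting keeps all factor levels (no droplevels()), so the
# highlight layer carries the same level order as the full data
hi <- mt[mt$carb == 2, ]

p <- ggplot(mt, aes(cyl, mpg, colour = factor(carb))) +
  geom_point() +
  geom_point(data = hi, pch = 21, fill = NA, size = 4,
             colour = "black", stroke = 0.5) +
  # Pinning the limits explicitly guards against any reordering
  scale_x_discrete(limits = levels(mt$cyl))
p
```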

Related

Drawing rectangles on boxplot (ggplot2 in R)

Using the mtcars data as an example, I generated a boxplot and would like to add rectangles. Here is my full code.
library(ggplot2)
d=data.frame(x1=c(1,3,1,5,4), x2=c(2,4,3,6,6), y1=c(10,10,20,14,30), y2=c(15,20,25,18,35), t=c('a','a','a','b','b'))
ggplot(mtcars, aes(x = as.factor(mtcars$carb), y = mpg)) + geom_boxplot() + geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=t), color="black", alpha=0.5)
However, this does not work due to an aesthetics issue. I do not understand why, because each of the two parts works separately:
#part 1 (works)
ggplot(mtcars, aes(x = as.factor(mtcars$carb), y = mpg)) + geom_boxplot()
#part 2 (works)
ggplot() + geom_rect(data=d, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=t), color="black", alpha=0.5)
I would appreciate any suggestions. Thank you.
Here's an example of how this could work. The important thing is that ggplot expects all the layers' x-axes to be either continuous or discrete, not a mix. (And same for the y-axes.)
In your example, the boxplot x axis is on a discrete (categorical) scale, as if you had one location for "orange" and another for "pineapple." But the rect is defined on a continuous scale, like 1, 2, 3. ggplot typically requires you to pick one kind or the other; if necessary you can coerce one into the other, but you'd need to define how, e.g., is "2" to the left or to the right of "pineapple"?
So for this to work, you can't feed the geom_boxplot layer a factor for the x-axis, at least without converting it to numeric somehow. Here I just leave it as the original number it started as, and add a group = carb term so that we get one boxplot per carb value rather than a single boxplot for all of them.
ggplot(mtcars) +
geom_boxplot(aes(x = carb, y = mpg, group = carb)) +
geom_rect(data=d, aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=t), color="black", alpha=0.5)
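An alternative that keeps the factor axis: the error in the question appears to come from geom_rect inheriting the global mapping x = as.factor(mtcars$carb), a 32-element vector, while d has only 5 rows. A sketch, assuming a reasonably recent ggplot2, that sets inherit.aes = FALSE on the rect layer. Numeric xmin/xmax are then placed at the integer positions of the factor levels (carb has 6 levels, at positions 1 to 6):

```r
library(ggplot2)

d <- data.frame(x1 = c(1, 3, 1, 5, 4), x2 = c(2, 4, 3, 6, 6),
                y1 = c(10, 10, 20, 14, 30), y2 = c(15, 20, 25, 18, 35),
                t = c("a", "a", "a", "b", "b"))

# Refer to carb by name inside aes(), not via mtcars$carb
p <- ggplot(mtcars, aes(x = factor(carb), y = mpg)) +
  geom_boxplot() +
  geom_rect(data = d,
            aes(xmin = x1, xmax = x2, ymin = y1, ymax = y2, fill = t),
            colour = "black", alpha = 0.5,
            inherit.aes = FALSE)   # do not inherit x = factor(carb), y = mpg
p
```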

ggplot2: smooth and fill

I'd like to smooth the geom_lines and fill the area between them. I've tried stat_smooth() to smooth the lines, and both geom_ribbon() and geom_polygon() to fill the area, but without success.
Apologies for the double barrel question.
bell <- data.frame(
month = c("Launch","1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th","11th","12th"),
rate = c(0,.05,.12,.18,.34,.42,.57,.68,.75,.81,.83,.85,.87))
bell$month <- factor(bell$month, levels = rev(c("Launch","1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th","11th","12th")))
ggplot() +
theme_minimal() +
coord_flip() +
scale_fill_manual(values=cols) +
geom_line(data=bell, aes(x=month, y=.5-(rate/2), group=1), color='pink', size=1) +
geom_line(data=bell, aes(x=month, y=.5+(rate/2), group=1), color='pink', size=1) +
theme(legend.position='none', axis.ticks=element_blank(), axis.text.x=element_blank(),axis.title.x=element_blank())
One option is to calculate the points of the loess regression outside of ggplot and then plot them using geom_line (for a line) or geom_area for a filled area (geom_area is geom_ribbon, but with ymin fixed at zero).
Also, you don't need coord_flip. Instead, just switch your x and y mappings. This is necessary anyway if you want to fill underneath the curve.
In the example below I've created a numeric month variable for the regression. I've also commented out the scale_fill_manual line because your example doesn't provide a cols vector and the plot code doesn't produce a legend anyway. I've also commented out the legend.position='none' line as it's superfluous.
bell$month.num = 0:12
m1 = loess(rate ~ month.num, data=bell)
bell$loess.mod = predict(m1)
ggplot(bell, aes(y=month, group=1)) +
theme_minimal() +
#scale_fill_manual(values=cols) +
geom_area(aes(x=.5-(loess.mod/2)), fill='pink', size=1) +
geom_area(aes(x=.5+(loess.mod/2)), fill='pink', size=1) +
theme(#legend.position='none',
axis.ticks=element_blank(),
axis.text.x=element_blank(),
axis.title.x=element_blank())
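Since the question asks to fill the area between the two smoothed lines, geom_ribbon, which takes both ymin and ymax, may be a closer fit than geom_area, which fills down to zero. A sketch that reuses the loess fit from the answer, keeps coord_flip so the months stay on the vertical axis as in the question, and uses scale_x_reverse to put "Launch" at the top (these layout choices are assumptions about the desired look):

```r
library(ggplot2)

month_labels <- c("Launch", "1st", "2nd", "3rd", "4th", "5th", "6th",
                  "7th", "8th", "9th", "10th", "11th", "12th")
bell <- data.frame(
  month.num = 0:12,
  rate = c(0, .05, .12, .18, .34, .42, .57, .68, .75, .81, .83, .85, .87)
)

# Smooth the rate with loess, as in the answer above
fit <- predict(loess(rate ~ month.num, data = bell))
# Clamp at zero in case the smoothed value dips slightly below 0 near Launch
bell$half <- pmax(fit, 0) / 2

p <- ggplot(bell, aes(x = month.num)) +
  # Fill only the band between the two smoothed curves
  geom_ribbon(aes(ymin = 0.5 - half, ymax = 0.5 + half), fill = "pink") +
  scale_x_reverse(breaks = 0:12, labels = month_labels) +
  coord_flip() +
  theme_minimal() +
  theme(axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        axis.title.x = element_blank())
p
```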

How to place multiple boxplots in the same column with ggplot(geom_boxplot)

I would like to build a boxplot in which the boxes for the four levels of N (N0 to N3) are overlaid in the same column. For example, with the following data:
Q<-c("C1","C1","C2","C3","C3","C1","C1","C2","C2","C3","C3","Q1","Q1","Q1","Q1","Q3","Q3","Q4","Q4","Q1","Q1","Q1","Q1","Q3","Q3","Q4","Q4")
N<-c("N2","N3","N3","N2","N3","N2","N3","N2","N3","N2","N3","N0","N1","N2","N3","N1","N3","N0","N1","N0","N1","N2","N3","N1","N3","N0","N1")
Value<-c(4.7,8.61,8.34,5.89,8.36,1.76,2.4,5.01,2.12,1.88,3.01,2.4,7.28,4.34,5.39,11.61,10.14,3.02,9.45,8.8,7.4,6.93,8.44,7.37,7.81,6.74,8.5)
df<-data.frame(N=N,Value=Value)
With the following (usual) code, the output is four boxplots displayed in four columns, one per level of N:
ggplot(df, aes(x=N, y=Value,color=N)) + theme_bw(base_size = 20)+ geom_boxplot()
many thanks
Updated Answer
Based on your comment, here's a way to add marginal boxplots. We'll use the built-in mtcars data frame.
First, some set-up:
library(ggplot2)
library(cowplot)
# Common theme elements
thm = list(theme_bw(),
guides(colour="none", fill="none"),
theme(plot.margin=unit(rep(0,4),"lines")))
Now, create the three plots:
# Main plot
p1 = ggplot(mtcars, aes(wt, mpg, colour=factor(cyl), fill=factor(cyl))) +
geom_smooth(method="lm") + labs(colour="Cyl", fill="Cyl") +
scale_y_continuous(limits=c(10,35)) +
thm[-2] +
theme(legend.position = c(0.85,0.8))
# Top margin plot
p2 = ggplot(mtcars, aes(factor(cyl), wt, colour=factor(cyl))) +
geom_boxplot() + thm + coord_flip() + labs(x="Cyl", y="")
# Right margin plot
p3 = ggplot(mtcars, aes(factor(cyl), mpg, colour=factor(cyl))) +
geom_boxplot() + thm + labs(x="Cyl", y="") +
scale_y_continuous(limits=c(10,35))
Lay out the plots and add the legend:
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(5,1), rel_heights=c(1,5), align="hv")
Original Answer
You can overlay all four boxplots in a single column, but the plot will be unreadable. The first example below removes N as the x coordinate, but keeps N as the colour aesthetic. This results in the four levels of N being plotted at a single tick mark (which I've removed by setting breaks to NULL). However, the plots are still dodged. To plot them one on top of the other, set the dodge width to zero, as I've done in the second example. However, the plots are not readable when they are overlaid.
ggplot(df, aes(x="", y=Value,color=N)) +
theme_bw(base_size = 20) +
geom_boxplot() +
scale_x_discrete(breaks=NULL) +
labs(x="")
ggplot(df, aes(x="", y=Value,color=N)) +
theme_bw(base_size = 20) +
geom_boxplot(position=position_dodge(0)) +
scale_x_discrete(breaks=NULL) +
labs(x="")
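If the boxes must share one column, semi-transparent fills can make the overlap somewhat easier to read. A sketch building on the second example above; the alpha and width values are arbitrary choices, and position = "identity" is used here as a plain way to stack the boxes at the same x:

```r
library(ggplot2)

N <- c("N2","N3","N3","N2","N3","N2","N3","N2","N3","N2","N3","N0","N1","N2",
       "N3","N1","N3","N0","N1","N0","N1","N2","N3","N1","N3","N0","N1")
Value <- c(4.7,8.61,8.34,5.89,8.36,1.76,2.4,5.01,2.12,1.88,3.01,2.4,7.28,4.34,
           5.39,11.61,10.14,3.02,9.45,8.8,7.4,6.93,8.44,7.37,7.81,6.74,8.5)
df <- data.frame(N = N, Value = Value)

p <- ggplot(df, aes(x = "", y = Value, colour = N, fill = N)) +
  theme_bw(base_size = 20) +
  # All four boxes at the same x position, with translucent fills so the
  # overlapping boxes remain distinguishable
  geom_boxplot(position = "identity", alpha = 0.2, width = 0.5) +
  scale_x_discrete(breaks = NULL) +
  labs(x = "")
p
```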

How to ggplot with pre calculated quantiles?

I am using a model to predict some numbers. My prediction also includes a confidence interval for each number. I need to plot the actual numbers + predicted numbers and their quantile values on the same plot. Here is a simple example:
actualVals = c(12,20,15,30)
lowQuantiles = c(19,15,12,18)
midQuantiles = c(22,22,17,25)
highQuantiles = c(30,25,25,30)
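and I'm looking for a plot that shows the actual values alongside the predicted (middle) values and their quantile intervals, perhaps using ggplot().
You can use geom_errorbar; ?geom_errorbar also documents related geoms such as geom_crossbar, geom_linerange and geom_pointrange. I created a data.frame, dat, from your variables and added dat$x <- 1:4.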
dat <- data.frame(actualVals, lowQuantiles, midQuantiles, highQuantiles)
dat$x <- 1:4
ggplot(dat) +
geom_errorbar(aes(x, ymin=lowQuantiles, ymax=highQuantiles), width=0.2, lwd=2, color="blue") +
geom_point(aes(x, midQuantiles), cex=4, shape=22, fill="grey", color="black") +
geom_line(aes(x, actualVals), color="maroon", lwd=2) +
geom_point(aes(x, actualVals), shape=21, cex=4, fill="white", color='maroon') +
ylim(0, 30) +
theme_bw()
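A variant of the plot above, sketched under the assumption that it helps to flag where the actual value falls outside the predicted interval: each interval is drawn with geom_pointrange(), and the circling trick from the top of this page marks the flagged points. The `outside` column is a hypothetical addition:

```r
library(ggplot2)

dat <- data.frame(
  x             = 1:4,
  actualVals    = c(12, 20, 15, 30),
  lowQuantiles  = c(19, 15, 12, 18),
  midQuantiles  = c(22, 22, 17, 25),
  highQuantiles = c(30, 25, 25, 30)
)
# TRUE where the actual value lies outside [lowQuantiles, highQuantiles]
dat$outside <- with(dat, actualVals < lowQuantiles | actualVals > highQuantiles)

p <- ggplot(dat, aes(x)) +
  geom_pointrange(aes(y = midQuantiles, ymin = lowQuantiles, ymax = highQuantiles),
                  colour = "blue") +
  geom_line(aes(y = actualVals), colour = "maroon") +
  geom_point(aes(y = actualVals), shape = 21, size = 3,
             fill = "white", colour = "maroon") +
  # Circle the actual values that fall outside their interval
  geom_point(data = dat[dat$outside, ], aes(y = actualVals),
             pch = 21, size = 6, fill = NA, colour = "red", stroke = 1) +
  theme_bw()
p
```

With these numbers only the first actual value (12, below the lower quantile 19) is circled.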

Draw mean and outlier points for box plots using ggplot2

I am trying to plot the outliers and mean points for the box plots below, using the data available here. The dataset has 3 factor columns and 1 value column, with 3600 rows.
When I run the code below, it shows the mean points but doesn't draw the outliers properly:
ggplot(df, aes(x=Representations, y=Values, fill=Methods)) +
geom_boxplot() +
facet_wrap(~Metrics) +
stat_summary(fun.y=mean, colour="black", geom="point", position=position_dodge(width=0.75)) +
geom_point() +
theme_bw()
However, when I modify the code as below, the mean points disappear!
ggplot(df, aes(x=Representations, y=Values, colour=Methods)) +
geom_boxplot() +
facet_wrap(~Metrics) +
stat_summary(fun.y=mean, colour="black", geom="point", position=position_dodge(width=0.75)) +
geom_point() +
theme_bw()
In both of the cases I am getting the message: "ymax not defined: adjusting position using y instead" 3 times.
Any suggestions on how to fix this? I would like to draw the mean points within the individual box plots and show outliers in the same colour as the plots.
EDIT:
The original data set does not have any outliers, and that was the reason for my confusion. Thanks to MrFlick's answer with randomly generated data, which clarifies it properly.
Rather than downloading the data, I just made a random sample.
set.seed(18)
gg <- expand.grid (
Methods=c("BC","FD","FDFND","NC"),
Metrics=c("DM","DTI","LB"),
Representations=c("CHG","QR","HQR")
)
df <- data.frame(
gg,
Values=rnorm(nrow(gg)*50)
)
Then you should be able to create the plot you want with
library(ggplot2)
ggplot(df, aes(x=Representations, y=Values, fill=Methods)) +
geom_boxplot() +
stat_summary(fun.y="mean", geom="point",
position=position_dodge(width=0.75), color="white") +
facet_wrap(~Metrics)
I was using ggplot2 version 0.9.3.1
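For newer ggplot2 versions, a sketch that also addresses the "outliers in the same colour" part of the question. It assumes ggplot2 3.3.0 or later, where stat_summary() takes fun instead of fun.y. Giving the outliers shape 21 (a filled symbol) lets them pick up their box's fill colour, because outlier.fill defaults to NULL, meaning "inherit from the box":

```r
library(ggplot2)

set.seed(18)
gg <- expand.grid(
  Methods = c("BC", "FD", "FDFND", "NC"),
  Metrics = c("DM", "DTI", "LB"),
  Representations = c("CHG", "QR", "HQR")
)
df <- data.frame(gg, Values = rnorm(nrow(gg) * 50))

p <- ggplot(df, aes(x = Representations, y = Values, fill = Methods)) +
  # Filled outlier symbol; its fill is inherited from the box (Methods)
  geom_boxplot(outlier.shape = 21) +
  # Mean as a white diamond, dodged by the same 0.75 as the boxes
  stat_summary(fun = mean, geom = "point", shape = 23, fill = "white",
               position = position_dodge(width = 0.75)) +
  facet_wrap(~Metrics)
p
```

The dodge width of 0.75 matches the default box width, which is why the means line up with their boxes.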
