ggplot2 2.0.0 coloured boxplots and jitter with borders - r

I am trying to make a boxplot filled by a binary variable, with a facet grid. I also want to have jitter on top of the boxplots, but without getting them confused with the outliers. In order to fix this, I have added colour to the jitter, but by doing so, they meld in with the already coloured boxplots, as they are the same colour.
I really want to keep the colours the same, so is there a way to add borders to the jitter (or is there a different way to fix the outlier problem)?
Example code:
plot <- ggplot(mpg, aes(class, hwy))+
geom_boxplot(aes(fill = drv))+
geom_jitter(width = .3, aes(colour =drv))
# facet_grid(. ~some_binary_variable, scales="free")

You can use a filled plotting symbol (21:25, cf. ?pch) and then use a white border to differentiate the points:
ggplot(mpg, aes(class, hwy))+
geom_boxplot(aes(fill = drv))+
geom_jitter(width = .3, aes(fill = drv), shape = 21, color = "white")
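On the outlier half of the question, a possible alternative is to hide the boxplot's own outlier symbols with outlier.shape = NA, so that each observation is drawn only once, by the point layer. The sketch below assumes a ggplot2 version that provides position_jitterdodge(); the jitter and dodge widths are illustrative values, not taken from the question.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy, fill = drv)) +
  # outlier.shape = NA suppresses the boxplot's outlier symbols,
  # so outliers are not plotted a second time on top of the jitter
  geom_boxplot(outlier.shape = NA) +
  # shape 21 has a fill and a border; the white border separates the
  # points from the boxes. position_jitterdodge() jitters the points
  # within each dodged box instead of across the whole class.
  geom_point(shape = 21, colour = "white",
             position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.75))
p
```

The points should then sit roughly over the box of their own drv group, which geom_jitter() alone does not do.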

Related

R ggplot2: How to draw geom_points that have a solid color and a transparent stroke and are colored depending on color?

I would like to make a scatter plot where every point gets a sphere. Both the dot and its sphere are colored according to some column values.
A minimal example that shows what I want:
library(ggplot2)
library(vcd) # only needed for example dataset
ggplot(Arthritis, aes(x = ID, y = Age)) +
geom_point(aes(color=Sex), size=10, alpha=.3) +
geom_point(aes(color=Treatment), size=3)
The problem with this "solution" is that using two geom_point layers seems to mess up the legend. I guess it would also make much more sense to only have one geom_point layer and use a shape that also adds a stroke, so something like this:
ggplot(Arthritis, aes(x = ID, y = Age)) +
geom_point(aes(color=Sex, fill=Treatment), shape=21, size=5, stroke=5)
Here the legend makes way more sense, but I cannot figure out how to make the stroke transparent. This matters because nothing is visible anymore once points overlap.
Answers like this do not solve my problem, because they use a constant color and can therefore call alpha() directly. I cannot figure out whether and how to use it with colors that depend on the data.
TL;DR: How can I draw geom_points that have a solid color and a transparent stroke but not constant colors?
You are on the right track with alpha(), and you have already noticed that you cannot simply put alpha() inside aes(). You can, however, pass the result of alpha() as the values= argument of a manual scale such as scale_color_manual(). Here's an example using mtcars:
ggplot(mtcars, aes(mpg, disp)) +
geom_point(
aes(color=factor(cyl), fill=factor(carb)),
shape=21, size=4, stroke=4) +
scale_color_manual(values=alpha(rainbow(3), 0.2))
One problem with that: the big black outlines around the keys in the "factor(carb)" legend don't sit well with me. Super ew. You can get rid of them with guides(), using override.aes= to specify which aesthetics to replace in the legend and what to replace them with. In this case, setting color=NA overrides the inherited outline color with a transparent one (leaving only the fill= part).
ggplot(mtcars, aes(mpg, disp)) +
geom_point(
aes(color=factor(cyl), fill=factor(carb)),
shape=21, size=4, stroke=4) +
scale_color_manual(values=alpha(rainbow(3), 0.2)) +
guides(fill=guide_legend(override.aes = list(color=NA))) +
labs(color="cyl", fill="carb")
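To see why this works, it may help to look at what alpha() returns on its own. The minimal sketch below uses two arbitrary colours and maps color to am and fill to gear purely for illustration; none of these choices come from the answer above.

```r
library(ggplot2)  # ggplot2 re-exports alpha() from the scales package

# alpha() is vectorised over colours: it returns one semi-transparent
# colour string (#RRGGBBAA) per input colour
stroke_cols <- alpha(c("red", "blue"), 0.2)
print(stroke_cols)

# Any colour vector built this way can be handed to a manual scale,
# so the transparency applies to a data-driven aesthetic
p <- ggplot(mtcars, aes(mpg, disp)) +
  geom_point(aes(color = factor(am), fill = factor(gear)),
             shape = 21, size = 4, stroke = 4) +
  scale_color_manual(values = stroke_cols)
p
```

Because the transparency lives in the colour values themselves, the fill scale is left untouched and stays opaque.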
BTW, there's no simple way to place the stroke "behind" the fill part for geom_point. You can probably write your own custom stat/geom for doing that, but geom_point is always drawn with fill first, then stroke.
A simple way round this is to make it so that the larger transparent circles aren't points at all, but filled circles. That way you can use the fill aesthetic to label them. This uses geom_circle from ggforce:
library(ggplot2)
library(vcd)
library(ggforce)
ggplot(Arthritis) +
geom_circle(aes(x0 = ID, y0 = Age, r = 2, fill = Sex), alpha = .3, colour = NA) +
geom_point(aes(x = ID, y = Age, color = Treatment), size = 3) +
coord_equal() +
scale_color_discrete(h = c(350, 190))
Created on 2020-07-01 by the reprex package (v0.3.0)
Or make a second color scale!
library(ggplot2)
library(vcd)
#> Loading required package: grid
library(ggnewscale)
ggplot(Arthritis, aes(x = ID, y = Age)) +
## removing stroke so it does not have this awkward border around it
geom_point(aes(color=Sex), size=10, alpha=.3, stroke = 0) +
new_scale_color()+
geom_point(aes(color=Treatment), size=3)
Created on 2022-06-15 by the reprex package (v2.0.1)

How can I use different color or linetype aesthetics in same plot with ggplot?

I'm creating a plot with ggplot that uses colored points, vertical lines, and horizontal lines to display the data. Ideally, I'd like to use two different color or linetype scales for the geom_vline and geom_hline layers, but ggplot discourages/disallows multiple variables mapped to the same aesthetic.
# Create example data
library(tidyverse)
library(lubridate)
set.seed(1234)
# tibble() replaces the deprecated data_frame()
example.df <- tibble(dt = seq(ymd("2016-01-01"), ymd("2016-12-31"), by="1 day"),
                     value = rnorm(366),
                     grp = sample(LETTERS[1:3], 366, replace=TRUE))
date.lines <- tibble(dt = ymd(c("2016-04-01", "2016-10-31")),
                     dt.label = c("April Fools'", "Halloween"))
value.lines <- tibble(value = c(-1, 1),
                      value.label = c("Threshold 1", "Threshold 2"))
If I set linetype aesthetics for both geom_*line layers, they get put in the linetype legend together, which doesn't necessarily make logical sense:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, linetype=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
Alternatively, I could set one of the lines to use a colour aesthetic, but that again puts the legend lines in an illogical legend grouping:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
The only partial solution I've found is to use a fill aesthetic instead of colour in geom_point and to set shape=21 to get a fillable shape, but that forces a black border around the points. I can get rid of the border by manually setting color="white", but then the white border covers up points. If I set colour=NA, no points are plotted.
ggplot(example.df, aes(x=dt, y=value, fill=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(shape=21, size=2, colour="white") +
scale_x_date() +
theme_minimal()
This might be a case where ggplot's "you can't have two variables mapped to the same aesthetic" rule can/should be broken, but I can't figure out a clean way around it. Using fill with geom_point shows the most promise, but there's no way to remove the point borders.
Any ideas for plotting two different color or linetype aesthetics here?
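One possible way around the single-colour-scale limit is the ggnewscale package, which another answer on this page also uses. The sketch below is only that: it rebuilds the example data with base R, and the grey line colours and the "Threshold" legend title are illustrative assumptions rather than part of the question.

```r
library(ggplot2)
library(ggnewscale)

set.seed(1234)
example.df <- data.frame(
  dt    = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "1 day"),
  value = rnorm(366),
  grp   = sample(LETTERS[1:3], 366, replace = TRUE)
)
date.lines <- data.frame(
  dt       = as.Date(c("2016-04-01", "2016-10-31")),
  dt.label = c("April Fools'", "Halloween")
)
value.lines <- data.frame(
  value       = c(-1, 1),
  value.label = c("Threshold 1", "Threshold 2")
)

p <- ggplot(example.df, aes(x = dt, y = value)) +
  # first colour scale: the horizontal threshold lines
  geom_hline(data = value.lines,
             aes(yintercept = value, colour = value.label)) +
  scale_colour_manual(name = "Threshold", values = c("grey20", "grey60")) +
  # layers added after this call get their own, separate colour scale
  new_scale_color() +
  geom_vline(data = date.lines,
             aes(xintercept = dt, linetype = dt.label)) +
  geom_point(aes(colour = grp), size = 1) +
  theme_minimal()
p
```

This keeps colour for the points, so no fillable shape (and no point border) is needed, and the two sets of lines end up in separate legends.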

mix discrete and continuous values to get a fill guide in ggplot2

I want to add a legend for filled rectangles in the background but I already used fill aesthetics for filling the bars of my bar plot.
How can I get the legend or create a matching legend by hand?
df <- data.frame(a=factor(c('a','b','c','c','d','e'), levels=c('a','b','c','d','e')),
x=seq(1,6),
b=factor(c('A','A','A','B','B','B'), levels=c('A','B')),
c=c(1,2,3,4,5,6),
d=rnorm(6))
ggplot(df, aes(x, c, fill=d, group=b)) +
geom_rect(aes(xmin=0.5,xmax=3.5,ymin=-Inf,ymax=Inf),alpha=0.05,fill="#E41A1C") +
geom_rect(aes(xmin=3.5,xmax=6.5,ymin=-Inf,ymax=Inf),alpha=0.05,fill="#377EB8") +
geom_bar(stat='identity', position=position_dodge()) +
coord_flip() +
scale_x_continuous(breaks=df$x, labels=df$a)
So I need a legend describing my two geom_rect areas. I was not able to map my two areas in any way to get a legend. In general the column df$b is describing the areas I do now by hand.
You can map colour= to the variable b inside the aes() of both geom_rect() calls. This draws lines around the rectangles and also creates a legend. The lines can be removed by setting size=0 in geom_rect(). Then, using guides() and override.aes=, you can change fill= for the legend keys.
ggplot(df, aes(x, c, fill=d, group=b)) +
geom_rect(aes(xmin=0.5,xmax=3.5,ymin=-Inf,ymax=Inf,colour=b),alpha=0.05,fill="#E41A1C",size=0) +
geom_rect(aes(xmin=3.5,xmax=6.5,ymin=-Inf,ymax=Inf,colour=b),alpha=0.05,fill="#377EB8",size=0) +
geom_bar(stat='identity', position=position_dodge()) +
coord_flip() +
scale_x_continuous(breaks=df$x, labels=df$a)+
guides(colour=guide_legend(override.aes=list(fill=c("#E41A1C","#377EB8"),alpha=0.3)))
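A side note on the rectangles: because each geom_rect() layer inherits df, it draws one rectangle per row, so six nearly transparent rectangles are stacked on top of each other. A variation on the same colour-legend trick, sketched below, gives the areas their own two-row data frame so each is drawn once. The names areas and area_fills are hypothetical, and linetype = "blank" is used here instead of size=0 to hide the outline.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(
  a = factor(c("a", "b", "c", "c", "d", "e"), levels = c("a", "b", "c", "d", "e")),
  x = 1:6,
  b = factor(c("A", "A", "A", "B", "B", "B"), levels = c("A", "B")),
  c = 1:6,
  d = rnorm(6)
)

# one row per background area (hypothetical helper objects)
areas <- data.frame(b = factor(c("A", "B")),
                    xmin = c(0.5, 3.5),
                    xmax = c(3.5, 6.5))
area_fills <- c("#E41A1C", "#377EB8")

p <- ggplot(df, aes(x, c)) +
  # colour is mapped only to obtain a legend; the outline itself is blank
  geom_rect(data = areas,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf, colour = b),
            fill = area_fills, alpha = 0.3, linetype = "blank",
            inherit.aes = FALSE) +
  geom_col(aes(fill = d)) +
  coord_flip() +
  scale_x_continuous(breaks = df$x, labels = as.character(df$a)) +
  # paint the legend keys with the same fills as the areas
  guides(colour = guide_legend(override.aes = list(fill = area_fills,
                                                   alpha = 0.3)))
p
```

With a single rectangle per area, the alpha value in the plot and in the legend keys can be the same.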

How to fill boxes in geom_point legend with color of points, not just increasing their size?

I'm having a similar problem to the one described here under "2- After having the two legends...", but instead of increasing the point size (which eventually also enlarges the legend itself), I would like to fill each box in the legend with the corresponding color, as in a bar plot's legend. Data & code examples here.
Looking through several other questions here, the ggplot documentation, etc., I tried variations of code snippets I found, but couldn't figure out a solution. The legend always retained the point symbols.
Therefore: If possible, how to tweak or replace the legend of a point/scatter/bubble plot so that it looks like the legend of a bar plot? Or, more generally, how to replace the legend of a given geom in ggplot2 with that of a different one? Thank you for any hints!
Edit: Example with mtcars data
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(colour = factor(cyl), size = qsec))
p
Adding what I gathered from other SO-answers...
p <- p + guides(colour = guide_legend(override.aes = list(fill = unique(mtcars$cyl))))
p
...keeps the points instead of expanding the color to fill the legend box, no matter which arguments and data sources I try for guides() and list().
On the other hand:
ggplot(mtcars, aes(wt, mpg)) + geom_bar(aes(fill = factor(cyl)), stat="identity")
...draws nicely color-filled boxes to the legend. That's what I'm trying to do for a bubble plot.
You won't be able to get a fill-type legend per se, but you can easily emulate it:
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(colour = factor(cyl), size = qsec)) +
guides(col = guide_legend(override.aes = list(shape = 15, size = 10)))

How to change style settings in stacked barchart overlaid with density line (ggplot2)

I am trying to change the style settings of this kind of chart and hope you can help me.
R code:
theme_set(theme_bw())
cglac$pred2 <- as.factor(cglac$pred)
# each line must end with +; a line that starts with + is parsed as a new expression
ggplot(cglac, aes(x=depth, colour=pred2)) +
  geom_bar(aes(y=..density..), binwidth=3, alpha=.5, position="stack") +
  geom_density(alpha=.2) +
  xlab("Depth (m)") +
  ylab("Counts & Density") +
  coord_flip() +
  scale_x_reverse() +
  theme_bw()
which produces this graph:
Here are some points:
What I want is for the density lines to be black and distinguished by line type (dashed, dotted, etc.) rather than by colour.
The other thing is the histogram itself. How do I get rid of the grey background in the bars?
Can I also change the bars to black-and-white patterns (shading etc.) so that they match the density lines?
Last but not least, I want to add a second x axis (or in this case y axis, because of coord_flip()). The one I see right now is for the density. The other one I need would be for the count data from the pred2 variable.
Thanks for helping.
Best,
Moritz
Different line types: inside aes(), put linetype = pred2. To make the lines black, add the argument color = "black" inside geom_density().
The "background" of the bars is called "fill". Inside geom_bar, you can set fill = NA for no fill. A more common approach is to fill in the bars with the colors, inside aes() specify fill = pred2. You might consider faceting by your variable, + facet_wrap(~ pred2, nrow = 1) might look very nice.
Shaded bars in ggplot? No, you can't do that easily. See the answers to this question for other options and hacks.
Second y-axis: as with the shaded bars, the ggplot creator thinks a second y-axis is a terrible design choice, so you can't do it at all easily. Here's a related question, including Hadley's point of view:
I believe plots with separate y scales (not y-scales that are transformations of each other) are fundamentally flawed.
It's definitely worth considering his point of view, and asking yourself if those design choices are really what you want.
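The quote exempts axes that are transformations of each other, and for a single, unstacked histogram the count is such a transformation: count = density × n × binwidth. A hedged sketch of that special case follows. It assumes a ggplot2 version with sec_axis() and after_stat() (the newer spelling of ..density..), and the binwidth of 25 is an arbitrary choice. It does not carry over directly to stacked groups, where each group has its own n.

```r
library(ggplot2)

bw <- 25            # bin width (illustrative value)
n  <- nrow(mtcars)  # number of observations in the single group

p <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = bw, fill = NA, colour = "black") +
  geom_density() +
  # secondary axis: rescale density back to counts
  scale_y_continuous(name = "Density",
                     sec.axis = sec_axis(~ . * n * bw, name = "Count")) +
  coord_flip() +
  theme_bw()
p
```

Both axes describe the same bars, so the second axis adds a reading aid rather than a second, independent scale.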
Different linetypes for densities
Here's my built-in data version of what you're trying to do:
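# cyl is numeric in mtcars; linetype needs a discrete variable, hence factor()
ggplot(mtcars, aes(x = hp,
                   linetype = factor(cyl),
                   group = factor(cyl),
                   color = factor(cyl))) +
  geom_histogram(aes(y=..density.., fill = factor(cyl)),
                 alpha=.5, position="stack") +
  geom_density(color = "black") +
  coord_flip() +
  theme_bw()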
And what I think you should do instead. This version uses facets instead of stacking/colors/linetypes. You seem to be aiming for black and white, which isn't a problem at all in this version.
ggplot(mtcars, aes(x = hp,
group = cyl)) +
geom_histogram(aes(y=..density..),
alpha=.5) +
geom_density() +
facet_wrap(~ cyl, nrow = 1) +
coord_flip() +
theme_bw()
