Overlaying jittered points on boxplot conditioned by a factor using ggplot2 - r

I am making a boxplot conditioned by a factor similar to this example:
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot(aes(fill = factor(am)))
There are few points in the data set, and I'd like to express this visually by overlaying the data points. I want to overlay the points colored by the same factor "am" which I try to do like this:
p + geom_boxplot(aes(fill = factor(am))) + geom_jitter(aes(colour = factor(am)))
The points are colored by the factor "am" but not spaced to lay only over the box plots they are associated with. Rather they mix and cover both.
Does anyone know how to condition geom_jitter so that the points stay with the box of the "am" level they belong to?

Welcome to SO! Here's my attempt. It's a bit clumsy, but does the job. The trick is to map x to a dummy variable with manually constructed offset. I'm adding a fill scale to highlight point positioning.
mtcars$cylpt <- as.numeric(factor(mtcars$cyl)) + ifelse(mtcars$am == 0, -0.2, 0.2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot(aes(fill = factor(am))) +
geom_point(aes(x = cylpt, colour = factor(am)), position = "jitter") +
scale_fill_manual(values = c("white", "gray"))

I have found this link that solves your problem:
https://datavizpyr.com/how-to-make-grouped-boxplot-with-jittered-data-points-in-ggplot2/
geom_jitter(position = position_jitterdodge())
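For completeness, here is how that one-liner might slot into the mtcars example from the question. This is a minimal sketch, not code from the linked post: the jitter.width and dodge.width values are illustrative assumptions, and outlier.shape = NA is added only so that outliers are not drawn twice.

```r
library(ggplot2)

# Boxplots of mpg by cyl, dodged by am via the fill aesthetic
p <- ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(am))) +
  geom_boxplot(outlier.shape = NA) +  # hide boxplot outliers; the raw points show them
  geom_point(
    aes(colour = factor(am)),
    # dodge.width = 0.75 lines the points up with geom_boxplot's default dodge
    position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0.75)
  )
p
```

Because the points inherit the same fill grouping as the boxes, each point is jittered only within the horizontal slot of its own box.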

Related

Visualizing two or more data points where they overlap (ggplot R)

I have a scatterplot with colour-coded data points. When two or more of the data points overlap, only one of the colours is shown (whichever is first in the legend). Each of these data points represents an item, and I need to show which items fall at each point on the scale. I'm using R (v.3.3.1). Would anyone have any suggestions for how I could show that there are multiple items at each point on the scatterplot?
Thanks in advance.
pdf('pedplot.pdf', height = 6, width = 10)
p3 <- ggplot(data=e4, aes(x=e4$domain, y=e4$ped)) +
geom_point(aes(color = e4$Database_acronym), size = 3, shape = 17) +
labs(x = "Domains", y = "Proportion of Elements per Domain", color = "Data Sources") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
p3
dev.off()
You could jitter the points, meaning add a bit of noise to remove the overlap (probably the most commonly used option). Another option would be to use different marker shapes (plus a small size adjustment) chosen so that the markers remain visible when plotted on top of each other. This will work if you have only two or three different marker types. A third option is to vary the size for each color, once again only for cases with maybe two or three colors/sizes, though the size difference might be confusing. If you can have multiple points of the same color with the same coordinates, then only jitter (among the three options above) will make that apparent. In any case, here are examples of each approach:
library(ggplot2)
dat = data.frame(x=1:5, y=rep(1:5,3), group=rep(LETTERS[1:3],each=5))
theme_set(theme_bw())
# Jitter
set.seed(3)
ggplot(dat, aes(x,y, colour=group)) +
geom_point(size=3, position=position_jitter(h=0.15,w=0.15))
# Vary the marker size
ggplot(dat, aes(x,y, colour=group,size=group)) +
geom_point() +
scale_color_manual(values=c("red","blue","orange")) +
scale_size_manual(values=c(5,3,1))
# Vary the marker shape (plus a small size adjustment)
ggplot(dat, aes(x,y, colour=group, size=group, shape=group)) +
geom_point(stroke=1.5) +
scale_colour_manual(values=(c("black", "green", "orange"))) +
scale_shape_manual(values=c(19,17,4)) +
scale_size_manual(values=c(4,3,3))
Separately from or in addition to jittering as mentioned here, you could also consider making the points partially transparent:
linecolors <- c("#714C02", "#01587A", "#024E37")
fillcolors <- c("#9D6C06", "#077DAA", "#026D4E")
# partially transparent points by setting `alpha = 0.5`
ggplot(mpg, aes(displ, cty, colour = drv, fill = drv)) +
geom_point(position=position_jitter(h=0.1, w=0.1),
shape = 21, alpha = 0.5, size = 3) +
scale_color_manual(values=linecolors) +
scale_fill_manual(values=fillcolors) +
theme_bw()
What about using different shapes and fills?
ggplot(mpg, aes(displ, cty, fill = drv, shape = drv)) +
geom_point(position=position_jitter(h=0.1, w=0.1), alpha = 0.5, size = 3) +
scale_fill_manual(values=c("red","blue","orange")) +
scale_shape_manual(values= c(23, 24, 25)) +
theme_bw()
Another option could be by counting the overlapping points using geom_count with scale_size_area to scale the sizes of the points. Here is some reproducible code:
library(ggplot2)
ggplot(mpg, aes(x = displ, y = cty)) +
geom_count() +
scale_size_area()
Also, an example when using a color aesthetic to see the difference of counts of groups:
ggplot(mpg, aes(x = displ, y = cty, colour = drv)) +
geom_count() +
scale_size_area()
Created on 2023-01-31 with reprex v2.0.2
You could change the number of breaks in scale_size_area to show different sizes. Please check the link above for more examples.
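As a rough illustration of changing the breaks, the sketch below passes explicit breaks and a maximum point size to scale_size_area. The break values and max_size are illustrative assumptions rather than values from the answer; breaks that fall outside the range of observed counts are simply not shown in the legend.

```r
library(ggplot2)

# Count overlapping points, then pick the legend breaks and cap the largest point
p <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_count(alpha = 0.6) +
  scale_size_area(breaks = c(1, 3, 5, 7), max_size = 8)
p
```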
Try geom_point(aes(color = e4$Database_acronym), position = "jitter", size = 3, shape = 17).
This adds a little bit of random variation to your scatter plot and thereby prevents overplotting.
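If the default amount of noise is too much or too little, position_jitter() lets you set it explicitly. The sketch below is only an assumption-laden illustration: e4_demo is a hypothetical stand-in for the asker's e4 data, and the width and seed values are arbitrary. Setting height = 0 keeps the y values exact, which matters when y is a measured proportion.

```r
library(ggplot2)

# Hypothetical stand-in for e4: three sources share identical coordinates
e4_demo <- data.frame(
  domain = rep(c("A", "B", "C"), each = 3),
  ped = rep(c(0.2, 0.5, 0.8), each = 3),
  Database_acronym = rep(c("DB1", "DB2", "DB3"), times = 3)
)

p <- ggplot(e4_demo, aes(x = domain, y = ped, colour = Database_acronym)) +
  geom_point(
    size = 3, shape = 17,
    # jitter horizontally only; a fixed seed makes the plot reproducible
    position = position_jitter(width = 0.2, height = 0, seed = 42)
  )
p
```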

How can I use different color or linetype aesthetics in same plot with ggplot?

I'm creating a plot with ggplot that uses colored points, vertical lines, and horizontal lines to display the data. Ideally, I'd like to use two different color or linetype scales for the geom_vline and geom_hline layers, but ggplot discourages/disallows multiple variables mapped to the same aesthetic.
# Create example data
library(tidyverse)
library(lubridate)
set.seed(1234)
# tibble() replaces the deprecated data_frame()
example.df <- tibble(dt = seq(ymd("2016-01-01"), ymd("2016-12-31"), by="1 day"),
value = rnorm(366),
grp = sample(LETTERS[1:3], 366, replace=TRUE))
date.lines <- tibble(dt = ymd(c("2016-04-01", "2016-10-31")),
dt.label = c("April Fools'", "Halloween"))
value.lines <- tibble(value = c(-1, 1),
value.label = c("Threshold 1", "Threshold 2"))
If I set linetype aesthetics for both geom_*line layers, they get put in the linetype legend together, which doesn't necessarily make logical sense:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, linetype=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
Alternatively, I could set one of the lines to use a colour aesthetic, but then that again puts the legend lines in an illogical legend grouping:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
The only partial solution I've found is to use a fill aesthetic instead of colour in geom_point and set shape=21 to use a fillable shape, but that forces a black border around the points. I can get rid of the border by manually setting colour="white", but then the white border covers up points. If I set colour=NA, no points are plotted.
ggplot(example.df, aes(x=dt, y=value, fill=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(shape=21, size=2, colour="white") +
scale_x_date() +
theme_minimal()
This might be a case where ggplot's "you can't have two variables mapped to the same aesthetic" rule can/should be broken, but I can't figure out a clean way around it. Using fill with geom_point shows the most promise, but there's no way to remove the point borders.
Any ideas for plotting two different color or linetype aesthetics here?

geom_point plot with only number without circles

In ggplot in R, is it possible to plot each point as a unique number, without a circle drawn around it? I tried setting the color to "white" but it doesn't work.
I would recommend geom_text.
set.seed(101)
dd <- data.frame(x=rnorm(50),y=rnorm(50),id=1:50)
library(ggplot2)
ggplot(dd,aes(x,y))+geom_text(aes(label=id))
I'll show how to do it with geom_text and/or geom_point.
Using geom_text (recommended)
For this example I'll use the built-in dataset mtcars and let's pretend the numbers you want to display are the weights (wt) variable:
data(mtcars)
p <- ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars)))
p + geom_text(aes(label = wt),
parse = TRUE)
or if you want an example with truly unique numbers, we can just make up an index using seq:
data(mtcars)
p <- ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars)))
p + geom_text(aes(label = seq_len(nrow(mtcars))))
Using geom_point
While it would require more work, it actually is possible to do this with geom_point.
This is a reference image of some of the shapes you can use with geom_point:
As you can see, shapes 48 to 57 are 0 to 9. You can leverage these shapes (and combinations of them to form an infinite amount of numbers) via geom_point like this:
d=data.frame(p=c(48:57))
ggplot() +
scale_y_continuous(name="") +
scale_x_continuous(name="") +
scale_shape_identity() +
geom_point(data=d, mapping=aes(x=p%%16, y=p%/%16, shape=p), size=5, fill="red")
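The reason 48 to 57 work is that R draws shape values in the range 32 to 127 as the corresponding ASCII characters, and 48 to 57 are the ASCII codes of the digits. A quick check in plain R (no plotting involved) makes the mapping visible:

```r
# Convert the shape codes 48..57 to the characters they stand for
digit_codes <- 48:57
digit_chars <- intToUtf8(digit_codes, multiple = TRUE)
print(digit_chars)

# Going the other way: find the shape code for a given single digit
code_for_7 <- utf8ToInt("7")
print(code_for_7)
```

Each point can only carry one such character, so multi-digit labels are much easier with geom_text.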
Finally, a trivial example using mtcars + geom_point with arbitrary numbers:
# 32 shape codes (one per mtcars row), stored next to the plotted columns so attach() is not needed
d <- data.frame(wt=mtcars$wt, mpg=mtcars$mpg, p=c(48:57,48:57,48:57,48,49))
ggplot(d) +
scale_y_continuous(name="") +
scale_x_continuous(name="") +
scale_shape_identity() +
geom_point(mapping=aes(x=wt, y=mpg, shape=p), size=5)

How to fill boxes in geom_point legend with color of points, not just increasing their size?

I'm having a similar problem to the one described here under "2- After having the two legends...", but instead of increasing the point size (which eventually also enlarges the legend itself), I would like to fill each box in the legend with the corresponding color, as in a bar plot's legend. Data & code examples here.
Looking through several other questions here, the ggplot documentation, etc., I tried variations of the code snippets I found, but couldn't figure out a solution. The legend always retained the point symbols.
Therefore: If possible, how to tweak or replace the legend of a point/scatter/bubble plot so that it looks like the legend of a bar plot? Or, more generally, how to replace the legend of a given geom in ggplot2 with that of a different one? Thank you for any hints!
Edit: Example with mtcars data
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(colour = factor(cyl), size = qsec))
p
Adding what I gathered from other SO-answers...
p <- p + guides(colour = guide_legend(override.aes = list(fill = unique(mtcars$cyl))))
p
...keeps the points instead of expanding the color to fill the legend box, no matter which arguments and data sources I try for guides() and list().
On the other hand:
ggplot(mtcars, aes(wt, mpg)) + geom_bar(aes(fill = factor(cyl)), stat="identity")
...draws nicely color-filled boxes to the legend. That's what I'm trying to do for a bubble plot.
You won't be able to get a fill-type legend per se, but you can easily emulate it:
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(colour = factor(cyl), size = qsec)) +
guides(col = guide_legend(override.aes = list(shape = 15, size = 10)))

Having all layers in the legend with ggplot

How could I make a legend representing all the curves that are plotted in my graph? Presently, an automatic legend is generated for the first layer (based on the "colour" aesthetic), but the other layer (the black curve representing the density of the "price" variable across all observations) is not contained in this legend.
I suspect that my question comes from an incomplete understanding of the concepts behind the ggplot package.
ggplot(diamonds) +
geom_density(aes(x = price, y = ..density.., colour = cut)) +
geom_density(aes(x = price,y = ..density..))
The principle in ggplot2 is that each aesthetic gets mapped to a scale. So, if you want to include a layer in the colour scale, you need to map that layer to colour.
Like this:
ggplot(diamonds, aes(x=price)) +
geom_density(aes(colour = cut)) +
geom_density(aes(colour="Overall"), size=1.5)
Note: You can take additional control over the colours by specifying a manual colour scale:
ggplot(diamonds, aes(x=price)) +
geom_density(aes(colour = cut)) +
geom_density(aes(colour="Overall"), size=1.5) +
scale_colour_manual(
limits=c("Overall", levels(diamonds$cut)),
values=c("black", 2:6)
)
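A variation on the same idea is to name the colour vector, so that each legend entry is tied to its colour by name rather than by position. This is only a sketch under assumptions not in the answer above: the "Dark 3" palette from hcl.colors() is an arbitrary choice, and linewidth assumes ggplot2 3.4.0 or later (older versions use size for line thickness, as in the code above).

```r
library(ggplot2)

# One colour per legend entry: black for the overall curve, a palette for the cuts
cols <- c("black", hcl.colors(nlevels(diamonds$cut), palette = "Dark 3"))
names(cols) <- c("Overall", levels(diamonds$cut))

p <- ggplot(diamonds, aes(x = price)) +
  geom_density(aes(colour = cut)) +
  # mapping colour to a constant string puts this layer on the colour scale
  geom_density(aes(colour = "Overall"), linewidth = 1.5) +
  scale_colour_manual(limits = names(cols), values = cols)
p
```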

Resources