geom_dotplot() loses dodge after applying colour aesthetics - r

I want to organize my data by one category on the X-axis, but color it by another category as in this example:
Graph 1, without coloring:
require(ggplot2)
nocolor <- ggplot(mtcars, aes(x=as.factor(cyl), y=disp)) +
geom_dotplot(binaxis="y", stackdir = "center")
print(nocolor)
Graph 2, with coloring:
nododge <- ggplot(mtcars, aes(x=as.factor(cyl), y=disp, fill=as.factor(gear))) +
geom_dotplot(binaxis="y", stackdir = "center")
print(nododge)
One problem that occurs after introducing coloring is that the dots belonging to different groups won't dodge one another anymore. This causes problems with my real data, because dots that happen to have the same value completely obscure one another.
Then I tried this, but it garbled my data:
Graph 3:
garbled <- ggplot(mtcars, aes(x=as.factor(cyl), y=disp)) +
geom_dotplot(binaxis="y", stackdir = "center", fill=as.factor(mtcars$gear))
print(garbled)
The dots dodge one another, but the coloring is just random and is not true to the actual data.
I expected the answer to this question to solve my problem, but the coloring remained random:
Graph 4:
graphdata <- mtcars
graphdata$colorname <- as.factor(graphdata$gear)
levels(graphdata$colorname) <- c("red", "blue", "black")
jalapic <- ggplot(graphdata, aes(x=as.factor(cyl), y=disp)) +
geom_dotplot(binaxis="y", stackdir = "center", fill=as.character(graphdata$colorname))
print(jalapic)
Does anyone have an idea how to get the dots in Graph #2 to dodge one another, or how to fix the coloring in graphs 3 or 4? I would really appreciate any help, thanks.

Using binpositions = "all" and stackgroups = TRUE:
ggplot(mtcars, aes(x=as.factor(cyl), y=disp, fill=as.factor(gear))) +
geom_dotplot(binaxis="y", stackdir = "center", binpositions="all", stackgroups=TRUE)
gives:
A possible alternative is using stackdir = "up":
ggplot(mtcars, aes(x=as.factor(cyl), y=disp, fill=as.factor(gear))) +
geom_dotplot(binaxis="y", stackdir = "up", binpositions="all", stackgroups=TRUE)
which gives:
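Another option worth trying is to dodge the groups side by side instead of stacking them together. The sketch below follows the pattern used in the geom_dotplot() documentation examples for dot plots binned along y. The dodge width of 0.8 is an arbitrary value chosen for illustration and would likely need tuning for real data.

```r
library(ggplot2)

# Dodge the gear groups side by side within each cyl value.
# The width of 0.8 is an illustrative choice, not a recommended default.
p_dodge <- ggplot(mtcars, aes(x = as.factor(cyl), y = disp, fill = as.factor(gear))) +
  geom_dotplot(binaxis = "y", stackdir = "center",
               position = position_dodge(width = 0.8))
print(p_dodge)
```

Each gear group then gets its own column of dots within a cyl value, so dots with equal disp in different groups no longer sit on top of one another.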

Here's another option that might work better than a dotplot, depending on your needs. We plot the individual points, but we separate them so that each point is visible.
In my original answer, I used position_jitterdodge, but the randomness of that method resulted in overlapping points and little control over point placement. Below is an updated approach that directly controls point placement to prevent overlap.
In the example below, we have cyl as the x variable, disp as the y variable, and gear as the colour aesthetic.
Within each cyl, we want points to be dodged by gear.
Within each gear we want points with similar values of disp to be separated horizontally so that they don't overlap.
We do this by adding appropriate increments to the value of cyl in order to shift the horizontal placement of the points. We control this with two parameters: dodge separates groups of points by gear, while sep controls the separation of points within each gear that have similar values of disp. We determine "similar values of disp" by creating a grouping variable called dispGrp, which is just disp rounded to the nearest ten (although this can, of course, be adjusted, depending on the scale of the data, size of the plotted points, and physical size of the graph).
To determine the x-value of each point, we start with the value of cyl, add dodging by gear, and finally spread the points within each gear and dispGrp combination by amounts that depend on the number of points within each grouping.
All of these data transformations are done within a dplyr chain, and the resulting data frame is then fed to ggplot. The sequence of data transformations and plotting could be generalized into a function, but the code below addresses only the specific case in the question.
library(dplyr)
library(ggplot2)
dodge = 0.3 # Controls the amount of dodging
sep = 0.05 # Within each dodge group, controls the amount of point separation
mtcars %>%
# Round disp to nearest 10 to identify groups of points that need to be separated
mutate(dispGrp = round(disp, -1)) %>%
group_by(gear, cyl, dispGrp) %>%
arrange(disp) %>%
# Within each cyl, dodge by gear, then, within each gear, separate points
# within each dispGrp
mutate(cylDodge = cyl + dodge*(gear - mean(unique(mtcars$gear))) +
sep*seq(-(n()-1), n()-1, length.out=n())) %>%
ggplot(aes(x=cylDodge, y=disp, fill=as.factor(gear))) +
geom_point(pch=21, size=2) +
theme_bw() +
scale_x_continuous(breaks=sort(unique(mtcars$cyl)))
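As mentioned above, these steps could be wrapped in a function. Here is a minimal sketch of one way to do that. The function name dodge_points, its arguments, the helper columns .grp_mid and .y_bin, and the output column x_dodge are all hypothetical names introduced for this example. It assumes the x and grouping columns are numeric, as cyl and gear are in mtcars.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: returns `data` with an added numeric column `x_dodge`.
# `digits` is passed to round(); -1 means "round y to the nearest 10".
dodge_points <- function(data, x, y, group, dodge = 0.3, sep = 0.05, digits = -1) {
  data %>%
    mutate(.grp_mid = mean(unique({{ group }})),   # centre of the group values
           .y_bin = round({{ y }}, digits)) %>%    # bins of "similar" y values
    group_by({{ group }}, {{ x }}, .y_bin) %>%
    arrange({{ y }}, .by_group = TRUE) %>%
    # Dodge by group, then spread the points that share a y bin
    mutate(x_dodge = {{ x }} + dodge * ({{ group }} - .grp_mid) +
             sep * seq(-(n() - 1), n() - 1, length.out = n())) %>%
    ungroup() %>%
    select(-.grp_mid, -.y_bin)
}

dodged <- dodge_points(mtcars, cyl, disp, gear)

p_dodged <- ggplot(dodged, aes(x = x_dodge, y = disp, fill = as.factor(gear))) +
  geom_point(pch = 21, size = 2) +
  theme_bw() +
  scale_x_continuous(breaks = sort(unique(mtcars$cyl)))
print(p_dodged)
```

Keeping the data step separate from the plotting step makes it possible to inspect the computed x positions before drawing anything.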
Here's my original answer, using position_jitterdodge to dodge by color and then jitter within each color group to separate overlapping points:
set.seed(3521)
ggplot(mtcars, aes(x=factor(cyl), y=disp, fill=as.factor(gear))) +
geom_point(pch=21, size=1.5, position=position_jitterdodge(jitter.width=1.2, dodge.width=1)) +
theme_bw()


Issue: Plot alpha values scaling with Y-Axis/number of observations in ggplot facets

I haven't found anyone else with this issue. Here is my plot:
[Image: facet plot]
Why are there different alpha values for each facet?
As you can see, the alpha value of the geom_rect() elements seems to scale with the y-axis or number of observations, maybe because I have set these to "free_y" in the facet_wrap() argument. How can I prevent this from happening?
Here is my code:
plot_data %>%
ggplot(aes(Date, n)) +
geom_rect(data= plot_data, inherit.aes = FALSE,
aes(xmin=current_date - lubridate::weeks(1), xmax=current_date, ymin=-Inf, ymax=+Inf),
fill='pink', alpha=0.2) +
geom_col() +
facet_wrap(~Type, scales = "free_y") +
xlab("Date") +
ylab("Count") +
theme_bw() +
scale_y_continuous(breaks = integer_breaks()) +
scale_alpha_manual(values = 0.2) +
theme(axis.text.x=element_text(angle=90, hjust=1))
Cheers!
TL;DR - It seems this is probably due to overplotting. You have 5 rect geoms drawn in the facet, but probably more than 5 observations in your dataset. The fix is to summarize your data and have geom_rect() draw from the summarized dataset.
Since OP did not provide an example dataset, we can only guess at the reason, but likely what's happening here is due to overplotting. geom_rect() behaves like all other geoms, which is to say that ggplot2 will draw or add to any geom layer with every observation (row) in the original dataset. If the geoms are drawn across facets and overlap in position, then you'll get overplotting. You can notice that this is happening based on:
Different alpha appearing on each facet, even though it should be constant based on the code, and
The fact that in order to get the rectangles to look like "light red", OP had to use pink color and an alpha value of 0.2... which shouldn't look like that if there was only one rect drawn.
Representative Example of the Issue
Here's an example that showcases the problem and how you can fix it using mtcars:
library(ggplot2)
df <- mtcars
p <- ggplot(df, aes(disp, mpg)) + geom_point() +
facet_wrap(~cyl) +
theme_bw()
p + geom_rect(
aes(xmin=200, xmax=300, ymin=-Inf, ymax=Inf),
alpha=0.01, fill='red')
Like OP's case, we expect all rectangles to be the same alpha value, but they are not. Also, note the alpha value is ridiculously low (0.01) for the color you see there. What's going on should be more obvious if we check number of observations in mtcars that falls within each facet:
> library(dplyr)
> mtcars %>% group_by(cyl) %>% tally()
# A tibble: 3 x 2
cyl n
<dbl> <int>
1 4 11
2 6 7
3 8 14
The facet for cyl==6 has the fewest observations (7), followed by cyl==4 (11) and cyl==8 (14). This corresponds precisely to the alpha values we see for the geoms in the plot, so this is what's going on. For each observation, a rectangle is drawn over the same position, so there are 11 rectangles drawn in the left facet, 7 in the middle facet, and 14 in the right facet.
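One way to check this directly, rather than inferring it from the shading, is to inspect the data ggplot2 builds for the rectangle layer. The sketch below uses layer_data() from ggplot2 and counts the rows per panel. The object names p_rect and rects_per_panel are just names chosen for this example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(disp, mpg)) + geom_point() +
  facet_wrap(~cyl) +
  theme_bw()

p_rect <- p + geom_rect(
  aes(xmin = 200, xmax = 300, ymin = -Inf, ymax = Inf),
  alpha = 0.01, fill = "red")

# Layer 2 is the geom_rect() layer: one row per rectangle that gets drawn
rects_per_panel <- table(layer_data(p_rect, 2)$PANEL)
print(rects_per_panel)
```

The counts per panel should match the tally of observations per cyl shown above.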
Fixing the Issue: Summarize the Data
To fix the issue, you should summarize your data and use the summarized dataset for plotting the rectangles.
summary_df <- df %>%
group_by(cyl) %>%
summarize(mean_d = mean(disp))
p + geom_rect(
data = summary_df,
aes(x=1, y=1, xmin=mean_d-50, xmax=mean_d+50, ymin=-Inf, ymax=Inf),
alpha=0.2, fill='red')
Since summary_df has only 3 observations (one for each group of cyl), the rectangles are drawn correctly and now alpha=0.2 with fill="red" gives the expected result. One thing to note here is that we still have to define x and y in the aes(). I set them both to 1 because, although geom_rect() doesn't use them, ggplot2 still expects to find them in summary_df, since we mapped them globally in ggplot(df, aes(x=..., y=...)). To avoid this, either move the aes() declaration into geom_point() or assign both to constant values in geom_rect(), as done here.
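Two variations on this fix may be handy, sketched here under the assumption that the setup matches the mtcars example. If the rectangle has the same fixed position in every facet (as with a single date window), annotate() draws it from constants instead of from the data, so there is one rectangle per panel. If the position differs per facet, inherit.aes = FALSE in geom_rect() avoids having to supply dummy x and y values. The object names below are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(disp, mpg)) + geom_point() +
  facet_wrap(~cyl) +
  theme_bw()

# Variation 1: fixed rectangle, drawn once per panel via annotate()
p_annot <- p + annotate("rect", xmin = 200, xmax = 300, ymin = -Inf, ymax = Inf,
                        alpha = 0.2, fill = "red")

# Variation 2: per-facet rectangle from a summary table,
# ignoring the global x/y mapping with inherit.aes = FALSE
summary_df <- aggregate(disp ~ cyl, data = mtcars, FUN = mean)
p_band <- p + geom_rect(
  data = summary_df, inherit.aes = FALSE,
  aes(xmin = disp - 50, xmax = disp + 50, ymin = -Inf, ymax = Inf),
  alpha = 0.2, fill = "red")

annot_per_panel <- table(layer_data(p_annot, 2)$PANEL)
band_per_panel <- table(layer_data(p_band, 2)$PANEL)
```

In both cases each panel should end up with exactly one rectangle, so the alpha value behaves as written.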

How to specify different background colors for each facet label in ggplot2?

A plot will be made from these data:
mtcars %>%
gather(-mpg, key = "var", value = "value") %>%
ggplot(aes(x = value, y = mpg)) +
geom_point() +
facet_wrap(~ var, scales = "free") +
theme_bw()
How can I change the gray color of the titles of the panels
for instance
panels of am and hp green
panels of gear drat disp red
panels of vs wt blue
panels cyl qsec carb black
add a legend
green = area
red= bat
blue= vege
black = indus
Unfortunately, it seems the way to answer OP's question is still going to be quite hacky.
If you're not into gtable hacks like those referenced... here's another exceptionally hacky way to do this. Enjoy the strange ride.
TL;DR - The idea here is to use a rect geom outside of the plot area to draw each facet label box color
Here's the basic plot below. OP wanted to (1) change the gray color behind the facet labels (called the "strip" labels) to a specific color depending on the facet label, then (2) add a legend.
First of all, I stored the gathered dataframe as df and the basic plot as plot, so the plot code looks like this now:
df <- mtcars %>% gather(-mpg, key = "var", value = "value")
plot <- ggplot(df, aes(x = value, y = mpg)) +
geom_point() +
facet_wrap(~ var, scales = "free") +
theme_bw()
plot
How to recolor each facet label?
As referenced in the other answers, it's pretty simple to change all the facet label colors at once (and facet label text) via the theme() elements strip.background and strip.text:
plot + theme(
strip.background = element_rect(fill="blue"),
strip.text=element_text(color="white"))
Of course, we can't do that for each facet label individually, because strip.background and element_rect() cannot be sent a vector or have mapping applied to the aesthetics.
The idea here is that we use something that can have aesthetics mapped to data (and therefore change according to the data) - use a geom. In this case, I'm going to use geom_rect() to draw a rectangle in each facet, then color that rect based upon the criteria OP states in their question. Moreover, using geom_rect() in this way also creates a legend automatically for us, since we are going to use mapping and aes() to specify the color. All we need to do is allow ggplot2 to draw layers outside the plot area, use a bit of manual fine-tuning to get the placement correct, and it works!
The Hack
First, a separate dataset is created containing a column called var that contains all facet names. Then var_color specifies the names OP gave for each facet. We specify color using a scale_fill_manual() function. Finally, it's important to use coord_cartesian() carefully here. We need this function for two reasons:
Cut the panel area in the plot to only contain the points. If we did not specify the y limit, the panel would automatically resize to accommodate the rect geom.
Turn clipping off. This allows layers drawn outside the panel to be seen.
We then need to turn strip.background to transparent (so we can see the color of the box), and we're good to go. Hopefully you can follow along below.
I'm presenting all the code below for extra clarity:
library(ggplot2)
library(tidyr)
library(dplyr)
# dataset for plotting
df <- mtcars %>% gather(-mpg, key = "var", value = "value")
# dataset for facet label colors
hacky_df <- data.frame(
var = c("am", "carb", "cyl", "disp", "drat", "gear", "hp", "qsec", "vs", "wt"),
var_color = c("area", "indus", "indus", "bat", "bat", "bat", "area", "indus", "vege", "vege")
)
# plot code
plot_new <-
ggplot(df) + # don't specify x and y here. Otherwise geom_rect will complain.
geom_rect(
data=hacky_df,
aes(xmin=-Inf, xmax=Inf,
ymin=36, ymax=42, # totally defined by trial-and-error
fill=var_color),
alpha=0.4) + # constant alpha goes outside aes() so it adds no legend entry
geom_point(aes(x = value, y = mpg)) +
coord_cartesian(clip="off", ylim=c(10, 35)) +
facet_wrap(~ var, scales = "free") +
scale_fill_manual(values = c("area" = "green", "bat" = "red", "vege" = "blue", "indus" = "black")) +
theme_bw() +
theme(
strip.background = element_rect(fill=NA),
strip.text = element_text(face="bold")
)
plot_new

Create vertical layers in ggplot

For each level of the y-axis I want to separate the lines vertically by a small distance so they aren't overlapping. Can someone help me achieve this please? Also, I don't want it to be random by a method such as jittering. The placement needs to be constant across all levels.
data(mtcars)
str(mtcars)
mtcars$cyl = as.factor(mtcars$cyl)
mtcars$carb = as.factor(mtcars$carb)
ggplot(mtcars) + aes(mpg,cyl,color = carb) + geom_line() +
geom_point()
You can make use of position_dodge, though because that only has an option to set width, I believe that you will have to construct it with the opposite axes, then use coord_flip to get it back the way you wanted it:
ggplot(mtcars
, aes(cyl, mpg
,color = carb) ) +
geom_line(position = position_dodge(0.3)) +
geom_point(position = position_dodge(0.3)) +
coord_flip()
Gives:
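If the offsets need to be identical at every level regardless of which carb groups are present there, another possibility is to compute the positions by hand. This is only a sketch: the total spread of 0.3 and the names d, offsets, y_pos and p_offset are choices made for this example. Each carb level gets a fixed vertical offset, which is added to the numeric position of cyl.

```r
library(ggplot2)

d <- mtcars
d$cyl <- as.factor(d$cyl)
d$carb <- as.factor(d$carb)

width <- 0.3  # total vertical spread per cyl level (illustrative value)

# One fixed offset per carb level, evenly spaced and centred on zero
offsets <- seq(-width / 2, width / 2, length.out = nlevels(d$carb))
d$y_pos <- as.numeric(d$cyl) + offsets[as.integer(d$carb)]

# y is now continuous, so the grouping for geom_line() is set explicitly
p_offset <- ggplot(d, aes(mpg, y_pos, colour = carb,
                          group = interaction(cyl, carb))) +
  geom_line() +
  geom_point() +
  scale_y_continuous(breaks = seq_len(nlevels(d$cyl)),
                     labels = levels(d$cyl), name = "cyl")
print(p_offset)
```

Because the offset depends only on carb, a given carb value sits at the same relative height within every cyl level, and no coord_flip() is needed.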

How to use free scales but keep a fixed reference point in ggplot?

I am trying to create a plot with facets. Each facet should have its own scale, but for ease of visualization I would like each facet to show a fixed y point. Is this possible with ggplot?
This is an example using the mtcars dataset. I plot the weight (wt) as a function of the number of miles per gallon (mpg). The facets represent the number of cylinders of each car. As you can see, I would like the y scales to vary across facets, but still have a reference point (3, in the example) at the same height across facets. Any suggestions?
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
[EDIT: in my actual data, the fixed reference point should be at y = 0. I used y = 3 in the example above because 0 didn't make sense for the range of the data points in the example]
It's unclear where the line should sit within each panel, so let's assume the middle. You could compute the limits outside ggplot and add a dummy layer to set the scales:
library(ggplot2)
library(plyr)
# data frame where 3 is the middle
# 3 = (min + max) /2
dummy <- ddply(mtcars, "cyl", summarise,
min = 6 - max(wt),
max = 6 - min(wt))
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_blank(data=dummy, aes(y=min, x=Inf)) +
geom_blank(data=dummy, aes(y=max, x=Inf)) +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
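The same idea can be written for an arbitrary reference value (for instance 0, as in the edit to the question). The sketch below is one possible formulation in base R. The names ref, lims and p_ref are hypothetical. For each facet it finds the largest distance between the data and the reference, then pads the y range symmetrically around the reference so it lands in the middle of every panel.

```r
library(ggplot2)

ref <- 3  # reference value to keep at the same height in every facet

# For each cyl, build symmetric limits ref - half and ref + half,
# where half is the largest distance of any wt value from ref
lims <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(d) {
  half <- max(abs(d$wt - ref))
  data.frame(cyl = d$cyl[1], y = c(ref - half, ref + half))
}))

p_ref <- ggplot(mtcars, aes(mpg, wt)) + geom_point() +
  # invisible layer that only stretches the y scale of each facet
  geom_blank(data = lims, aes(y = y, x = Inf)) +
  geom_hline(yintercept = ref, colour = "red", lty = 6, lwd = 1) +
  facet_wrap(~ cyl, scales = "free_y")
print(p_ref)
```

Unlike mirroring both the minimum and the maximum, this adds only as much padding as is needed on the shorter side of the reference.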

How would you plot a box plot and specific points on the same plot?

We can draw box plot as below:
qplot(factor(cyl), mpg, data = mtcars, geom = "boxplot")
and point as:
qplot(factor(cyl), mpg, data = mtcars, geom = "point")
How would you combine both - but just to show a few specific points(say when wt is less than 2) on top of the box?
If you are trying to plot two geoms with two different datasets (boxplot for mtcars, points for a data.frame of literal values), this is a way to do it that makes your intent clear. This works with the current (Sep 2016) version of ggplot (ggplot2_2.1.0)
library(ggplot2)
ggplot() +
# box plot of mtcars (mpg vs cyl)
geom_boxplot(data = mtcars,
aes(x = factor(cyl), y= mpg)) +
# points of data.frame literal
geom_point(data = data.frame(x = factor(c(4,6,8)), y = c(15,20,25)),
aes(x=x, y=y),
color = 'red')
I threw in a color = 'red' for the set of points, so it's easy to distinguish them from the points generated as part of geom_boxplot
Use + geom_point(...) on your qplot (just add a + geom_point() to get all the points plotted).
To plot selectively just select those points that you want to plot:
n <- nrow(mtcars)
# plot every second point
idx <- seq(1,n,by=2)
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( data=mtcars[idx, ] ) # <-- only the rows in idx; x and y are inherited
If you know the points before-hand, you can feed them in directly e.g.:
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( data=data.frame(x=factor(c(4,6,8)), y=c(15,20,25)), aes(x=x, y=y) ) # plot (4,15),(6,20),...
You can show both by using ggplot() rather than qplot(). The syntax may be a little harder to understand, but you can usually get much more done. If you want to plot both the box plot and the points you can write:
boxpt <- ggplot(data = mtcars, aes(factor(cyl), mpg))
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(aes(factor(cyl), mpg))
I don't know what you mean by only plotting specific points on top of the box, but if you want a cheap (and probably not very smart) way of just showing points above the edge of the box, here it is:
library(plyr) # for ddply() and .()
top_quarter <- ddply(mtcars, .(cyl), summarise, mpg = mpg[mpg > quantile(mpg, 0.75)])
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(data = top_quarter, aes(factor(cyl), mpg))
Basically it's the same thing except for the data supplied to geom_point is adjusted to include only the mpg numbers in the top quarter of the distribution by cylinder. In general I'm not sure this is good practice because I think people expect to see points beyond the whiskers only, but there you go.
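For the specific condition mentioned in the question (points where wt is less than 2), one straightforward approach is to give geom_point() a filtered copy of the data. This is a minimal sketch. The names light_cars and p_light, and the red colour, are choices made for this example.

```r
library(ggplot2)

# Keep only the cars lighter than 2 (wt is in units of 1000 lbs)
light_cars <- subset(mtcars, wt < 2)

p_light <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  # the point layer inherits x and y but uses the filtered rows only
  geom_point(data = light_cars, colour = "red")
print(p_light)
```

The box plots are still computed from all of mtcars, while only the filtered rows are drawn as points.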
