Annotate facet plot with grouped variables - r

I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...

You need to give position_dodge() inside geom_textto match the position of the boxes, also define data argument to get the distinct value of observations:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
facet_grid(.~cyl) +
geom_text(data = dplyr::distinct(exmp, N),
aes(y = 6, label = N), position = position_dodge(0.9))

One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(am = as.factor(am),
gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
group = interaction(gear, am))) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(data = exmp %>% distinct(cyl, gear, am, N),
aes(y = 6, label = N),
position = position_dodge(width = 0.8))
Here's the same chart with overplotting:

Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
facet_grid(.~cyl)

Related

How to connect means per group in ggplot?

I can do a scatterplot of two continuous variables like this:
mtcars %>%
ggplot(aes(x=mpg, y = disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
I use cut to create 5 groups of mpg intervals for cars (any better command would do as well). I like to see the intervals in the graph, thus they are easy to understand.
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point()
mpg_groups is a factor variable and can no longer be connected via geom_smooth().
# not working
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
What can I do with easy (tidyverse) code in order to create the mean values per group and connect them via line?
As a more or less general rule, when drawing a line via ggplot2 you have to explicitly set the group aesthetic in all cases where the variable mapped on x isn't a numeric, i.e. use e.g. group=1 to assign all observations to one group which I call 1 for simplicity:
library(ggplot2)
library(dplyr, warn=FALSE)
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x = mpg_groups, y = mean_disp, group = 1)) +
geom_point() +
geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95)

Difference between geom_col() and geom_point() for same value

So, I'm trying to plot missing values here over time (longitudinal data).
I would prefer placing them in a geom_col() to fill up with colours of certain treatments afterwards. But for some weird reason, geom_col() gives me weird values, while geom_point() gives me the correct values using the same function. I'm trying to wrap my head around why this is happening. Take a look at the y-axis.
Disclaimer:
I know the missing values dissappear on day 19-20. This is why I'm making the plot.
Sorry about the lay-out of the plot. Not polished yet.
For the geom_point:
gaussian_transformed %>% group_by(factor(time)) %>% mutate(missing = sum(is.na(Rose_width))) %>% ggplot(aes(x = factor(time), y = missing)) + geom_point()
Picture: geom_point
For the geom_col:
gaussian_transformed %>% group_by(factor(time)) %>% mutate(missing = sum(is.na(Rose_width))) %>% ggplot(aes(x = factor(time), y = missing)) + geom_col()
Picture: geom_col
The problem is that you're using mutate and creating several rows for your groups. You cannot see that, but you will have plenty of points overlapping in your geom_point plot.
One way is to either use summarise, or you use distinct
Compare
library(tidyverse)
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_point()
The points look ugly because there is a lot of over plotting.
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
distinct(order, .keep_all = TRUE) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
Created on 2021-06-02 by the reprex package (v2.0.0)
So after some digging:
What happens was that the geom_col() function sums up all the missing values while geom_point() does not. Hence the large values for y. Why this is happening, I do not know. However doing the following worked fine for me:
gaussian_transformed$time <- as.factor(gaussian_transformed$time)
gaussian_transformed %>% group_by(time) %>% summarise(missing = sum(is.na(Rose_width))) -> gaussian_transformed
gaussian_transformed %>% ggplot(aes(x = time, y = missing)) + geom_col(fill = "blue", alpha = 0.5) + theme_minimal() + labs(title = "Missing values in Gaussian Outcome over the days", x = "Time (in days)", y = "Amount of missing values") + scale_y_continuous(breaks = seq(0, 10, 1))
With the plot: GaussianMissing

Add 2 additional lines to a ggplot

I have successfully plotted my dat below. However, I want to do the following data transformation: dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm) and then ADD 2 lines to my current plot:
geom_smooth() using x.cm as x
geom_smooth() using x.cwc as x
Is there a way to do this?
p.s: Is it also possible to display the 3 unique(x.cm) values as 3 stars on the plot? (see pic below)
library(tidyverse)
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/cw.csv')
dat %>% group_by(groups) %>% ggplot() +
aes(x, y, color = groups, shape = groups)+
geom_point(size = 2) + theme_classic()+
stat_ellipse()
# Now do the transformation:
dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm)
I'm not sure the second part of your question makes any sense to me. But from the description one way is to simply add a level where you alter you data and aes argument, as I do in the example below (using mtcars as an example data)
# Load libraries and data
library(tidyverse)
library(ggplot2)
data(mtcars)
mtcars %>% ggplot(aes(x = hp, y = mpg)) +
geom_point(aes(col = factor(cyl))) +
stat_ellipse(aes(col = factor(cyl))) +
# Add line for ellipsis center
geom_line(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
mean_y = mean(mpg),
.groups = 'drop'),
mapping = aes(x = mean_x, y = mean_y)) +
geom_point(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
mean_y = mean(mpg),
.groups = 'drop'),
mapping = aes(x = mean_x, y = mean_y)) +
# Add smooth for.. what? I don't understand this part of the question.
geom_smooth(data = mtcars %>% group_by(cyl) %>% mutate(x_val = hp - mean(hp)) %>% ungroup(),
mapping = aes(x = x_val, y = mpg))
Now it should be quite clear which part does not make sense to me. Why/what do you mean with the second path (geom_smooth)? Moving the x-axis on the smoother makes no sense to me. Also I took the liberty of changing the definition of the first part, by instead adding the single points of the mean (center of circles) to the plot and connecting the using geom_line.

Using expand_limits in ggplot when counts are not a column in the dataset (R)

I have a dataset and I need to plot a bar chart of the counts of the different outcomes of a certain column. For this example I am using the mtcars dataset.
When I first attempted this, I found that the labels on the bars were getting cut off at the top, so I used the expand_limits argument to give them more space. As I want to be able to use this code for refreshed data, the limits might change, which is why I've used the max() function.
mtcars_cyl_counts <- as.data.frame(table(mtcars$cyl))
colnames(mtcars_cyl_counts)[1:2] <- c("cyl", "counts")
mtcars_cyl_counts %>%
arrange(desc(counts)) %>%
ggplot(aes(x = reorder(cyl, -counts), y = counts)) +
geom_bar(stat = "identity") +
geom_text(aes(label = comma(counts), vjust = -0.5), size = 3) +
expand_limits(y = max((mtcars_cyl_counts$counts) * 1.05))
This works fine, but it seemed unnecessarily cumbersome to create a separate table of counts, and made some of the future code more complicated, so I redid this:
mtcars %>%
group_by(cyl) %>%
summarize(counts = n()) %>%
arrange(-counts) %>%
mutate(cyl = factor(cyl, cyl)) %>%
ggplot() +
geom_bar(aes(x = cyl, y = counts), stat = "identity") +
geom_text(aes(x = cyl, y = counts, label = comma(counts), vjust = -0.5), size = 3) +
expand_limits(y = max((counts) * 1.05))
However, this returns the following error:
Error in data.frame(..., stringsAsFactors = FALSE) :
object 'counts' not found
I get that 'counts' is not technically in the mtcars dataset (which is why it also doesn't work if I use mtcars$counts), but it's what I've used elsewhere in the code for y definitions.
So, is there a way to write this so that it works, or an alternative way to expand the vertical limits in a way that will adapt to different datasets?
(NB: with these examples, the bar labels don't get cut off because they aren't very big, but for the purpose of this I just need the limits expanded so that detail is not critically important to the working...)
If this helps Megan,
mtcars %>%
count(cyl) %>%
arrange(-n) %>%
mutate(cyl = factor(cyl, cyl)) %>%
ggplot(aes(cyl, n)) +
geom_text(vjust = -0.5, aes(label = n)) +
geom_bar(stat = "identity") +
expand_limits(y = max(table(mtcars$cyl) * 1.05))

How do you dodge a horizontal linerange in ggplot?

How do you dodge a ggstance::geom_linerangeh in ggplot2?
library(tidyverse)
library(ggstance)
mtcars %>%
group_by(cyl, am) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ggplot() +
geom_linerangeh(aes(y = am %>%
factor,
xmin = lo,
xmax = hi,
group = am %>%
factor),
position = position_dodgev(height = .25)) +
facet_wrap(~cyl, ncol = 1)
results in :
whereas I would like to see the lines sitting slightly below the horizontals, consistent with the standard behaviour of position_dodge elsewhere.
To get dodging, you need to map colour or linetype to another variable that splits am into sub-categories based on that third variable; otherwise there's only one category for each level of am and therefore nothing to dodge.
For example, let's use vs as that other variable and we'll map it to color. We also add rows (using complete) for missing combinations of am,vs, and cyl to ensure that dodging occurs even for combinations of cyl and am where only one level of vs is present in the data.
library(tidyr)
mtcars %>%
group_by(vs=factor(vs), cyl=factor(cyl), am=factor(am)) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ungroup() %>%
complete(am, cyl, nesting(vs)) %>%
ggplot() +
geom_linerangeh(aes(y = am, colour=vs, xmin = lo, xmax = hi),
position = position_dodgev(height = 0.5)) +
facet_wrap(~cyl, ncol = 1) +
theme_bw()

Resources