How to order facets by variable in ggplot2? - r

Suppose I have a graph like this:
library(tidyverse)
df <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(hp_mean = mean(hp))
ggplot(df, aes(x = gear, y = hp_mean)) +
  geom_point(size = 2.12, colour = "black") +
  theme_bw() +
  facet_wrap(vars(cyl))
and would like to arrange the order of facets, according to the hp_mean value for gear=3. E.g. the facet with cyl=8 should be first as hp_mean for gear=3 is 194 which is the highest.
Any ideas?
All help is much appreciated!

Might not be the tidiest answer out there but you could:
extract the value of hp when gear == 3 to create a variable to order by (hp_gear3)
use forcats::fct_reorder() to reorder cyl by the mean of this value (the NAs from the other gears are dropped via na.rm = TRUE)
use .desc = TRUE to put in descending order
plot using stat_summary to do the mean calculation for you
library(tidyverse)
mtcars %>%
  mutate(hp_gear3 = ifelse(gear == 3, hp, NA),
         # reorder cyl by mean hp among the gear == 3 cars, highest first
         cyl = fct_reorder(factor(cyl),
                           hp_gear3,
                           mean,
                           na.rm = TRUE,
                           .desc = TRUE)) %>%
  ggplot(aes(gear, hp)) +
  stat_summary(fun = mean, geom = "point") +
  facet_wrap(~cyl)
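For comparison, here is a minimal sketch of a different route (not the approach in the answer above): work out the facet order from the summarised data and hand it to factor() as the levels. The name facet_order is just illustrative, and the sketch assumes every cyl value has at least one gear == 3 row, which holds for mtcars.

```r
library(dplyr)
library(ggplot2)

# Mean hp per cyl/gear combination, as in the question
df <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(hp_mean = mean(hp), .groups = "drop")

# cyl values sorted by hp_mean at gear == 3, highest first
facet_order <- df %>%
  filter(gear == 3) %>%
  arrange(desc(hp_mean)) %>%
  pull(cyl)

# facet_wrap() lays out panels in factor level order
df <- df %>% mutate(cyl = factor(cyl, levels = facet_order))

p <- ggplot(df, aes(x = gear, y = hp_mean)) +
  geom_point(size = 2) +
  theme_bw() +
  facet_wrap(vars(cyl))
p
```

A cyl value without a gear == 3 row would end up as NA here, so it would need to be appended to facet_order by hand.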

Related

How to connect means per group in ggplot?

I can do a scatterplot of two continuous variables like this:
mtcars %>%
ggplot(aes(x=mpg, y = disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
I use cut to create 5 groups of mpg intervals for cars (any better command would do as well). I like to see the intervals in the graph, thus they are easy to understand.
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point()
mpg_groups is a factor variable and can no longer be connected via geom_smooth().
# not working
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
What can I do with easy (tidyverse) code in order to create the mean values per group and connect them via line?
As a general rule, when drawing a line with ggplot2 you have to set the group aesthetic explicitly whenever the variable mapped to x isn't numeric. For example, use group = 1 to assign all observations to a single group (the value 1 is arbitrary):
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
  mutate(mpg_groups = cut(mpg, 5)) %>%
  group_by(mpg_groups) %>%
  mutate(mean_disp = mean(disp)) %>%
  ggplot(aes(x = mpg_groups, y = mean_disp, group = 1)) +
  geom_point() +
  geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95)
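If the goal is simply to join the group means with straight segments, a sketch along these lines might be enough. It swaps geom_smooth() for geom_line() and uses summarise() so there is one row per interval; the name means is illustrative.

```r
library(dplyr)
library(ggplot2)

# One row per mpg interval with its mean disp
means <- mtcars %>%
  mutate(mpg_groups = cut(mpg, 5)) %>%
  group_by(mpg_groups) %>%
  summarise(mean_disp = mean(disp), .groups = "drop")

# group = 1 puts all intervals in one group so the line can connect them
p <- ggplot(means, aes(x = mpg_groups, y = mean_disp, group = 1)) +
  geom_point() +
  geom_line()
p
```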

Create bar graph-find the average mpg by the number of gears

mtcars %>%
group_by(gear, mpg) %>%
summarise(m = mean(mpg)) %>%
ggplot(aes(x = mpg, y = gear)) +
geom_bar(stat = "count")
I cannot figure out how to create a bar graph with the average mpg by the number of gears.
Is that what you need?
packages
library(dplyr)
library(ggplot2)
Average mpg (m) by the number of gears
mtcars %>%
  group_by(gear) %>%
  summarise(m = mean(mpg)) %>%
  ungroup() %>%
  ggplot(aes(y = m, x = gear)) +
  geom_bar(stat = "identity")
First, we get the mean of mpg by gear. To do that, group by gear only; you don't need to group by mpg as well.
Ungroup, so you have a unified dataset.
Now you want to plot the mean you created (m) by gear. You can choose which of them goes where. In this case, I put gear on the x-axis and the mean of mpg on the y-axis.
Since you already have the specific values for the mean, you don't have to count anything. Just plot the values you have. Thus, use stat = "identity" instead of stat = "count".
Now you can play with colors using the fill argument in aes and change the titles and axis labels.
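As a side note, geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), so the same chart can be sketched as below. Treating gear as a factor and the axis labels are my own choices, and mpg_by_gear is an illustrative name.

```r
library(dplyr)
library(ggplot2)

# Mean mpg for each number of gears
mpg_by_gear <- mtcars %>%
  group_by(gear) %>%
  summarise(m = mean(mpg))

# geom_col() plots the y values as given, no counting involved
p <- ggplot(mpg_by_gear, aes(x = factor(gear), y = m)) +
  geom_col() +
  labs(x = "gear", y = "mean mpg")
p
```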
In base R (i.e. without additional libraries) you might do
with(mtcars, tapply(mpg, gear, mean)) |>
  barplot(xlab = 'gear', ylab = 'mean mpg', col = 4, main = 'My plot')

Add 2 additional lines to a ggplot

I have successfully plotted my dat below. However, I want to do the following data transformation: dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm) and then ADD 2 lines to my current plot:
geom_smooth() using x.cm as x
geom_smooth() using x.cwc as x
Is there a way to do this?
p.s: Is it also possible to display the 3 unique(x.cm) values as 3 stars on the plot? (see pic below)
library(tidyverse)
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/cw.csv')
dat %>% group_by(groups) %>% ggplot() +
aes(x, y, color = groups, shape = groups)+
geom_point(size = 2) + theme_classic()+
stat_ellipse()
# Now do the transformation:
dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm)
I'm not sure the second part of your question makes sense to me. But from the description, one way is to simply add a layer where you alter your data and aes arguments, as I do in the example below (using mtcars as example data):
# Load libraries and data
library(tidyverse)
library(ggplot2)
data(mtcars)
mtcars %>% ggplot(aes(x = hp, y = mpg)) +
  geom_point(aes(col = factor(cyl))) +
  stat_ellipse(aes(col = factor(cyl))) +
  # Add line for ellipse center
  geom_line(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
                                                          mean_y = mean(mpg),
                                                          .groups = 'drop'),
            mapping = aes(x = mean_x, y = mean_y)) +
  geom_point(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
                                                           mean_y = mean(mpg),
                                                           .groups = 'drop'),
             mapping = aes(x = mean_x, y = mean_y)) +
  # Add smooth for.. what? I don't understand this part of the question.
  geom_smooth(data = mtcars %>% group_by(cyl) %>% mutate(x_val = hp - mean(hp)) %>% ungroup(),
              mapping = aes(x = x_val, y = mpg))
Now it should be quite clear which part does not make sense to me. What do you mean by the second part (geom_smooth)? Shifting the x-axis for the smoother makes no sense to me. Also, I took the liberty of changing the definition of the first part: instead, I added the group means (the centers of the ellipses) as single points and connected them using geom_line.
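On the p.s. about stars: a rough sketch, again on mtcars with cyl standing in for groups and hp for x, could compute the group mean (x.cm) and the centered value (x.cwc) and draw the group means with shape = 8, which is the asterisk-like marker. Where the stars should sit vertically isn't stated in the question, so placing them at the group mean of y is purely an assumption, and centered and centers are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Group-mean centering, with cyl standing in for `groups` and hp for `x`
centered <- mtcars %>%
  group_by(cyl) %>%
  mutate(x.cm = mean(hp), x.cwc = hp - x.cm) %>%
  ungroup()

# One row per group: the group mean of x, plus a y position for the marker
# (the group mean of mpg, which is an assumption)
centers <- centered %>%
  group_by(cyl) %>%
  summarise(x.cm = first(x.cm), y.cm = mean(mpg), .groups = "drop")

p <- ggplot(centered, aes(x = hp, y = mpg, colour = factor(cyl))) +
  geom_point() +
  # shape = 8 draws a star-like marker at each group mean
  geom_point(data = centers, aes(x = x.cm, y = y.cm), shape = 8, size = 4)
p
```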

Annotate facet plot with grouped variables

I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...
You need to pass position_dodge() to geom_text so the labels match the position of the boxes, and also supply a data argument containing only the distinct values of N:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
  geom_boxplot() +
  facet_grid(.~cyl) +
  geom_text(data = dplyr::distinct(exmp, N),
            aes(y = 6, label = N), position = position_dodge(0.9))
One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
  mutate(cartype = as.factor(row.names(mtcars))) %>%
  group_by(cyl, am, gear) %>%
  mutate(N = n()) %>%
  ungroup() %>%
  mutate(am = as.factor(am),
         gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
                 group = interaction(gear, am))) +
  facet_grid(.~cyl) +
  geom_boxplot() +
  geom_text(data = exmp %>% distinct(cyl, gear, am, N),
            aes(y = 6, label = N),
            position = position_dodge(width = 0.8))
Here's the same chart with overplotting:
Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
  mutate(cartype = as.factor(row.names(mtcars))) %>%
  group_by(cyl, am, gear) %>%
  mutate(N = n()) %>%
  ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
  geom_boxplot() +
  geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
  facet_grid(.~cyl)
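To get exactly one label per box, one option is to build the label data separately with dplyr::count(), which returns one row per cyl/am/gear combination. This is only a sketch: the names cars and labels are illustrative, and the dodge width of 0.75 is chosen to match what I understand to be geom_boxplot()'s default dodging, so it may need adjusting.

```r
library(dplyr)
library(ggplot2)

# am and gear as factors so they are treated as discrete groups
cars <- mtcars %>%
  mutate(am = factor(am), gear = factor(gear))

# One row per cyl/am/gear combination; N is the number of observations
labels <- cars %>% count(cyl, am, gear, name = "N")

p <- ggplot(cars, aes(x = am, y = wt, fill = gear)) +
  geom_boxplot() +
  # the inherited fill mapping defines the groups that get dodged
  geom_text(data = labels,
            aes(y = 6, label = N),
            position = position_dodge(width = 0.75)) +
  facet_grid(. ~ cyl)
p
```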

How do you dodge a horizontal linerange in ggplot?

How do you dodge a ggstance::geom_linerangeh in ggplot2?
library(tidyverse)
library(ggstance)
mtcars %>%
  group_by(cyl, am) %>%
  summarize(lo = min(mpg),
            hi = max(mpg)) %>%
  ggplot() +
  geom_linerangeh(aes(y = factor(am),
                      xmin = lo,
                      xmax = hi,
                      group = factor(am)),
                  position = position_dodgev(height = .25)) +
  facet_wrap(~cyl, ncol = 1)
results in:
whereas I would like to see the lines sitting slightly below the horizontals, consistent with the standard behaviour of position_dodge elsewhere.
To get dodging, you need to map colour or linetype to another variable that splits am into sub-categories based on that third variable; otherwise there's only one category for each level of am and therefore nothing to dodge.
For example, let's use vs as that other variable and map it to colour. We also add rows (using complete) for missing combinations of am, vs, and cyl to ensure that dodging occurs even for combinations of cyl and am where only one level of vs is present in the data.
library(tidyr)
mtcars %>%
  group_by(vs = factor(vs), cyl = factor(cyl), am = factor(am)) %>%
  summarize(lo = min(mpg),
            hi = max(mpg)) %>%
  ungroup() %>%
  complete(am, cyl, nesting(vs)) %>%
  ggplot() +
  geom_linerangeh(aes(y = am, colour = vs, xmin = lo, xmax = hi),
                  position = position_dodgev(height = 0.5)) +
  facet_wrap(~cyl, ncol = 1) +
  theme_bw()
