I want to split a percentage histogram (that integrates to 100%) into two facets using facet_grid. However, when splitting to facets, each facet by itself doesn't integrate to 100%. This kind of question has been resolved here in the past, but I cannot translate that solution to my current situation where x is a factor, and thus a histogram using stat(density) doesn't work.
My Data
Dataframe with two columns. equipment denotes whether a household has enough equipment for homeschooling, and children_n denotes number of children.
library(tidyverse)
library(magrittr)
df <-
structure(list(equipment = c(1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,
1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0,
0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1,
0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0,
1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1,
1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1,
0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0,
1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0,
0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0,
1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,
1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1,
1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1,
0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1,
1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1,
1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0,
0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1,
1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1,
1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0,
1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0,
1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0,
0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1,
0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0,
0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1,
1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1,
0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0,
0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1,
1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1,
1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1,
1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,
1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1,
1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1), children_n = c(4,
4, 2, 2, 2, 1, 1, 3, 2, 3, 3, 7, 3, 2, 1, 2, 1, 1, 3, 3, 3, 2,
3, 3, 3, 2, 4, 3, 1, 2, 3, 4, 4, 1, 2, 5, 2, 8, 1, 2, 1, 2, 2,
3, 4, 3, 3, 3, 3, 2, 3, 2, 2, 4, 3, 3, 3, 4, 3, 1, 1, 2, 1, 1,
2, 1, 3, 3, 2, 3, 3, 3, 4, 2, 2, 2, 3, 5, 2, 2, 2, 2, 1, 2, 4,
3, 4, 3, 3, 1, 2, 3, 3, 3, 2, 4, 4, 3, 1, 3, 2, 2, 2, 3, 1, 1,
1, 3, 1, 2, 2, 2, 3, 6, 3, 2, 2, 6, 3, 4, 3, 2, 3, 3, 2, 2, 2,
3, 2, 3, 3, 6, 3, 1, 4, 3, 4, 9, 1, 1, 3, 4, 2, 2, 1, 2, 3, 1,
3, 3, 6, 4, 1, 3, 2, 2, 3, 2, 3, 2, 4, 3, 1, 3, 3, 2, 3, 2, 2,
4, 2, 2, 3, 3, 3, 1, 3, 3, 2, 4, 2, 7, 3, 3, 3, 2, 2, 2, 4, 3,
1, 1, 3, 4, 1, 4, 3, 4, 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 3, 3,
3, 1, 1, 2, 2, 4, 2, 3, 3, 2, 2, 1, 2, 5, 2, 2, 2, 5, 3, 2, 2,
4, 2, 1, 3, 4, 4, 3, 3, 4, 3, 3, 1, 3, 2, 1, 8, 2, 3, 2, 3, 3,
2, 3, 3, 1, 3, 3, 4, 2, 3, 3, 3, 2, 6, 1, 2, 2, 2, 2, 2, 2, 4,
3, 5, 4, 1, 2, 2, 2, 4, 2, 3, 3, 1, 3, 2, 1, 2, 2, 3, 3, 3, 3,
1, 3, 4, 2, 1, 3, 4, 2, 1, 3, 4, 3, 4, 2, 3, 3, 2, 7, 1, 2, 1,
3, 2, 2, 2, 2, 3, 3, 3, 2, 3, 1, 2, 2, 3, 2, 4, 3, 2, 3, 3, 5,
3, 5, 3, 5, 1, 2, 1, 4, 1, 4, 2, 2, 3, 2, 2, 2, 3, 2, 3, 3, 3,
3, 4, 3, 8, 3, 1, 2, 3, 3, 2, 1, 3, 2, 2, 3, 3, 4, 4, 2, 2, 3,
1, 2, 3, 2, 3, 3, 2, 1, 3, 3, 2, 3, 3, 3, 4, 1, 2, 3, 3, 3, 4,
2, 1, 3, 4, 2, 3, 3, 2, 2, 2, 2, 2, 3, 3, 3, 1, 3, 3, 1, 1, 3,
2, 1, 3, 2, 4, 1, 3, 2, 3, 2, 2, 2, 4, 1, 2, 3, 2, 3, 2, 2, 1,
3, 1, 3, 1, 3, 3, 2, 1, 2, 3, 2, 3, 1, 2, 1, 2, 2, 3, 3, 4, 1,
2, 4, 2, 4, 2, 2, 2, 1, 3, 2, 1, 1, 4, 3, 4, 3, 2, 2, 2, 3, 7,
3, 1, 3, 3, 3, 2, 1, 3, 2, 3, 3, 2, 4, 1, 1, 1, 4, 3, 3, 4, 3,
8, 2, 4, 5, 3, 2, 3, 1, 2, 1, 2, 2, 3, 1, 4, 3, 2, 2, 3, 3, 3,
3, 1, 2, 1, 2, 3, 3, 2, 2, 2, 2, 3, 3, 4, 5, 3, 2, 2, 2, 3, 1,
3, 3, 4, 2, 1, 3, 3, 3, 4, 2, 1, 2, 1, 2, 2, 3, 3, 4, 1, 1, 6,
3, 2, 2, 2, 6, 3, 3, 2, 2, 1, 4, 2, 3, 3, 3, 2, 2, 3, 3, 2, 4,
6, 1, 1, 1, 1, 3, 9, 4, 2, 3, 2, 2, 2, 4, 3, 3, 4, 1, 2, 6, 3,
3, 3, 2, 2, 3, 4, 2, 3, 2, 2, 3, 2, 3, 4, 7, 2, 3, 3, 2, 3, 2,
3, 4, 3, 3, 3, 2, 2, 2, 1, 3, 4, 2, 1, 3, 4, 1, 3, 4, 4, 3, 3,
3, 3, 3, 2, 3, 3, 3, 5, 3, 3, 5, 2, 2, 1, 1, 2, 2, 2, 3, 1, 3,
2, 2, 2, 4, 2, 2, 2, 4, 1, 3, 4, 3, 3, 4, 3, 2, 1, 3, 4, 8, 1,
2, 3, 3, 3, 3, 2, 3, 3, 1, 3, 4, 2, 3, 2, 6, 3, 1, 2, 2, 2, 2,
2, 4, 3, 5, 1, 2, 2, 2, 4, 2, 3, 3, 1, 1, 2, 2, 3, 3, 2, 3, 3,
3, 3, 1, 4, 4, 2, 3, 3, 1, 4, 3, 4, 2, 3, 3, 2, 7, 1, 4, 1, 2,
2, 3, 2, 5, 2, 3, 2, 3, 1, 3, 2, 2, 3, 2, 4, 2, 3, 3, 3, 3, 1,
5, 5, 1, 1, 2, 3, 1, 4, 2, 2, 3, 2, 2, 2, 3, 3, 3, 3, 2, 3, 4,
8, 3, 2, 3, 1, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 4, 2, 3, 2, 1, 3,
2, 3, 3, 2, 3, 3, 2, 3, 2, 3, 3, 1, 1, 2, 4, 3, 4, 3, 1, 3, 4,
2, 3, 3, 2, 2, 2, 2, 2, 3, 3, 3, 1, 3, 3, 2, 1, 1, 4, 1, 3, 2,
1, 2, 3, 3, 2, 2, 2, 4, 2, 1, 3, 2, 3, 2, 1, 3, 1, 3, 1, 3, 3,
2, 1, 2, 3, 2, 3, 1, 2, 2, 2, 3, 3, 2, 3, 1, 3, 3, 3, 3, 2, 4,
2, 4, 4, 1, 2, 1, 2, 1, 3, 3, 3, 2, 3, 3, 4, 2, 2, 3, 2, 1, 2,
2, 1, 1, 3, 1, 2, 3, 3, 3, 2, 1, 1, 1, 2, 1, 2, 5, 1, 2, 1, 4,
2, 2, 2, 1, 4, 2, 3, 3, 3, 2, 4, 5, 4, 2, 4, 2, 3, 1, 4, 3, 3,
2, 3, 3, 2, 3, 2, 1, 3, 2, 4, 2, 3, 4, 1, 2, 3, 1, 3, 3, 4, 2,
2, 2, 3, 3, 2, 1, 2, 2, 1, 3, 1, 3, 1, 1, 1, 3, 2, 2, 4, 3, 4,
3, 3, 4, 1, 1, 3, 3, 2, 3, 2, 3, 2, 1, 3, 3, 1, 5, 1, 1, 2, 4,
2, 3, 5, 4, 1, 3, 2, 1, 2, 2, 4, 3, 4, 2, 2, 1, 3, 2, 4, 2, 3,
3, 2, 3, 2, 1, 2, 3, 4)), row.names = c(NA, -1059L), class = c("tbl_df",
"tbl", "data.frame"))
df
## # A tibble: 1,059 x 2
## equipment children_n
## <dbl> <dbl>
## 1 1 4
## 2 0 4
## 3 1 2
## 4 1 2
## 5 0 2
## 6 1 1
## 7 1 1
## 8 1 3
## 9 1 2
## 10 1 3
## # ... with 1,049 more rows
In cases where number of children is above 6, I want to collapse those cases to one category of "6+".
df %<>%
mutate_at(vars(children_n), as.character) %>%
mutate_at(vars(children_n), recode, "9" = "6_plus", "8" = "6_plus", "7" = "6_plus", "6" = "6_plus") %>%
mutate_at(vars(children_n), fct_relevel, "1", "2", "3", "4", "5", "6_plus")
glimpse(df)
## Rows: 1,059
## Columns: 2
## $ equipment <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, ...
## $ children_n <fct> 4, 4, 2, 2, 2, 1, 1, 3, 2, 3, 3, 6_plus, 3, 2, 1, 2, 1, 1, 3, 3, 3, 2, 3, 3, 3, 2, 4, 3, 1, 2, 3, 4, 4, 1, 2, 5, 2, 6_plus, 1, 2, 1, 2,...
Now I want to plot the proportion of number of children in two separate panels: one panel for families who have enough equipment, and another panel for families who don't have enough equipment:
df %>%
ggplot(data = ., aes(x = children_n, y = equipment)) +
geom_histogram(aes(y = (..count..)/sum(..count..)), stat = "count" , fill = "darkblue") +
geom_text(aes(label = scales::percent(((..count..)/sum(..count..)), accuracy = 1),
y = ((..count..)/sum(..count..)) ), stat= "count", vjust = -.5, color = "darkblue") +
scale_y_continuous(labels = scales::percent) +
facet_grid(~ equipment, labeller = as_labeller(c("1" = "have enough equipment",
"0" = "don't have enough equipment")))
This gives two panels that *DON'T* integrate to 100% independently:
Trying to solve the problem
I found this question that describes the same intention and problem. The chosen solution suggests defining the geom_histogram as density so it integrates to 100%. But this won't work in my case because stat(density) asks that the x variable will be continuous, unlike my case where x is a factor.
df %>%
ggplot(data = ., aes(x = children_n, y = equipment)) +
geom_histogram(aes(y = stat(density) * 6), binwidth = 6, fill = "darkblue") +
facet_grid(~ equipment, labeller = as_labeller(c("1" = "have enough equipment",
"0" = "don't have enough equipment")))
Error: StatBin requires a continuous x variable: the x variable is
discrete. Perhaps you want stat="count"?
Other approaches suggest using ..PANEL.. while others are strongly against it.
How can I get the two facets to show percents that independently integrate to 100%, in a proper way?
This could be achieved like so:
Map the facetting variable on the group aes
Use e.g. tapply to get the total number per group or facet
BTW: I have put the code for the normalization inside a helper function to reduce the code duplication and readability
library(tidyverse)
library(magrittr)
df %<>%
mutate_at(vars(children_n), as.character) %>%
mutate_at(vars(children_n), recode, "9" = "6_plus", "8" = "6_plus", "7" = "6_plus", "6" = "6_plus") %>%
mutate_at(vars(children_n), fct_relevel, "1", "2", "3", "4", "5", "6_plus")
help <- function(count, group) {
count / tapply(count, group, sum)[group]
}
df %>%
ggplot(data = ., aes(x = children_n, y = equipment, group = equipment)) +
geom_histogram(aes(y = help(..count.., ..group..)), stat = "count" , fill = "darkblue") +
geom_text(aes(label = scales::percent(help(..count.., ..group..), accuracy = 1),
y = help(..count.., ..group..) ), stat= "count", vjust = -.5, color = "darkblue") +
scale_y_continuous(labels = scales::percent) +
facet_grid(~ equipment, labeller = as_labeller(c("1" = "have enough equipment",
"0" = "don't have enough equipment")))
#> Warning: Ignoring unknown parameters: binwidth, bins, pad