R, ggplot2: limit rows in a faceted barplot

Sup,
Consider the following lines:
data
df=data.frame(
prod=sample(1:30, 1000, replace=TRUE),
mat=sample(c('yes', 'no'), 1000, replace=TRUE),
fj=sample(c(1,2), 1000, replace = TRUE)
)
plot
df %>%
group_by(mat, prod, fj) %>%
summarise(n = n()) %>%
arrange(desc(n)) %>%
slice(1:5) %>%
ggplot(aes(x = reorder(prod, n), y = n)) +
geom_col(fill = RColorBrewer::brewer.pal(3, 'Dark2')[2], colour = "grey", alpha = 0.8) +
labs(x = "Prod", y = "Qnt") +
scale_y_continuous(labels = scales::comma) +
coord_flip() +
facet_wrap(fj ~ mat, scale="free") +
theme_minimal()
which gives me
Now, if I drop the fj variable, as in
df %>%
group_by(mat, prod) %>%
summarise(n = n()) %>%
arrange(desc(n)) %>%
slice(1:5) %>%
ggplot(aes(x = reorder(prod, n), y = n)) +
geom_col(fill = RColorBrewer::brewer.pal(3, 'Dark2')[2], colour = "grey", alpha = 0.8) +
labs(x = "Prod", y = "Qnt") +
scale_y_continuous(labels = scales::comma) +
coord_flip() +
facet_wrap(~ mat, scale="free") +
theme_minimal()
slice(1:5) does its job and I get:
Question
Why don't slice and reorder seem to work properly when there are 3+ variables, and what should I do to limit the first plot to 5 rows per facet?

When you call summarise you lose one level of grouping. In this case you lost fj, so the result is still grouped by mat and prod, and slice(1:5) takes up to five rows from each mat/prod pair rather than from each facet.
If you first ungroup and then group_by mat and fj, I think you'll end up with what you are looking for.
df %>%
group_by(mat, prod, fj) %>%
summarise(n = n()) %>%
ungroup()%>%
group_by(mat, fj) %>%
arrange(desc(n)) %>%
slice(1:5) %>%
ggplot(aes(x = reorder(prod, n), y = n)) +
geom_col(fill = RColorBrewer::brewer.pal(3, 'Dark2')[2], colour = "grey", alpha = 0.8) +
labs(x = "Prod", y = "Qnt") +
scale_y_continuous(labels = scales::comma) +
coord_flip() +
facet_wrap(fj ~ mat, scale="free") +
theme_minimal()
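To check which grouping summarise leaves behind, you can inspect the result with group_vars(). The sketch below rebuilds the question's data with an arbitrary seed (an assumption, added only for reproducibility) and uses slice_max(), which assumes dplyr 1.0 or later, as one possible alternative to arrange() followed by slice(1:5).

```r
library(dplyr)

set.seed(42)  # arbitrary seed, not in the original question
df <- data.frame(
  prod = sample(1:30, 1000, replace = TRUE),
  mat  = sample(c("yes", "no"), 1000, replace = TRUE),
  fj   = sample(c(1, 2), 1000, replace = TRUE)
)

counts <- df %>%
  group_by(mat, prod, fj) %>%
  summarise(n = n(), .groups = "drop_last")  # spell out the default: only fj is dropped

group_vars(counts)  # "mat" "prod"

top5 <- counts %>%
  group_by(mat, fj) %>%                    # regroup by the faceting variables
  slice_max(n, n = 5, with_ties = FALSE)   # keep the five largest counts per facet

table(top5$mat, top5$fj)                   # rows kept per facet
```

Setting with_ties = FALSE keeps exactly five rows per facet even when several products share the same count.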
This leaves the problem of reordering the prod variable within each facet. It doesn't work in the example above because reorder orders by the entire data frame, and some of the values of prod are repeated in several of the facets. As discussed in this blog post by @drsimonj, you need to create an order variable and plot based on that. The code below follows (blatantly copies) the method outlined in the blog post.
# store the summarised table so the axis scale below can refer to it
df2 <- df %>%
group_by(mat, prod, fj) %>%
summarise(n = n()) %>%
group_by(mat, fj) %>%
arrange(desc(n)) %>%
slice(1:5) %>%
ungroup() %>%
arrange(fj, mat, n) %>% # arrange the entire table by the facets first, then by the n value
mutate(row.order = row_number()) # create dummy variable
ggplot(df2, aes(x = row.order, y = n)) + # plot by the dummy variable
geom_col(fill = RColorBrewer::brewer.pal(3, 'Dark2')[2], colour = "grey", alpha = 0.8, position = "dodge") +
labs(x = "Prod", y = "Qnt") +
scale_y_continuous(labels = scales::comma) +
scale_x_continuous( # add back in the Prod values
breaks = df2$row.order,
labels = df2$prod
) +
coord_flip() +
facet_wrap(fj ~ mat, scales = "free") +
theme_minimal()
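A small made-up table (not from the question) may help show why reorder() cannot order bars within facets: it ranks each prod by its mean n over the whole data frame, so every facet shares one global level order. The row_order column in this base R sketch plays the same role as the row.order dummy variable in the answer.

```r
# Hypothetical toy data: two facets that rank the same products differently
d <- data.frame(
  facet = c("A", "A", "B", "B"),
  prod  = c("x", "y", "x", "y"),
  n     = c(10, 1, 1, 4)
)

# reorder() uses the mean of n per prod across all rows: x = 5.5, y = 2.5
global_levels <- levels(reorder(d$prod, d$n))
global_levels  # "y" "x", which is wrong for facet B, where x < y

# Per-facet order: sort by facet, then by n, and number the rows
d2 <- d[order(d$facet, d$n), ]
d2$row_order <- seq_len(nrow(d2))
d2[, c("facet", "prod", "n", "row_order")]
```

Because row_order is unique per row, the same product can sit at a different position in each facet.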

Related

Order grouped scatterplot by mean

I am plotting a geom_point for several groups (Loc) and want in addition a line that indicates the mean of the points for each group. The groups should be ordered based on the mean of the Size for each group. I am trying to do this by reorder(Loc, Size.Mean) but it does not reorder.
ggplot(data,aes(Loc,Size,color=Loc)) +
geom_point() +
geom_point(data %>%
group_by(Loc) %>%
summarise(Size.Mean = mean(Size)),
mapping = aes(y = Size.Mean, x = reorder(Loc, Size.Mean)),
color = "black", shape = '-') +
theme_pubr(base_size=8) +
scale_y_continuous(trans="log10") +
theme(axis.text.x = element_text(angle = 90,hjust = 1)) +
theme(legend.position = "none")
ggplot orders discrete x ticks by factor level when the variable is a factor, so set the levels in the order you want:
library(tidyverse)
iris_means <-
iris %>%
group_by(Species) %>%
summarise(mean = mean(Sepal.Length)) %>%
arrange(-mean)
iris %>%
mutate(Species = Species %>% factor(levels = iris_means$Species)) %>%
ggplot(aes(Species, Sepal.Length)) +
geom_point() +
geom_crossbar(data = iris_means, mapping = aes(y = mean, ymin = mean, ymax = mean), color = "red")
Created on 2021-09-10 by the reprex package (v2.0.1)
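As a quick check of the resulting level order, and as a shorter alternative, reorder() can build the same factor directly from the raw values. This sketch uses only base R and the built-in iris data; negating Sepal.Length to get a descending order is an assumption of the sketch, not part of the answer above.

```r
# Levels sorted by descending group mean, computed by hand
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
manual_levels <- names(sort(group_means, decreasing = TRUE))

# reorder() sorts levels by increasing mean of its second argument,
# so negating the values yields a descending order
reordered <- reorder(iris$Species, -iris$Sepal.Length)

manual_levels
levels(reordered)
```

Both approaches should give the same level order, so either factor can be mapped to x.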

ggplot2: How to reorder stacked bar charts by proportions of fill variable

I'm working with the "NYC Property Sales" dataset which is available on kaggle:
https://www.kaggle.com/new-york-city/nyc-property-sales?select=nyc-rolling-sales.csv
After cleaning the dataset, I produced the following barplot with this code:
nyc_clean %>%
filter(year == 2017,
borough == "Manhatten") %>%
add_count(neighborhood) %>%
mutate(neighborhood = fct_reorder(neighborhood, n) %>% fct_rev()) %>%
filter(as.numeric(neighborhood) <= 13) %>%
distinct(borough, block, lot, .keep_all = TRUE) %>%
pivot_longer(c("residential_units", "commercial_units"),
names_to = "type",
values_to = "count") %>%
mutate(neighborhood = fct_reorder(neighborhood, as.numeric(as.factor(type)),
mean, na.rm = TRUE)) %>%
ggplot(aes(neighborhood, count, fill = type)) +
geom_col(position = "fill") +
scale_y_continuous(labels = percent) +
coord_flip() +
theme_light()
I want to reorder the barplot so that the proportion of residential units is in a descending order (from top to bottom). In the code above, I tried to reorder the neighborhoods with fct_reorder but it doesn't have any effect on the plot.
As a reproducible example, consider this dataset:
df <- tibble(neighborhood = c(rep("Chelsea", 4), rep("Tribeca", 4),
rep("Flatiron", 4)),
type = c("residential_unit", "commercial_unit", "residential_unit",
"commercial_unit", "residential_unit", "commercial_unit",
"residential_unit", "commercial_unit", "residential_unit",
"commercial_unit", "residential_unit", "commercial_unit"),
count = c(8, 3, 9, 1, 5, 4, 6, 3, 12, 2, 10, 1))
When I try to reorder the plot, the bars are ordered just as messily as in my output above:
df %>%
mutate(neighborhood = fct_reorder(neighborhood, as.numeric(as.factor(type)),
mean, na.rm = TRUE)) %>%
ggplot(aes(neighborhood, count, fill = type)) +
geom_col(position = "fill") +
scale_y_continuous(labels = scales::percent) +
coord_flip() +
theme_light()
Any ideas on what I'm missing here?
Hopefully this makes up for lack of concision with clarity:
df %>%
left_join( # Add res_share for each neighborhood
df %>%
group_by(neighborhood) %>%
mutate(share = count / sum(count)) %>%
ungroup() %>%
filter(type == "residential_unit") %>%
select(neighborhood, res_share = share)
) %>%
mutate(neighborhood = fct_reorder(neighborhood, res_share)) %>%
ggplot(aes(neighborhood, count, fill = type)) +
geom_col(position = "fill") +
scale_y_continuous(labels = scales::percent) +
coord_flip() +
theme_light()
One way would be to arrange the data by 'residential_unit' and count and assign factor levels in the order they appear.
library(dplyr)
library(ggplot2)
df %>%
group_by(neighborhood, type) %>%
summarise(prop = sum(count)) %>%
mutate(prop = prop.table(prop)) %>%
arrange(type != 'residential_unit', prop) %>%
pull(neighborhood) %>% unique -> levels
df %>%
mutate(neighborhood = factor(neighborhood, levels)) %>%
ggplot(aes(neighborhood, count, fill = type)) +
geom_col(position = "fill") +
scale_y_continuous(labels = scales::percent) +
coord_flip() +
theme_light()
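It may also help to see the numbers that the question's fct_reorder call was given. In the example data every neighborhood has two rows of each type, so the mean of the numeric type code is identical everywhere and there is nothing to sort by. The base R sketch below (variable names are illustrative) compares that with the residential share per neighborhood.

```r
df <- data.frame(
  neighborhood = rep(c("Chelsea", "Tribeca", "Flatiron"), each = 4),
  type = rep(c("residential_unit", "commercial_unit"), times = 6),
  count = c(8, 3, 9, 1, 5, 4, 6, 3, 12, 2, 10, 1)
)

# What the original fct_reorder() call summarised: mean type code per neighborhood
type_code_mean <- tapply(as.numeric(as.factor(df$type)), df$neighborhood, mean)
type_code_mean  # 1.5 for every neighborhood, so all ties

# Residential share per neighborhood: residential count / total count
res <- df[df$type == "residential_unit", ]
res_share <- tapply(res$count, res$neighborhood, sum) /
  tapply(df$count, df$neighborhood, sum)

# Neighborhoods from lowest to highest residential share
share_levels <- names(sort(res_share))
share_levels  # "Tribeca" "Chelsea" "Flatiron"
```

Passing share_levels to factor(neighborhood, levels = share_levels) would be one way to apply this order before plotting.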

How to add text label to show total n in each bar of stacked proportion bars in ggplot?

I have a stacked bar chart of proportions, so all bars total 100%. I would like to add a label to the end of each bar (i.e. on the far right-hand side of each bar, not within the bar itself) to show the total number of observations in each bar.
Something like this gets close-ish...
library(dplyr)
library(ggplot2)
data("mtcars")
mtcars %>%
# prep data
mutate(across(where(is.numeric), as.factor)) %>%
count(am, cyl, gear) %>%
mutate(prop = n / sum(n)) %>%
# plot
ggplot(aes(x = prop, y = cyl)) +
geom_col(aes(fill = gear),
position = "fill",
alpha = 0.8) +
facet_wrap(~am, ncol = 1) +
theme_minimal() +
scale_x_continuous(labels = scales::percent) +
# add labels to show total n for each bar
geom_text(aes(label = paste0("n = ", stat(y)), group = cyl),
stat = 'summary',
fun = sum)
...but (i) the values for my n labels clearly aren't the sums for each bar that I was expecting, and (ii) I can't figure out how to position the labels at the end of each bar. I thought I could specify a location on the x-axis within the geom_text aes, like this...
mtcars %>%
# prep data
mutate(across(where(is.numeric), as.factor)) %>%
count(am, cyl, gear) %>%
mutate(prop = n / sum(n)) %>%
# plot
ggplot(aes(x = prop, y = cyl)) +
geom_col(aes(fill = gear),
position = "fill",
alpha = 0.8) +
facet_wrap(~am, ncol = 1) +
theme_minimal() +
scale_x_continuous(labels = scales::percent) +
# add labels to show total n for each bar
geom_text(aes(label = paste0("n = ", stat(y)), group = cyl, x = 1),
stat = 'summary',
fun = sum)
...but I can't work out why that throws the x-axis scale out, and doesn't position all the labels at the same location on the scale.
Thanks in advance for any suggestions!
Try this:
library(dplyr)
library(ggplot2)
data("mtcars")
#Code
mtcars %>%
# prep data
mutate(across(where(is.numeric), as.factor)) %>%
count(am, cyl, gear) %>%
mutate(prop = n / sum(n)) %>%
# plot
ggplot(aes(x = prop, y = cyl)) +
geom_col(aes(fill = gear),
position = "fill",
alpha = 0.8) +
geom_text(aes(x=1.05,label = paste0("n = ", stat(y)), group = cyl),
hjust=0.5
)+
facet_wrap(~am, ncol = 1,scales = 'free')+
theme_minimal() +
scale_x_continuous(labels = scales::percent)
Output:
This is a modified version to add both proportions and numbers
library(dplyr)
library(ggplot2)
library(scales)
data("mtcars")
mtcars %>%
# prep data
mutate(across(where(is.numeric), as.factor)) %>%
count(am, cyl, gear) %>%
mutate(prop = n / sum(n)) %>%
# plot
ggplot(aes(x = prop, y = cyl)) +
geom_col(aes(fill = gear),
position = "fill", alpha = 0.8) +
theme_minimal() +
scale_x_continuous(labels = scales::percent) +
# add labels to show total n for each bar
geom_text(aes(x = 1.1, group = cyl,
label = paste0("n = ", stat(y))),
hjust = 0.5) +
geom_text(aes(x = prop, y = cyl, group = gear,
label = paste0('p =',round(stat(x),2))),
hjust = 0.5, angle = 0,
position = position_fill(vjust = .5)) +
facet_wrap(~am, ncol = 1, scales = 'free')
It's not the most elegant solution, but I got there in the end by expanding on @Duck's answer for the positioning of labels (thanks!) and calculating the totals to be used as labels outside of ggplot.
mtcars %>%
# prep data
mutate(across(where(is.numeric), as.factor)) %>%
count(am, cyl, gear) %>%
group_by(cyl, am) %>%
mutate(prop = n / sum(n)) %>%
mutate(column_total = sum(n)) %>%
ungroup() %>%
# plot
ggplot(aes(x = prop, y = cyl)) +
geom_col(aes(fill = gear),
position = "fill",
alpha = 0.8) +
geom_text(aes(x = 1.05, label = paste0("n = ", column_total))) +
facet_wrap(~am, ncol = 1, scales = 'free')+
theme_minimal() +
scale_x_continuous(labels = scales::percent)
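One caveat with the approach above is that geom_text draws one label per data row, so each bar gets several identical labels stacked on top of each other. A possible refinement, sketched here under the assumption that a separate one-row-per-bar table is acceptable, is to compute the totals once and hand them to geom_text through its data argument (bar_totals is a name introduced for this example).

```r
library(dplyr)

# One row per bar (am x cyl) with the number of cars in that bar
bar_totals <- mtcars %>%
  mutate(across(c(am, cyl), as.factor)) %>%
  count(am, cyl, name = "column_total")

bar_totals

# In the plot, this table could then supply the labels, for example:
# geom_text(data = bar_totals,
#           aes(x = 1.05, y = cyl, label = paste0("n = ", column_total)))
```

Since bar_totals keeps the am column, the labels still land in the correct facet.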

How to sort the double bar using ggplot in r?

I am learning R and I have a problem sorting the double bars in ascending or descending order. I also want to place the legend at the top of the plot, with the two colours shown in one row and two columns, for example:
The title Time
box color Breakfast box color Dinner
And the plot here
Here is my dataframe:
dat <- data.frame(
time = factor(c("Breakfast","Breakfast","Breakfast","Breakfast","Breakfast","Lunch","Lunch","Lunch","Lunch","Lunch","Lunch","Dinner","Dinner","Dinner","Dinner","Dinner","Dinner","Dinner"), levels=c("Breakfast","Lunch","Dinner")),
class = c("a","a","b","b","c","a","b","b","c","c","c","a","a","b","b","b","c","c"))
And here is my code so far:
dat %>%
filter(time %in% c("Breakfast", "Dinner")) %>%
droplevels %>%
count(time, class) %>%
group_by(time) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = class, y = prop, fill = time, label = scales::percent(prop))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = 0.9), vjust = 0.5, size = 3) +
scale_y_continuous(labels = scales::percent)+
coord_flip()
Any help would be appreciated.
Something like this should be close to what you are asking; feel free to ask more.
Resources consulted during the answer: http://www.sthda.com/english/wiki/ggplot2-legend-easy-steps-to-change-the-position-and-the-appearance-of-a-graph-legend-in-r-software
Using part of the answer you can look further into https://ggplot2.tidyverse.org/reference/theme.html
library(tidyverse)
dat <- data.frame(
time = factor(c("Breakfast","Breakfast","Breakfast","Breakfast","Breakfast","Lunch","Lunch","Lunch","Lunch","Lunch","Lunch","Dinner","Dinner","Dinner","Dinner","Dinner","Dinner","Dinner"), levels=c("Breakfast","Lunch","Dinner")),
class = c("a","a","b","b","c","a","b","b","c","c","c","a","a","b","b","b","c","c"))
dat %>%
filter(time %in% c("Breakfast", "Dinner")) %>%
droplevels %>%
count(time, class) %>%
group_by(time) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = fct_reorder(class,prop), y = prop, fill = time, label = scales::percent(prop))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = 0.9), vjust = 0.5, size = 3) +
scale_y_continuous(labels = scales::percent)+
coord_flip() +
labs(x = "class",fill = "Time") +
theme(legend.position = "top", legend.direction="vertical", legend.title=element_text(hjust = 0.5,face = "bold",size = 12))
Created on 2020-05-08 by the reprex package (v0.3.0)
Getting the legend title above the legend keys requires a few additional adjustments to the theme and guides.
dat %>%
filter(time %in% c("Breakfast", "Dinner")) %>%
droplevels %>%
count(time, class) %>%
group_by(time) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = class, y = prop, fill = time, label = scales::percent(prop))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = 0.9), vjust = 0.5, size = 3) +
scale_y_continuous(labels = scales::percent)+
coord_flip() +
theme(legend.position="top", legend.direction="vertical", legend.title=element_text(hjust = 0.5))+
guides(fill = guide_legend(title = "Time", nrow = 1))

ggplot2: merge two legends

I'm trying to plot an area with two different sets of points with ggplot2, but I always get two different legends. I've read this and this but I still have two legends.
Below the code and the chart.
Thank you very much
library(ggplot2)
library(dplyr)
set.seed(1)
df <- data.frame(x = letters,
y = 1:26 +runif(26),
z = 2*(1:26) + runif(26),
jj = 1:26,
hh = 1:26*2,
x1 = 1:26)
some_names <- df %>%
filter(row_number() %% 10 == 1) %>%
select(x,x1)
p <- df %>%
ggplot(aes(x1)) +
geom_ribbon(aes(ymin = y, ymax = z, fill = "area")) +
geom_point(aes(y = jj, colour = "points1")) +
geom_point(aes(y = hh, colour = "points2")) +
scale_x_continuous(breaks = some_names %>% select(x1) %>% unlist %>% unname,
labels = some_names %>% select(x) %>% unlist %>% unname )
p + scale_fill_manual(name = "legend",
values = c("area" = "red","points1" = NA,"points2" = NA)) +
scale_colour_manual(name = "legend",
values = c("area" = NA ,"points1" = "blue","points2" = "purple"))
You could do something in the vein of
library(tidyverse)
packageVersion("ggplot2")
# [1] ‘2.2.1’
df %>%
gather(var, val, jj, hh) %>%
ggplot(aes(x1, val, ymin=y, ymax=z, color=var, fill=var)) +
geom_ribbon(color=NA) +
geom_point() +
scale_color_manual(values=c("blue","purple"), name="leg", labels = c("lab1","lab2")) +
scale_fill_manual(values = rep("red", 2), name="leg", labels= c("lab1","lab2"))
or
library(tidyverse)
df %>%
gather(var, val, jj, hh) %>%
bind_rows(data.frame(x=NA,y=NA,z=NA,x1=NA,var="_dummy",val=NA)) %>%
ggplot(aes(x1, val, ymin=y, ymax=z, color=var, fill=var)) +
geom_ribbon(color=NA) +
geom_ribbon(color=NA, fill="red") +
geom_point() +
scale_color_manual(
values=c("#FFFFFF00", "blue","purple"), name="leg", labels = c("lab1","lab2","lab3")) +
scale_fill_manual(
values = c("red", rep(NA, 2)), name="leg", labels= c("lab1","lab2","lab3"))
One option is to use an interior fill for each element. There may be a way to use override.aes to get the points to be a point in the legend, but I wasn't able to get that with any quick experimentation.
p <- df %>%
ggplot(aes(x1)) +
geom_ribbon(aes(ymin = y, ymax = z, fill = "area")) +
geom_point(aes(y = jj, fill = "points1"), shape=21, colour="blue") +
geom_point(aes(y = hh, fill = "points2"), shape=21, colour="purple") +
scale_x_continuous(breaks = some_names %>% select(x1) %>% unlist %>% unname,
labels = some_names %>% select(x) %>% unlist %>% unname ) +
scale_fill_manual(name = "legend",
values = c("area" = "red","points1" = "blue","points2" = "purple"),
guide = guide_legend(override.aes=aes(colour=NA)))
p
