ggplot2 Facet Wrap Reorder by y-axis, Not x-axis - r

I want to plot faceted bar graphs and order them left-to-right from the largest to smallest values. I should be able to do this with code similar to this:
library(ggplot2)
ggplot(mpg, aes(reorder(cyl, -hwy), hwy)) +
geom_col() +
facet_wrap(~ manufacturer, scales = "free")
Instead what I get is ordering by the x-axis which happens to be 'cyl', smallest to largest values. How do I order descending, by the y-axis, so it looks like a Pareto chart? It has to be faceted as well. Thank you.

Here is a different approach that can be performed directly in ggplot utilizing two functions from here. I will use eipi10's example:
library(tidyverse)
mpg$hwy[mpg$manufacturer=="audi" & mpg$cyl==8] <- 40
dat <- mpg %>% group_by(manufacturer, cyl) %>%
summarise(hwy = mean(hwy)) %>%
arrange(desc(hwy)) %>%
mutate(cyl = factor(cyl, levels = cyl))
Functions:
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
new_x <- paste(x, within, sep = sep)
stats::reorder(new_x, by, FUN = fun)
}
scale_x_reordered <- function(..., sep = "___") {
reg <- paste0(sep, ".+$")
ggplot2::scale_x_discrete(labels = function(x) gsub(reg, "", x), ...)
}
plot:
ggplot(dat, aes(reorder_within(cyl, -hwy, manufacturer), y = hwy)) +
geom_col() +
scale_x_reordered() +
facet_wrap(~ manufacturer, scales = "free") +
theme(axis.title=element_blank())
For ascending order, use reorder_within(cyl, hwy, manufacturer) instead.
Plot without the functions:
ggplot(dat, aes(cyl, y = hwy)) +
geom_col() +
facet_wrap(~ manufacturer, scales = "free") +
theme(axis.title=element_blank())
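To see what the two helper functions in this answer do, here is a minimal base-R sketch on a made-up four-row data frame (toy and its values are illustrative assumptions, not taken from mpg). reorder_within pastes the facet name onto each x value so that every panel gets its own factor levels, and the regular expression used by scale_x_reordered strips that suffix again for the axis labels.

```r
# Same helper as in the answer above
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

# Hypothetical toy data: two manufacturers, two cylinder counts each
toy <- data.frame(
  manufacturer = c("a", "a", "b", "b"),
  cyl          = c(4, 6, 4, 6),
  hwy          = c(30, 20, 15, 25)
)

# Levels are sorted by increasing -hwy, i.e. by decreasing hwy
x <- reorder_within(toy$cyl, -toy$hwy, toy$manufacturer)
print(levels(x))
# "4___a" "6___b" "6___a" "4___b"

# What scale_x_reordered() shows on the axis: the suffix is removed
axis_labels <- gsub("___.+$", "", levels(x))
print(axis_labels)
# "4" "6" "6" "4"
```

Within manufacturer "a" the order is 4 then 6, and within "b" it is 6 then 4, so each panel is sorted by descending hwy even though the displayed labels repeat.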

If I understand your question, the goal is to plot the average highway mpg (the hwy column) by cyl for each manufacturer. Within each manufacturer, you want to order the x-axis (the cyl values), by the mean hwy value for each cyl.
To do that, we need to create the plots separately for each manufacturer and then lay them out together. This is because we can't have different x-axis orderings (cyl orderings in this case) for different panels in the same plot. (UPDATE: I stand corrected. @missuse's answer links to functions written by David Robinson, based on a blog post by Tyler Rinker, that vary the x-axis label order in facetted plots.) So, we'll create a list of plots and then lay them out together, as if they were facetted.
library(tidyverse)
library(egg)
Since in the real data, the mean value of hwy is always monotonically decreasing with increasing cyl, we'll create an artificially high hwy value for 8-cylinder Audis, just for illustration:
mpg$hwy[mpg$manufacturer=="audi" & mpg$cyl==8] = 40
Now we split the data by manufacturer so we can create a separate plot, and therefore a separate cyl ordering for each manufacturer. We'll use the map function to iterate over the manufacturers.
plot.list = split(mpg, mpg$manufacturer) %>%
map(function(dat) {
# Order cyl by mean(hwy)
dat = dat %>% group_by(manufacturer, cyl) %>%
summarise(hwy = mean(hwy)) %>%
arrange(desc(hwy)) %>%
mutate(cyl = factor(cyl, levels=cyl))
ggplot(dat, aes(cyl, hwy)) +
geom_col() +
facet_wrap(~ manufacturer) +
theme(axis.title=element_blank()) +
expand_limits(y=mpg %>%
group_by(manufacturer,cyl) %>%
mutate(hwy=mean(hwy)) %>%
pull(hwy) %>% max)
})
Now let's remove the y-axis values and ticks from the plots that won't be in the first column when we lay out the plots together:
num_cols = 5
plot.list[-seq(1,length(plot.list), num_cols)] =
lapply(plot.list[-seq(1,length(plot.list), num_cols)], function(p) {
p + theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
})
Finally, we lay out the plots. ggarrange from the egg package ensures that the panels all have the same width (otherwise the panels in the first column would be narrower than the others, due to space taken up by the y-axis labels).
ggarrange(plots=plot.list, left="Highway MPG", bottom="Cylinders", ncol=num_cols)
Note that the cyl values for audi are not in increasing order, showing that our reordering worked properly.

Related

Can you add a count to the legend for each level of a factor in ggplot?

I am producing a scatterplot using ggplot and colouring the data points by a given factor. The legend that is produced shows the colour assigned to each level of the factor, but is it possible for it to also show the number of points in each level?
For example, I have included the code for the cars data set:
p <- ggplot(mtcars, aes(wt, mpg))
p + geom_point(aes(colour = factor(cyl)))
In this plot, I would be looking to have the count for each number of cylinders. So 4(Count 1), 6(Count 2) and 8(Count 3).
Thanks in advance.
You can try something like this:
library(dplyr)
library(ggplot2)
# Append each cylinder group's row count to its legend label
mtcars %>%
group_by(cyl) %>%
mutate(label = paste0(cyl, ' (Count ', n(), ')')) %>%
ggplot(aes(wt, mpg)) +
geom_point(aes(colour = factor(label)))
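As a cross-check on the labels this produces, the same strings can be built in base R with table(). This is only a sketch of the counting step; the variable names counts and legend_labels are made up for illustration.

```r
# Count rows per cylinder level (mtcars ships with base R)
counts <- table(mtcars$cyl)

# Build one legend label per level, e.g. "4 (Count 11)"
legend_labels <- paste0(names(counts), " (Count ", counts, ")")
print(legend_labels)
# "4 (Count 11)" "6 (Count 7)" "8 (Count 14)"
```

Because factor(cyl) orders its levels 4, 6, 8, the same order as names(counts), these strings could probably also be supplied through scale_colour_discrete(labels = legend_labels) on the original plot, which leaves the data itself unchanged.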

How can I append facets and keep individual facet height in the plot as is?

I want to create a large facet_wrap with ggplot2.
What I want is some automatic way to append the individual facets by column so that the facets keep the size they would have by default if (with the same data) only the first row with four columns of facets would be supplied.
E.g. I am concerned about the plot height. I know how to manually change the plot height, but I need an automatic way.
Some example data:
mpg <- mpg %>%
distinct(model, year, .keep_all = TRUE)
ggplot(mpg, aes(x=year, y=hwy))+
geom_point()+
facet_wrap(~model)
All facets should have the same height as
mpg %>%
distinct(model, year, .keep_all = TRUE) %>%
filter(model %in% c("4runner 4wd", "a4", "a4 quattro", "altima")) %>%
ggplot(aes(x=year, y=hwy))+
geom_point()+
facet_wrap(~model, ncol = 4)
You can conserve plot height by using theme(aspect.ratio). For example,
mpg <- mpg %>%
distinct(model, year, .keep_all = TRUE)
ggplot(mpg, aes(x=year, y=hwy))+
geom_point()+
facet_wrap(~model) + theme(aspect.ratio=2)

How to graph "before and after" measures using ggplot with connecting lines and subsets?

I’m totally new to ggplot, relatively fresh with R and want to make a smashing "before-and-after" scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I’ve tried some options, but have yet to:
show each individual observation separately (now same values are overlapping)
connect the related before and after measures (x=0 and x=1) with lines to more clearly illustrate the direction of variation
subset the data along class and id using shape and colors
How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?
Main alternative: geom_point()
Here is some sample data and example code using geom_point
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of "feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then I cannot subset using colors and shapes because of the overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to clarify the overlapping observations that arise from using geom_point(), as in the example below. However, I have yet to understand how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
grid.arrange(p1, p2, nrow=1) # from the gridExtra package
Or maybe it is better to summarize the data by x, id and class as the mean or median of y, filter out the ids that produce NAs (e.g. ids 3 and 6), and connect the points with lines? If you don't really need to show the variability within each id (which may be the case if the plot only illustrates tendencies), you can do it this way:
library(ggplot2)
library(dplyr)
#library(ggthemes)
df <- df %>%
group_by(x, id, class) %>%
summarize(y = median(y, na.rm = TRUE)) %>%
ungroup() %>%
mutate(
id = factor(id),
x = factor(x, labels = c("before", "after")),
class = factor(class, labels = c("multiple days", "one day")) # 0 = multiple days, 1 = one day
) %>%
group_by(id) %>%
mutate(nas = any(is.na(y))) %>%
ungroup() %>%
filter(!nas) %>%
select(-nas)
ggplot(df, aes(x = x, y = y, col = id, group = id)) +
geom_point(aes(shape = class)) +
geom_line(show.legend = FALSE) +
#theme_few() +
#theme(legend.position = "none") +
ylab("Feelings of peace, %") +
xlab("")
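To check which ids drop out in the filtering step of this answer, the per-id medians can be tabulated in base R. This is a small sketch using the question's vectors; tapply() stands in for the dplyr pipeline here, and the names med and dropped are made up for illustration.

```r
# Data from the question (class is not needed for this check)
x  <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)  # 0 = before, 1 = after
y  <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95)
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6)

# Median of y for every id (rows) at x = 0 and x = 1 (columns);
# a cell whose values are all NA stays NA even with na.rm = TRUE
med <- tapply(y, list(id = id, x = x), median, na.rm = TRUE)
print(med)

# ids with an NA cell have no complete before/after pair to connect
dropped <- rownames(med)[apply(is.na(med), 1, any)]
print(dropped)
# "3" "6"
```

Ids 3 and 6 each have only missing values on one side, which is why the answer removes them before drawing the connecting lines.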
Here's one possible solution for you.
First - to get the color and shapes determined by variables, you need to put these into the aes function. I turned several into factors, so the labs function fixes the labels so they don't appear as "factor(x)" but just "x".
To address multiple points, one solution is to use geom_smooth with method = "lm". This plots the regression line, instead of connecting all the dots.
The option se = FALSE prevents confidence intervals from being plotted - I don't think they add a lot to your plot, but play with it.
Connecting the dots is done by geom_line - feel free to try that as well.
Within geom_point, the option position = position_jitter(width = .1) adds random noise to the x-axis so points do not overlap.
ggplot(df, aes(x=factor(x), y=y, color=factor(id), shape=factor(class), group = id)) +
geom_point(position = position_jitter(width = .1)) +
geom_smooth(method = 'lm', se = FALSE) +
labs(
x = "x",
color = "ID",
shape = 'Class'
)

rearrange facet_wrap plots based on the points in the subplot

I would like to rearrange the facet_wrap plots in a better way.
library(ggplot2)
set.seed(123)
freq <- sample(1:10, 20, replace = T)
labels <- sample(LETTERS, 20)
value <- paste("i",1:13,sep='')
lab <- rep(unlist(lapply(1:length(freq), function(x) rep(labels[x],freq[x]))),2)
ival <- rep(unlist(lapply(1:length(freq), function(x) value[1:freq[x]])),2)
df <- data.frame(lab, ival, type=c(rep('Type1',119),rep('Type2',119)),val=runif(238,0,1))
ggplot(df, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
It results in the below plot:
Is there any way to rearrange the plots based on their frequency? Some of the lab frequencies (i.e. the number of points per type) are very low (1-3). I would like to arrange the facet_wrap panels by their frequencies instead of by label order. One advantage is that it reduces the plotting area and gives better intuition from the plots.
Can it be done using the frequency values computed on the fly and passing them to the facet_wrap? Or it should be done separately using dplyr approaches and divide the data into low/medium/high frequent set of plots?
Here is one idea. We can use dplyr to calculate the number of each group in lab and use fct_reorder from forcats to reorder the factor level.
library(dplyr)
library(forcats)
df2 <- df %>%
group_by(lab) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(lab = fct_reorder(lab, N))
ggplot(df2, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
Set .desc = TRUE when using fct_reorder if you want to reverse the factor levels.
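The reordering idea can also be sketched without forcats: count the rows per label and use the sorted counts as the factor levels. The vector lab below is a small made-up example, not the question's data.

```r
# Hypothetical labels: "A" appears once, "C" twice, "B" three times
lab <- c("B", "B", "B", "A", "C", "C")

# Rows per label
n_per_lab <- table(lab)

# Levels in order of increasing frequency; facet_wrap lays out
# panels in level order
lab_by_freq <- factor(lab, levels = names(sort(n_per_lab)))
print(levels(lab_by_freq))
# "A" "C" "B"

# Most frequent first, comparable to .desc = TRUE in fct_reorder
lab_by_freq_desc <- factor(lab, levels = names(sort(n_per_lab, decreasing = TRUE)))
print(levels(lab_by_freq_desc))
# "B" "C" "A"
```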

Easily add an '(all)' facet to facet_wrap in ggplot2?

I have data that looks like this example in the facet_wrap documentation:
(source: ggplot2.org)
I would like to fill the last facet with the overall view, using all data.
Is there an easy way to add a 'total' facet with facet_wrap? It's easy to add margins to facet_grid, but that option does not exist in facet_wrap.
Note: using facet_grid is not an option if you want a quadrant as in the plot above, which requires the ncol or nrow arguments from facet_wrap.
library(ggplot2)
p <- qplot(displ, hwy, data = transform(mpg, cyl = as.character(cyl)))
cyl6 <- subset(mpg, cyl == 6)
p + geom_point(data = transform(cyl6, cyl = "7"), colour = "red") +
geom_point(data = transform(mpg, cyl = "all"), colour = "blue") +
facet_wrap(~ cyl)
I prefer a slightly alternative approach. Essentially, the data is duplicated before creating the plot, with a new set of data added for the all data. I wrote the following CreateAllFacet function to simplify the process. It returns a new dataframe with the duplicated data and an additional column facet.
library(ggplot2)
#' Duplicates data to create additional facet
#' @param df a dataframe
#' @param col the name of facet column
#'
CreateAllFacet <- function(df, col){
df$facet <- df[[col]]
temp <- df
temp$facet <- "all"
merged <- rbind(temp, df)
# convert the original column to a factor so it can be mapped to discrete colours
merged[[col]] <- as.factor(merged[[col]])
return(merged)
}
The benefit of adding the new column facet to the original data is that it still allows the variable cylinder to be used to colour the points in the plot within the aesthetics:
df <- CreateAllFacet(mpg, "cyl")
ggplot(data=df, aes(x=displ,y=hwy)) +
geom_point(aes(color=cyl)) +
facet_wrap(~ facet) +
theme(legend.position = "none")
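To make the duplication step concrete, here is a minimal sketch that runs CreateAllFacet on a three-row data frame. The data frame toy and its values are made up for illustration, and the function is repeated from this answer only so that the snippet runs on its own.

```r
# Same function as in the answer above
CreateAllFacet <- function(df, col) {
  df$facet <- df[[col]]
  temp <- df
  temp$facet <- "all"
  merged <- rbind(temp, df)
  merged[[col]] <- as.factor(merged[[col]])
  return(merged)
}

# Hypothetical toy data
toy <- data.frame(cyl = c(4, 6, 8), hwy = c(30, 25, 20))
out <- CreateAllFacet(toy, "cyl")

# Every row appears twice: once under "all", once under its own cyl value
print(nrow(out))
# 6
print(table(out$facet))
print(is.factor(out$cyl))
# TRUE
```

The cost of this approach is that the plotted data frame is twice the size of the original, which is usually negligible for data sets like mpg.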
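Note that margins is an argument of facet_grid, not facet_wrap (passing it to facet_wrap raises an unused-argument error). If a single row of panels is acceptable, you can get an "(all)" panel with facet_grid as follows:
library(ggplot2)
p <- qplot(displ, hwy, data = transform(mpg, cyl = as.character(cyl)))
cyl6 <- subset(mpg, cyl == 6)
p + geom_point(data = transform(cyl6, cyl = "7"), colour = "red") +
facet_grid(. ~ cyl, margins = TRUE)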
