Easily add an '(all)' facet to facet_wrap in ggplot2? - r

I have data that looks like this example in the facet_wrap documentation:
(source: ggplot2.org)
I would like to fill the last facet with the overall view, using all data.
Is there an easy way to add a 'total' facet with facet_wrap? It's easy to add margins to facet_grid, but that option does not exist in facet_wrap.
Note: using facet_grid is not an option if you want a quadrant as in the plot above, which requires the ncol or nrow arguments from facet_wrap.

library(ggplot2)
p <- qplot(displ, hwy, data = transform(mpg, cyl = as.character(cyl)))
cyl6 <- subset(mpg, cyl == 6)
p + geom_point(data = transform(cyl6, cyl = "7"), colour = "red") +
geom_point(data = transform(mpg, cyl = "all"), colour = "blue") +
facet_wrap(~ cyl)

I prefer a slightly different approach. Essentially, the data is duplicated before creating the plot, and the copy is labelled as the "all" group. I wrote the following CreateAllFacet function to simplify the process. It returns a new dataframe containing both the original and the duplicated rows, plus an additional column named facet.
library(ggplot2)
#' Duplicates data to create additional facet
#' @param df a dataframe
#' @param col the name of facet column
#'
CreateAllFacet <- function(df, col){
df$facet <- df[[col]]
temp <- df
temp$facet <- "all"
merged <- rbind(temp, df)
# ensure the original column is a factor so it maps to a discrete colour scale
merged[[col]] <- as.factor(merged[[col]])
return(merged)
}
The benefit of adding the new column facet to the original data is that it still allows the variable cylinder to be used to colour the points in the plot within the aesthetics:
df <- CreateAllFacet(mpg, "cyl")
ggplot(data=df, aes(x=displ,y=hwy)) +
geom_point(aes(color=cyl)) +
facet_wrap(~ facet) +
theme(legend.position = "none")
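The same duplication can be sketched with dplyr, assuming that package is installed. The helper name add_all_facet and its all_label argument are made up for this illustration and are not part of ggplot2:

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: stack the data on top of a copy labelled "all".
add_all_facet <- function(df, col, all_label = "all") {
  bind_rows(
    mutate(df, facet = as.character(.data[[col]])),  # original groups
    mutate(df, facet = all_label)                    # pooled copy
  )
}

df_all <- add_all_facet(mpg, "cyl")

ggplot(df_all, aes(displ, hwy)) +
  geom_point(aes(colour = factor(cyl))) +
  facet_wrap(~ facet, ncol = 2)
```

Because the original cyl column is left untouched, it can still drive the colour scale inside the pooled panel.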

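The margins option exists only in facet_grid, not in facet_wrap, so passing margins = TRUE to facet_wrap fails with an unused-argument error. If a single row of panels is acceptable, facet_grid adds the '(all)' panel for you:
library(ggplot2)
p <- qplot(displ, hwy, data = transform(mpg, cyl = as.character(cyl)))
cyl6 <- subset(mpg, cyl == 6)
p + geom_point(data = transform(cyl6, cyl = "7"), colour = "red") +
facet_grid(. ~ cyl, margins = TRUE)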

Related

add space and a line between two specified boxes in ggplot2

I have a project using boxplots, but one of the boxplots is a little different from all the others. I want to add some more space between this box and the others, while the spacing between the other boxes remains the same. I also want to add a dashed line between this box and the others.
Here is a reproducible example:
library(ggplot2)
ggplot(data = mtcars, aes(x = factor(gear), y = mpg)) +
geom_boxplot(width = 0.5) +
geom_vline(xintercept = 4.5) +
theme_classic()
What I want is to add some extra space between factor(gear) 4 and 5, while keeping the space between 3 and 4 the same. In addition, I want to add a dashed line between 4 and 5.
I tried to google this but did not find a good answer. Any suggestion will be greatly appreciated.
To make things more realistic, let's start off with gear as a factor rather than converting it inside ggplot:
mtcars2 <- within(mtcars, gear <- factor(gear))
The trick is to make the discrete axis a continuous axis with custom labels. We therefore need to convert the factor to numeric and add a little to the rightmost value:
xvals <- as.numeric(mtcars2$gear)
xvals[xvals == max(xvals)] <- xvals[xvals == max(xvals)] + 1
mtcars2$xvals <- xvals
Now we plot using xvals on the x axis, but using the factor levels from gear to label the breaks. Note that we could use words instead of the characters "3", "4" and "5" even though this is a numeric axis.
ggplot(data = mtcars2, aes(x = xvals, y = mpg, group = gear)) +
geom_boxplot(width = 0.5) +
geom_vline(xintercept = max(xvals) - 1, linetype = 2) +
scale_x_continuous(breaks = sort(unique(xvals)), labels = levels(mtcars2$gear)) +
labs(x = "gear") +
theme_classic()
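The recoding step can be generalised to a gap after any level. This is only a sketch: gap_positions is a hypothetical helper written for this example, not a ggplot2 function.

```r
library(ggplot2)

# Hypothetical helper: numeric x positions for the levels of a factor, with
# `gap` extra units of space inserted after the level named `after`.
gap_positions <- function(f, after, gap = 1) {
  f <- as.factor(f)
  pos <- seq_along(levels(f))
  shift <- pos > match(after, levels(f))  # levels to the right of the gap
  pos[shift] <- pos[shift] + gap
  setNames(pos, levels(f))
}

pos <- gap_positions(mtcars$gear, after = "4", gap = 1)
pos
#> 3 4 5 
#> 1 2 4 

mtcars_gap <- transform(mtcars, xpos = unname(pos[as.character(gear)]))

ggplot(mtcars_gap, aes(xpos, mpg, group = gear)) +
  geom_boxplot(width = 0.5) +
  # dashed line halfway between the two separated boxes
  geom_vline(xintercept = mean(pos[c("4", "5")]), linetype = "dashed") +
  scale_x_continuous(breaks = unname(pos), labels = names(pos)) +
  labs(x = "gear") +
  theme_classic()
```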
To manually add some additional space between boxplots we could add an additional factor level, as shown in the post linked by @MrFlick in his comment. However, you also want to add a separating vertical line without altering the spacing between the other categories.
In my opinion the easiest way to achieve both is to convert the factor to numerics. Try this:
library(ggplot2)
library(dplyr)
mtcars$gear <- factor(mtcars$gear)
# Save factor labels
labels <- levels(mtcars$gear)
mtcars %>%
mutate(
# Convert factor to numeric
gear = as.numeric(gear),
# Recode the special category, i.e. shift to the left. Here: Category 1
gear = ifelse(gear == 1, 0.5, gear)
) %>%
ggplot(aes(x = gear, y = mpg, group = gear)) +
geom_boxplot(width = 0.5) +
# Add dashed line
geom_vline(xintercept = 2.5, linetype = "dashed") +
# Set breaks and labels
scale_x_continuous(breaks = c(0.5, 2:3), labels = labels) +
theme_classic()

How do I facet by geom / layer in ggplot2?

I'm hoping to recreate the gridExtra output below with ggplot's facet_grid, but I'm unsure of what variable ggplot identifies with the layers in the plot. In this example, there are two geoms...
require(tidyverse)
a <- ggplot(mpg)
b <- geom_point(aes(displ, cyl, color = drv))
c <- geom_smooth(aes(displ, cyl, color = drv))
d <- a + b + c
# output below
gridExtra::grid.arrange(
a + b,
a + c,
ncol = 2
)
# Equivalent with gg's facet_grid
# needs a categorical var to iter over...
d$layers
#d + facet_grid(. ~ d$layers??)
The gridExtra output that I'm hoping to recreate is:
A hacky way of doing this is to make as many copies of the data frame as you need (two, three, and so on), each tagged with a value that is used later for faceting and filtering. Stack the copies with rbind (a union) into one data frame. Then set up the plot, give each geom only the rows that carry its tag, and facet on the tag column.
This can be seen below:
df1 <- data.frame(
graph = "point_plot",
mpg
)
df2 <- data.frame(
graph = "spline_plot",
mpg
)
df <- rbind(df1, df2)
ggplot(df, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(data = filter(df, graph == "point_plot")) +
geom_smooth(data = filter(df, graph == "spline_plot"), se=FALSE) +
facet_grid(. ~ graph)
If you really want to show different plots on different facets, one hacky way would be to make separate copies of the data and subset those...
mpg2 <- mpg %>% mutate(facet = 1) %>%
bind_rows(mpg %>% mutate(facet = 2))
ggplot(mpg2, aes(displ, cyl, color = drv)) +
geom_point(data = subset(mpg2, facet == 1)) +
geom_smooth(data = subset(mpg2, facet == 2)) +
facet_wrap(~facet)
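A variation on the same idea, sketched under the assumption that a reasonably recent ggplot2 is installed: a layer's data argument may be a function, which ggplot2 applies to the plot's data, so each layer can do its own filtering. The column name panel and its values are chosen only for this example.

```r
library(ggplot2)

# Two stacked copies of mpg, tagged by the panel they belong to.
mpg2 <- rbind(
  transform(mpg, panel = "points"),
  transform(mpg, panel = "smooth")
)

p <- ggplot(mpg2, aes(displ, hwy, colour = drv)) +
  # A function passed as `data` is called with the plot data for that layer.
  geom_point(data = function(d) subset(d, panel == "points")) +
  geom_smooth(data = function(d) subset(d, panel == "smooth"),
              method = "loess", formula = y ~ x, se = FALSE) +
  facet_wrap(~ panel)
p
```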

R: Unexplainable behavior of ggplot inside a function

I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However when I run the function I get the same plot again and again.
My code is the following and I provide also a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
Use aes_string to pass column names as strings into the ggplot() call. Right now the aesthetics bypass ggplot's data argument: x, color, and fill below are standalone vectors, and although they come from the same data frame, ggplot does not know that:
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
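In ggplot2 3.0.0 and later, aes_string is soft-deprecated in favour of the .data pronoun. The repeated plot in the question arises because aes() stores its expressions unevaluated, so data[, indices[i]] is only evaluated when a plot is printed, by which time i holds its final value. A minimal sketch of the .data approach follows; the function name hist_by_name and its arguments are illustrative, and lapply() is used so that every plot keeps its own copy of the column name:

```r
library(ggplot2)

# Hypothetical helper: one histogram per numeric column, filled by `class`.
hist_by_name <- function(data, class, exclude = character()) {
  numeric_vars <- names(data)[vapply(data, is.numeric, logical(1))]
  vars <- setdiff(numeric_vars, c(class, exclude))
  plots <- lapply(vars, function(v) {
    # .data[[v]] looks the column up inside `data` when the plot is built;
    # each lapply() call has its own `v`, so no plot sees a later value.
    ggplot(data, aes(x = .data[[v]], fill = .data[[class]])) +
      geom_histogram(alpha = 0.3, position = "identity", bins = 30) +
      labs(x = v, fill = class)
  })
  setNames(plots, vars)
}

mtcars2 <- transform(mtcars, am = factor(am))
plots <- hist_by_name(mtcars2, class = "am", exclude = "mpg")
names(plots)
```

Note that writing aes(x = .data[[name]]) inside a plain for loop would reproduce the original problem, because name would again be looked up only when the plots are printed.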
Here is a strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
# tidyeval overhead
class_enq <- enquo(class)
drop_enqs <- enquo(drop_vars)
data %>%
group_by(!!class_enq) %>% # keep the 'class' column always
select(-!!drop_enqs) %>% # drop any 'drop_vars'
select_if(is.numeric) %>% # keep only numeric columns
gather("key", "value", -!!class_enq) %>% # go to long form
split(.$key) %>% # make a list of data frames
map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
geom_histogram() +
geom_density(alpha = .5) +
labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))

ggplot2 Facet Wrap Reorder by y-axis, Not x-axis

I want to plot faceted bar graphs and order them left-to-right from the largest to smallest values. I should be able to do this with code similar to this:
library(ggplot2)
ggplot(mpg, aes(reorder(cyl, -hwy), hwy)) +
geom_col() +
facet_wrap(~ manufacturer, scales = "free")
Instead what I get is ordering by the x-axis which happens to be 'cyl', smallest to largest values. How do I order descending, by the y-axis, so it looks like a Pareto chart? It has to be faceted as well. Thank you.
Here is a different approach that can be performed directly in ggplot utilizing two functions from here. I will use eipi10's example:
library(tidyverse)
mpg$hwy[mpg$manufacturer=="audi" & mpg$cyl==8] <- 40
dat <- mpg %>% group_by(manufacturer, cyl) %>%
summarise(hwy = mean(hwy)) %>%
arrange(desc(hwy)) %>%
mutate(cyl = factor(cyl, levels = cyl))
Functions:
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
new_x <- paste(x, within, sep = sep)
stats::reorder(new_x, by, FUN = fun)
}
scale_x_reordered <- function(..., sep = "___") {
reg <- paste0(sep, ".+$")
ggplot2::scale_x_discrete(labels = function(x) gsub(reg, "", x), ...)
}
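To see what these two functions do, here is a small sketch on made-up data (the toy data frame and its column names are invented for illustration). reorder_within builds one factor level per value-and-panel pair and orders the levels by the sorting variable; scale_x_reordered then strips the suffix for display.

```r
# Same definition as above, repeated so the snippet runs on its own.
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

toy <- data.frame(
  x     = c("a", "b", "a", "b"),
  value = c(1, 2, 4, 3),
  panel = c("g1", "g1", "g2", "g2")
)

f <- reorder_within(toy$x, toy$value, toy$panel)
levels(f)
#> [1] "a___g1" "b___g1" "b___g2" "a___g2"

# What the axis labels look like once the suffix is removed:
gsub("___.+$", "", levels(f))
#> [1] "a" "b" "b" "a"
```

Within g2 the level for "b" (value 3) sorts before the level for "a" (value 4), which is exactly the per-panel ordering that a single shared factor cannot express.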
plot:
ggplot(dat, aes(reorder_within(cyl, -hwy, manufacturer), y = hwy)) +
geom_col() +
scale_x_reordered() +
facet_wrap(~ manufacturer, scales = "free") +
theme(axis.title=element_blank())
For ascending order, use reorder_within(cyl, hwy, manufacturer) instead.
Plot without the functions:
ggplot(dat, aes(cyl, y = hwy)) +
geom_col() +
facet_wrap(~ manufacturer, scales = "free") +
theme(axis.title=element_blank())
If I understand your question, the goal is to plot the average highway mpg (the hwy column) by cyl for each manufacturer. Within each manufacturer, you want to order the x-axis (the cyl values), by the mean hwy value for each cyl.
To do that, we need to create the plots separately for each manufacturer and then lay them out together. This is because we can't have different x-axis orderings (cyl orderings in this case) for different panels in the same plot. (UPDATE: I stand corrected. @missuse's answer links to functions written by David Robinson, based on a blog post by Tyler Rinker to vary the x-axis label order in facetted plots.) So, we'll create a list of plots and then lay them out together, as if they were facetted.
library(tidyverse)
library(egg)
Since in the real data, the mean value of hwy is always monotonically decreasing with increasing cyl, we'll create an artificially high hwy value for 8-cylinder Audis, just for illustration:
mpg$hwy[mpg$manufacturer=="audi" & mpg$cyl==8] = 40
Now we split the data by manufacturer so we can create a separate plot, and therefore a separate cyl ordering for each manufacturer. We'll use the map function to iterate over the manufacturers.
plot.list = split(mpg, mpg$manufacturer) %>%
map(function(dat) {
# Order cyl by mean(hwy)
dat = dat %>% group_by(manufacturer, cyl) %>%
summarise(hwy = mean(hwy)) %>%
arrange(desc(hwy)) %>%
mutate(cyl = factor(cyl, levels=cyl))
ggplot(dat, aes(cyl, hwy)) +
geom_col() +
facet_wrap(~ manufacturer) +
theme(axis.title=element_blank()) +
expand_limits(y=mpg %>%
group_by(manufacturer,cyl) %>%
mutate(hwy=mean(hwy)) %>%
pull(hwy) %>% max)
})
Now let's remove the y-axis values and ticks from the plot that won't be in the first column when we lay out the plots together:
num_cols = 5
plot.list[-seq(1,length(plot.list), num_cols)] =
lapply(plot.list[-seq(1,length(plot.list), num_cols)], function(p) {
p + theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
})
Finally, we lay out the plots. ggarrange from the egg package ensures that the panels all have the same width (otherwise the panels in the first column would be narrower than the others, due to space taken up by the y-axis labels).
ggarrange(plots=plot.list, left="Highway MPG", bottom="Cylinders", ncol=num_cols)
Note that the cyl values for audi are not in increasing order, showing that our reordering worked properly.

Don't drop zero count: dodged barplot

I am making a dodged barplot in ggplot2 and one grouping has a zero count that I want to display. I remember seeing this here a while back and figured that scale_x_discrete(drop=F) would work. It does not appear to work with dodged bars. How can I make the zero counts show?
For instance, (code below) in the plot below, type8~group4 has no examples. I would still like the plot to display the empty space for the zero count instead of eliminating the bar. How can I do this?
mtcars2 <- data.frame(type=factor(mtcars$cyl),
group=factor(mtcars$gear))
m2 <- ggplot(mtcars2, aes(x=type , fill=group))
p2 <- m2 + geom_bar(colour="black", position="dodge") +
scale_x_discrete(drop=F)
p2
Here's how you can do it without making summary tables first.
It did not work in my CRAN version (2.2.1) but in the latest development version of ggplot (2.2.1.900) I had no issues.
ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
geom_bar(position = position_dodge(preserve = "single"))
http://ggplot2.tidyverse.org/reference/position_dodge.html
Updated geom_bar() needs stat = "identity"
For what it's worth: The table of counts, dat, above contains NA. Sometimes, it is useful to have an explicit 0 instead; for instance, if the next step is to put counts above the bars. The following code does just that, although it's probably no simpler than Joran's. It involves two steps: get a crosstabulation of counts using dcast, then melt the table using melt, followed by ggplot() as usual.
library(ggplot2)
library(reshape2)
mtcars2 = data.frame(type=factor(mtcars$cyl), group=factor(mtcars$gear))
dat = dcast(mtcars2, type ~ group, fun.aggregate = length)
dat.melt = melt(dat, id.vars = "type", measure.vars = c("3", "4", "5"))
dat.melt
ggplot(dat.melt, aes(x = type,y = value, fill = variable)) +
geom_bar(stat = "identity", colour = "black", position = position_dodge(width = .8), width = 0.7) +
ylim(0, 14) +
geom_text(aes(label = value), position = position_dodge(width = .8), vjust = -0.5)
The only way I know of is to pre-compute the counts and add a dummy row:
library(plyr)
dat <- rbind(ddply(mtcars2,.(type,group),summarise,count = length(group)),c(8,4,NA))
ggplot(dat,aes(x = type,y = count,fill = group)) +
geom_bar(colour = "black",position = "dodge",stat = "identity")
I thought that using stat_bin(drop = FALSE,geom = "bar",...) instead would work, but apparently it does not.
I asked this same question, but I only wanted to use data.table, as it's a faster solution for much larger data sets. I included notes on the data so that those that are less experienced and want to understand why I did what I did can do so easily. Here is how I manipulated the mtcars data set:
library(data.table)
library(scales)
library(ggplot2)
mtcars <- data.table(mtcars)
mtcars$Cylinders <- as.factor(mtcars$cyl) # Creates new column with data from cyl called Cylinders as a factor. This allows ggplot2 to automatically use the name "Cylinders" and recognize that it's a factor
mtcars$Gears <- as.factor(mtcars$gear) # Just like above, but with gears to Gears
setkey(mtcars, Cylinders, Gears) # Set key for 2 different columns
mtcars <- mtcars[CJ(unique(Cylinders), unique(Gears)), .N, by = .EACHI, allow.cartesian = TRUE] # Uses CJ to create a completed list of all unique combinations of Cylinders and Gears. Then counts how many of each combination there are (by = .EACHI gives one count per combination, 0 where there is no match) and reports it in a column called "N"
And here is the call that produced the graph
ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) +
geom_bar(position="dodge", stat="identity") +
ylab("Count") + theme(legend.position="top") +
scale_x_discrete(drop = FALSE)
And it produces this graph:
Furthermore, if there is continuous data, like that in the diamonds data set (thanks to mnel):
library(data.table)
library(scales)
library(ggplot2)
diamonds <- data.table(diamonds) # I modified the diamonds data set in order to create gaps for illustrative purposes
setkey(diamonds, color, cut)
diamonds[J("E",c("Fair","Good")), carat := 0]
diamonds[J("G",c("Premium","Good","Fair")), carat := 0]
diamonds[J("J",c("Very Good","Fair")), carat := 0]
diamonds <- diamonds[carat != 0]
Then using CJ would work as well.
data <- data.table(diamonds)[,list(mean_carat = mean(carat)), keyby = c('cut', 'color')] # This step defines our data set as the combinations of cut and color that exist and their means. However, the problem with this is that it doesn't have all combinations possible
data <- data[CJ(unique(cut),unique(color))] # This functions exactly the same way as it did in the discrete example. It creates a complete list of all possible unique combinations of cut and color
ggplot(data, aes(color, mean_carat, fill=cut)) +
geom_bar(stat = "identity", position = "dodge") +
ylab("Mean Carat") + xlab("Color")
Giving us this graph:
Use count from dplyr and complete from tidyr to do this.
library(tidyverse)
mtcars %>%
mutate(
type = as.factor(cyl),
group = as.factor(gear)
) %>%
count(type, group) %>%
complete(type, group, fill = list(n = 0)) %>%
ggplot(aes(x = type, y = n, fill = group)) +
geom_bar(colour = "black", position = "dodge", stat = "identity")
You can exploit the feature of the table() function, which computes the number of occurrences of a factor for all its levels
# load plyr package to use ddply
library(plyr)
# compute the counts using ddply, including zero occurrences for some factor levels
df <- ddply(mtcars2, .(group), summarise,
types = as.numeric(names(table(type))),
counts = as.numeric(table(type)))
# plot the results
ggplot(df, aes(x = types, y = counts, fill = group)) +
geom_bar(stat='identity',colour="black", position="dodge")
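The same property of table() can be used without plyr. As a minimal base-R sketch, as.data.frame() on a two-way table returns one row per combination of factor levels, including the empty ones (the column is named Freq by default):

```r
library(ggplot2)

mtcars2 <- data.frame(type  = factor(mtcars$cyl),
                      group = factor(mtcars$gear))

# table() keeps every combination of factor levels, so empty cells become 0.
counts <- as.data.frame(table(type = mtcars2$type, group = mtcars2$group))

# Combinations that never occur in the data:
subset(counts, Freq == 0)

ggplot(counts, aes(x = type, y = Freq, fill = group)) +
  geom_col(colour = "black", position = "dodge")
```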
