ggplot2: Put multi-variable facet_wrap labels on one line - r

I am using facet_wrap to split my scatter plot as
facet_wrap(x~y+z)
This generates 22 plots in my case, as desired. However, the label for each of those 22 plots is displayed in 3 rows (x, y and z), which unnecessarily consumes space in the window and squishes the plots into a small area. I would rather have bigger plots. Since the values of y and z are short, I would like to display them in the same row instead of two.
I looked into the labeller options but none of them seem to do what I would want. I would appreciate any suggestions here.

In this case you might also consider label_wrap_gen():
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p + facet_wrap(cyl ~ am + vs, labeller = label_wrap_gen(multi_line = FALSE))
For more details see also here and here.

I'm not sure how to do this with a labeller function, but another option is to create a grouping variable that combines all three of your categorical variables into a single variable that can be used for faceting. Here's an example using the built-in mtcars data frame and the dplyr package for creating the new grouping variable on the fly. Following that is an update with a function that allows dynamic choice of from one to three faceting variables.
library(ggplot2)
library(dplyr)
ggplot(mtcars %>% mutate(group = paste(cyl, am, vs, sep = "-")),
       aes(wt, mpg)) +
  geom_point() +
  facet_wrap(~group)
UPDATE: Regarding the comment about flexibility, the code below is a function that allows the user to enter the desired data frame and variable names, including dynamically choosing to facet on one, two, or three columns.
library(dplyr)
library(lazyeval)
mygg = function(dat, v1, v2, f1, f2=NA, f3=NA) {
dat = dat %>%
mutate_(group =
if (is.na(f2)) {
f1
} else if (is.na(f3)) {
interp(~paste(f1,f2, sep='-'), f1=as.name(f1), f2=as.name(f2))
} else {
interp(~paste(f1,f2,f3,sep='-'), f1=as.name(f1), f2=as.name(f2), f3=as.name(f3))
})
ggplot(dat, aes_string(v1,v2)) +
geom_point() +
facet_wrap(~group)
}
Now let's try out the function:
library(vcd) # For the Arthritis data frame
mygg(Arthritis, "ID","Age","Sex","Treatment","Improved")
mygg(mtcars, "wt","mpg","cyl","am")
mygg(iris, "Petal.Width","Petal.Length","Species")
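As a side note for readers on current package versions: mutate_() has been deprecated since dplyr 0.7, so the function above may emit warnings or stop working in newer releases. Below is a minimal sketch of the same idea without lazyeval. The names make_group() and mygg2() and the facet_vars argument are hypothetical (they are not part of any package), and I'm assuming a ggplot2 version that supports the .data pronoun inside aes().

```r
library(ggplot2)

# Hypothetical helper: paste the chosen columns together row by row, e.g. "6-1-0"
make_group <- function(dat, facet_vars) {
  do.call(paste, c(unname(as.list(dat[facet_vars])), sep = "-"))
}

# Hypothetical variant of mygg(): facet_vars is a character vector
# holding one or more column names to facet on
mygg2 <- function(dat, v1, v2, facet_vars) {
  dat$group <- make_group(dat, facet_vars)
  ggplot(dat, aes(x = .data[[v1]], y = .data[[v2]])) +
    geom_point() +
    facet_wrap(~group)
}

p1 <- mygg2(mtcars, "wt", "mpg", c("cyl", "am", "vs"))
p2 <- mygg2(iris, "Petal.Width", "Petal.Length", "Species")
```

Passing the faceting columns as a single character vector means the function is not limited to three columns, which is the main reason I would structure it this way.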

This was a top search result for me, so I am adding an answer with knowledge from 2022. ggplot2's labeller() function now has a .multi_line argument which, when FALSE, comma-separates the facet labels on a single line. This also works if you want to use a custom labeller.
library(tidyverse)
ggplot(mtcars, aes(wt,mpg)) +
geom_point() +
facet_wrap(~ cyl + gear + carb, labeller =
labeller(
cyl = ~ paste("Cylinder: ", .),
gear = ~ paste("Gear: ", .),
carb = ~ paste("Carb: ", .),
.multi_line = FALSE
)
)

Related

How can I print multiple (81) ggplots from a for loop?

I have a large, long dataset like this example:
df <- data.frame("Sample" = c("CM","PB","CM","PB"),"Compound" = c("Hydrogen","Hydrogen","Helium","Helium"), "Value" = c(8,3,3,2))
However, I have about 162 rows (81 sample/compound pairs).
I am trying to write a loop that prints an individual geom_col() plot for each compound, where
x=Sample
y=Value
so that there are 81 plots, one for each compound.
I think I am close with this loop:
I want i in "each compound" etc.
for (i in df$Compound){
print(ggplot(data = i),
aes(x=Sample,
y=Value))+
geom_col()
}
What am I missing from this loop? I have also tried facet_wrap(~Compound). However, it looks like 81 is too many panels and each plot is tiny once made. I am looking for a full-size bar graph of each compound.
Two issues with your code:
Your aes needs to be combined with ggplot(.) somehow, not as a second argument to print.
Your geom_col needs to be added to the ggplot(.) chain, not to print.
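I think then that your code should be structured like this (note that data = i is still just a compound name here; ggplot() needs a data frame, such as the subset df[df$Compound == i, ]):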
for (i in df$Compound){
print(
ggplot(data = i) +
aes(x = Sample, y = Value) +
geom_col()
)
}
A known-working example:
for (CYL in unique(mtcars$cyl)) {
print(
ggplot(subset(mtcars, cyl == CYL), aes(mpg, disp)) +
geom_point() +
labs(title = paste("cyl ==", CYL))
)
}
produces three plots (rapidly).
Note:
If you want a break, consider adding readline("next ...") after your print.
I tend to use gg <- ggplot(..) + ... ; print(gg) (instead of print(ggplot(.)+...)) mostly out of habit, but it can provide a little clarity in errors if/when they occur. It's minor and perhaps more technique than anything.
I think you can loop and pull out the selected data set for each index.
# unique() so that each compound is plotted once, not once per row
for (i in unique(df$Compound)) {
  print(ggplot(data = df[df$Compound == i, ],
               aes(x = Sample,
                   y = Value)) +
          geom_col())
}
(This code also fixes the problems/misplaced parentheses pointed out by @r2evans)
There are a variety of other ways to do this, e.g. split() the data frame by Compound, or something tidyverse-ish, or ...
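For what it's worth, here is a rough sketch of the split() route mentioned above, using the small example data frame from the question. The object names by_compound and plots are just picked for this example.

```r
library(ggplot2)

df <- data.frame(
  Sample   = c("CM", "PB", "CM", "PB"),
  Compound = c("Hydrogen", "Hydrogen", "Helium", "Helium"),
  Value    = c(8, 3, 3, 2)
)

# One data frame per compound, named after the compound
by_compound <- split(df, df$Compound)

# Build one full-size bar chart per compound
plots <- lapply(names(by_compound), function(nm) {
  ggplot(by_compound[[nm]], aes(x = Sample, y = Value)) +
    geom_col() +
    labs(title = nm)
})

# Draw them one after another
for (p in plots) print(p)
```

Keeping the plots in a list first also makes it easy to save them to files later instead of printing them.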

Box plots not appearing properly in RStudio

I am creating box plots in R; however, they are appearing incorrectly. My data is based on the German Credit dataset from Kaggle.
My code, testing two different attributes:
data %>%
ggplot(aes(x = Creditability, y = Purpose, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Purpose")
data %>%
ggplot(aes(x = Creditability, y = Account.Balance, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Account Balance")
I've tried a few of the different attributes, but they all result in the same error.
Edited info: Is it because the attributes have too much information? I have split the sample into test (300) vs train (700) and I am currently using train. Would it simply be because there's too much info?
Edit picture:
Factors
Edit for graph error:
Error
As others have explained in the comments, you cannot show boxplots where the y axis is set to be a factor. Factors are by their nature discrete variables, even if the levels are named as numbers. In order to utilize the stat function for the boxplot geom, you need the y axis to be continuous and the x axis to be discrete (or able to be separated into discrete values via the group= aesthetic).
Let me demonstrate with the mtcars dataset that ships with R:
library(ggplot2)
ggplot(mtcars, aes(x=factor(carb), y=mpg)) + geom_boxplot()
Here we can draw boxplots because the x aesthetic is forced to be discrete (via factor(carb)), while the y axis uses mpg, which is a numeric column in the mtcars dataset.
If you set both carb and mpg to be factors, you get something that should look pretty similar to what you're seeing:
ggplot(mtcars, aes(x=factor(carb), y=factor(mpg))) + geom_boxplot()
In your case, all the columns in your dataset are factors. If they are factors that can be coerced to numbers, you can turn them into continuous vectors using as.numeric(levels(column_name)[column_name]). Alternatively, you can use as.numeric(as.character(column_name)). Here's what it looks like to first convert the mtcars$mpg column to a factor of numeric values, and then back to plain numeric via this method.
df <- mtcars
# convert to a factor
df$mpg <- factor(df$mpg)
# back to numeric!
df$mpg <- as.numeric(levels(df$mpg)[df$mpg])
# this plot looks like it did before when we did the same with mtcars
ggplot(df, aes(x=factor(carb), y=mpg)) + geom_boxplot()
So, for your case, do this two-step process:
data$Purpose <- as.numeric(levels(data$Purpose)[data$Purpose])
data %>%
ggplot(aes(x = Creditability, y = Purpose, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Purpose")
That should work. You can follow in a similar fashion for your other variables.
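The reason as.numeric() alone is not enough: applied directly to a factor, it returns the internal integer codes of the levels rather than the numbers shown in the labels. A small illustration with a made-up factor (the values are invented for this example):

```r
# Made-up factor whose labels look like numbers
f <- factor(c("10", "20", "20", "5"))

codes   <- as.numeric(f)                # integer level codes, not the labels
via_chr <- as.numeric(as.character(f))  # the values shown in the labels
via_lev <- as.numeric(levels(f))[f]     # same result, converting each level only once

print(codes)
print(via_chr)
```

Comparing the two printed vectors shows why a boxplot built from the raw codes would be misleading even if it did draw.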

Iterate over a dataframe and create one plot for each column

I would like to iterate over a data frame and plot each column against a particular column such as price.
What I have done so far is:
for(i in ncol(dat.train)) {
ggplot(dat.train, aes(dat.train[[,i]],price)) + geom_point()
}
What I want is to have the first introduction to my data (Approximately 300 columns) by plotting against the decision variable (i.e., price)
I know that there is a similar question, though I cannot really understand why the above is not really working.
You can do this. I have used the mtcars data to plot the other continuous variables against mpg. You have to melt the data into long form (using gather) and then use ggplot to plot these continuous variables (disp, drat, qsec, etc.) against mpg. In your case you would use price instead of mpg and melt all the other continuous variables (like disp, drat, qsec, etc. here); the remaining categorical variables can optionally be used for shape, color, and so on.
library(tidyverse)
mtcars %>%
gather(-mpg, -hp, -cyl, key = "var", value = "value") %>%
ggplot(aes(x = value, y = mpg, color = hp, shape = factor(cyl))) +
geom_point() +
facet_wrap(~ var, scales = "free") +
theme_bw()
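A note for newer tidyr versions: gather() still works, but the tidyr documentation now lists it as superseded by pivot_longer(). A sketch of what I believe is the equivalent reshaping step (the object name long is just for this example):

```r
library(ggplot2)
library(tidyr)

# Keep mpg, hp and cyl as they are; stack every other column into var/value pairs
long <- pivot_longer(
  mtcars,
  cols      = -c(mpg, hp, cyl),
  names_to  = "var",
  values_to = "value"
)

p <- ggplot(long, aes(x = value, y = mpg, color = hp, shape = factor(cyl))) +
  geom_point() +
  facet_wrap(~var, scales = "free") +
  theme_bw()
```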
EDIT:
This is another solution in case we need separate graphs for each of the variables.
Create a list of variables like this: lyst <- list("disp","hp"). You can use the colnames function to get all the variable names. Use lapply to loop through all the elements of lyst on your data frame.
setwd("path")  # set the working directory here; this is where the file is saved
lyst <- list("disp", "hp")  # names of the columns to plot against mpg
pdf(file = "one.pdf")
# print() each plot explicitly so that every plot is written to its own page
invisible(lapply(lyst, function(i) print(ggplot(mtcars, aes_string(x = i, y = "mpg")) + geom_point())))
dev.off()
A PDF file containing all the graphs will be generated in the working directory that you set.
Output from the first solution:
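As for why the loop in the question does not work, as far as I can tell there are three separate problems: for (i in ncol(dat.train)) visits only the single value ncol(dat.train) instead of every column index, [[,i]] is not the way to pull out one column ([[i]] is), and a ggplot object created inside a for loop is not drawn unless it is printed explicitly. Below is a minimal sketch of a corrected loop. I'm using mtcars as a stand-in for dat.train and mpg as a stand-in for price, and the names target, predictors and plots are just for this example.

```r
library(ggplot2)

dat    <- mtcars  # stand-in for dat.train
target <- "mpg"   # stand-in for price

# Every column except the target
predictors <- setdiff(names(dat), target)

# One scatter plot per predictor, with the column looked up by name
plots <- lapply(predictors, function(col) {
  ggplot(dat, aes(x = .data[[col]], y = .data[[target]])) +
    geom_point() +
    labs(x = col, y = target)
})
names(plots) <- predictors

# Inside a loop, print() is needed for the plot to be drawn
for (p in plots) print(p)
```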

(Re)name factor levels (or include variable name) in ggplot2 facet_ call

One pattern I do a lot is to facet plots on cuts of numeric values. facet_wrap in ggplot2 doesn't allow you to call a function from within, so you have to create a temporary factor variable. This is okay using mutate from dplyr. The advantage of this is that you can play around doing EDA and varying the number of quantiles, or changing to set cut points etc. and view the changes in one line. The downside is that the facets are only labelled by the factor level; you have to know, for example, that it's a temperature. This isn't too bad for yourself, but even I get confused if I'm doing a facet_grid on two such variables and have to remember which is which. So, it's really nice to be able to relabel the facets by including a meaningful name.
The key point of this problem is that the levels will change as you change the number of quantiles etc.; you don't know what they are in advance. You could use the base levels() function, but that means augmenting the data frame with the cut variable, then calling levels(), then passing this augmented data frame to ggplot().
So, using plyr::mapvalues, we can wrap all this into a dplyr::mutate, but the required arguments for mapvalues() makes it quite clunky. Having to retype "Temp.f" many times is not very "dplyr"!
Is there a neater way of renaming such factor levels "on the fly"? I hope this description is clear enough and the code example below helps.
library(ggplot2)
library(plyr)
library(dplyr)
library(Hmisc)
df <- data.frame(Temp = seq(-100, 100, length.out = 1000), y = rnorm(1000))
# facet_wrap doesn't allow functions so have to create new, temporary factor
# variable Temp.f
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4))) + geom_histogram(aes(x = y)) + facet_wrap(~Temp.f)
# fine, but facet headers aren't very clear,
# we want to highlight that they are temperature
ggplot(df %>% mutate(Temp.f = paste0("Temp: ", cut2(Temp, g = 4)))) + geom_histogram(aes(x = y)) + facet_wrap(~Temp.f)
# use of paste0 is undesirable because it creates a character vector and
# facet_wrap then recodes the levels in the wrong numerical order
# This has the desired effect, but is very long!
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4), Temp.f = mapvalues(Temp.f, levels(Temp.f), paste0("Temp: ", levels(Temp.f))))) + geom_histogram(aes(x = y)) + facet_wrap(~Temp.f)
I think you can do this from within facet_wrap using a custom labeller function, like so:
myLabeller <- function(x){
lapply(x,function(y){
paste("Temp:", y)
})
}
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4))) +
geom_histogram(aes(x = y)) +
facet_wrap(~Temp.f
, labeller = myLabeller)
That labeller is clunky, but it is at least an example. You could write one for each variable that you are going to use (e.g. tempLabeller, yLabeller, etc.).
A slight tweak makes this even better: it automatically uses the name of the thing you are facetting on:
betterLabeller <- function(x){
lapply(names(x),function(y){
paste0(y,": ", x[[y]])
})
}
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4))) +
geom_histogram(aes(x = y)) +
facet_wrap(~Temp.f
, labeller = betterLabeller)
Okay, with thanks to Mark Peterson for pointing me towards the labeller argument/function, the exact answer I'm happy with is:
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4))) + geom_histogram(aes(x = y)) + facet_wrap(~Temp.f, labeller = labeller(Temp.f = label_both))
I'm a fan of lazy and "label_both" means I can simply create a meaningful temporary (or overwrite the original) variable column and both the name and the value are given. Rolling your own labeller function is more powerful, but using label_both is a good, easy option.
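On the ordering problem with paste0() mentioned in the question: one way around it, as far as I can tell, is to relabel the levels of the factor instead of pasting onto the values, which keeps the original level order. A small sketch; I'm using base cut() with quantile breaks as a stand-in for Hmisc::cut2(), so the exact interval labels will differ from the ones in the question.

```r
df <- data.frame(Temp = seq(-100, 100, length.out = 1000))

# Four quantile-based bins (stand-in for cut2(Temp, g = 4))
bins <- cut(
  df$Temp,
  breaks = quantile(df$Temp, probs = 0:4 / 4),
  include.lowest = TRUE
)

# Relabel the levels; the level order (and each row's level) stays the same
df$Temp.f <- factor(bins, levels = levels(bins), labels = paste0("Temp: ", levels(bins)))

print(levels(df$Temp.f))
```

Because Temp.f is still a factor, facet_wrap(~Temp.f) should lay the panels out in level order rather than in alphabetical order of the pasted strings.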

How can I overlay by-group plot elements to ggplot2 facets?

My question has to do with facetting. In my example code below, I look at some facetted scatterplots, then try to overlay information (in this case, mean lines) on a per-facet basis.
The tl;dr version is that my attempts fail. Either my added mean lines compute across all data (disrespecting the facet variable), or I try to write a formula and R throws an error, followed by incisive and particularly disparaging comments about my mother.
library(ggplot2)
# Let's pretend we're exploring the relationship between a car's weight and its
# horsepower, using some sample data
p <- ggplot()
p <- p + geom_point(aes(x = wt, y = hp), data = mtcars)
print(p)
# Hmm. A quick check of the data reveals that car weights can differ wildly, by almost
# a thousand pounds.
head(mtcars)
# Does the difference matter? It might, especially if most 8-cylinder cars are heavy,
# and most 4-cylinder cars are light. ColorBrewer to the rescue!
p <- p + aes(color = factor(cyl))
p <- p + scale_color_brewer(palette = "Set1")
print(p)
# At this point, what would be great is if we could more strongly visually separate
# the cars out by their engine blocks.
p <- p + facet_grid(~ cyl)
print(p)
# Ah! Now we can see (given the fixed scales) that the 4-cylinder cars flock to the
# left on weight measures, while the 8-cylinder cars flock right. But you know what
# would be REALLY awesome? If we could visually compare the means of the car groups.
p.with.means <- p + geom_hline(
aes(yintercept = mean(hp)),
data = mtcars
)
print(p.with.means)
# Wait, that's not right. That's not right at all. The green (8-cylinder) cars are all above the
# average for their group. Are they somehow made in an auto plant in Lake Wobegon, MN? Obviously,
# I meant to draw mean lines factored by GROUP. Except also obviously, since the code below will
# print an error, I don't know how.
p.with.non.lake.wobegon.means <- p + geom_hline(
aes(yintercept = mean(hp) ~ cyl),
data = mtcars
)
print(p.with.non.lake.wobegon.means)
There must be some simple solution I'm missing.
You mean something like this:
library(plyr)  # for ddply()
rs <- ddply(mtcars, .(cyl), summarise, mn = mean(hp))
p + geom_hline(data = rs, aes(yintercept = mn))
It might be possible to do this within the ggplot call using stat_*, but I'd have to go back and tinker a bit. But generally if I'm adding summaries to a faceted plot I calculate the summaries separately and then add them with their own geom.
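If you would rather not load plyr, the same per-group summary can be computed with base R's aggregate(). A minimal sketch; group_means and the column name mn are just names picked for this example:

```r
library(ggplot2)

# Mean horsepower per cylinder count: one row per facet
group_means <- aggregate(hp ~ cyl, data = mtcars, FUN = mean)
names(group_means)[names(group_means) == "hp"] <- "mn"

# The summary data frame contains cyl, so each line lands in its own facet
p_means <- ggplot(mtcars, aes(x = wt, y = hp, colour = factor(cyl))) +
  geom_point() +
  facet_grid(~cyl) +
  geom_hline(data = group_means, aes(yintercept = mn))
```

The important part is that the summary data keeps the faceting column (cyl here); that is what lets ggplot2 place each mean line in the matching panel.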
EDIT
Just a few expanded notes on your original attempt. Generally it's a good idea to put the aes calls that will persist throughout the plot in ggplot(), and then specify different data sets or aesthetics only in those geoms that differ from the 'base' plot. Then you don't need to keep specifying data = ... in each geom.
Finally, I came up with a kind of clever use of geom_smooth to do something similar to what you're asking:
p <- ggplot(data = mtcars,aes(x = wt, y = hp, colour = factor(cyl))) +
facet_grid(~cyl) +
geom_point() +
geom_smooth(se=FALSE,method="lm",formula=y~1,colour="black")
The horizontal line (i.e. constant regression eqn) will only extend to the limits of the data in each facet, but it skips the separate data summary step.
