ggExtra plot format: similar marginal plots for different plot dimensions - r

This question in regarding formatting plots produced using ggplot2 + ggExtra and is not related to any bug.
require(ggplot2)
#> Loading required package: ggplot2
require(ggExtra)
#> Loading required package: ggExtra
p1 <- ggplot(data = mpg,aes(x = cty,y = cty)) +
geom_point()+
xlab("City driving (miles/gallon)") +
ylab("City driving (miles/gallon)")
ggMarginal(p = p1,type= "boxplot")
The y-axis marginal plot in this chart is usually not similar to the x-axis marginal plot i.e. the width of the 2 boxplots are not similar. This problem become more acute when I change plot dimensions (in my case, using RStudio). Any suggestions how to make the width of the 2 boxplots similar while using different plot dimensions (width x height).
I face similar problems with other marginal plot type options provided by ggExtra package: histogram, density.

I suggest the axis_canvas function from the cowplot package. (Disclaimer: I'm the package author.) It requires a little more work, but it allows you to draw any marginals you want. And you can specify the size exactly, in output units (e.g. inch).
require(cowplot)
pmain <- ggplot(data = mpg, aes(x = cty, y = hwy)) +
geom_point() +
xlab("City driving (miles/gallon)") +
ylab("Highway driving (miles/gallon)")
xbox <- axis_canvas(pmain, axis = "x", coord_flip = TRUE) +
geom_boxplot(data = mpg, aes(y = cty, x = 1)) + coord_flip()
ybox <- axis_canvas(pmain, axis = "y") +
geom_boxplot(data = mpg, aes(y = hwy, x = 1))
p1 <- insert_xaxis_grob(pmain, xbox, grid::unit(1, "in"), position = "top")
p2 <- insert_yaxis_grob(p1, ybox, grid::unit(1, "in"), position = "right")
ggdraw(p2)
See how the boxplots retain their width/height in the following two images with different aspect ratios. (Unfortunately Stackoverflow rescales the images, so the effect is somewhat obscured, but you can see that the height of the top boxplot is always equal to the width of the side one.)
The second advantage is that because you can use full-blown ggplot2 for your marginal plots, you can draw anything you want, e.g. grouped box plots.
require(cowplot)
pmain <- ggplot(data = mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
geom_point() +
xlab("City driving (miles/gallon)") +
ylab("Highway driving (miles/gallon)") +
theme_minimal()
xbox <- axis_canvas(pmain, axis = "x", coord_flip = TRUE) +
geom_boxplot(data = mpg, aes(y = cty, x = factor(cyl), color = factor(cyl))) +
scale_x_discrete() + coord_flip()
ybox <- axis_canvas(pmain, axis = "y") +
geom_boxplot(data = mpg, aes(y = hwy, x = factor(cyl), color = factor(cyl))) +
scale_x_discrete()
p1 <- insert_xaxis_grob(pmain, xbox, grid::unit(1, "in"), position = "top")
p2 <- insert_yaxis_grob(p1, ybox, grid::unit(1, "in"), position = "right")
ggdraw(p2)

I'm not entirely sure what you mean. Setting width=height of your output plot ensures the same width of the boxplots.
For example, in RMarkdown, if I include
```{r, fig.width = 5, fig.height = 5}
ggMarginal(p1, type = "boxplot", size = 2);
```
I get the following output
The box widths are identical.
Alternatively, if you save your plot make sure to set the same width and height.
ggsave(file = "test.png", ggMarginal(p1, type = "boxplot", size = 2), width = 5, height = 5);

Related

plot_grid function removes axis breaks from ggbreak in plots

I'm struggling with a problem:
I created two volcano plots in ggplot2, but due to the fact that I had one outlier point in both plot, I need to add y axis break for better visualization.
The problem arises when I WANT TO plot both in the same page using plot_grid from cowplot::, because it visualizes the original plot without the breaks that I set.
p<- c1 %>%
ggplot(aes(x = avg_log2FC,
y = -log10(p_val_adj),
fill = gene_type,
size = gene_type,
alpha = gene_type)) +
geom_point(shape = 21, # Specify shape and colour as fixed local parameters
colour = "black") +
geom_hline(yintercept = 0,
linetype = "dashed") +
scale_fill_manual(values = cols) +
scale_size_manual(values = sizes) +
scale_alpha_manual(values = alphas) +
scale_x_continuous(limits=c(-1.5,1.5), breaks=seq(-1.5,1.5,0.5)) +
scale_y_continuous(limits=c(0,110),breaks=seq(0,110,25))+
labs(title = "Gene expression",
x = "log2(fold change)",
y = "-log10(adjusted P-value)",
colour = "Expression \nchange") +
theme_bw() + # Select theme with a white background
theme(panel.border = element_rect(colour = "black", fill = NA, size= 0.5),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank())
p1 <- p + scale_y_break(breaks = c(30, 100))
p1
p plot without breaks:
and p1 plot with breaks:
The same I did for the second plot. But this is the result using plot_grid(p1,p3, ncol = 2)
Can you help me understanding if I'm doing something wrong? or it is just a limitation of the package?
OP, it seems in that ggbreak is not compatible with functions that arrange multiple plots, as indicated in the documentation for the package here. There does seem to be a workaround via either print() (I didn't get this to work) or aplot::plot_list(...), which did work for me. Here's an example using built-in datasets.
# setting up the plots
library(ggplot2)
library(ggbreak)
library(cowplot)
p1 <-
ggplot(mtcars, aes(x=mpg, disp)) + geom_point() +
scale_y_break(c(200, 220))
p2 <-
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
geom_point() + scale_y_break(c(3.5, 3.7))
Plots p1 and p2 yield breaks in the y axis like you would expect, but plot_grid(p1,p2) results in the plots placed side-by-side without the y axis breaks.
The following does work to arrange the plots without disturbing the y axis breaks:
aplot::plot_list(p1,p2)

Add data distribution bars on a scatterplot [duplicate]

Is there a way of creating scatterplots with marginal histograms just like in the sample below in ggplot2? In Matlab it is the scatterhist() function and there exist equivalents for R as well. However, I haven't seen it for ggplot2.
I started an attempt by creating the single graphs but don't know how to arrange them properly.
require(ggplot2)
x<-rnorm(300)
y<-rt(300,df=2)
xy<-data.frame(x,y)
xhist <- qplot(x, geom="histogram") + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 5/16, axis.text.y = theme_blank(), axis.title.y=theme_blank(), background.colour="white")
yhist <- qplot(y, geom="histogram") + coord_flip() + opts(background.fill = "white", background.color ="black")
yhist <- yhist + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 16/5, axis.text.y = theme_blank(), axis.title.y=theme_blank() )
scatter <- qplot(x,y, data=xy) + scale_x_continuous(limits=c(min(x),max(x))) + scale_y_continuous(limits=c(min(y),max(y)))
none <- qplot(x,y, data=xy) + geom_blank()
and arranging them with the function posted here. But to make long story short: Is there a way of creating these graphs?
This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
This might be a bit late, but I decided to make a package (ggExtra) for this since it involved a bit of code and can be tedious to write. The package also tries to address some common issue such as ensuring that even if there is a title or the text is enlarged, the plots will still be inline with one another.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
The gridExtra package should work here. Start by making each of the ggplot objects:
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis texts, ticks make the plots drifted away from each other, so your plot will look ugly and inconsistent.
You can correct this by using some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK:
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
require(ggplot2); require(grid)
# make the basic plot object
ggplot(xy, aes(x, y)) +
# set the locations of the x-axis labels as Tukey's five numbers
scale_x_continuous(limit=c(min(x), max(x)),
breaks=round(fivenum(x),1)) +
# ditto for y-axis labels
scale_y_continuous(limit=c(min(y), max(y)),
breaks=round(fivenum(y),1)) +
# specify points
geom_point() +
# specify that we want the rug plot
geom_rug(size=0.1) +
# improve the data/ink ratio
theme_set(theme_minimal(base_size = 18))
I tried those options, but wasn't satisfied by the results or the messy code one would need to use to get there. Lucky me, Thomas Lin Pedersen just developed a package called patchwork, which gets the job done in a pretty elegant manner.
If you want to create a scatterplot with marginal histograms, first you'd have to create those three plots seperately.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
plot1 <- ggplot(xy, aes(x = x, y = y)) +
geom_point()
dens1 <- ggplot(xy, aes(x = x)) +
geom_histogram(color = "black", fill = "white") +
theme_void()
dens2 <- ggplot(xy, aes(x = y)) +
geom_histogram(color = "black", fill = "white") +
theme_void() +
coord_flip()
The only thing left to do, is to add those plots with a simple + and specify the layout with the function plot_layout().
library(patchwork)
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(
ncol = 2,
nrow = 2,
widths = c(4, 1),
heights = c(1, 4)
)
The function plot_spacer() adds an empty plot to the top right corner. All the other arguments should be self-explanatory.
Since histograms heavily depend on the chosen binwidth, one might argue to prefer density plots. With some small modifications one would get e.g. for eye tracking data a beautiful plot.
library(ggpubr)
plot1 <- ggplot(df, aes(x = Density, y = Face_sum, color = Group)) +
geom_point(aes(color = Group), size = 3) +
geom_point(shape = 1, color = "black", size = 3) +
stat_smooth(method = "lm", fullrange = TRUE) +
geom_rug() +
scale_y_continuous(name = "Number of fixated faces",
limits = c(0, 205), expand = c(0, 0)) +
scale_x_continuous(name = "Population density (lg10)",
limits = c(1, 4), expand = c(0, 0)) +
theme_pubr() +
theme(legend.position = c(0.15, 0.9))
dens1 <- ggplot(df, aes(x = Density, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df, aes(x = Face_sum, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
Though the data is not provided at this point, the underlying principles should be clear.
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)
I've found the package (ggpubr) that seems to work very well for this problem and it considers several possibilities to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, it mentions in relation with ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
Iris set marginal histograms scatterplot
You can easily create attractive scatterplots with marginal histograms using ggstatsplot (it will also fit and describe a model):
data(iris)
library(ggstatsplot)
ggscatterstats(
data = iris,
x = Sepal.Length,
y = Sepal.Width,
xlab = "Sepal Length",
ylab = "Sepal Width",
marginal = TRUE,
marginal.type = "histogram",
centrality.para = "mean",
margins = "both",
title = "Relationship between Sepal Length and Sepal Width",
messages = FALSE
)
Or slightly more appealing (by default) ggpubr:
devtools::install_github("kassambara/ggpubr")
library(ggpubr)
ggscatterhist(
iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", # comment out this and last line to remove the split by species
margin.plot = "histogram", # I'd suggest removing this line to get density plots
margin.params = list(fill = "Species", color = "black", size = 0.2)
)
UPDATE:
As suggested by #aickley I used the developmental version to create the plot.
To build on the answer by #alf-pascu, setting up each plot manually and arranging them with cowplot grants a lot of flexibility with respect to both the main and the marginal plots (compared to some of the other solutions). Distributions by groups is one example. Changing the main plot to a 2D-density plot is another.
The following creates a scatterplot with (properly aligned) marginal histograms.
library("ggplot2")
library("cowplot")
# Set up scatterplot
scatterplot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point(size = 3, alpha = 0.6) +
guides(color = FALSE) +
theme(plot.margin = margin())
# Define marginal histogram
marginal_distribution <- function(x, var, group) {
ggplot(x, aes_string(x = var, fill = group)) +
geom_histogram(bins = 30, alpha = 0.4, position = "identity") +
# geom_density(alpha = 0.4, size = 0.1) +
guides(fill = FALSE) +
theme_void() +
theme(plot.margin = margin())
}
# Set up marginal histograms
x_hist <- marginal_distribution(iris, "Sepal.Length", "Species")
y_hist <- marginal_distribution(iris, "Sepal.Width", "Species") +
coord_flip()
# Align histograms with scatterplot
aligned_x_hist <- align_plots(x_hist, scatterplot, align = "v")[[1]]
aligned_y_hist <- align_plots(y_hist, scatterplot, align = "h")[[1]]
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, scatterplot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
To plot a 2D-density plot instead, just change the main plot.
# Set up 2D-density plot
contour_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
stat_density_2d(aes(alpha = ..piece..)) +
guides(color = FALSE, alpha = FALSE) +
theme(plot.margin = margin())
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, contour_plot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
This is an old question, but I thought it would be useful to post an update here since I've come across this same problem recently (thanks to Stefanie Mueller for the help!).
The most upvoted answer using gridExtra works, but aligning axes is difficult/hacky, as has been pointed out in the comments. This can now be solved using the command ggMarginal from the ggExtra package, as such:
#load packages
library(tidyverse) #for creating dummy dataset only
library(ggExtra)
#create dummy data
a = round(rnorm(1000,mean=10,sd=6),digits=0)
b = runif(1000,min=1.0,max=1.6)*a
b = b+runif(1000,min=9,max=15)
DummyData <- data.frame(var1 = b, var2 = a) %>%
filter(var1 > 0 & var2 > 0)
#plot
p = ggplot(DummyData, aes(var1, var2)) + geom_point(alpha=0.3)
ggMarginal(p, type = "histogram")
Another solution using ggpubr and cowplot, but here we create plots using cowplot::axis_canvas and add them to original plot with cowplot::insert_xaxis_grob:
library(cowplot)
library(ggpubr)
# Create main plot
plot_main <- ggplot(faithful, aes(eruptions, waiting)) +
geom_point()
# Create marginal plots
# Use geom_density/histogram for whatever you plotted on x/y axis
plot_x <- axis_canvas(plot_main, axis = "x") +
geom_density(aes(eruptions), faithful)
plot_y <- axis_canvas(plot_main, axis = "y", coord_flip = TRUE) +
geom_density(aes(waiting), faithful) +
coord_flip()
# Combine all plots into one
plot_final <- insert_xaxis_grob(plot_main, plot_x, position = "top")
plot_final <- insert_yaxis_grob(plot_final, plot_y, position = "right")
ggdraw(plot_final)
Nowadays, there is at least one CRAN package that makes the scatterplot with its marginal histograms.
library(psych)
scatterHist(rnorm(1000), runif(1000))
You can use the interactive form of ggExtra::ggMarginalGadget(yourplot) and choose between boxplots, violin plots, density plots and histograms whit easy.
like that

Marginal plots using axis_canvas in cowplot: How to insert gap between main panel and marginal plots

The following came up in a comment to this post: When making marginal plots with the axis_canvas() function in cowplot, how can we create a gap between the main plot and the marginal plot?
Example code:
require(cowplot)
pmain <- ggplot(data = mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
geom_point() +
xlab("City driving (miles/gallon)") +
ylab("Highway driving (miles/gallon)") +
theme_minimal()
xbox <- axis_canvas(pmain, axis = "x", coord_flip = TRUE) +
geom_boxplot(data = mpg, aes(y = cty, x = factor(cyl), color = factor(cyl))) +
scale_x_discrete() + coord_flip()
ybox <- axis_canvas(pmain, axis = "y") +
geom_boxplot(data = mpg, aes(y = hwy, x = factor(cyl), color = factor(cyl))) +
scale_x_discrete()
p1 <- insert_xaxis_grob(pmain, xbox, grid::unit(0.6, "in"), position = "top")
p2 <- insert_yaxis_grob(p1, ybox, grid::unit(0.6, "in"), position = "right")
ggdraw(p2)
As we can see in this example, the marginal boxplots directly touch the main plot panel. The goal is to generate some gap. How can this be done?
I see two options:
Insert empty plot
We can apply the insert_xaxis_grob() / insert_yaxis_grob() functions iteratively to insert multiple grobs, one of which can be empty. In this way, we can insert a specified amount of space on either side of the marginal plots. Here I'm showing how to do this on the inside, to generate a gap between the main panel and the marginal plots:
# pmain, xbox, ybox are defined as in the question
pnull <- ggdraw() # generate empty plot
p1 <- insert_xaxis_grob(
insert_xaxis_grob(pmain, xbox, grid::unit(0.6, "in"), position = "top"),
pnull, grid::unit(0.2, "in"), position = "top")
p2 <- insert_yaxis_grob(
insert_yaxis_grob(p1, ybox, grid::unit(0.6, "in"), position = "right"),
pnull, grid::unit(0.2, "in"), position = "right")
ggdraw(p2)
Create gap in the marginal plots
Alternatively, since the marginal plots are drawn with ggplot2, we can just specify axis limits that generate space in the appropriate location. I.e., instead of xbox and ybox in the original code, we define xbox2 and ybox2 via:
xbox2 <- axis_canvas(pmain, axis = "x", coord_flip = TRUE) +
geom_boxplot(data = mpg, aes(y = cty, x = as.numeric(factor(cyl)), color = factor(cyl))) +
scale_x_continuous(limits = c(-2, 4.5)) + coord_flip()
ybox2 <- axis_canvas(pmain, axis = "y") +
geom_boxplot(data = mpg, aes(y = hwy, x = as.numeric(factor(cyl)), color = factor(cyl))) +
scale_x_continuous(limits = c(-2, 4.5))
p1 <- insert_xaxis_grob(pmain, xbox2, grid::unit(0.8, "in"), position = "top")
p2 <- insert_yaxis_grob(p1, ybox2, grid::unit(0.8, "in"), position = "right")
ggdraw(p2)
To understand what is happening here, let's compare xbox and xbox2 side by side:
plot_grid(xbox + panel_border("black"),
xbox2 + panel_border("black"), nrow = 1, scale = 0.9)
We see how xbox2 (on the right) has extra space at the bottom, which was created by starting the axis at -2, even though the first box plot is located at position 1. More information on how to choose the axis ranges for these marginal plots can be found here.

box plot in R with additional point

I have a dataframe of multiple columns (let's say n) with different range and a vector of length n. I want different x-axis for each variable to be shown below each box plot. I tried facet_grid and facet_wrap but it gives common x axis.
This is what I have tried:
d <- data.frame(matrix(rnorm(10000), ncol = 20))
point_var <- rnorm(20)
plot.data <- gather(d, variable, value)
plot.data$test_data <- rep(point_var, each = nrow(d))
ggplot(plot.data, aes(x=variable, y=value)) +
geom_boxplot() +
geom_point(aes(x=factor(variable), y = test_data), color = "red") +
coord_flip() +
xlab("Variables") +
theme(legend.position="none")
If you can live with having the text of the x axis above the plot, and having the order of the graphs a bit messed-up this could work:
library(grid)
p = ggplot(plot.data, aes(x = 0, y=value)) +
geom_boxplot() +
geom_point(aes(x = 0, y = test_data), color = "red") +
facet_wrap(~variable, scales = "free_y", switch = "y") +
xlab("Variables") +
theme(legend.position="none") + theme_bw() + theme(axis.text.x=element_blank())
print(p, vp=viewport(angle=270, width = unit(.75, "npc"), height = unit(.75, "npc")))
I'm actually just creating the graph without flipping coords, so that scales = 'free_y' works, swithcing the position of the strip labels, and then rotating the graph.
If you don't like the text above graph (which is understandable), I would consider creating a list of single plots and then putting them together with grid.arrange.
HTH,
Lorenzo

Scatterplot with marginal histograms in ggplot2

Is there a way of creating scatterplots with marginal histograms just like in the sample below in ggplot2? In Matlab it is the scatterhist() function and there exist equivalents for R as well. However, I haven't seen it for ggplot2.
I started an attempt by creating the single graphs but don't know how to arrange them properly.
require(ggplot2)
x<-rnorm(300)
y<-rt(300,df=2)
xy<-data.frame(x,y)
xhist <- qplot(x, geom="histogram") + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 5/16, axis.text.y = theme_blank(), axis.title.y=theme_blank(), background.colour="white")
yhist <- qplot(y, geom="histogram") + coord_flip() + opts(background.fill = "white", background.color ="black")
yhist <- yhist + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 16/5, axis.text.y = theme_blank(), axis.title.y=theme_blank() )
scatter <- qplot(x,y, data=xy) + scale_x_continuous(limits=c(min(x),max(x))) + scale_y_continuous(limits=c(min(y),max(y)))
none <- qplot(x,y, data=xy) + geom_blank()
and arranging them with the function posted here. But to make long story short: Is there a way of creating these graphs?
This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
This might be a bit late, but I decided to make a package (ggExtra) for this since it involved a bit of code and can be tedious to write. The package also tries to address some common issue such as ensuring that even if there is a title or the text is enlarged, the plots will still be inline with one another.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
The gridExtra package should work here. Start by making each of the ggplot objects:
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis texts, ticks make the plots drifted away from each other, so your plot will look ugly and inconsistent.
You can correct this by using some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK:
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
require(ggplot2); require(grid)
# make the basic plot object
ggplot(xy, aes(x, y)) +
# set the locations of the x-axis labels as Tukey's five numbers
scale_x_continuous(limit=c(min(x), max(x)),
breaks=round(fivenum(x),1)) +
# ditto for y-axis labels
scale_y_continuous(limit=c(min(y), max(y)),
breaks=round(fivenum(y),1)) +
# specify points
geom_point() +
# specify that we want the rug plot
geom_rug(size=0.1) +
# improve the data/ink ratio
theme_set(theme_minimal(base_size = 18))
I tried those options, but wasn't satisfied by the results or the messy code one would need to use to get there. Lucky me, Thomas Lin Pedersen just developed a package called patchwork, which gets the job done in a pretty elegant manner.
If you want to create a scatterplot with marginal histograms, first you'd have to create those three plots seperately.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
plot1 <- ggplot(xy, aes(x = x, y = y)) +
geom_point()
dens1 <- ggplot(xy, aes(x = x)) +
geom_histogram(color = "black", fill = "white") +
theme_void()
dens2 <- ggplot(xy, aes(x = y)) +
geom_histogram(color = "black", fill = "white") +
theme_void() +
coord_flip()
The only thing left to do, is to add those plots with a simple + and specify the layout with the function plot_layout().
library(patchwork)
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(
ncol = 2,
nrow = 2,
widths = c(4, 1),
heights = c(1, 4)
)
The function plot_spacer() adds an empty plot to the top right corner. All the other arguments should be self-explanatory.
Since histograms heavily depend on the chosen binwidth, one might argue to prefer density plots. With some small modifications one would get e.g. for eye tracking data a beautiful plot.
library(ggpubr)
plot1 <- ggplot(df, aes(x = Density, y = Face_sum, color = Group)) +
geom_point(aes(color = Group), size = 3) +
geom_point(shape = 1, color = "black", size = 3) +
stat_smooth(method = "lm", fullrange = TRUE) +
geom_rug() +
scale_y_continuous(name = "Number of fixated faces",
limits = c(0, 205), expand = c(0, 0)) +
scale_x_continuous(name = "Population density (lg10)",
limits = c(1, 4), expand = c(0, 0)) +
theme_pubr() +
theme(legend.position = c(0.15, 0.9))
dens1 <- ggplot(df, aes(x = Density, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df, aes(x = Face_sum, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
Though the data is not provided at this point, the underlying principles should be clear.
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)
I've found the package (ggpubr) that seems to work very well for this problem and it considers several possibilities to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, it mentions in relation with ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
Iris set marginal histograms scatterplot
You can easily create attractive scatterplots with marginal histograms using ggstatsplot (it will also fit and describe a model):
data(iris)
library(ggstatsplot)
ggscatterstats(
data = iris,
x = Sepal.Length,
y = Sepal.Width,
xlab = "Sepal Length",
ylab = "Sepal Width",
marginal = TRUE,
marginal.type = "histogram",
centrality.para = "mean",
margins = "both",
title = "Relationship between Sepal Length and Sepal Width",
messages = FALSE
)
Or slightly more appealing (by default) ggpubr:
devtools::install_github("kassambara/ggpubr")
library(ggpubr)
ggscatterhist(
iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", # comment out this and last line to remove the split by species
margin.plot = "histogram", # I'd suggest removing this line to get density plots
margin.params = list(fill = "Species", color = "black", size = 0.2)
)
UPDATE:
As suggested by #aickley I used the developmental version to create the plot.
To build on the answer by #alf-pascu, setting up each plot manually and arranging them with cowplot grants a lot of flexibility with respect to both the main and the marginal plots (compared to some of the other solutions). Distributions by groups is one example. Changing the main plot to a 2D-density plot is another.
The following creates a scatterplot with (properly aligned) marginal histograms.
library("ggplot2")
library("cowplot")
# Set up scatterplot
scatterplot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point(size = 3, alpha = 0.6) +
guides(color = FALSE) +
theme(plot.margin = margin())
# Define marginal histogram
marginal_distribution <- function(x, var, group) {
ggplot(x, aes_string(x = var, fill = group)) +
geom_histogram(bins = 30, alpha = 0.4, position = "identity") +
# geom_density(alpha = 0.4, size = 0.1) +
guides(fill = FALSE) +
theme_void() +
theme(plot.margin = margin())
}
# Set up marginal histograms
x_hist <- marginal_distribution(iris, "Sepal.Length", "Species")
y_hist <- marginal_distribution(iris, "Sepal.Width", "Species") +
coord_flip()
# Align histograms with scatterplot
aligned_x_hist <- align_plots(x_hist, scatterplot, align = "v")[[1]]
aligned_y_hist <- align_plots(y_hist, scatterplot, align = "h")[[1]]
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, scatterplot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
To plot a 2D-density plot instead, just change the main plot.
# Set up 2D-density plot
contour_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
stat_density_2d(aes(alpha = ..piece..)) +
guides(color = FALSE, alpha = FALSE) +
theme(plot.margin = margin())
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, contour_plot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
This is an old question, but I thought it would be useful to post an update here since I've come across this same problem recently (thanks to Stefanie Mueller for the help!).
The most upvoted answer using gridExtra works, but aligning axes is difficult/hacky, as has been pointed out in the comments. This can now be solved using the command ggMarginal from the ggExtra package, as such:
#load packages
library(tidyverse) #for creating dummy dataset only
library(ggExtra)
#create dummy data
a = round(rnorm(1000,mean=10,sd=6),digits=0)
b = runif(1000,min=1.0,max=1.6)*a
b = b+runif(1000,min=9,max=15)
DummyData <- data.frame(var1 = b, var2 = a) %>%
filter(var1 > 0 & var2 > 0)
#plot
p = ggplot(DummyData, aes(var1, var2)) + geom_point(alpha=0.3)
ggMarginal(p, type = "histogram")
Another solution using ggpubr and cowplot, but here we create plots using cowplot::axis_canvas and add them to original plot with cowplot::insert_xaxis_grob:
library(cowplot)
library(ggpubr)
# Create main plot
plot_main <- ggplot(faithful, aes(eruptions, waiting)) +
geom_point()
# Create marginal plots
# Use geom_density/histogram for whatever you plotted on x/y axis
plot_x <- axis_canvas(plot_main, axis = "x") +
geom_density(aes(eruptions), faithful)
plot_y <- axis_canvas(plot_main, axis = "y", coord_flip = TRUE) +
geom_density(aes(waiting), faithful) +
coord_flip()
# Combine all plots into one
plot_final <- insert_xaxis_grob(plot_main, plot_x, position = "top")
plot_final <- insert_yaxis_grob(plot_final, plot_y, position = "right")
ggdraw(plot_final)
Nowadays, there is at least one CRAN package that makes the scatterplot with its marginal histograms.
library(psych)
scatterHist(rnorm(1000), runif(1000))
You can use the interactive form of ggExtra::ggMarginalGadget(yourplot) and choose between boxplots, violin plots, density plots and histograms whit easy.
like that

Resources