Is there a way of creating scatterplots with marginal histograms just like in the sample below in ggplot2? In Matlab it is the scatterhist() function and there exist equivalents for R as well. However, I haven't seen it for ggplot2.
I started an attempt by creating the single graphs but don't know how to arrange them properly.
require(ggplot2)
x<-rnorm(300)
y<-rt(300,df=2)
xy<-data.frame(x,y)
# top histogram of x, with axis decorations removed
xhist <- qplot(x, geom="histogram") + scale_x_continuous(limits=c(min(x),max(x))) +
  theme(axis.text.x = element_blank(), axis.title.x = element_blank(), axis.ticks = element_blank(),
        aspect.ratio = 5/16, axis.text.y = element_blank(), axis.title.y = element_blank(),
        panel.background = element_rect(fill = "white"))
# right-hand histogram of y; after coord_flip() the x scale carries the y values
yhist <- qplot(y, geom="histogram") + coord_flip() +
  theme(panel.background = element_rect(fill = "white", colour = "black"))
yhist <- yhist + scale_x_continuous(limits=c(min(y),max(y))) +
  theme(axis.text.x = element_blank(), axis.title.x = element_blank(), axis.ticks = element_blank(),
        aspect.ratio = 16/5, axis.text.y = element_blank(), axis.title.y = element_blank())
scatter <- qplot(x,y, data=xy) + scale_x_continuous(limits=c(min(x),max(x))) + scale_y_continuous(limits=c(min(y),max(y)))
none <- qplot(x,y, data=xy) + geom_blank()
and arranging them with the function posted here. But to make a long story short: is there a way of creating these graphs?
This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
This might be a bit late, but I decided to make a package (ggExtra) for this since it involved a bit of code and can be tedious to write. The package also tries to address some common issues, such as ensuring that the plots stay in line with one another even if there is a title or the text is enlarged.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
The gridExtra package should work here. Start by making each of the ggplot objects:
library(ggplot2)
library(gridExtra)
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
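Note that the snippet above draws fresh random numbers for every panel, so the histograms do not describe the points in the scatterplot. A minimal sketch of the same layout with all panels fed from one data frame could look like this (the data frame xy and the seed are my own assumptions, not part of the answer):

```r
library(ggplot2)
library(gridExtra)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
xy <- data.frame(x = rnorm(300), y = rnorm(300))

# Marginal histograms built from the same columns as the scatterplot
hist_top   <- ggplot(xy, aes(x)) + geom_histogram(bins = 30)
hist_right <- ggplot(xy, aes(y)) + geom_histogram(bins = 30) + coord_flip()
scatter    <- ggplot(xy, aes(x, y)) + geom_point()
empty      <- ggplot() + theme_void()  # blank filler for the top-right cell

# grid.arrange() draws the 2 x 2 layout and invisibly returns it as a gtable
arranged <- grid.arrange(hist_top, empty, scatter, hist_right,
                         ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
```

The axes of the panels are still not guaranteed to line up exactly; this only makes sure the marginals and the scatterplot show the same data.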
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis text and ticks make the plots drift away from each other, so the combined plot will look ugly and inconsistent.
You can correct this by using some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK:
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
library(ggplot2)
# make the basic plot object
ggplot(xy, aes(x, y)) +
  # set the locations of the x-axis labels as Tukey's five numbers
  scale_x_continuous(limits = c(min(x), max(x)),
                     breaks = round(fivenum(x), 1)) +
  # ditto for y-axis labels
  scale_y_continuous(limits = c(min(y), max(y)),
                     breaks = round(fivenum(y), 1)) +
  # specify points
  geom_point() +
  # specify that we want the rug plot
  geom_rug(size = 0.1) +
  # improve the data/ink ratio
  theme_minimal(base_size = 18)
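For anyone who has not used fivenum() before, here is a small sketch of what it returns, using a toy vector that I made up for illustration (it is not part of the plot above):

```r
# Toy vector, made up for illustration
v <- 1:10

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(v)
names(five) <- c("min", "lower-hinge", "median", "upper-hinge", "max")
print(five)

# Rounded the same way as in the plot code, ready to be passed as axis breaks
axis_breaks <- round(five, 1)
print(axis_breaks)
```

These five values are what end up as the axis labels, which is why the labels are unevenly spaced even though the scale itself is linear.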
I tried those options, but wasn't satisfied by the results or the messy code one would need to use to get there. Lucky me, Thomas Lin Pedersen just developed a package called patchwork, which gets the job done in a pretty elegant manner.
If you want to create a scatterplot with marginal histograms, you first have to create those three plots separately.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
plot1 <- ggplot(xy, aes(x = x, y = y)) +
geom_point()
dens1 <- ggplot(xy, aes(x = x)) +
geom_histogram(color = "black", fill = "white") +
theme_void()
dens2 <- ggplot(xy, aes(x = y)) +
geom_histogram(color = "black", fill = "white") +
theme_void() +
coord_flip()
The only thing left to do is to add those plots together with a simple + and specify the layout with the function plot_layout().
library(patchwork)
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(
ncol = 2,
nrow = 2,
widths = c(4, 1),
heights = c(1, 4)
)
The function plot_spacer() adds an empty plot to the top right corner. All the other arguments should be self-explanatory.
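As an alternative to plot_spacer(), patchwork also accepts a textual layout through the design argument of plot_layout(), where # marks an empty cell. A rough sketch, assuming a reasonably recent patchwork version (the object names and the layout string are my own choices):

```r
library(ggplot2)
library(patchwork)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
xy <- data.frame(x = rnorm(300), y = rt(300, df = 2))

main  <- ggplot(xy, aes(x, y)) + geom_point()
top   <- ggplot(xy, aes(x)) + geom_histogram(bins = 30) + theme_void()
right <- ggplot(xy, aes(y)) + geom_histogram(bins = 30) + theme_void() + coord_flip()

# A = top histogram, B = scatterplot, C = right histogram, # = empty cell
design <- "
  AAAA#
  BBBBC
  BBBBC
  BBBBC
  BBBBC
"

# Plots fill the areas in the order they are added: top -> A, main -> B, right -> C
combined <- top + main + right + plot_layout(design = design)
combined
```

The letter grid gives roughly the same 4:1 proportions as the widths and heights arguments used above.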
Since histograms depend heavily on the chosen bin width, one might prefer density plots. With some small modifications one gets, e.g. for eye-tracking data, a beautiful plot.
library(ggpubr)
plot1 <- ggplot(df, aes(x = Density, y = Face_sum, color = Group)) +
geom_point(aes(color = Group), size = 3) +
geom_point(shape = 1, color = "black", size = 3) +
stat_smooth(method = "lm", fullrange = TRUE) +
geom_rug() +
scale_y_continuous(name = "Number of fixated faces",
limits = c(0, 205), expand = c(0, 0)) +
scale_x_continuous(name = "Population density (lg10)",
limits = c(1, 4), expand = c(0, 0)) +
theme_pubr() +
theme(legend.position = c(0.15, 0.9))
dens1 <- ggplot(df, aes(x = Density, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df, aes(x = Face_sum, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
Though the data is not provided at this point, the underlying principles should be clear.
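Since the eye-tracking data is not available, here is a minimal runnable sketch of the same idea using the built-in iris data instead. The choice of columns and the object names are my own assumptions, and the ggpubr theme and the regression lines are left out to keep it short:

```r
library(ggplot2)
library(patchwork)

# Main scatterplot, coloured by group
main <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  theme(legend.position = "bottom")

# Marginal density of the x variable, one curve per group
top <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.4) +
  theme_void() +
  theme(legend.position = "none")

# Marginal density of the y variable, flipped to sit on the right-hand side
right <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_density(alpha = 0.4) +
  theme_void() +
  theme(legend.position = "none") +
  coord_flip()

combined <- top + plot_spacer() + main + right +
  plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
combined
```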
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)
I've found the package (ggpubr) that seems to work very well for this problem and it considers several possibilities to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, it mentions in relation with ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
library(ggpubr)
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
(Figure: Iris set marginal histograms scatterplot)
You can easily create attractive scatterplots with marginal histograms using ggstatsplot (it will also fit and describe a model):
data(iris)
library(ggstatsplot)
ggscatterstats(
data = iris,
x = Sepal.Length,
y = Sepal.Width,
xlab = "Sepal Length",
ylab = "Sepal Width",
marginal = TRUE,
marginal.type = "histogram",
centrality.para = "mean",
margins = "both",
title = "Relationship between Sepal Length and Sepal Width",
messages = FALSE
)
Or slightly more appealing (by default) ggpubr:
devtools::install_github("kassambara/ggpubr")
library(ggpubr)
ggscatterhist(
iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", # comment out this and last line to remove the split by species
margin.plot = "histogram", # I'd suggest removing this line to get density plots
margin.params = list(fill = "Species", color = "black", size = 0.2)
)
UPDATE:
As suggested by @aickley, I used the development version to create the plot.
To build on the answer by @alf-pascu, setting up each plot manually and arranging them with cowplot grants a lot of flexibility with respect to both the main and the marginal plots (compared to some of the other solutions). Showing distributions by group is one example. Changing the main plot to a 2D-density plot is another.
The following creates a scatterplot with (properly aligned) marginal histograms.
library("ggplot2")
library("cowplot")
# Set up scatterplot
scatterplot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.6) +
  guides(color = "none") +
  theme(plot.margin = margin())
# Define marginal histogram; var and group are column names given as strings
marginal_distribution <- function(x, var, group) {
  ggplot(x, aes(x = .data[[var]], fill = .data[[group]])) +
    geom_histogram(bins = 30, alpha = 0.4, position = "identity") +
    # geom_density(alpha = 0.4, size = 0.1) +
    guides(fill = "none") +
    theme_void() +
    theme(plot.margin = margin())
}
# Set up marginal histograms
x_hist <- marginal_distribution(iris, "Sepal.Length", "Species")
y_hist <- marginal_distribution(iris, "Sepal.Width", "Species") +
coord_flip()
# Align histograms with scatterplot
aligned_x_hist <- align_plots(x_hist, scatterplot, align = "v")[[1]]
aligned_y_hist <- align_plots(y_hist, scatterplot, align = "h")[[1]]
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, scatterplot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
To plot a 2D-density plot instead, just change the main plot.
# Set up 2D-density plot
contour_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  stat_density_2d(aes(alpha = after_stat(piece))) +
  guides(color = "none", alpha = "none") +
  theme(plot.margin = margin())
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, contour_plot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
This is an old question, but I thought it would be useful to post an update here since I've come across this same problem recently (thanks to Stefanie Mueller for the help!).
The most upvoted answer using gridExtra works, but aligning axes is difficult/hacky, as has been pointed out in the comments. This can now be solved using the command ggMarginal from the ggExtra package, as such:
#load packages
library(tidyverse) #for creating dummy dataset only
library(ggExtra)
#create dummy data
a = round(rnorm(1000,mean=10,sd=6),digits=0)
b = runif(1000,min=1.0,max=1.6)*a
b = b+runif(1000,min=9,max=15)
DummyData <- data.frame(var1 = b, var2 = a) %>%
filter(var1 > 0 & var2 > 0)
#plot
p = ggplot(DummyData, aes(var1, var2)) + geom_point(alpha=0.3)
ggMarginal(p, type = "histogram")
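If the scatterplot is coloured by group, ggMarginal can also split the marginal plots by that group. A small sketch, assuming a ggExtra version that provides the groupColour and groupFill arguments (the iris columns are my own choice for illustration):

```r
library(ggplot2)
library(ggExtra)

# The colour aesthetic must be mapped in the main plot for the grouping to work
p_grouped <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  theme(legend.position = "bottom")

# Marginal densities, one per group, coloured and filled like the points
m <- ggMarginal(p_grouped, type = "density", groupColour = TRUE, groupFill = TRUE)
m
```

Swapping type = "density" for "histogram" should give grouped histograms instead, though overlapping bars tend to be harder to read than densities.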
Another solution using ggpubr and cowplot, but here we create plots using cowplot::axis_canvas and add them to original plot with cowplot::insert_xaxis_grob:
library(cowplot)
library(ggpubr)
# Create main plot
plot_main <- ggplot(faithful, aes(eruptions, waiting)) +
geom_point()
# Create marginal plots
# Use geom_density/histogram for whatever you plotted on x/y axis
plot_x <- axis_canvas(plot_main, axis = "x") +
geom_density(aes(eruptions), faithful)
plot_y <- axis_canvas(plot_main, axis = "y", coord_flip = TRUE) +
geom_density(aes(waiting), faithful) +
coord_flip()
# Combine all plots into one
plot_final <- insert_xaxis_grob(plot_main, plot_x, position = "top")
plot_final <- insert_yaxis_grob(plot_final, plot_y, position = "right")
ggdraw(plot_final)
Nowadays, there is at least one CRAN package that makes the scatterplot with its marginal histograms.
library(psych)
scatterHist(rnorm(1000), runif(1000))
You can use the interactive gadget ggExtra::ggMarginalGadget(yourplot) to choose between boxplots, violin plots, density plots and histograms with ease.
To make it clear, I am looking for a simple way of adding a 90-degree-rotated histogram or density plot whose x-axis aligns with the y-axis of the example plot given below.
library(ggplot2)
library(tibble)
x <- seq(100)
y <- rnorm(100)
my_data <- tibble(x = x, y = y)
ggplot(data = my_data, mapping = aes(x = x, y = y)) +
geom_line()
Created on 2019-01-28 by the reprex package (v0.2.1)
I'd try it with either geom_histogram or geom_density, the patchwork library, and dynamically setting limits to match the plots.
Rather than manually setting limits, get the range of y-values, set that as the limits in scale_y_continuous or scale_x_continuous as appropriate, and add some padding with expansion() (called expand_scale() in ggplot2 versions before 3.3.0). The first plot is the line plot, and the second and third are distribution plots, with the axes flipped. All have the scales set to match.
library(ggplot2)
library(tibble)
library(patchwork)
y_range <- range(my_data$y)
p1 <- ggplot(data = my_data, mapping = aes(x = x, y = y)) +
  geom_line() +
  scale_y_continuous(limits = y_range, expand = expansion(mult = 0.1))
p2_hist <- ggplot(my_data, aes(x = y)) +
  geom_histogram(binwidth = 0.2) +
  coord_flip() +
  scale_x_continuous(limits = y_range, expand = expansion(mult = 0.1))
p2_dens <- ggplot(my_data, aes(x = y)) +
  geom_density() +
  coord_flip() +
  scale_x_continuous(limits = y_range, expand = expansion(mult = 0.1))
patchwork allows you to simply add plots to each other, then add the plot_layout function where you can customize the layout.
p1 + p2_hist + plot_layout(nrow = 1)
p1 + p2_dens + plot_layout(nrow = 1)
I've generally seen these types of plots where the distribution is shown in a "marginal" plot, that is, set up to be secondary to the main (in this case, line) plot. The ggExtra package has a marginal plot, but it only seems to work where the main plot is a scatterplot.
To do this styling manually, I'm setting theme arguments on each plot inline as I pass them to plot_layout. I took off the axis markings from the histogram so its left side is clean, and shrunk the margins on the sides of the two plots that meet. In plot_layout, I'm scaling the widths so the histogram appears more in the margins of the line chart. The same could be done with the density plot.
(p1 +
theme(plot.margin = margin(r = 0, unit = "pt"))
) +
(p2_hist +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = margin(l = 0, unit = "pt"))
) +
plot_layout(nrow = 1, widths = c(1, 0.2))
Created on 2019-01-28 by the reprex package (v0.2.1)
You can try using geom_histogram or geom_density; however, it's a little bit complicated, as you have to rotate the axes for them (while keeping the original orientation for geom_line). I would use geom_violin (which is a density plot, but mirrored). If you want only a one-sided violin plot, you can use the custom geom_flat_violin geom. It was first posted by @David Robinson in his gists.
I used this geom in a different answer; however, I don't think this is a duplicate, as here you need to put it at the end of the plot and combine it with a different geom.
Final code is:
library(ggplot2)
ggplot(data.frame(x = seq(100), y = rnorm(100))) +
geom_flat_violin(aes(100, y), color = "red", fill = "red", alpha = 0.5, width = 10) +
geom_line(aes(x, y))
geom_flat_violin code:
library(dplyr)
"%||%" <- function(a, b) {
if (!is.null(a)) a else b
}
geom_flat_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
position = "dodge", trim = TRUE, scale = "area",
show.legend = NA, inherit.aes = TRUE, ...) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomFlatViolin,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
trim = trim,
scale = scale,
...
)
)
}
GeomFlatViolin <-
ggproto(
"GeomFlatViolin",
Geom,
setup_data = function(data, params) {
data$width <- data$width %||%
params$width %||% (resolution(data$x, FALSE) * 0.9)
# ymin, ymax, xmin, and xmax define the bounding rectangle for each group
data %>%
dplyr::group_by(.data = ., group) %>%
dplyr::mutate(
.data = .,
ymin = min(y),
ymax = max(y),
xmin = x,
xmax = x + width / 2
)
},
draw_group = function(data, panel_scales, coord)
{
# Find the points for the line to go all the way around
data <- base::transform(data,
xminv = x,
xmaxv = x + violinwidth * (xmax - x))
# Make sure it's sorted properly to draw the outline
newdata <-
base::rbind(
dplyr::arrange(.data = base::transform(data, x = xminv), y),
dplyr::arrange(.data = base::transform(data, x = xmaxv), -y)
)
# Close the polygon: set first and last point the same
# Needed for coord_polar and such
newdata <- rbind(newdata, newdata[1,])
ggplot2:::ggname("geom_flat_violin",
GeomPolygon$draw_panel(newdata, panel_scales, coord))
},
draw_key = draw_key_polygon,
default_aes = ggplot2::aes(
weight = 1,
colour = "grey20",
fill = "white",
size = 0.5,
alpha = NA,
linetype = "solid"
),
required_aes = c("x", "y")
)
You could use egg::ggarrange(). So basically what you want is this:
p <- ggplot(data=my_data, mapping=aes(x=x, y=y)) +
geom_line() + ylim(c(-2, 2))
q <- ggplot(data=my_data, mapping=aes(x=y)) +
geom_histogram(binwidth=.05) + coord_flip() + xlim(c(-2, 2))
egg::ggarrange(p, q, nrow=1)
Result
Data
set.seed(42)
my_data <- data.frame(x=seq(100), y=rnorm(100))
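library(ggplot2)
library(plyr) # for count(), which returns a freq column
library(grid) # for grid.draw()
my_data1 <- count(my_data, vars=c("y"))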
p1 <- ggplot(data = my_data, mapping = aes(x = x, y = y)) + geom_line()
p2 <- ggplot(my_data1,aes(x=freq,y=y))+geom_line()+theme(axis.title.y = element_blank(),axis.text.y = element_blank())
grid.draw(cbind(ggplotGrob(p1), ggplotGrob(p2), size = "last"))
Is there a way of creating scatterplots with marginal histograms just like in the sample below in ggplot2? In Matlab it is the scatterhist() function and there exist equivalents for R as well. However, I haven't seen it for ggplot2.
I started an attempt by creating the single graphs but don't know how to arrange them properly.
require(ggplot2)
x<-rnorm(300)
y<-rt(300,df=2)
xy<-data.frame(x,y)
xhist <- qplot(x, geom="histogram") + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 5/16, axis.text.y = theme_blank(), axis.title.y=theme_blank(), background.colour="white")
yhist <- qplot(y, geom="histogram") + coord_flip() + opts(background.fill = "white", background.color ="black")
yhist <- yhist + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 16/5, axis.text.y = theme_blank(), axis.title.y=theme_blank() )
scatter <- qplot(x,y, data=xy) + scale_x_continuous(limits=c(min(x),max(x))) + scale_y_continuous(limits=c(min(y),max(y)))
none <- qplot(x,y, data=xy) + geom_blank()
and arranging them with the function posted here. But to make a long story short: is there a way of creating these graphs?
This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
This might be a bit late, but I decided to make a package (ggExtra) for this since it involved a bit of code and can be tedious to write. The package also tries to address some common issues, such as ensuring that even if there is a title or the text is enlarged, the plots will still be in line with one another.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
The gridExtra package should work here. Start by making each of the ggplot objects:
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
library(gridExtra)
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis text and ticks make the plots drift away from each other, so your plot will look ugly and inconsistent.
You can correct this by using some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK:
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
require(ggplot2); require(grid)
# make the basic plot object
ggplot(xy, aes(x, y)) +
# set the locations of the x-axis labels as Tukey's five numbers
scale_x_continuous(limits=c(min(x), max(x)),
breaks=round(fivenum(x),1)) +
# ditto for y-axis labels
scale_y_continuous(limits=c(min(y), max(y)),
breaks=round(fivenum(y),1)) +
# specify points
geom_point() +
# specify that we want the rug plot
geom_rug(size=0.1) +
# improve the data/ink ratio
theme_minimal(base_size = 18)
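The suggestion to omit the grid lines and let only the labels carry the five-number summary could look roughly like this. This is a minimal sketch with its own simulated data; the names `x_breaks`, `y_breaks` and `p` are chosen here for illustration.

```r
library(ggplot2)

set.seed(1)
xy <- data.frame(x = rnorm(300), y = rt(300, df = 10))

# Tukey's five-number summaries, rounded for use as axis labels
x_breaks <- round(fivenum(xy$x), 1)
y_breaks <- round(fivenum(xy$y), 1)

p <- ggplot(xy, aes(x, y)) +
  geom_point() +
  geom_rug(alpha = 0.3) +
  scale_x_continuous(breaks = x_breaks) +
  scale_y_continuous(breaks = y_breaks) +
  theme_minimal(base_size = 18) +
  # drop all grid lines so uneven spacing no longer hints at a non-linear scale
  theme(panel.grid = element_blank())
p
```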
I tried those options, but wasn't satisfied by the results or the messy code one would need to use to get there. Lucky me, Thomas Lin Pedersen just developed a package called patchwork, which gets the job done in a pretty elegant manner.
If you want to create a scatterplot with marginal histograms, you first have to create those three plots separately.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
plot1 <- ggplot(xy, aes(x = x, y = y)) +
geom_point()
dens1 <- ggplot(xy, aes(x = x)) +
geom_histogram(color = "black", fill = "white") +
theme_void()
dens2 <- ggplot(xy, aes(x = y)) +
geom_histogram(color = "black", fill = "white") +
theme_void() +
coord_flip()
The only thing left to do is to add those plots with a simple + and specify the layout with the function plot_layout().
library(patchwork)
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(
ncol = 2,
nrow = 2,
widths = c(4, 1),
heights = c(1, 4)
)
The function plot_spacer() adds an empty plot to the top right corner. All the other arguments should be self-explanatory.
Since histograms depend heavily on the chosen binwidth, one might argue for density plots instead. With some small modifications one gets, e.g. for eye-tracking data, a beautiful plot.
library(ggpubr)
plot1 <- ggplot(df, aes(x = Density, y = Face_sum, color = Group)) +
geom_point(aes(color = Group), size = 3) +
geom_point(shape = 1, color = "black", size = 3) +
stat_smooth(method = "lm", fullrange = TRUE) +
geom_rug() +
scale_y_continuous(name = "Number of fixated faces",
limits = c(0, 205), expand = c(0, 0)) +
scale_x_continuous(name = "Population density (lg10)",
limits = c(1, 4), expand = c(0, 0)) +
theme_pubr() +
theme(legend.position = c(0.15, 0.9))
dens1 <- ggplot(df, aes(x = Density, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df, aes(x = Face_sum, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
Though the data is not provided at this point, the underlying principles should be clear.
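For a version that can be run as-is, here is a reduced sketch of the same layout using the built-in iris data in place of the eye-tracking data. The column choices and the names `main`, `dens_top`, `dens_right` and `layout` are stand-ins, not the original variables.

```r
library(ggplot2)
library(patchwork)

# Main scatterplot, coloured by group
main <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  theme(legend.position = "bottom")

# Grouped marginal density for the x variable (top panel)
dens_top <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.4) +
  theme_void() +
  theme(legend.position = "none")

# Grouped marginal density for the y variable (right panel), flipped
dens_right <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_density(alpha = 0.4) +
  theme_void() +
  theme(legend.position = "none") +
  coord_flip()

# Same 2 x 2 arrangement as above, with a spacer in the top-right corner
layout <- dens_top + plot_spacer() + main + dens_right +
  plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
layout
```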
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)
I've found the package (ggpubr) that seems to work very well for this problem and it considers several possibilities to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, it mentions in relation with ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
library(ggpubr)
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
Iris set marginal histograms scatterplot
You can easily create attractive scatterplots with marginal histograms using ggstatsplot (it will also fit and describe a model):
data(iris)
library(ggstatsplot)
ggscatterstats(
data = iris,
x = Sepal.Length,
y = Sepal.Width,
xlab = "Sepal Length",
ylab = "Sepal Width",
marginal = TRUE,
marginal.type = "histogram",
centrality.para = "mean",
margins = "both",
title = "Relationship between Sepal Length and Sepal Width",
messages = FALSE
)
Or slightly more appealing (by default) ggpubr:
devtools::install_github("kassambara/ggpubr")
library(ggpubr)
ggscatterhist(
iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", # comment out this and last line to remove the split by species
margin.plot = "histogram", # I'd suggest removing this line to get density plots
margin.params = list(fill = "Species", color = "black", size = 0.2)
)
UPDATE:
As suggested by @aickley, I used the development version to create the plot.
To build on the answer by @alf-pascu, setting up each plot manually and arranging them with cowplot grants a lot of flexibility with respect to both the main and the marginal plots (compared to some of the other solutions). Distributions by group are one example. Changing the main plot to a 2D-density plot is another.
The following creates a scatterplot with (properly aligned) marginal histograms.
library("ggplot2")
library("cowplot")
# Set up scatterplot
scatterplot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point(size = 3, alpha = 0.6) +
guides(color = "none") +
theme(plot.margin = margin())
# Define marginal histogram
marginal_distribution <- function(x, var, group) {
ggplot(x, aes_string(x = var, fill = group)) +
geom_histogram(bins = 30, alpha = 0.4, position = "identity") +
# geom_density(alpha = 0.4, size = 0.1) +
guides(fill = "none") +
theme_void() +
theme(plot.margin = margin())
}
# Set up marginal histograms
x_hist <- marginal_distribution(iris, "Sepal.Length", "Species")
y_hist <- marginal_distribution(iris, "Sepal.Width", "Species") +
coord_flip()
# Align histograms with scatterplot
aligned_x_hist <- align_plots(x_hist, scatterplot, align = "v")[[1]]
aligned_y_hist <- align_plots(y_hist, scatterplot, align = "h")[[1]]
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, scatterplot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
To plot a 2D-density plot instead, just change the main plot.
# Set up 2D-density plot
contour_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
stat_density_2d(aes(alpha = ..piece..)) +
guides(color = "none", alpha = "none") +
theme(plot.margin = margin())
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, contour_plot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
This is an old question, but I thought it would be useful to post an update here since I've come across this same problem recently (thanks to Stefanie Mueller for the help!).
The most upvoted answer using gridExtra works, but aligning axes is difficult/hacky, as has been pointed out in the comments. This can now be solved using the command ggMarginal from the ggExtra package, as such:
#load packages
library(tidyverse) #for creating dummy dataset only
library(ggExtra)
#create dummy data
a = round(rnorm(1000,mean=10,sd=6),digits=0)
b = runif(1000,min=1.0,max=1.6)*a
b = b+runif(1000,min=9,max=15)
DummyData <- data.frame(var1 = b, var2 = a) %>%
filter(var1 > 0 & var2 > 0)
#plot
p = ggplot(DummyData, aes(var1, var2)) + geom_point(alpha=0.3)
ggMarginal(p, type = "histogram")
Another solution using ggpubr and cowplot, but here we create plots using cowplot::axis_canvas and add them to original plot with cowplot::insert_xaxis_grob:
library(cowplot)
library(ggpubr)
# Create main plot
plot_main <- ggplot(faithful, aes(eruptions, waiting)) +
geom_point()
# Create marginal plots
# Use geom_density/histogram for whatever you plotted on x/y axis
plot_x <- axis_canvas(plot_main, axis = "x") +
geom_density(aes(eruptions), faithful)
plot_y <- axis_canvas(plot_main, axis = "y", coord_flip = TRUE) +
geom_density(aes(waiting), faithful) +
coord_flip()
# Combine all plots into one
plot_final <- insert_xaxis_grob(plot_main, plot_x, position = "top")
plot_final <- insert_yaxis_grob(plot_final, plot_y, position = "right")
ggdraw(plot_final)
Nowadays, there is at least one CRAN package that makes the scatterplot with its marginal histograms.
library(psych)
scatterHist(rnorm(1000), runif(1000))
You can use the interactive form, ggExtra::ggMarginalGadget(yourplot), and choose between boxplots, violin plots, density plots and histograms with ease.
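The gadget needs an interactive session. As a small sketch, the same marginal types can also be requested directly from ggMarginal() through its type argument. The mtcars plot here is only a placeholder for your own plot.

```r
library(ggplot2)
library(ggExtra)

# Placeholder scatterplot; substitute your own plot object
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Request the marginal type directly instead of picking it in the gadget
m_density <- ggMarginal(p, type = "density")
m_boxplot <- ggMarginal(p, type = "boxplot")
m_violin  <- ggMarginal(p, type = "violin")

# Printing the returned object draws the combined plot
m_boxplot
```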
I know that when you use par( fig=c( ... ), new=T ), you can create inset graphs. However, I was wondering if it is possible to use ggplot2 library to create 'inset' graphs.
UPDATE 1: I tried using the par() with ggplot2, but it does not work.
UPDATE 2: I found a working solution at ggplot2 GoogleGroups using grid::viewport().
Section 8.4 of the book explains how to do this. The trick is to use the grid package's viewports.
library(ggplot2)
library(grid) # provides viewport()
#Any old plot
a_plot <- ggplot(cars, aes(speed, dist)) + geom_line()
#A viewport taking up a fraction of the plot area
vp <- viewport(width = 0.4, height = 0.4, x = 0.8, y = 0.2)
#Just draw the plot twice
png("test.png")
print(a_plot)
print(a_plot, vp = vp)
dev.off()
Much simpler solution utilizing ggplot2 and egg. Most importantly this solution works with ggsave.
library(ggplot2)
library(egg)
plotx <- ggplot(mpg, aes(displ, hwy)) + geom_point()
plotx +
annotation_custom(
ggplotGrob(plotx),
xmin = 5, xmax = 7, ymin = 30, ymax = 44
)
ggsave(filename = "inset-plot.png")
Alternatively, you can use the cowplot R package by Claus O. Wilke (cowplot is a powerful extension of ggplot2). The author has an example of plotting an inset inside a larger graph in this intro vignette. Here is some adapted code:
library(cowplot)
main.plot <-
ggplot(data = mpg, aes(x = cty, y = hwy, colour = factor(cyl))) +
geom_point(size = 2.5)
inset.plot <- main.plot + theme(legend.position = "none")
plot.with.inset <-
ggdraw() +
draw_plot(main.plot) +
draw_plot(inset.plot, x = 0.07, y = .7, width = .3, height = .3)
# Can save the plot with ggsave()
ggsave(filename = "plot.with.inset.png",
plot = plot.with.inset,
width = 17,
height = 12,
units = "cm",
dpi = 300)
I prefer solutions that work with ggsave. After a lot of googling around I ended up with this, which is a general formula for positioning and sizing the plot that you insert.
library(tidyverse)
plot1 = qplot(1.00*mpg, 1.00*wt, data=mtcars) # Make sure x and y values are floating values in plot 1
plot2 = qplot(hp, cyl, data=mtcars)
plot(plot1)
# Specify position of plot2 (in percentages of plot1)
# This is in the top left and 25% width and 25% height
xleft = 0.05
xright = 0.30
ybottom = 0.70
ytop = 0.95
# Calculate position in plot1 coordinates
# Extract x and y values from plot1
l1 = ggplot_build(plot1)
# ggplot2 >= 3.0.0 stores these ranges in panel_params (formerly panel_ranges)
x1 = l1$layout$panel_params[[1]]$x.range[1]
x2 = l1$layout$panel_params[[1]]$x.range[2]
y1 = l1$layout$panel_params[[1]]$y.range[1]
y2 = l1$layout$panel_params[[1]]$y.range[2]
xdif = x2-x1
ydif = y2-y1
xmin = x1 + (xleft*xdif)
xmax = x1 + (xright*xdif)
ymin = y1 + (ybottom*ydif)
ymax = y1 + (ytop*ydif)
# Get plot2 and make grob
g2 = ggplotGrob(plot2)
plot3 = plot1 + annotation_custom(grob = g2, xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)
plot(plot3)
ggsave(filename = "test.png", plot = plot3)
# Try and make a weird combination of plots
g1 <- ggplotGrob(plot1)
g2 <- ggplotGrob(plot2)
g3 <- ggplotGrob(plot3)
library(gridExtra)
library(grid)
t1 = arrangeGrob(g1,ncol=1, left = textGrob("A", y = 1, vjust=1, gp=gpar(fontsize=20)))
t2 = arrangeGrob(g2,ncol=1, left = textGrob("B", y = 1, vjust=1, gp=gpar(fontsize=20)))
t3 = arrangeGrob(g3,ncol=1, left = textGrob("C", y = 1, vjust=1, gp=gpar(fontsize=20)))
final = arrangeGrob(t1,t2,t3, layout_matrix = cbind(c(1,2), c(3,3)))
grid.arrange(final)
ggsave(filename = "test2.png", plot = final)
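The positioning arithmetic above can be wrapped in a small function. This is only a sketch: `add_inset()` is a hypothetical helper, not part of ggplot2, and it assumes ggplot2 >= 3.0.0, a single panel, Cartesian coordinates and untransformed continuous scales.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): place `inset` inside `main`,
# with the position given as fractions (0 to 1) of the main panel's axis ranges
add_inset <- function(main, inset, xleft, xright, ybottom, ytop) {
  params <- ggplot_build(main)$layout$panel_params[[1]]
  xr <- params$x.range
  yr <- params$y.range
  main + annotation_custom(
    grob = ggplotGrob(inset),
    xmin = xr[1] + xleft   * diff(xr),
    xmax = xr[1] + xright  * diff(xr),
    ymin = yr[1] + ybottom * diff(yr),
    ymax = yr[1] + ytop    * diff(yr)
  )
}

main  <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
inset <- ggplot(mtcars, aes(hp, cyl)) + geom_point()

# Inset in the top-left corner, 25% of the panel width and height
with_inset <- add_inset(main, inset, 0.05, 0.30, 0.70, 0.95)
with_inset
```

Because the result is an ordinary ggplot object, it can be passed to ggsave() through the plot argument in the same way as plot3 above.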
'ggplot2' >= 3.0.0 makes new approaches to adding insets possible, because tibble objects containing list columns can now be passed as data. The objects in the list column can even be whole ggplots. The latest version of my package 'ggpmisc' provides geom_plot(), geom_table() and geom_grob(), as well as versions that use npc units instead of native data units for locating the insets. These geoms can add multiple insets per call and obey faceting, which annotation_custom() does not. I copy the example from the help page, which adds a zoomed-in detail of the main plot as an inset.
library(tibble)
library(ggpmisc)
p <-
ggplot(data = mtcars, mapping = aes(wt, mpg)) +
geom_point()
df <- tibble(x = 0.01, y = 0.01,
plot = list(p +
coord_cartesian(xlim = c(3, 4),
ylim = c(13, 16)) +
labs(x = NULL, y = NULL) +
theme_bw(10)))
p +
expand_limits(x = 0, y = 0) +
geom_plot_npc(data = df, aes(npcx = x, npcy = y, label = plot))
Or a barplot as inset, taken from the package vignette.
library(tibble)
library(ggpmisc)
p <- ggplot(mpg, aes(factor(cyl), hwy, fill = factor(cyl))) +
stat_summary(geom = "col", fun = mean, width = 2/3) +
labs(x = "Number of cylinders", y = NULL, title = "Means") +
scale_fill_discrete(guide = FALSE)
data.tb <- tibble(x = 7, y = 44,
plot = list(p +
theme_bw(8)))
ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
geom_plot(data = data.tb, aes(x, y, label = plot)) +
geom_point() +
labs(x = "Engine displacement (l)", y = "Fuel use efficiency (MPG)",
colour = "Engine cylinders\n(number)") +
theme_bw()
The next example shows how to add different inset plots to different panels in a faceted plot. It uses the same data, split by century. Once split, this particular data set has one level missing in one of the inset plots. As these plots are built on their own, we need to use manual scales to make sure the colors and fills are consistent across the plots. With other data sets this may not be needed.
library(tibble)
library(ggpmisc)
my.mpg <- mpg
my.mpg$century <- factor(ifelse(my.mpg$year < 2000, "XX", "XXI"))
my.mpg$cyl.f <- factor(my.mpg$cyl)
my_scale_fill <- scale_fill_manual(guide = FALSE,
values = c("red", "orange", "darkgreen", "blue"),
breaks = levels(my.mpg$cyl.f))
p1 <- ggplot(subset(my.mpg, century == "XX"),
aes(factor(cyl), hwy, fill = cyl.f)) +
stat_summary(geom = "col", fun = mean, width = 2/3) +
labs(x = "Number of cylinders", y = NULL, title = "Means") +
my_scale_fill
p2 <- ggplot(subset(my.mpg, century == "XXI"),
aes(factor(cyl), hwy, fill = cyl.f)) +
stat_summary(geom = "col", fun = mean, width = 2/3) +
labs(x = "Number of cylinders", y = NULL, title = "Means") +
my_scale_fill
data.tb <- tibble(x = c(7, 7),
y = c(44, 44),
century = factor(c("XX", "XXI")),
plot = list(p1, p2))
ggplot() +
geom_plot(data = data.tb, aes(x, y, label = plot)) +
geom_point(data = my.mpg, aes(displ, hwy, colour = cyl.f)) +
labs(x = "Engine displacement (l)", y = "Fuel use efficiency (MPG)",
colour = "Engine cylinders\n(number)") +
scale_colour_manual(guide = FALSE,
values = c("red", "orange", "darkgreen", "blue"),
breaks = levels(my.mpg$cyl.f)) +
facet_wrap(~century, ncol = 1)
In 2019, the patchwork package entered the stage, with which you can create insets easily by using the inset_element() function:
require(ggplot2)
require(patchwork)
gg1 = ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point()
gg2 = ggplot(iris, aes(Sepal.Length)) +
geom_density()
gg1 +
inset_element(gg2, left = 0.65, bottom = 0.75, right = 1, top = 1)