I am trying to display some information about the data below the plot created in ggplot2. I would like to plot the N variable using the X axis coordinate of the plot, but the Y coordinate needs to be 10% from the bottom of the screen. In fact, the desired Y coordinates are already in the data frame as the y_pos variable.
I can think of 3 approaches using ggplot2:
1) Create an empty plot below the actual plot, use the same scale and then use geom_text to plot the data over the blank plot. This approach sort of works but is extremely complicated.
2) Use geom_text to plot the data but somehow use y coordinate as percent of the screen (10%). This would force the numbers to be displayed below the plot. I can't figure out the proper syntax.
3) Use grid.text to display the text. I can easily set it at 10% from the bottom of the screen, but I can't figure out how to set the X coordinate to match the plot. I tried to use grconvert to capture the initial X position but could not get that to work either.
Below is the basic plot with the dummy data:
graphics.off() # close graphics windows
library(car)
library(ggplot2) #load ggplot
library(gridExtra) #load Grid
library(RGraphics) # support of the "R graphics" book, on CRAN
#create dummy data
test= data.frame(
Group = c("A", "B", "A","B", "A", "B"),
x = c(1 ,1,2,2,3,3 ),
y = c(33,25,27,36,43,25),
n=c(71,55,65,58,65,58),
y_pos=c(9,6,9,6,9,6)
)
#create ggplot
p1 <- qplot(x, y, data=test, colour=Group) +
ylab("Mean change from baseline") +
geom_line()+
scale_x_continuous("Weeks", breaks=seq(-1,3, by = 1) ) +
opts(
legend.position=c(.1,0.9))
#display plot
p1
The modified ggplot below displays the numbers of subjects; however, they are displayed WITHIN the plot and force the Y scale to be extended. I would like to display these numbers BELOW the plot.
p1 <- qplot(x, y, data=test, colour=Group) +
ylab("Mean change from baseline") +
geom_line()+
scale_x_continuous("Weeks", breaks=seq(-1,3, by = 1) ) +
opts( plot.margin = unit(c(0,2,2,1), "lines"),
legend.position=c(.1,0.9))+
geom_text(data = test,aes(x=x,y=y_pos,label=n))
p1
A different approach of displaying the numbers involves creating a dummy plot below the actual plot. Here is the code:
graphics.off() # close graphics windows
library(car)
library(ggplot2) #load ggplot
library(gridExtra) #load Grid
library(RGraphics) # support of the "R graphics" book, on CRAN
#create dummy data
test= data.frame(
group = c("A", "B", "A","B", "A", "B"),
x = c(1 ,1,2,2,3,3 ),
y = c(33,25,27,36,43,25),
n=c(71,55,65,58,65,58),
y_pos=c(15,6,15,6,15,6)
)
p1 <- qplot(x, y, data=test, colour=group) +
ylab("Mean change from baseline") +
opts(plot.margin = unit(c(1,2,-1,1), "lines")) +
geom_line()+
scale_x_continuous("Weeks", breaks=seq(-1,3, by = 1) ) +
opts(legend.position="bottom",
legend.title=theme_blank(),
title.text="Line plot using GGPLOT")
p1
p2 <- qplot(x, y, data=test, geom="blank")+
ylab(" ")+
opts( plot.margin = unit(c(0,2,-2,1), "lines"),
axis.line = theme_blank(),
axis.ticks = theme_segment(colour = "white"),
axis.text.x=theme_text(angle=-90,colour="white"),
axis.text.y=theme_text(angle=-90,colour="white"),
panel.background = theme_rect(fill = "transparent",colour = NA),
panel.grid.minor = theme_blank(),
panel.grid.major = theme_blank()
)+
geom_text(data = test,aes(x=x,y=y_pos,label=n))
p2
grid.arrange(p1, p2, heights = c(8.5, 1.5), nrow=2 )
However, that is very complicated and would be hard to modify for different data. Ideally, I'd like to be able to pass Y coordinates as percent of the screen.
Current versions of ggplot2 (later than 2.1) support + labs(caption = "text"), which displays an annotation below the plot. The caption is themeable (font properties, left or right alignment, and so on). See https://github.com/hadley/ggplot2/pull/1582 for examples.
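As a minimal sketch of the caption approach, the block below builds one caption line per group from the question's dummy data and left-aligns it with theme(plot.caption = ...). The names caption_lines, caption_text and p_caption are made up for illustration. Note that a caption is plain text under the plot, so the counts are not aligned with the x positions of the visits.

```r
library(ggplot2)

# Dummy data in the shape used in the question
test <- data.frame(
  group = c("A", "B", "A", "B", "A", "B"),
  x     = c(1, 1, 2, 2, 3, 3),
  y     = c(33, 25, 27, 36, 43, 25),
  n     = c(71, 55, 65, 58, 65, 58)
)

# Collapse the counts of each group into one string, e.g. "71, 65, 65"
caption_lines <- vapply(
  split(test$n, test$group),
  function(n) paste(n, collapse = ", "),
  character(1)
)

# One caption line per group, separated by newlines
caption_text <- paste0(names(caption_lines), ": n = ", caption_lines,
                       collapse = "\n")

p_caption <- ggplot(test, aes(x, y, colour = group)) +
  geom_point() +
  geom_line() +
  labs(x = "Weeks", y = "Mean change from baseline", caption = caption_text) +
  theme(plot.caption = element_text(hjust = 0))  # left-align the caption

p_caption
```

If the numbers have to sit directly under each visit, one of the panel-based approaches in the other answers is probably a better fit.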
Edit: opts() has been deprecated and replaced by theme(); element_blank() has replaced theme_blank(); and ggtitle() is used in place of opts(title = ...)
Sandy, thank you so much! This does exactly what I want. I do wish we could control the clipping in geom_text or annotate.
I put together the following program if anybody else is interested.
rm(list = ls()) # clear objects
graphics.off() # close graphics windows
library(ggplot2)
library(gridExtra)
#create dummy data
test= data.frame(
group = c("Group 1", "Group 1", "Group 1","Group 2", "Group 2", "Group 2"),
x = c(1 ,2,3,1,2,3 ),
y = c(33,25,27,36,23,25),
n=c(71,55,65,58,65,58),
ypos=c(18,18,18,17,17,17)
)
p1 <- qplot(x=x, y=y, data=test, colour=group) +
ylab("Mean change from baseline") +
theme(plot.margin = unit(c(1,3,8,1), "lines")) +
geom_line()+
scale_x_continuous("Visits", breaks=seq(-1,3) ) +
theme(legend.position="bottom",
legend.title=element_blank())+
ggtitle("Line plot")
# Create the textGrobs
for (ii in 1:nrow(test))
{
#display numbers at each visit
p1=p1+ annotation_custom(grob = textGrob(test$n[ii]),
xmin = test$x[ii],
xmax = test$x[ii],
ymin = test$ypos[ii],
ymax = test$ypos[ii])
#display group text
if (ii %in% c(1,4)) #there is probably a better way
{
p1=p1+ annotation_custom(grob = textGrob(test$group[ii]),
xmin = 0.85,
xmax = 0.85,
ymin = test$ypos[ii],
ymax = test$ypos[ii])
}
}
# Code to override clipping
gt <- ggplot_gtable(ggplot_build(p1))
gt$layout$clip[gt$layout$name=="panel"] <- "off"
grid.draw(gt)
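In more recent ggplot2 releases (3.0.0 and later, as far as I know) clipping can be switched off directly with coord_cartesian(clip = "off"), so the loop and the gtable step may not be needed. The sketch below is one possible rewrite under that assumption: it draws all counts with a single geom_text layer and fixes the visible y range so that the labels fall below the panel. The ylim values, the object names group_labels and p_clip, and the legend at the top (to keep the space under the axis free) are my own choices, not part of the original program.

```r
library(ggplot2)

test <- data.frame(
  group = rep(c("Group 1", "Group 2"), each = 3),
  x     = c(1, 2, 3, 1, 2, 3),
  y     = c(33, 25, 27, 36, 23, 25),
  n     = c(71, 55, 65, 58, 65, 58),
  ypos  = c(18, 18, 18, 17, 17, 17)
)

# One row per group, used for the row labels left of the first visit
group_labels <- unique(test[, c("group", "ypos")])

p_clip <- ggplot(test, aes(x, y, colour = group)) +
  geom_point() +
  geom_line() +
  # All counts in one layer, positioned in data units below the panel
  geom_text(aes(y = ypos, label = n), colour = "black") +
  # Group names next to their row of counts
  geom_text(data = group_labels,
            aes(x = 0.85, y = ypos, label = group),
            colour = "black", inherit.aes = FALSE) +
  # Fix the visible y range so the labels lie outside the panel,
  # and turn clipping off so they are still drawn
  coord_cartesian(ylim = c(22, 37), clip = "off") +
  scale_x_continuous("Visits", breaks = 1:3) +
  ylab("Mean change from baseline") +
  ggtitle("Line plot") +
  theme(plot.margin = unit(c(1, 3, 8, 1), "lines"),
        legend.position = "top",
        legend.title = element_blank())

p_clip
```

The label positions are still in data units, so ypos and the bottom margin need adjusting together when the data range changes.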
Update: opts() has been replaced with theme().
In the code below, a base plot is drawn with a wider margin at the bottom. The textGrob is created, then inserted into the plot using annotation_custom(). At first the text is not visible because it lies outside the plot panel and the output is clipped to the panel. Using baptiste's code from here, the clipping can be overridden. The position is in data units, and both text labels are centred.
library(ggplot2)
library(grid)
# Base plot
df = data.frame(x=seq(1:10), y = seq(1:10))
p = ggplot(data = df, aes(x = x, y = y)) + geom_point() + ylim(0,10) +
theme(plot.margin = unit(c(1,1,3,1), "cm"))
p
# Create the textGrobs
Text1 = textGrob(paste("Largest x-value is", round(max(df$x), 2), sep = " "))
Text2 = textGrob(paste("Mean = ", mean(df$x), sep = ""))
p1 = p + annotation_custom(grob = Text1, xmin = 4, xmax = 4, ymin = -3, ymax = -3) +
annotation_custom(grob = Text2, xmin = 8, xmax = 8, ymin = -3, ymax = -3)
p1
# Code to override clipping
gt <- ggplotGrob(p1)
gt$layout$clip[gt$layout$name=="panel"] <- "off"
grid.draw(gt)
Or, using grid functions to create and position the label.
p
grid.text((paste("Largest x-value is", max(df$x), sep = " ")),
x = unit(.2, "npc"), y = unit(.1, "npc"), just = c("left", "bottom"),
gp = gpar(fontface = "bold", fontsize = 18, col = "blue"))
Edit
Or, add text grob using gtable functions.
library(ggplot2)
library(grid)
library(gtable)
# Base plot
df = data.frame(x=seq(1:10), y = seq(1:10))
p = ggplot(data = df, aes(x = x, y = y)) + geom_point() + ylim(0,10)
# Construct the text grob
lab = textGrob((paste("Largest x-value is", max(df$x), sep = " ")),
x = unit(.1, "npc"), just = c("left"),
gp = gpar(fontface = "bold", fontsize = 18, col = "blue"))
gp = ggplotGrob(p)
# Add a row below the 2nd from the bottom
gp = gtable_add_rows(gp, unit(2, "grobheight", lab), -2)
# Add 'lab' grob to that row, under the plot panel
gp = gtable_add_grob(gp, lab, t = -2, l = gp$layout[gp$layout$name == "panel",]$l)
grid.newpage()
grid.draw(gp)
Actually the best answer and easiest solution is to use the cowplot package.
Version 0.5.0 of the cowplot package (on CRAN) handles ggplot2 subtitles using the add_sub function.
Use it like so:
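library(ggplot2)
library(cowplot) # add_sub(), ggdraw()
library(scales) # log10_trans(), trans_new()
library(xkcd) # theme_xkcd(); needs the xkcd font to be installed
# cuberoot_trans() is not part of ggplot2 or scales; this definition is an assumption
cuberoot_trans <- function() trans_new("cuberoot", function(x) x^(1/3), function(x) x^3)
diamondsCubed <- ggplot(aes(carat, price), data = diamonds) +
geom_point() +
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
breaks = c(0.2, 0.5, 1, 2, 3)) +
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
breaks = c(350, 1000, 5000, 10000, 15000)) +
ggtitle('Price log10 by Cube-Root of Carat') +
theme_xkcd()
ggdraw(add_sub(diamondsCubed, "This is an annotation.\nAnnotations can span multiple lines."))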
Related
Is there a way of creating scatterplots with marginal histograms just like in the sample below in ggplot2? In Matlab it is the scatterhist() function and there exist equivalents for R as well. However, I haven't seen it for ggplot2.
I started an attempt by creating the single graphs but don't know how to arrange them properly.
require(ggplot2)
x<-rnorm(300)
y<-rt(300,df=2)
xy<-data.frame(x,y)
xhist <- qplot(x, geom="histogram") + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 5/16, axis.text.y = theme_blank(), axis.title.y=theme_blank(), background.colour="white")
yhist <- qplot(y, geom="histogram") + coord_flip() + opts(background.fill = "white", background.color ="black")
yhist <- yhist + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 16/5, axis.text.y = theme_blank(), axis.title.y=theme_blank() )
scatter <- qplot(x,y, data=xy) + scale_x_continuous(limits=c(min(x),max(x))) + scale_y_continuous(limits=c(min(y),max(y)))
none <- qplot(x,y, data=xy) + geom_blank()
and arranging them with the function posted here. But to make a long story short: is there a way of creating these graphs?
This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
This might be a bit late, but I decided to make a package (ggExtra) for this since it involved a bit of code and can be tedious to write. The package also tries to address some common issues, such as ensuring that the plots stay in line with one another even if there is a title or the text is enlarged.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
The gridExtra package should work here. Start by making each of the ggplot objects:
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
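In the snippet above each plot draws its own random numbers, so the histograms do not describe the points in the scatterplot. A minimal sketch of the same layout with one shared data frame could look like this. The names xy and layout, the seed, and the use of theme_void() for the empty corner are my own choices. arrangeGrob() builds the layout without drawing it, and grid.draw() then renders it.

```r
library(ggplot2)
library(gridExtra)

set.seed(42)  # arbitrary seed, only for reproducibility
xy <- data.frame(x = rnorm(100), y = rnorm(100))

# All three plots read from the same data frame
scatter    <- ggplot(xy, aes(x, y)) + geom_point()
hist_top   <- ggplot(xy, aes(x)) + geom_histogram(bins = 30)
hist_right <- ggplot(xy, aes(y)) + geom_histogram(bins = 30) + coord_flip()

# Blank placeholder for the top-right corner
empty <- ggplot() + theme_void()

# Build the 2 x 2 layout as a gtable, then draw it
layout <- arrangeGrob(hist_top, empty, scatter, hist_right,
                      ncol = 2, nrow = 2,
                      widths = c(4, 1), heights = c(1, 4))
grid::grid.newpage()
grid::grid.draw(layout)
```

The panels are still not guaranteed to line up exactly, because axis text and titles take different amounts of space in each plot.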
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis text, and ticks make the plots drift apart, so the result looks ugly and inconsistent.
You can correct this by using some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK:
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
require(ggplot2); require(grid)
# make the basic plot object
ggplot(xy, aes(x, y)) +
# set the locations of the x-axis labels as Tukey's five numbers
scale_x_continuous(limits=c(min(x), max(x)),
breaks=round(fivenum(x),1)) +
# ditto for y-axis labels
scale_y_continuous(limits=c(min(y), max(y)),
breaks=round(fivenum(y),1)) +
# specify points
geom_point() +
# specify that we want the rug plot
geom_rug(size=0.1) +
# improve the data/ink ratio
theme_minimal(base_size = 18)
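For reference, here is what fivenum() returns on a small vector, followed by the kind of rounded break values the plot above puts on its axes. This is base R only. The seed and the names vals, five and x_breaks are arbitrary and used just for this illustration.

```r
# Tukey's five-number summary on a small vector:
# minimum, lower hinge, median, upper hinge, maximum
vals <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
five <- fivenum(vals)
five
# 1 3 5 7 9

# The same summary, rounded to one decimal, used as axis breaks
set.seed(1)  # arbitrary seed, only for reproducibility
x <- rnorm(300)
x_breaks <- round(fivenum(x), 1)
x_breaks
```

Because the breaks are rounded, the outermost ones can land slightly outside limits set to min(x) and max(x), in which case ggplot2 may not draw them.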
I tried those options, but wasn't satisfied by the results or the messy code one would need to use to get there. Lucky me, Thomas Lin Pedersen just developed a package called patchwork, which gets the job done in a pretty elegant manner.
If you want to create a scatterplot with marginal histograms, you first have to create those three plots separately.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
plot1 <- ggplot(xy, aes(x = x, y = y)) +
geom_point()
dens1 <- ggplot(xy, aes(x = x)) +
geom_histogram(color = "black", fill = "white") +
theme_void()
dens2 <- ggplot(xy, aes(x = y)) +
geom_histogram(color = "black", fill = "white") +
theme_void() +
coord_flip()
The only thing left to do is to add those plots together with a simple + and specify the layout with the function plot_layout().
library(patchwork)
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(
ncol = 2,
nrow = 2,
widths = c(4, 1),
heights = c(1, 4)
)
The function plot_spacer() adds an empty plot to the top right corner. All the other arguments should be self-explanatory.
Since histograms depend heavily on the chosen bin width, one might prefer density plots. With some small modifications one gets, e.g. for eye-tracking data, a beautiful plot.
library(ggpubr)
plot1 <- ggplot(df, aes(x = Density, y = Face_sum, color = Group)) +
geom_point(aes(color = Group), size = 3) +
geom_point(shape = 1, color = "black", size = 3) +
stat_smooth(method = "lm", fullrange = TRUE) +
geom_rug() +
scale_y_continuous(name = "Number of fixated faces",
limits = c(0, 205), expand = c(0, 0)) +
scale_x_continuous(name = "Population density (lg10)",
limits = c(1, 4), expand = c(0, 0)) +
theme_pubr() +
theme(legend.position = c(0.15, 0.9))
dens1 <- ggplot(df, aes(x = Density, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df, aes(x = Face_sum, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
Though the data is not provided at this point, the underlying principles should be clear.
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)
I've found a package (ggpubr) that seems to work very well for this problem; it offers several ways to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, the tutorial says of ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
[Figure: Iris set marginal histograms scatterplot]
You can easily create attractive scatterplots with marginal histograms using ggstatsplot (it will also fit and describe a model):
data(iris)
library(ggstatsplot)
ggscatterstats(
data = iris,
x = Sepal.Length,
y = Sepal.Width,
xlab = "Sepal Length",
ylab = "Sepal Width",
marginal = TRUE,
marginal.type = "histogram",
centrality.para = "mean",
margins = "both",
title = "Relationship between Sepal Length and Sepal Width",
messages = FALSE
)
Or, slightly more appealing by default, ggpubr:
devtools::install_github("kassambara/ggpubr")
library(ggpubr)
ggscatterhist(
iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", # comment out this and last line to remove the split by species
margin.plot = "histogram", # I'd suggest removing this line to get density plots
margin.params = list(fill = "Species", color = "black", size = 0.2)
)
UPDATE:
As suggested by @aickley I used the developmental version to create the plot.
To build on the answer by @alf-pascu, setting up each plot manually and arranging them with cowplot grants a lot of flexibility with respect to both the main and the marginal plots (compared to some of the other solutions). Distributions by groups is one example. Changing the main plot to a 2D-density plot is another.
The following creates a scatterplot with (properly aligned) marginal histograms.
library("ggplot2")
library("cowplot")
# Set up scatterplot
scatterplot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point(size = 3, alpha = 0.6) +
guides(color = "none") +
theme(plot.margin = margin())
# Define marginal histogram
marginal_distribution <- function(x, var, group) {
ggplot(x, aes_string(x = var, fill = group)) +
geom_histogram(bins = 30, alpha = 0.4, position = "identity") +
# geom_density(alpha = 0.4, size = 0.1) +
guides(fill = "none") +
theme_void() +
theme(plot.margin = margin())
}
# Set up marginal histograms
x_hist <- marginal_distribution(iris, "Sepal.Length", "Species")
y_hist <- marginal_distribution(iris, "Sepal.Width", "Species") +
coord_flip()
# Align histograms with scatterplot
aligned_x_hist <- align_plots(x_hist, scatterplot, align = "v")[[1]]
aligned_y_hist <- align_plots(y_hist, scatterplot, align = "h")[[1]]
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, scatterplot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
To plot a 2D-density plot instead, just change the main plot.
# Set up 2D-density plot
contour_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
stat_density_2d(aes(alpha = ..piece..)) +
guides(color = "none", alpha = "none") +
theme(plot.margin = margin())
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, contour_plot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
This is an old question, but I thought it would be useful to post an update here since I've come across this same problem recently (thanks to Stefanie Mueller for the help!).
The most upvoted answer using gridExtra works, but aligning axes is difficult/hacky, as has been pointed out in the comments. This can now be solved using the command ggMarginal from the ggExtra package, as such:
#load packages
library(tidyverse) #for creating dummy dataset only
library(ggExtra)
#create dummy data
a = round(rnorm(1000,mean=10,sd=6),digits=0)
b = runif(1000,min=1.0,max=1.6)*a
b = b+runif(1000,min=9,max=15)
DummyData <- data.frame(var1 = b, var2 = a) %>%
filter(var1 > 0 & var2 > 0)
#plot
p = ggplot(DummyData, aes(var1, var2)) + geom_point(alpha=0.3)
ggMarginal(p, type = "histogram")
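For grouped data, newer ggExtra releases appear to accept groupColour and groupFill arguments in ggMarginal(), which draw one marginal distribution per group. The sketch below assumes an installed version that has these arguments and a plot with a colour aesthetic mapped. The object names p_grouped and m are made up for illustration.

```r
library(ggplot2)
library(ggExtra)

# Scatterplot with a colour aesthetic, which ggMarginal uses for grouping
p_grouped <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point(alpha = 0.6) +
  theme(legend.position = "bottom")

# One marginal density per species, coloured and filled by group
m <- ggMarginal(p_grouped, type = "density",
                groupColour = TRUE, groupFill = TRUE)
m
```

Keeping the legend at the bottom (or left) leaves the right-hand side free for the marginal plot.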
Another solution using ggpubr and cowplot, but here we create plots using cowplot::axis_canvas and add them to original plot with cowplot::insert_xaxis_grob:
library(cowplot)
library(ggpubr)
# Create main plot
plot_main <- ggplot(faithful, aes(eruptions, waiting)) +
geom_point()
# Create marginal plots
# Use geom_density/histogram for whatever you plotted on x/y axis
plot_x <- axis_canvas(plot_main, axis = "x") +
geom_density(aes(eruptions), faithful)
plot_y <- axis_canvas(plot_main, axis = "y", coord_flip = TRUE) +
geom_density(aes(waiting), faithful) +
coord_flip()
# Combine all plots into one
plot_final <- insert_xaxis_grob(plot_main, plot_x, position = "top")
plot_final <- insert_yaxis_grob(plot_final, plot_y, position = "right")
ggdraw(plot_final)
Nowadays, there is at least one CRAN package that makes the scatterplot with its marginal histograms.
library(psych)
scatterHist(rnorm(1000), runif(1000))
You can use the interactive gadget ggExtra::ggMarginalGadget(yourplot) and choose between boxplots, violin plots, density plots and histograms with ease.
Is there a way of creating scatterplots with marginal histograms just like in the sample below in ggplot2? In Matlab it is the scatterhist() function and there exist equivalents for R as well. However, I haven't seen it for ggplot2.
I started an attempt by creating the single graphs but don't know how to arrange them properly.
require(ggplot2)
x<-rnorm(300)
y<-rt(300,df=2)
xy<-data.frame(x,y)
xhist <- qplot(x, geom="histogram") + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 5/16, axis.text.y = theme_blank(), axis.title.y=theme_blank(), background.colour="white")
yhist <- qplot(y, geom="histogram") + coord_flip() + opts(background.fill = "white", background.color ="black")
yhist <- yhist + scale_x_continuous(limits=c(min(x),max(x))) + opts(axis.text.x = theme_blank(), axis.title.x=theme_blank(), axis.ticks = theme_blank(), aspect.ratio = 16/5, axis.text.y = theme_blank(), axis.title.y=theme_blank() )
scatter <- qplot(x,y, data=xy) + scale_x_continuous(limits=c(min(x),max(x))) + scale_y_continuous(limits=c(min(y),max(y)))
none <- qplot(x,y, data=xy) + geom_blank()
and arranging them with the function posted here. But to make long story short: Is there a way of creating these graphs?
This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
This might be a bit late, but I decided to make a package (ggExtra) for this since it involved a bit of code and can be tedious to write. The package also tries to address some common issue such as ensuring that even if there is a title or the text is enlarged, the plots will still be inline with one another.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
The gridExtra package should work here. Start by making each of the ggplot objects:
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis texts, ticks make the plots drifted away from each other, so your plot will look ugly and inconsistent.
You can correct this by using some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK:
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
require(ggplot2); require(grid)
# make the basic plot object
ggplot(xy, aes(x, y)) +
# set the locations of the x-axis labels as Tukey's five numbers
scale_x_continuous(limit=c(min(x), max(x)),
breaks=round(fivenum(x),1)) +
# ditto for y-axis labels
scale_y_continuous(limit=c(min(y), max(y)),
breaks=round(fivenum(y),1)) +
# specify points
geom_point() +
# specify that we want the rug plot
geom_rug(size=0.1) +
# improve the data/ink ratio
theme_set(theme_minimal(base_size = 18))
I tried those options, but wasn't satisfied by the results or the messy code one would need to use to get there. Lucky me, Thomas Lin Pedersen just developed a package called patchwork, which gets the job done in a pretty elegant manner.
If you want to create a scatterplot with marginal histograms, first you'd have to create those three plots seperately.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
plot1 <- ggplot(xy, aes(x = x, y = y)) +
geom_point()
dens1 <- ggplot(xy, aes(x = x)) +
geom_histogram(color = "black", fill = "white") +
theme_void()
dens2 <- ggplot(xy, aes(x = y)) +
geom_histogram(color = "black", fill = "white") +
theme_void() +
coord_flip()
The only thing left to do, is to add those plots with a simple + and specify the layout with the function plot_layout().
library(patchwork)
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(
ncol = 2,
nrow = 2,
widths = c(4, 1),
heights = c(1, 4)
)
The function plot_spacer() adds an empty plot to the top right corner. All the other arguments should be self-explanatory.
Since histograms heavily depend on the chosen binwidth, one might argue to prefer density plots. With some small modifications one would get e.g. for eye tracking data a beautiful plot.
library(ggpubr)
plot1 <- ggplot(df, aes(x = Density, y = Face_sum, color = Group)) +
geom_point(aes(color = Group), size = 3) +
geom_point(shape = 1, color = "black", size = 3) +
stat_smooth(method = "lm", fullrange = TRUE) +
geom_rug() +
scale_y_continuous(name = "Number of fixated faces",
limits = c(0, 205), expand = c(0, 0)) +
scale_x_continuous(name = "Population density (lg10)",
limits = c(1, 4), expand = c(0, 0)) +
theme_pubr() +
theme(legend.position = c(0.15, 0.9))
dens1 <- ggplot(df, aes(x = Density, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df, aes(x = Face_sum, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
Though the data is not provided at this point, the underlying principles should be clear.
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)
I've found the package (ggpubr) that seems to work very well for this problem and it considers several possibilities to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, it mentions in relation with ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
Iris set marginal histograms scatterplot
You can easily create attractive scatterplots with marginal histograms using ggstatsplot (it will also fit and describe a model):
data(iris)
library(ggstatsplot)
ggscatterstats(
data = iris,
x = Sepal.Length,
y = Sepal.Width,
xlab = "Sepal Length",
ylab = "Sepal Width",
marginal = TRUE,
marginal.type = "histogram",
centrality.para = "mean",
margins = "both",
title = "Relationship between Sepal Length and Sepal Width",
messages = FALSE
)
Or slightly more appealing (by default) ggpubr:
devtools::install_github("kassambara/ggpubr")
library(ggpubr)
ggscatterhist(
iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", # comment out this and last line to remove the split by species
margin.plot = "histogram", # I'd suggest removing this line to get density plots
margin.params = list(fill = "Species", color = "black", size = 0.2)
)
UPDATE:
As suggested by @aickley, I used the development version to create the plot.
To build on the answer by @alf-pascu, setting up each plot manually and arranging them with cowplot grants a lot of flexibility for both the main and the marginal plots (compared to some of the other solutions). Distributions by group are one example. Changing the main plot to a 2D-density plot is another.
The following creates a scatterplot with (properly aligned) marginal histograms.
library("ggplot2")
library("cowplot")
# Set up scatterplot
scatterplot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.6) +
  guides(color = "none") +  # guides(color = FALSE) is deprecated in recent ggplot2
  theme(plot.margin = margin())
# Define marginal histogram
marginal_distribution <- function(x, var, group) {
  # .data[[...]] replaces the deprecated aes_string()
  ggplot(x, aes(x = .data[[var]], fill = .data[[group]])) +
    geom_histogram(bins = 30, alpha = 0.4, position = "identity") +
    # geom_density(alpha = 0.4, size = 0.1) +
    guides(fill = "none") +
    theme_void() +
    theme(plot.margin = margin())
}
# Set up marginal histograms
x_hist <- marginal_distribution(iris, "Sepal.Length", "Species")
y_hist <- marginal_distribution(iris, "Sepal.Width", "Species") +
coord_flip()
# Align histograms with scatterplot
aligned_x_hist <- align_plots(x_hist, scatterplot, align = "v")[[1]]
aligned_y_hist <- align_plots(y_hist, scatterplot, align = "h")[[1]]
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, scatterplot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
To plot a 2D-density plot instead, just change the main plot.
# Set up 2D-density plot
contour_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  # after_stat(piece) is the current spelling of ..piece..
  stat_density_2d(aes(alpha = after_stat(piece))) +
  guides(color = "none", alpha = "none") +
  theme(plot.margin = margin())
# Arrange plots
plot_grid(
aligned_x_hist
, NULL
, contour_plot
, aligned_y_hist
, ncol = 2
, nrow = 2
, rel_heights = c(0.2, 1)
, rel_widths = c(1, 0.2)
)
This is an old question, but I thought it would be useful to post an update here since I've come across this same problem recently (thanks to Stefanie Mueller for the help!).
The most upvoted answer using gridExtra works, but aligning axes is difficult/hacky, as has been pointed out in the comments. This can now be solved using the command ggMarginal from the ggExtra package, as such:
#load packages
library(tidyverse) #for creating dummy dataset only
library(ggExtra)
#create dummy data
a = round(rnorm(1000,mean=10,sd=6),digits=0)
b = runif(1000,min=1.0,max=1.6)*a
b = b+runif(1000,min=9,max=15)
DummyData <- data.frame(var1 = b, var2 = a) %>%
filter(var1 > 0 & var2 > 0)
#plot
p = ggplot(DummyData, aes(var1, var2)) + geom_point(alpha=0.3)
ggMarginal(p, type = "histogram")
Another solution uses ggpubr and cowplot, but here we create the marginal plots with cowplot::axis_canvas and add them to the original plot with cowplot::insert_xaxis_grob and cowplot::insert_yaxis_grob:
library(cowplot)
library(ggpubr)
# Create main plot
plot_main <- ggplot(faithful, aes(eruptions, waiting)) +
geom_point()
# Create marginal plots
# Use geom_density/histogram for whatever you plotted on x/y axis
plot_x <- axis_canvas(plot_main, axis = "x") +
geom_density(aes(eruptions), faithful)
plot_y <- axis_canvas(plot_main, axis = "y", coord_flip = TRUE) +
geom_density(aes(waiting), faithful) +
coord_flip()
# Combine all plots into one
plot_final <- insert_xaxis_grob(plot_main, plot_x, position = "top")
plot_final <- insert_yaxis_grob(plot_final, plot_y, position = "right")
ggdraw(plot_final)
Nowadays, there is at least one CRAN package that makes the scatterplot with its marginal histograms.
library(psych)
scatterHist(rnorm(1000), runif(1000))
You can use the interactive gadget ggExtra::ggMarginalGadget(yourplot) to choose between boxplots, violin plots, density plots and histograms with ease.
Title pretty well covers it.
I have two legends, relating to size and colour, and wish to have one, say, on the top and one within the graph.
Is this possible and, if so, how?
TIA
It can be done by extracting separate legends from plots, then arranging the legends in the relevant plot. The code here uses functions from the gtable package to do the extraction, then functions from the gridExtra package to do the arranging. The aim is to have a plot that contains a color legend and a size legend. First, extract the colour legend from a plot that contains the colour legend only. Second, extract the size legend from a plot that contains the size legend only. Third, draw a plot that contains no legend. Fourth, arrange the plot and the two legends into one new plot.
# Some data
df <- data.frame(
x = 1:10,
y = 1:10,
colour = factor(sample(1:3, 10, replace = TRUE)),
size = factor(sample(1:3, 10, replace = TRUE)))
library(ggplot2)
library(gridExtra)
library(gtable)
library(grid)
### Step 1
# Draw a plot with the colour legend
(p1 <- ggplot(data = df, aes(x=x, y=y)) +
geom_point(aes(colour = colour)) +
theme_bw() +
theme(legend.position = "top"))
# Extract the colour legend - leg1
leg1 <- gtable_filter(ggplot_gtable(ggplot_build(p1)), "guide-box")
### Step 2
# Draw a plot with the size legend
(p2 <- ggplot(data = df, aes(x=x, y=y)) +
geom_point(aes(size = size)) +
theme_bw())
# Extract the size legend - leg2
leg2 <- gtable_filter(ggplot_gtable(ggplot_build(p2)), "guide-box")
### Step 3
# Draw a plot with no legends - plot
(plot <- ggplot(data = df, aes(x=x, y=y)) +
geom_point(aes(size = size, colour = colour)) +
theme_bw() +
theme(legend.position = "none"))
### Step 4
# Arrange the three components (plot, leg1, leg2)
# The two legends are positioned outside the plot:
# one at the top and the other to the side.
plotNew <- arrangeGrob(leg1, plot,
heights = unit.c(leg1$height, unit(1, "npc") - leg1$height), ncol = 1)
plotNew <- arrangeGrob(plotNew, leg2,
widths = unit.c(unit(1, "npc") - leg2$width, leg2$width), nrow = 1)
grid.newpage()
grid.draw(plotNew)
# OR, arrange one legend at the top and the other inside the plot.
plotNew <- plot +
annotation_custom(grob = leg2, xmin = 7, xmax = 10, ymin = 0, ymax = 4)
plotNew <- arrangeGrob(leg1, plotNew,
heights = unit.c(leg1$height, unit(1, "npc") - leg1$height), ncol = 1)
grid.newpage()
grid.draw(plotNew)
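In more recent ggplot2 releases the same layout may be reachable without extracting any grobs. The following is only a minimal sketch, assuming ggplot2 3.5.0 or later, where guide_legend() accepts a position argument and the theme has a legend.position.inside element; older versions will not support it. The data frame mirrors the one above (with a seed added), and the inside coordinates are arbitrary values chosen for illustration.

```r
library(ggplot2)

# Same structure as the example data above (seed added for reproducibility)
set.seed(1)
df <- data.frame(
  x = 1:10,
  y = 1:10,
  colour = factor(sample(1:3, 10, replace = TRUE)),
  size = factor(sample(1:3, 10, replace = TRUE)))

# Assumes ggplot2 >= 3.5.0: each guide is given its own position
p_two_legends <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(colour = colour, size = size)) +
  theme_bw() +
  guides(
    colour = guide_legend(position = "top"),
    size   = guide_legend(position = "inside")) +
  # Relative (0 to 1) coordinates for the inside legend; arbitrary values
  theme(legend.position.inside = c(0.85, 0.2))

p_two_legends
```

If this works in your version, it avoids the manual gtable bookkeeping, at the cost of less control over the exact size of the legend boxes.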
Using ggplot2 and cowplot (a ggplot2 extension).
The approach is similar to Sandy's in that it extracts the legends as separate objects and lets you place them independently. It was primarily designed for multiple legends that belong to two or more plots in a grid of plots.
The idea is as follows:
1. Create Plot1, Plot2,...,PlotX without legends
2. Create Plot1, Plot2,...,PlotX with legends
3. Extract the legends from the plots of step 2 into separate objects
4. Set up a legend grid and arrange the legends the way you want
5. Create a grid combining plots and legends
It seems kinda complicated and time/code consuming but, once set up, you can adapt and use it for every kind of plot/legend customization.
library(ggplot2)
library(cowplot)
# Some data
df <- data.frame(
Name = factor(rep(c("A", "B", "C"), 12)),
Month = factor(rep(1:12, each = 3)),
Temp = sample(0:40, 12),
Precip = sample(50:400, 12)
)
# 1. create plot1
plot1 <- ggplot(df, aes(Month, Temp, fill = Name)) +
geom_point(
show.legend = F, aes(group = Name, colour = Name),
size = 3, shape = 17
) +
geom_smooth(
method = "loess", se = F,
aes(group = Name, colour = Name),
show.legend = F, size = 0.5, linetype = "dashed"
)
# 2. create plot2
plot2 <- ggplot(df, aes(Month, Precip, fill = Name)) +
geom_bar(stat = "identity", position = "dodge", show.legend = F) +
geom_smooth(
method = "loess", se = F,
aes(group = Name, colour = Name),
show.legend = F, size = 1, linetype = "dashed"
) +
scale_fill_grey()
# 3.1 create legend1
legend1 <- ggplot(df, aes(Month, Temp)) +
geom_point(
show.legend = T, aes(group = Name, colour = Name),
size = 3, shape = 17
) +
geom_smooth(
method = "loess", se = F, aes(group = Name, colour = Name),
show.legend = T, size = 0.5, linetype = "dashed"
) +
labs(colour = "Station") +
theme(
legend.text = element_text(size = 8),
legend.title = element_text(
face = "italic",
angle = -0, size = 10
)
)
# 3.2 create legend2
legend2 <- ggplot(df, aes(Month, Precip, fill = Name)) +
geom_bar(stat = "identity", position = "dodge", show.legend = T) +
scale_fill_grey() +
guides(
fill =
guide_legend(
title = "",
title.theme = element_text(
face = "italic",
angle = -0, size = 10
)
)
) +
theme(legend.text = element_text(size = 8))
# 3.3 extract "legends only" from ggplot object
legend1 <- get_legend(legend1)
legend2 <- get_legend(legend2)
# 4.1 setup legends grid
legend1_grid <- cowplot::plot_grid(legend1, align = "v", nrow = 2)
# 4.2 add second legend to grid, specifying its location
legends <- legend1_grid +
ggplot2::annotation_custom(
grob = legend2,
xmin = 0.5, xmax = 0.5, ymin = 0.55, ymax = 0.55
)
# 5. plot "plots" + "legends" (with legends in between plots)
cowplot::plot_grid(plot1, legends, plot2,
ncol = 3,
rel_widths = c(0.45, 0.1, 0.45)
)
Created on 2019-10-05 by the reprex package (v0.3.0)
Changing the order of the final plot_grid() call moves the legends to the right:
cowplot::plot_grid(plot1, plot2, legends, ncol = 3,
rel_widths = c(0.45, 0.45, 0.1))
From my understanding, there is very limited control over legends in ggplot2. Here is a paragraph from Hadley's book (page 111):
ggplot2 tries to use the smallest possible number of legends that accurately conveys the aesthetics used in the plot. It does this by combining legends if a variable is used with more than one aesthetic. Figure 6.14 shows an example of this for the points geom: if both colour and shape are mapped to the same variable, then only a single legend is necessary. In order for legends to be merged, they must have the same name (the same legend title). For this reason, if you change the name of one of the merged legends, you’ll need to change it for all of them.
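The merging rule quoted above can be illustrated with a small sketch. It uses the built-in mtcars data (an arbitrary choice, not from the book) and maps colour and shape to the same variable: with identical legend titles a single merged legend is expected, and giving the two scales different titles should split them into two legends.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(wt, mpg,
                                colour = factor(cyl),
                                shape = factor(cyl))) +
  geom_point(size = 3)

# Same title for both aesthetics: the legends are merged into one
p_merged <- base_plot +
  labs(colour = "Cylinders", shape = "Cylinders")

# Different titles: ggplot2 draws two separate legends
p_split <- base_plot +
  labs(colour = "Cylinders (colour)", shape = "Cylinders (shape)")

p_merged
p_split
```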
I am trying to display some information about the data below a plot created in ggplot2. I would like to plot the n variable using the x-axis coordinates of the plot, but the y coordinate needs to be 10% from the bottom of the screen. In fact, the desired y coordinates are already in the data frame as the y_pos variable.
I can think of 3 approaches using ggplot2:
1) Create an empty plot below the actual plot, use the same scale and then use geom_text to plot the data over the blank plot. This approach sort of works but is extremely complicated.
2) Use geom_text to plot the data but somehow use y coordinate as percent of the screen (10%). This would force the numbers to be displayed below the plot. I can't figure out the proper syntax.
3) Use grid.text to display the text. I can easily set it at 10% from the bottom of the screen, but I can't figure out how to set the X coordinate to match the plot. I tried to use grconvert to capture the initial X position but could not get that to work either.
Below is the basic plot with the dummy data:
graphics.off() # close graphics windows
library(car)
library(ggplot2) #load ggplot
library(gridExtra) #load Grid
library(RGraphics) # support of the "R graphics" book, on CRAN
#create dummy data
test= data.frame(
Group = c("A", "B", "A","B", "A", "B"),
x = c(1 ,1,2,2,3,3 ),
y = c(33,25,27,36,43,25),
n=c(71,55,65,58,65,58),
y_pos=c(9,6,9,6,9,6)
)
#create ggplot
p1 <- qplot(x, y, data=test, colour=Group) +
  ylab("Mean change from baseline") +
  geom_line()+
  scale_x_continuous("Weeks", breaks=seq(-1,3, by = 1) ) +
  # theme() replaces opts(), which has been removed from ggplot2
  theme(
    legend.position=c(.1,0.9))
#display plot
p1
The modified ggplot below displays the numbers of subjects, but they are displayed WITHIN the plot and force the Y scale to be extended. I would like to display these numbers BELOW the plot.
p1 <- qplot(x, y, data=test, colour=Group) +
  ylab("Mean change from baseline") +
  geom_line()+
  scale_x_continuous("Weeks", breaks=seq(-1,3, by = 1) ) +
  # theme() replaces opts(), which has been removed from ggplot2
  theme( plot.margin = unit(c(0,2,2,1), "lines"),
    legend.position=c(.1,0.9))+
  geom_text(data = test,aes(x=x,y=y_pos,label=n))
p1
A different approach of displaying the numbers involves creating a dummy plot below the actual plot. Here is the code:
graphics.off() # close graphics windows
library(car)
library(ggplot2) #load ggplot
library(gridExtra) #load Grid
library(RGraphics) # support of the "R graphics" book, on CRAN
#create dummy data
test= data.frame(
group = c("A", "B", "A","B", "A", "B"),
x = c(1 ,1,2,2,3,3 ),
y = c(33,25,27,36,43,25),
n=c(71,55,65,58,65,58),
y_pos=c(15,6,15,6,15,6)
)
# opts() and theme_*() from the original post are replaced by theme() and element_*()
p1 <- qplot(x, y, data=test, colour=group) +
  ylab("Mean change from baseline") +
  theme(plot.margin = unit(c(1,2,-1,1), "lines")) +
  geom_line()+
  scale_x_continuous("Weeks", breaks=seq(-1,3, by = 1) ) +
  theme(legend.position="bottom",
    legend.title=element_blank()) +
  ggtitle("Line plot using GGPLOT")
p1
p2 <- qplot(x, y, data=test, geom="blank")+
  ylab(" ")+
  theme( plot.margin = unit(c(0,2,-2,1), "lines"),
    axis.line = element_blank(),
    axis.ticks = element_line(colour = "white"),
    axis.text.x=element_text(angle=-90,colour="white"),
    axis.text.y=element_text(angle=-90,colour="white"),
    panel.background = element_rect(fill = "transparent",colour = NA),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_blank()
  )+
  geom_text(data = test,aes(x=x,y=y_pos,label=n))
p2
grid.arrange(p1, p2, heights = c(8.5, 1.5), nrow=2 )
However, that is very complicated and would be hard to modify for different data. Ideally, I'd like to be able to pass Y coordinates as percent of the screen.
Current versions of ggplot2 (later than 2.1) support + labs(caption = "text"), which displays an annotation below the plot. The caption is themeable (font properties, left/right alignment, and so on). See https://github.com/hadley/ggplot2/pull/1582 for examples.
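As a minimal sketch of the caption approach, the counts from the question's dummy data can be collapsed into one caption line per group. The helper variable caption_text and the left alignment are choices made here for illustration. Note that a caption is positioned relative to the whole plot, not to x-axis values, so it does not line the counts up under each week.

```r
library(ggplot2)

# Dummy data modelled on the question
test <- data.frame(
  group = rep(c("A", "B"), times = 3),
  x = c(1, 1, 2, 2, 3, 3),
  y = c(33, 25, 27, 36, 43, 25),
  n = c(71, 55, 65, 58, 65, 58))

# One caption line per group, listing n at each week
caption_text <- paste(
  sapply(split(test, test$group), function(d) {
    paste0("Group ", d$group[1], ": n = ", paste(d$n, collapse = ", "))
  }),
  collapse = "\n")

p_caption <- ggplot(test, aes(x, y, colour = group)) +
  geom_point() +
  geom_line() +
  labs(x = "Weeks", y = "Mean change from baseline", caption = caption_text) +
  # hjust = 0 left-aligns the caption; the default is right-aligned
  theme(plot.caption = element_text(hjust = 0))

p_caption
```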
Edited: opts() has been deprecated and replaced by theme(); element_blank() has replaced theme_blank(); and ggtitle() is used in place of opts(title = ...).
Sandy- thank you so much!!!! This does exactly what I want. I do wish we could control the clipping in geom.text or geom.annotate.
I put together the following program if anybody else is interested.
rm(list = ls()) # clear objects
graphics.off() # close graphics windows
library(ggplot2)
library(gridExtra)
library(grid) # needed for textGrob() and grid.draw()
#create dummy data
test= data.frame(
group = c("Group 1", "Group 1", "Group 1","Group 2", "Group 2", "Group 2"),
x = c(1 ,2,3,1,2,3 ),
y = c(33,25,27,36,23,25),
n=c(71,55,65,58,65,58),
ypos=c(18,18,18,17,17,17)
)
p1 <- qplot(x=x, y=y, data=test, colour=group) +
ylab("Mean change from baseline") +
theme(plot.margin = unit(c(1,3,8,1), "lines")) +
geom_line()+
scale_x_continuous("Visits", breaks=seq(-1,3) ) +
theme(legend.position="bottom",
legend.title=element_blank())+
ggtitle("Line plot")
# Create the textGrobs
for (ii in 1:nrow(test))
{
#display numbers at each visit
p1=p1+ annotation_custom(grob = textGrob(test$n[ii]),
xmin = test$x[ii],
xmax = test$x[ii],
ymin = test$ypos[ii],
ymax = test$ypos[ii])
#display group text
if (ii %in% c(1,4)) #there is probably a better way
{
p1=p1+ annotation_custom(grob = textGrob(test$group[ii]),
xmin = 0.85,
xmax = 0.85,
ymin = test$ypos[ii],
ymax = test$ypos[ii])
}
}
# Code to override clipping
gt <- ggplot_gtable(ggplot_build(p1))
gt$layout$clip[gt$layout$name=="panel"] <- "off"
grid.draw(gt)
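On the wish to control clipping directly: newer ggplot2 versions appear to make this possible without editing the gtable. The sketch below assumes ggplot2 3.0.0 or later, where coord_cartesian() has a clip argument. The y limits and the margin are values picked by hand for this dummy data and will likely need tuning for a given device size.

```r
library(ggplot2)

test <- data.frame(
  group = rep(c("Group 1", "Group 2"), each = 3),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(33, 25, 27, 36, 23, 25),
  n = c(71, 55, 65, 58, 65, 58),
  ypos = c(18, 18, 18, 17, 17, 17))

p_clip <- ggplot(test, aes(x, y, colour = group)) +
  geom_point() +
  geom_line() +
  # Labels are placed in data units, below the visible y range
  geom_text(aes(y = ypos, label = n), show.legend = FALSE) +
  # Fix the visible y range so the labels do not stretch the axis,
  # and switch off clipping so they are drawn outside the panel
  coord_cartesian(ylim = c(22, 37), clip = "off") +
  scale_x_continuous("Visits", breaks = 1:3) +
  ylab("Mean change from baseline") +
  # Extra bottom margin to leave room for the labels
  theme(plot.margin = unit(c(1, 3, 8, 1), "lines"))

p_clip
```

Because the labels come from a single geom_text() layer, no loop over rows is needed, and the label colours follow the group colours.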
Updated: opts() has been replaced with theme().
In the code below, a base plot is drawn with a wider margin at the bottom. The textGrob is created, then inserted into the plot using annotation_custom(). At first the text is not visible, because it lies outside the plot panel and the output is clipped to the panel. But using baptiste's code from here, the clipping can be overridden. The position is given in data units, and both text labels are centred.
library(ggplot2)
library(grid)
# Base plot
df = data.frame(x=seq(1:10), y = seq(1:10))
p = ggplot(data = df, aes(x = x, y = y)) + geom_point() + ylim(0,10) +
theme(plot.margin = unit(c(1,1,3,1), "cm"))
p
# Create the textGrobs
Text1 = textGrob(paste("Largest x-value is", round(max(df$x), 2), sep = " "))
Text2 = textGrob(paste("Mean = ", mean(df$x), sep = ""))
p1 = p + annotation_custom(grob = Text1, xmin = 4, xmax = 4, ymin = -3, ymax = -3) +
annotation_custom(grob = Text2, xmin = 8, xmax = 8, ymin = -3, ymax = -3)
p1
# Code to override clipping
gt <- ggplotGrob(p1)
gt$layout$clip[gt$layout$name=="panel"] <- "off"
grid.draw(gt)
Or, using grid functions to create and position the label.
p
grid.text((paste("Largest x-value is", max(df$x), sep = " ")),
x = unit(.2, "npc"), y = unit(.1, "npc"), just = c("left", "bottom"),
gp = gpar(fontface = "bold", fontsize = 18, col = "blue"))
Edit
Or, add text grob using gtable functions.
library(ggplot2)
library(grid)
library(gtable)
# Base plot
df = data.frame(x=seq(1:10), y = seq(1:10))
p = ggplot(data = df, aes(x = x, y = y)) + geom_point() + ylim(0,10)
# Construct the text grob
lab = textGrob((paste("Largest x-value is", max(df$x), sep = " ")),
x = unit(.1, "npc"), just = c("left"),
gp = gpar(fontface = "bold", fontsize = 18, col = "blue"))
gp = ggplotGrob(p)
# Add a row below the 2nd from the bottom
gp = gtable_add_rows(gp, unit(2, "grobheight", lab), -2)
# Add 'lab' grob to that row, under the plot panel
gp = gtable_add_grob(gp, lab, t = -2, l = gp$layout[gp$layout$name == "panel",]$l)
grid.newpage()
grid.draw(gp)
Actually the best answer and easiest solution is to use the cowplot package.
Version 0.5.0 of the cowplot package (on CRAN) handles ggplot2 subtitles using the add_sub function.
Use it like so:
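library(ggplot2)
library(cowplot)
library(scales) # log10_trans(), trans_new()
library(xkcd) # theme_xkcd(); needs the xkcd fonts installed
# cuberoot_trans() is not part of scales; this definition is an assumption
# about the helper the original code relied on
cuberoot_trans <- function() trans_new("cuberoot",
  transform = function(x) x^(1/3),
  inverse = function(x) x^3)
diamondsCubed <-ggplot(aes(carat, price), data = diamonds) +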
geom_point() +
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
breaks = c(0.2, 0.5, 1, 2, 3)) +
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
breaks = c(350, 1000, 5000, 10000, 15000)) +
ggtitle('Price log10 by Cube-Root of Carat') +
theme_xkcd()
ggdraw(add_sub(diamondsCubed, "This is an annotation.\nAnnotations can span multiple lines."))
Is there a way of creating scatterplots with marginal histograms just like in the sample below in ggplot2? In Matlab it is the scatterhist() function and there exist equivalents for R as well. However, I haven't seen it for ggplot2.
I started an attempt by creating the single graphs but don't know how to arrange them properly.
require(ggplot2)
x<-rnorm(300)
y<-rt(300,df=2)
xy<-data.frame(x,y)
# opts()/theme_blank() replaced by theme()/element_blank(); the invalid
# background.* settings are mapped to plot.background (an assumption about the intent)
xhist <- qplot(x, geom="histogram") + scale_x_continuous(limits=c(min(x),max(x))) +
  theme(axis.text.x = element_blank(), axis.title.x = element_blank(),
    axis.ticks = element_blank(), aspect.ratio = 5/16,
    axis.text.y = element_blank(), axis.title.y = element_blank(),
    plot.background = element_rect(fill = "white"))
yhist <- qplot(y, geom="histogram") + coord_flip() +
  theme(plot.background = element_rect(fill = "white", colour = "black"))
# the limits use y here, since this histogram shows y
yhist <- yhist + scale_x_continuous(limits=c(min(y),max(y))) +
  theme(axis.text.x = element_blank(), axis.title.x = element_blank(),
    axis.ticks = element_blank(), aspect.ratio = 16/5,
    axis.text.y = element_blank(), axis.title.y = element_blank())
scatter <- qplot(x,y, data=xy) + scale_x_continuous(limits=c(min(x),max(x))) + scale_y_continuous(limits=c(min(y),max(y)))
none <- qplot(x,y, data=xy) + geom_blank()
and arranging them with the function posted here. But to make long story short: Is there a way of creating these graphs?
This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
This might be a bit late, but I decided to make a package (ggExtra) for this since it involved a bit of code and can be tedious to write. The package also tries to address some common issues, such as ensuring that the plots stay in line with one another even if there is a title or the text is enlarged.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
The gridExtra package should work here. Start by making each of the ggplot objects:
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis text and ticks make the plots drift away from each other, so the result looks ugly and inconsistent.
You can correct this by using some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK:
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
require(ggplot2); require(grid)
# make the basic plot object
ggplot(xy, aes(x, y)) +
  # set the locations of the x-axis labels as Tukey's five numbers
  scale_x_continuous(limits=c(min(x), max(x)),
    breaks=round(fivenum(x),1)) +
  # ditto for y-axis labels
  scale_y_continuous(limits=c(min(y), max(y)),
    breaks=round(fivenum(y),1)) +
  # specify points
  geom_point() +
  # specify that we want the rug plot
  geom_rug(size=0.1) +
  # improve the data/ink ratio
  # (the theme is added directly; adding theme_set(...) would apply the previous default theme)
  theme_minimal(base_size = 18)
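For reference, the five-number summary used for the axis breaks above can be inspected on its own with base R. This is a small sketch on a hand-picked vector (not the random data above), so the result is easy to check by hand; the names are added here only for readability.

```r
# Small deterministic vector so the summary is easy to verify
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(v)
names(five) <- c("min", "lower-hinge", "median", "upper-hinge", "max")
five
```

These are the same five values a boxplot of v would draw as whisker ends, box edges and median line, as long as no points are flagged as outliers.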
I tried those options, but wasn't satisfied with the results or with the messy code needed to get there. Luckily, Thomas Lin Pedersen has developed a package called patchwork, which gets the job done in a pretty elegant manner.
If you want to create a scatterplot with marginal histograms, you first have to create the three plots separately.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
plot1 <- ggplot(xy, aes(x = x, y = y)) +
geom_point()
dens1 <- ggplot(xy, aes(x = x)) +
geom_histogram(color = "black", fill = "white") +
theme_void()
dens2 <- ggplot(xy, aes(x = y)) +
geom_histogram(color = "black", fill = "white") +
theme_void() +
coord_flip()
The only thing left to do is to add those plots with a simple + and specify the layout with the function plot_layout().
library(patchwork)
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(
ncol = 2,
nrow = 2,
widths = c(4, 1),
heights = c(1, 4)
)
The function plot_spacer() adds an empty plot to the top right corner. All the other arguments should be self-explanatory.
Since histograms depend heavily on the chosen bin width, one might prefer density plots. With some small modifications one gets, e.g. for eye-tracking data, a beautiful plot.
library(ggpubr)
plot1 <- ggplot(df, aes(x = Density, y = Face_sum, color = Group)) +
geom_point(aes(color = Group), size = 3) +
geom_point(shape = 1, color = "black", size = 3) +
stat_smooth(method = "lm", fullrange = TRUE) +
geom_rug() +
scale_y_continuous(name = "Number of fixated faces",
limits = c(0, 205), expand = c(0, 0)) +
scale_x_continuous(name = "Population density (lg10)",
limits = c(1, 4), expand = c(0, 0)) +
theme_pubr() +
theme(legend.position = c(0.15, 0.9))
dens1 <- ggplot(df, aes(x = Density, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df, aes(x = Face_sum, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
Though the data is not provided at this point, the underlying principles should be clear.
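The remark above about histograms depending on the bin width can be checked with a tiny base-R sketch. The vector is made up for illustration: with bins of width 1 an empty bin shows a gap in the data, while bins of width 2 make the same data look flat.

```r
# Made-up values with a gap between 1 and 2
v <- c(0.2, 0.4, 0.6, 0.8, 2.2, 2.4, 3.6, 3.8)

# Bin counts only; plot = FALSE suppresses drawing
counts_narrow <- hist(v, breaks = seq(0, 4, by = 1), plot = FALSE)$counts
counts_wide   <- hist(v, breaks = seq(0, 4, by = 2), plot = FALSE)$counts

counts_narrow
counts_wide
```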
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)