How to add text to a specific/fixed location in rasterVis levelplot - r

This question actually consists of two questions targeting the same behaviour.
How can I add text (which varies by panel) to a fixed location in the panel area? I'm aware of the panel.text and latticeExtra::layer solutions, but they add text using plotting-area coordinates. For instance, I want to add text to the bottom-right corner of each panel even if their scales are different.
How can I add text outside the levelplot panel area(s)? The method explained here requires that the levelplot has a plot_01.legend.top.vp area to add text to, which I don't have, and that the trellis object was plotted beforehand. Besides, I want to add text to the left of the ylab shown in the figure below. I used ylab here to state the meaning of the rows, but I need a second ylab that represents the y-axis values. I found another question about this problem, but its solution does not work.
The plot above was created from a raster::stack object with the rasterVis::levelplot method. I will accept a dirty solution, although I prefer an elegant one. Also, beyond the question above, I'm open to other approaches that use levelplot.

A very similar issue is currently being discussed on R-sig-Geo, just have a look at the solution I provided there. Here is the corresponding sample code which lets you add custom text annotations inside or outside the panel regions of a trellis graph using trellis.focus(..., clip.off = TRUE) from lattice.
library(rasterVis)
library(grid)
## sample data
f <- system.file("external/test.grd", package="raster")
r <- raster(f)
s <- stack(r, r+500, r-500, r+200)
p <- levelplot(s, layout = c(2, 2), names.att = rep("", 4),
scales = list(y = list(rot = 90)))
## labels
cls <- c("col1", "col2")
rws <- c("row1", "row2")
png("~/rasterVis.png", width = 14, height = 16, units = "cm", res = 300L)
grid.newpage()
print(p, newpage = FALSE)
## loop over panels to be labelled (i.e. 1:3)
panels <- trellis.currentLayout()
for (i in 1:3) {
  # focus on current panel of interest and disable clipping
  ids <- which(panels == i, arr.ind = TRUE)
  trellis.focus("panel", ids[2], ids[1], clip.off = TRUE)
  # add labels
  if (i %in% c(1, 3)) {
    if (i == 1) {
      grid.text(cls[1], x = .5, y = 1.1)             # add 'col1'
      grid.text(rws[1], x = -.35, y = .5, rot = 90)  # add 'row1'
    } else {
      grid.text(rws[2], x = -.35, y = .5, rot = 90)  # add 'row2'
    }
  } else {
    grid.text(cls[2], x = .5, y = 1.1)               # add 'col2'
  }
  trellis.unfocus()
}
dev.off()
You may find some further information here:
https://stat.ethz.ch/pipermail/r-help/2005-June/072745.html
http://r.789695.n4.nabble.com/How-to-put-text-outside-an-xyplot-td975850.html
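For the first part of the question (text at a fixed spot inside each panel, whatever the axis scales are), one option is to position the text in "npc" units, which run from 0 to 1 across the panel viewport. The sketch below is only an illustration on a plain lattice xyplot, not on the rasterVis plot from the question; the data frame d and the label vector labs are made up for the example.

```r
library(lattice)
library(grid)

# Hypothetical data: two panels with very different x ranges
set.seed(1)
d <- data.frame(x = c(runif(20), runif(20, 0, 100)),
                y = runif(40),
                g = gl(2, 20, labels = c("a", "b")))
labs <- c("first", "second")  # one label per panel (assumed)

p <- xyplot(y ~ x | g, d,
            scales = list(x = list(relation = "free")),
            panel = function(...) {
              panel.xyplot(...)
              # npc units are relative to the panel viewport, so this is
              # the bottom-right corner regardless of the data scales
              grid.text(labs[panel.number()],
                        x = unit(0.97, "npc"), y = unit(0.03, "npc"),
                        just = c("right", "bottom"))
            })

pdf(NULL)   # draw on a null device; use png()/pdf("file") to keep the output
print(p)
dev.off()
```

The same grid.text() call should also work inside trellis.focus("panel", ...), as in the loop above, if the plot has already been printed.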

Related

How do I add multiple subplots into a multirow figure in R?

I need to overlay multiple subplots onto a single plot that is itself part of a multirow figure (see image).
I need subplots rather than a screen layout because the figure may also be multicolumn (a 5 by 3 plot in total).
There are packages that help with subplots, but they break when used with multirow figures: every sequential subplot except the first is rendered relative to the overall figure border, not relative to the borders of the current row/column plot.
I understand that large packages such as ggplot2 allow this relatively easily, but base R plots are highly preferable.
UPD:
the minimum reproducible example depicting the problem is here:
require(Hmisc)
COL.1 <- c('red','orange','yellow'); COL.2 <- c('blue','green','turquoise')
SUBPLOT.FUN <- function(COL) {plot(rnorm(100), type='l', col=COL)}
PLOT.FUN <- function(i) {
plot(rnorm(100),ylim=c(-1,1))
subplot(SUBPLOT.FUN(COL.1[i]), 100,1, vadj=1,hadj=1,pars=list(mfg=c(1,i)))
subplot(SUBPLOT.FUN(COL.2[i]), 100,-1,vadj=0,hadj=1,pars=list(mfg=c(1,i)))
}
plot.new(); par(mfrow=c(1,3))
for (i in 1:3) {
PLOT.FUN(i)
}
which looks like this:
What is required is shown in the first image: EACH of the three plots must contain its subplots in their respective locations (along the right border, arranged vertically).
N.B. Whether the figure is multirow or multicolumn (as depicted) does not matter.
Something like this? Inspired by this R-bloggers post.
# reproducible test data
set.seed(2022)
x <- rnorm(1000)
y <- rbinom(1000, 1, 0.5)
z <- rbinom(1000, 4, 0.5)
# save default values and prepare
# to create custom plot areas
old_par <- par(fig = c(0,1,0,1))
# set x axis limits based on data
h <- hist(x, plot = FALSE)
xlim <- c(h$breaks[1] - 0.5, h$breaks[length(h$breaks)] + 2)
hist(x, xlim = xlim)
# x = c(0.6, 1) right part of plot
# y = c(0.5, 1) top part of plot
par(fig = c(0.6, 1, 0.5, 1), new = TRUE)
boxplot(x ~ y)
# x = c(0.6, 1) right part of plot
# y = c(0.1, 0.6) bottom part of plot
par(fig = c(0.6, 1, 0.1, 0.6), new = TRUE)
boxplot(x ~ z)
# put default values back
par(old_par)
Created on 2022-08-18 by the reprex package (v2.0.1)
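The par(fig = ...) idea can be stretched to several side-by-side plots by computing each plot's figure region by hand instead of using mfrow. This is only a sketch under that assumption: inset_fig() is a helper made up here that maps a rectangle given relative to one cell (0 to 1) into device coordinates, and the inset positions are arbitrary.

```r
# Hypothetical helper: map a rectangle given relative to `cell`
# (c(x1, x2, y1, y2) in 0..1) into device (NDC) coordinates.
inset_fig <- function(cell, rel) {
  w <- cell[2] - cell[1]
  h <- cell[4] - cell[3]
  c(cell[1] + rel[1] * w, cell[1] + rel[2] * w,
    cell[3] + rel[3] * h, cell[3] + rel[4] * h)
}

set.seed(2022)
n_col <- 3
pdf(NULL, width = 9, height = 4)  # null device; swap for png()/pdf("file")
for (i in seq_len(n_col)) {
  # figure region of the i-th column
  cell <- c((i - 1) / n_col, i / n_col, 0, 1)
  par(fig = cell, mar = c(4, 4, 2, 1), new = i > 1)
  plot(rnorm(100), xlim = c(0, 160), ylim = c(-3, 3))

  # upper inset along the right border of this cell
  par(fig = inset_fig(cell, c(0.62, 0.98, 0.55, 0.92)),
      mar = c(1.5, 1.5, 0.5, 0.5), new = TRUE)
  plot(rnorm(100), type = "l", col = "red", ann = FALSE, cex.axis = 0.6)

  # lower inset along the right border of this cell
  par(fig = inset_fig(cell, c(0.62, 0.98, 0.14, 0.51)),
      mar = c(1.5, 1.5, 0.5, 0.5), new = TRUE)
  plot(rnorm(100), type = "l", col = "blue", ann = FALSE, cex.axis = 0.6)
}
dev.off()
```

For a multirow figure the cell computation would need the row index as well; the inset logic stays the same.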

Is it possible to draw the axis line first, before the data?

This is a follow up to my previous question where I was looking for a solution to get the axis drawn first, then the data. The answer works for that specific question and example, but it opened a more general question how to change the plotting order of the underlying grobs. First the axis, then the data.
Very much in the way that the panel grid grob can be drawn on top or not.
Panel grid and axis grobs are apparently generated differently - axes more as guide objects rather than "simple" grobs. (Axes are drawn with ggplot2:::draw_axis(), whereas the panel grid is built as part of the ggplot2:::Layout object).
I guess this is why axes are drawn on top, and I wondered if the drawing order can be changed.
# An example to play with
library(ggplot2)
df <- data.frame(var = "", val = 0)
ggplot(df) +
geom_point(aes(val, var), color = "red", size = 10) +
scale_x_continuous(
expand = c(0, 0),
limits = c(0,1)
) +
coord_cartesian(clip = "off") +
theme_classic()
A ggplot can be represented by its gtable. The positions of the grobs are given by the layout element, and "the z-column is used to define the drawing order of the grobs".
The z value for the panel, which contains the points grob, can then be increased so that it is drawn last.
So if p is your plot then
g <- ggplotGrob(p) ;
g$layout[g$layout$name == "panel", "z"] <- max(g$layout$z) + 1L
grid::grid.draw(g)
However, as noted in the comments, this changes how the axes look, which is perhaps due to the panel being drawn over part of the axis lines.
But there is exciting news from dww:
if we add theme(panel.background = element_rect(fill = NA)) to the plot, the axes are no longer partially obscured. This both proves that this is the cause of the thinner axis lines, and also provides a reasonable workaround, provided you don't need a colored panel background.
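Putting the two pieces together (raising the panel's z value and using a transparent panel background), a self-contained version might look like the sketch below. It reuses the example plot from the question; matching panel names with grepl("^panel", ...) is a choice made here so that faceted plots would be covered too.

```r
library(ggplot2)
library(grid)

df <- data.frame(var = "", val = 0)
p <- ggplot(df) +
  geom_point(aes(val, var), color = "red", size = 10) +
  scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
  coord_cartesian(clip = "off") +
  theme_classic() +
  # transparent background so the panel does not paint over the axis lines
  theme(panel.background = element_rect(fill = NA))

g <- ggplotGrob(p)
is_panel <- grepl("^panel", g$layout$name)
# highest z value is drawn last, i.e. on top of the axes
g$layout$z[is_panel] <- max(g$layout$z) + 1L

pdf(NULL)   # null device; replace with a real device to see the result
grid.newpage()
grid.draw(g)
dev.off()
```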
Since you are looking for a more "on the draw level" solution, then the place to start is to ask "how is the ggplot drawn in the first place?". The answer can be found in the print method for ggplot objects:
ggplot2:::print.ggplot
#> function (x, newpage = is.null(vp), vp = NULL, ...)
#> {
#> set_last_plot(x)
#> if (newpage)
#> grid.newpage()
#> grDevices::recordGraphics(requireNamespace("ggplot2",
#> quietly = TRUE), list(), getNamespace("ggplot2"))
#> data <- ggplot_build(x)
#> gtable <- ggplot_gtable(data)
#> if (is.null(vp)) {
#> grid.draw(gtable)
#> }
#> else {
#> if (is.character(vp))
#> seekViewport(vp)
#> else pushViewport(vp)
#> grid.draw(gtable)
#> upViewport()
#> }
#> invisible(x)
#> }
where you can see that a ggplot is actually drawn by calling ggplot_build on the ggplot object, then ggplot_gtable on the output of ggplot_build.
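Those same steps can be run by hand, which is handy when you want to modify the gtable before drawing it. A minimal sketch, using mtcars purely as stand-in data:

```r
library(ggplot2)
library(grid)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

built <- ggplot_build(p)      # computes stats, scales and layer data
gt    <- ggplot_gtable(built) # turns the built plot into a table of grobs

pdf(NULL)   # null device; replace with a real device to see the result
grid.newpage()
grid.draw(gt)                 # the same drawing call that print() ends with
dev.off()
```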
The difficulty is that the panel, with its background, gridlines and data, is created as a distinct grob tree. This is then nested as a single entity inside the final grob table produced by ggplot_gtable. The axis lines are drawn "on top" of that panel. If you draw these lines first, part of their thickness will be over-drawn by the panel. As mentioned in user20650's answer, this is not a problem if you don't need your plot to have a background color.
To my knowledge, there is no native way to include the axis lines as part of the panel unless you add them yourself as grobs.
The following little suite of functions allows you to take a plot object, remove the axis lines from it and add axis lines into the panel:
get_axis_grobs <- function(p_table)
{
axes <- grep("axis", p_table$layout$name)
axes[sapply(p_table$grobs[axes], function(x) class(x)[1] == "absoluteGrob")]
}
remove_lines_from_axis <- function(axis_grob)
{
axis_grob$children[[grep("polyline", names(axis_grob$children))]] <- zeroGrob()
axis_grob
}
remove_all_axis_lines <- function(p_table)
{
axes <- get_axis_grobs(p_table)
for(i in axes) p_table$grobs[[i]] <- remove_lines_from_axis(p_table$grobs[[i]])
p_table
}
get_panel_grob <- function(p_table)
{
p_table$grobs[[grep("panel", p_table$layout$name)]]
}
add_axis_lines_to_panel <- function(panel)
{
old_order <- panel$childrenOrder
panel <- grid::addGrob(panel, grid::linesGrob(x = unit(c(0, 0), "npc")))
panel <- grid::addGrob(panel, grid::linesGrob(y = unit(c(0, 0), "npc")))
panel$childrenOrder <- c(old_order[1],
setdiff(panel$childrenOrder, old_order),
old_order[2:length(old_order)])
panel
}
These can all be co-ordinated into a single function now to make the whole process much easier:
underplot_axes <- function(p)
{
p_built <- ggplot_build(p)
p_table <- ggplot_gtable(p_built)
p_table <- remove_all_axis_lines(p_table)
p_table$grobs[[grep("panel", p_table$layout$name)]] <-
add_axis_lines_to_panel(get_panel_grob(p_table))
grid::grid.newpage()
grid::grid.draw(p_table)
invisible(p_table)
}
And now you can just call underplot_axes on a ggplot object. I have modified your example a little to create a gray background panel, so that we can see more clearly what's going on:
library(ggplot2)
df <- data.frame(var = "", val = 0)
p <- ggplot(df) +
geom_point(aes(val, var), color = "red", size = 10) +
scale_x_continuous(
expand = c(0, 0),
limits = c(0,1)
) +
coord_cartesian(clip = "off") +
theme_classic() +
theme(panel.background = element_rect(fill = "gray90"))
p
underplot_axes(p)
Created on 2021-05-07 by the reprex package (v0.3.0)
Now, you may consider this "creating fake axes", but I would consider it more as "moving" the axis lines from one place in the grob tree to another. It's a shame that the option doesn't seem to be built into ggplot, but I can also see that it would take a pretty major overhaul of how a ggplot is constructed to allow that option.
Here's a hack that doesn't require going "under the hood", but rather uses patchwork to add another layer on top that is just the geom layer.
a <- [your plot above]
library(patchwork)
a + inset_element(a + theme_void(), left = 0, bottom = 0, right = 1, top = 1)

Histogram to decide whether two distributions have the same shape in R [duplicate]

I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
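To see what "semi-transparent" means in terms of the colour values, here is a small sketch. rgb() takes the alpha channel as its fourth argument, and adjustcolor() from grDevices can add transparency to a named colour (an alternative not mentioned above).

```r
# blue with 25% opacity, as used for the first histogram
blue_quarter <- rgb(0, 0, 1, alpha = 1/4)

# a named colour made 50% opaque without extra packages
red_half <- adjustcolor("red", alpha.f = 0.5)

# inspect the channels (0-255), including alpha
col2rgb(c(blue_quarter, red_half), alpha = TRUE)
```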
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
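In recent ggplot2 versions (3.3.0 and later, as far as I know) the same density histogram can be written with after_stat(density) in place of ..density... The sketch below uses smaller made-up samples and checks that each group's bars integrate to 1, which is what makes groups of different sizes comparable.

```r
library(ggplot2)

set.seed(1)
vegLengths <- rbind(
  data.frame(length = rnorm(1000, 6, 2),  veg = "carrot"),
  data.frame(length = rnorm(500, 7, 2.5), veg = "cuke")
)

p <- ggplot(vegLengths, aes(length, fill = veg)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.5,
                 position = "identity", bins = 30)

# bar heights times bar widths, summed per group
ld <- layer_data(p)
areas <- tapply(ld$density * (ld$xmax - ld$xmin), ld$group, sum)
areas
```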
One additional thing: I commented on Dirk's answer that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
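As a side note, the loop that builds the overlap counts can be replaced by a single pmin() call, since counts are never negative and the smaller of the two is 0 whenever either bin is empty. A small sketch on made-up data comparing both versions:

```r
set.seed(1)
a <- rnorm(200, 3)
b <- rnorm(200, 5)

# common breaks so the bins of both histograms line up
brks  <- seq(floor(min(a, b)), ceiling(max(a, b)), by = 0.5)
ahist <- hist(a, breaks = brks, plot = FALSE)
bhist <- hist(b, breaks = brks, plot = FALSE)

# loop version, as in the function above
overlap_loop <- integer(length(ahist$counts))
for (i in seq_along(ahist$counts)) {
  if (ahist$counts[i] > 0 & bhist$counts[i] > 0) {
    overlap_loop[i] <- min(ahist$counts[i], bhist$counts[i])
  }
}

# vectorised version
overlap_vec <- pmin(ahist$counts, bhist$counts)
```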
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
There are already beautiful answers here, but I thought of adding this. Looks good to me.
(Random numbers copied from @Dirk.) library(scales) is needed.
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel the result from hist0 looks prettier than the one from hist.
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
library(magrittr) # provides the %>% pipe used below
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
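One way to align the breaks is to compute a single break vector from the combined range and pass it to both hist() calls. A sketch on the same kind of data; the choice of roughly 30 bins via pretty() is arbitrary.

```r
set.seed(42)
carrotLengths   <- rnorm(1000, 15, 5)
cucumberLengths <- rnorm(200, 20, 7)

# shared breaks covering both samples
brks <- pretty(range(carrotLengths, cucumberLengths), n = 30)
histCarrot   <- hist(carrotLengths,   breaks = brks, plot = FALSE)
histCucumber <- hist(cucumberLengths, breaks = brks, plot = FALSE)

pdf(NULL)   # null device; replace with a real device to see the result
plot(histCarrot, freq = FALSE, col = rgb(1, 0, 0, 0.4),
     ylim = range(0, histCarrot$density, histCucumber$density),
     xlab = "Lengths", main = "Distribution of carrots and cucumbers")
plot(histCucumber, freq = FALSE, col = rgb(0, 0, 1, 0.4), add = TRUE)
dev.off()
```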
Here's a base R version of the ggplot2 approach I gave. I copied some of it from @nullglob.
## generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
@Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers, but since I've just written a function to do this (plotMultipleHistograms() in the 'basicPlotteR' package), I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the basicPlotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).

xyplot bottom axis when last row has fewer panels than columns

Consider a lattice xyplot that has relation='fixed', alternating=FALSE, and as.table=TRUE.
If the last row of panels is incomplete (i.e. there are fewer panels than columns of the layout), the x-axis is not plotted. For example, panel 4 in the plot below does not have x-axis ticks/labels.
library(lattice)
d <- data.frame(x=runif(100), y=runif(100), grp=gl(5, 20))
xyplot(y~x|grp, d, as.table=TRUE, scales=list(alternating=FALSE, tck=c(1, 0)))
How can I add that axis?
Ideally I want axes only at bottom and left sides, and the incomplete row of panels at bottom (unlike when using as.table=FALSE, which plots the incomplete row at the top). For the example above, I'd like the axis plotted on the bottom border of panel 4, rather than in line with the x-axis of panel 5.
I know that this is easily solved with, e.g., a base graphics approach. I'm specifically interested in a lattice solution.
I am not a lattice expert, but I believe this might work. The idea was originally posted here. First I will regenerate the example:
library(lattice)
set.seed(1)
d <- data.frame(x=runif(100), y=runif(100), grp=gl(5, 20))
Next, let's define a function that will control the panel settings:
trellis.par.set(clip = list(panel = "off"))
myPan <- function(...) {
  panel.xyplot(...)
  # draw the bottom axis by hand for panels 4 and 5
  if (panel.number() %in% c(4, 5)) {
    panel.axis("bottom", at = seq(0, 1, by = 0.2), outside = TRUE,
               labels = TRUE, half = FALSE)
  }
}
Now to the plot:
xyplot(y~x|grp, d, as.table=TRUE,
scales = list(
x = list(draw = F, relation="same"),
y = list(tck=c(1,0), alternating=F)),
layout = c(2,3),
panel = myPan)
As can be seen, in the xyplot command we asked not to draw the x-axis (draw = F), but each panel is then drawn by the myPan function, where we explicitly draw the x-axis for panels 4 and 5.
Hope it can give you some direction for improvements.
Here's another approach based on code provided in a (now deleted) answer by @user20650. It uses grid directly, focusing on panels of the active trellis plot that are missing axes (or at least are assumed to be missing axes), and adding them. We also assume that the x-scale is fixed.
The function (which also exists as a gist here):
add_axes <- function() {
library(grid)
library(lattice)
l <- trellis.currentLayout()
pan <- which(l[nrow(l), ]==0)
if(length(pan) > 0) {
g <- grid.ls(print=FALSE)
# use an existing panel as a template for ticks
ticks <- grid.get(g$name[grep("ticks.bottom.panel", g$name)][[1]])
# use an existing panel as a template for labels
labels <- grid.get(g$name[grep("ticklabels.bottom.panel", g$name)][[1]])
ax <- grobTree(ticks, labels)
invisible(sapply(pan, function(x) {
trellis.focus("panel", x, nrow(l)-1, clip.off=TRUE)
grid.draw(ax)
trellis.unfocus()
}))
}
}
An example:
library(lattice)
d <- data.frame(x=runif(100), y=runif(100), grp=gl(5, 20))
xyplot(y~x|grp, d, as.table=TRUE, scales=list(tck=c(1,0), alternating=FALSE),
layout=c(4, 2), xlim=c(-0.1, 1.1))
add_axes()
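The function relies on trellis.currentLayout(), which returns a matrix with one cell per layout position and 0 where no panel is drawn. A quick sketch to inspect it for the example above (drawn on a null device here, which is an assumption made for the illustration):

```r
library(lattice)

set.seed(1)
d <- data.frame(x = runif(100), y = runif(100), grp = gl(5, 20))

pdf(NULL)   # the layout can only be queried while the plot is on a device
print(xyplot(y ~ x | grp, d, as.table = TRUE, layout = c(4, 2)))
l <- trellis.currentLayout()
l                 # matrix of panel numbers, 0 marks an empty position
n_empty <- sum(l == 0)
dev.off()
```

add_axes() then looks at the last row of this matrix to find the columns that have no panel and draws the axis under the panel one row above.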

