R Two graphs with lines going from one to the other - r

I want to plot some point in a normal graph and link those points to a map displayed under it. What I would like to have basically is that (here I added manually the links):
Somehow I should use segments with pdt=T to write outside the margins, but I am not sure what mathematical transformation I need to do in order to set the right coordinates for the segment extremity that go into the map.
And I would prefere to use the traditional plot function and not ggplot2
Here the source used to draw the exemple (warning it may take time to load the open street map):
library(OpenStreetMap)
#Random point to plot in the graph
fdata=cbind.data.frame(runif(12),runif(12),c(rep("A",4),rep("B",4),rep("C",4)))
colnames(fdata)=c("x","y","city")
#random coordinate to plot in the map
cities=cbind.data.frame(runif(3,4.8,5),runif(3,50.95,51),c("A","B","C"))
colnames(cities)=c("long","lat","name")
#city to color correspondance
color=1:length(cities$name)
names(color)=cities$name
maxlat=max(cities$lat)
maxlong=max(cities$long)
minlat=min(cities$lat)
minlong=min(cities$long)
#get some open street map
map = openmap(c(lat=maxlat+0.02,long=minlong-0.04 ) ,
c(lat=minlat-0.02,long=maxlong+.04) ,
minNumTiles=9,type="osm")
longlat=openproj(map) #Change coordinate projection
par(mfrow=c(2,1),mar=c(0,5,4,6))
plot( fdata$y ~ fdata$x ,xaxt="n",ylab="Comp.2",xlab="",col=color[fdata$city],pch=20)
axis(3)
mtext(side=3,"-Comp.1",line=3)
par(mar=rep(1,4))
#plot the map
plot(longlat,removeMargin=F)
points(cities$lat ~ cities$long, col= color[cities$name],cex=1,pch=20)
text(cities$long,cities$lat-0.005,labels=cities$name)

The grid graphical system (which underlies both the lattice and ggplot2 graphics packages) is much better suited to this sort of operation than is R's base graphical system. Unfortunately, both of your plots use the base graphical system. Fortunately, though, the superb gridBase package supplies functions that allow one to translate between the two systems.
In the following (which starts with your call to par(mfrow=c(2,1),...)), I've marked the lines I added with comments indicating that they are My addition. For another, somewhat simpler example of this strategy in action, see here.
library(grid) ## <-- My addition
library(gridBase) ## <-- My addition
par(mfrow=c(2,1),mar=c(0,5,4,6))
plot(fdata$y ~ fdata$x, xaxt = "n", ylab = "Comp.2", xlab = "",
col = color[fdata$city],pch=20)
vps1 <- do.call(vpStack, baseViewports()) ## <-- My addition
axis(3)
mtext(side = 3,"-Comp.1",line=3)
par(mar = rep(1,4))
#plot the map
plot(longlat,removeMargin=F)
vps2 <- do.call(vpStack, baseViewports()) ## <-- My addition
points(cities$lat ~ cities$long, col= color[cities$name],cex=1,pch=20)
text(cities$long,cities$lat-0.005,labels=cities$name)
## My addition from here on out...
## A function that draws a line segment between two points (each a
## length two vector of x-y coordinates), the first point in the top
## plot and the second in the bottom plot.
drawBetween <- function(ptA, ptB, gp = gpar()) {
## Find coordinates of ptA in "Normalized Parent Coordinates"
pushViewport(vps1)
X1 <- convertX(unit(ptA[1],"native"), "npc")
Y1 <- convertY(unit(ptA[2],"native"), "npc")
popViewport(3)
## Find coordinates of ptB in "Normalized Parent Coordinates"
pushViewport(vps2)
X2 <- convertX(unit(ptB[1],"native"), "npc")
Y2 <- convertY(unit(ptB[2],"native"), "npc")
popViewport(3)
## Plot line between the two points
grid.move.to(x = X1, y = Y1, vp = vps1)
grid.line.to(x = X2, y = Y2, vp = vps2, gp = gp)
}
## Try the function out on one pair of points
ptA <- fdata[1, c("x", "y")]
ptB <- cities[1, c("long", "lat")]
drawBetween(ptA, ptB, gp = gpar(col = "gold"))
## Using a loop, draw lines from each point in `fdata` to its
## corresponding city in `cities`
for(i in seq_len(nrow(fdata))) {
ptA <- fdata[i, c("x", "y")]
ptB <- cities[match(fdata[i,"city"], cities$name), c("long", "lat")]
drawBetween(ptA, ptB, gp = gpar(col = color[fdata[i,"city"]]))
}

You can create a new plot area over your plots and then add the lines:
#New plot area
par(new=T, mfrow = c(1,1))
plot(0:1, type = "n", xaxt='n', ann=FALSE, axes=FALSE, frame.plot=TRUE, bty="n")
The problem of this is that you need do the mapping between yours plot and the new plot area, if you ever use the same area you can get some references (see locator) and then interpolate all the other point.
For example, in mi plot B is {1.751671, 0.1046729} and 8th point is {1.320507, 0.6892523}:
points(c(1.320507, 1.751671), c(0.6892523, 0.1046729), col = "red", type = "l")
UPDATE (Plots mapping):
X11(7, 7)
par(mfrow=c(2,1),mar=c(0,5,4,6))
plot( fdata$y ~ fdata$x ,xaxt="n",ylab="Comp.2",xlab="",col=color[fdata$city],pch=20)
axis(3)
mtext(side=3,"-Comp.1",line=3)
usr1 <- par("usr")
#plot the map
par(mar=rep(1,4))
plot(longlat,removeMargin=F)
points(cities$lat ~ cities$long, col= color[cities$name],cex=1,pch=20)
text(cities$long,cities$lat-0.005,labels=cities$name)
usr2 <- par("usr")
par(new=T, mfrow = c(1,1))
plot(0:1, type = "n", xaxt='n', ann=FALSE, axes=FALSE, frame.plot=TRUE, bty="n")
# Position of the corners (0, 0) and (1, 1) of the two graphs in the window X11(7, 7)
#ref <- locator()
ref <- list(x = c(1.09261365729382, 1.8750001444129, 1.06363637999312, 1.93636379046146),
y = c(0.501704460496285, 0.941477257177598,
-0.0335228967050026, 0.45909081740701))
fdata$x_map <- approxfun(usr1[1:2], ref$x[1:2])(fdata$x)
fdata$y_map <- approxfun(usr1[3:4], ref$y[1:2])(fdata$y)
points(fdata$y_map ~ fdata$x_map ,pch=6)
Keep in mind that the interpolation of the map must consider the projection, the linear projection can only be used with UTM coordinates.

Related

Histogram to decide whether two distributions have the same shape in R [duplicate]

I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
On additional thing, I commented on Dirk's question that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
Already beautiful answers are there, but I thought of adding this. Looks good to me.
(Copied random numbers from #Dirk). library(scales) is needed`
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel result from hist0 is prettier to look than hist
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
Here's the version like the ggplot2 one I gave only in base R. I copied some from #nullglob.
generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
#Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers but since I've just written a function (plotMultipleHistograms() in 'basicPlotteR' package) function to do this, I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the plotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).

How to define color of intersection in a Venn diagram?

I found many resources on how to draw Venn diagrams in R. Stack Overflow has a lot of them. However, I still can't draw my diagrams the way I want. Take the following code as an example:
library("VennDiagram")
A <- 1:4
B <- 3:6
d <- list(A, B)
vp <- venn.diagram(d, fill = c("white", "white"), alpha = 1, filename = NULL,
category.names=c("A", "B"))
grid.draw(vp)
I want the intersection between the sets to be red. However, if I change any of the white colors to red, I get the following:
vp_red <- venn.diagram(d, fill = c("red", "white"), alpha = 1, filename = NULL,
category.names=c("A", "B"))
grid.draw(vp_red)
That's not quite what I want. I want only the intersection to be red. If I change the alpha, this is what I get:
vp_alpha <- venn.diagram(d, fill = c("red", "white"), alpha = 0.5, filename = NULL,
category.names=c("A", "B"))
grid.draw(vp_alpha)
Now I have pink in my intersection. This is not what I want as well. What I want is something like this image from Wikipedia:
How can I do this? Maybe VennDiagram package can't do it and I need some other package, but I've been testing different ways to do it, and I'm not being able to find a solution.
I will show two different possibilities. In the first example, polyclip::polyclip is used to get the intersection. In the second example, circles are converted to sp::SpatialPolygons and we get the intersection using rgeos::gIntersection. Then we re-plot the circles and fill the intersecting area.
The resulting object when using venn.diagram is
"of class gList containing the grid objects that make up the diagram"
Thus, in both cases we can grab relevant data from "vp". First, check the structure and list the grobs of the object:
str(vp)
grid.ls()
# GRID.polygon.234
# GRID.polygon.235
# GRID.polygon.236 <~~ these are the empty circles
# GRID.polygon.237 <~~ $ col : chr "black"; $ fill: chr "transparent"
# GRID.text.238 <~~ labels
# GRID.text.239
# GRID.text.240
# GRID.text.241
# GRID.text.242
1. polyclip
Grab x- and y-values, and put them in the format required for polyclip:
A <- list(list(x = as.vector(vp[[3]][[1]]), y = as.vector(vp[[3]][[2]])))
B <- list(list(x = as.vector(vp[[4]][[1]]), y = as.vector(vp[[4]][[2]])))
Find intersection:
library(polyclip)
AintB <- polyclip(A, B)
Grab labels:
ix <- sapply(vp, function(x) grepl("text", x$name, fixed = TRUE))
labs <- do.call(rbind.data.frame, lapply(vp[ix], `[`, c("x", "y", "label")))
Plot it!
plot(c(0, 1), c(0, 1), type = "n", axes = FALSE, xlab = "", ylab = "")
polygon(A[[1]])
polygon(B[[1]])
polygon(AintB[[1]], col = "red")
text(x = labs$x, y = labs$y, labels = labs$label)
2. SpatialPolygons and gIntersection
Grab the coordinates of the circles:
# grab x- and y-values from first circle
x1 <- vp[[3]][["x"]]
y1 <- vp[[3]][["y"]]
# grab x- and y-values from second circle
x2 <- vp[[4]][["x"]]
y2 <- vp[[4]][["y"]]
Convert points to SpatialPolygons and find their intersection:
library(sp)
library(rgeos)
p1 <- SpatialPolygons(list(Polygons(list(Polygon(cbind(x1, y1))), ID = 1)))
p2 <- SpatialPolygons(list(Polygons(list(Polygon(cbind(x2, y2))), ID = 2)))
ip <- gIntersection(p1, p2)
Plot it!
# plot circles
plot(p1, xlim = range(c(x1, x2)), ylim = range(c(y1, y2)))
plot(p2, add = TRUE)
# plot intersection
plot(ip, add = TRUE, col = "red")
# add labels (see above)
text(x = labs$x, y = labs$y, labels = labs$label)
I'm quite sure you could work directly on the grobs using clipping functions in grid or gridSVG package.
It's very easy in eulerr R package
library(eulerr)
plot(euler(c("A"=5,"B"=4,"A&B"=2)),quantities = TRUE,fills=c("white","white","red"))
euler set colours

r program grouping 3 histograms into one grouped histogram [duplicate]

I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
On additional thing, I commented on Dirk's question that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
Already beautiful answers are there, but I thought of adding this. Looks good to me.
(Copied random numbers from #Dirk). library(scales) is needed`
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel result from hist0 is prettier to look than hist
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
Here's the version like the ggplot2 one I gave only in base R. I copied some from #nullglob.
generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
#Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers but since I've just written a function (plotMultipleHistograms() in 'basicPlotteR' package) function to do this, I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the plotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).

Axes at minimum extent, no padding, in plots of raster* objects

Is there a way to ensure that the box around a plot matches the raster extents exactly? In the following there is a gap above and below or to the left and right of the raster depending on the device proportions:
require(raster)
r = raster()
r[]= 1
plot(r, xlim=c(xmin(r), xmax(r)), ylim=c(ymin(r), ymax(r)))
One element of the problem with raster objects is that asp=1 to ensure proper display. The following basic scatterplot has the same issue when asp=1:
plot(c(1:10), c(1:10), asp=1)
Try vectorplot(r) from the rasterVis package to see what I want the axes to look like.
EDIT:
Solutions need to play nice with SpatialPoints overlays, not showing points outside the specified raster limits:
require(raster)
require(maptools)
# Raster
r = raster()
r[]= 1
# Spatial points
x = c(-100, 0, 100)
y = c(100, 0, 100)
points = SpatialPoints(data.frame(x,y))
plot(r, xlim=c(xmin(r), xmax(r)), ylim=c(ymin(r), ymax(r)))
plot(points, add=T)
You'd probably do best to go with one of the lattice-based functions for plotting spatial raster objects provided by the raster and rasterVis packages. You discovered one of them in vectorplot(), but spplot() or levelplot() better match your needs in this case.
(The base graphics-based plot() method for "RasterLayer" objects just doesn't allow any easy way for you to set axes with the appropriate aspect ratio. For anyone interested, I go into more detail about why that's so in a section at the bottom of the post.)
As an example of the kind of plot that levelplot() produces:
require(raster)
require(rasterVis)
## Create a raster and a SpatialPoints object.
r <- raster()
r[] <- 1:ncell(r)
SP <- spsample(Spatial(bbox=bbox(r)), 10, type="random")
## Then plot them
levelplot(r, col.regions = rev(terrain.colors(255)), cuts=254, margin=FALSE) +
layer(sp.points(SP, col = "red"))
## Or use this, which produces the same plot.
# spplot(r, scales = list(draw=TRUE),
# col.regions = rev(terrain.colors(255)), cuts=254) +
# layer(sp.points(SP, col = "red"))
Either of these methods may still plot some portion of the symbol representing points that fall just outside of the plotted raster. If you want to avoid that possibility, you can just subset your SpatialPoints object to remove any points falling outside of the raster. Here's a simple function that'll do that for you:
## A function to test whether points fall within a raster's extent
inExtent <- function(SP_obj, r_obj) {
crds <- SP_obj#coord
ext <- extent(r_obj)
crds[,1] >= ext#xmin & crds[,1] <= ext#xmax &
crds[,2] >= ext#ymin & crds[,2] <= ext#ymax
}
## Remove any points in SP that don't fall within the extent of the raster 'r'
SP <- SP[inExtent(SP, r), ]
Additional weedy detail about why it's hard to make plot(r) produce snugly fitting axes
When plot is called on an object of type raster, the raster data is (ultimately) plotted using either rasterImage() or image(). Which path is followed depends on: (a) the type of device being plotted to; and (b) the value of the useRaster argument in the original plot() call.
In either case, the plotting region is set up in a way which produces axes that fill the plotting region, rather than in a way that gives them the appropriate aspect ratio.
Below, I show the chain of functions that's called on the way to this step, as well as the call that ultimately sets up the plotting region. In both cases, there appears to be no simple way to alter both the extent and the aspect ratio of the axes that are plotted.
useRaster=TRUE
## Chain of functions dispatched by `plot(r, useRaster=TRUE)`
getMethod("plot", c("RasterLayer", "missing"))
raster:::.plotraster2
raster:::.rasterImagePlot
## Call within .rasterImagePlot() that sets up the plotting region
plot(NA, NA, xlim = e[1:2], ylim = e[3:4], type = "n",
, xaxs = "i", yaxs = "i", asp = asp, ...)
## Example showing why the above call produces the 'wrong' y-axis limits
plot(c(-180,180), c(-90,90),
xlim = c(-180,180), ylim = c(-90,90), pch = 16,
asp = 1,
main = "plot(r, useRaster=TRUE) -> \nincorrect y-axis limits")
useRaster=FALSE
## Chain of functions dispatched by `plot(r, useRaster=FALSE)`
getMethod("plot", c("RasterLayer", "missing"))
raster:::.plotraster2
raster:::.imageplot
image.default
## Call within image.default() that sets up the plotting region
plot(NA, NA, xlim = xlim, ylim = ylim, type = "n", xaxs = xaxs,
yaxs = yaxs, xlab = xlab, ylab = ylab, ...)
## Example showing that the above call produces the wrong aspect ratio
plot(c(-180,180), c(-90,90),
xlim = c(-180,180), ylim = c(-90,90), pch = 16,
main = "plot(r,useRaster=FALSE) -> \nincorrect aspect ratio")
Man, I got stumped and ended up just turning the foreground color off to plot. Then you can take advantage of the fact that the raster plot method calls fields:::image.plot, which lets you just plot the legend (a second time, this time showing the ink!). This is inelegant, but worked in this case:
par("fg" = NA)
plot(r, xlim = c(xmin(r), xmax(r)), ylim = c(ymin(r), ymax(r)), axes = FALSE)
par(new = TRUE,"fg" = "black")
plot(r, xlim = c(xmin(r), xmax(r)), ylim = c(ymin(r), ymax(r)), axes = FALSE, legend.only = TRUE)
axis(1, pos = -90, xpd = TRUE)
rect(-180,-90,180,90,xpd = TRUE)
ticks <- (ymin(r):ymax(r))[(ymin(r):ymax(r)) %% 20 == 0]
segments(xmin(r),ticks,xmin(r)-5,ticks, xpd = TRUE)
text(xmin(r),ticks,ticks,xpd=TRUE,pos=2)
title("sorry, this could probably be done in some more elegant way")
This is way I solved this problem
require(raster)
r = raster()
# default for raster is 180 row by 360 cols = 64800 cells
# fill with some values to make more interesting
r[]= runif(64800, 1, 1000)
# Set margin for text
par(mar=c(2, 6, 6, 2))
# Set some controls for the raster cell colours and legend
MyBrks<-c(0,1,4,16,64,256,1E20)
MyLbls<-c("<1","<4","<16","<64","<256","<Max")
MyClrs<-c("blue","cyan","yellow","pink","purple","red")
# Plot raster without axes or box or legend
# Note xlim and ylim don't seem do much unless you want to trim x and y
plot(r,
col=MyClrs,
axes=FALSE,
box=FALSE,
legend=FALSE
)
# Set up the ranges and intervals for axes - you can get the min max
# using xmin(r) and ymax(r) and so on if you like
MyXFrm <- -180
MyXTo <- 180
MyXStp <- 60
MyYFrm <- -90
MyYTo <- 90
MyYStp <- 30
# Plot the axes
axis(1,tick=TRUE,pos=ymin(r),las=1,at=seq(MyXFrm,MyXTo ,MyXStp ))
axis(2,tick=TRUE,pos=xmin(r),las=1,at=seq(MyYFrm ,MyYTo ,MyYStp ))
# Plot the legend use xpd to plot the legend outside the plot region
par(xpd=TRUE)
legend(MyXTo ,MyYTo ,
legend=MyLbls[1:6],
col= MyClrs,
fill=Clrs[1:6],
bg=rgb(0,0,0,0.85),
cex=0.9,
text.col="white",
text.font=2,
border=NA
)
# Add some axis labels and a title
text(-220,0,"Y",font=2)
text(0,-130,"X",font=2)
text(0,120,"My Raster",font=4,cex=1.5)
I think the best (or simplest) solution is to use image():
library(raster)
# Raster
r = raster()
r[]= rnorm(ncell(r))
# Spatial points
x = c(-100, 0, 100)
y = c(100, 0, 100)
points = SpatialPoints(data.frame(x,y))
# plot
image(r)
plot(points, add=T, pch=16, cex=2)

R: change background color of plot for specific area only (based on x-values)

How do I change the background color for a plot, only for a specific area?
For example, from x=2 to x=4?
Bonus question: is it also possible for a combination of x and y coordinates? (for example from (1,2) to (3,4))?
Many thanks!
This can be achieved by thinking about the plot somewhat differently to your description. Basically, you want to draw a coloured rectangle between the desired positions on the x-axis, filling the entire y-axis limit range. This can be achieved using rect(), and note how, in the example below, I grab the user (usr) coordinates of the current plot to give me the limits on the y-axis and that we draw beyond these limits to ensure the full range is covered in the plot.
plot(1:10, 1:10, type = "n", axes = FALSE) ## no axes
lim <- par("usr")
rect(2, lim[3]-1, 4, lim[4]+1, border = "red", col = "red")
axis(1) ## add axes back
axis(2)
box() ## and the plot frame
rect() can draw a sequence of rectangles if we provide a vector of coordinates, and it can easily handle the case for the arbitrary x,y coordinates of your bonus, but for the latter it is easier to avoid mistakes if you start with a vector of X coordinates and another for the Y coordinates as below:
X <- c(1,3)
Y <- c(2,4)
plot(1:10, 1:10, type = "n", axes = FALSE) ## no axes
lim <- par("usr")
rect(X[1], Y[1], X[2], Y[2], border = "red", col = "red")
axis(1) ## add axes back
axis(2)
box() ## and the plot frame
You could just as easily have the data as you have it in the bonus:
botleft <- c(1,2)
topright <- c(3,4)
plot(1:10, 1:10, type = "n", axes = FALSE) ## no axes
lim <- par("usr")
rect(botleft[1], botleft[2], topright[1], topright[2], border = "red",
col = "red")
axis(1) ## add axes back
axis(2)
box() ## and the plot frame

Resources