Related
I am analyzing difference scores with polynomial regression in R. Based on [Edwards and Parry's (1993)][1] recommendations I have been trying to combine a persp() plot with a contour() plot. I would also need to plot the first two principal axes on the contour plot. My attempts so far have only provided me with each individual plot, but I don't know how to combine them. An example for the end-result is :
Edwards & Parry (1993) example difference score visualisation
I manage to get the persp() plot just fine. I have also obtained the contour plot. I can't seem to find any way to combine the two. I have managed to make the plot in plotly using the add_surface() option in the pipeline. My problem with the output is that the surface is smooth, and the contourplot lacks the values in the plot. Basically: persp() and contour() are visualised in a way that is extremely similar to the look I'm aiming for, per the example in the source.
My current attempt (in minimalistic code) is as follows:
surface <- function(e, i){
y <- .2*e + .14*i + .08*e^2 + + .1*e*i + .2*i^2
}
e <- i <- seq(-3, 3, length= 20)
y <- outer(e, i, surface)
persp(e, i, y,
xlab = 'Explicit',
ylab = 'Implicit',
zlab = 'Depression',
theta = 45)
contour(e,i,y)
So basically my question is: how can I make a plot like Edwards and Parry (1993) make, with a similar visual style, in R. It does not have to be base-R, I'm happy with any method. I've been stuck on this problem for a week now.
My attempt in plotly (to compare it to my desired end-result) is:
if(!"plotly" %in% installed.packages){install.packages('plotly')}
library(plotly)
plot_ly(z = ~y) %>% add_surface(x = ~e, y= ~i, z= ~y,
contours = list(
z = list(
show=TRUE,
usecolormap=FALSE,
highlightcolor="#ff0000",
project=list(z=TRUE)
)
)
) %>%
layout(
scene=list(
xaxis = list(title = "Explicit"),
yaxis = list(title = "Implicit"),
zaxis = list(title = "Depression")
)
)
[1]: Edwards, J. R., & Parry, M. E. (1993). On the use of polynomial regression as an alternative to difference scores. Academy of Management Journal, 36(6), 1577–1613. https://doi.org/10.2307/256822
I have found an answer and I will share it here. It seems it cannot be done in base-R. But the RSM-package allows for the addition of contour lines to the base of the plot.
In this answer I will give a minimal example of:
the persp() plot
the contour lines in the base
addition of x=y and x=-y axis
calculation and addition of the first and second principal axis
The only thing I could not solve is that the lines now are drawn over the surface. I don't know how to solve it.
library(rsm)
x <- seq(-3,3,by=0.25)
y <- seq(-3,3,by=0.25)
d <- expand.grid(x=x,y=y)
z <- c(data=NA,1089)
b0 = .140; b1 = -.441; b2 = -.154; b3 = .161 ; b4 =-.106; b5 = .168
k=1
for (i in 1:25) {
for (j in 1:25) {
z[k]=b0+b1*x[i]+b2*y[j]+b3*x[i]*x[i]+b4*x[i]*y[j]+ b5*y[j]*y[j]
k=k+1
} }
data.lm <- lm(z~poly(x,y,degree=2),data=d)
res1 <- persp(data.lm,x~y,
zlim=c(-2,max(z)),
xlabs = c('X','Y'),
zlab = 'Z',
contour=list(z="bottom"),
theta=55,
phi=25)
# draw x=y line (lightly dotted)
xy_pos <- matrix(c(-3,-3,3,3),ncol=2,byrow = T)
lines(trans3d(xy_pos[,2], xy_pos[,1], z=-2, pmat = res1$`y ~ x`$transf),
lty = 3,
col = 'darkgrey')
# draw x=-y line (lightly dotted)
xy_neg <- matrix(c(-3,3,3,-3),ncol=2,byrow = T)
lines(trans3d(xy_neg[,2], xy_neg[,1], z=-2, pmat = res1$`y ~ x`$transf),
lty = 3,
col = 'darkgrey')
# Find stationary points:
X0 <- (b2*b4 - 2*b1*b5) / (4*b3*b5 - b4^2)
Y0 <- (b1*b4 - 2*b2*b3) / (4*b3*b5 - b4^2)
# First Principal Axis
p11 = (b5-b3+sqrt((b3-b5)^2+b4^2))/b4
p10 = Y0 - p11*X0
Ypaf1 = p10 + p11*x
# plot first principal axis (full line)
xypaf1 <- matrix(c(Ypaf1[1], -3, Ypaf1[25], 3),ncol=2, byrow=T)
lines(trans3d(xypaf1[,2], xypaf1[,1], z=-2, pmat = res1$`y ~ x`$transf),
lty = 1,
col = 'black')
# Second Principal Axis
p21 = (b5-b3-sqrt((b3-b5)^2+b4^2))/b4
p20 = Y0 - p21*X0
Ypaf2 = p20 + p21*x
# plot second principal axis (dashed line)
xypaf2 <- matrix(c(Ypaf2[1], -3, Ypaf2[25], 3),ncol=2, byrow=T)
lines(trans3d(xypaf2[,2], xypaf2[,1], z=-2, pmat = res1$`y ~ x`$transf),
lty = 2,
col = 'black')
So I have this code that produces the exact surface
f = function(x, y){
z = ((x^2)+(3*y^2))*exp(-(x^2)-(y^2))
}
plot3d(f, col = colorRampPalette(c("blue", "white")),
xlab = "X", ylab = "Y", zlab = "Z",
xlim = c(-3, 3), ylim = c(-3, 3),
aspect = c(1, 1, 0.5))
Giving the following plot:
Now I have some code that does a random walk metropolis algorithm to reproduce the above image. I think it works as if I do another plot of these calculated values I get the next image with 500 points. Here is the code
open3d()
plot3d(x0, y0, f(x0, y0), type = "p")
Which gives the following plot:
I know it's hard looking at this still image but being able to rotate the sampling is working.
Now here is my question: How can I use plot3d() so that I can have a surface that connects all these points and gives a more jagged representation of the exact plot? Or how can I have each point in the z axis as a bar from the xy plane? I just want something more 3 dimensional than points and I can't find how to do this.
Thanks for your help
You can do this by triangulating the surface. You don't give us your actual data, but I can create some similar data using
f = function(x, y){
z = ((x^2)+(3*y^2))*exp(-(x^2)-(y^2))
}
x <- runif(500, -3, 3)
y <- runif(500, -3, 3)
z <- f(x, y)
Then the plotting is done using the method in ?persp3d.deldir:
library(deldir)
library(rgl)
col <- colorRampPalette(c("blue", "white"))(20)[1 + round(19*(z - min(z))/diff(range(z)))]
dxyz <- deldir::deldir(x, y, z = z, suppressMsge = TRUE)
persp3d(dxyz, col = col, front = "lines", back = "lines")
This might need some cosmetic fixes, e.g.
aspect3d(2, 2, 1)
After some rotation, this gives me the following plot:
I'm not sure to understand what you want. If my understanding is correct, here is a solution. Define a parametric representation of your surface:
fx <- function(u,v) u
fy <- function(u,v) v
fz <- function(u,v){
((u^2)+(3*v^2))*exp(-(u^2)-(v^2))
}
Let's say you have these points:
x0 <- seq(-3, 3, length.out = 20)
y0 <- seq(-3, 3, length.out = 20)
Then you can use the function parametric3d of the misc3d package, with the option fill=FALSE to get a wireframe:
library(misc3d)
parametric3d(fx, fy, fz, u=x0, v=y0,
color="blue", fill = FALSE)
Is it what you want?
To get some vertical bars, use the function segments3d of rgl:
i <- 8
bar <- rbind(c(x0[i],y0[i],0),c(x0[i],y0[i],f(x0[i],y0[i])))
segments3d(bar, color="red")
Here is a plot with only 50 points using my original code.
When I then apply what was said by Stéphane Laurent I then get this plot which feels too accurate when given the actual points I have
Perhaps you need to explain to me what is actually happening in the function parametric3d
I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
On additional thing, I commented on Dirk's question that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
Already beautiful answers are there, but I thought of adding this. Looks good to me.
(Copied random numbers from #Dirk). library(scales) is needed`
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel result from hist0 is prettier to look than hist
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
Here's the version like the ggplot2 one I gave only in base R. I copied some from #nullglob.
generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
#Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers but since I've just written a function (plotMultipleHistograms() in 'basicPlotteR' package) function to do this, I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the plotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).
I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
On additional thing, I commented on Dirk's question that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
Already beautiful answers are there, but I thought of adding this. Looks good to me.
(Copied random numbers from #Dirk). library(scales) is needed`
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel result from hist0 is prettier to look than hist
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
Here's the version like the ggplot2 one I gave only in base R. I copied some from #nullglob.
generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
#Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers but since I've just written a function (plotMultipleHistograms() in 'basicPlotteR' package) function to do this, I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the plotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).
I'm trying to put multiple lattice plots in one window using levelplot by setting par(mfrow=c(2,1)) but it seems to be ignoring this.
Is there a particular function for setting multiple plots in lattice?
The 'lattice' package is built on the grid package and attaches its namespace when 'lattice' loaded. However, in order to use the grid.layout function, you need to explicitly load() pkg::grid. The other alternative, that is probably easier, is the grid.arrange function in pkg::gridExtra:
install.packages("gridExtra")
require(gridExtra) # also loads grid
require(lattice)
x <- seq(pi/4, 5 * pi, length.out = 100)
y <- seq(pi/4, 5 * pi, length.out = 100)
r <- as.vector(sqrt(outer(x^2, y^2, "+")))
grid <- expand.grid(x=x, y=y)
grid$z <- cos(r^2) * exp(-r/(pi^3))
plot1 <- levelplot(z~x*y, grid, cuts = 50, scales=list(log="e"), xlab="",
ylab="", main="Weird Function", sub="with log scales",
colorkey = FALSE, region = TRUE)
plot2 <- levelplot(z~x*y, grid, cuts = 50, scales=list(log="e"), xlab="",
ylab="", main="Weird Function", sub="with log scales",
colorkey = FALSE, region = TRUE)
grid.arrange(plot1,plot2, ncol=2)
The Lattice Package often (but not always) ignores the par command, so i just avoid using it when plotting w/ Lattice.
To place multiple lattice plots on a single page:
create (but don't plot) the lattice/trellis plot objects, then
call print once for each plot
for each print call, pass in arguments for (i) the plot; (ii)
more, set to TRUE, and which is only passed in for the initial call to print, and (iii) pos, which gives the position of each plot on the page specified as x-y coordinate pairs for the plot's lower left-hand corner and upper right-hand
corner, respectively--ie, a vector with four numbers.
much easier to show than to tell:
data(AirPassengers) # a dataset supplied with base R
AP = AirPassengers # re-bind to save some typing
# split the AP data set into two pieces
# so that we have unique data for each of the two plots
w1 = window(AP, start=c(1949, 1), end=c(1952, 1))
w2 = window(AP, start=c(1952, 1), end=c(1960, 12))
px1 = xyplot(w1)
px2 = xyplot(w2)
# arrange the two plots vertically
print(px1, position=c(0, .6, 1, 1), more=TRUE)
print(px2, position=c(0, 0, 1, .4))
This is simple to do once you read ?print.trellis. Of particular interest is the split parameter. It may seem complicated at first sight, but it's quite straightforward once you understand what it means. From the documentation:
split: a vector of 4 integers, c(x,y,nx,ny), that says to position the current plot at the x,y position in a regular array of nx by ny plots. (Note: this has origin at top left)
You can see a couple of implementations on example(print.trellis), but here's one that I prefer:
library(lattice)
# Data
w <- as.matrix(dist(Loblolly))
x <- as.matrix(dist(HairEyeColor))
y <- as.matrix(dist(rock))
z <- as.matrix(dist(women))
# Plot assignments
pw <- levelplot(w, scales = list(draw = FALSE)) # "scales..." removes axes
px <- levelplot(x, scales = list(draw = FALSE))
py <- levelplot(y, scales = list(draw = FALSE))
pz <- levelplot(z, scales = list(draw = FALSE))
# Plot prints
print(pw, split = c(1, 1, 2, 2), more = TRUE)
print(px, split = c(2, 1, 2, 2), more = TRUE)
print(py, split = c(1, 2, 2, 2), more = TRUE)
print(pz, split = c(2, 2, 2, 2), more = FALSE) # more = FALSE is redundant
The code above gives you this figure:
As you can see, split takes four parameters. The last two refer to the size of your frame (similar to what mfrow does), whereas the first two parameters position your plot into the nx by ny frame.