I have generated three bar plots using the barplot() function. Now I need to stack these three plots in a single column to get one combined plot. I tried cowplot for this, but it gave a warning message:
In as_grob.default(plot) : Cannot convert object of class matrixarray into a grob.
I know it is easier with ggplot, but I find it hard to write this code in ggplot. Can someone please give me a solution? I am not an expert; I tried my best but could not find one. My code:
k <- read.csv("maxcor_r_p.csv", header = TRUE, sep = ",")
cols <- c("azure3", "#003f5c")[(k$p < 0.05) + 1]
maxi <- barplot(
k$r,
names.arg = k$parameter,
ylab = "Correlation coefficient",
col = cols,
main = expression("T"[max]),
las = 2
)
l <- read.csv("meancor_r_p.csv", header = TRUE, sep = ",")
cols <- c("azure3", "#27e52a")[(l$p < 0.05) + 1]
meany <- barplot(
l$r,
names.arg = l$parameter,
ylab = "Correlation coefficient",
col = cols,
main = expression("T"[mean]),
las = 2
)
m <- read.csv("precipcor_r_p.csv", header = TRUE, sep = ",")
cols <- c("azure3", "#27bac6")[(m$p < 0.05) + 1]
preci <- barplot(
m$r,
names.arg = m$parameter,
ylab = "Correlation coefficient",
col = cols,
main = expression("Precipitation"),
las = 2
)
cowplot::plot_grid(
maxi, meany, preci,
ncol = 1, align = "v", axis = 1
)
[Screenshot of one of my CSV files]
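cowplot gives you the conversion warning because plot_grid() expects plot objects such as ggplot objects or grobs. barplot() draws on the graphics device straight away and returns only the bar midpoints (a numeric matrix, hence the matrixarray class in the message), so maxi, meany and preci are not plots. The most direct way to use cowplot would be to rewrite your plots in ggplot.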
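To see why the conversion fails, it can help to look at what barplot() hands back. The short sketch below uses made-up values (not the data from the question) and only base R: the plot is drawn on the device as a side effect, and the returned object is just the numeric positions of the bar midpoints.

```r
# Made-up values for illustration; not the data from the question.
r <- c(0.4, -0.2, 0.7)

# barplot() draws on the current device as a side effect ...
mp <- barplot(r, names.arg = c("a", "b", "c"))

# ... and returns only the x positions of the bar midpoints,
# so there is no plot object that plot_grid() could arrange.
print(class(mp))
print(mp)

# The midpoints are still handy, e.g. for labelling the bars.
text(x = mp, y = r, labels = r, pos = 3, xpd = TRUE)
```

This is why storing the result in maxi, meany and preci does not give you anything that can be passed on to a plot-arranging function.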
You should be able to combine the plots you've created by using par(mfrow = c(A, B)). par() is a function for setting graphical parameters, and mfrow takes a vector of two numbers: the first element (A) is the number of rows in the graphic you are creating and the second element (B) is the number of columns. Subsequent plots then fill the grid row by row.
If you want the plots to be displayed in a single column, you could do the following:
# specify a graphic with three rows and one column
par(mfrow = c(3, 1))
# first plot
maxi <-
barplot(
k$r,
names.arg = k$parameter,
ylab = "Correlation coefficient",
col = c("azure3", "#003f5c")[(k$p < 0.05) + 1],
main = expression("T"[max]),
las = 2
)
# second plot
meany <-
barplot(
l$r,
names.arg = l$parameter,
ylab = "Correlation coefficient",
col = c("azure3", "#27e52a")[(l$p < 0.05) + 1],
main = expression("T"[mean]),
las = 2
)
# third plot
preci <-
barplot(
m$r,
names.arg = m$parameter,
ylab = "Correlation coefficient",
col = c("azure3", "#27bac6")[(m$p < 0.05) + 1],
main = expression("Precipitation"),
las = 2
)
# reset the layout to a single panel (keeps the plot on screen, unlike dev.off())
par(mfrow = c(1, 1))
If you wanted them to be displayed in a single row instead, you would just change your par() function to:
# specify a graphic with one row and three columns
par(mfrow = c(1, 3))
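Here is a self-contained sketch of the single-column layout that runs without the three CSV files. The simulated data frames and the helper names make_df and plot_cor are assumptions for illustration only. It computes the bar colours separately for each panel and restores the previous graphical parameters with par(op) rather than closing the device.

```r
set.seed(1)

# Simulated stand-ins for the three CSV files (hypothetical data).
make_df <- function() {
  data.frame(parameter = paste0("P", 1:6),
             r = round(runif(6, -1, 1), 2),
             p = round(runif(6, 0, 0.2), 3))
}
k <- make_df()
l <- make_df()
m <- make_df()

# Hypothetical helper: one bar plot, significant bars (p < 0.05) highlighted.
plot_cor <- function(d, highlight, title) {
  cols <- c("azure3", highlight)[(d$p < 0.05) + 1]
  barplot(d$r, names.arg = d$parameter,
          ylab = "Correlation coefficient",
          col = cols, main = title, las = 2)
  invisible(cols)
}

# Three rows, one column; par() returns the old settings so they can be restored.
op <- par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))
cols_k <- plot_cor(k, "#003f5c", expression("T"[max]))
cols_l <- plot_cor(l, "#27e52a", expression("T"[mean]))
cols_m <- plot_cor(m, "#27bac6", "Precipitation")
par(op)
```

Swapping c(3, 1) for c(1, 3) gives the single-row version. If the panels look cramped, the mar values (bottom, left, top, right margins in lines) are the first thing to adjust.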
I am trying to create a data table whose cells are different colors based on the value in the cell. I can achieve this with the function addtable2plot from the plotrix package. The addtable2plot function lays a table on an already existing plot. The problem with that solution is that I don't want a plot, just the table.
I've also looked at the heatmap functions. The problem there is that some of the values in my table are character, and the heatmap functions, from what I can tell, only accept numeric matrices. Also, I want my column names to be at the top of the table, not the bottom, and that doesn't seem to be an option.
Here's the example code for addtable2plot. If I could get just the table, filling the whole screen, that would be great.
library(plotrix)
testdf<-data.frame(Before=c(10,7,5,9),During=c(8,6,2,5),After=c(5,3,4,3))
rownames(testdf)<-c("Red","Green","Blue","Lightblue")
barp(testdf,main="Test addtable2plot",ylab="Value",
names.arg=colnames(testdf),col=2:5)
# show most of the options including the christmas tree colors
abg<-matrix(c(2,3,5,6,7,8),nrow=4,ncol=3)
addtable2plot(2,8,testdf,bty="o",display.rownames=TRUE,hlines=TRUE,
vlines=TRUE,title="The table",bg=abg)
Any help would be greatly appreciated.
A heatmap alternative:
library(gplots)
# need data as matrix
mm <- as.matrix(testdf)
heatmap.2(x = mm, Rowv = FALSE, Colv = FALSE, dendrogram = "none",
cellnote = mm, notecol = "black", notecex = 2,
trace = "none", key = FALSE, margins = c(7, 11))
In heatmap.2 the side of the plot the axis is to be drawn on is hard-coded. But if you type "heatmap.2" at the console and copy the output to an editor, you can search for axis(1, where the 1 is the side argument (two hits). You can then change from a 1 (axis below plot) to a 3 (axis above the plot). Assign the updated function to a new name, e.g. heatmap.3, and run it as above.
An addtable2plot alternative
library(plotrix)
# while plotrix is loaded anyway:
# set colors with color.scale
# need data as matrix*
mm <- as.matrix(testdf)
cols <- color.scale(mm, extremes = c("red", "yellow"))
par(mar = c(0.5, 1, 2, 0.5))
# create empty plot
plot(1:10, axes = FALSE, xlab = "", ylab = "", type = "n")
# add table
addtable2plot(x = 1, y = 1, table = testdf,
bty = "o", display.rownames = TRUE,
hlines = TRUE, vlines = TRUE,
bg = cols,
xjust = 2, yjust = 1, cex = 3)
# *According to `?color.scale`, `x` can be a data frame.
# However, when I tried with `testdf`, I got "Error in `[.data.frame`(x, segindex) : undefined columns selected".
A color2D.matplot alternative
library(plotrix)
par(mar = c(0.5, 8, 3.5, 0.5))
color2D.matplot(testdf,
show.values = TRUE,
axes = FALSE,
xlab = "",
ylab = "",
vcex = 2,
vcol = "black",
extremes = c("red", "yellow"))
axis(3, at = seq_len(ncol(testdf)) - 0.5,
labels = names(testdf), tick = FALSE, cex.axis = 2)
axis(2, at = seq_len(nrow(testdf)) -0.5,
labels = rev(rownames(testdf)), tick = FALSE, las = 1, cex.axis = 2)
After this little exercise, I tend to agree with @Drew Steen that LaTeX alternatives may be investigated as well. For example, check here and here.
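If you would rather not depend on plotrix, the same idea can be sketched in base graphics alone. This is only a rough illustration, not a replacement for the functions above: it uses image() for the coloured cells, text() for the values and axis(3) to put the column names on top. The heat.colors palette is an arbitrary choice.

```r
# Same example data as in the question.
testdf <- data.frame(Before = c(10, 7, 5, 9),
                     During = c(8, 6, 2, 5),
                     After  = c(5, 3, 4, 3))
rownames(testdf) <- c("Red", "Green", "Blue", "Lightblue")

mm <- as.matrix(testdf)
nr <- nrow(mm)
nc <- ncol(mm)

# image() draws z[i, j] at (x[i], y[j]), so transpose the matrix and
# reverse the rows to keep the first table row at the top.
z <- t(mm[nr:1, , drop = FALSE])

op <- par(mar = c(0.5, 8, 3.5, 0.5))
image(x = seq_len(nc), y = seq_len(nr), z = z,
      col = heat.colors(12), axes = FALSE, xlab = "", ylab = "")

# Write each value into its cell (row 1 sits at y = nr).
text(x = col(mm), y = nr + 1 - row(mm), labels = mm, cex = 2)

# Column names on top, row names on the left.
axis(3, at = seq_len(nc), labels = colnames(mm), tick = FALSE, cex.axis = 1.5)
axis(2, at = seq_len(nr), labels = rev(rownames(mm)), tick = FALSE,
     las = 1, cex.axis = 1.5)
par(op)
```

Like the plotrix versions, this colours cells by numeric value only; character columns would need their own colour mapping.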
You can hack something with grid and gtable:
palette(c(RColorBrewer::brewer.pal(8, "Pastel1"),
RColorBrewer::brewer.pal(8, "Pastel2")))
library(grid)    # textGrob, rectGrob, gpar, unit
library(gtable)
gtable_add_grobs <- gtable_add_grob # alias
d <- head(iris, 3)
nc <- ncol(d)
nr <- nrow(d)
extended_matrix <- cbind(c("", rownames(d)), rbind(colnames(d), as.matrix(d)))
## text for each cell
all_grobs <- matrix(lapply(extended_matrix, textGrob), ncol=ncol(d) + 1)
## define the fill background of cells
fill <- lapply(seq_len(nc*nr), function(ii)
rectGrob(gp=gpar(fill=ii)))
## some calculations of cell sizes
row_heights <- function(m){
do.call(unit.c, apply(m, 1, function(l)
max(do.call(unit.c, lapply(l, grobHeight)))))
}
col_widths <- function(m){
do.call(unit.c, apply(m, 2, function(l)
max(do.call(unit.c, lapply(l, grobWidth)))))
}
## place labels in a gtable
g <- gtable_matrix("table", grobs=all_grobs,
widths=col_widths(all_grobs) + unit(4,"mm"),
heights=row_heights(all_grobs) + unit(4,"mm"))
## add the background
g <- gtable_add_grobs(g, fill, t=rep(seq(2, nr+1), each=nc),
l=rep(seq(2, nc+1), nr), z=0,name="fill")
## draw
grid.newpage()
grid.draw(g)
Sort of a hacky solution based on ggplot2. I don't totally understand how you actually want to map your colors, since in your example the colors in the table are not mapped to the rownames of testdf, but here I've mapped the colors to the value (converted to a factor).
library(ggplot2)
library(reshape2)  # provides melt()
testdf$color <- rownames(testdf)
dfm <- melt(testdf, id.vars="color")
p <- ggplot(dfm, aes(x=variable, y=color, label=value, fill=as.factor(value))) +
geom_text(colour="black") +
geom_tile(alpha=0.2)
p
You can change which variable the colours are mapped to using fill=, and you can change the mapping itself using scale_fill_manual(values = [a vector of values]).
That said, I'd be curious to see a solution that produces an actual table, rather than a plot masquerading as a table. Possibly using Sweave and LaTeX tables?
I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
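For readers unfamiliar with the colour syntax used above, here is a small sketch of how the semi-transparent colours are built. The fourth argument of rgb() is the alpha (opacity) value, and adjustcolor() from grDevices is one way to add transparency to a named colour. The variable names and the ylim value are illustrative choices, not part of the original answer.

```r
# Fourth argument of rgb() is alpha: 0 = fully transparent, 1 = opaque.
blue25 <- rgb(0, 0, 1, alpha = 1/4)

# adjustcolor() adds transparency to a named colour.
red50 <- adjustcolor("red", alpha.f = 0.5)

# Both are ordinary hex strings with an extra alpha byte at the end.
print(c(blue25, red50))

set.seed(42)
a <- rnorm(500, 4)
b <- rnorm(500, 6)

# freq = FALSE plots densities, so groups of different size stay comparable.
hist(a, col = blue25, xlim = c(0, 10), ylim = c(0, 0.6), freq = FALSE,
     main = "Two overlapping histograms", xlab = "x")
hist(b, col = red50, freq = FALSE, add = TRUE)
```

Whether the transparency is actually rendered depends on the graphics device, as noted above.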
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
One additional thing: I commented on Dirk's answer that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in seq_along(overlap$counts)){
if(ahist$counts[i] > 0 && bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
There are already beautiful answers here, but I thought I would add this one. It looks good to me.
(Random numbers copied from @Dirk.) The scales package is needed.
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel the result from hist0 looks prettier than the one from hist.
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
library(magrittr)  # provides %>%
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
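One way to align the breaks without tuning them by hand is to compute a single set of breaks from the combined range and pass it to both hist() calls. The sketch below assumes the same simulated data as above; n = 30 in pretty() is an arbitrary choice for the approximate number of bins.

```r
set.seed(1)
carrotLengths   <- rnorm(1000, 15, 5)
cucumberLengths <- rnorm(200, 20, 7)

# One common set of round break points covering both samples.
breaks <- pretty(range(carrotLengths, cucumberLengths), n = 30)

# Compute both histograms on the shared breaks without plotting.
histCarrot   <- hist(carrotLengths,   breaks = breaks, plot = FALSE)
histCucumber <- hist(cucumberLengths, breaks = breaks, plot = FALSE)

ylim <- c(0, max(histCarrot$density, histCucumber$density))

# freq = FALSE gives relative frequencies (densities) for both groups.
plot(histCarrot, freq = FALSE, xlim = range(breaks), ylim = ylim,
     col = rgb(1, 0, 0, 0.4), xlab = "Lengths",
     main = "Distribution of carrots and cucumbers")
plot(histCucumber, freq = FALSE, col = rgb(0, 0, 1, 0.4), add = TRUE)
legend("topleft", c("Carrots", "Cucumbers"),
       fill = c(rgb(1, 0, 0, 0.4), rgb(0, 0, 1, 0.4)), bty = "n")
```

Because both histograms share the same bins, the bars line up exactly and the overlap region is easier to read.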
Here's a base R version of the ggplot2 one I gave. I copied some of it from @nullglob.
## generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
@Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers, but since I've just written a function to do this (plotMultipleHistograms() in the 'basicPlotteR' package), I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the basicPlotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).