Align barplot with boxplot in R

I would like to plot a distribution of counts using the barplot function in R, and underlay it with a boxplot to include information on median, quartiles, and outliers. A not-too-elegant solution for this has been found for histogram and boxplots: http://rgraphgallery.blogspot.com/2013/04/rg-plotting-boxplot-and-histogram.html.
There are many places online where one can find the argument being made that numerical data should be plotted with histograms while categorical data should be plotted with bar plots. My data are numerical, and in fact on a ratio scale (as they are counts), but because they are discrete, I want columns with gaps, not columns that touch, which seems to be the only option for histogram().
I currently have the following, but bar- and boxplot do not align quite perfectly:
set.seed(476372)
counts1 <- rpois(10000,3)
nf <- layout(mat = matrix(c(1,2),2,1, byrow=TRUE), height = c(3,1))
par(mar=c(3.1, 3.1, 1.1, 2.1))
barplot(prop.table(table(counts1)))
boxplot(counts1, horizontal=TRUE, outline=TRUE,ylim=c(0,12), frame=F, width = 10)
Here is my question: how can I make them align?

Another option that's similar but a little more work. This preserves the option for gaps between the bars:
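# Tabulate over every integer in the range so each bar has a matching height,
# even if some count never occurs
vals <- seq(min(counts1), max(counts1))
tbl <- prop.table(table(factor(counts1, levels = vals)))
left <- vals - 0.4
right <- vals + 0.4
bottom <- rep(0, length(left))
top <- as.numeric(tbl)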
xlim <- c(-0.5, 0.5) + range(counts1)
nf <- layout(mat = matrix(c(1,2),2,1, byrow=TRUE), height = c(3,1))
par(mar=c(3.1, 3.1, 1.1, 2.1))
plot(NA, xlim=xlim, ylim=c(0, max(tbl)))
rect(left, bottom, right, top, col='gray')
boxplot(counts1, horizontal=TRUE, outline=TRUE, ylim=xlim, frame=F, width = 10)

Maybe use a "fake" histogram instead:
ht=hist(counts1,breaks=12,plot = F)
ht$counts=as.numeric(table(counts1))
ht$density=as.numeric(prop.table(table(counts1)))
ht$breaks=as.numeric(names(table(counts1)))
ht$mids=sapply(1:(length(ht$breaks)-1),function(z)mean(ht$breaks[z:(z+1)]))
plot(ht,freq=F,col=3,main="")
boxplot(counts1, horizontal=TRUE,outline=TRUE,ylim=range(ht$breaks), frame=F, col="green1", width = 10)
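If you would rather keep barplot() itself, one possible sketch is to map the counts into the bar coordinates that barplot() returns and draw the boxplot on that scale. It assumes the default bar geometry (width = 1, space = 0.2); the names vals, step and to_bar_x are illustrative and do not come from the answers above.

```r
set.seed(476372)
counts1 <- rpois(10000, 3)

# Tabulate over every integer in the range so the bars stay evenly spaced
vals <- min(counts1):max(counts1)
props <- as.numeric(prop.table(table(factor(counts1, levels = vals))))

layout(matrix(c(1, 2), 2, 1), heights = c(3, 1))
par(mar = c(3.1, 3.1, 1.1, 2.1))

# barplot() returns the bar midpoints in its own x coordinates
mids <- barplot(props, names.arg = vals, width = 1, space = 0.2)
step <- 1 * (1 + 0.2)                     # distance between neighbouring midpoints
edges <- c(mids[1] - 0.5, mids[length(mids)] + 0.5)

# Assumed linear map from a count value to the barplot x coordinate
to_bar_x <- function(v) mids[1] + (v - vals[1]) * step

# Same x limits and same left/right margins as the barplot, so the panels line up
boxplot(to_bar_x(counts1), horizontal = TRUE, outline = TRUE,
        ylim = edges, axes = FALSE)
```

Because the boxplot is drawn on the transformed scale, its own axis is suppressed and the labels under the bars serve both panels.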

Related

How do I add multiple subplots into a multirow figure in R?

I need to overlay multiple subplots onto a single plot that is already contained inside a multirow figure (see image).
The reason I need subplots instead of a screen layout is that the figure may also be multicolumn (a 5 by 3 plot in total).
There are packages that assist with subplots, but they break when used in multirow figures: every subplot after the first is rendered relative to the overall figure border, not relative to the borders of the current row/column plot.
I understand that large packages such as ggplot2 allow this relatively easily, but base R plots are highly preferable.
UPD:
The minimal reproducible example depicting the problem is here:
require(Hmisc)
COL.1 <- c('red','orange','yellow'); COL.2 <- c('blue','green','turquoise')
SUBPLOT.FUN <- function(COL) {plot(rnorm(100), type='l', col=COL)}
PLOT.FUN <- function(i) {
plot(rnorm(100),ylim=c(-1,1))
subplot(SUBPLOT.FUN(COL.1[i]), 100,1, vadj=1,hadj=1,pars=list(mfg=c(1,i)))
subplot(SUBPLOT.FUN(COL.2[i]), 100,-1,vadj=0,hadj=1,pars=list(mfg=c(1,i)))
}
plot.new(); par(mfrow=c(1,3))
for (i in 1:3) {
PLOT.FUN(i)
}
which looks like this:
What is required is shown in the first image: EACH of the three plots must contain 3 subplots in their respective locations (along the right border, arranged vertically).
N.B. Whether the figure is multirow or multicolumn (as depicted) does not matter.
Something like this? Inspired by this R-bloggers post.
# reproducible test data
set.seed(2022)
x <- rnorm(1000)
y <- rbinom(1000, 1, 0.5)
z <- rbinom(1000, 4, 0.5)
# save default values and prepare
# to create custom plot areas
old_par <- par(fig = c(0,1,0,1))
# set x axis limits based on data
h <- hist(x, plot = FALSE)
xlim <- c(h$breaks[1] - 0.5, h$breaks[length(h$breaks)] + 2)
hist(x, xlim = xlim)
# x = c(0.6, 1) right part of plot
# y = c(0.5, 1) top part of plot
par(fig = c(0.6, 1, 0.5, 1), new = TRUE)
boxplot(x ~ y)
# x = c(0.6, 1) right part of plot
# y = c(0.1, 0.6) bottom part of plot
par(fig = c(0.6, 1, 0.1, 0.6), new = TRUE)
boxplot(x ~ z)
# put default values back
par(old_par)
Created on 2022-08-18 by the reprex package (v2.0.1)
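The same par(fig = ...) idea can probably be extended to a grid of panels by computing every figure region by hand instead of relying on par(mfrow = ...). The sketch below is only an illustration under that assumption: panel_fig and inset_fig are hypothetical helpers, not part of base R or Hmisc, and the inset fractions and margins are arbitrary choices that may need adjusting for small devices.

```r
# Figure region (x1, x2, y1, y2 in NDC) of cell (row, col) in an nr x nc grid;
# row 1 is the top row, as with par(mfrow = ...)
panel_fig <- function(row, col, nr, nc) {
  c((col - 1) / nc, col / nc, (nr - row) / nr, (nr - row + 1) / nr)
}

# Sub-region of `fig`; `frac` = c(x1, x2, y1, y2) as fractions of that region
inset_fig <- function(fig, frac) {
  w <- fig[2] - fig[1]
  h <- fig[4] - fig[3]
  c(fig[1] + frac[1:2] * w, fig[3] + frac[3:4] * h)
}

set.seed(2022)
col1 <- c("red", "orange", "yellow")
col2 <- c("blue", "green", "turquoise")
old_par <- par(no.readonly = TRUE)

for (i in 1:3) {
  fig <- panel_fig(1, i, nr = 1, nc = 3)

  # Main plot of panel i; new = TRUE keeps the panels already drawn
  par(fig = fig, mar = c(3, 3, 1, 1), new = i > 1)
  plot(rnorm(100), ylim = c(-1, 1))

  # Two insets stacked along the right border of the same panel
  par(fig = inset_fig(fig, c(0.55, 0.98, 0.55, 0.95)),
      mar = c(1.5, 1.5, 0.5, 0.5), new = TRUE)
  plot(rnorm(100), type = "l", col = col1[i])
  par(fig = inset_fig(fig, c(0.55, 0.98, 0.12, 0.52)),
      mar = c(1.5, 1.5, 0.5, 0.5), new = TRUE)
  plot(rnorm(100), type = "l", col = col2[i])
}
par(old_par)
```

For a 5 by 3 figure the loop would run over rows and columns and call panel_fig(row, col, nr = 5, nc = 3); the insets stay relative to their own panel because every region is derived from that panel's fig.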

Left-aligning ggplot when saved while using a fixed aspect ratio

I'm building a custom ggplot theme to standardize the look & feel of graphs I produce. The goal is more complex than this minimal example, so I'm looking for a general solution. I have a few key goals:
I want all graphs to export at the same size (3000 pixels wide, 1500 pixels high).
I want to control the aspect ratio of the plot panel itself.
I want to use textGrobs to include figure numbers.
I want the image to be left-aligned.
The challenge I'm facing is that when combining a fixed output size with a fixed panel aspect ratio, the saved image centers the ggplot graph within the window, which makes sense as a default, but looks bad in this case.
I'm hoping there's a general solution to left-align the ggplot panel when I export. Ideally, this will also work similarly for faceted graphs.
It seems that something should be possible using one of or some combination of the gridExtra, gtable, cowplot, and egg packages, but after experimenting for a few hours I'm at a bit of a loss. Does anybody know how I can accomplish this? My code is included below.
This is the image that gets produced. As you can see, the caption is left-aligned at the bottom, but the ggplot itself is horizontally centered. I want the ggplot graph left-aligned as well.
Graph output: https://i.stack.imgur.com/5EM2c.png
library(ggplot2)
# Generate dummy data
x <- paste0("var", seq(1,10))
y <- LETTERS[1:10]
data <- expand.grid(X=x, Y=y)
data$Z <- runif(100, -2, 2)
# Generate heatmap with fixed aspect ratio
p1 <- ggplot(data, aes(X, Y, fill= Z)) +
geom_tile() +
labs(title = 'A Heatmap Graph') +
theme(aspect.ratio = 1)
# A text grob for the footer
figure_number_grob <- grid::textGrob('Figure 10',
x = 0.004,
hjust = 0,
gp = grid::gpar(fontsize = 10,
col = '#01A184'))
plot_grid <- ggpubr::ggarrange(p1,
figure_number_grob,
ncol = 1,
nrow = 2,
heights = c(1,
0.05))
# save it
png(filename = '~/test.png', width = 3000, height = 1500, res = 300, type = 'cairo')
print(plot_grid)
dev.off()
I was able to find a solution to this that works for my needs, though it does feel a bit hacky.
Here's the core idea:
Generate the plot without a fixed aspect ratio.
Split the legend from the plot as its own component.
Use gridExtra's arrangeGrob to combine the plot, a spacer, the legend, and another spacer horizontally.
Set the width of the plot to some fraction of npc (normalised parent coordinates), in this case 0.5. This means that the plot will take up 50% of the horizontal space of the output file.
Note that this is not exactly the same as setting a fixed aspect ratio for the plot. If you know the size of the output file, it's close to the same thing, but the size of axis text & axis titles will affect the output aspect ratio for the panel itself, so while it gets you close, it's not ideal if you need a truly fixed aspect ratio.
Set the width of the spacers to the remaining portion of the npc (in this case, 0.5 again), minus the width of the legend to horizontally center the legend in the remaining space.
Here's my code:
library(ggplot2)
# Generate dummy data
x <- paste0("var", seq(1,10))
y <- LETTERS[1:10]
data <- expand.grid(X=x, Y=y)
data$Z <- runif(100, -2, 2)
# Generate heatmap WITHOUT fixed aspect ratio. I address this below
p1 <- ggplot(data, aes(X, Y, fill= Z)) +
geom_tile() +
labs(title = 'A Heatmap Graph')
# Extract the legend from our plot
legend = gtable::gtable_filter(ggplotGrob(p1), "guide-box")
plot_output <- gridExtra::arrangeGrob(
p1 + theme(legend.position="none"), # Remove legend from base plot
grid::rectGrob(gp=grid::gpar(col=NA)), # Add a spacer
legend, # Add the legend back
grid::rectGrob(gp=grid::gpar(col=NA)), # Add a spacer
nrow=1, # Format plots in 1 row
widths=grid::unit.c(unit(0.5, "npc"), # Plot takes up half of width
(unit(0.5, "npc") - legend$width) * 0.5, # Spacer width
legend$width, # Legend width
(unit(0.5, "npc") - legend$width) * 0.5)) # Spacer width
# A text grob for the footer
figure_number_grob <- grid::textGrob('Figure 10',
x = 0.004,
hjust = 0,
gp = grid::gpar(fontsize = 10,
col = '#01A184'))
plot_grid <- ggpubr::ggarrange(plot_output,
figure_number_grob,
ncol = 1,
nrow = 2,
heights = c(1,
0.05))
# save it
png(filename = '~/test.png', width = 3000, height = 1500, res = 300, type = 'cairo')
print(plot_grid)
dev.off()
And here's the output image: https://i.stack.imgur.com/rgzFy.png
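The width bookkeeping in the steps above can be checked with plain arithmetic. This is only a worked example: it assumes the arranged row spans the full width of the PNG, and the legend width of 0.8 inches is a made-up value (in the real code it comes from the legend gtable).

```r
# Output geometry from the png() call: 3000 px wide at 300 px per inch
total_in <- 3000 / 300

# Hypothetical legend width; the real value comes from the legend gtable
legend_in <- 0.8

plot_in   <- 0.5 * total_in                      # plot: unit(0.5, "npc")
spacer_in <- (0.5 * total_in - legend_in) * 0.5  # each of the two spacers

widths_in <- c(plot = plot_in, spacer = spacer_in,
               legend = legend_in, spacer = spacer_in)
print(widths_in)
```

The four widths add up to the full 10 inches, which is why the plot ends up flush left and the legend is centered in the remaining half.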

How to alter the position of a legend in a bar plot in R?

I generated a barplot in R, but the legend is covering almost all of the plot. How can I move it to a different position, for example beside/outside of the plot?
This is my code:
compare <- table(cats$color, cats$coat)
bar2 <- barplot(compare, legend = rownames(compare), main = 'Comparing coat design to color')
bar2
We can't give you the exact settings since you haven't provided an example of your data, but the general advice is as follows:
If you have a large legend, you can place it in the margins of the plot rather than over the plot. To do that, adjust mar so that there is room for it, and set xpd = TRUE. Here's an example with mtcars:
df <- head(mtcars, 14)
par(mar = c(5.1, 4.1, 4.1, 9.1), xpd = TRUE)
barplot(df$mpg)
legend("topright", inset = c(-0.5, -0.3), rownames(df))
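For a two-way table like the one in the question, the same idea might look as follows. The cats data frame here is a made-up stand-in for the asker's data, and the margin and inset values are guesses that will likely need tuning; legend.text and args.legend are arguments of barplot().

```r
# Hypothetical stand-in for the asker's `cats` data
cats <- data.frame(
  color = c("black", "white", "ginger", "black", "white", "ginger", "black"),
  coat  = c("plain", "plain", "tabby", "tabby", "tabby", "plain", "plain")
)
compare <- table(cats$color, cats$coat)

# Widen the right margin and allow drawing outside the plot region
old_par <- par(mar = c(5.1, 4.1, 4.1, 8.1), xpd = TRUE)
bar2 <- barplot(compare,
                main = "Comparing coat design to color",
                legend.text = rownames(compare),
                args.legend = list(x = "topright", inset = c(-0.35, 0)))
par(old_par)
```

A negative first element of inset pushes the legend to the right of the plot region; a larger right margin leaves more room for long labels.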

Plotting marginal histograms (as factors) and scatterplot (as numeric) from the same variable in R

I'm trying to create a scatterplot with marginal histograms as in this question.
My data are two (numeric) variables which share seven discrete (somewhat) logarithmically-spaced levels.
I've successfully done this with the help of ggMarginal in the ggExtra package, however I'm not happy with the outcome as when plotting the marginal histograms using the same data as for the scatterplots, things don't line up.
As can be seen below, the histogram bars are biased a little to the right or left of the datapoints themselves.
library(ggExtra) # provides ggMarginal()
library(ggplot2)
x <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,7,12,18,12,7,3))
y <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,1,13,28,13,1,3))
d <- data.frame("x" = x,"y" = y)
p1 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(), type = "histogram")
A possible solution may be to change the variables used in the histograms into factors, so they are nicely aligned with the scatterplot axes.
This works well when creating histograms using ggplot:
p2 <- ggplot(data.frame(lapply(d, as.factor)), aes(x = x)) + geom_histogram()
However, when I try to do this using ggMarginal, I do not get the desired result - it appears that the ggMarginal histogram is still treating my variables as numeric.
p3 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(),
x = as.factor(x), y = as.factor(y), type = "histogram")
How can I ensure my histogram bars are centred over the data points?
I'm absolutely willing to accept an answer which does not involve use of ggMarginal.
I'm not sure whether it is a good idea to repeat here the answer I gave to the question you mentioned, but I don't yet have enough reputation to comment; please let me know if it isn't.
I've found the package (ggpubr) that seems to work very well for this problem and it considers several possibilities to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, it mentions in relation with ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
library(ggpubr)
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
If you are willing to give base plotting a try, here is a function:
scatterWithHists <- function(x, y, histCols=c("lightblue","lightblue"), lhist=20, xlim=range(x), ylim=range(y), ...){
## set up layout and graphical parameters
layMat <- matrix(c(1,4,3,2), ncol=2)
layout(layMat, widths=c(5/7, 2/7), heights=c(2/7, 5/7))
ospc <- 0.5 # outer space
pext <- 4 # par extension down and to the left
bspc <- 1 # space between scatter plot and bar plots
par. <- par(mar=c(pext, pext, bspc, bspc), oma=rep(ospc, 4)) # plot parameters
## barplot and line for x (top)
xhist <- hist(x, breaks=seq(xlim[1], xlim[2], length.out=lhist), plot=FALSE)
par(mar=c(0, pext, 0, 0))
barplot(xhist$density, axes=FALSE, ylim=c(0, max(xhist$density)), space=0, col=histCols[1])
## barplot and line for y (right)
yhist <- hist(y, breaks=seq(ylim[1], ylim[2], length.out=lhist), plot=FALSE)
par(mar=c(pext, 0, 0, 0))
barplot(yhist$density, axes=FALSE, xlim=c(0, max(yhist$density)), space=0, col=histCols[2], horiz=TRUE)
## overlap
dx <- density(x)
dy <- density(y)
par(mar=c(0, 0, 0, 0))
plot(dx, col=histCols[1], xlim=range(c(dx$x, dy$x)), ylim=range(c(dx$y, dy$y)),
lwd=4, type="l", main="", xlab="", ylab="", yaxt="n", xaxt="n", bty="n"
)
points(dy, col=histCols[2], type="l", lwd=3)
## scatter plot
par(mar=c(pext, pext, 0, 0))
plot(x, y, xlim=xlim, ylim=ylim, ...)
}
Just do:
scatterWithHists(x,y, histCols=c("lightblue","orange"))
And you get:
If you absolutely want to use ggMarginal then look up xparams and yparams. The documentation says you can send additional arguments to the x-margin and y-margin plots using those. I was only successful in sending trivial things like color, but maybe sending something like xlim would help.
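Since the data only take a few discrete levels, another base R option is to skip binning altogether and draw one spike per level at its exact position, so the marginal bars are centred over the points by construction. This is a rough sketch, not a drop-in replacement for ggMarginal; level_counts is a hypothetical helper, and the layout proportions and line widths are arbitrary.

```r
x <- rep(log10(1:7), times = c(3, 7, 12, 18, 12, 7, 3))
y <- rep(log10(1:7), times = c(3, 1, 13, 28, 13, 1, 3))

# Count observations at each distinct level (exact positions, no binning)
level_counts <- function(v) {
  lev <- sort(unique(v))
  list(at = lev, n = tabulate(match(v, lev), nbins = length(lev)))
}
cx <- level_counts(x)
cy <- level_counts(y)
xlim <- range(x)
ylim <- range(y)

old_par <- par(no.readonly = TRUE)
# Top-left: x margin, bottom-left: scatter plot, bottom-right: y margin
layout(matrix(c(1, 3, 0, 2), ncol = 2), widths = c(4, 1), heights = c(1, 4))

# x margin: vertical spikes, same xlim and left/right margins as the scatter plot
par(mar = c(0, 4, 1, 0), lend = "butt")
plot(cx$at, cx$n, type = "h", lwd = 8, xlim = xlim, ylim = c(0, max(cx$n)),
     axes = FALSE, xlab = "", ylab = "")

# y margin: horizontal spikes, same ylim and bottom/top margins as the scatter plot
par(mar = c(4, 0, 0, 1))
plot(NA, xlim = c(0, max(cy$n)), ylim = ylim, axes = FALSE, xlab = "", ylab = "")
segments(0, cy$at, cy$n, cy$at, lwd = 8)

# Scatter plot
par(mar = c(4, 4, 0, 0))
plot(x, y, xlim = xlim, ylim = ylim, pch = 16)
par(old_par)
```

The alignment relies on the marginal panels sharing the axis limits and the adjacent margins of the scatter plot.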

How to plot the value of abline in R?

I used this code to make this plot:
plot(p, cv2,col=rgb(0,100,0,50,maxColorValue=255),pch=16,
panel.last=abline(h=67,v=1.89, lty=1,lwd=3))
My plot looks like this:
1.) How can I plot the value of the ablines in a simple plot?
2.) How can I scale my plot so that both lines appear in the middle?
To change the scale of the plot so that the lines are in the middle, change the axis limits, i.e.
x<-1:10
y<-1:10
plot(x,y)
abline(a=1,b=0,v=1)
changed to:
x<-1:10
y<-1:10
plot(x,y,xlim=c(-30,30))
abline(a=1,b=0,v=1)
By "value" I assume you mean where the line cuts the x-axis. You could add it with text(), i.e.:
text((0), min(y), "number", pos=2)
If you want the label on the x-axis, then try:
abline(a=1,b=0,v=1)
axis(1, at=1,labels=1)
To prevent overlap between labels, you could suppress the default axis and draw your own ticks without the zero, i.e.:
plot(x,y,xlim=c(-30,30),yaxt="n")
axis(2, at=c(1.77,5,10,15,20,25))
Or, before you plot, extend the margins and add the labels further from the axis:
par(mar = c(6.5, 6.5, 6.5, 6.5))
plot(x,y,xlim=c(-30,30))
abline(a=1,b=0,v=1)
axis(2, at=1.77,labels=1.77,mgp = c(10, 2, 0))
Similar in spirit to the answer proposed by @user1317221, here is my suggestion:
# generate some fake points
x <- rnorm(100)
y <- rnorm(100)
# positions of the lines
vert = 0.5
horiz = 1.3
To display the lines at the center of the plot, first compute the horizontal and vertical distances between the data points and the lines, then adjust the limits adequately.
# compute the limits, in order for the lines to be centered
# REM we add a small fraction (here 10%) to leave some empty space,
# available to plot the values inside the frame (useful for one of the solutions, see below)
xlim = vert + c(-1.1, 1.1) * max(abs(x-vert))
ylim = horiz + c(-1.1, 1.1) * max(abs(y-horiz))
# do the main plotting
plot(x, y, xlim=xlim, ylim=ylim)
abline(h=horiz, v=vert)
Now you could plot the 'values of the lines' either on the axes (the line parameter of mtext allows you to avoid possible overlaps):
mtext(c(vert, horiz), side=c(1,2))
or alternatively within the plotting frame:
text(x=vert, y=ylim[1], labels=vert, adj=c(1.1,1), col='blue')
text(x=xlim[1], y=horiz, labels=horiz, adj=c(0.9,-0.1), col='blue')
HTH
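Applied to the values from the question (h = 67, v = 1.89), the centering step could be wrapped in a small helper. This is just a sketch: p and cv2 are random stand-ins because the original data were not posted, and centered_lim is an illustrative name.

```r
# Limits that put `at` in the middle of the axis while covering all `values`
centered_lim <- function(values, at, pad = 1.1) {
  at + c(-pad, pad) * max(abs(values - at))
}

set.seed(42)
p   <- runif(200, 0, 3)      # hypothetical stand-in for the asker's p
cv2 <- runif(200, 20, 150)   # hypothetical stand-in for the asker's cv2
v_line <- 1.89
h_line <- 67

xlim <- centered_lim(p, v_line)
ylim <- centered_lim(cv2, h_line)

plot(p, cv2, xlim = xlim, ylim = ylim, pch = 16,
     col = rgb(0, 100, 0, 50, maxColorValue = 255))
abline(h = h_line, v = v_line, lwd = 3)

# Write each value in the margin, at the position of its line
mtext(c(v_line, h_line), side = c(1, 2), line = 2,
      at = c(v_line, h_line), col = "blue")
```

Increasing line moves the labels further from the axis if they collide with the regular tick labels.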

Resources