ggplot2 width of boxplot - r

I was trying to make 2 separate plots which I want to present side by side in my poster (I need to make them separate and cannot make use of facet_wrap). One of the plots has several boxplots, while the second plot only has one. How can I manipulate the width of the boxplots such that the second boxplot is the same dimension as the width of any one of the individual boxplots in plot 1, when I put the two plots side by side? A reproducible example:
library(ggplot2)
tvalues <- sample(1:10000,1200)
sex <- c(rep('M',600),rep('F',600))
region <- c('R1','R2','R3','R4','R5')
df1 <- data.frame(tvalues,sex,region)
tvalues2 <- sample(1:10000,200)
sex2 <- sample(c('M','F'),200,replace=TRUE)
region2 <- 'R6'
df2 <- data.frame(tvalues2,sex2,region2)
p1 <- ggplot(data=df1,aes(x=region,y=tvalues,color=sex)) +
  geom_boxplot(width=0.5)
p2 <- ggplot(data=df2,aes(x=region2,y=tvalues2,color=sex2)) +
  geom_boxplot(width=0.5)
Plot 1
Plot2

I suggest dividing the width of the boxes in the second plot by the number of region categories in the first plot.
p2 <- ggplot(data=df2,aes(x=region2,y=tvalues2,color=sex2)) +
geom_boxplot(width=0.5/length(unique(df1$region)))
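As a minimal sketch of the same idea, the scaling can be spelled out step by step. The names base_width, n_regions and scaled_width are made up for this illustration, and the match between the two plots is only approximate, since it assumes both plots are drawn at the same panel size.

```r
library(ggplot2)

set.seed(1)
# Same structure as the data in the question
df1 <- data.frame(tvalues = sample(1:10000, 1200),
                  sex     = rep(c("M", "F"), each = 600),
                  region  = c("R1", "R2", "R3", "R4", "R5"))
df2 <- data.frame(tvalues2 = sample(1:10000, 200),
                  sex2     = sample(c("M", "F"), 200, replace = TRUE),
                  region2  = "R6")

base_width   <- 0.5                         # width used in plot 1
n_regions    <- length(unique(df1$region))  # categories on the x-axis of plot 1
scaled_width <- base_width / n_regions      # width to use in plot 2

p2 <- ggplot(df2, aes(x = region2, y = tvalues2, color = sex2)) +
  geom_boxplot(width = scaled_width)
```

With five regions in plot 1, the second plot ends up with a width of 0.5 / 5.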

In case of a single boxplot like in the following example:
a<- data.frame(obs=rep("A", 50),
value=rnorm(50, 100, 50))
ggplot(a, aes(y=value))+
geom_boxplot()
Wide boxplot
We can add a dummy x aesthetic and set axis limits, so that the width argument of geom_boxplot() determines the width of the box relative to the axis range:
ggplot(a, aes(y=value, x=0))+
geom_boxplot(width=0.7) +
xlim(-1,1)
Thinner boxplot
You can add the following to remove the x-axis title, text and ticks:
theme(axis.title.x = element_blank(),
      axis.text.x = element_blank(),
      axis.ticks.x = element_blank())
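Putting the pieces together, a complete version might look like the sketch below. The seed and the use of layer_data() to inspect the drawn box are additions for illustration only; they are not part of the original answer.

```r
library(ggplot2)

set.seed(42)
a <- data.frame(obs = rep("A", 50),
                value = rnorm(50, 100, 50))

p <- ggplot(a, aes(x = 0, y = value)) +  # dummy x position for the single box
  geom_boxplot(width = 0.7) +            # box width in x-axis units
  xlim(-1, 1) +                          # the axis spans 2 units in total
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_blank(),
        axis.ticks.x = element_blank())

# Inspect the computed box: one row, with xmin/xmax giving its horizontal extent
box <- layer_data(p)
box_width <- box$xmax - box$xmin
```

Widening the xlim() range while keeping width fixed makes the box look narrower, because the box then covers a smaller share of the axis.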

Related

How do I add a separate legend for each variable in geom_tile?

I would like to have a separate scale bar for each variable.
I have measurements taken throughout the water column for which the means have been calculated into 50cm bins. I would like to use geom_tile to show the variation of each variable in each bin throughout the water column, so the plot has the variable (categorical) on the x-axis, the depth on the y-axis and a different colour scale for each variable representing the value. I am able to do this for one variable using
ggplot(data, aes(x=var, y=depth, fill=value, color=value)) +
geom_tile(size=0.6)+ theme_classic()+scale_y_continuous(limits = c(0,11), expand = c(0, 0))
But if I put all variables onto one plot, the legend is scaled to the min and max of all values so the variation between bins is lost.
To provide a reproducible example, I have used the mtcars dataset and mapped alpha to the value, which, of course, doesn't help much because the scale of each variable is so different.
data("mtcars")
# STACKS DATA
library(reshape2)
dat2b <- melt(mtcars, id.vars=1:2)
dat2b
ggplot(dat2b) +
geom_tile(aes(x=variable , y=cyl, fill=variable, alpha = value))
Which produces
Is there a way I can add a scale bar for each variable on the plot?
This question is similar to others (e.g. here and here), but they do not use a categorical variable on the x-axis, so I have not been able to modify them to produce the desired plot.
Here is a mock-up of the plot I have in mind using just four of the variables, except I would have all legends horizontal at the bottom of the plot using theme(legend.position="bottom")
Hope this helps:
The function myfun was originally posted by Duck here: R ggplot heatmap with multiple rows having separate legends on the same graph
library(purrr)
library(ggplot2)
library(patchwork)
data("mtcars")
# STACKS DATA
library(reshape2)
dat2b <- melt(mtcars, id.vars=1:2)
dat2b
#Split into list
List <- split(dat2b,dat2b$variable)
#Function for plots
myfun <- function(x)
{
G <- ggplot(x, aes(x=variable, y=cyl, fill = value)) +
geom_tile() +
theme(legend.direction = "vertical", legend.position="bottom")
return(G)
}
#Apply
List2 <- lapply(List,myfun)
#Plot
reduce(List2, `+`)+plot_annotation(title = 'My plot')
patchwork::wrap_plots(List2)
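Since the question asks for horizontal legends at the bottom, one possible variation is sketched below. It assumes the same packages as above; the names plots and combined are made up here, and with nine variables in a single row the legends will likely be cramped unless the figure is wide.

```r
library(ggplot2)
library(patchwork)
library(reshape2)

dat <- melt(mtcars, id.vars = 1:2)

# One plot per variable, so each gets its own fill scale and legend
plots <- lapply(split(dat, dat$variable), function(d) {
  ggplot(d, aes(x = variable, y = cyl, fill = value)) +
    geom_tile() +
    theme(legend.position  = "bottom",
          legend.direction = "horizontal")
})

# Arrange all panels in a single row
combined <- wrap_plots(plots, nrow = 1)
```

Printing combined draws the panels side by side, each with its own colour bar underneath.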

How to specify the legend box size in ggplot/ggplot2

In R/ggplot2, I have multiple plots, each of which has a legend box.
I want the legend box to be the same width for each plot, but ggplot2 tries to dynamically size the legend box based on the legend name, key values, etc. (which are unique to each plot).
The various plots must fit into a specified publication slot, with a specified width for the legend, and the plots must be made separately (so faceting to guarantee identical legend widths across the plots isn't possible).
Looking at theme I couldn't find an option to specify the legend box width ... any ideas?
To specify the legend box size you could use + theme(legend.key.size = unit(2, "cm")).
library(tidyverse)
tb <- tibble(a = 1:10, b = 10:1, c = rep(1:2, 5))
ggplot(tb, aes(a, b, colour = c)) +
geom_point() +
theme(legend.key.size = unit(0.2, "cm"))
More details and additional modifications are here and under the keywidth argument here.
@Z.lin had the right approach in the comments. Based on https://wilkelab.org/cowplot/articles/shared_legends.html this might look something like:
library(ggplot2)
library(cowplot)
Make a ggplot object
my_plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species))+
geom_point()
Extract the legend
my_legend <- get_legend(
# create some space to the left of the legend
my_plot + theme(legend.box.margin = margin(0, 0, 0, 12))
)
Re-plot your plot in a grid without the legend (can combine multiple plots here if desired)
my_plot_nl <- plot_grid(
my_plot + theme(legend.position="none"),
align = 'vh',
hjust = -1,
nrow = 1
)
Recombine your legend-free plot and legend and specify the relative width of each. The plot now takes up 3/4 of the plot width and the legend 1/4.
plot_grid(my_plot_nl, my_legend, rel_widths = c(3,1))
If you do this for each plot, making sure to use the same rel_widths and saving the figures using the same dimensions, the plot area and legend should be consistent across them.
You might try changing your theme call as follows:
theme(legend.margin = margin(r=10,l=5,t=5,b=5))

X-axis labels on top out of the plot area

I'm plotting a correlation heatmap with x-axis on top by using switch_axis_position.
The x-axis labels are somewhat long, so I want it to be rotated by using angle=90 and align them by using hjust=0.
But this makes the labels too far from the x-axis and even gets them out of the plot area.
library(gtable)
library(cowplot)
library(grid)
heatmap<-ggplot(data=meltedh, aes(x=variable, y=X, fill=value))+
geom_tile(color="White")+
ylab("")+xlab("")+
scale_fill_gradient2(low="blue3", high="red3", mid="white",
midpoint=0,limit=c(-1,1), space="Lab", breaks=c(-0.5,0,0.5),
name="Correlation Coefficient")+
theme(legend.position="bottom",
axis.text.x=element_text(angle=90, hjust=0))
heatmap
ggdraw(switch_axis_position(heatmap,axis='x'))
How can I make this pretty? Any help would be great. Thanks.
Lucky for you I rather enjoy making up data.
So this might be what you want. I did the following things:
Played with hjust to get it close to looking okay
Padded the names with spaces to make them all the same length
Changed the font family to "mono", so the axis text would be aligned
library(gtable)
library(cowplot)
library(grid)
set.seed(1234)
cn <- c("Eastside","Pygrate","Tapeworm","Annerose","Bund",
"Mountain","Appalacia","Summer","Treasure","Riveria",
"Persia","Raggout","Bengal","Siam","Norman")
# Pad out the names with spaces to all be the same length
mxl <- max(nchar(cn))
fmt <- sprintf("%%-%ds",mxl) # the minus adds spaces to the string end
cn <- sprintf(fmt,cn)
rn <- rev(letters[1:16])
ddf <- expand.grid( x=rn, y=cn )
n <- nrow(ddf)
ddf$v <- runif(n,-1,-0.1)
nr <- n/length(cn)
ddf[ddf$y==cn[3],]$v <- runif(nr,0.1,0.8)
ddf[ddf$y==cn[8],]$v <- runif(nr,0.1,0.8)
ddf[ddf$y==cn[13],]$v <- runif(nr,0.1,0.8)
ddf[ddf$x %in% c("i","j","n","o"),]$v <- 0
meltedh <- data.frame(X=ddf$x,variable=ddf$y,value=ddf$v)
heatmap<-ggplot(data=meltedh, aes(x=variable, y=X, fill=value))+
geom_tile(color="White")+
ylab("")+xlab("")+
scale_fill_gradient2(low="blue3", high="red3", mid="white",
midpoint=0,limit=c(-1,1), space="Lab", breaks=c(-0.5,0,0.5),
name="Correlation Coefficient")+
theme(legend.position="bottom",
axis.text.x=element_text(angle=90, hjust=0.5,family="mono"))
heatmap
ggdraw(switch_axis_position(heatmap,axis='x'))
It yields this:
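The padding step is the least obvious part, so here it is in isolation, as a small sketch using three of the names from above. A format string such as "%-9s" left-justifies its argument and pads it with trailing spaces to 9 characters.

```r
cn <- c("Siam", "Appalacia", "Bund")

mxl <- max(nchar(cn))           # length of the longest name
fmt <- sprintf("%%-%ds", mxl)   # builds the format string "%-9s"
padded <- sprintf(fmt, cn)      # every name now has the same number of characters

nchar(padded)
```

Combined with a monospaced font, equal character counts give equal label widths, which is why the rotated labels line up.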

Arrange common plot width with facetted ggplot 2.0.0 & gridExtra

Since I updated to ggplot2 2.0.0, I cannot arrange charts properly using gridExtra. The issue is that the faceted charts get compressed while the others expand. The widths are basically messed up. I want to arrange them similarly to the way these single-facet plots are arranged: left align two graph edges (ggplot)
Here is a reproducible example:
library(grid) # for unit.pmax()
library(gridExtra)
plot.iris <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point() +
facet_grid(. ~ Species) +
stat_smooth(method = "lm")
plot.mpg <- ggplot(mpg, aes(x = cty, y = hwy, colour = factor(cyl))) +
geom_point(size=2.5)
g.iris <- ggplotGrob(plot.iris) # convert to gtable
g.mpg <- ggplotGrob(plot.mpg) # convert to gtable
iris.widths <- g.iris$widths # extract all column widths,
mpg.widths <- g.mpg$widths # same for mpg plot
max.widths <- unit.pmax(iris.widths, mpg.widths)
g.iris$widths <- max.widths # assign max. widths to iris gtable
g.mpg$widths <- max.widths # assign max widths to mpg gtable
grid.arrange(g.iris,g.mpg,ncol=1)
As you will see, in the top chart the first facet is expanded while the other two are compressed on the right. The bottom chart does not cover the full width.
Could it be that the new ggplot2 version is messing with the gtable widths?
Anyone know a workaround?
Thank you very much
EDIT: Added picture of chart
I'm looking for something like:
One option is to massage each plot into a 3x3 gtable, where the central cell wraps all the plot panels.
Using the example from @SandyMuspratt:
# devtools::install_github("baptiste/egg")
grid.draw(egg::ggarrange(plots=plots, ncol=1))
The advantage is that, once in this standardised format, plots may be combined in various layouts much more easily, regardless of the number of panels, legends, axes, strips, etc.
grid.newpage()
grid.draw(egg::ggarrange(plots=list(p1, p4, p2, p3), widths = c(2,1), debug=TRUE))
I'm not sure if you're still looking for a solution, but this is fairly general. I'm using ggplot2 2.1.0 (now on CRAN). It's based on this solution. I break the problem into two parts. First, I deal with the left side of the plots, making sure the widths of the axis material are the same. This has already been done by others, and there are solutions on SO, but I don't think the result looks good: I would prefer the panels to align on the right side as well. So second, the procedure makes sure the widths of the columns to the right of the panels are the same, by adding a column of appropriate width to the right of each plot. (There are possibly neater ways to do it. There is one: see @baptiste's solution.)
library(grid) # for unit.pmax()
library(gridExtra) # to arrange the plots
library(ggplot2) # to construct the plots
library(gtable) # to add columns to gtables of plots without legends
mpg$g = "Strip text"
# Four fairly irregular plots: legends, faceting, strips
p1 <- ggplot(mpg, aes(displ, 1000*cty)) +
geom_point() +
facet_grid(. ~ drv) +
stat_smooth(method = "lm")
p2 <- ggplot(mpg, aes(x = hwy, y = cyl, colour = factor(cyl))) +
geom_point() +
theme(legend.position=c(.8,.6),
legend.key.size = unit(.3, "cm"))
p3 <- ggplot(mpg, aes(displ, cty, colour = factor(drv))) +
geom_point() +
facet_grid(. ~ drv)
p4 <- ggplot(mpg, aes(displ, cty, colour = factor(drv))) +
geom_point() +
facet_grid(g ~ .)
# Sometimes easier to work with lists, and it generalises nicely
plots = list(p1, p2, p3, p4)
# Convert to gtables
g = lapply(plots, ggplotGrob)
# Apply the un-exported unit.list function for grid package to each plot
g.widths = lapply(g, function(x) grid:::unit.list(x$widths))
## Part 1: Make sure the widths of left axis materials are the same across the plots
# Get first three widths from each plot
g3.widths <- lapply(g.widths, function(x) x[1:3])
# Get maximum widths for first three widths across the plots
g3max.widths <- do.call(unit.pmax, g3.widths)
# Apply the maximum widths to each plot
for(i in 1:length(plots)) g[[i]]$widths[1:3] = g3max.widths
# Draw it
do.call(grid.arrange, c(g, ncol = 1))
## Part 2: Get the right side of the panels aligned
# Locate the panels
panels <- lapply(g, function(x) x$layout[grepl("panel", x$layout$name), ])
# Get the position of right most panel
r.panel = lapply(panels, function(x) max(x$r)) # position of right most panel
# Get the number of columns to the right of the panels
n.cols = lapply(g.widths, function(x) length(x)) # right most column
# Get the widths of these columns to the right of the panels
r.widths <- mapply(function(x,y,z) x[(y+1):z], g.widths, r.panel, n.cols)
# Get the sum of these widths
sum.r.widths <- lapply(r.widths, sum)
# Get the maximum of these widths
r.width = do.call(unit.pmax, sum.r.widths)
# Add a column to the right of each gtable of width
# equal to the difference between the maximum
# and the width of each gtable's columns to the right of the panel.
for(i in 1:length(plots)) g[[i]] = gtable_add_cols(g[[i]], r.width - sum.r.widths[[i]], -1)
# Draw it
do.call(grid.arrange, c(g, ncol = 1))
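Both parts rely on unit.pmax() from grid, which takes the element-wise maximum across unit vectors. A minimal sketch with made-up widths (w1 and w2 are hypothetical values, not taken from the plots above):

```r
library(grid)

# Hypothetical widths of the first three columns of two plots
w1 <- unit(c(0.2, 0.5, 1.0), "cm")
w2 <- unit(c(0.2, 0.8, 0.6), "cm")

# Element-wise maximum: for each column, the wider of the two plots
w_max <- unit.pmax(w1, w2)

length(w_max)   # one entry per column
```

Assigning w_max to the matching columns of each gtable is what makes the left edges of the panels line up.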
After removing these two lines and keeping the rest, it worked just fine.
g.iris$widths <- max.widths # assign max. widths to iris gtable
g.mpg$widths <- max.widths # assign max widths to mpg gtable
Those assignments were probably constraining the widths.
This is ugly, but if you're under time pressure this hack will work (it is not generalizable and depends on the plot window size). Basically, make the top plot two columns with a blank plot on the right and guess at the widths.
grid.arrange(
grid.arrange(plot.iris, ggplot() + theme_minimal(),ncol=2, widths = c(.9, .1)),
plot.mpg,
ncol=1
)

Line up columns of bar graph with points of line plot with ggplot

Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: To align the plots, use gtable (or see the answers to your earlier question)
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit: To change the heights of the plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot the geoms cover the x-axis from about 0.5 to 27.5, while in the other plot the data only range from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect: the y-axis labels take up a little more space in the upper plot because its numbers have two digits. The plot looks as follows:
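To check the claim that the bars cover a wider x-range than the points, the computed layer data can be inspected with layer_data(). This is only a sketch; the names bars, pts, bar_range and point_range are made up for the illustration.

```r
library(ggplot2)

data <- data.frame(x = rep(1:27, each = 5), y = rep(1:5, times = 27))
other_data <- data.frame(x = 1:27, y = 50:76)

# Data as ggplot2 computes it for drawing each layer
bars <- layer_data(ggplot(other_data, aes(x = x, y = y)) +
                     geom_bar(stat = "identity"))
pts  <- layer_data(ggplot(data, aes(x = x, y = y)) +
                     geom_point())

bar_range   <- c(min(bars$xmin), max(bars$xmax))  # extends beyond 1 and 27
point_range <- range(pts$x)                       # exactly 1 to 27
```

The bar range extends past both ends of the point range by roughly half a bar width, which is the offset visible in the stacked plots.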
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows:
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())
