ggplot grid with constant x-axis scale but varying axis limits - r

Take a look at the following plotting code:
library(ggplot2)
library(cowplot)
a <- data.frame(a1=1:10, a2=1:10)
b <- data.frame(b1=1:5, b2=2*(1:5))
aplot <- ggplot(a, aes(x=a1, ymin=0, ymax=12)) +
geom_line(aes(y=a2))
bplot <- ggplot(b, aes(x=b1, ymin=0, ymax=12)) +
geom_line(aes(y=b2))
plot_grid(aplot,bplot, ncol=2)
It yields two side-by-side plots of identical dimensions showing similar lines. But the x-axis scales are rather different. In fact, the second line has twice the slope of the first.
I am looking for a way to plot this figure so that the width of a plot is scaled by the limits of its x-axis, so that the slopes can be compared visually. The real plots I am interested in visualizing are five in number and will lack y-axis labels except for the leftmost. I can use grid.arrange() to plot them all in a row with whatever widths I want, but the problem is that I don't know what width to assign to each panel to make sure they come out right (the panel width has to be large enough to accommodate the plot margins, the y-axis tick marks, and the y-axis text). I can set the margins myself and account for them in my panel widths, but I cannot find a good way to figure out how wide (e.g. in cm) the y-axis text is.

You can use the rel_widths option in plot_grid to achieve this. Calculate the relative width you want for each plot from the ratio of the x-axis ranges (xmax - xmin) of the panels. There is one catch: rel_widths sets the relative width of the whole plot, including the axis text and margins, not just the panel area. So we also need to account for that extra space when calculating the relative size. In the following code, adding an offset of 2 to both the numerator and the denominator of relative.size works for this. But note that this offset value may change if you alter the size of the margins.
aplot <- ggplot(a, aes(x=a1, ymin=0, ymax=12, xmin=0, xmax=max(a$a1))) +
geom_line(aes(y=a2))
bplot <- ggplot(b, aes(x=b1, ymin=0, ymax=12, xmin=0, xmax=max(b$b1))) +
geom_line(aes(y=b2))
relative.size <- (2+max(b$b1)) / (2+max(a$a1)) # the addition of 2 here is to account for the plot margins
plot_grid(aplot,bplot, ncol=2, rel_widths=c(1,relative.size), align = "h")
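The same arithmetic extends to more than two plots, such as the five panels mentioned in the question. The following is only a sketch: rel_widths_from_ranges is a hypothetical helper (not part of cowplot), and the default offset of 2 is the value that happened to work for the margins above, so it is an assumption that would need re-tuning for other themes.

```r
# Hypothetical helper: turn x-axis ranges into relative plot widths.
# `offset` is in x-axis data units and stands in for axis text and margins.
rel_widths_from_ranges <- function(x_ranges, offset = 2) {
  spans <- vapply(x_ranges, function(r) diff(range(r)), numeric(1))
  (spans + offset) / (max(spans) + offset)
}

a <- data.frame(a1 = 1:10, a2 = 1:10)
b <- data.frame(b1 = 1:5, b2 = 2 * (1:5))

# The x limits run from 0 to the maximum x value, as in the plots above
widths <- rel_widths_from_ranges(list(c(0, max(a$a1)), c(0, max(b$b1))))
widths

# Usage with the plots defined above:
# plot_grid(aplot, bplot, ncol = 2, rel_widths = widths, align = "h")
```

The widest plot gets a relative width of 1 and the others are scaled against it, which reproduces c(1, relative.size) from the answer for the two-plot case.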

Related

How can I ensure consistent axis lengths between plots with discrete variables in ggplot2?

I've been trying to standardise multiple bar plots so that the bars are all identical in width regardless of the number of bars. Note that this is over multiple distinct plots - faceting is not an option. It's easy enough to scale the plot area so that, for instance, a plot with 6 bars is 1.5* the width of a plot with 4 bars. This would work perfectly, except that each plot has an expanded x axis by default, which I would like to keep.
"The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables."
https://ggplot2.tidyverse.org/reference/scale_discrete.html
My problem is that I can't for the life of me work out what '0.6 units' actually means. I've manually measured the distance between the bars and the y axis in various design tools and gotten inconsistent answers, so I can't factor '0.6 units' into my calculations when working out what size the panel windows should be. Additionally I can't find any answers on how many 'units' long a discrete x axis is - I assumed at first it would be 1 unit per category but that doesn't fit with the visuals at all. I've included an image of the two graphs that hopefully shows what I mean.
In this image, the top graph has a plot area exactly 1.5* that of the bottom graph. Seeing as it has 6 bars compared with 4, that would mean each bar is the same width, except that that extra space between the axis and the first bar messes this up. Setting expand = expansion(add = c(0, 0)) clears this up but results in not-so-pretty graphs. What I'd like is for the bars to be identical in width between the two plots, accounting for this extra space. I'm specifically looking for a general solution that I can use for future plots, not for the individual solution for this sample. As such, what I'd really like to know is how many 'units' long are these two x axes? Many thanks for any and all help!
Instead of using expansion for the axis, I would probably use the fact that categorical variables are actually plotted on the positive integers on Cartesian co-ordinates. This means that, provided you know the maximum number of columns you are going to use in your plots, you can set this as the range in coord_cartesian. There is a little arithmetic involved to keep the bars centred, but it should give consistent results.
We start with some reproducible data:
library(ggplot2)
set.seed(1)
df <- data.frame(group = letters[1:6], value = 100 * runif(6))
Now we set the value for the maximum number of bars we will need:
MAX_BARS <- 6
And the only thing "funny" about the plot code is the calculation of the x axis limits in coord_cartesian:
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
Now let us remove one factor level and run the exact same plot code:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
You will see the bars remain constant width and centralized, yet the panel size remains fixed.
Created on 2021-11-06 by the reprex package (v2.0.0)
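The limit arithmetic repeated in each call above can be pulled into a small function. This is a sketch only: centred_xlim is a hypothetical name, and it assumes, as the answer does, that the categories sit at the integer positions 1, 2, ..., n.

```r
# Hypothetical helper: x limits that keep n_bars centred inside a panel
# that always spans the same number of data units (max_bars - 1).
centred_xlim <- function(n_bars, max_bars) {
  pad <- (max_bars - n_bars) / 2
  c(1 - pad, max_bars - pad)
}

full  <- centred_xlim(6, 6)  # all six bars present
fewer <- centred_xlim(4, 6)  # two bars removed, one bar-width of padding per side

# Usage in the plot code above:
# coord_cartesian(xlim = centred_xlim(length(unique(df$group)), MAX_BARS))
```

Because the difference between the two limits is always max_bars - 1, one data unit maps to the same physical width in every plot, which is what keeps the bar widths constant.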

How to alter distances between plots in a 4 X 4 graph panel?

I am trying to create a graph panel with 8 graphs in total (4 x 4). Each graph corresponds to a different gene, with three lines (one for control, one for UC disease and one for Crohn's) representing the average change in expression between a first measurement and a second.
The code I am using to run each of the plots is:
s <- ggplot(X876, aes(x=Timepoint, y=value, group=Group)) +
geom_line(aes(color=Group), size=1)+
geom_point(aes(color=Group), size=2.5) +
labs(y="X876") + ylim(0.35, 0.55) +
theme_classic() +
scale_color_manual(values=c("darkmagenta", "deepskyblue4", "dimgrey"))
Using grid.arrange(l, m, n, o, p, q, r, s, nrow=4, ncol=4) creates a graph panel where the y-axis names overlap.
I have seen on here about changing the plot margins via,
pl = replicate(3, ggplot(), FALSE)
grid.arrange(grobs = pl)
margin = theme(plot.margin = unit(c(2,2,2,2), "cm"))
grid.arrange(grobs = lapply(pl, "+", margin))
However, I am unsure how this can be applied to increase the vertical height between the plots on the top and bottom rows. For each of the graphs l, m, n, o, p, q, r, s do I need to include
+ theme(plot.margin=unit(c(t,r,b,l),"cm"))
and then run the grid.arrange(l, m, n, o, p, q, r, s, nrow=4, ncol=4)
Please could somebody suggest which values I need to include for top (t), right (r), bottom (b) and left (l) to increase only the distance (by about 3 cm) between the top and bottom rows? I am trying different values and I'm not getting a decent graph panel yet.
Thank-you
Probably the easiest way is to create your own theme based on the theme_classic theme and then modify the plotting margins (and anything else) the way that you prefer.
theme_new <- theme_classic() +
theme(plot.margin=unit(c(1,0,1,0), "cm")) # t,r,b,l
Then set the theme (will revert back to the default on starting a new R session).
theme_set(theme_new)
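If only the gap between the rows should grow, the margin can be added to the top-row plots alone rather than set globally. The sketch below rests on assumptions not in the question: the mtcars data stands in for the gene data, the eight plots are laid out as 2 rows by 4 columns, and the 3 cm bottom margin is just the figure mentioned by the asker.

```r
library(ggplot2)
library(gridExtra)

# Stand-in plots (assumption: the real plots l, m, ..., s would go here)
plots <- lapply(1:8, function(i) {
  ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_classic()
})

# Larger bottom margin for the top row only; order is t, r, b, l
top_row   <- 1:4
extra_gap <- theme(plot.margin = grid::unit(c(0.2, 0.2, 3, 0.2), "cm"))
plots[top_row] <- lapply(plots[top_row], function(p) p + extra_gap)

# arrangeGrob() fills row by row, so the first four plots form the top row
panel <- arrangeGrob(grobs = plots, nrow = 2, ncol = 4)
grid::grid.newpage()
grid::grid.draw(panel)
```

Note that both rows are still allotted equal height, so the top-row panels shrink to make room for the margin.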
The alternative is to use grid.arrange and modify the margins using the grobs as you've already mentioned.
Once the panels have been arranged, you can then modify the top and bottom margins (or left and right) by specifying the vp argument of grid.arrange, which allows you to modify the viewport of multiple grobs on a single page. You can specify the height and width using the viewport function from the grid package.
For example, if you have a list of ggplot() grobs called g.list that contain your individual plots (l,m,n,o,p,q,r,s), then the following would reduce the height of the viewport to 90%, which effectively increases the top and bottom margins equally by 5% each.
library(grid)
library(gridExtra)
grid.arrange(grobs = g.list, vp=viewport(height=0.9))
Without your data, I can't test it, especially to see if the y-axes labels overlap. And I don't know why you think increasing the top and bottom margins can solve that problem since the y-axes are, by default, on the left-hand side of the graph.
Anyway, I'll use the txhousing dataset from the ggplot2 package to see if I can reproduce your problem.
library(ggplot2)
data(txhousing)
theme_new <- theme_classic() +
theme(plot.margin=unit(c(0.1,0.1,0.1,0.1), "cm"), text=element_text(size=8))
theme_set(theme_new)
tx.list <- split(txhousing, txhousing$year)
g.list <- lapply(tx.list, function(data)
{
ggplot(data, aes(x=listings, y=sales)) +
geom_point(size=0.5)
} )
grid.arrange(grobs = g.list, vp=viewport(height=0.9))
I don't see any overlapping. And I don't see why increasing the top and bottom margins would make much difference.
The question was asked a couple of years ago, but I bumped into it only now and thought that I might share a quick and dirty tip for this, which works well enough in many cases.
In some situations the theme is already so complex that this trick might be the easiest way: adding a few \n's (newlines) to the x and y axis names, as this will affect the distances between the plots in the panel. I've learned this trick for a slightly different purpose from here (originally from here).
I'll use the same logic for the example dataset (in this case: Orange from R built-in data sets) as in the excellent code by the previous answerer.
library(ggplot2)
library(gridExtra)
or.list <- split(Orange, Orange$Tree)
g.list <- lapply(or.list, function(data)
{
ggplot(data, aes(x=age, y=circumference)) +
theme_classic() +
geom_point(size=0.5) +
scale_x_continuous(name = "Age\n\n") +
scale_y_continuous(name = "\n\n\nCircumference")
} )
grid.arrange(grobs = g.list)

re-sizing ggplot geom_dotplot

I'm having trouble creating a figure with ggplot2. I am using geom_dotplot with center stacking to display my data which are discrete values for 4 categories.
For aesthetic reasons I want to customize the positions of the dots so that:
1. The empty space between dots along the y axis is reduced (i.e. the dots are 1 value large)
2. The distributions fit and don't overlap
I've adjusted the binwidth and dotsize to achieve aesthetic goal 1, but that requires me to fiddle with the ylim() parameter to make sure that the groups fit in the plot. This results in a plot with more white space and few numbers on the y axis.
Question: Can anyone explain a way to resize the empty space on this plot?
My code is below:
plot <- ggplot(figdata, aes(y=Counts, x=category, col=strain)) +
geom_dotplot(aes(fill=strain), dotsize=1, binwidth=.7,
binaxis= "y",stackdir ="centerwhole", stackratio=.7) +
ylim(18,59)
plot + scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_errorbar(stat="hline", yintercept="mean",
aes( ymax=..y..,ymin=..y.., group = category, width = 0.5),
color="black")
Which produces:
EDIT: Incorporating jitter will allow all the data to fit, but I don't want to add noise to this data and would prefer to show it as discrete data.
Adjusting the binwidth and dotsize to 0.3 as suggested below also fits all the data, however it leaves too much white space.
I think that I might have to transform my data so that the values are steps smaller than 1, in order to get everything to fit horizontally and the dot sizes to be large enough to reduce white space.
I think the easiest way is using coord_cartesian:
plot + scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_errorbar(stat="hline", yintercept="mean",
aes( ymax=..y..,ymin=..y.., group = category, width = 0.5),
color="black") +
coord_cartesian(ylim=c(17,40))
Which gives me this plot (with fake data that are not as neatly distributed as yours):
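For context on why coord_cartesian behaves differently from ylim(): ylim() sets the scale limits, so values outside them are treated as missing, whereas coord_cartesian() only zooms the view and leaves the data untouched. The sketch below uses made-up data (an assumption, since figdata is not available) and ggplot2's layer_data() to count the y values that survive in each case.

```r
library(ggplot2)

# Made-up data standing in for figdata
d <- data.frame(x = 1:10, y = 1:10)
p <- ggplot(d, aes(x, y)) + geom_point()

# Scale limits: y values outside 3..8 are censored to NA
clipped <- layer_data(p + ylim(3, 8))

# Coordinate limits: every value is kept, only the visible window changes
zoomed <- layer_data(p + coord_cartesian(ylim = c(3, 8)))

n_clipped <- sum(!is.na(clipped$y))
n_zoomed  <- sum(!is.na(zoomed$y))
c(ylim = n_clipped, coord_cartesian = n_zoomed)
```

This matters for stacked dots and summary statistics, which are computed from whatever data remains after the scale limits have been applied.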

How to put labels of legend inside plot in ggplot2

Context: R/ggplot2.
Is there an automated way (or even a manual way) to put the legend factors inside the plot like the energies here (Co, 4,6,10,...), instead of having them in a regular legend box next to the plot ?
Source: Radiation Oncology Physics: A Handbook for Teachers and Students, EB. Podgorsak
So this seems close. I'd characterize this as "semi-automatic": there's definitely some tweaking needed, but most of the work is done for you...
The tricky bit is not placing the text labels (geom_text(...)), but creating the breaks in the plotted curves. This is done with geom_rect(...), where the width of the rectangles is set to the maximum label width, as determined using strwidth(...).
# create sample data
df <- data.frame(x=rep(seq(0,20,.01),5),k=rep(1:5,each=2001))
df$y <- with(df,x*exp(-x/k))
library(ggplot2)
eps.x <- max(strwidth(df$k)) # maximum width of legend label
eps.y <- eps.x*diff(range(df$y))/diff(range(df$x))
ggplot(df,aes(x,y))+
geom_line(aes(group=factor(k)))+
geom_rect(data=df[df$x==5,],
aes(xmax=x+eps.x, xmin=x-eps.x, ymax=y+eps.y, ymin=y-eps.y),
fill="white", colour=NA)+
geom_text(data=df[df$x==5,],aes(x,y,label=k))+
theme_bw()
If you want to color the lines too:
ggplot(df,aes(x,y))+
geom_line(aes(color=factor(k)))+
geom_rect(data=df[df$x==5,],
aes(xmax=x+eps.x, xmin=x-eps.x, ymax=y+eps.y, ymin=y-eps.y),
fill="white", colour=NA)+
geom_text(data=df[df$x==5,],aes(x,y,label=k), colour="black")+
scale_color_discrete(guide="none")+
theme_bw()

Fix for overflowing x-axis text in ggplot2

I've created custom, two level x-axis entries that tend to work pretty well. The only problem is that when my y-axis, proportion, is close to one, these axis entries spill onto the chart area. When I use vjust to manually alter their vertical position, part of each entry is hidden by the chart boundary.
Any suggestions for how to make chart boundaries that dynamically adjust to accommodate large y-axis values and the full text of each entry (without running on to the chart)?
Have a look at the following example:
library(ggplot2)
GroupType <- rep(c("American","European"),2)
Treatment <- c(rep("Smurf",2),rep("OompaLoompa",2))
Proportion <- rep(1,length(GroupType))
PopulationTotal <- rep(2,length(GroupType))
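# data.frame() keeps Proportion and PopulationTotal numeric; cbind() would coerce every column to character
sampleData <- data.frame(GroupType, Treatment, Proportion, PopulationTotal)
hist_cut <- ggplot(sampleData, aes(x=GroupType, y=Proportion, fill=Treatment))
# stat="identity" belongs in geom_bar(), not aes(); breaks = NULL (not NA) hides the default axis labels
chartCall<-expression(print(hist_cut + geom_bar(stat="identity", position="dodge") + scale_x_discrete(breaks = NULL) +
geom_text(aes(label = paste(as.character(GroupType),"\n[N=",PopulationTotal,"]",sep=""),y=-0.02),size=4) + labs(x="",y="",fill="")
))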
dev.new(width = 860, height = 450)
eval(chartCall)
Any thoughts about how I can fix the sloppy x-axis text?
Many thanks in advance,
Aaron
Unfortunately you have to manage the y axis yourself - there's currently no way for ggplot2 to figure out how much extra space you need because the physical space required depends on the size of the plot. Use, e.g., expand_limits(y = -0.1) to budget a little extra space for the text.
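A minimal sketch of that suggestion, under some assumptions: the data frame is rebuilt with data.frame() so the numeric columns stay numeric, geom_col() is used in place of geom_bar(stat = "identity"), and the value -0.1 is simply the example figure from above, which would need tuning to the actual plot size.

```r
library(ggplot2)

sampleData <- data.frame(
  GroupType       = rep(c("American", "European"), 2),
  Treatment       = rep(c("Smurf", "OompaLoompa"), each = 2),
  Proportion      = 1,
  PopulationTotal = 2
)

p <- ggplot(sampleData, aes(x = GroupType, y = Proportion, fill = Treatment)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = paste0(GroupType, "\n[N=", PopulationTotal, "]"),
                y = -0.02), size = 4) +
  scale_x_discrete(breaks = NULL) +
  labs(x = "", y = "", fill = "") +
  expand_limits(y = -0.1)  # reserve room below zero for the two-line labels

# The trained y range now reaches down to the value passed to expand_limits()
y_range <- layer_scales(p)$y$range$range
y_range
```

Because the required space depends on the device size and font size, the limit may need to be more negative for small plots or larger text.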
