Introduce explicit line break in ggplot2 on the Y-axis (boxplot) - r

I'm attempting to write some code that can be used to make boxplots of the temperatures at which proteins melt. I'm 99% there, except I need to introduce a line break on the y-axis of my boxplot.
Essentially, my current y-axis scale goes from 45-60; I want the y-axis to start at 0, then show a line break, then 45-60. See the picture for an example.
I've tried using the scale_y_continuous to set a break but that didn't work as I'd hoped.
df %>%
  group_by(Protein) %>%
  ggplot(., aes(x = factor(Protein), y = Melting_Temperature)) +
  geom_boxplot() +
  theme_classic() +
  geom_point(aes(x = as.numeric(df$Protein) + 0.5, colour = Protein),
             alpha = 0.7) +
  xlab("Protein Type") +
  ylab("Melting Temperature") +
  stat_summary(fun.y = mean, colour = "darkred", geom = "point",
               shape = 18, size = 3, show_guide = FALSE) +
  geom_text(data = means, aes(label = round(Melting_Temperature, 1),
                              y = Melting_Temperature + 0.5))

IMHO, tick marks and axis labels should be sufficient to indicate the range of data on display. So, there is no need to start an axis at 0 (except for bar charts and the like).
However, the package ggthemes offers Tufte style axes which might be an alternative to the solution the OP is asking for:
library(ggplot2)
library(ggthemes)
ggplot(iris) +
  aes(x = Species, y = Sepal.Length) +
  geom_boxplot() +
  geom_rangeframe() +
  theme_tufte(base_family = "")
Note that the iris dataset is used here in place of OP's data which are not available.
geom_rangeframe() plots axis lines which extend to the maximum and minimum of the plotted data. As the plot area is usually somewhat larger this creates a kind of gap.
theme_tufte() is a theme based on Chapter 6 "Data-Ink Maximization and Graphical Design" of Edward Tufte's The Visual Display of Quantitative Information with no border, no axis lines, and no grids.

This is not supported in ggplot2 as built. In this discussion from 2010, Hadley Wickham (author of ggplot2, among other R packages) explains that axis breaks are questionable practice in his view.
Those comments by Hadley are linked, and other options discussed, in this prior SO discussion.

Related

Points keep getting cut off, and standard fixes don't work well with facet grid on a log scale

Novice R user here wrestling with some arcane details of ggplot
I am trying to produce a plot that charts two data ranges: One plotted as a line, and another plotted on the same plot, but as points. The code is something roughly like this:
ggplot(data1, aes(x = Year, y = Capacity, col = Process)) +
  geom_line() +
  facet_grid(Country ~ ., scales = "free_y") +
  scale_y_continuous(trans = "log10") +
  geom_point(data = data2, aes(x = Year, y = Capacity, col = Process))
I've left out some additional cosmetic arguments for the sake of simplicity.
The problem is that the points from the geom_point keep getting cut off by the x axis:
I know the standard fix here would be to adjust the y limits to make room for the points:
scale_y_continuous(limits = c(-100, Y_MAX))
But here there is a separate problem due to the facet grid with free scales, since there is no single value for Y_MAX.
I've also tried it using expansions:
scale_y_continuous(expand = c(0.5, 0))
But here, it runs into problems with the log scale, since it multiplies by different values for each facet, producing very wonky results.
I just want to produce enough blank space on the bottom of each facet to make room for the point. Or, alternatively, move each point up a little bit to make room. Is there any easy way to do this in my case?
This might be a good place for scales::pseudo_log_trans, which combines a log transformation with a linear transformation (and a flipped sign log transformation) to retain most of the benefits of a log transformation while also allowing zero and negative values. Adjust the sigma parameter of the function to adjust where the transition from linear to log should happen.
library(ggplot2)
ggplot(data = data.frame(country = rep(c("France","USA"), each = 5),
                         x = rep(1:5, times = 2),
                         y = c(10^(2:6), 0, 10^(1:4))),
       aes(x,y)) +
  geom_point() +
  # scale_y_continuous(trans = "log10") +
  scale_y_continuous(trans = scales::pseudo_log_trans(),
                     breaks = c(0, 10^(0:6)),
                     labels = scales::label_number_si()) +
  facet_wrap(~country, ncol = 1, scales = "free_y")
vs. with (trans = "log10"):
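To get a feel for what the pseudo-log transformation does to the numbers themselves, here is a minimal sketch that applies it outside of any plot. The input vector and the sigma value of 10 are arbitrary choices for illustration, not values from the question.

```r
library(scales)

# Pseudo-log transformation with the default sigma = 1 and natural-log base.
tr <- pseudo_log_trans(sigma = 1, base = exp(1))

x <- c(-100, -1, 0, 1, 10, 1000)
y <- tr$transform(x)

# Zero maps to zero and negative inputs mirror positive ones, whereas a plain
# log gives -Inf at zero and NaN for negatives. For large x the transformed
# value approaches log(x).
data.frame(x = x, pseudo_log = y, plain_log = suppressWarnings(log(x)))

# A larger sigma (10 is an arbitrary illustration) widens the near-linear
# region around zero.
tr_wide <- pseudo_log_trans(sigma = 10, base = exp(1))
tr_wide$transform(c(0, 1, 10, 1000))
```

Because zero stays finite after the transformation, points sitting at zero get a real position on the axis instead of being dropped or clipped.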

Increase spaces between x values of boxplot (overlapping x labels)

Hello, I am very new to coding and recently made my first couple of figures in R. I used this code to make the figures, and they turned out well except that the labels on the x-axis were overlapping.
library(ggplot2)
ggplot(LR_density, aes(x=Plant_Lines, y=`Lateral_Root_Density.(root/cm)`, fill=Expression_Type)) +
  geom_boxplot() +
  geom_jitter(color="black", size=0.4, alpha=0.9) +
  ggtitle("Lateral root density across plant expression types")
The figure produced by the line of code I used
I was wondering if anyone knew how to get the x-axis labels to be more spaced out in ggplot2 boxplots. I have been looking around but haven't found a clear answer on this. Any help on what to do or where to look would be great!
As per the comment, this thread shows another option for dealing with overlapping x-axis labels, which one can use since ggplot2 3.3.0.
I included a second graph that "squeezes" the axis a bit, which also roughly simulates the effect of changing the viewport/file size.
library(ggplot2)
ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2))
ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  coord_fixed(1/10^3.4)
Created on 2020-04-30 by the reprex package (v0.3.0)

how to prevent axes from intersecting in ggplot2

I'm using ggplot2 to make line graphs of some log-transformed data that all have large values (between 10^6 and 10^8); since the axes don't start at zero, I'd prefer not to have them intersect at the "origin."
Here's what the axes currently look like:
I'd prefer something more like one gets from base graphics (but I'm additionally using geom_ribbon and other fancy things I really like in ggplot2, so I'd prefer to find a ggplot2 solution):
Here's what I'm doing currently:
library(ggplot2)
library(scales)  # for math_format()
mydata <- data.frame(Day = rep(1:8, 3),
                     Treatment = rep(c("A", "B", "C"), each=8),
                     Value = c(7.415929, 7.200486, 7.040555, 7.096490, 7.056413, 7.143981, 7.429724, 7.332760, 7.643673, 7.303994, 7.343151, 6.923636, 6.923478, 7.249170, 7.513370, 7.438630, 7.209895, 7.000063, 7.160154, 6.677734, 7.026307, 6.830495, 6.863329, 7.319219))
# each "+" must end the line; a line starting with "+" is parsed as a new expression
ggplot(mydata, aes(x=Day, y=Value, group=Treatment)) +
  theme_classic() +
  geom_line(aes(color = Treatment), size=1) +
  scale_y_continuous(labels = math_format(10^.x)) +
  coord_cartesian(ylim = c(6.4, 7.75), xlim=c(0.5, 8))
plot(mydata$Day, mydata$Value, frame.plot = F) #non-intersecting axes
A workaround for this problem is to remove the axis lines with theme(axis.line=element_blank()) and then add false axis lines with geom_segment(), one for the x-axis and a second for the y-axis. The x, y, xend and yend values are determined from your plot (the smallest and largest values shown on each corresponding axis) and from the axis limits used in coord_cartesian() (the minimum of the limits, so that the segment is drawn where the axis would be).
ggplot(mydata, aes(x=Day, y=Value, group=Treatment)) +
  theme_classic() +
  geom_line(aes(color = Treatment), size=1) +
  scale_y_continuous(labels = math_format(10^.x)) +
  coord_cartesian(ylim = c(6.4, 7.75), xlim=c(0.5, 8)) +
  theme(axis.line=element_blank()) +
  geom_segment(x=2,xend=8,y=6.4,yend=6.4) +
  geom_segment(x=0.5,xend=0.5,y=6.5,yend=7.75)
An older question. But since I was looking for this functionality recently I thought I'd flag the ggh4x package, which adds guides for truncating axes.
library(ggh4x)
#> Loading required package: ggplot2
ggplot(data.frame(x=0:10, y=0:10), aes(x, y)) +
  geom_point() +
  theme_classic() +
  guides(x = "axis_truncated", y = "axis_truncated")
Created on 2023-02-17 with reprex v2.0.2
Apart from convenience, two nice things about the ggh4x option are that 1) it is stable across more complex plot compositions like faceting and 2) its dependencies are a subset of those belonging to ggplot2, so you aren't introducing a bunch of additional imports.
P.S. There's an open GitHub issue to bring this kind of "floating axes" functionality to the main ggplot2 library. It looks like it will eventually be incorporated.
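As far as I know, this has since landed in ggplot2 itself: from version 3.5.0 onward, guide_axis() accepts a cap argument that trims the axis line to the outermost breaks. The sketch below assumes such a version is installed; on older versions the cap argument is not available and the call will fail.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.5.0, where guide_axis() has a `cap` argument.
# cap = "both" trims the axis line at the lowest and highest breaks,
# so the two axes no longer meet at the corner.
p <- ggplot(data.frame(x = 0:10, y = 0:10), aes(x, y)) +
  geom_point() +
  theme_classic() +
  guides(x = guide_axis(cap = "both"), y = guide_axis(cap = "both"))
p
```

This avoids the extra package, at the cost of requiring a reasonably recent ggplot2.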

ggplot2 multiple stat_binhex() plots with different color gradients in one image

I'd like to use ggplot2's stat_binhex() to simultaneously plot two independent variables on the same chart, each with its own color gradient using scale_colour_gradientn().
If we disregard the fact that the x-axis units do not match, a reproducible example would be to plot the following in the same image while maintaining separate fill gradients.
d <- ggplot(diamonds, aes(x=carat,y=price))+
  stat_binhex(colour="white",na.rm=TRUE)+
  scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some file>,height=6,width=8))
d <- ggplot(diamonds, aes(x=depth,y=price))+
  stat_binhex(colour="white",na.rm=TRUE)+
  scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some other file>,height=6,width=8))
I found some conversation of a related issue in ggplot2 google groups here.
Here is another possible solution: I have taken @mnel's idea of mapping bin count to alpha transparency, and I have transformed the x-variables so they can be plotted on the same axes.
library(ggplot2)
# Transforms range of data to 0, 1.
rangeTransform = function(x) (x - min(x)) / (max(x) - min(x))
dat = diamonds
dat$norm_carat = rangeTransform(dat$carat)
dat$norm_depth = rangeTransform(dat$depth)
p1 = ggplot(data=dat) +
  theme_bw() +
  stat_binhex(aes(x=norm_carat, y=price, alpha=..count..), fill="#002BFF") +
  stat_binhex(aes(x=norm_depth, y=price, alpha=..count..), fill="#FFD500") +
  guides(fill=FALSE, alpha=FALSE) +
  xlab("Range Transformed Units")
ggsave(plot=p1, filename="plot_1.png", height=5, width=5)
Thoughts:
I tried (and failed) to display a sensible color/alpha legend. Seems tricky, but should be possible given all the legend-customization features of ggplot2.
X-axis unit labeling needs some kind of solution. Plotting two sets of units on one axis is frowned upon by many, and ggplot2 has no such feature.
Interpretation of cells with overlapping colors seems clear enough in this example, but could get very messy depending on the datasets used, and the chosen colors.
If the two colors are additive complements, then wherever they overlap equally you will see a neutral gray. Where the overlap is unequal, the gray would shift to more yellow, or more blue. My colors are not quite complements, judging by the slightly pink hue of the gray overlap cells.
I think what you want goes against the principles of ggplot2 and the grammar of graphics approach more generally. Until the issue is addressed (for which I would not hold my breath), you have a couple of choices:
Use facet_wrap and alpha
This will not produce a nice legend, but takes you some way towards what you want.
You can set the alpha value to scale by the computed bin count (labelled "Frequency" in the legend), accessed by ..count..
I don't think you can merge the legends nicely though.
library(reshape2)
# in long format
dm <- melt(diamonds, measure.var = c('depth','carat'))
ggplot(dm, aes(y = price, fill = variable, x = value)) +
  facet_wrap(~variable, ncol = 1, scales = 'free_x') +
  stat_binhex(aes(alpha = ..count..), colour = 'grey80') +
  scale_alpha(name = 'Frequency', range = c(0,1)) +
  theme_bw() +
  scale_fill_manual('Variable', values = setNames(c('darkblue','yellow4'), c('depth','carat')))
Use gridExtra with grid.arrange or arrangeGrob
You can create separate plots and use gridExtra::grid.arrange to arrange on a single image.
d_carat <- ggplot(diamonds, aes(x=carat,y=price))+
  stat_binhex(colour="white",na.rm=TRUE)+
  scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
d_depth <- ggplot(diamonds, aes(x=depth,y=price))+
  stat_binhex(colour="white",na.rm=TRUE)+
  scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
library(gridExtra)
grid.arrange(d_carat, d_depth, ncol =1)
If you want this to work with ggsave (thanks to @bdemarest's comment below and @baptiste),
replace grid.arrange with arrangeGrob, something like:
ggsave(plot=arrangeGrob(d_carat, d_depth, ncol=1), filename="plot_2.pdf", height=12, width=8)

Add a footnote citation outside of plot area in R?

I'd like to add a footnote citation to my 3-panel facet grid plot produced in R. It's a footnote to credit the data source. I'd ideally like to have it below and external to all three axes---preferably in the lower left.
I'm using ggplot2 and also ggsave(). This means I can't use grid.text()-based solutions, because that only draws on the x11() window, and can't be added to the ggplot object.
Using instead png() ...code... dev.off() does not appear to be an option because I need ggsave's resizing parameters, and find this command produces better, clearer prints (that are also much faster, because I'm not printing to the screen).
Here's my basic code:
p1 <- ggplot(data, aes(date, value)) +
  facet_grid(variable ~ .) + geom_point(aes(y =value), size=1) +
  theme_bw() +
  opts(title=mytitle)
print(p1)
ggsave("FILE.png",width=mywidth, height=myheight, p1, dpi=90)
I've tried:
p1 <- ggplot(data, aes(date, value)) +
  facet_grid(variable ~ .) + geom_point(aes(y =value), size=1) +
  theme_bw() +
  opts(title=mytitle)
print(p1)
grid.text(unit(0.1,"npc"),0.025,label = "Data courtesy of Me")
grid.gedit("GRID.text", gp=gpar(fontsize=7))
ggsave("FILE.png",width=mywidth, height=myheight, p1, dpi=90)
This appropriately puts the footnote in the lower left corner on the x11() display, external to the plots, but unfortunately, since it isn't applied to the p1 object, it isn't saved by the ggsave command.
I've also tried:
p1 <- ggplot(data, aes(date, value)) +
  facet_grid(variable ~ .) + geom_point(aes(y =value), size=1) +
  theme_bw() +
  opts(title=mytitle) +
  annotate("text", label = "Footnote", x = 0, y = 10, size = 5, colour = "black")
print(p1)
ggsave("FILE.png",width=mywidth, height=myheight, p1, dpi=90)
This successfully prints using ggsave, however it has the following problems:
It is repeated 3 times, in each of the 3 facets, rather than 1 time.
It is contained within the plots, rather than external to them.
Text is difficult to place---seems to be using plot units (my x-axis is date, so 0 puts it around 1970).
The text size doesn't seem to change despite my size parameter.
A couple of related links from when I explored this...
ggplot2 footnote
(doesn't work with ggsave)
How to label the barplot in ggplot with the labels in another test result?
(is inside the plot, not external/below plot)
Different font faces and sizes within label text entries in ggplot2
(doesn't work with ggsave)
problem saving pdf file in R with ggplot2
ggplot2 now has this ability natively with no need for additional packages. ... + labs(caption = "footnote", ...)
library(ggplot2)
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point() +
  labs(title = "Diamonds are forever...",
       subtitle = "Carat weight by Price",
       caption = "H. Wickham. ggplot2: Elegant Graphics for Data Analysis Springer-Verlag New York, 2009.")
library(gridExtra)
library(grid)
library(ggplot2)
g <- grid.arrange(qplot(1:10, 1:10, colour=1:10) + labs(caption="ggplot2 caption"),
                  bottom = textGrob("grid caption", x = 1,
                                    hjust = 1, gp = gpar(fontface = 3L, fontsize = 9)))
ggsave("plot.pdf", g)
Edit: note that this solution is somewhat complementary to the recent caption argument added to ggplot2, since the textGrob can here be aligned with respect to the whole figure, not just the plot panel.
Adding to the answer of Brandon Bertelsen: if you want to have the caption in the left corner, add
theme(plot.caption = element_text(hjust = 0))
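Putting the caption and the left alignment together for a three-panel facet grid like the one in the question might look like the sketch below. The data frame demo is a hypothetical stand-in (random values, made-up dates), since the asker's data are not available.

```r
library(ggplot2)

# Hypothetical stand-in data: three variables observed on the same dates.
set.seed(1)
demo <- data.frame(
  date     = rep(seq(as.Date("2012-01-01"), by = "month", length.out = 12), 3),
  variable = rep(c("a", "b", "c"), each = 12),
  value    = rnorm(36)
)

p <- ggplot(demo, aes(date, value)) +
  geom_point(size = 1) +
  facet_grid(variable ~ .) +
  theme_bw() +
  labs(title = "Three-panel example",
       caption = "Data courtesy of Me") +
  theme(plot.caption = element_text(hjust = 0))  # left-align the caption
p
```

Because the caption is part of the plot object, it appears once below all facets and is written out when p is passed to ggsave(), for example ggsave("FILE.png", p, width = 6, height = 6, dpi = 90).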

Resources