Relative positioning text annotations in ggplot - r

There are other variations of this question, such as:
R: place geom_text() relative to plot borders rather than fixed position on the plot
ggplot2 annotate layer position in R
Position ggplot text in each corner
In my opinion, these do not solve the general problem. The first simply pre-calculates the x and y ranges so that proportions can be used. The second two use the "trick" that one can pass +/- Inf to position text in a given corner.
Here are two improvements I think would make for a more generalized solution:
allow arbitrary positioning of a label via relative positioning
works with variables calculated on the fly via dplyr (rules out pre-calculating ranges/ratios)
For sample data:
data.frame(
x = runif(100, min = sample(0:50, 1), max = sample(50:1000, 1)),
y = runif(100, min = sample(0:1000, 1), max = sample(1000:10000, 1))
) %>%
mutate(z = x + y) %>%
# code here to plot and put an annotation at e.g. x = 0.95, y = 0.1, relative to plot limits

I've been wrestling with this today and had a possible improvement to existing answers, leveraging some additional learning about how one can access the data from within the ggplot() call.
I found (see 1 and 2) that by surrounding the ggplot call in {} and passing . as the data argument, one can continue to refer to . throughout the call. This enables:
pos_x <- 0.95
pos_y <- 0.1
data.frame(
x = runif(100, min = sample(0:50, 1), max = sample(50:1000, 1)),
y = runif(100, min = sample(0:1000, 1), max = sample(1000:10000, 1))
) %>%
mutate(z = x + y) %>% {
ggplot(., aes(x = x, y = z)) + geom_point() +
annotate(geom = "text", label = "some label",
x = min(.$x) + pos_x * diff(range(.$x)),
y = min(.$z) + pos_y * diff(range(.$z)),
hjust = 1, vjust = 1) +
scale_x_continuous(limits = range(.$x)) +
scale_y_continuous(limits = range(.$z))
}
You can re-run this and observe the plot label stay fixed even as the x/y axis ranges change significantly. For some improvement opportunities:
the lower axis limit varies with the data so just using y = number_close_to_zero*max(.$y) could be risky if min(.$y) is too high. For this reason, I manually specified the axis limits
similarly, the position isn't exact between plots if you just do pos_x * max(.$x), so I used min(.$x) + pos_x * diff(range(.$x)) instead
hjust and vjust aren't automatic; they need to be tweaked depending on the desired label location
it would be nice to automagically get the variable used for x/y vs. having to use the column name. In other words, if I wanted to change to aes(..., y = y), I wouldn't have to change instances of .$z to .$y.
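The range arithmetic used for the label position can be wrapped in a small helper, so the fraction and the column each appear only once per axis. This is a minimal sketch in base R; rel_pos is a hypothetical name chosen for illustration, not a ggplot2 function.

```r
# Hypothetical helper (not part of ggplot2): map a fraction of the axis
# range (0 = minimum, 1 = maximum) to a coordinate in data units.
rel_pos <- function(values, frac) {
  rng <- range(values, na.rm = TRUE)
  rng[1] + frac * diff(rng)
}

x <- c(10, 20, 60)
rel_pos(x, 0.95)  # 95% of the way from min(x) to max(x)
rel_pos(x, 0)     # the minimum of x
rel_pos(x, 1)     # the maximum of x
```

Inside the braced pipeline, the annotate() call would then read x = rel_pos(.$x, pos_x), y = rel_pos(.$z, pos_y). The column names still have to be repeated, so this does not address the last point in the list.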

Related

ggplot cowplot ensuring y axes are identical when arranging plots with log scale

I want to create a combination plot using plot_grid from the cowplot package.
The two plots that I want to combine use a log scale. Of the data plotted, some is negative, which gets dropped.
I can quite easily produce a decent result using facet_wrap that looks like this:
library(tidyverse)
tibble(x = rnorm(100),
y = rnorm(100),
type = "A") %>%
bind_rows(tibble(x = rnorm(100, mean = 10),
y = rnorm(100, mean = 10),
type = "B")) %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
facet_wrap(~type)
But in my particular situation, I can't use facet_wrap because I want to give the panels A and B different x-axis labels and want to change the number format slightly (e.g. adding a $ sign to the axis ticks of panel A and a % sign to panel B).
Therefore I use plot_grid:
tibble(x = rnorm(100),
y = rnorm(100),
type = "A") %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
scale_y_log10() -> a
tibble(x = rnorm(100, mean = 10),
y = rnorm(100, mean = 10),
type = "B") %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
scale_y_log10() -> b
cowplot::plot_grid(a,b)
Now the problem is that the axis is completely distorted (this would be equal to scales = "free_y" in facet_wrap)
So therefore I attempt to set the limits/ranges for both plots manually by choosing the min and max from both plots:
lims <- c(min(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range),
max(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range))
cowplot::plot_grid(a + ylim(lims),b + ylim(lims))
But now the result is this:
So essentially I want to replicate the scales="fixed" in facet_wrap using plot_grid
Any ideas?
many thanks!
The issue is that layer_scales() returns the y range on the log10 scale, so the limits you pass are log10 values rather than data values. You need to convert them back to data values:
lims = 10^c(min(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range),
max(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range))
Alternatively, you can compute the range of the actual data.
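Computing the limits from the data directly might look like the sketch below. The vectors y_a and y_b are stand-ins for the y columns of the two panels (assumed names, not from the question), and only positive values are kept because a log scale cannot show the rest.

```r
set.seed(1)
y_a <- rnorm(100)             # panel A: many values are <= 0
y_b <- rnorm(100, mean = 10)  # panel B

y_all <- c(y_a, y_b)
y_pos <- y_all[y_all > 0]     # values a log10 axis can actually display

# Shared limits in data units (not log10 units)
lims <- range(y_pos)

# Same result as back-transforming the range of the log10 values
lims_from_log <- 10^range(log10(y_pos))
```

These limits could then be passed to both plots as scale_y_log10(limits = lims). As far as I know, adding ylim() to a plot that already has scale_y_log10() replaces the log scale with a linear one, so setting the limits inside scale_y_log10() is the safer route if the log axis should be kept.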

Using geom_text, nudge half the length of a label in ggplot

I wish to add labels to points on a ggplot. The labels should be below each point. There may be multiple labels per point. If so, they should be left-justified. Each label may be a different length.
For each point, the shortest name should be centered below each point. Thus, I wish to nudge_x = half the length of the shortest name for each point.
How do I determine the length of a label so as to nudge half its value?
Example
library("tidyverse")
df <- tibble(
x = c("one", "two"),
y = c(2.5, 1.7),
company = c("Normal", "Short\nA_bit_longer")
)
company_nudge_x <- -0.1
company_nudge_y <- -0.2
ggplot(df, aes(x = x, y = y, group = x)) +
geom_point(size = 5) +
geom_line(aes(group = "x")) +
coord_cartesian(ylim = c(0.3, 2.7)) +
# Labels
geom_text(aes(label = company),
#nudge_x = company_nudge_x,
nudge_y = company_nudge_y,
hjust = 0) # left_justify text
A bit of a hack, but in this solution, for each row of data, you can:
get the first part of the label (if applicable)
count the number of characters
determine the nudge value based on that count (you might have to play around with it, adjusting the value for char_nudge)
And then apply that inside the geom_text() function, inside the aesthetics.
Two things to keep in mind:
Because you have a categorical variable on x, you need to convert it to a factor and then an integer in order to be able to add a nudge to it when using it for the position of the geom_text (thankfully, ggplot2 and as.factor() will both order the levels alphabetically);
This works best with a monospace font (you can try and see what happens if you remove the argument family = "mono": the l is not as wide as other letters, which results in a switch in position).
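The factor-to-integer step can be checked on its own in base R. This small sketch uses made-up category names to show that the integer codes follow the sorted factor levels, which is the order a discrete x axis uses by default.

```r
# Made-up categories, deliberately not in alphabetical order
x <- c("two", "one", "three")

f <- as.factor(x)
levels(f)             # levels are sorted: "one", "three", "two"

pos <- as.integer(f)  # position of each value on the discrete axis
pos                   # "two" is third, "one" is first, "three" is second
```

A numeric nudge can be added to pos, which is not possible with the character values themselves.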
library(tidyverse)
# example dataframe
df <- tibble(
x = c("one", "two"),
y = c(2.5, 1.7),
company = c("Normalllllllllllll", "Short\nA_bit_longerrrrrr")
)
# set constants
company_nudge_y <- -0.2
char_nudge <- 0.03
# augment dataframe
df <- df %>%
mutate(comp_small = str_extract(company, "^.+"),
len_lab = nchar(comp_small),
nudge_x = -len_lab / 2 * char_nudge)
# plot it
ggplot(df, aes(x = x, y = y, group = x)) +
geom_point(size = 5) +
geom_line(aes(group = "x")) +
coord_cartesian(ylim = c(0.3, 2.7)) +
# Labels
geom_text(aes(x = as.integer(as.factor(x)) + nudge_x, # add the nudge
label = company),
family = "mono", # monospace font will work better
nudge_y = company_nudge_y,
hjust = 0) # left_justify text
Created on 2020-11-27 by the reprex package (v0.3.0)

ggplot2 missing labels after custom scaling of axis

I am attempting to apply a custom scaling of my x-axis using ggplot2 and scales::trans_new(). However, when I do some of the axis labels go missing. Can someone help me figure out why?
Setup:
library(tidyverse)
# the data
ds <- tibble(
myx = c(1, .5, .1, .01, .001, 0),
myy = 1:6
)
# the custom transformation
forth_root_trans_rev <- scales::trans_new(
name = "sign_fourth_root_rev",
transform = function (x) { - abs(x)^(1/4) },
inverse = function (x) { x^4 }
)
Plot 1:
When I try and plot this the label for x = 0 gets lost.
# plot - missing x-label at `0`
ggplot(ds, aes(x = myx, y = myy)) +
geom_line() +
geom_point() +
scale_x_continuous(
trans = forth_root_trans_rev,
breaks = sort(unique(ds$myx)),
)
Plot 2
When I add some space on both sides of the graph, even more x-labels get lost.
# plot - missing x-labels below 0.5
ggplot(ds, aes(x = myx, y = myy)) +
geom_line() +
geom_point() +
scale_x_continuous(
trans = forth_root_trans_rev,
breaks = sort(unique(ds$myx)),
expand = expand_scale(mult = c(.1, .6))
)
I presume this is related to this old issue: https://github.com/tidyverse/ggplot2/issues/980. Nevertheless, I can't figure out how to apply this transformation and retain all x-labels.
Where am I going wrong?
The problem here is due to the combination of two factors:
Your x-axis values (after transformation) fall in the [-1, 0] range, so any expansion (whether additive or multiplicative) will nudge the final range to cover both positive and negative values.
Your custom transformation is not one-to-one in the [<some negative number>, <some positive number>] region.
How it occurred
Somewhere deep inside all the code used to build the ggplot object (you can run ggplot2:::ggplot_build.ggplot before printing the plot and step into layout$setup_panel_params(), but I don't recommend this for casual users; the rabbit hole goes really deep down there), x-axis breaks are calculated in the following manner:
Obtain limits for the transformed values (for c(1, .5, .1, .01, .001, 0) in the question, this will be (-1, 0)).
Add expansion to the limits, if applicable (default expansion for a continuous axis is 5% on either side, so the limits become (-1.05, 0.05)).
Apply the inverse transformation on the limits (taking x^4 on the limits yields (1.215506, 0.000006)).
Apply the transformation on both user-inputted breaks & limits (for breaks, c(1, .5, .1, .01, .001, 0) becomes (-1.0000000, ..., 0.0000000), but for limits, (1.215506, 0.000006) now becomes (-1.05, -0.05), which is narrower than (-1.05, 0.05)).
Breaks beyond the limit's range are dropped (since the limits now stop at -0.05, the break at 0 is dropped).
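The five steps can be replayed with plain arithmetic. The sketch below is a simplified re-enactment in base R using the transformation from the question; it is not ggplot2's internal code, and the names fwd, inv and the lims_* variables are chosen here for illustration.

```r
# Transformation and inverse from the question
fwd <- function(x) -abs(x)^(1/4)
inv <- function(x) x^4

breaks <- c(1, .5, .1, .01, .001, 0)

# Step 1: limits of the transformed values
lims_t <- range(fwd(breaks))

# Step 2: default 5% expansion on either side
lims_expanded <- lims_t + c(-1, 1) * 0.05 * diff(lims_t)

# Step 3: inverse transformation back to data units
lims_data <- inv(lims_expanded)

# Step 4: transform breaks and limits again; the positive end of the
# expanded range comes back negative because inv() discards the sign
lims_round_trip <- sort(fwd(lims_data))
breaks_t <- fwd(breaks)

# Step 5: keep only the breaks that fall inside the round-tripped limits
kept <- breaks[breaks_t >= lims_round_trip[1] & breaks_t <= lims_round_trip[2]]
kept  # the break at 0 is no longer included
```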
How to get around this
You can modify your transformation with the use of sign() to preserve positive / negative values, such that the transformation is one-to-one in the full range, as suggested by Hadley in the discussion on the GH issue you linked. For example:
# original
forth_root_trans_rev <- scales::trans_new(
name = "sign_fourth_root_rev",
transform = function (x) { - abs(x)^(1/4) },
inverse = function (x) { x^4 }
)
# new
forth_root_trans_rev2 <- scales::trans_new(
name = "sign_fourth_root_rev",
transform = function (x) { -sign(x) * abs(x)^(1/4) },
inverse = function (x) { -sign(x) * abs(x)^4 }
)
library(dplyr)
library(tidyr)
# comparison of two transformations
# y1 shows a one-to-one mapping in either (-Inf, 0] or [0, Inf) but not both;
# y2 shows a one-to-one mapping in (-Inf, Inf)
data.frame(x = seq(-1, 1, 0.01)) %>%
mutate(y1 = x %>% forth_root_trans_rev$transform() %>% forth_root_trans_rev$inverse(),
y2 = x %>% forth_root_trans_rev2$transform() %>% forth_root_trans_rev2$inverse()) %>%
gather(trans, y, -x) %>%
ggplot(aes(x, y, colour = trans)) +
geom_line() +
geom_vline(xintercept = 0, linetype = "dashed") +
facet_wrap(~trans)
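The same comparison can be made numerically without plotting. This is a small base R sketch; the function names fwd_old, inv_old, fwd_new and inv_new are introduced here and simply copy the two transformation pairs above.

```r
# Original pair: the inverse loses the sign
fwd_old <- function(x) -abs(x)^(1/4)
inv_old <- function(x) x^4

# Sign-preserving pair
fwd_new <- function(x) -sign(x) * abs(x)^(1/4)
inv_new <- function(x) -sign(x) * abs(x)^4

x <- seq(-1, 1, 0.25)

old_round_trip <- inv_old(fwd_old(x))  # returns abs(x), so negatives are folded over
new_round_trip <- inv_new(fwd_new(x))  # returns x itself

cbind(x, old_round_trip, new_round_trip)
```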
Usage
p <- ggplot(ds, aes(x = myx, y = myy)) +
geom_line() +
geom_point() +
theme(panel.grid.minor = element_blank())
p +
scale_x_continuous(
trans = forth_root_trans_rev2,
breaks = sort(unique(ds$myx))
)
p +
scale_x_continuous(
trans = forth_root_trans_rev2,
breaks = sort(unique(ds$myx)),
expand = expansion(mult = c(.1, .6)) # different expansion factor, if desired; expansion() replaces the deprecated expand_scale() since ggplot2 3.3.0
)

R ggplot: How to define group dependent y-axis breaks using facetted ggplots?

I have 40 groups (defined by short_ID) and would like to produce 40 different plots that use different y-scale breaks for each short_ID. I want the breaks for the y-scale to be (1) mean-2SD, (2) mean and (3) mean+2SD.
I have a dataset called Dataplots containing my X and Y variables and the grouping variable "short_ID". I have created additional vectors M$SD11 (=mean-2SD), M$mean and M$SD22 (=mean+2SD) to define the breaks and M$short_ID as grouping variable. The code below partly works but the problem is that I do not know how to make the breaks group-dependent (i.e., dependent on short_ID). When I run the code below I get the same y axis breaks for all plots, namely for example the max of the vector M$SD22 instead of a different M$SD22 value for each plot. So I think I need to add something to
"scale_y_continuous(breaks=c(M$SD11, M$mean, M$SD22)", for example "scale_y_continuous(group=M$short_ID, breaks=c(M$SD11, M$mean, M$SD22)" but this does not work.
Does anybody know what I can do to define different breaks for my different groups (i.e, short_IDs)? How can I change the code below to do this? Many thanks!
Dataplot <- ggplot(data = Dataplots, aes(x = Measure, y = Amylase_u, group = short_ID)) + geom_line() + facet_wrap(~ short_ID) + scale_y_continuous(breaks=c(M$SD11, M$mean, M$SD22))
I have added an example of 'Dataplots' and 'M'. For the purpose of the example I included only two groups (i.e., short_IDs) instead of the 40 I actually have. Thus this example would need to produce 2 plots, one for each short_ID with different y-axis breaks for each of the groups.
Example of Dataplots:
dput(Dataplots) structure(list(short_ID = c(1111, 1111, 1111, 1111, 2222, 2222, 2222, 2222), Measure = c(1, 2, 3, 4, 1, 2, 3, 4), Amylase_u = c(81.561, 75.648, 145.25, 85.246, 311.69, 261.74, 600.93, 291.39)), .Names = c("short_ID", "Measure", "Amylase_u"), row.names = c(NA, -8L), class = "data.frame", codepage = 65001L)
Example of M:
dput(M) structure(list(SD11 = c(162, 682), mean = c(97, 366), SD22 = c(32, 51), short_ID = c(1111, 2222)), .Names = c("SD11", "mean", "SD22", "short_ID"), row.names = 1:2, class = "data.frame")
@Mark I have been trying to apply your suggestions to my complete dataset but cannot seem to get it right. I have in total 61 plots. I started with:
myPlots <-
lapply(unique(Dataplots$short_ID), function(thisID){
Dataplots %>%
filter(short_ID == thisID) %>%
ggplot(aes(x = Measure, y = Amylase_u)) +
geom_line() +
scale_y_continuous(breaks= M %>%
filter(short_ID == thisID) %>%
select(mean) %>%
as.numeric()
) +
ggtitle(thisID)
})
(As you can see I decided to go for the subject-mean on the y-axis only and decided to drop the SDs.) I then continued with your final cowplot suggestion:
plot_grid(ggdraw() + draw_label("Amylase_u", angle = 90), plot_grid(
plot_grid(plotlist = lapply(myPlots, function(x){x + theme(axis.title = element_blank())}))
, ggdraw() + draw_label("Measurement")
, ncol = 1
, rel_heights = c(0.9, .1))
, nrow = 1, rel_widths = c(0.05, 0.95))
This, however, results in 61 plots with the subject-mean on the y-axis but without the Measurements depicted in it (so the graph itself is missing). I figured there may be a ')' misplaced so I tried:
plot_grid(
ggdraw() + draw_label("Amylase_u", angle = 90)
, plot_grid(
plot_grid(plotlist = lapply(myPlots, function(x){x +theme(axis.title = element_blank())}))
, ggdraw() + draw_label("Measurement")
, ncol = 1
, rel_heights = c(0.9, .1)
, nrow = 1
, rel_widths = c(0.05, 0.95)))
This does give me graphs but they are tiny and the layout is terrible (Rplot2). I tried adapting the rel-heights and widths too but even after reading the help-file don't quite get how I should adapt them.
Thanks again!
Rplot2
Finally, I removed the IDnumbers on top of each plot because they are not really necessary and this already greatly improves the plot (Rplot3), but still the layout needs to be adjusted.
Rplot3
My understanding is that this still remains impossible in the facet functions. However, you can accomplish it yourself using the cowplot package.
First, loop over your IDs (in lapply) and generate each of the sub-plots you wanted. Note that I am using dplyr for the pipe and filtering.
myPlots <-
lapply(unique(Dataplots$short_ID), function(thisID){
Dataplots %>%
filter(short_ID == thisID) %>%
ggplot(aes(x = Measure, y = Amylase_u)) +
geom_line() +
scale_y_continuous(breaks= M %>%
filter(short_ID == thisID) %>%
select(SD11, mean, SD22) %>%
as.numeric()
) +
ggtitle(thisID)
})
Then, call the function plot_grid from cowplot with the list of plots:
plot_grid(plotlist = myPlots)
gives:
A few notes:
cowplot autoloads its own default style, so use theme_set to return to your preferred style
Your included data appear to not actually span all of the thresholds you gave for the y-axis breaks
This should work for an arbitrarily large number of subplots, though you may want/ need to adjust labels and alignment to make them readable.
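The per-ID break lookup can also be done without dplyr. The sketch below rebuilds the example M from the question and uses a hypothetical helper, breaks_for, that returns the break values for one short_ID as a plain numeric vector.

```r
# Example M from the question
M <- data.frame(
  SD11 = c(162, 682),
  mean = c(97, 366),
  SD22 = c(32, 51),
  short_ID = c(1111, 2222)
)

# Hypothetical helper: break values for one ID, in the column order given
breaks_for <- function(M, id, cols = c("SD11", "mean", "SD22")) {
  row <- M[M$short_ID == id, cols, drop = FALSE]
  unlist(row, use.names = FALSE)
}

breaks_for(M, 1111)          # all three breaks for ID 1111
breaks_for(M, 2222, "mean")  # only the mean for ID 2222
```

Inside the lapply() loop this could be used as scale_y_continuous(breaks = breaks_for(M, thisID)).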
Since I am not sure what your goal is, here is another alternative. If you just want to plot deviation from mean (in standard deviations) to make the changes comparable, you could just calculate the z-score of the column within the groups and plot the results. Using dplyr again:
Dataplots %>%
group_by(short_ID) %>%
mutate(scaledAmylase = as.numeric(scale(Amylase_u)) ) %>%
ggplot(aes(x = Measure
, y = scaledAmylase)) +
geom_line() +
facet_wrap(~short_ID)
gives
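As a numeric check of the grouped z-score, here is a base R sketch using the example Dataplots values from the question. It uses ave() in place of group_by() and mutate(); after scaling, each group should have mean 0 and standard deviation 1.

```r
# Example data from the question
Dataplots <- data.frame(
  short_ID = rep(c(1111, 2222), each = 4),
  Measure = rep(1:4, times = 2),
  Amylase_u = c(81.561, 75.648, 145.25, 85.246, 311.69, 261.74, 600.93, 291.39)
)

# z-score within each short_ID
Dataplots$scaledAmylase <- ave(
  Dataplots$Amylase_u, Dataplots$short_ID,
  FUN = function(v) (v - mean(v)) / sd(v)
)

# Per-group summaries of the scaled column
group_means <- tapply(Dataplots$scaledAmylase, Dataplots$short_ID, mean)
group_sds <- tapply(Dataplots$scaledAmylase, Dataplots$short_ID, sd)
```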
Or, if the mean/SD are calculated/defined somewhere else (and stored in M) rather than coming directly from the data, you can scale using M instead of the data:
Dataplots %>%
left_join(M) %>%
mutate(scaledAmylase = (Amylase_u - mean) / ((SD22 - mean) / 2) ) %>%
ggplot(aes(x = Measure
, y = scaledAmylase)) +
geom_line() +
facet_wrap(~short_ID)
gives
And, because I can't leave well enough alone, here is a version of the plot_grid approach that removes the duplicated axis titles and includes them just once instead (like facet_wrap would). As above, increasing the number of subplots or the aspect ratio will force you to tweak the relative values here:
plot_grid(
ggdraw() + draw_label("Amylase_u", angle = 90)
, plot_grid(
plot_grid(plotlist = lapply(myPlots, function(x){x + theme(axis.title = element_blank())}))
, ggdraw() + draw_label("Measurement")
, ncol = 1
, rel_heights = c(0.9, .1))
, nrow = 1
, rel_widths = c(0.05, 0.95)
)
gives

How to get total number of x displayed in ggplot?

Consider:
x <- rnorm(100)
qplot(x)
How do I get the total number (N = 100) of x displayed in the top right corner of my ggplot?
See actual output:
See this example (N = 37):
You can also set the location of the label programmatically, based on the data values. ggplot2 defaults to 30 bins, so the code below uses 30 bins to set the y-value for the label location:
set.seed(101)
x <- rnorm(100)
qplot(x) +
annotate("text", label=paste0("N = ", length(x)), x=max(x), y=max(table(cut(x, 30))))
or
qplot(x) +
geom_text(aes(label=paste0("N = ", length(x)), x=max(x), y=max(table(cut(x, 30)))))
UPDATE: To address your comment, let's plot with a discrete x vector. Now if we still want the y position of the text to be at the maximum, we once again find the category with the maximum number of counts. The data are already discrete, so we just need y=max(table(x)). For the x position, if we want the label at the maximum x value, we need the number of unique x categories, since ggplot implicitly numbers these from 1 to the N (where N is the number of categories). The unique function returns a vector containing each unique category. We just need the length of this vector to get the maximum x value in the graph: x=length(unique(x)).
set.seed(101)
x <- cut(rnorm(100), 5)
qplot(x) +
geom_text(aes(label=paste0("N = ", length(x)), x=length(unique(x)), y=max(table(x))))
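The label coordinates used for the discrete case can be inspected before plotting. This is a short base R sketch; the names counts, label_x, label_y and label are introduced here for illustration.

```r
set.seed(101)
x <- cut(rnorm(100), 5)

counts <- table(x)

label_x <- length(unique(x))        # number of categories that actually occur
label_y <- max(counts)              # height of the tallest bar
label <- paste0("N = ", length(x))  # text to display

c(label_x = label_x, label_y = label_y)
```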
Lots of ways. geom_text is the most general tool. For a one-off label, maybe annotate:
qplot(x) +
annotate("text",x = Inf,y = Inf,label = "N = 100",hjust = 1.5,vjust = 1.5)
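The hard-coded "N = 100" can be replaced by a label built from the data. The sketch below assumes the values are held in a data frame called df (a name chosen here) and uses ggplot() with geom_histogram() rather than qplot(), since qplot() is deprecated in recent ggplot2 releases.

```r
library(ggplot2)

set.seed(101)
df <- data.frame(x = rnorm(100))

# Build the label from the data instead of typing the count by hand
n_label <- paste0("N = ", nrow(df))

# x = Inf, y = Inf pins the text to the top right corner of the panel;
# hjust and vjust above 1 pull it back inside the plotting area
p <- ggplot(df, aes(x = x)) +
  geom_histogram(bins = 30) +
  annotate("text", x = Inf, y = Inf, label = n_label,
           hjust = 1.5, vjust = 1.5)

p
```

The label stays in the corner regardless of the data range, but the hjust and vjust values may need tweaking for longer labels.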
The other answers show how you can add the text to your plot. But annotate() can also be used to add other geoms. If you want to put your annotation inside a rectangle, for instance, you can do the following:
x0 <- max(x)
y0 <- max(table(cut(x, 30)))
qplot(x) +
annotate("rect", xmin = x0*.8, xmax = x0*1.2, ymin = y0*.95, ymax = y0*1.05,
fill = "white", colour = "black") +
annotate("text", label = paste0("N = ", length(x)), x = x0, y = y0)
which gives
Up to the line that starts with annotate("rect", everything is taken from the other answers to this question.
Like this? (code below)
# install.packages("ggplot2", dependencies = TRUE)
library(ggplot2)
set.seed(421)
x <- rnorm(100)
qplot(x) + annotate("text", x = 2, y = 15, label = paste("N =", length(x)))
