Add a labelling function to just first or last ggplot label - r

I often work with data that has a long-tailed distribution, where a large share of the range in values occurs in the top 1-2% of the data. When I plot the data, the upper outliers wash out the variation in the rest of the data, but I want to show those differences.
I know there are other ways of handling this, but capping the values towards the end of the distribution and then applying a continuous color palette (e.g., in ggplot) is one way that works for me. However, I want the legend to stay accurate, so I'd like to add a >= sign to the last legend label.
The picture below shows the kind of legend I'd like to achieve programmatically, with the >= sign drawn in messily in red.
I also know I can set breaks and labels manually, but I'd really like to do something like this (in pseudocode): if (it's the last label) paste0(">=", label) else label
Reproducible example:
(I want to alter the plot legend to prefix just the last label)
library(tidyverse)

set.seed(123)
x <- rnorm(1e3)
y <- rnorm(1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- tibble(x = x, y = y, z = z)

d %>%
  ggplot(aes(x = x, y = y, fill = z, color = z)) +
  geom_point() +
  scale_color_viridis_c()

One option would be to pass a function to the labels argument which replaces the last element or label with your desired label like so:
library(ggplot2)

set.seed(123)
x <- rnorm(1e3)
y <- rnorm(1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- data.frame(x = x, y = y, z = z)

ggplot(d, aes(x = x, y = y, fill = z, color = z)) +
  geom_point() +
  scale_fill_continuous(
    # Prefix only the last label with ">="
    labels = function(x) {
      x[length(x)] <- paste0(">=", x[length(x)])
      x
    },
    aesthetics = c("color", "fill")
  )
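The labelling function above can also be written as a named helper. The sketch below is one possible variant, not part of the original answer: the name label_last_ge is hypothetical, and it assumes the breaks vector handed to the function may contain NA entries (for example, for breaks that fall outside the limits), in which case the prefix should go on the last non-missing break.

```r
# Hypothetical helper: prefix the last non-missing break label with ">="
label_last_ge <- function(breaks) {
  labels <- as.character(breaks)
  shown <- which(!is.na(breaks))      # positions of breaks that are not NA
  if (length(shown) > 0) {
    last <- shown[length(shown)]
    labels[last] <- paste0(">=", labels[last])
  }
  labels
}

label_last_ge(c(25, 50, 75, 100))
label_last_ge(c(25, 50, 75, NA))      # prefix lands on 75, NA stays NA
```

It could then be passed as labels = label_last_ge to the scale. To do the capping inside the scale as well, one option is to set an upper limit and squish out-of-bounds values, e.g. scale_color_viridis_c(limits = c(NA, 80), oob = scales::squish, labels = label_last_ge), where 80 is an arbitrary illustrative cap.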

Related

ggplot cowplot ensuring y axes are identical when arranging plots with log scale

I want to create a combination plot using plot_grid from the cowplot package.
The two plots that I want to combine use a log scale. Of the data plotted, some is negative, which gets dropped.
I can quite easily produce a decent result using facet_wrap that looks like this:
library(tidyverse)
tibble(x = rnorm(100),
y = rnorm(100),
type = "A") %>%
bind_rows(tibble(x = rnorm(100, mean = 10),
y = rnorm(100, mean = 10),
type = "B")) %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
facet_wrap(~type)
But in my particular situation, I can't use facet_wrap because I want to give the panels A and B different x-axis labels and want to change the number format slightly (e.g. adding a $ sign to the axis ticks of panel A and a % sign to panel B).
Therefore I use plot_grid:
tibble(x = rnorm(100),
y = rnorm(100),
type = "A") %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
scale_y_log10() -> a
tibble(x = rnorm(100, mean = 10),
y = rnorm(100, mean = 10),
type = "B") %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
scale_y_log10() -> b
cowplot::plot_grid(a,b)
Now the problem is that the y-axes no longer match (this is equivalent to scales = "free_y" in facet_wrap).
So I tried to set the limits for both plots manually by taking the min and max across both plots:
lims <- c(min(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range),
          max(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range))
cowplot::plot_grid(a + ylim(lims), b + ylim(lims))
But now the result is this:
So essentially I want to replicate scales = "fixed" from facet_wrap using plot_grid.
Any ideas?
many thanks!
The issue is that layer_scales() returns the y-axis range on the log10 scale, so the limits you pass to ylim() are log10 values rather than data values. You need to convert them back:
lims = 10^c(min(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range),
            max(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range))
Alternatively, you can compute the range of the actual data.
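A minimal sketch of the data-range alternative follows. The data frame names df_a and df_b are made up for illustration, and only positive values are used for the limits because non-positive values cannot be shown on a log10 axis. Note also that adding ylim() to a plot that already has scale_y_log10() replaces that scale with a linear one, so the sketch passes the shared limits to scale_y_log10() directly.

```r
library(ggplot2)

set.seed(1)
# Illustrative data, mirroring panels A and B from the question
df_a <- data.frame(x = rnorm(100), y = rnorm(100))
df_b <- data.frame(x = rnorm(100, mean = 10), y = rnorm(100, mean = 10))

# Shared limits in data units, computed from the positive values only
y_all <- c(df_a$y, df_b$y)
lims <- range(y_all[y_all > 0])

# Give both plots the same log10 scale with identical limits
a <- ggplot(df_a, aes(x = x, y = y)) +
  geom_point() +
  scale_y_log10(limits = lims)
b <- ggplot(df_b, aes(x = x, y = y)) +
  geom_point() +
  scale_y_log10(limits = lims)

# cowplot::plot_grid(a, b)  # uncomment if cowplot is installed
```

Because the limits are already in data units, no 10^ conversion is needed with this approach.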

Is there a programmatic way to pass specific ranges for the y-axis on a ggplot2 plot?

I've got plots that are being generated automatically based on some user inputs. Most of the time, the plots work fine. However, some users have requested to ensure that there is always an axis label on each end of the plotted data. For example, this plot:
sample_data <-
data.frame(
x = rep(LETTERS[1:3], each = 3)
, y = 1:9 + 0.5
)
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
)
Has no label above the top point or below the bottom point. I can add them easily enough with expand_limits:
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
expand_limits(y = c(2, 10))
However, because these plots are generated automatically, I cannot manually add the next axis point each time. I've tried passing only.loose = TRUE to labeling::extended, but that still doesn't change the displayed values (any more than entering the values that I want would):
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(breaks = scales::breaks_extended(only.loose = TRUE))
In addition, some of the plots are more complex than this (e.g., with or without confidence intervals, additional grouping, etc.), and the data is prepared for the plot using dplyr and piped directly into ggplot (with %>%). So, even something like recalculating the values is non-trivial.
In fact, even in this case, it fails because adding the expanded points to capture the next set of labels changes the labeling.
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(breaks = scales::breaks_extended(n = 5
, only.loose = TRUE)) +
expand_limits(y =
sample_data %>%
group_by(x) %>%
summarise(my_mean = mean(y)) %>%
pull(my_mean) %>%
range() %>%
{labeling::extended(.[1], .[2], 5
, only.loose = TRUE)}
)
It appears that this happens because
labeling::extended(2.5, 8.5, 5, only.loose = TRUE)
returns the range 2 to 9 by 1's, while:
labeling::extended(2, 9, 5, only.loose = TRUE)
returns the range 2 to 10 by 2's. Somehow, breaks_extended is throwing in some added variation, though whether I track it down or not doesn't change much. I could work around this by calculating the breaks first, but (again) this is for a fairly complicated set of plots.
I feel like I am missing some sort of obvious point, but it keeps eluding me.
Yes, there is a programmatic way to set the limits on y-scales: provide a function to the limits argument. It receives the natural data limits as input, which you can then adjust programmatically. The same goes for breaks, except that the input is the scale limits.
An example is below; exactly how this code should look depends on your specifications.
library(ggplot2)
sample_data <- data.frame(
  x = rep(LETTERS[1:3], each = 3),
  y = 1:9 + 0.5
)
ggplot(sample_data,
       aes(x = x, y = y)) +
  stat_summary(
    fun = "mean"
  ) +
  scale_y_continuous(
    limits = function(x) {
      lower <- floor(x[1])
      lower <- ifelse(x[1] - lower < 0.5, lower - 1, lower)
      upper <- ceiling(x[2])
      upper <- ifelse(upper - x[2] <= 0.5, upper + 1, upper)
      c(lower, upper)
    },
    breaks = function(x) {
      scales::breaks_pretty()(x)
    }
  )
#> Warning: Removed 3 rows containing missing values (geom_segment).
Created on 2021-03-23 by the reprex package (v1.0.0)
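To see what the limits function in this answer returns, it can be pulled out and called directly on a candidate range. This is only a sketch: the name pad_limits is hypothetical, and the input ranges are illustrative rather than what ggplot2 necessarily passes for this particular plot.

```r
# Hypothetical helper: same logic as the limits function in the answer above
pad_limits <- function(x) {
  lower <- floor(x[1])
  # If the lower data limit sits close above a whole number, step one further down
  lower <- ifelse(x[1] - lower < 0.5, lower - 1, lower)
  upper <- ceiling(x[2])
  # If the upper data limit sits close below a whole number, step one further up
  upper <- ifelse(upper - x[2] <= 0.5, upper + 1, upper)
  c(lower, upper)
}

pad_limits(c(2.5, 8.5))  # returns 2 10
pad_limits(c(2.2, 8.9))  # returns 1 10
```

A named function like this can be passed as scale_y_continuous(limits = pad_limits), which makes the rounding rule easy to check separately from the plot.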
Inspired by teunbrand, I built a function that generates the limits, then checks that the expansion (including the 5% buffer) does not change the output of pretty:
my_lims_expand <- function(x){
  prev_pass <-
    range(pretty(x))
  curr_pass <-
    pretty(c(prev_pass[1] - 0.05 * diff(prev_pass)
             , prev_pass[2] + 0.05 * diff(prev_pass)))
  last_under <-
    tail(which(curr_pass < min(x)), 1)
  first_over <-
    head(which(curr_pass > max(x)), 1)
  out <-
    range(curr_pass[last_under:first_over])
  confirm_out <-
    range(pretty(out))
  while(!all(out == confirm_out)){
    prev_pass <- curr_pass
    curr_pass <-
      pretty(c(prev_pass[1] - 0.05 * diff(prev_pass)
               , prev_pass[2] + 0.05 * diff(prev_pass)))
    last_under <-
      tail(which(curr_pass < min(x)), 1)
    first_over <-
      head(which(curr_pass > max(x)), 1)
    out <-
      range(curr_pass[last_under:first_over])
    confirm_out <-
      range(pretty(out))
  }
  return(out)
}
Then, I can use that function for limits:
ggplot(sample_data,
aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(
limits = my_lims_expand
, breaks = pretty
)
to generate the desired plot:

How to make scatter plot points into numbers?

I am creating a scatter plot using ggplot/geom_point. Here is my code for building the function in ggplot.
AddPoints <- function(x) {
list(geom_point(data = dat , mapping = aes(x = x, y = y) , shape = 1 , size = 1.5 ,
color = "blue"))
}
I am wondering if it would be possible to replace the standard points on the plot with numbers. That is, instead of seeing a dot on the plot, you would see a number on the plot to represent each observation. I would like that number to correspond to a column for that given observation (column name 'RP'). Thanks in advance.
Sample data.
Data <- data.frame(
X = sample(1:10),
Y = sample(3:12),
RP = sample(c(4,8,9,12,3,1,1,2,7,7)))
Use geom_text() and map the RP column to the label aesthetic.
ggplot(Data, aes(x = X, y = Y, label = RP)) +
geom_text()

How plot new point in ggplot with older color data?

I know similar questions have been asked before, but my question is different. Consider data points data1 whose colors depend on their x and y coordinates, which I plot with ggplot:
x = 1:100
y = 1:100
d = expand.grid(x,y)
data1 <- data.frame(
xval = d$Var1,
yval = d$Var2,
col = d$Var1+d$Var2)
data2 <- data.frame(
xnew = c(1.5, 90.5),
ynew = c(95.5, 4))
ggplot(data1, aes(xval, yval, colour = col)) + geom_point()
But instead of plotting data1, I want to plot only the data2 points, colored according to the color scale of data1. For example, I painted what I want to plot for data2:
I changed the last line to:
ggplot(data1, aes(xval, yval, colour = col)) +
geom_point(data = data2, aes(x = xnew, y = ynew))
Now I expect ggplot to draw just the 2 points of data2, but I get an error:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error: Column colour must be a 1d atomic vector or a list
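The problem is that data2 has no col column, so the colour = col mapping inherited from ggplot() cannot be resolved for the data2 layer. ggplot2 then finds the base R function col() instead, hence "object of type function" in the message.
Please try the following: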
ggplot(data2, aes(x = xnew, y = ynew, colour = xnew)) + geom_point() +
scale_fill_gradientn(colours=c(2,1),
values = range(data1$xval),
rescaler = function(x,...) x,
oob = identity)
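Because the points are drawn with the colour aesthetic, it is a colour scale (rather than a fill scale) that controls them. One alternative sketch is shown below; it assumes the new points should follow the same colouring rule as data1 (x plus y), which the question implies but does not state outright, and it pins the colour scale to the range of data1$col so both data sets share one gradient.

```r
library(ggplot2)

d <- expand.grid(x = 1:100, y = 1:100)
data1 <- data.frame(xval = d$x, yval = d$y, col = d$x + d$y)
data2 <- data.frame(xnew = c(1.5, 90.5), ynew = c(95.5, 4))

# Assumption: new points are coloured by the same rule as data1 (x + y)
data2$col <- data2$xnew + data2$ynew

# Plot only data2, but fix the colour limits to the range seen in data1
p <- ggplot(data2, aes(x = xnew, y = ynew, colour = col)) +
  geom_point() +
  scale_colour_gradient(limits = range(data1$col))
```

Printing p draws just the two data2 points, with a legend that spans the data1 colour range.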

How to get total number of x displayed in ggplot?

Consider:
x <- rnorm(100)
qplot(x)
How do I get the total number (N = 100) of x displayed in the top right corner of my ggplot?
See actual output:
See this example (N = 37):
You can also set the location of the label programmatically, based on the data values. ggplot2 defaults to 30 bins, so the code below uses 30 bins to set the y-value for the label location:
set.seed(101)
x <- rnorm(100)
qplot(x) +
annotate("text", label=paste0("N = ", length(x)), x=max(x), y=max(table(cut(x, 30))))
or
qplot(x) +
geom_text(aes(label=paste0("N = ", length(x)), x=max(x), y=max(table(cut(x, 30)))))
UPDATE: To address your comment, let's plot with a discrete x vector. Now if we still want the y position of the text to be at the maximum, we once again find the category with the maximum number of counts. The data are already discrete, so we just need y=max(table(x)). For the x position, if we want the label at the maximum x value, we need the number of unique x categories, since ggplot implicitly numbers these from 1 to the N (where N is the number of categories). The unique function returns a vector containing each unique category. We just need the length of this vector to get the maximum x value in the graph: x=length(unique(x)).
set.seed(101)
x <- cut(rnorm(100), 5)
qplot(x) +
geom_text(aes(label=paste0("N = ", length(x)), x=length(unique(x)), y=max(table(x))))
Lots of ways. geom_text is the most general tool. For a one-off label, maybe annotate:
qplot(x) +
annotate("text",x = Inf,y = Inf,label = "N = 100",hjust = 1.5,vjust = 1.5)
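The same idea can be written without qplot(), which to my knowledge is deprecated in recent ggplot2 releases. The sketch below is an assumption-laden variant, not the original answer's code: it builds the label from length(x) instead of hard-coding 100, and uses smaller hjust and vjust values, which are a matter of taste.

```r
library(ggplot2)

set.seed(101)
x <- rnorm(100)

# Build the label from the data so it stays correct if x changes
n_label <- paste0("N = ", length(x))

p <- ggplot(data.frame(x = x), aes(x = x)) +
  geom_histogram(bins = 30) +
  # x = Inf, y = Inf pins the text to the top right corner of the panel;
  # hjust and vjust nudge it back inside the plotting area
  annotate("text", x = Inf, y = Inf, label = n_label, hjust = 1.1, vjust = 1.5)
```

Anchoring at Inf avoids having to compute the tallest bin, so the label position does not depend on the bin count.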
The other answers show how you can add the text to your plot. But annotate() can also be used to add other geoms. If you want to put your annotation inside a rectangle, for instance, you can do the following:
x0 <- max(x)
y0 <- max(table(cut(x, 30)))
qplot(x) +
annotate("rect", xmin = x0*.8, xmax = x0*1.2, ymin = y0*.95, ymax = y0*1.05,
fill = "white", colour = "black") +
annotate("text", label = paste0("N = ", length(x)), x = x0, y = y0)
which gives
Up to the line that starts with annotate("rect", everything is taken from the other answers to this question.
Like this? (code below)
# install.packages("ggplot2", dependencies = TRUE)
library(ggplot2)
set.seed(421)
x <- rnorm(100)
qplot(x) + annotate("text", x = 2, y = 15, label = paste("N =", length(x)))
