Dynamic decimal precision in ggplot axis labels - r

I would like to display only the minimum number of decimal places in my ggplot axis labels (e.g. 0.1 alongside 10, as opposed to 10.0). I was trying to get scales::label_number() to do this, but the accuracy argument is applied across all of the labels. I'd also like to be able to add big.mark = "," if possible.
The closest answer I found suggests an ifelse function to dynamically round as needed but it feels a little clunky. Is there some slick way to do this with {scales} or similar?
Minimal example with current and desired results:
library(tidyverse)
library(scales)
# current labels with scales::label_number()
tibble(x = -2:3, y = 10^x) %>%
ggplot(aes(x, y)) +
geom_point() +
ggtitle("Undesirable: same precision on all labels") +
scale_y_log10(labels = label_number(big.mark = ","), breaks = 10^(-2:3))
# desired labels manually specified
tibble(x = -2:3, y = 10^x) %>%
ggplot(aes(x, y)) +
geom_point() +
ggtitle("Desirable: minimum needed precision on each label with comma") +
scale_y_log10(labels = c(0.01, 0.1, 1, 10, 100, "1,000"), breaks = 10^(-2:3))
Created on 2022-06-23 by the reprex package (v2.0.1)

Using I seems pretty neat to me:
tibble(x = -2:3, y = 10^x) %>%
ggplot(aes(x, y)) +
geom_point() +
scale_y_log10(labels = I, breaks = 10^(-2:3))
Though if you wanted a bit more control, then you could use prettyNum - e.g.
tibble(x = -2:3, y = 10^x) %>%
ggplot(aes(x, y)) +
geom_point() +
scale_y_log10(labels = ~ prettyNum(.x, big.mark = ","), breaks = 10^(-2:3))
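One caveat, which comes from base R's default number formatting rather than from anything stated above: for larger breaks such as 1e5, prettyNum can switch to scientific notation. A minimal sketch of a labeller that avoids this, where the helper name label_min_digits is made up for illustration and is not part of {scales}:

```r
# Hypothetical helper (the name is an assumption, not a {scales} function).
# Each break is formatted on its own, so the precision is chosen per label,
# a thousands separator is added, and scientific notation is never used.
label_min_digits <- function(x) {
  vapply(
    x,
    function(v) format(v, big.mark = ",", scientific = FALSE, trim = TRUE),
    character(1)
  )
}

small_breaks <- label_min_digits(10^(-2:3))
large_break <- label_min_digits(1e5)
print(small_breaks)
print(large_break)
```

Under these assumptions it could be passed to the scale in place of the lambda, e.g. scale_y_log10(labels = label_min_digits, breaks = 10^(-2:3)).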

Related

Define custom transformation of ggplot axis labels with trans_new function

I am working on percentage changes between periods and struggling with logarithmic transformation of labels. Here is an example based on the storms dataset:
library(dplyr)
library(ggplot2)
library(scales)
df <- storms |>
group_by(year) |>
summarise(wind = mean(wind)) |>
mutate(lag = lag(wind, n = 1)) |>
mutate(perc = (wind / lag) - 1) |>
tidyr::drop_na()
I want to visualize the distribution of percentages, making the percentage change symmetrical (log difference) with log1p.
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
x-axis with log1p values
At this point I wanted to transform the x-axis label back to the original percentage value.
I tried to create my own transformation with trans_new, and applied it to the labels in scale_x_continuous, but I can't make it work.
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p_trans(),
inverse = function(x)
expm1(x),
breaks = breaks_log(),
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(labels = trans_perc)
Currently, the result is:
Error in get_labels():
! breaks and labels are different lengths
Run rlang::last_error() to see where the error occurred.
Thanks!
EDIT
I am adding details on the different output I am getting from Alan's first answer:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
breaks = pretty_breaks(5),
format = percent_format(),
domain = c(-Inf, Inf)
)
library(ggpubr)
a <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
b <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
c <- ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
ggarrange(a, b, c,
ncol = 3,
labels = c("Log on Value only",
"Log on Value and X",
"Log on X only"))
Different outcomes: https://i.stack.imgur.com/dCW2m.png
If I understand you correctly, you want to keep the shape of the histogram, but change the labels so that they reflect the value of the perc column rather than the transformed log1p(perc) value. If that is the case, there is no need for a transformer object. You can simply put the reverse transformation (plus formatting) as a function into the labels argument of scale_x_continuous.
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous("Percentage Change",
breaks = log1p(pretty(df$perc, 5)),
labels = ~ percent(expm1(.x)))
Note that although the histogram remains symmetrical in shape, the axis labels represent the back-transformed values of the original axis labels.
The point of a transformer object is to do all this for you without having to pass a transformed data set (i.e. without having to pass log1p(perc)). So in your case, you could do:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
This gives essentially the same result.
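To see why the back-transformed labels line up with the original percentages, here is a small base R sketch of the round trip. The perc values are illustrative only, and paste0() stands in for scales::percent() so the snippet has no package dependencies:

```r
# Percentage changes in their original units (illustrative values only).
perc <- c(-0.20, -0.10, 0, 0.10, 0.25)

# Positions on the transformed axis, as in aes(x = log1p(perc)).
pos <- log1p(perc)

# Back-transform and format, mirroring labels = ~ percent(expm1(.x)).
labels <- paste0(round(100 * expm1(pos)), "%")
print(labels)

# expm1() undoes log1p() up to floating-point error.
stopifnot(isTRUE(all.equal(expm1(pos), perc)))

# Symmetry: +25% and -20% cancel out (1.25 * 0.8 = 1), so they sit at
# equal distances on either side of zero on the log1p axis.
stopifnot(isTRUE(all.equal(log1p(0.25), -log1p(-0.20))))
```

The last check is the "log difference" property the question is after: changes that undo each other are mirror images on the transformed axis.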

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = after_stat(count), label = after_stat(count))) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situations I can think of) take the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text", ...).
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
# log() is the natural logarithm, so back-transform with exp() rather than 10^
scale_x_continuous(labels = ~ scales::comma(exp(.x)),
minor_breaks = NULL)
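Because log() in R is the natural logarithm, a bucket value k produced by round(log(amount)) corresponds to exp(k) in the original units. A rough base R sketch of what each bucket covers, using the data from the question (the lower/centre/upper column names are made up for illustration):

```r
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)

# Same bucketing as count(log_amt = round(log(amount))).
log_amt <- round(log(amount))
counts <- table(log_amt)

# Each bucket k covers exp(k - 0.5) to exp(k + 0.5) in the original units.
buckets <- as.numeric(names(counts))
bucket_df <- data.frame(
  log_amt = buckets,
  n = as.vector(counts),
  lower = exp(buckets - 0.5),
  centre = exp(buckets),
  upper = exp(buckets + 0.5)
)
print(bucket_df)
```

With only six observations the data fall into two buckets, so the labelled axis is coarse; a finer rounding step would give more bars.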

inner labelling for heatmap, in R ggplot

I am trying to add a number label on each cell of a heatmap. Because it also needs marginal barcharts I have tried two packages. iheatmapr and ComplexHeatmap.
(1st try) iheatmapr makes it easy to add bars as below, but I couldn't see how to add labels inside the heatmap on individual cells.
library(tidyverse)
library(iheatmapr)
library(RColorBrewer)
in_out <- data.frame(
'Economic' = c(2,1,1,3,4),
'Education' = c(0,3,0,1,1),
'Health' = c(1,0,1,2,0),
'Social' = c(2,5,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
GreenLong <- colorRampPalette(brewer.pal(9, 'Greens'))(12)
lowGreens <- GreenLong[0:5]
in_out_matrix <- as.matrix(in_out)
main_heatmap(in_out_matrix, colors = lowGreens)
in_out_plot <- iheatmap(in_out_matrix,
colors=lowGreens) %>%
add_col_labels() %>%
add_row_labels() %>%
add_col_barplot(y = colSums(bcio)/total) %>%
add_row_barplot(x = rowSums(bcio)/total)
in_out_plot
Then used: save_iheatmap(in_out_plot, "iheatmapr_test.png")
Because I couldn't use ggsave(device = ragg::agg_png etc) with an iheatmapr object.
Also, the iheatmapr object's apparent incompatibility (maybe I am wrong) with ggsave() is a problem for me because I normally use the ragg package to export images with AGG, which preserves font sizes. I suspect some other heatmap packages also produce custom objects that may be incompatible with patchwork and ggsave.
ggsave("png/iheatmapr_test.png", plot = in_out_plot,
device = ragg::agg_png, dpi = 72,
units="in", width=3.453, height=2.5,
scaling = 0.45)
(2nd try) ComplexHeatmap makes it easy to label individual number "cells" inside a heatmap, and also offers marginal bars among its "Annotations". I have tried it, but its colour palette system (which uses integers to refer to a set of colours) doesn't suit my RGB vector colour gradient, and overall it is a sophisticated package clearly designed to make graphics more advanced than what I am doing.
I am aiming for style as shown in screenshot example below, which was made in Excel.
Please can anyone suggest a more suitable R package for a simple heatmap like this with marginal bars, and number labels inside?
Instead of relying on packages which offer out-of-the-box solutions one option to achieve your desired result would be to create your plot from scratch using ggplot2 and patchwork which gives you much more control to style your plot, to add labels and so on.
Note: The issue with iheatmapr is that it returns a plotly object, not a ggplot. That's why you can't use ggsave.
library(tidyverse)
library(patchwork)
in_out <- data.frame(
'Economic' = c(1,1,1,5,4),
'Education' = c(0,0,0,1,1),
'Health' = c(1,0,1,0,0),
'Social' = c(1,1,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
in_out_long <- in_out %>%
mutate(y = rownames(.)) %>%
pivot_longer(-y, names_to = "x")
# Summarise data for marginal plots
yin <- in_out_long %>%
group_by(y) %>%
summarise(value = sum(value)) %>%
mutate(value = value / sum(value))
xin <- in_out_long %>%
group_by(x) %>%
summarise(value = sum(value)) %>%
mutate(value = value / sum(value))
# Heatmap
ph <- ggplot(in_out_long, aes(x, y, fill = value)) +
geom_tile() +
geom_text(aes(label = value), size = 8 / .pt) +
scale_fill_gradient(low = "#F7FCF5", high = "#00441B") +
theme(legend.position = "bottom") +
labs(x = NULL, y = NULL, fill = NULL)
# Marginal plots
py <- ggplot(yin, aes(value, y)) +
geom_col(width = .75) +
geom_text(aes(label = scales::percent(value)), hjust = -.1, size = 8 / .pt) +
scale_x_continuous(expand = expansion(mult = c(.0, .25))) +
theme_void()
px <- ggplot(xin, aes(x, value)) +
geom_col(width = .75) +
geom_text(aes(label = scales::percent(value)), vjust = -.5, size = 8 / .pt) +
scale_y_continuous(expand = expansion(mult = c(.0, .25))) +
theme_void()
# Glue plots together
px + plot_spacer() + ph + py + plot_layout(ncol = 2, widths = c(2, 1), heights = c(1, 2))
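The marginal bars above show each row's and column's share of the grand total. As a sanity check on those numbers, here is the same arithmetic in base R, using the answer's data as a matrix (a sketch for verification only, not a replacement for the dplyr pipeline):

```r
# The answer's data, filled column by column.
in_out <- matrix(
  c(1, 1, 1, 5, 4,   # Economic
    0, 0, 0, 1, 1,   # Education
    1, 0, 1, 0, 0,   # Health
    1, 1, 0, 3, 1),  # Social
  nrow = 5,
  dimnames = list(
    c("Habitat", "Resource", "Combined", "Protected", "Livelihood"),
    c("Economic", "Education", "Health", "Social")
  )
)

# Shares of the grand total, matching value / sum(value) in yin and xin.
row_share <- rowSums(in_out) / sum(in_out)
col_share <- colSums(in_out) / sum(in_out)
print(round(row_share, 3))
print(round(col_share, 3))
```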

Is it possible to align x axis title to a value of the axis?

Having a tibble and a simple scatterplot:
p <- tibble(
x = rnorm(50, 1),
y = rnorm(50, 10)
)
ggplot(p, aes(x, y)) + geom_point()
I get something like this:
I would like to align (center, left, right, as the case may be) the title of the x-axis - here rather blandly x - with a specific value on the axis, say the off-center 0 in this case. Is there a way to do that declaratively, without having to resort to the dumb (as in "free of context") trial-and-error element_text(hjust=??)? The ?? are rather appropriate here because every value is a result of experimentation (my screen and PDF export in RStudio never quite agree on some plot elements). Any change in the data or the dimensions of the rendering may (or may not) invalidate the hjust value, and I am looking for a solution that gracefully repositions itself, much like the axes do.
Following the suggestions in the comments by @tjebo I dug a little deeper into the coordinate spaces. hjust = 0.0 and hjust = 1.0 clearly align the label with the extent of the Cartesian coordinate system (but magically left-aligned and right-aligned, respectively), so when I set specific limits, calculating the exact value of hjust is straightforward (aiming for 0: hjust = (0 - -1.5) / (3.5 - -1.5) = 0.3):
ggplot(p, aes(x, y)) +
geom_point() +
coord_cartesian(ylim = c(8, 12.5), xlim = c(-1.5, 3.5), expand=FALSE) +
theme(axis.title.x = element_text(hjust = 0.3))
This gives an acceptable result for a label like x, but for longer labels the alignment is off again:
ggplot(p %>% mutate(`Longer X label` = x), aes(x = `Longer X label`, y = y)) +
geom_point() +
coord_cartesian(ylim = c(8, 12.5), xlim = c(-1.5, 3.5), expand=FALSE) +
theme(axis.title.x = element_text(hjust = 0.3))
Any further suggestions much appreciated.
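The hjust arithmetic from the question can be wrapped in a small helper. This is only a sketch: hjust_for is a hypothetical name, and it assumes fixed limits with expand = FALSE, as in the coord_cartesian() calls above.

```r
# Hypothetical helper: how far along the axis (as a fraction from 0 to 1)
# the target value x0 sits, given fixed axis limits.
hjust_for <- function(x0, xlim) {
  stopifnot(length(xlim) == 2, xlim[1] < xlim[2])
  (x0 - xlim[1]) / (xlim[2] - xlim[1])
}

h <- hjust_for(0, c(-1.5, 3.5))
print(h)
```

As far as I understand text justification, hjust also sets which point of the label itself is used as the anchor, so with hjust = 0.3 it is the point 30% of the way along the label that lands on 0. That would explain why the result looks fine for a one-character title but drifts for a longer one.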
Another option (different enough hopefully to justify the second answer) is as already mentioned to create the annotation as a separate plot. This removes the range problem. I like {patchwork} for this.
library(tidyverse)
library(patchwork)
p <- tibble( x = rnorm(50, 1), y = rnorm(50, 10))
p1 <- tibble( x = rnorm(50, 1), y = 100*rnorm(50, 10))
## I like to define constants outside my ggplot call
mylab <- "longer_label"
x_demo <- c(-1, 2)
demo_fct <- function(p){
p1 <- ggplot(p, aes(x, y)) +
geom_point() +
labs(x = NULL) +
theme(plot.margin = margin())
p2 <- ggplot(p, aes(x, y)) +
## you need that for your correct alignment with the first plot
geom_blank() +
annotate(geom = "text", x = x_demo, y = 1,
label = mylab, hjust = 0) +
theme_void() +
# you need that for those annoying margin reasons
coord_cartesian(clip = "off")
p1 / p2 + plot_layout(heights = c(1, .05))
}
demo_fct(p) + plot_annotation(title = "demo1 with x at -1 and 2")
demo_fct(p1) + plot_annotation(title = "demo2 with larger data range")
Created on 2021-12-04 by the reprex package (v2.0.1)
I still think you will fare better, and more easily, with a custom annotation. There are typically two ways to do that: either direct labelling with a text layer (for single labels I prefer annotate(geom = "text")), or creating a separate plot and stitching both together, e.g. with patchwork.
The biggest challenge is the positioning in the y dimension. For this I typically take a semi-automatic approach where I only need to define one constant and set the coordinates relative to the data range, so changes in range should in theory not matter much (they still do a bit, because the panel dimensions also change). The examples below show exact label positioning for two different data ranges (using the same constant for both).
library(tidyverse)
# I only need patchwork for demo purpose, it is not required for the answer
library(patchwork)
p <- tibble( x = rnorm(50, 1), y = rnorm(50, 10))
p1 <- tibble( x = rnorm(50, 1), y = 100*rnorm(50, 10))
## I like to define constants outside my ggplot call
y_fac <- .1
mylab <- "longer_label"
x_demo <- c(-1, 2)
demo_fct <- function(df, x) {map(x, ~{
## I like to define constants outside my ggplot call
ylims <- range(df$y)
ggplot(df, aes(x, y)) +
geom_point() +
## set hjust = 0 for full positioning control
annotate(geom = "text", x = ., y = min(ylims) - y_fac*mean(ylims),
label = mylab, hjust = 0) +
coord_cartesian(ylim = ylims, clip = "off") +
theme(plot.margin = margin(b = .5, unit = "in")) +
labs(x = NULL)
})
}
demo_fct(p, x_demo) %>% wrap_plots() + plot_annotation(title = "demo 1, label at x = -1 and x = 2")
demo_fct(p1, x_demo) %>% wrap_plots() + plot_annotation(title = "demo 2 - different data range")
Created on 2021-12-04 by the reprex package (v2.0.1)
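The y position used in annotate() above is min(ylims) - y_fac * mean(ylims). A small base R sketch of why one constant can serve both data ranges, assuming all y values are positive as in the demo data (label_y is a hypothetical name):

```r
# Hypothetical helper reproducing the y position from the annotate() call.
label_y <- function(y, y_fac = 0.1) {
  ylims <- range(y)
  min(ylims) - y_fac * mean(ylims)
}

set.seed(1)
y <- rnorm(50, 10)

pos_small <- label_y(y)
pos_large <- label_y(100 * y)
print(c(pos_small, pos_large))
```

Scaling the data by 100 scales the label position by 100 too, so the label keeps the same relative place below the panel. The offset is tied to the midpoint of the limits rather than their width, so it is only a rough guide when the data sit far from zero.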

Adding a single label per group in ggplot with stat_summary and text geoms

I would like to add counts to a ggplot that uses stat_summary().
I am having an issue with the requirement that the text vector be the same length as the data.
With the examples below, you can see that what is being plotted is the same label multiple times.
The workaround to set the location on the y axis has the effect that multiple labels are stacked up. The visual effect is a bit strange (particularly when you have thousands of observations) and not sufficiently professional for my purposes. You will have to trust me on this one - the attached picture doesn't fully convey the weirdness of it.
I was wondering if someone else has worked out another way. It is for a plot in shiny that has dynamic input, so text cannot be overlaid in a hardcoded fashion.
I'm pretty sure ggplot wasn't designed for the kind of behaviour with stat_summary that I am looking for, and I may have to abandon stat_summary and create a new summary dataframe, but thought I would first check if someone else has some wizardry to offer up.
This is the plot without setting the y location:
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_x <- df_x %>%
group_by(Group) %>%
mutate(w_count = n())
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
geom_text(aes(label = w_count)) +
coord_flip() +
theme_classic()
and this is with my hack
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
geom_text(aes(y = 1, label = w_count)) +
coord_flip() +
theme_classic()
Create a df_text that has the grouped info for your labels. Then use annotate:
library(dplyr)
library(ggplot2)
set.seed(123)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_text <- df_x %>%
group_by(Group) %>%
summarise(avg = mean(Value),
n = n()) %>%
ungroup()
yoff <- 0.0
xoff <- -0.1
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
annotate("text",
x = 1:2 + xoff,
y = df_text$avg + yoff,
label = df_text$n) +
coord_flip() +
theme_classic()
I found another way which is a little more robust for when the plot is dynamic in its ordering and filtering, and works well for faceting. More robust, because it uses stat_summary for the text.
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
counts_df <- function(y) {
return( data.frame( y = 1, label = paste0('n=', length(y)) ) )
}
p <- ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
coord_flip() +
theme_classic()
p + stat_summary(geom="text", fun.data=counts_df)
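Roughly speaking, stat_summary(fun.data = ...) calls the supplied function once per group with that group's y values, and uses the returned data frame as the layer's data. A base R approximation of that step (a sketch of the idea, not ggplot2's actual internals):

```r
# Same summary function as above: one row per group, fixed at y = 1.
counts_df <- function(y) {
  data.frame(y = 1, label = paste0("n=", length(y)))
}

set.seed(123)
df_x <- data.frame(Group = c(rep("A", 1000), rep("B", 2)),
                   Value = rnorm(1002))

# Apply the function to each group's values and stack the results.
per_group <- lapply(split(df_x$Value, df_x$Group), counts_df)
labels_df <- do.call(rbind, per_group)
labels_df$Group <- names(per_group)
print(labels_df)
```

There is one row, and so one label, per group. That is why this approach avoids the stacked duplicate labels produced by geom_text() on the full data.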

Resources