Extend x-axis with dates - r

I wish to use ggrepel to add labels to the ends of the lines of a ggplot. To do that, I need to make space for the labels, so I use scale_x_continuous to extend the x-axis. I'm not sure that's correct and am open to other strategies.
I can do it when the x_axis type is friendly numeric.
library("tidyverse")
library("ggrepel")
p <- tibble (
x = c(1991, 1999),
y = c(3, 5)
)
ggplot(p, aes(x, y)) + geom_line() + scale_x_continuous(limits = c(1991, 2020)) +
geom_text_repel(data = p[2,], aes(label = "Minimum Wage"), size = 4, nudge_x = 1, nudge_y = 0, colour = "gray50")
However, when I try something similar except the x-axis is of the evil date type, I get the error:
Error in as.Date.numeric(value) : 'origin' must be supplied
p <- tibble (
x = c(as.Date("1991-01-01"), as.Date("1999-01-01")),
y = c(2, 5)
)
range <- c(as.Date("1991-01-01"), as.Date("2020-01-01"))
ggplot(p, aes(x, y)) + geom_line() + scale_x_continuous(limits = range)
How can I get this to work with my arch nemesis, date?

Use scale_x_date instead of scale_x_continuous:
p <- tibble (
x = c(as.Date("1991-01-01"), as.Date("1999-01-01")),
y = c(2, 5)
)
range <- c(as.Date("1991-01-01"), as.Date("2020-01-01"))
ggplot(p, aes(x, y)) + geom_line() + scale_x_date(limits = range)

Note that scale_x_date() has an expand argument which controls how far the x-axis extends beyond the limits. You could try expand = c(0, 0) to include only the dates specified in your limits = argument. To add padding instead, keep in mind that the two values of expand are a multiplicative and an additive constant (the additive one is in days for a date axis), not a left and a right fraction. With ggplot2 3.3.0 or later, expand = expansion(mult = f) pads each side by the fraction f of the range given in your limits = argument. For example, f could be 0.01.
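As a minimal sketch of the two expand settings described above (assuming ggplot2 3.3.0 or later, where expansion() is available; the object names plot_tight and plot_padded are only illustrative):

```r
library(ggplot2)

p <- data.frame(
  x = as.Date(c("1991-01-01", "1999-01-01")),
  y = c(2, 5)
)
range <- as.Date(c("1991-01-01", "2020-01-01"))

# expand = c(0, 0): the axis starts and ends exactly at the limits
plot_tight <- ggplot(p, aes(x, y)) +
  geom_line() +
  scale_x_date(limits = range, expand = c(0, 0))

# expansion(mult = 0.01): pad each side by 1% of the span of the limits
plot_padded <- ggplot(p, aes(x, y)) +
  geom_line() +
  scale_x_date(limits = range, expand = expansion(mult = 0.01))
```

Printing either object draws the plot; the extra room to the right of 1999 is where a ggrepel label can go.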

Related

Define custom transformation of ggplot axis labels with trans_new function

I am working on percentage changes between periods and struggling with logarithmic transformation of labels. Here is an example based on the storms dataset:
library(dplyr)
library(ggplot2)
library(scales)
df <- storms |>
group_by(year) |>
summarise(wind = mean(wind)) |>
mutate(lag = lag(wind, n = 1)) |>
mutate(perc = (wind / lag) - 1) |>
tidyr::drop_na()
I want to visualize the distribution of percentages, making the percentage change symmetrical (log difference) with log1p.
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
[Figure: x-axis with log1p values]
At this point I wanted to transform the x-axis label back to the original percentage value.
I tried to create my own transformation with trans_new, and applied it to the labels in scale_x_continuous, but I can't make it work.
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p_trans(),
inverse = function(x)
expm1(x),
breaks = breaks_log(),
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(labels = trans_perc)
Currently, the result is:
Error in get_labels():
! breaks and labels are different lengths
Run rlang::last_error() to see where the error occurred.
Thanks!
EDIT
I am adding details on the different output I am getting from Alan's first answer:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
breaks = pretty_breaks(5),
format = percent_format(),
domain = c(-Inf, Inf)
)
library(ggpubr)
a <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
b <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
c <- ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
ggarrange(a, b, c,
ncol = 3,
labels = c("Log on Value only",
"Log on Value and X",
"Log on X only"))
Different outcomes: https://i.stack.imgur.com/dCW2m.png
If I understand you correctly, you want to keep the shape of the histogram, but change the labels so that they reflect the value of the perc column rather than the transformed log1p(perc) value. If that is the case, there is no need for a transformer object. You can simply put the reverse transformation (plus formatting) as a function into the labels argument of scale_x_continuous.
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous("Percentage Change",
breaks = log1p(pretty(df$perc, 5)),
labels = ~ percent(expm1(.x)))
Note that although the histogram remains symmetrical in shape, the axis labels represent the back-transformed values of the original axis labels.
The point of a transformer object is to do all this for you without having to pass a transformed data set (i.e. without having to pass log1p(perc)). So in your case, you could do:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
This gives essentially the same result.
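A quick check of the round trip that both approaches rely on, as a sketch only. The values in perc_example are made up for illustration, and accuracy = 1 is set explicitly so the labels do not depend on automatic precision detection:

```r
library(scales)

# Hypothetical percentage changes (not taken from the storms data)
perc_example <- c(-0.1, 0, 0.1)

# Forward transform: the positions the histogram is drawn at
log_positions <- log1p(perc_example)

# Inverse transform: what the axis labels should report
back_transformed <- expm1(log_positions)

# Format the back-transformed values as percentages
labels <- percent(back_transformed, accuracy = 1)
```

Here back_transformed matches perc_example up to floating-point error, which is why labelling the log1p axis with percent(expm1(.x)) shows the original percentage values.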

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = after_stat(count), label = after_stat(count))) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situations I can think of) take the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text"...
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
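To see what the pre-calculated counts look like for the question's data, here is a small base R sketch. It uses table() instead of dplyr::count(), which is a substitution made only to keep the example dependency-free:

```r
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)

# Bucket each amount by its rounded natural log, as in the answer above
log_amt <- round(log(amount))

# Count how many amounts fall in each bucket
counts <- table(log_amt)
```

The result has two buckets, 4 and 5, holding two and four amounts respectively. These counts are what geom_text() receives as both its y position and its label.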
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
# log() is the natural log, so back-transform with exp() rather than 10^
scale_x_continuous(labels = ~scales::comma(exp(.)),
minor_breaks = NULL)

ggplot is not graphing a vertical line

I am trying to plot a graph in ggplot2 where the x-axis represents month-day combinations, the dots represent y-values for two different groups.
When graphing my original data set using this code,
ggplot(graphing.df, aes(MONTHDAY, y.var, color = GROUP)) +
geom_point() +
ylab(paste0(""))+
scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 15)])+
theme(legend.text = element_blank(),
legend.title = element_blank()) +
geom_vline(xintercept = which(graphing.df$MONTHDAY == "12-27")[1], col='red', lwd=2)
I get this graph where the vertical line is not showing.
When I tried to create a reproducible example using the following code...
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
TYPE = rep(c("A", "B"), 3),
VALUE = sample(1:10, 6, replace = TRUE))
verticle_line <- "01-02"
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
#geom_vline(xintercept = which(df$MONTHDAY == verticle_line)[1], col='red', lwd=2)+
geom_vline(xintercept = which(df$MONTHDAY == verticle_line), col='blue', lwd=2)
The vertical line is showing, but now it's showing in the wrong place.
In my original data set I have two values for each month-day combination (representing each of the two groups). The month-day combination column is a character vector, it is not a factor and does not have levels.
Here is a way. It subsets the data keeping only the rows of interest and plots the vertical line defined by MONTHDAY.
library(ggplot2)
verticle_line <- "01-02"
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(data = subset(df, MONTHDAY == verticle_line),
mapping = aes(xintercept = MONTHDAY), color = 'blue', size = 2)
Data
I will repost the data creation code, this time setting the RNG seed in order to make the example reproducible.
set.seed(2020)
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
TYPE = rep(c("A", "B"), 3),
VALUE = sample(1:10, 6, replace = TRUE))
The reason your line is not showing up where you expect is because you are setting the value of xintercept= via the output of the which() function. which() returns the index value where the condition is true. So in the case of your reproducible example, you get the following:
> which(df$MONTHDAY == verticle_line)
[1] 3 4
It returns a vector indicating that in df$MONTHDAY, indexes 3 and 4 in that vector are true. So your code below:
geom_vline(xintercept = which(df$MONTHDAY == verticle_line)...
Reduces down to this:
geom_vline(xintercept = c(3,4)...
Your MONTHDAY axis is not formatted as a date, but treated as a discrete axis of character vectors. In this case xintercept=c(3,4) applied to a discrete axis draws two vertical lines at x intercepts equivalent to the 3rd and 4th discrete position on that axis: in other words, "01-03" and... some unknown 4th position that is not observable within the axis limits.
How do you fix this? Just take out which():
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(xintercept = verticle_line, col='blue', lwd=2)
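The difference between row indices and axis positions can be checked without plotting. This is a sketch under the assumption that a character column on a discrete axis is ordered alphabetically by its unique values; the names row_idx and axis_pos are illustrative:

```r
df <- data.frame(
  MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03")
)
verticle_line <- "01-02"

# which() returns ROW indices where the condition is TRUE
row_idx <- which(df$MONTHDAY == verticle_line)

# The position on the discrete axis depends on the unique values, not the rows
axis_levels <- sort(unique(df$MONTHDAY))
axis_pos <- match(verticle_line, axis_levels)
```

row_idx is 3 and 4, while "01-02" sits at position 2 of the three axis values, so passing the row indices as xintercept puts the lines in the wrong place.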
We can get the corresponding values of 'MONTHDAY' after subsetting:
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(xintercept = df$MONTHDAY[df$MONTHDAY == verticle_line],
col='blue', lwd=2)

facet_zoom can't change breaks of zoomed plot

I currently have a plot and have used facet_zoom to focus on records between 0 and 10 in the x axis. The following code reproduces an example:
require(ggplot2)
require(ggforce)
require(dplyr)
x <- rnorm(10000, 50, 25)
y <- rexp(10000)
data <- data.frame(x, y)
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10))
I want to change the breaks on the zoomed portion of the graph to be the equivalent of:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10)) +
scale_x_continuous(breaks = seq(0,10,2))
But this changes the breaks of the original plot as well. Is it possible to just change the breaks of the zoomed portion whilst leaving the original plot as default?
This works for your use case:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10)) +
scale_x_continuous(breaks = pretty)
From ?scale_x_continuous, breaks would accept the following (emphasis added):
One of:
NULL for no breaks
waiver() for the default breaks computed by the transformation object
A numeric vector of positions
A function that takes the limits as input and returns breaks as output
pretty() is one such function. It doesn't offer very fine control, but does allow you to have some leeway to specify breaks across different facets with very different scales.
For illustration, here are two examples with different desired number of breaks. See ?pretty for more details on the other arguments this function accepts.
p <- ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10))
cowplot::plot_grid(
p + scale_x_continuous(breaks = function(x) pretty(x, n = 3)),
p + scale_x_continuous(breaks = function(x) pretty(x, n = 10)),
labels = c("n = 3", "n = 10"),
nrow = 1
)
Of course, you can also define your own function to convert plot limits into desired breaks, (e.g. something like p + scale_x_continuous(breaks = function(x) seq(min(x), max(x), length.out = 5))), but I generally find these functions require more tweaking to get right, & pretty() is often good enough.
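To make the "function of the limits" idea concrete, here is a small sketch that calls a break function directly on a pair of limits. The name five_even_breaks and the limits c(0, 10) are illustrative assumptions; in a real plot ggplot2 supplies each panel's limits:

```r
# A break function takes the axis limits and returns break positions
five_even_breaks <- function(limits) {
  seq(min(limits), max(limits), length.out = 5)
}

# Calling it by hand with limits similar to the zoomed panel
zoom_breaks <- five_even_breaks(c(0, 10))
# zoom_breaks is 0, 2.5, 5, 7.5, 10

# pretty() has the same shape: limits in, breaks out
pretty_breaks_zoom <- pretty(c(0, 10), n = 3)
```

Because the function is re-evaluated with each panel's own limits, the zoomed panel and the full panel can end up with different breaks from a single scale_x_continuous() call.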

Formatting ggplot2 axis labels with commas (and K? MM?) if I already have a y-scale

I am trying to format Cost and Revenue (both in thousands) and Impressions (in millions) data for a ggplot graph's y-axis labels.
My plot runs from 31 days ago to 'yesterday' and uses the min and max values over that period for the ylim(c(min,max)) option. Showing just the Cost example,
library(ggplot2)
library(TTR)
set.seed(1984)
#make series
start <- as.Date('2016-01-01')
end <- Sys.Date()
days <- as.numeric(end - start)
#make cost and moving averages
cost <- rnorm(days, mean = 45400, sd = 11640)
date <- seq.Date(from = start, to = end - 1, by = 'day')
cost_7 <- SMA(cost, 7)
cost_30 <- SMA(cost, 30)
df <- data.frame(Date = date, Cost = cost, Cost_7 = cost_7, Cost_30 = cost_30)
# set parameters for window
left <- end - 31
right <- end - 1
# plot series
ggplot(df, aes(x = Date, y = Cost))+
geom_line(lwd = 0.5) +
geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
xlim(c(left, right)) +
ylim(c(min(df$Cost[df$Date > left]), max(df$Cost[df$Date > left]))) +
xlab("")
I would a) like to represent thousands and millions on the y-axis with commas, and b) like those numbers abbreviated and with 'K' for thousands or 'MM' for millions. I realize b) may be a tall order, but for now a) cannot be accomplished with
ggplot(...) + ... + ylim(c(min, max)) + scale_y_continuous(labels = comma)
Because the following error is thrown:
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.
I have tried putting the scale_y_continuous(labels = comma) section after the geom_line() layer (which throws the error above) or at the end of all the ggplot layers, which overrides my limits in the ylim call and then throws the error above, anyway.
Any ideas?
For the comma formatting, you need to include the scales library for label=comma. The "error" you discussed is actually just a warning, because you used both ylim and then scale_y_continuous. The second call overrides the first. You can instead set the limits and specify comma-separated labels in a single call to scale_y_continuous:
library(scales)
ggplot(df, aes(x = Date, y = Cost))+
geom_line(lwd = 0.5) +
geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
xlim(c(left, right)) +
xlab("") +
scale_y_continuous(label=comma, limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left])))
Another option would be to melt your data to long format before plotting, which reduces the amount of code needed and streamlines aesthetic mappings:
library(reshape2)
ggplot(melt(df, id.var="Date"),
aes(x = Date, y = value, color=variable, linetype=variable))+
geom_line() +
xlim(c(left, right)) +
labs(x="", y="Cost") +
scale_y_continuous(label=comma, limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left])))
Either way, to put the y values in terms of thousands or millions you could divide the y values by 1,000 or 1,000,000. I've used dollar_format() below, but I think you'll also need to divide by the appropriate power of ten if you use unit_format (per @joran's suggestion). For example:
div=1000
ggplot(melt(df, id.var="Date"),
aes(x = Date, y = value/div, color=variable, linetype=variable))+
geom_line() +
xlim(c(left, right)) +
labs(x="", y="Cost (Thousands)") +
scale_y_continuous(label=dollar_format(),
limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left]))/div)
Use scale_color_manual and scale_linetype_manual to set custom colors and linetypes, if desired.
I just found the solution. It does not work with "label = comma". Please try this solution:
scale_y_continuous(labels = scales::comma)
It works well for me.
The unit_format() function highlighted by @joran has now been deprecated within the scales package and replaced with label_number(). It defaults to using a space as a separator; change this with the big.mark = argument. Use the prefix = and suffix = arguments to add characters before and after, and the scale = argument to multiply the numbers by a scaling factor (so in many cases you want a negative exponent here).
The problem that @konrad notes with a space between the number and the suffix no longer seems to exist. If you want a space, include it in the suffix argument, e.g. suffix = " M".
So for example, to show 1234000 as £1,234k on the y axis, use scale_y_continuous(labels = label_number(prefix = "£", suffix = "k", scale = 1e-3, big.mark = ","))
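The labeller can also be tried outside a plot, which is a handy way to check the formatting before adding it to a scale. A minimal sketch, assuming a version of scales that provides label_number() (the name label_k is illustrative):

```r
library(scales)

# label_number() returns a function that formats numeric vectors
label_k <- label_number(prefix = "£", suffix = "k", scale = 1e-3, big.mark = ",")

# Apply it to the value from the example above
formatted <- label_k(1234000)
```

The same label_k object can then be passed as scale_y_continuous(labels = label_k).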
As comma separators are so commonly used there is a convenience function label_comma which sets big.mark = ",". Or, for even less typing, the comma() function is exactly the same.
One gotcha is that the scales package is not attached by library(ggplot2); you have to load it separately or, as @Aurora points out in their answer, prefix the function with scales::
https://scales.r-lib.org/reference/label_number.html

Resources