I am working on percentage changes between periods and struggling with a logarithmic transformation of the axis labels. Here is an example based on the storms dataset:
library(dplyr)
library(ggplot2)
library(scales)
df <- storms |>
group_by(year) |>
summarise(wind = mean(wind)) |>
mutate(lag = lag(wind, n = 1)) |>
mutate(perc = (wind / lag) - 1) |>
tidyr::drop_na()
I want to visualize the distribution of the percentage changes, using log1p to make them symmetrical (i.e. a log difference).
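To see why log1p makes the changes symmetrical, here is a minimal worked example with made-up values (not taken from storms). A doubling and a halving give +100% and -50%, which are not symmetric around 0, but their log1p values are log(2) and -log(2).

```r
# Made-up values: one series doubles, the other halves
old <- c(100, 100)
new <- c(200, 50)
perc <- new / old - 1        # +100% and -50%: not symmetric around 0
log_change <- log1p(perc)    # log(1 + perc) = log(new / old)
log_change                   # log(2) and -log(2): symmetric around 0
# log1p(perc) is the same as the difference of the logs
all.equal(log_change, log(new) - log(old))
```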
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
[Figure: histogram with log1p values on the x-axis]
At this point I want to transform the x-axis labels back to the original percentage values.
I tried to create my own transformation with trans_new, and applied it to the labels in scale_x_continuous, but I can't make it work.
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p_trans(),
inverse = function(x)
expm1(x),
breaks = breaks_log(),
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(labels = trans_perc)
Currently, the result is:
Error in get_labels():
! breaks and labels are different lengths
Run rlang::last_error() to see where the error occurred.
Thanks!
EDIT
I am adding details on the different outputs I am getting from Alan's first answer:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
breaks = pretty_breaks(5),
format = percent_format(),
domain = c(-Inf, Inf)
)
library(ggpubr)
a <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
b <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
c <- ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
ggarrange(a, b, c,
ncol = 3,
labels = c("Log on Value only",
"Log on Value and X",
"Log on X only"))
Different outcomes: https://i.stack.imgur.com/dCW2m.png
If I understand you correctly, you want to keep the shape of the histogram, but change the labels so that they reflect the value of the perc column rather than the transformed log1p(perc) value. If that is the case, there is no need for a transformer object. You can simply put the reverse transformation (plus formatting) as a function into the labels argument of scale_x_continuous.
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous("Percentage Change",
breaks = log1p(pretty(df$perc, 5)),
labels = ~ percent(expm1(.x)))
Note that although the histogram remains symmetrical in shape, the axis labels represent the back-transformed values of the original axis labels.
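A minimal sketch of what the breaks and labels arguments do together, using illustrative break values rather than ones computed from df. The named function label_fun is a hypothetical stand-in, assumed to be equivalent to the ~ percent(expm1(.x)) lambda.

```r
library(scales)

# Illustrative breaks on the original percentage scale
# (the kind of values pretty(df$perc) might return)
perc_breaks <- c(-0.10, -0.05, 0, 0.05, 0.10)
# ...placed at their log1p() positions on the transformed axis
positions <- log1p(perc_breaks)
# The label function receives those positions, undoes log1p() and formats them
label_fun <- function(x) percent(expm1(x))
label_fun(positions)
```

Because expm1() is the exact inverse of log1p(), each label shows the original break value even though the bar positions stay on the log scale.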
The point of a transformer object is to do all this for you without having to pass a transformed data set (i.e. without having to pass log1p(perc)). So in your case, you could do:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
This gives essentially the same result.
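One way to sanity-check a transformer such as trans_perc outside ggplot2 is to call its components directly. This sketch assumes the scales package and reuses the definition above; the test values in x are made up.

```r
library(scales)

trans_perc <- trans_new(
  name = "trans_perc",
  transform = log1p,
  inverse = expm1,
  format = percent_format(),
  domain = c(-Inf, Inf)
)

x <- c(-0.5, 0, 1)  # made-up percentage changes: -50%, 0%, +100%
# transform() maps percentage changes to log differences...
trans_perc$transform(x)
# ...and inverse() maps them back, so the round trip returns x
trans_perc$inverse(trans_perc$transform(x))
# format() produces the strings used for the axis labels
trans_perc$format(x)
```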
Related
I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it use count as the y value by default?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
library(ggplot2)
ggplot(df, aes(x = log(amount), y = after_stat(count), label = after_stat(count))) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situation I can think of) use the summary calculations of other layers, so I think the simplest thing is to replicate the calculation with a second stat_bin(geom = "text", ...) layer.
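A minimal sketch of that idea, using the amount vector from the question: the second stat_bin() layer repeats the same binning, so its counts match the bars. layer_data() is used here only to inspect the computed counts, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
df <- data.frame(amount)

p <- ggplot(df, aes(x = log(amount))) +
  geom_histogram(binwidth = 1) +
  # A second stat_bin layer repeats the binning and draws the counts as text
  stat_bin(aes(label = after_stat(count)), geom = "text",
           binwidth = 1, vjust = -0.5)

# layer_data() returns the data each layer computed before drawing
bars <- layer_data(p, 1)
text_layer <- layer_data(p, 2)
bars$count
text_layer$label
```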
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
scale_x_continuous(labels = ~ scales::comma(exp(.x)),  # log() is the natural log, so back-transform with exp()
minor_breaks = NULL)
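Because log() is the natural logarithm, the bucket positions map back to amounts with exp() rather than 10^. A small check with the amount vector from the question:

```r
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
# Rounding the natural log gives whole-number log buckets
log_amt <- round(log(amount))
table(log_amt)
# Back-transforming a bucket position therefore needs exp()
exp(sort(unique(log_amt)))
```

Here the buckets are 4 and 5, so the labels should read roughly 55 and 148; 10^4 and 10^5 would instead suggest amounts of 10,000 and 100,000.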
I would like to add counts to a ggplot that uses stat_summary().
I am having an issue with the requirement that the text vector be the same length as the data.
With the examples below, you can see that what is being plotted is the same label multiple times.
The workaround to set the location on the y axis has the effect that multiple labels are stacked up. The visual effect is a bit strange (particularly when you have thousands of observations) and not sufficiently professional for my purposes. You will have to trust me on this one - the attached picture doesn't fully convey the weirdness of it.
I was wondering if someone else has worked out another way. It is for a plot in shiny that has dynamic input, so text cannot be overlaid in a hardcoded fashion.
I'm pretty sure ggplot wasn't designed for the kind of behaviour with stat_summary that I am looking for, and I may have to abandon stat_summary and create a new summary dataframe, but thought I would first check if someone else has some wizardry to offer up.
This is the plot without setting the y location:
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_x <- df_x %>%
group_by(Group) %>%
mutate(w_count = n())
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
geom_text(aes(label = w_count)) +
coord_flip() +
theme_classic()
And this is with my hack:
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
geom_text(aes(y = 1, label = w_count)) +
coord_flip() +
theme_classic()
Create a df_text that has the grouped info for your labels. Then use annotate:
library(dplyr)
library(ggplot2)
set.seed(123)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_text <- df_x %>%
group_by(Group) %>%
summarise(avg = mean(Value),
n = n()) %>%
ungroup()
yoff <- 0.0
xoff <- -0.1
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
annotate("text",
x = 1:2 + xoff,
y = df_text$avg + yoff,
label = df_text$n) +
coord_flip() +
theme_classic()
I found another way that is a little more robust when the plot is dynamic in its ordering and filtering, and it works well with faceting. It is more robust because it uses stat_summary for the text as well.
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
counts_df <- function(y) {
return( data.frame( y = 1, label = paste0('n=', length(y)) ) )
}
p <- ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
coord_flip() +
theme_classic()
p + stat_summary(geom="text", fun.data=counts_df)
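The part that makes this work is the fun.data function: stat_summary() calls it once per group with that group's y values and expects a data frame whose columns are aesthetics (here y and label). counts_at_mean() below is a hypothetical variant, not from the answer, that places the label at the group mean instead of at y = 1.

```r
# Called once per group; returns one row of aesthetics for that group
counts_df <- function(y) {
  data.frame(y = 1, label = paste0("n=", length(y)))
}

# Hypothetical variant: put the label at the group mean instead of y = 1
counts_at_mean <- function(y) {
  data.frame(y = mean(y), label = paste0("n=", length(y)))
}

counts_df(c(0.3, -1.2))     # one row: y = 1, label = "n=2"
counts_at_mean(c(1, 3))     # one row: y = 2, label = "n=2"
```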
I'm working with stock prices and trying to plot the price difference.
I created one using autoplot.zoo(). My question is: how can I change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold? I understand that with the basic plot() function you can do this by calling points(), but I am wondering how to do it with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()
We can't fully replicate without your data, but here's an attempt with some sample generated data that should be similar enough that you can adapt for your purposes.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
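As an alternative to the three assignment steps, the same grouping can be built in one step with nested ifelse(); the spread values below are made up for illustration. With unnamed values, scale_shape_manual() matches shapes to the groups in alphabetical order, which is why 17 (filled triangle) goes to "Above", 16 (filled circle) to "Below" and 15 (filled square) to "In Range".

```r
thresh <- 4
spread <- c(-6, 0, 5, 4, -4)  # made-up values
# Values exactly at +/- thresh count as "In Range", as in the answer above
outlier <- ifelse(spread > thresh, "Above",
                  ifelse(spread < -thresh, "Below", "In Range"))
outlier
```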
# If you want points only above and below the thresholds
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
Alternatively, you can add the points above and below the thresholds as individual layers with manually specified shapes, like this. The pch argument sets the shape type.
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
geom_hline(yintercept = c(thresh, -thresh), color = "red")
I am trying to create a custom histogram with a rug plot showing the original values on the X axis.
I am going to use the mtcars dataset to illustrate. It's not the best dataset for this question, but hopefully the reader will understand what I am trying to achieve.
The code below shows the basic histogram, without any rug plot attempt.
I want to create the histogram using geom_bar as this allows for more flexibility with custom bins.
I also want a small gap between the histogram bars (i.e. width = 0.95), which adds to this problem's complexity.
library(dplyr)
library(ggplot2)
# create custom bins
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
mtcars$bin <- cut(mtcars$mpg, breaks = vct_seq)
# create data.frame for the ggplot graph..using bins above
df_mtcars_count <- mtcars %>% group_by(bin) %>% summarise(count = n())
# indicative labels
vct_labels <- c("bin 1", "bin 2", "bin 3", "bin 4")
# attempt 1 - basic plot -- no rug plot
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p
Next, try and add a basic rug plot on the X axis. This obviously doesn't work as the geom_bar and geom_rug have completely different scales.
# attempt 2 with no scaling.... doesn't work as x scale for ordinal (bins) and
# x scale for continuous (mpg) do not match
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg), inherit.aes = F, alpha = 0.3)
p
Now, try and rescale the mpg column to match with the ordinal scale....
First define a linear mapping function...
fn_linear_map <- function(vct_existing_val, vct_new_range) {
# example....converts 1:20 into the range 1 to 10 like this:
# fn_linear_map(1:20, c(1, 10))
fn_r_diff <- function(x) x %>% range() %>% diff()
flt_ratio <- fn_r_diff(vct_new_range) / fn_r_diff(vct_existing_val)
vct_old_min_offset <- vct_existing_val - min(vct_existing_val)
vct_new_range_val <- (vct_old_min_offset * flt_ratio) + min(vct_new_range)
return(vct_new_range_val)
}
Now apply the function: we try to map mpg to the range 1 to 4 (an attempt to match the ordinal scale).
mtcars$mpg_remap <- fn_linear_map(mtcars$mpg, c(1, 4))
Try the plot again.... getting closer ... but not really accurate...
# attempt 3: getting closer but doesn't really match the ordinal scale
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg_remap), inherit.aes = F, alpha = 0.3)
p
The graph above is getting close to what I want, but the rug plot does not line up with the actual data. For example, the max observation (33.9) should be displayed almost aligned with the right-hand side of the bar; see below:
mtcars %>% filter(bin == "(25,34]") %>% arrange(mpg) %>% dplyr::select(mpg, mpg_remap)
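The misalignment comes from the linear map itself: fn_linear_map(mtcars$mpg, c(1, 4)) sends the maximum (33.9) to 4, the centre of the last bar, because it ignores that the last bin is wider than the others. A sketch of a piecewise alternative, using base R's approx() (a swapped-in technique, not something from the question), maps each bin edge onto a bar edge; mpg_to_bar is a hypothetical helper name.

```r
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
# Discrete bar i is centred on x = i, so bin edges should land on 0.5, 1.5, ..., 4.5
bar_edges <- seq(0.5, length(vct_seq) - 0.5, by = 1)
# approx() interpolates linearly *within* each bin, so every bin boundary
# lands on a bar boundary regardless of how wide the bin is
mpg_to_bar <- function(mpg) approx(x = vct_seq, y = bar_edges, xout = mpg)$y
mpg_to_bar(c(10.4, 25, 33.9))
```

With width = 0.95 the drawn bars stop 0.025 short of these edges, so values sitting exactly on a bin boundary would fall in the gap between bars.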
Your scale makes no sense to me, as you are showing a bin that is almost twice as wide as the others (25 to 34, versus 5 units for the rest) using the same bar width. Doing that in combination with a rug strikes me as confusing at best and misleading at worst. I suggest you plot the bars with their correct widths, after which the rug is trivial.
I think the best solution is to just use geom_histogram:
ggplot(mtcars, aes(mpg)) +
geom_histogram(breaks = vct_seq, col = 'grey80') +
geom_rug(aes(mpg, y = NULL))
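As a quick check that this reproduces the question's bins: cut() and hist() both default to right-closed intervals (a, b], which is also geom_histogram()'s default (closed = "right"), so the counts should agree.

```r
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
# Counts from the question's cut() bins
cut_counts <- as.vector(table(cut(mtcars$mpg, breaks = vct_seq)))
# Counts from base R's histogram binning with the same breaks
hist_counts <- hist(mtcars$mpg, breaks = vct_seq, plot = FALSE)$counts
cut_counts
hist_counts
```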
If you really want the gaps between the bars you'll have to do more work:
library(tidyr)
d <- mtcars %>%
count(bin) %>%
separate(bin, c('min', 'max'), sep = ',', remove = FALSE) %>%
mutate(across(c("min", "max"), readr::parse_number)) %>%
mutate(
middle = min + (max - min) / 2,
width = 0.9 * (max - min)
)
ggplot(d, aes(middle, n)) +
geom_col(width = d$width) +
geom_rug(aes(mpg, y = NULL), mtcars)
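If the breaks vector is at hand, the bin edges can also be taken from it directly instead of parsing the cut() labels with separate() and readr::parse_number(). A minimal base-R sketch, assuming vct_seq as defined in the question and the same 0.9 width factor:

```r
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
lower <- head(vct_seq, -1)       # left edge of each bin
upper <- tail(vct_seq, -1)       # right edge of each bin
middle <- (lower + upper) / 2    # bar centre
width <- 0.9 * (upper - lower)   # leave 10% of each bin as a gap
data.frame(lower, upper, middle, width)
```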
I currently have a plot and have used facet_zoom to focus on records between 0 and 10 in the x axis. The following code reproduces an example:
library(ggplot2)
library(ggforce)
library(dplyr)
x <- rnorm(10000, 50, 25)
y <- rexp(10000)
data <- data.frame(x, y)
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10))
I want to change the breaks on the zoomed portion of the graph to be the equivalent of:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10)) +
scale_x_continuous(breaks = seq(0,10,2))
But this changes the breaks of the original plot as well. Is it possible to just change the breaks of the zoomed portion whilst leaving the original plot as default?
This works for your use case:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10)) +
scale_x_continuous(breaks = pretty)
From ?scale_x_continuous, the breaks argument accepts one of:
NULL for no breaks
waiver() for the default breaks computed by the transformation object
A numeric vector of positions
A function that takes the limits as input and returns breaks as output
pretty() is one such function. It doesn't offer very fine control, but it does give you some leeway to specify breaks across facets with very different scales.
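A minimal illustration of why this helps with facet_zoom: the answer's result suggests the breaks function is called separately with each panel's limits, so the zoomed and the full panel each get breaks suited to their own range. zoom_breaks is a hypothetical name, and the limits are made up.

```r
# A breaks function takes the limits (a length-2 numeric vector) and returns positions
zoom_breaks <- function(limits) pretty(limits, n = 5)
zoom_breaks(c(0, 10))     # e.g. the zoomed panel
zoom_breaks(c(-50, 150))  # e.g. the full panel
```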
For illustration, here are two examples with different desired number of breaks. See ?pretty for more details on the other arguments this function accepts.
p <- ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10))
cowplot::plot_grid(
p + scale_x_continuous(breaks = function(x) pretty(x, n = 3)),
p + scale_x_continuous(breaks = function(x) pretty(x, n = 10)),
labels = c("n = 3", "n = 10"),
nrow = 1
)
Of course, you can also define your own function to convert plot limits into the desired breaks (e.g. something like p + scale_x_continuous(breaks = function(x) seq(min(x), max(x), length.out = 5))), but I generally find these functions require more tweaking to get right, and pretty() is often good enough.
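To illustrate the extra tweaking, here is the seq() approach applied to some made-up limits: the breaks are evenly spaced but not round, and simply rounding them makes the spacing uneven, whereas pretty() returns round, evenly spaced values in one step.

```r
limits <- c(-3.2, 107.5)  # made-up panel limits
# Evenly spaced, but the values are not round numbers
raw <- seq(min(limits), max(limits), length.out = 5)
raw
# One possible tweak: round to the nearest 10, which makes the spacing uneven
round(raw, -1)
# pretty() picks round, evenly spaced breaks covering the limits
pretty(limits)
```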