ggplot2 annotation with dates in x-axis - r

I'm trying to draw an area plot of a series of percentage values, one for each day during a set period. I would like to add a segment at the top of the plot to show more clearly the areas where the percentage is decreasing.
I tried to use this code (the example has just a few data points for simplicity):
library(ggplot2)
library(scales)
limit = c(0.85,0.87,0.88,0.90,0.72,0.74)
day <- as.Date(strptime((seq(20150201,20150206,1)),format = "%Y%m%d"))
dati = data.frame("Day" = day, "Limit" = limit)
g <- ggplot(data = dati, aes(Day, Limit))
g <- g + geom_area(fill = "dark red")
g <- g + coord_cartesian(ylim = c(0,1))
g <- g + scale_y_continuous(labels=percent)
g <- g + annotate("segment", y= 1, yend = 1, x = dati[3, "Day"], xend = dati[4, "Day"])
print(g)
But I get this error: Error: / not defined for "Date" objects
Any ideas on how to solve this?
I already checked How to use ggplot2's annotate with dates in x-axis?, but it appears the bug is back. Plus I'd like to do this without using the lubridate package.

Wrapping my earlier comment into an answer: use geom_segment instead.
+ geom_segment(y = 1, yend = 1,
x = as.numeric(dati[3, "Day"]), xend = as.numeric(dati[4, "Day"]))
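The as.numeric() calls work because R stores a Date as the number of days since 1970-01-01, and a date axis is positioned on that same numeric scale. A minimal base-R sketch of the conversion, using two of the dates from the question (the variable names x_num and back are illustrative only):

```r
# Two of the dates from the example data
day <- as.Date(c("2015-02-03", "2015-02-04"))

# A Date is stored as the number of days since 1970-01-01
x_num <- as.numeric(day)

# Converting back recovers the original dates
back <- as.Date(x_num, origin = "1970-01-01")
print(data.frame(day = day, x_num = x_num, back = back))
```

Consecutive days differ by exactly 1 on this scale, so the segment above spans one day on the x axis.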

breaks at integer powers of ten on ggplot2 log10 axes

Transforming ggplot2 axes to log10 using scales::trans_breaks() can sometimes (if the range is small enough) produce un-pretty breaks, at non-integer powers of ten.
Is there a general purpose way of setting these breaks to occur only at 10^x, where x are all integers, and, ideally, consecutive (e.g. 10^1, 10^2, 10^3)?
Here's an example of what I mean.
library(ggplot2)
# dummy data
df <- data.frame(fct = rep(c("A", "B", "C"), each = 3),
x = rep(1:3, 3),
y = 10^seq(from = -4, to = 1, length.out = 9))
p <- ggplot(df, aes(x, y)) +
geom_point() +
facet_wrap(~ fct, scales = "free_y") # faceted to try and emphasise that it's general purpose, rather than specific to a particular axis range
The unwanted result -- y-axis breaks are at non-integer powers of ten (e.g. 10^2.8)
p + scale_y_log10(
breaks = scales::trans_breaks("log10", function(x) 10^x),
labels = scales::trans_format("log10", scales::math_format(10^.x))
)
I can achieve the desired result for this particular example by adjusting the n argument to scales::trans_breaks(), as below. But this is not a general purpose solution, of the kind that could be applied without needing to adjust anything on a case-by-case basis.
p + scale_y_log10(
breaks = scales::trans_breaks("log10", function(x) 10^x, n = 1),
labels = scales::trans_format("log10", scales::math_format(10^.x))
)
Should add that I'm not wed to using scales::trans_breaks(), it's just that I've found it's the function that gets me closest to what I'm after.
Any help would be much appreciated, thank you!
Here is an approach that at the core has the following function.
breaks = function(x) {
brks <- extended_breaks(Q = c(1, 5))(log10(x))
10^(brks[brks %% 1 == 0])
}
It gives extended_breaks() a narrow set of 'nice numbers' and then filters out non-integers.
This gives us the following for your example case:
library(ggplot2)
library(scales)
# dummy data
df <- data.frame(fct = rep(c("A", "B", "C"), each = 3),
x = rep(1:3, 3),
y = 10^seq(from = -4, to = 1, length.out = 9))
ggplot(df, aes(x, y)) +
geom_point() +
facet_wrap(~ fct, scales = "free_y") +
scale_y_continuous(
trans = "log10",
breaks = function(x) {
brks <- extended_breaks(Q = c(1, 5))(log10(x))
10^(brks[brks %% 1 == 0])
},
labels = math_format(format = log10)
)
Created on 2021-01-19 by the reprex package (v0.3.0)
I haven't tested this on many other ranges that might be difficult, but it should generalise better than setting the number of desired breaks to 1. Difficult ranges might be those just in between -but not including- powers of 10. For example 11-99 or 101-999.
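For the difficult ranges mentioned above, one possible alternative is to build the breaks directly from the floor and ceiling of the log10 range. This is only a sketch and is not the method used in the answer; log10_integer_breaks is a hypothetical helper name, and it assumes the limits contain at least one positive value.

```r
# Hypothetical helper (not from the answer above): integer powers of ten
# that bracket the positive values in x
log10_integer_breaks <- function(x) {
  rng <- range(log10(x[x > 0]), na.rm = TRUE)
  10^seq(floor(rng[1]), ceiling(rng[2]), by = 1)
}

# Ranges that sit strictly between two powers of ten
print(log10_integer_breaks(c(11, 99)))
print(log10_integer_breaks(c(101, 999)))
```

It could be passed as breaks = log10_integer_breaks to scale_y_log10(). Breaks that fall outside the axis limits are not drawn, so for ranges like these the axis may still end up with few or no labelled ticks unless the limits are widened.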

Can I draw a horizontal line at specific number of range of values using ggplot2?

I have data (from Excel) with the y-axis as ranges (also calculated in Excel) and the x-axis as cell counts, and I would like to draw a horizontal line at a specific value within the ranges, like a reference line. I tried using geom_hline(yintercept = 450), but I suspect that is naive and does not work that way when the axis holds ranges rather than numbers. I wonder if there are any better suggestions for it :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you proceed: don't calculate those ranges in Excel. Let ggplot2 do it for you.
Say, your data is like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#>           x
#> 1 322.76123
#> 2  57.46708
#> 3 223.31943
#> 4 498.91870
#> 5 155.05416
#> 6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
geom_histogram(aes(x = x),
boundary = 0,
binwidth = 50,
fill = "steelblue",
colour = "white") +
geom_vline(xintercept = 450, colour = "red", linetype = 2, size = 1) +
coord_flip()
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, use a numeric value of 9.5 which will put it between the top two values:
g + geom_hline(yintercept = 9.5, linetype = 2)
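The value 9.5 makes sense because a discrete axis places its levels at positions 1, 2, ..., n in level order. A small base-R sketch of both points, the alphabetical default ordering and the numeric position of the line (f_alpha, f_order and yint are illustrative names, and the bin labels are the made-up ones from above):

```r
# Bin labels with the same structure as the example data
bins <- paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-")

# Default factor levels are sorted as text, so "100-149.9" precedes "50-99.9"
f_alpha <- factor(bins)
print(levels(f_alpha))

# Supplying the levels explicitly keeps the numeric order
f_order <- factor(bins, levels = bins)

# Position of a line halfway between the top two bins
yint <- mean(match(bins[9:10], levels(f_order)))
print(yint)
```

Computing the position this way avoids hard-coding 9.5 if the number of bins changes.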

Rounding frame_time and smooth transitions for gganimate

I have the following data frame:
# Seed RNG
set.seed(33550336)
# Create data frame
df <- data.frame(x = runif(100),
y = runif(100),
t = runif(100, min = 0, max = 10))
I'd like to plot points (i.e., at x and y coordinates) appearing and disappearing as a function of t. gganimate is awesome, so I used that.
# Load libraries
library(gganimate)
library(ggplot2)
# Create animation
g <- ggplot(df, aes(x = x, y = y))
g <- g + geom_point(colour = "#FF3300", shape = 19, size = 5, alpha = 0.25)
g <- g + labs(title = 'Time: {frame_time}')
g <- g + transition_time(t)
g <- g + enter_fade() + exit_fade()
animate(g, fps = 1)
This code produced the following:
There are a couple of things that I don't like about this.
1. The transitions are very abrupt. My hope in using enter_fade and exit_fade was that the points would fade into view, then back out. Clearly this isn't the case, but how would I achieve this result?
2. I would like to round {frame_time}, so that while the points fade in and out at fractions of t, the time shown in the title would be an integer. If frame_time were a regular variable, this would be simple enough using something like bquote and round, but this doesn't seem to be the case. How can I round frame_time in my title?
Here's a relatively manual approach that relies on doing more of the prep beforehand and feeding that into gganimate. I'd like to see if there's a simpler way to do this inside gganimate more automatically.
First I make a copy of the data frame for each frame I want to show. Then I calculate the difference between the time I'm presently viewing (time) and the t at which I want to show each data point. I use cos to handle the easing in and out, so that each dot's appearance at a given time is described by display. In the ggplot call, I then map alpha and size to display, and use transition_time(time) to move through the frames.
# Load the packages that provide map_df(), %>%, mutate() and if_else()
library(dplyr)
library(purrr)
# Create prep table
fade_time = 1
frame_count = 100
frames_per_time = 10
df2 <- map_df(seq_len(frame_count), ~df, .id = "time") %>%
  mutate(time = as.numeric(time)/frames_per_time,
         delta_norm = (t - time) / fade_time,
         display = if_else(abs(delta_norm) > 1, 0, cos(pi / 2 * delta_norm)))
# Create animation
g <- ggplot(df2, aes(x = x, y = y, alpha = display, size = display))
g <- g + geom_point(colour = "#FF3300", shape = 19)
g <- g + scale_alpha(range = c(0, 1)) + scale_size_area(max_size = 5)
g <- g + labs(title = "{round(frame_time, 1)}")
g <- g + transition_time(time)
animate(g)
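The easing used for display can be looked at on its own. The following base-R sketch wraps the same formula in a function (fade_weight is a hypothetical name, not part of gganimate) and evaluates it for a point with t = 5 at several frame times:

```r
# Hypothetical helper reproducing the display formula from the prep table:
# 1 when time equals t, falling to 0 once |t - time| reaches fade_time
fade_weight <- function(t, time, fade_time = 1) {
  delta_norm <- (t - time) / fade_time
  ifelse(abs(delta_norm) > 1, 0, cos(pi / 2 * delta_norm))
}

# A point with t = 5, viewed at frame times around it
times <- c(3.5, 4, 4.5, 5, 5.5, 6, 6.5)
print(data.frame(time = times, display = fade_weight(5, times)))
```

A larger fade_time widens the window over which each point is visible, which should make the fade slower.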

stat_summary_bin at x-axis ticks

How do I make ggplot's stat_summary_bin do the summaries at specified x-axis values? For example, in the plot below I'd like the summaries at x = -1, 0, 1, 2, etc. Currently they are at -1.5, -0.5, 0.5, and 1.5.
df = data.frame(x=rnorm(1000), y=rnorm(1000))
ggplot(df, aes(x=x, y=y)) +
stat_summary_bin(binwidth=1)
This is a known bug in ggplot2 that was only fixed in late July. A workaround, from the bug report:
df <- data.frame(x = rnorm(1000), y = rnorm(1000))
ggplot(df, aes(x = cut_width(x, 1, center = 0), y = y)) +
stat_summary_bin(binwidth = 1)
And, to fix the x axis:
ggplot(df, aes(x = cut_width(x, 1, center = 0), y = y)) +
stat_summary_bin(binwidth = 1) +
scale_x_discrete(labels = -3:3, name = 'x')
The breaks argument will be available in the next version of ggplot2, which should be released within the next few weeks.
If you want to run the development version of ggplot2 now, you should be able to do so via:
devtools::install_github("tidyverse/ggplot2")
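To check what the summaries at integer x values should look like, the binning can be reproduced in base R. This is a sketch under the assumption that bins of width 1 centred on 0 are wanted, in which case each point's bin centre is just x rounded to the nearest integer (the centre column and bin_means are illustrative names):

```r
set.seed(1)
df <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Width-1 bins centred on 0 are centred on the integers, so the bin centre
# of each point is its x value rounded to the nearest integer
df$centre <- round(df$x)

# Mean of y within each bin, named by bin centre
bin_means <- tapply(df$y, df$centre, mean)
print(bin_means)
```

These means can be compared against the points drawn by the cut_width() workaround above.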

How to annotate lines like this in ggplot2?

See example:
I hope I don't need to manually assign the coordinates of the text labels. If this is too complicated to achieve in ggplot2, what are the alternatives in R? Or maybe even not in R?
As @Axeman says, ggrepel is a decent option. Unfortunately it will only avoid overlap with other labels, and not the lines, so the solution isn't quite perfect.
library(ggplot2)
install.packages("ggrepel")
library(ggrepel)
set.seed(50)
d <- data.frame(y = c(rnorm(50), rnorm(50, 5), rnorm(50, 10)),
x = rep(seq(50), times = 3),
group = rep(LETTERS[seq(3)], each = 50))
ggplot(d, aes(x, y, group = group, label = group)) +
geom_line() +
geom_text_repel(data = d[d$x == sample(d$x, 1), ], size = 10)
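The code above labels each line at one randomly chosen x value. A variation, sketched here with base R only and not taken from the answer, is to pick the last point of each line so the labels sit at the right-hand end (ends is an illustrative name):

```r
set.seed(50)
d <- data.frame(y = c(rnorm(50), rnorm(50, 5), rnorm(50, 10)),
                x = rep(seq(50), times = 3),
                group = rep(LETTERS[seq(3)], each = 50))

# One row per line: the point with the largest x in each group
ends <- d[d$x == max(d$x), ]
print(ends)
```

The ends data frame could then be supplied as the data argument of geom_text_repel() in place of the random subset, still without typing any coordinates by hand.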
