stat_summary_bin at x-axis ticks - r

How do I make ggplot's stat_summary_bin compute the summaries at specified x-axis values? For example, in the plot below I'd like the summaries at x = -1, 0, 1, 2, etc. Currently they are at -1.5, -0.5, 0.5, and 1.5.
library(ggplot2)
df = data.frame(x=rnorm(1000), y=rnorm(1000))
ggplot(df, aes(x=x, y=y)) +
stat_summary_bin(binwidth=1)

This is a known bug in ggplot2 that was only fixed in late July. A workaround, from the bug report:
df <- data.frame(x = rnorm(1000), y = rnorm(1000))
ggplot(df, aes(x = cut_width(x, 1, center = 0), y = y)) +
stat_summary_bin(binwidth = 1)
And, to fix the x axis:
ggplot(df, aes(x = cut_width(x, 1, center = 0), y = y)) +
stat_summary_bin(binwidth = 1) +
scale_x_discrete(labels = -3:3, name = 'x')
The breaks argument will be available in the next version of ggplot2, which should be released within the next few weeks.
If you want to run the development version of ggplot2 now, you should be able to do so via:
devtools::install_github("tidyverse/ggplot2")
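With a ggplot2 version that has the breaks argument, the bins can be centred on the integers by placing the bin edges at the half-integers. The following is a minimal sketch under that assumption; the edge range of -5.5 to 5.5 and the explicit fun.data = mean_se are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(42)  # make the simulated data reproducible
df <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Bin edges at the half-integers, so each bin is centred on an integer.
# The range is an arbitrary choice that is wide enough for this data.
edges <- seq(-5.5, 5.5, by = 1)

p <- ggplot(df, aes(x = x, y = y)) +
  stat_summary_bin(fun.data = mean_se, breaks = edges)

# x positions at which the summaries were computed
summary_x <- layer_data(p)$x
print(summary_x)
```

Printing p draws the plot; layer_data() is only used here to check where the summaries ended up.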

Related

Plot rectangles using geom_rect with continuous x-axis and discrete values in y-axis (R)

I am trying to plot rectangles in the x-axis for different classes in the y-axis. I want to do this with geom_rect, but I don't want to use y_min and y_max since I want these to be determined by the classes (i.e. factors) I have in my data.
I managed to get the plot I want changing the breaks and the tick labels manually, but I am sure there must be a better way to do this.
Small toy example:
data <- data.frame(x_start = c(0, 2, 4, 6),
x_end = c(1, 3, 5, 7),
y_start = c(0, 0, 2, 2),
y_end = c(1, 1, 3, 3),
info = c("x", "x", "y", "y"))
Original plot:
ggplot(data, aes(xmin=x_start, xmax=x_end, ymin=y_start, ymax=y_end, fill=info)) + geom_rect()
Plot that I want:
ggplot(data, aes(xmin=x_start, xmax=x_end, ymin=y_start, ymax=y_end, fill=info)) + geom_rect() +
scale_y_continuous(breaks = c(0.5,2.5), labels = c("x","y"))
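One option is to compute the break positions and labels from the data instead of typing them by hand:
library(dplyr)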
y_lab <- data %>%
distinct(y_end, y_start, info) %>%
mutate(y_mid = (y_end + y_start)/2)
ggplot(data, aes(xmin=x_start, xmax=x_end, ymin=y_start, ymax=y_end, fill=info)) +
geom_rect() +
scale_y_continuous(breaks = y_lab$y_mid, labels = y_lab$info)
Or using geom_tile:
ggplot(data, aes(x = (x_start + x_end)/2, y = info, fill=info, width = 1)) +
geom_tile()

How can I overlay a boxplot with a reference line

This is a question about ggplot. The context is data from bootstrapped resamples to be compared with a hypothetical distribution. After box-plotting the bootstrapped data, I would like to overlay a line of expected proportions. The ggplot code below produces:
Error: Aesthetics must be either length 1 or the same as the data (20): y
library(data.table)
library(ggplot2)
boot1 <- data.table(digit = 1, prop = runif(10, 0.25, 0.35))
boot2 <- data.table(digit = 2, prop = runif(10, 0.12, 0.25))
boots <- rbindlist(list(boot1, boot2))
ggplot(boots, aes(x = as.factor(digit), y = prop)) +
geom_boxplot() +
geom_line(aes(x = as.factor(digit), y = c(0.3, 0.17)))
In a realistic example, the y values of the line plot would use the values produced by a non-linear function.
Thank you for your attention.
For your example you can try geom_segment(), because what you want is a set of separate segments rather than a continuous line. Each factor level is encoded as 1, 2, 3, ... on the x-axis, so if you have 3 categories you need to create a data frame with digit = 1:3. For the two categories here:
mean_data = data.frame(digit = 1:2,prop = c(0.3,0.17))
ggplot(boots, aes(x = factor(digit), y = prop)) +
geom_boxplot() +
geom_segment(data = mean_data,
aes(x = digit - 0.3,xend = digit + 0.3,y=prop,yend=prop),col="blue")
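The question mentions that the expected proportions would come from a non-linear function in a realistic case. A minimal sketch of that variant follows; expected_prop() is a hypothetical stand-in (a first-digit law, log10(1 + 1/d), chosen only because it gives values close to 0.3 and 0.17), and boots is rebuilt as a plain data frame so the example is self-contained.

```r
library(ggplot2)

set.seed(1)  # reproducible toy data, same shape as in the question
boots <- data.frame(
  digit = rep(1:2, each = 10),
  prop  = c(runif(10, 0.25, 0.35), runif(10, 0.12, 0.25))
)

# Hypothetical non-linear function giving the expected proportion per digit
expected_prop <- function(d) log10(1 + 1 / d)

# One row per category, with the reference value computed by the function
mean_data <- data.frame(digit = sort(unique(boots$digit)))
mean_data$prop <- expected_prop(mean_data$digit)

p <- ggplot(boots, aes(x = factor(digit), y = prop)) +
  geom_boxplot() +
  geom_segment(data = mean_data,
               aes(x = digit - 0.3, xend = digit + 0.3, y = prop, yend = prop),
               colour = "blue", inherit.aes = FALSE)
```

Because mean_data is derived from the categories present in boots, the same code works unchanged when more digits are added.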
As another spin on the segmentation approach, I tried geom_curve with intervals equal to my x-axis categories.
+ geom_curve(x = 1, y = 0.3, xend = 2, yend = 0.17, curvature = 0.1, color = 2)
It's not elegant, particularly with multiple categories. Thank you @StupidWolf for the assistance.

Make ggplot2 write the order of magnitude of the axis label only once at the top

I would like to make ggplot2 write only the first part of the scientific notation onto the axis and then add a $x 10^n$ atop the axis for the order of magnitude. Is there a function to do this?
Here is a MWE with a hack to show what I mean:
ggplot(data = data.frame(x = 1:10, y = seq(1, 2, l = 10)*1000), aes(x,y)) + geom_line()
while I'd like something like:
ggplot(data = data.frame(x = 1:10, y = seq(1, 2, l = 10)*1000), aes(x,y)) + geom_line() +
scale_y_continuous(breaks = c(1, 1.25, 1.5, 1.75, 2, 2.05)*1000, label = c(1, 1.25, 1.5, 1.75, 2, "x 10^3"))
As a side question, I have noticed that the axis label quickly becomes too close to the tick labels when they are large. Is there a way to set an automatic spacing between them?
Here's a more automated way to execute your hack, in case you want to use similar labeling rules for different data. It will figure out an appropriate power of 10 to use and apply that to the labeling:
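library(ggplot2)
library(scales)  # pretty_breaks()
library(dplyr)   # if_else()
data = data.frame(x = 1:10, y = seq(1, 2, l = 10) * 1000)  # same data as in the question
y_breaks = pretty_breaks()(data$y)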
y_max_exp = floor(log10(max(y_breaks)))
y_breaks = c(y_breaks, max(y_breaks) * 1.025)
y_labels = if_else(y_breaks == max(y_breaks),
paste0("x 10^", y_max_exp),
as.character(y_breaks / (10^y_max_exp)))
ggplot(data, aes(x,y)) + geom_line() +
scale_y_continuous(breaks = y_breaks, label = y_labels, minor_breaks = NULL)
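For the side question about the axis title sitting too close to large tick labels, one possibility is to widen the margin of the axis title in the theme. This is a sketch rather than an automatic solution: the 15 pt value is an arbitrary assumption and would need tuning by hand, and the data frame d is made up for illustration.

```r
library(ggplot2)

# Illustrative data with large y values, so the tick labels are wide
d <- data.frame(x = 1:10, y = seq(1, 2, length.out = 10) * 1e6)

p <- ggplot(d, aes(x, y)) +
  geom_line() +
  # Add 15 pt of space to the right of the y-axis title,
  # i.e. between the title and the tick labels
  theme(axis.title.y = element_text(margin = margin(r = 15)))
```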

ggplot2 annotation with dates in x-axis

I'm trying to draw an area plot with a series of % value, one for each day during a set period. I would like to add a segment to the top of the plot to show more clearly the areas where the % is decreasing.
I tried to use this code (the example has just a few data points for simplicity):
library(ggplot2)
library(scales)
limit = c(0.85,0.87,0.88,0.90,0.72,0.74)
day <- as.Date(strptime((seq(20150201,20150206,1)),format = "%Y%m%d"))
dati = data.frame("Day" = day, "Limit" = limit)
g <- ggplot(data = dati, aes(Day, Limit))
g <- g + geom_area(fill = "dark red")
g <- g + coord_cartesian(ylim = c(0,1))
g <- g + scale_y_continuous(labels=percent)
g <- g + annotate("segment", y= 1, yend = 1, x = dati[3, "Day"], xend = dati[4, "Day"])
print(g)
But I get this error: Error: / not defined for "Date" objects
Any ideas on how to solve this?
I already checked How to use ggplot2's annotate with dates in x-axis?, but it appears the bug is back. Plus I'd like to do this without using the lubridate package.
Wrapping my earlier comment into an answer: use geom_segment instead.
+ geom_segment(y = 1, yend = 1,
x = as.numeric(dati[3, "Day"]), xend = as.numeric(dati[4, "Day"]))
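Since the goal is to mark the stretches where the percentage decreases, the segment endpoints can also be derived from the data and passed to geom_segment as a data frame. The following is a sketch under that reading of the question; the names drops and segs are introduced here for illustration, and the Date column is mapped directly rather than converted with as.numeric.

```r
library(ggplot2)
library(scales)

dati <- data.frame(
  Day   = as.Date("2015-02-01") + 0:5,
  Limit = c(0.85, 0.87, 0.88, 0.90, 0.72, 0.74)
)

# Indices i where the value falls between day i and day i + 1
drops <- which(diff(dati$Limit) < 0)

# One horizontal segment at y = 1 per decreasing step
segs <- data.frame(
  x    = dati$Day[drops],
  xend = dati$Day[drops + 1],
  y    = rep(1, length(drops)),
  yend = rep(1, length(drops))
)

g <- ggplot(dati, aes(Day, Limit)) +
  geom_area(fill = "darkred") +
  geom_segment(data = segs,
               aes(x = x, xend = xend, y = y, yend = yend),
               inherit.aes = FALSE) +
  scale_y_continuous(labels = percent) +
  coord_cartesian(ylim = c(0, 1))
```

For the sample data there is a single drop, from 2015-02-04 to 2015-02-05, so segs has one row.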

Show two measurement units on axis ticks in ggplot2

How is it possible (if at all) to show two alternative units on axis ticks in ggplot2?
What I would like to achieve is something like this:
Here's a hacky way of doing that:
d = data.frame(x = 1:20, y = rnorm(20, 5, 5))
ggplot(data = d, aes(x = x, y = y)) +
scale_x_continuous(breaks = c(1:20, seq(2.54, 20, 2.54)),
labels = c(1:20, paste0("\n", 1:as.integer(20/2.54), "\""))) +
geom_point()
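A less hacky alternative, if the second unit may sit on the opposite side of the plot rather than on the same ticks, is a secondary axis. This sketch assumes, as the 2.54 factor above suggests, that x is measured in centimetres and the second unit is inches; the axis names are illustrative.

```r
library(ggplot2)

set.seed(1)  # reproducible toy data
d <- data.frame(x = 1:20, y = rnorm(20, 5, 5))

p <- ggplot(data = d, aes(x = x, y = y)) +
  geom_point() +
  scale_x_continuous(
    name = "cm",
    # Secondary axis on top, derived from the primary one: inches = cm / 2.54
    sec.axis = sec_axis(~ . / 2.54, name = "inches")
  )
```

The secondary axis is always a transformation of the primary one, so the two stay consistent when the data range changes.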
