Plot multiple series of data into a single bagplot with R - r

Let's condsider the bagplot example as included in the aplpack library in R. A bagplot is a bivariate generalisation of a boxplot and therefore gives insight in the distribution of data points in both axes.
Example of a bagplot:
Code for the example:
# example of Rousseeuw et al., see R-package rpart
cardata <- structure(as.integer( c(2560,2345,1845,2260,2440,
2285, 2275, 2350, 2295, 1900, 2390, 2075, 2330, 3320, 2885,
3310, 2695, 2170, 2710, 2775, 2840, 2485, 2670, 2640, 2655,
3065, 2750, 2920, 2780, 2745, 3110, 2920, 2645, 2575, 2935,
2920, 2985, 3265, 2880, 2975, 3450, 3145, 3190, 3610, 2885,
3480, 3200, 2765, 3220, 3480, 3325, 3855, 3850, 3195, 3735,
3665, 3735, 3415, 3185, 3690, 97, 114, 81, 91, 113, 97, 97,
98, 109, 73, 97, 89, 109, 305, 153, 302, 133, 97, 125, 146,
107, 109, 121, 151, 133, 181, 141, 132, 133, 122, 181, 146,
151, 116, 135, 122, 141, 163, 151, 153, 202, 180, 182, 232,
143, 180, 180, 151, 189, 180, 231, 305, 302, 151, 202, 182,
181, 143, 146, 146)), .Dim = as.integer(c(60, 2)),
.Dimnames = list(NULL, c("Weight", "Disp.")))
bagplot(cardata,factor=3,show.baghull=TRUE,
show.loophull=TRUE,precision=1,dkmethod=2)
title("car data Chambers/Hastie 1992")
# points of y=x*x
bagplot(x=1:30,y=(1:30)^2,verbose=FALSE,dkmethod=2)
The bagplot of aplpack seems to only support plotting a "bag" for a single data series. Even more interesting would be to plot two (or three) data series within a single bagplot, where visually comparing the "bags" of the data series gives insight in the differences in the data distributions of the data series. Does anyone know if (and if so, how) this can be done in R?

If we modify some of the aplpack::bagplot code we can make a new geom for ggplot2. Then we can compare groups within a dataset in the usual ggplot2 ways. Here's one example:
library(ggplot2)
ggplot(iris, aes(Sepal.Length, Sepal.Width,
colour = Species, fill = Species)) +
geom_bag() +
theme_minimal()
and we can show the points with the bagplot:
ggplot(iris, aes(Sepal.Length, Sepal.Width,
colour = Species, fill = Species)) +
geom_bag() +
geom_point() +
theme_minimal()
Here's the code for the geom_bag and modified aplpack::bagplot function: https://gist.github.com/benmarwick/00772ccea2dd0b0f1745

Related

time series equation in R

I have data that looks similar to the following example data and I'm looking for a way to fit an equation that i can use on other data with similar profiles but might be higher or lower.
structure(list(day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132,
133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145,
146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158,
159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171,
172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184,
185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197,
198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210,
211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236,
237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249,
250, 251, 252, 253), Count = c(10, 50, 500, 425, 300, 400, 275,
98, 115, 79, 87, 114, 69, 105, 81, 82, 117, 87, 123, 81, 119,
97, 84, 124, 122, 53, 114, 95, 49, 95, 101, 114, 74, 120, 72,
61, 79, 59, 96, 95, 105, 53, 110, 69, 69, 79, 106, 52, 50, 98,
102, 107, 122, 108, 47, 68, 51, 114, 96, 102, 121, 113, 130,
134, 143, 144, 141, 139, 140, 142, 141, 125, 134, 130, 137, 139,
123, 138, 108, 133, 97, 122, 120, 110, 144, 121, 103, 127, 103,
100, 139, 138, 103, 105, 114, 142, 128, 141, 141, 122, 110, 125,
112, 98, 130, 116, 138, 120, 135, 143, 136, 145, 101, 120, 131,
119, 131, 116, 114, 143, 126, 102, 116, 106, 133, 110, 102, 141,
141, 132, 110, 95, 130, 133, 131, 128, 103, 111, 120, 140, 107,
114, 95, 113, 116, 131, 145, 144, 121, 111, 100, 145, 96, 130,
95, 119, 135, 127, 113, 105, 110, 102, 105, 116, 145, 115, 102,
120, 143, 140, 141, 132, 143, 136, 108, 106, 127, 112, 122, 118,
112, 96, 116, 141, 162, 168, 198, 156, 165, 180, 179, 166, 194,
194, 162, 199, 156, 193, 200, 160, 160, 187, 150, 185, 161, 183,
166, 167, 199, 159, 146, 195, 151, 161, 161, 162, 167, 193, 191,
181, 148, 200, 182, 164, 147, 182, 165, 165, 159, 163, 188, 154,
192, 157, 149, 163, 170, 151, 185, 168, 154, 164, 191, 169, 186,
157, 182, 195, 150, 145, 152, 188, 176)), row.names = c(NA, -253L
), class = c("tbl_df", "tbl", "data.frame"))
The red line is an example of what an equation might look like. Very rough drawing.
I think what you may be looking for is a generalized additive model (GAM), which is often used to model nonlinear data like time. Here I have saved your dput as data and fit it to a GAM below. First, we can load the mgcv package for the GAM fit.
#### Load Library ####
library(mgcv)
Then you fit the GAM. This can be a very complex topic, and I advise reading a lot on this, but essentially you fit the regression in a similar manner you are probably used to if you have done regression in R before. The only difference is what spline terms you add to the regression, or the nonlinear functions that approximate the relationship between x and y. Here I have just fit a cubic regression spline using the s function for the spline, "day" as the variable, and bs = "cr" for the cubic regression spline. I also use REML here, recommended by a lot of GAM experts, to automatically adjust the knots and smoothing parameters. This can be customized a lot, but for simplicity I leave it alone here.
#### Fit GAM ####
fit <- gam(
Count ~ s(day, bs = "cr"),
method = "REML",
data=data
)
The results can be run here:
#### Summary ####
summary(fit)
As seen below. Here you see the intercept is listed like typical regression summaries. Now you have an additional "Approximate significance of smooth terms" section, which lists some useful metrics for your smoothing term. EDF is how curvilinear it is, and Ref.df & F are used for the significance test, seen to the far right. In this case, the smoothing term is significant. There are also many model metrics on the bottom that are worth observing:
Family: gaussian
Link function: identity
Formula:
Count ~ s(day, bs = "cr")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 133.538 2.504 53.32 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(day) 8.037 8.751 17.93 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.382 Deviance explained = 40.2%
-REML = 1298.7 Scale est. = 1586.9 n = 253
Technically we can write an equation based off this knowledge, but the difference with GAMs is that each spline fit sets separate coefficients for each part of the nonlinear trend, so its not entirely useful for nonlinear data (some reasons are given here). For example, if I want all of the coefficients for a linear equation, I can run coef(fit) and get this very long list:
(Intercept) s(day).1 s(day).2 s(day).3 s(day).4 s(day).5
133.537549 -83.413590 -54.926693 -35.398280 -38.849985 -41.564495
s(day).6 s(day).7 s(day).8 s(day).9
-38.991790 9.101440 4.764924 24.764163
Plotting the data can be done with the below function and is a much better approximation of the regression fit:
#### Plot Fit ####
plot(fit)
Which shows the data fit with its spline and standard error, along with a rug showing the data points with lines on the x axis. This plotting can be customized a lot too, especially with the gratia package, but I leave it here as is. In any case, the interpretation from the plot is far more clear...counts initially decrease a ton, then rebound slightly, plateau for some time, then rebound again before plateauing again.
Hope that is helpful and I recommend reading a lot on this topic. I have included some links to some really useful primers on the subject below.
Citations
Simpson, 2018: GAMs article. This covers mostly fixed effect versions, which you are probably more likely to use.
Pedersen et al., 2019: GAMMs article. This covers some random effects parts too, which may be difficult to understand unless you know more about mixed models.
This book is also a canonical reference to GAMs that is a lot more comprehensive, but I find it is a difficult read and not the best source for beginners.
I am not sure if i got it right,
are you looking for the code to make this kind of plot?
install tidyverse (a collection of packages) and then -
Add this piece of code:
# your code first and then use the pipe operator '%>%':
%>%
ggplot(aes(x = day, y = Count))+geom_line()

Multiple groupings in ggplot2 [duplicate]

I would like to display months (in abbreviated form) along the horizontal axis, with the corresponding year printed once. I know how to display month-year:
The un-needed repetition of the year clutters the labels. Instead I would like something like this:
except that the year would be printed below the months.
I printed the year above the axis labels, because that's the best I could do. This follows a limitation of the annotate() function, which gets clipped if it lies outside of the plot area. I am aware of possible workarounds based on annotate_custom(), but I couldn't make them to work with date objects (I did not try to convert dates to numbers and back to dates again, as it seemed more complicated than hopefully necessary)
I'm wondering if the new dup_axis() could be hijacked for this purpose. If instead of sending the duplicated axis to the opposite side of the panel, it could send it a few lines below the duplicated axis, then perhaps it would just be a matter of setting up one axis with panel.grid.major blanked out and the labels set to %b, while the other axis would have panel.grid.minor blanked out and the labels set to %Y. (an added challenge is that the year labels would be shifted to October instead of January)
These questions are related. However, the annotate_custom() function and textGrob() functions do not play well with dates, as far as I can tell.
how-can-i-add-annotations-below-the-x-axis-in-ggplot2
displaying-text-below-the-plot-generated-by-ggplot2
Data and basic code below:
library("ggplot2")
library("scales")
ggplot(data = df, aes(x = Date, y = value)) + geom_line() +
scale_x_date(date_breaks = "2 month", date_minor_breaks = "1 month", labels = date_format("%b %Y")) +
xlab(NULL)
ggplot(data = df, aes(x = Date, y = value)) + geom_line() +
scale_x_date(date_minor_breaks = "2 month", labels = date_format("%b")) +
annotate(geom = "text", x = as.Date("1719-10-01"), y = 0, label = "1719") +
annotate(geom = "text", x = as.Date("1720-10-01"), y = 0, label = "1720") +
xlab(NULL)
# data
df <- structure(list(Date = structure(c(-91455, -91454, -91453, -91452,
-91451, -91450, -91448, -91447, -91446, -91445, -91444, -91443,
-91441, -91440, -91439, -91438, -91437, -91436, -91434, -91433,
-91431, -91430, -91429, -91427, -91426, -91425, -91424, -91423,
-91422, -91420, -91419, -91418, -91417, -91416, -91415, -91413,
-91412, -91411, -91410, -91409, -91408, -91406, -91405, -91404,
-91403, -91402, -91401, -91399, -91398, -91397, -91396, -91395,
-91394, -91392, -91391, -91390, -91389, -91388, -91387, -91385,
-91384, -91382, -91381, -91380, -91379, -91377, -91376, -91375,
-91374, -91373, -91372, -91371, -91370, -91369, -91368, -91367,
-91366, -91364, -91363, -91362, -91361, -91360, -91359, -91357,
-91356, -91355, -91354, -91353, -91352, -91350, -91349, -91348,
-91347, -91346, -91345, -91343, -91342, -91341, -91340, -91339,
-91338, -91336, -91335, -91334, -91333, -91332, -91331, -91329,
-91328, -91327, -91326, -91325, -91324, -91322, -91321, -91320,
-91319, -91315, -91314, -91313, -91312, -91311, -91310, -91308,
-91307, -91306, -91305, -91304, -91303, -91301, -91300, -91299,
-91298, -91297, -91296, -91294, -91293, -91292, -91291, -91290,
-91289, -91287, -91286, -91285, -91284, -91283, -91282, -91280,
-91279, -91278, -91277, -91276, -91275, -91273, -91272, -91271,
-91270, -91269, -91268, -91266, -91265, -91264, -91263, -91262,
-91261, -91259, -91258, -91257, -91256, -91255, -91254, -91252,
-91251, -91250, -91249, -91248, -91247, -91245, -91244, -91243,
-91242, -91241, -91240, -91238, -91237, -91236, -91235, -91234,
-91233, -91231, -91230, -91229, -91228, -91227, -91226, -91224,
-91223, -91222, -91221, -91220, -91219, -91217, -91216, -91215,
-91214, -91213, -91212, -91210, -91209, -91208, -91207, -91205,
-91201, -91200, -91199, -91198, -91196, -91195, -91194, -91193,
-91192, -91191, -91189, -91188, -91187, -91186, -91185, -91184,
-91182, -91181, -91180, -91179, -91178, -91177, -91175, -91174,
-91173, -91172, -91171, -91170, -91168, -91167, -91166, -91165,
-91164, -91163, -91161, -91160, -91159, -91158, -91157, -91156,
-91154, -91153, -91152, -91151, -91150, -91149, -91147, -91146,
-91145, -91144, -91143, -91142, -91140, -91139, -91138, -91131,
-91130, -91129, -91128, -91126, -91125, -91124, -91123, -91122,
-91121, -91119, -91118, -91117, -91116, -91115, -91114, -91112,
-91111, -91110, -91109, -91108, -91107, -91104, -91103, -91102,
-91101, -91100, -91099, -91097, -91096, -91095, -91094, -91093,
-91091, -91090, -91089, -91088, -91087, -91086, -91084, -91083,
-91082, -91081, -91080, -91079, -91077, -91076, -91075, -91074,
-91073, -91072, -91070, -91069, -91068, -91065, -91063, -91062,
-91061, -91060, -91059, -91058, -91056, -91055, -91054, -91053,
-91052, -91051, -91049, -91048, -91047, -91046, -91045, -91044,
-91042, -91041, -91040, -91039, -91038, -91037, -91035, -91034,
-91033, -91032, -91031, -91030, -91028, -91027, -91026, -91025,
-91024, -91023, -91021, -91020, -91019, -91018, -91017, -91016,
-91014, -91013, -91012, -91011, -91010, -91009, -91007, -91006,
-91005, -91004, -91003, -91002, -91000, -90999, -90998, -90997,
-90996, -90995, -90993, -90992, -90991, -90990, -90989, -90988,
-90986, -90985, -90984, -90983, -90982), class = "Date"), value = c(113,
113, 113, 113, 114, 114, 114, 115, 115, 115, 116, 116, 116, 116,
117, 117, 117, 117, 116, 117, 116, 116, 116, 117, 117, 117, 117,
117, 117, 117, 116, 117, 116, 116, 116, 117, 117, 117, 117, 117,
117, 117, 116, 116, 117, 117, 117, 117, 117, 117, 117, 117, 117,
117, 117, 118, 118, 118, 118, 117, 118, 117, 117, 117, 117, 117,
117, 118, 116, 116, 116, 116, 116, 116, 116, 117, 117, 118, 118,
118, 118, 118, 119, 120, 120, 119, 119, 120, 120, 121, 121, 122,
124, 124, 122, 123, 124, 123, 123, 123, 123, 123, 124, 124, 126,
126, 126, 126, 126, 125, 125, 126, 127, 126, 126, 125, 126, 126,
126, 128, 128, 128, 130, 133, 131, 133, 134, 134, 134, 136, 136,
136, 135, 135, 135, 136, 136, 136, 136, 135, 135, 135, 135, 130,
129, 129, 130, 131, 136, 138, 155, 157, 161, 170, 174, 168, 165,
169, 171, 181, 184, 182, 179, 181, 179, 175, 177, 177, 174, 170,
174, 173, 178, 173, 178, 179, 182, 184, 184, 180, 181, 182, 182,
184, 184, 188, 195, 198, 220, 255, 275, 350, 310, 315, 320, 320,
316, 300, 310, 310, 320, 317, 313, 312, 310, 297, 285, 285, 286,
288, 315, 328, 338, 344, 345, 352, 352, 342, 335, 343, 340, 342,
339, 337, 336, 336, 342, 347, 352, 352, 351, 352, 352, 351, 352,
352, 355, 375, 400, 452, 487, 476, 475, 473, 485, 500, 530, 595,
720, 720, 770, 750, 770, 750, 735, 740, 745, 735, 700, 700, 750,
760, 755, 755, 760, 760, 765, 950, 950, 950, 875, 875, 875, 880,
880, 880, 900, 900, 900, 880, 880, 890, 895, 890, 880, 870, 870,
870, 870, 870, 860, 860, 860, 860, 850, 840, 810, 820, 810, 810,
805, 810, 805, 820, 815, 820, 805, 790, 800, 780, 760, 765, 750,
740, 820, 810, 800, 800, 775, 750, 810, 750, 740, 700, 705, 660,
630, 640, 595, 590, 570, 565, 535, 440, 400, 410, 400, 405, 390,
370, 300, 300, 180, 200, 310, 290, 260, 260, 275, 260, 270, 265,
255, 250, 210, 210, 200, 195, 210, 215, 240, 240, 220, 220, 220,
220, 210, 212, 208, 220, 210, 212, 208, 220, 215, 220, 214, 214,
213, 212, 210, 210, 195, 195, 160, 160, 175, 205, 210, 208, 197,
181, 185)), .Names = c("Date", "value"), row.names = c(NA, 393L
), class = "data.frame")
The code below provides two potential options for adding year labels.
Option 1a: Faceting
You could use faceting to mark the years. For example:
library(ggplot2)
library(lubridate)
ggplot(df, aes(Date, value)) +
geom_line() +
scale_x_date(date_labels="%b", date_breaks="month", expand=c(0,0)) +
facet_grid(~ year(Date), space="free_x", scales="free_x", switch="x") +
theme_bw() +
theme(strip.placement = "outside",
strip.background = element_rect(fill=NA,colour="grey50"),
panel.spacing=unit(0,"cm"))
Note that with this approach, if there are missing dates at the beginning or end of a year (by "missing", I mean rows for those dates are not even present in the data) then the x-axis will start/end at the first/last date in the data for that year, rather than go from Jan-1 to Dec-31. In that case, you'd need to add in rows for the missing dates and either NA for value or interpolate value. In addition, with this method there is no space or line between December 31 of one year and January 1 of the next year, so there's a discontinuity across each year.
Option 1b: Faceting + centered month labels
To address #AF7's comment. You can center the month labels by adding some spaces before each label. But you have to choose the number of spaces manually, depending on the physical size of the plot when you print it to a device. (There's probably a way to center the labels programmatically based on the internal grob measurements, but I'm not sure how to do it.) I've also removed the minor vertical gridlines and lightened the line between years.
ggplot(df, aes(Date, value)) +
geom_line() +
scale_x_date(date_labels=paste(c(rep(" ",11), "%b"), collapse=""),
date_breaks="month", expand=c(0,0)) +
facet_grid(~ year(Date), space="free_x", scales="free_x", switch="x") +
theme_bw() +
theme(strip.placement = "outside",
strip.background = element_blank(),
panel.grid.minor.x = element_blank(),
panel.border = element_rect(colour="grey70"),
panel.spacing=unit(0,"cm"))
Option 2a: Edit the x-axis label grob
Here's a more complex and finicky method (though it could likely be automated by someone who understands the structure and unit spacings of grid graphics better than I do) that avoids the pitfalls of the faceting method described above:
library(grid)
# Fake data with an extra year added for illustration
set.seed(2)
df = data.frame(Date=seq(as.Date("1718-03-01"),as.Date("1721-09-20"), by="1 day"))
df$value = cumsum(rnorm(nrow(df)))
# The plot we'll start with
p = ggplot(df, aes(Date, value)) +
geom_vline(xintercept=as.numeric(df$Date[yday(df$Date)==1]), colour="grey60") +
geom_line() +
scale_x_date(date_labels="%b", date_breaks="month", expand=c(0,0)) +
theme_bw() +
theme(panel.grid.minor.x = element_blank()) +
labs(x="")
Now we want to add the year values below and in between June and July of each year. The code below does that by modifying the x-axis label grob and is adapted from this SO answer by #SandyMuspratt.
# Get the grob
g <- ggplotGrob(p)
# Get the y axis
index <- which(g$layout$name == "axis-b") # Which grob
xaxis <- g$grobs[[index]]
# Get the ticks (labels and marks)
ticks <- xaxis$children[[2]]
# Get the labels
ticksB <- ticks$grobs[[2]]
# Edit x-axis label grob
# Find every index of Jun in the x-axis labels and add a newline and
# then a year label
junes = which(ticksB$children[[1]]$label == "Jun")
ticksB$children[[1]]$label[junes] = paste0(ticksB$children[[1]]$label[junes],
"\n ", unique(year(df$Date)))
# Put the edited labels back into the plot
ticks$grobs[[2]] <- ticksB
xaxis$children[[2]] <- ticks
g$grobs[[index]] <- xaxis
# Draw the plot
grid.newpage()
grid.draw(g)
Option 2b: Edit the x-axis label grob and center the month labels
Below is the only change that needs to be made to Option 2a to center the month labels, but, once again, the number of spaces needs to be tweaked manually.
# Make the edit
# Center the month labels between ticks
ticksB$children[[1]]$label = paste0(paste(rep(" ",7),collapse=""), ticksB$children[[1]]$label)
# Find every index of Jun in the x-axis labels and a year label
junes = grep("Jun", ticksB$children[[1]]$label)
ticksB$children[[1]]$label[junes] = paste0(ticksB$children[[1]]$label[junes], "\n ", unique(year(df$Date)))
I came upon this question and thought maybe I can add a solution. We can display both month and year in every year's first displayed month by using a simple condition. You can play with the date_breaks to remove January from the labels, and this will still work. I'm using month() and year() from lubridate.
library(tidyverse)
library(lubridate)
df %>%
ggplot(aes(Date, value)) +
geom_line() +
scale_x_date(date_breaks = "2 months",
labels = function(x) if_else(is.na(lag(x)) | !year(lag(x)) == year(x),
paste(month(x, label = TRUE), "\n", year(x)),
paste(month(x, label = TRUE))))
If you want to try to hack together a sub-label, you could convert it to a grob. I edited this from the original post to create a function that adds the sublabels and returns a gtable object. Note that the sublabs input must be the same length as your x-axis breaks:
library(grid)
library(gtable)
library(gridExtra)
add_sublabs <- function(plot, sublabs){
gg <- ggplotGrob(plot)
axis_num <- which(gg$layout[,"name"] == "axis-b")
xbreaks <- gg[["grobs"]][[axis_num]][["children"]][[2]][["grobs"]][[2]][["children"]][[1]]$x
if(length(xbreaks) != length(sublabs)) stop("Sub-labels must be the same length as the x-axis breaks")
to_breaks <- c(as.numeric(xbreaks),1)[which(!duplicated(sublabs, fromLast = TRUE))+1]
sublabs_x <- diff(c(0,to_breaks))
sublabs_labels <- sublabs[!duplicated(sublabs, fromLast = TRUE)]
tg <- tableGrob(matrix(sublabs_labels, nrow = 1))
tg$widths = unit(sublabs_x, attr(xbreaks,"unit"))
pos <- gg$layout[axis_num,c("t","l")]
gg2 <- gtable_add_rows(gg, heights = sum(tg$heights)+unit(4,"mm"), pos = pos$t)
gg3 <- gtable_add_grob(gg2, tg, t = pos$t+1, l = pos$l)
return(gg3)
}
#Plot and sublabels
p <- ggplot(data = df, aes(x = Date, y = value)) + geom_line() +
scale_x_date(date_breaks = "2 month", date_minor_breaks = "1 month", labels = date_format("%b")) +
xlab(NULL)
sublabs <- c(rep("1719",2),rep("1720",6))
#Draw
grid.draw(add_sublabs(p, sublabs))
One way to avoid the complexities would be to change the required output so that January is replaced by the year.
The lab function returns the labels given the breaks. Unexpectedly, ggplot will pass NAs to it so in the first line of the function body we replace those with some date -- it does not matter which date since such values are not subsequently used by ggplot. Finally we format the date as a year or abbreviated month depending on whether the month is January (which corresponds to the POSIXlt component mon equalling 0) or not.
library(ggplot2)
library(scales)
lab <- function(b) {
b[is.na(b)] <- Sys.Date()
format(b, ifelse(as.POSIXlt(b)$mon == 0, "%Y", "%b"))
}
ggplot(df, aes(Date, value)) +
geom_line() +
scale_x_date(date_breaks = "month", labels = lab)
Note: I have added Issue 2182 to the ggplot2 github issues list regarding the NAs that are passed to the label function. If subsequent versions of ggplot2 no longer pass the NAs then the first line of the body of lab could be omitted .
Update: fixed.

How to fit a bell-shaped curve on 2 dimensional scatter data?

I have x-y scatter data, which exhibit bell-shaped (i.e. normal distribution shaped) behaviour over the course of a year. These are primary production data from high latitudes (more in detail here, the article is paywalled, but I hope that the figures are visible).
Question
How do I fit a normal distribution shaped curve on scatter data in ggplot2?
Example data
x <- structure(list(yday = c(238, 238, 238, 242, 242, 250, 250, 253,
254, 169, 199, 208, 230, 21, 37, 88, 94, 102, 125, 125, 95, 98,
100, 101, 103, 93, 96, 97, 97, 99, 291, 300, 316, 332, 363, 12,
27, 49, 68, 256, 263, 263, 264, 265, 266, 127, 127, 127, 127,
133, 123, 127, 133, 127, 133, 141, 148, 155, 112, 120, 127, 134,
169, 169, 169, 169, 169, 124, 124, 124, 124, 126, 126, 127, 130,
132, 134, 134, 136, 138, 140, 142, 144, 146, 149, 152, 154, 127,
130, 132, 134, 134, 136, 138, 140, 142, 144, 146, 149, 152, 154,
127, 130, 132, 134, 134, 136, 138, 140, 142, 144, 146, 149, 152,
154, 127, 130, 132, 134, 134, 136, 138, 140, 142, 144, 146, 149,
152, 154, 127, 130, 132, 134, 134, 136, 138, 140, 142, 144, 146,
149, 152, 154, 146, 154, 154, 154, 143, 156, 134, 162, 164, 168,
166, 71, 38, 39), value = c(48802.2, 28869.36, 46370.31, 67936,
91442, 89559.4, 46862.2, 7895.3, 19660.72, 273540.7, 254615,
268930, 264310.56, 72561, 114520, 42950, 149151.15, 530610, 53289.2,
45, 1776, 20504, 1768, 5740, 29497, 340762, 259837, 11576, 847238,
1773275, 1555.92, 3108.48, 8579.1, 25677.8, 17697.32, 2887.56,
5311.2, 13127.98, 38006.4, 135128, 71003, 10454.75, 41389.6,
15266.5, 58601.7, 206984.918282083, 265165.058077198, 90485.4790849673,
5705618.16993847, 1616527.31316346, 1610059.4788107, 5689427.93092749,
3261840.85863376, 1911057.16943202, 1023812.55301328, 1579191.31813709,
2241683.51873045, 4398531.75676259, 933143.183151504, 1771596.51236257,
3000366.86522064, 1219826.2208944, 247538.595548984, 353927.523573691,
323722.062546854, 278081.544235635, 601042.642308546, 2317070.57555887,
963348.671707912, 8125168.04401668, 1860334.91955526, 18673.3716353477,
682901.426071428, 348046.291238703, 1387947.38534056, 112176.06673827,
203778.898538342, 304593.428222028, 1015454.26894711, 384172.102208766,
1211065.9345086, 580449.092224899, 556147.163209095, 707840.652723421,
2919016.89462558, 878518.35266303, 760837.632557093, 437441.609086177,
3761984.12905246, 1008524.7172583, 153914.10863321, 209919.739543153,
5165174.16501832, 592152.070785338, 754878.057858348, 548774.607567716,
784679.488265372, 1191547.05905346, 867977.806474748, 1601076.47417622,
1059665.24406883, 509654.672768973, 1878007.77720015, 217773.469093887,
282571.399726361, 98438.9397685662, 889753.057501427, 564416.438455766,
1843608.78521975, 727213.52622083, 689307.464580901, 1018069.45500141,
1188687.56383149, 1352651.53225745, 726363.223839249, 420446.302222222,
856363.289847527, 18015.1056535911, 229628.041636759, 165657.605285714,
164394.955219614, 510449.457665504, 1778093.57209278, 610564.603533888,
889187.420481627, 2762856.39975472, 863978.618292937, 1697540.61924469,
859284.971006319, 197822.92972973, 389791.010663156, 144870.825015753,
196128.64471631, 133023.96688172, 70787.1033258229, 518223.208383732,
4051834.77590046, 628912.36192445, 378818.831615793, 413839.100579421,
1091509.82410159, 1187325.98867099, 331226.406610866, 1729022.69104484,
4663215.78870189, 2843159.21140248, 708207.236363227, 1436498.03405122,
1158173.94324553, 448469.915666212, 901903.855484778, 1599625.0472896,
1141633.00553421, 728670.952878351, 123982.148723477, 112304.540084388,
4011.30312056738)), .Names = c("yday", "value"), row.names = c(NA,
-157L), class = "data.frame")
Example
ggplot(x, aes(x = yday, y = value)) + geom_point() + xlab("Day of year")
The "curve" has been added manually.
What have I tried
Previous works have used normal distribution, Weibull distribution and log-normal disrtibution. I obviously cannot fit a density distribution directly to these data, as the data are two dimensional. This is where I struggle. I could adjust the height of the density distributions, but this would not be model fitting, and R has to have a better way to do it. My initial feeling is that I need to formulate a curve from the above-mentioned distributions and use the nls to fit these curves. I am, however, a bit lost with how to formulate the equations and function calls. Once the nls call is formulated, it could be passed to the stat_function or geom_smooth layers. Any help would be appreciated.
You can easily fit the normal density function:
fit <- nls(value ~ a * dnorm(yday, mean, sd), data = x,
start = list(mean = 150, sd = 25, a = 1e8))
plot(value ~ yday, data = x)
curve(predict(fit, newdata = data.frame(yday = x)), from = 0, to = 400, add = TRUE)
If that is a sensible thing to to is a different question.

Axis labels on two lines with nested x variables (year below months)

I would like to display months (in abbreviated form) along the horizontal axis, with the corresponding year printed once. I know how to display month-year:
The un-needed repetition of the year clutters the labels. Instead I would like something like this:
except that the year would be printed below the months.
I printed the year above the axis labels, because that's the best I could do. This follows a limitation of the annotate() function, which gets clipped if it lies outside of the plot area. I am aware of possible workarounds based on annotate_custom(), but I couldn't make them to work with date objects (I did not try to convert dates to numbers and back to dates again, as it seemed more complicated than hopefully necessary)
I'm wondering if the new dup_axis() could be hijacked for this purpose. If instead of sending the duplicated axis to the opposite side of the panel, it could send it a few lines below the duplicated axis, then perhaps it would just be a matter of setting up one axis with panel.grid.major blanked out and the labels set to %b, while the other axis would have panel.grid.minor blanked out and the labels set to %Y. (an added challenge is that the year labels would be shifted to October instead of January)
These questions are related. However, the annotate_custom() function and textGrob() functions do not play well with dates, as far as I can tell.
how-can-i-add-annotations-below-the-x-axis-in-ggplot2
displaying-text-below-the-plot-generated-by-ggplot2
Data and basic code below:
library("ggplot2")
library("scales")
ggplot(data = df, aes(x = Date, y = value)) + geom_line() +
scale_x_date(date_breaks = "2 month", date_minor_breaks = "1 month", labels = date_format("%b %Y")) +
xlab(NULL)
ggplot(data = df, aes(x = Date, y = value)) + geom_line() +
scale_x_date(date_minor_breaks = "2 month", labels = date_format("%b")) +
annotate(geom = "text", x = as.Date("1719-10-01"), y = 0, label = "1719") +
annotate(geom = "text", x = as.Date("1720-10-01"), y = 0, label = "1720") +
xlab(NULL)
# data
df <- structure(list(Date = structure(c(-91455, -91454, -91453, -91452,
-91451, -91450, -91448, -91447, -91446, -91445, -91444, -91443,
-91441, -91440, -91439, -91438, -91437, -91436, -91434, -91433,
-91431, -91430, -91429, -91427, -91426, -91425, -91424, -91423,
-91422, -91420, -91419, -91418, -91417, -91416, -91415, -91413,
-91412, -91411, -91410, -91409, -91408, -91406, -91405, -91404,
-91403, -91402, -91401, -91399, -91398, -91397, -91396, -91395,
-91394, -91392, -91391, -91390, -91389, -91388, -91387, -91385,
-91384, -91382, -91381, -91380, -91379, -91377, -91376, -91375,
-91374, -91373, -91372, -91371, -91370, -91369, -91368, -91367,
-91366, -91364, -91363, -91362, -91361, -91360, -91359, -91357,
-91356, -91355, -91354, -91353, -91352, -91350, -91349, -91348,
-91347, -91346, -91345, -91343, -91342, -91341, -91340, -91339,
-91338, -91336, -91335, -91334, -91333, -91332, -91331, -91329,
-91328, -91327, -91326, -91325, -91324, -91322, -91321, -91320,
-91319, -91315, -91314, -91313, -91312, -91311, -91310, -91308,
-91307, -91306, -91305, -91304, -91303, -91301, -91300, -91299,
-91298, -91297, -91296, -91294, -91293, -91292, -91291, -91290,
-91289, -91287, -91286, -91285, -91284, -91283, -91282, -91280,
-91279, -91278, -91277, -91276, -91275, -91273, -91272, -91271,
-91270, -91269, -91268, -91266, -91265, -91264, -91263, -91262,
-91261, -91259, -91258, -91257, -91256, -91255, -91254, -91252,
-91251, -91250, -91249, -91248, -91247, -91245, -91244, -91243,
-91242, -91241, -91240, -91238, -91237, -91236, -91235, -91234,
-91233, -91231, -91230, -91229, -91228, -91227, -91226, -91224,
-91223, -91222, -91221, -91220, -91219, -91217, -91216, -91215,
-91214, -91213, -91212, -91210, -91209, -91208, -91207, -91205,
-91201, -91200, -91199, -91198, -91196, -91195, -91194, -91193,
-91192, -91191, -91189, -91188, -91187, -91186, -91185, -91184,
-91182, -91181, -91180, -91179, -91178, -91177, -91175, -91174,
-91173, -91172, -91171, -91170, -91168, -91167, -91166, -91165,
-91164, -91163, -91161, -91160, -91159, -91158, -91157, -91156,
-91154, -91153, -91152, -91151, -91150, -91149, -91147, -91146,
-91145, -91144, -91143, -91142, -91140, -91139, -91138, -91131,
-91130, -91129, -91128, -91126, -91125, -91124, -91123, -91122,
-91121, -91119, -91118, -91117, -91116, -91115, -91114, -91112,
-91111, -91110, -91109, -91108, -91107, -91104, -91103, -91102,
-91101, -91100, -91099, -91097, -91096, -91095, -91094, -91093,
-91091, -91090, -91089, -91088, -91087, -91086, -91084, -91083,
-91082, -91081, -91080, -91079, -91077, -91076, -91075, -91074,
-91073, -91072, -91070, -91069, -91068, -91065, -91063, -91062,
-91061, -91060, -91059, -91058, -91056, -91055, -91054, -91053,
-91052, -91051, -91049, -91048, -91047, -91046, -91045, -91044,
-91042, -91041, -91040, -91039, -91038, -91037, -91035, -91034,
-91033, -91032, -91031, -91030, -91028, -91027, -91026, -91025,
-91024, -91023, -91021, -91020, -91019, -91018, -91017, -91016,
-91014, -91013, -91012, -91011, -91010, -91009, -91007, -91006,
-91005, -91004, -91003, -91002, -91000, -90999, -90998, -90997,
-90996, -90995, -90993, -90992, -90991, -90990, -90989, -90988,
-90986, -90985, -90984, -90983, -90982), class = "Date"), value = c(113,
113, 113, 113, 114, 114, 114, 115, 115, 115, 116, 116, 116, 116,
117, 117, 117, 117, 116, 117, 116, 116, 116, 117, 117, 117, 117,
117, 117, 117, 116, 117, 116, 116, 116, 117, 117, 117, 117, 117,
117, 117, 116, 116, 117, 117, 117, 117, 117, 117, 117, 117, 117,
117, 117, 118, 118, 118, 118, 117, 118, 117, 117, 117, 117, 117,
117, 118, 116, 116, 116, 116, 116, 116, 116, 117, 117, 118, 118,
118, 118, 118, 119, 120, 120, 119, 119, 120, 120, 121, 121, 122,
124, 124, 122, 123, 124, 123, 123, 123, 123, 123, 124, 124, 126,
126, 126, 126, 126, 125, 125, 126, 127, 126, 126, 125, 126, 126,
126, 128, 128, 128, 130, 133, 131, 133, 134, 134, 134, 136, 136,
136, 135, 135, 135, 136, 136, 136, 136, 135, 135, 135, 135, 130,
129, 129, 130, 131, 136, 138, 155, 157, 161, 170, 174, 168, 165,
169, 171, 181, 184, 182, 179, 181, 179, 175, 177, 177, 174, 170,
174, 173, 178, 173, 178, 179, 182, 184, 184, 180, 181, 182, 182,
184, 184, 188, 195, 198, 220, 255, 275, 350, 310, 315, 320, 320,
316, 300, 310, 310, 320, 317, 313, 312, 310, 297, 285, 285, 286,
288, 315, 328, 338, 344, 345, 352, 352, 342, 335, 343, 340, 342,
339, 337, 336, 336, 342, 347, 352, 352, 351, 352, 352, 351, 352,
352, 355, 375, 400, 452, 487, 476, 475, 473, 485, 500, 530, 595,
720, 720, 770, 750, 770, 750, 735, 740, 745, 735, 700, 700, 750,
760, 755, 755, 760, 760, 765, 950, 950, 950, 875, 875, 875, 880,
880, 880, 900, 900, 900, 880, 880, 890, 895, 890, 880, 870, 870,
870, 870, 870, 860, 860, 860, 860, 850, 840, 810, 820, 810, 810,
805, 810, 805, 820, 815, 820, 805, 790, 800, 780, 760, 765, 750,
740, 820, 810, 800, 800, 775, 750, 810, 750, 740, 700, 705, 660,
630, 640, 595, 590, 570, 565, 535, 440, 400, 410, 400, 405, 390,
370, 300, 300, 180, 200, 310, 290, 260, 260, 275, 260, 270, 265,
255, 250, 210, 210, 200, 195, 210, 215, 240, 240, 220, 220, 220,
220, 210, 212, 208, 220, 210, 212, 208, 220, 215, 220, 214, 214,
213, 212, 210, 210, 195, 195, 160, 160, 175, 205, 210, 208, 197,
181, 185)), .Names = c("Date", "value"), row.names = c(NA, 393L
), class = "data.frame")
The code below provides two potential options for adding year labels.
Option 1a: Faceting
You could use faceting to mark the years. For example:
library(ggplot2)
library(lubridate)
ggplot(df, aes(Date, value)) +
geom_line() +
scale_x_date(date_labels="%b", date_breaks="month", expand=c(0,0)) +
facet_grid(~ year(Date), space="free_x", scales="free_x", switch="x") +
theme_bw() +
theme(strip.placement = "outside",
strip.background = element_rect(fill=NA,colour="grey50"),
panel.spacing=unit(0,"cm"))
Note that with this approach, if there are missing dates at the beginning or end of a year (by "missing", I mean rows for those dates are not even present in the data) then the x-axis will start/end at the first/last date in the data for that year, rather than go from Jan-1 to Dec-31. In that case, you'd need to add in rows for the missing dates and either NA for value or interpolate value. In addition, with this method there is no space or line between December 31 of one year and January 1 of the next year, so there's a discontinuity across each year.
Option 1b: Faceting + centered month labels
To address #AF7's comment. You can center the month labels by adding some spaces before each label. But you have to choose the number of spaces manually, depending on the physical size of the plot when you print it to a device. (There's probably a way to center the labels programmatically based on the internal grob measurements, but I'm not sure how to do it.) I've also removed the minor vertical gridlines and lightened the line between years.
ggplot(df, aes(Date, value)) +
geom_line() +
scale_x_date(date_labels=paste(c(rep(" ",11), "%b"), collapse=""),
date_breaks="month", expand=c(0,0)) +
facet_grid(~ year(Date), space="free_x", scales="free_x", switch="x") +
theme_bw() +
theme(strip.placement = "outside",
strip.background = element_blank(),
panel.grid.minor.x = element_blank(),
panel.border = element_rect(colour="grey70"),
panel.spacing=unit(0,"cm"))
Option 2a: Edit the x-axis label grob
Here's a more complex and finicky method (though it could likely be automated by someone who understands the structure and unit spacings of grid graphics better than I do) that avoids the pitfalls of the faceting method described above:
library(grid)
# Fake data with an extra year added for illustration
set.seed(2)
df = data.frame(Date=seq(as.Date("1718-03-01"),as.Date("1721-09-20"), by="1 day"))
df$value = cumsum(rnorm(nrow(df)))
# The plot we'll start with
p = ggplot(df, aes(Date, value)) +
geom_vline(xintercept=as.numeric(df$Date[yday(df$Date)==1]), colour="grey60") +
geom_line() +
scale_x_date(date_labels="%b", date_breaks="month", expand=c(0,0)) +
theme_bw() +
theme(panel.grid.minor.x = element_blank()) +
labs(x="")
Now we want to add the year values below and in between June and July of each year. The code below does that by modifying the x-axis label grob and is adapted from this SO answer by #SandyMuspratt.
# Get the grob
g <- ggplotGrob(p)
# Get the y axis
index <- which(g$layout$name == "axis-b") # Which grob
xaxis <- g$grobs[[index]]
# Get the ticks (labels and marks)
ticks <- xaxis$children[[2]]
# Get the labels
ticksB <- ticks$grobs[[2]]
# Edit x-axis label grob
# Find every index of Jun in the x-axis labels and add a newline and
# then a year label
junes = which(ticksB$children[[1]]$label == "Jun")
ticksB$children[[1]]$label[junes] = paste0(ticksB$children[[1]]$label[junes],
"\n ", unique(year(df$Date)))
# Put the edited labels back into the plot
ticks$grobs[[2]] <- ticksB
xaxis$children[[2]] <- ticks
g$grobs[[index]] <- xaxis
# Draw the plot
grid.newpage()
grid.draw(g)
Option 2b: Edit the x-axis label grob and center the month labels
Below is the only change that needs to be made to Option 2a to center the month labels, but, once again, the number of spaces needs to be tweaked manually.
# Make the edit
# Center the month labels between ticks
ticksB$children[[1]]$label = paste0(paste(rep(" ",7),collapse=""), ticksB$children[[1]]$label)
# Find every index of Jun in the x-axis labels and a year label
junes = grep("Jun", ticksB$children[[1]]$label)
ticksB$children[[1]]$label[junes] = paste0(ticksB$children[[1]]$label[junes], "\n ", unique(year(df$Date)))
I came upon this question and thought maybe I can add a solution. We can display both month and year in every year's first displayed month by using a simple condition. You can play with the date_breaks to remove January from the labels, and this will still work. I'm using month() and year() from lubridate.
library(tidyverse)
library(lubridate)
df %>%
ggplot(aes(Date, value)) +
geom_line() +
scale_x_date(date_breaks = "2 months",
labels = function(x) if_else(is.na(lag(x)) | !year(lag(x)) == year(x),
paste(month(x, label = TRUE), "\n", year(x)),
paste(month(x, label = TRUE))))
If you want to try to hack together a sub-label, you could convert it to a grob. I edited this from the original post to create a function that adds the sublabels and returns a gtable object. Note that the sublabs input must be the same length as your x-axis breaks:
library(grid)
library(gtable)
library(gridExtra)
add_sublabs <- function(plot, sublabs){
gg <- ggplotGrob(plot)
axis_num <- which(gg$layout[,"name"] == "axis-b")
xbreaks <- gg[["grobs"]][[axis_num]][["children"]][[2]][["grobs"]][[2]][["children"]][[1]]$x
if(length(xbreaks) != length(sublabs)) stop("Sub-labels must be the same length as the x-axis breaks")
to_breaks <- c(as.numeric(xbreaks),1)[which(!duplicated(sublabs, fromLast = TRUE))+1]
sublabs_x <- diff(c(0,to_breaks))
sublabs_labels <- sublabs[!duplicated(sublabs, fromLast = TRUE)]
tg <- tableGrob(matrix(sublabs_labels, nrow = 1))
tg$widths = unit(sublabs_x, attr(xbreaks,"unit"))
pos <- gg$layout[axis_num,c("t","l")]
gg2 <- gtable_add_rows(gg, heights = sum(tg$heights)+unit(4,"mm"), pos = pos$t)
gg3 <- gtable_add_grob(gg2, tg, t = pos$t+1, l = pos$l)
return(gg3)
}
#Plot and sublabels
p <- ggplot(data = df, aes(x = Date, y = value)) + geom_line() +
scale_x_date(date_breaks = "2 month", date_minor_breaks = "1 month", labels = date_format("%b")) +
xlab(NULL)
sublabs <- c(rep("1719",2),rep("1720",6))
#Draw
grid.draw(add_sublabs(p, sublabs))
One way to avoid the complexities would be to change the required output so that January is replaced by the year.
The lab function returns the labels given the breaks. Unexpectedly, ggplot will pass NAs to it so in the first line of the function body we replace those with some date -- it does not matter which date since such values are not subsequently used by ggplot. Finally we format the date as a year or abbreviated month depending on whether the month is January (which corresponds to the POSIXlt component mon equalling 0) or not.
library(ggplot2)
library(scales)
lab <- function(b) {
b[is.na(b)] <- Sys.Date()
format(b, ifelse(as.POSIXlt(b)$mon == 0, "%Y", "%b"))
}
ggplot(df, aes(Date, value)) +
geom_line() +
scale_x_date(date_breaks = "month", labels = lab)
Note: I have added Issue 2182 to the ggplot2 github issues list regarding the NAs that are passed to the label function. If subsequent versions of ggplot2 no longer pass the NAs then the first line of the body of lab could be omitted .
Update: fixed.

ggplot for normal distribution - add data to graph

I'm trying add to my plot some data that will facilitate users. My distribution graph comes from this code:
require(ggplot2)
my_data<-c(70, 71, 75, 78, 78, 79, 80, 81, 84, 85, 87, 87, 90, 91, 95, 95, 96, 96, 97, 98, 98, 100, 101, 102, 102, 102, 102, 104, 104, 104, 107, 107, 109, 110, 110, 110, 111, 112, 113, 113, 114, 115, 118, 118, 118, 120, 124, 131, 137, 137, 139, 145, 158, 160, 162, 165, 169, 177, 179, 180)
dist <- dnorm(my_data,mean=mean(my_data), sd(my_data))
qplot(my_data,dist,geom="line")+xlab("x values")+ylab("Density")+ ggtitle("cool graph Distribution") + geom_line(color="black")
and the result is:
What I'm aiming to do is to add more data to the ggplot2:
the mean
say I have a sample: 80. I'd like to draw a line between the x values straight up intersecting with the graph.
Divide the graph into parts by 2 sigmas (or maybe 3) and add a region (example in the graph below shows 4 regions: unusually low price, great price and forth).
Thanks for any pointers!
desired result:
You can add various lines to the chart using geom_line, I reckon that it's only a matter of placing lines at different points that you want to highlight on the chart (mean, etc.):
qplot(my_data,dist,geom="line") +
xlab("x values") +
ylab("Density") +
ggtitle("cool graph Distribution") +
geom_line(color="black") +
geom_line(stat = "hline", yintercept = "mean", colour = "blue") +
geom_line(stat = "vline", xintercept = "mean", colour = "red")

Resources