I am playing around with gganimate and I do believe it is acting slightly funky when it comes to labels (I've basically followed this example).
I am generating the following .gif with this code snippet (you can find the data here, didn't want post length to explode).
library(gganimate)
library(dplyr)

df <- read.csv("https://pastebin.com/raw/QvhdVqwM", stringsAsFactors = FALSE) %>%
  mutate(date = as.Date(date))

countries_anim <- df %>%
  filter(country_code == "de") %>%
  ggplot(aes(date, value, colour = city_name)) +
  geom_line() +
  geom_segment(aes(xend = max(date) - 30, yend = value), linetype = 2,
               colour = "grey") +
  geom_text(aes(x = max(date) - 29, label = city_name), hjust = 0) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(title.position = "top")) +
  transition_reveal(date)

n_days <- as.integer(max(df$date) - min(df$date))

anim <- animate(plot = countries_anim, duration = 10,
                renderer = gifski_renderer(file = 'figures/de.gif'))
Everything works pretty well except for one minor annoyance: at the very beginning of the animation, some annotations (which are supposed to follow the time series) get permanently printed in the plot area. I've tried changing the renderer, but the issue seems unrelated to it.
I am not that well versed in gganimate internals, and I'm wondering how I could go about debugging the issue.
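One thing I've considered (a sketch using gganimate's standard helpers; the frame count is arbitrary) is rendering a handful of frames to individual image files and inspecting the per-frame metadata that the transition computes:

probe <- animate(countries_anim, nframes = 20,
                 renderer = file_renderer(dir = tempdir(), overwrite = TRUE))
# One row per frame; shows, e.g., how far along `date` each frame has revealed
frame_vars(probe)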
I'd been struggling to debug this for a few hours, but I seem to have found a solution. Apparently animated annotations are affected by how the data is ordered: as you can see in the example below, my dataset was arranged in descending order by date. Changing the order makes the annotations behave:
library(dplyr)
library(gganimate)
library(ggplot2)

df <- read.csv("https://pastebin.com/raw/QvhdVqwM", stringsAsFactors = FALSE) %>%
  mutate(date = as.Date(date))

# Dates are in descending order
df %>%
  filter(country_code == "de") %>%
  head() %>%
  as_tibble()
#> # A tibble: 6 x 10
#> big_change change_from_pre… date type region_id value city_name
#> <lgl> <int> <date> <chr> <chr> <int> <chr>
#> 1 FALSE -3 2020-05-28 one_… de-berlin 28 Berlin
#> 2 FALSE 3 2020-05-28 one_… de-hambu… 32 Hamburg
#> 3 FALSE 2 2020-05-28 one_… de-rhine… 31 Rhine-Ru…
#> 4 FALSE 2 2020-05-27 one_… de-berlin 32 Berlin
#> 5 FALSE -3 2020-05-27 one_… de-hambu… 28 Hamburg
#> 6 FALSE 3 2020-05-27 one_… de-rhine… 28 Rhine-Ru…
#> # … with 3 more variables: country_code <chr>, note <chr>, country <chr>
countries_anim <- df %>%
  filter(country_code == "de") %>%
  arrange(date) %>% # arranging by date solves the problem
  ggplot(aes(date, value, colour = city_name)) +
  geom_line() +
  geom_segment(aes(xend = max(date) - 30, yend = value), linetype = 2,
               colour = "grey") +
  geom_text(aes(x = max(date) - 29, label = city_name), hjust = 0) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(title.position = "top")) +
  transition_reveal(date)

country_anim <- animate(plot = countries_anim, duration = 10,
                        renderer = gifski_renderer(file = 'figures/countries.gif'))
I am not quite sure why this happens, since row order doesn't affect static ggplot2 output; presumably transition_reveal() walks the rows in the order they appear when building frames.
I'm trying to represent the movements of patients between several treatment groups measured in 3 different years. However, there are dropouts: some patients from the 1st year are missing in the 2nd year, and some patients in the 2nd year weren't in the 1st. The same goes for the 3rd year. I have a label called "none" for these combinations, but I don't want it to appear in the plot.
An example plot with only 2 years:
EDIT
I have tried geom_sankey as well (https://rdrr.io/github/davidsjoberg/ggsankey/man/geom_sankey.html).
Although it is closer to what I'm looking for, I don't know how to omit the stratum groups without labels (NA). In this case I'm using my full data, not a dummy example; I can't share it, but I can try to create an example if needed. This is the code I've tried:
data = bind_rows(data_2015, data_2017, data_2019) %>%
  select(sip, Year, Grp) %>%
  mutate(Grp = factor(Grp), Year = factor(Year)) %>%
  arrange(sip) %>%
  pivot_wider(names_from = Year, values_from = Grp)

df_sankey = data %>% make_long(`2015`, `2017`, `2019`)
ggplot(df_sankey, aes(x = x,
                      next_x = next_x,
                      node = node,
                      next_node = next_node,
                      fill = factor(node),
                      label = node,
                      color = factor(node))) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d() +
  scale_colour_viridis_d() +
  theme_sankey(base_size = 16) +
  theme(legend.position = "none") +
  xlab('')
Figure:
Any idea how to omit the missing groups in each year as strata (without omitting them from the alluvia) would be super helpful. Thanks!
Solved! The solution was much easier than I thought. I'll leave it here in case someone else struggles with a similar problem.
First, create a wide table with each patient's group in every cohort year.
# Data with 3 cohorts for years 2015, 2017 and 2019
# Grp is a factor with 6 levels: 1 to 6
# sip is a unique ID
library(tidyverse)

data_wide = data %>%
  select(sip, Year, Grp) %>%
  mutate(Grp = factor(Grp, levels = c(1:6)), Year = factor(Year)) %>%
  arrange(sip) %>%
  pivot_wider(names_from = Year, values_from = Grp)
Using the ggsankey package, we can transform it into the specific format the package expects; there's already a useful function for this, make_long().
df_sankey = data_wide %>% make_long(`2015`, `2017`, `2019`)

# The tibble accounts for every change in x axis and y categorical value (node):
head(df_sankey)
# A tibble: 6 × 4
  x     node  next_x next_node
  <fct> <chr> <fct>  <chr>
1 2015  3     2017   2
2 2017  2     2019   2
3 2019  2     NA     NA
4 2015  NA    2017   1
5 2017  1     2019   1
6 2019  1     NA     NA
It looks like passing the output of pivot_wider() to make_long() completed every combination of values, including missings as NA. Drop the NA values in node and create the plot:
df_sankey %>%
  drop_na(node) %>%
  ggplot(aes(x = x,
             next_x = next_x,
             node = node,
             next_node = next_node,
             fill = factor(node),
             label = node,
             color = factor(node))) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d() +
  scale_colour_viridis_d() +
  theme_sankey(base_size = 16) +
  theme(legend.position = "none") +
  xlab('')
My dataframe looks like this (1322 rows in total):
I'd like to make a bar plot with the percentage of each rating of the CFS score. It should look similar to this:
With this code, I can make a single bar plot for the column cfs_triage:
ggplot(data = df) +
  geom_bar(mapping = aes(x = cfs_triage, y = (..count..) / sum(..count..)))
But I can't figure out how to make one with the three variables next to one another.
Thank you in advance to everyone who helps me make this bar plot with the percentage of ratings for these three variables! (I'm not sure my explanations are very clear, but I hope they are. :))
Your best bet here is to pivot your data into long format. We don't have your data, but we can reproduce a similar data set like this:
set.seed(1)

df <- data.frame(cfs_triage  = sample(10, 1322, TRUE, prob = 1:10),
                 cfs_silver  = sample(10, 1322, TRUE),
                 cfs_student = sample(10, 1322, TRUE, prob = 10:1))

# Sprinkle 300 missing values into each column
df[] <- lapply(df, function(x) { x[sample(1322, 300)] <- NA; x })
Now the dummy data set looks a lot like yours:
head(df)
#> cfs_triage cfs_silver cfs_student
#> 1 9 NA 1
#> 2 8 4 2
#> 3 NA 8 NA
#> 4 NA 10 9
#> 5 9 5 NA
#> 6 3 1 NA
If we pivot into long format, then we will end up with two columns: one containing the values, and one containing the column name that the value belonged to in the original data frame:
library(tidyverse)
df_long <- df %>%
  pivot_longer(everything())
head(df_long)
#> # A tibble: 6 x 2
#> name value
#> <chr> <int>
#> 1 cfs_triage 9
#> 2 cfs_silver NA
#> 3 cfs_student 1
#> 4 cfs_triage 8
#> 5 cfs_silver 4
#> 6 cfs_student 2
This then allows us to plot with value on the x axis, and we can use name as a grouping / fill variable:
ggplot(df_long, aes(value, fill = name)) +
  geom_bar(position = 'dodge') +
  scale_fill_grey(name = NULL) +
  theme_bw(base_size = 16) +
  scale_x_continuous(breaks = 1:10)
#> Warning: Removed 900 rows containing non-finite values (`stat_count()`).
Created on 2022-11-25 with reprex v2.0.2
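Since the target figure shows percentages rather than raw counts, one possible extension (a sketch building on df_long above; the proportions are computed per variable before plotting rather than via after_stat()):

df_long %>%
  drop_na(value) %>%
  count(name, value) %>%
  group_by(name) %>%
  mutate(prop = n / sum(n)) %>% # within-variable percentage
  ungroup() %>%
  ggplot(aes(value, prop, fill = name)) +
  geom_col(position = 'dodge') +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = 1:10) +
  scale_fill_grey(name = NULL) +
  theme_bw(base_size = 16)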
Maybe you need something like this. The formatting was taken from @Allan Cameron (many thanks!):
library(tidyverse)
library(scales)

df %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id) %>%
  group_by(id) %>%
  mutate(percent = value / sum(value, na.rm = TRUE)) %>%
  mutate(percent = ifelse(is.na(percent), 0, percent)) %>%
  mutate(my_label = str_trim(paste0(format(100 * percent, digits = 1), "%"))) %>%
  ggplot(aes(x = factor(name), y = percent, fill = factor(name), label = my_label)) +
  geom_col(position = position_dodge()) +
  geom_text(aes(label = my_label), vjust = -1) +
  facet_wrap(. ~ id, nrow = 1, strip.position = "bottom") +
  scale_fill_grey(name = NULL) +
  scale_y_continuous(labels = scales::percent) +
  theme_bw(base_size = 16) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Haven't used RStudio in a while, so I am quite rusty.
I want to create a bar chart showing the countries shipping the most freight weight in ascending order.
I have made this simple script that does the job:
df_new %>%
  filter(!is.na(Freight_weight)) %>%
  filter(!is.na(origin_name)) %>%
  select(origin_name, Freight_weight) %>%
  ggplot(aes(x = reorder(origin_name, Freight_weight, FUN = sum), y = Freight_weight)) +
  geom_col() +
  labs(x = "") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
However, when I try to do more with it, like adding a top-10 clause to keep only the countries with the highest shipments, it doesn't work: it takes the 10 highest individual shipments rather than the 10 highest per-country totals.
Instead, I have tried something like this:
df_new %>%
  group_by(origin_name) %>%
  summarise(n = sum(Freight_weight, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(share = n /sum(n) %>% factor() %>% fct_reorder(share)) %>%
  ggplot(aes(x = origin_name, y = n)) +
  geom_col() +
  labs(x = "") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
But here, I can't get the share calculation to work. What am I doing wrong?
I'd greatly appreciate your input - if I get this down I should be able to do most of the remaining analyses!
If you want to find the top 10 countries ordered by their corresponding highest Freight_weight, one possible solution is below. (Note that I have created more countries, denoted by letters, and more data.) Hope this helps.
library(dplyr)

set.seed(123)
df_new <- structure(
  list(
    Freight_weight = runif(200, min = 1, max = 50),
    origin_name = sample(LETTERS[1:15], size = 200, replace = TRUE)
  ),
  row.names = c(NA, -200L),
  class = c("tbl_df", "tbl", "data.frame")
)

df_new %>%
  group_by(origin_name) %>%
  slice_max(order_by = Freight_weight, n = 1) %>%
  ungroup() %>%
  arrange(desc(Freight_weight)) %>%
  slice(1:10)
#> # A tibble: 10 × 2
#> Freight_weight origin_name
#> <dbl> <chr>
#> 1 49.7 N
#> 2 49.3 I
#> 3 49.2 J
#> 4 49.0 F
#> 5 47.9 M
#> 6 47.8 K
#> 7 47.8 E
#> 8 47.4 O
#> 9 47.1 H
#> 10 46.9 G
Created on 2022-07-06 by the reprex package (v2.0.1)
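As a side note on the attempt in the question: because %>% binds more tightly than /, the expression share = n /sum(n) %>% factor() %>% fct_reorder(share) pipes sum(n) into factor() and fct_reorder(), and fct_reorder(share) refers to a column that doesn't exist yet. If what you actually want is the top 10 countries by total freight weight, a sketch along those lines (reusing the dummy df_new from above):

library(forcats)
library(ggplot2)

df_new %>%
  group_by(origin_name) %>%
  summarise(total = sum(Freight_weight, na.rm = TRUE)) %>%
  slice_max(total, n = 10) %>%                              # 10 largest totals
  mutate(origin_name = fct_reorder(origin_name, total)) %>% # order bars by total
  ggplot(aes(x = origin_name, y = total)) +
  geom_col() +
  labs(x = "", y = "Total freight weight") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))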
I have data with a percentage score along a datetime axis (date and hours:minutes:seconds). I want to graphically "correct"/highlight a data point that isn't representative.
Background
I have data about how people rate their happiness level on a daily basis, on a continuous scale running 0 -> 1, where 0 means "extremely unhappy" and 1 means "extremely happy". I ask many people and want to get a sense of "happiness in the group" over time.
Data
library(tidyverse)
library(lubridate)

set.seed(1234)

original_df <-
  seq(as.POSIXct('2020-09-01', tz = "UTC"), as.POSIXct('2020-09-15', tz = "UTC"), by = "1 mins") %>%
  sample(15000, replace = T) %>%
  as_tibble() %>%
  rename(date_time = value) %>%
  mutate(date = date(date_time)) %>%
  add_column(score = runif(15000))

original_df
## # A tibble: 15,000 x 3
## date_time date score
## <dttm> <date> <dbl>
## 1 2020-09-06 04:11:00 2020-09-06 0.683
## 2 2020-09-06 13:35:00 2020-09-06 0.931
## 3 2020-09-05 23:21:00 2020-09-05 0.121
## 4 2020-09-06 14:45:00 2020-09-06 0.144
## 5 2020-09-07 09:15:00 2020-09-07 0.412
## 6 2020-09-01 10:22:00 2020-09-01 0.564
## 7 2020-09-11 14:00:00 2020-09-11 0.960
## 8 2020-09-08 13:24:00 2020-09-08 0.845
## 9 2020-09-01 15:33:00 2020-09-01 0.225
## 10 2020-09-09 19:27:00 2020-09-09 0.815
## # ... with 14,990 more rows
However, it turns out that one of the days happens to have substantially fewer data points. Thus, the actual data set looks like the following:
actual_df <-
  original_df %>%
  filter(date %in% as_date("2020-09-10")) %>%
  group_by(date) %>%
  slice_sample(n = 15) %>%
  ungroup() %>%
  bind_rows(original_df %>% filter(!date %in% as_date("2020-09-10")))
actual_df %>% count(date)
## # A tibble: 14 x 2
## date n
## <date> <int>
## 1 2020-09-01 1073
## 2 2020-09-02 1079
## 3 2020-09-03 1118
## 4 2020-09-04 1036
## 5 2020-09-05 1025
## 6 2020-09-06 1089
## 7 2020-09-07 1040
## 8 2020-09-08 1186
## 9 2020-09-09 1098
## 10 2020-09-10 15 ## <- this day has less data
## 11 2020-09-11 1095
## 12 2020-09-12 1051
## 13 2020-09-13 1037
## 14 2020-09-14 1034
Plotting this data over time
What I've been doing relies on working with means
I treat every day as a factor and get the daily mean. Statistically, this solution might be far from ideal, as @BrianLang commented below; however, right now this is the method I chose.
library(emmeans)

model_fit <-
  actual_df %>%
  mutate(across(date, factor)) %>%
  lm(score ~ date, data = .)

emmeans_fit_data <- emmeans(model_fit, ~ date, CIs = TRUE)

emmeans_fit_data %>%
  as_tibble() %>%
  ggplot(data = ., aes(x = date, y = emmean)) +
  geom_line(color = "#1a476f", group = 1, lwd = 1) +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), alpha = 0.5, color = "#90353b", width = 0.2) +
  geom_text(aes(label = paste0(round(100 * emmean, 1), "%")), color = "#90353b", vjust = -4, hjust = 0.5, size = 3.5) +
  geom_point(color = "#1a476f") +
  scale_y_continuous(labels = function(x) paste0(100 * x, "%")) +
  ylab("Level of Happiness") +
  xlab("Date") +
  ggtitle("Mood Over Time") +
  theme(plot.title = element_text(hjust = 0.5, size = 14),
        axis.text.x = element_text(angle = -60, hjust = 0),
        axis.title.x = element_blank(),
        legend.title = element_blank(),
        plot.caption = element_text(hjust = 0, size = 8),
        legend.position = "none")
But then I get a spike on 2020-09-10, which is only due to the low number of data points.
One graphical solution would be to dash the problematic segment and "complete" how it would have looked with enough data points, perhaps based on averaging the day before and the day after. I don't want to get rid of the real data, but I do want to graphically highlight that this point is unrepresentative, and that the real value was probably much closer to the neighbouring days. I was thinking that dashed lines are a reasonable graphical solution; the sketch below is roughly what I have in mind.
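Here the flagged day's mean is replaced by the average of its two neighbours, while the real value stays visible as a dashed line (a sketch reusing emmeans_fit_data from above; the neighbour-averaging rule is just an illustration):

plot_df <- emmeans_fit_data %>%
  as_tibble() %>%
  mutate(flagged = date == "2020-09-10",
         # fill in the unrepresentative day with the mean of its neighbours
         corrected = ifelse(flagged, (lag(emmean) + lead(emmean)) / 2, emmean))

ggplot(plot_df, aes(x = date, group = 1)) +
  geom_line(aes(y = emmean), color = "grey70", linetype = "dashed") + # real data
  geom_line(aes(y = corrected), color = "#1a476f", lwd = 1)           # corrected trend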
Otherwise, I was hoping there could be a different method for modeling/plotting such "by-time" data using ggplot's smoothing, one that would give me a smoother trend line and a confidence ribbon that accounts for the problematic day. I understand that this might be beyond the scope of this question, so I'm adding it as a side note in case someone wants to suggest a solution based on different modeling instead of a mere graphical correction. I will be thankful for either.
Without wanting to get into time-series models, you could imagine transforming your time variable with restricted cubic splines.
I needed to change a bit of your code so I could avoid installing the newest versions of some packages ;-).
Notice that I changed some variable names, because date is a function name and shouldn't also be used as a variable name.
library(chron)

## Added a numeric version of your date variable.
actual_df <- original_df %>%
  filter(datez %in% lubridate::date("2020-09-10")) %>%
  sample_n(size = 15) %>%
  bind_rows(original_df %>% filter(!datez %in% lubridate::date("2020-09-10"))) %>%
  mutate(num_date = as.numeric(datez))

## How many knots across the dates do you want?
number_of_knots = 15

## This makes sure that visreg is passed the actual knot locations;
## rms::rcs does not store them in the model fits.
knots <- paste0("c(", paste0(attr(rms::rcs(actual_df$num_date, number_of_knots), "parms"), collapse = ", "), ")")

## We can construct the formula early.
formula <- as.formula(paste("score ~ rms::rcs(num_date,", knots, ")"))

## Fit the model as a gaussian glm and pass it to visreg for its prediction
## function. This gives the predicted means and 95% CIs for those means.
## Then convert the numeric dates back to real dates.
glm_rcs <- glm(data = actual_df, formula = formula, family = "gaussian") %>%
  visreg::visreg(plot = FALSE) %>%
  .$fit %>%
  mutate(date_date = chron::as.chron(num_date) %>% as.POSIXct())

## Plot it!
ggplot(data = glm_rcs, aes(x = date_date, y = visregFit)) +
  geom_ribbon(aes(ymin = visregLwr, ymax = visregUpr), alpha = .5) +
  geom_line()
EDIT: You collect the data by day, but you could add jitter to the date such that they get spread out over the day.
actual_df <- original_df %>%
  filter(datez %in% lubridate::date("2020-09-10")) %>%
  sample_n(size = 15) %>%
  bind_rows(original_df %>% filter(!datez %in% lubridate::date("2020-09-10"))) %>%
  mutate(num_date = as.numeric(datez)) %>%
  ## Here we add random noise (uniform -.5 to .5) to each numeric date.
  mutate(jittered_date = num_date + runif(n(), -.5, .5))

## You can lower this number to increase smoothing.
number_of_knots = 15

knots <- paste0("c(", paste0(attr(rms::rcs(actual_df$jittered_date, number_of_knots), "parms"), collapse = ", "), ")")

formula <- as.formula(paste("score ~ rms::rcs(jittered_date,", knots, ")"))

glm_rcs <- glm(data = actual_df, formula = formula, family = "gaussian") %>%
  visreg::visreg(plot = FALSE) %>%
  .$fit %>%
  mutate(date_date = chron::as.chron(jittered_date) %>% as.POSIXct())

ggplot(data = glm_rcs, aes(x = date_date, y = visregFit)) +
  geom_ribbon(aes(ymin = visregLwr, ymax = visregUpr), alpha = .5) +
  geom_line()
Edit 2:
The jittering of points isn't as necessary if you have a datetime vector rather than a simple day.
In your original code to create the fake data, you use lubridate::date(), which takes your POSIX datetime vector and strips it down to a simple date! You can avoid this with something like:
original_df <- tibble(datez = seq(as.POSIXct('2020-09-01', tz = "UTC"), as.POSIXct('2020-09-15', tz = "UTC"), by = "1 mins") %>%
                        sample(15000, replace = T)) %>%
  mutate(datez_day = lubridate::date(datez)) %>%
  add_column(score = runif(15000))

actual_df <- original_df %>%
  filter(datez_day %in% lubridate::date("2020-09-10")) %>%
  sample_n(size = 15) %>%
  bind_rows(original_df %>% filter(!datez_day %in% lubridate::date("2020-09-10"))) %>%
  mutate(num_date = as.numeric(datez))
Now you have datez_day, which is the day value; datez, which is a datetime; and num_date, which is a numeric representation of the datetime.
From there you can model directly on num_date without adding any jitter.
number_of_knots = 20

knots <- paste0("c(", paste0(attr(rms::rcs(actual_df$num_date, number_of_knots), "parms"), collapse = ", "), ")")

formula <- as.formula(paste("score ~ rms::rcs(num_date,", knots, ")"))

glm_rcs <- glm(data = actual_df, formula = formula, family = "gaussian") %>%
  visreg::visreg(plot = FALSE) %>%
  .$fit %>%
  as_tibble() %>%
  ## Translate num_date back into a datetime object so it is correct in the figures!
  mutate(date_date = as.POSIXct.numeric(round(num_date), origin = "1970/01/01"))

ggplot(data = glm_rcs, aes(x = date_date, y = visregFit)) +
  geom_ribbon(aes(ymin = visregLwr, ymax = visregUpr), alpha = .5) +
  geom_line()
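A lighter-weight alternative (a sketch, not a drop-in equivalent of the rms fit): ggplot's own GAM smoother also produces a trend line plus confidence ribbon, and it converts the datetime axis to numeric internally. It needs the mgcv package, which ships with R.

ggplot(actual_df, aes(x = datez, y = score)) +
  geom_point(alpha = 0.05) +
  # A shrinkage cubic spline basis; lower k for more smoothing
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 15))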
What I'm doing
I'm using an R library called ggplot2, which offers a lot of options for creating graphics. I'm using it to display two different data sets on one graph, with a different colour for each set of data.
The Problem
I'm also trying to get a legend to show up in my graph that will tell the user which set of data corresponds to which colour. So far, I've not been able to get it to show.
What I've tried
I've set its position to top/bottom/left/right to make sure nothing was setting the position to "none" by default, which would've hidden it.
The Code
# PDF/Plot generation
pdf("activity-plot.pdf")

ggplot(data.frame("Time" = times), aes(x = Time)) +
  # Data Set 1
  geom_density(fill = "#1A3552", colour = "#4271AE", alpha = 0.8) +
  geom_text(x = mean(times) - 1, y = max(density(times)$y / 2), label = "Mean {1} Activity", angle = 90, size = 4) +
  geom_vline(aes(xintercept = mean(times)), color = "cyan", linetype = "dashed", size = 1, alpha = 0.5) +
  # Data Set 2
  geom_density(data = data.frame("Time" = timesSec), fill = "gray", colour = "orange", alpha = 0.8) +
  geom_text(x = mean(timesSec) - 1, y = max(density(timesSec)$y / 2), label = "Mean {2} Activity", angle = 90, size = 4) +
  geom_vline(aes(xintercept = mean(timesSec)), color = "orange", linetype = "dashed", size = 1, alpha = 0.5) +
  # Main Graph Info
  labs(title = "Activity in the past 48 hours", subtitle = "From {DATE 1} to {DATE 2}", caption = "{LOCATION}") +
  scale_x_continuous(name = "Time of Day", breaks = seq(c(0:23))) +
  scale_y_continuous(name = "Activity") +
  theme(legend.position = "top")

dev.off()
Result
As pointed out by @Ben, you should map the colour inside aes() in order to get the legend displayed.
However, a better approach in ggplot is to merge your two variables "Time" and "Timesec" into a single dataframe and reshape it into a longer format. To illustrate this, I created a dummy dataframe:
Time <- sample(1:24, 200, replace = TRUE)
Timesec <- sample(1:24, 200, replace = TRUE)
df <- data.frame(Time, Timesec)

  Time Timesec
1   22      23
2   21       9
3   19       9
4   10       6
5    7      24
6   15       9
...
So, the first step is to reshape your dataframe into a longer format. Here, I'm using the pivot_longer function from the tidyr package:
library(tidyr)
library(dplyr)
df %>% pivot_longer(everything(), names_to = "var", values_to = "val")
# A tibble: 400 x 2
var val
<chr> <int>
1 Time 22
2 Timesec 23
3 Time 21
4 Timesec 9
5 Time 19
6 Timesec 9
7 Time 10
8 Timesec 6
9 Time 7
10 Timesec 24
# … with 390 more rows
To add geom_vline and geom_text based on the means of your values, a nice way to do it is to create a second dataframe gathering the mean and the maximal density value for each variable:
library(tidyr)
library(dplyr)

df_lab <- df %>%
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  group_by(var) %>%
  summarise(Mean = mean(val),
            Density = max(density(val)$y))
# A tibble: 2 x 3
var Mean Density
<chr> <dbl> <dbl>
1 Time 11.6 0.0555
2 Timesec 12.1 0.0517
So, using df and df_lab, you can generate the entire plot. Here, we pass the color and fill arguments into aes() and use scale_color_manual and scale_fill_manual to set appropriate colors:
library(dplyr)
library(tidyr)
library(ggplot2)

df %>%
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  ggplot(aes(x = val, fill = var, colour = var)) +
  geom_density(alpha = 0.8) +
  scale_color_manual(values = c("#4271AE", "orange")) +
  scale_fill_manual(values = c("#1A3552", "gray")) +
  geom_vline(inherit.aes = FALSE, data = df_lab,
             aes(xintercept = Mean, color = var), linetype = "dashed", size = 1,
             show.legend = FALSE) +
  geom_text(inherit.aes = FALSE, data = df_lab,
            aes(x = Mean - 0.5, y = Density / 2, label = var, color = var), angle = 90,
            show.legend = FALSE) +
  labs(title = "Activity in the past 48 hours", subtitle = "From {DATE 1} to {DATE 2}", caption = "{LOCATION}") +
  scale_x_continuous(name = "Time of Day", breaks = seq(c(0:23))) +
  scale_y_continuous(name = "Activity") +
  theme(legend.position = "top")
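One small optional refinement (a sketch): giving scale_color_manual() and scale_fill_manual() named vectors keeps each colour attached to the right variable even if the factor levels change order. Adding these to the plot above simply replaces its unnamed scales:

last_plot() +
  scale_color_manual(values = c(Time = "#4271AE", Timesec = "orange")) +
  scale_fill_manual(values = c(Time = "#1A3552", Timesec = "gray"))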
Does that answer your question?