Organize scale of x axis of time series graph - r

Here I have data that looks like this:
# Data
df <- data.frame("Hospital" = c("Buge Hospital", "Buge Hospital", "Greta Hospital", "Greta Hospital",
"Makor Hospital", "Makor Hospital"),
"Period" = c("Jul-18","Aug-18", "Jul-19","Aug-19", "Jul-20","Aug-20"),
"Medical admissions" = c(12,56,0,40,5,56),
"Surgical admissions" = c(10,2,0,50,20,56),
"Inpatient admissions" = c(9,5,6,0,60,96))
This data has a column called Period, which holds monthly data for different years: 2018, 2019 and 2020.
If I plot this data, here is how it looks:
library(ggplot2)
library(reshape2) # provides melt()
# Melt data into long format
df2 <- melt(data = df,
id.vars = c("Hospital","Period"),
measure.vars = names(df[3:5]))
# Stacked barplot
ggplot( df2, aes(x = Period, y = value, fill = variable, group = variable)) +
geom_bar(stat = "identity") +
theme(legend.position = "none") +
ggtitle(unique(df2$Hospital))+
scale_x_date(date_labels = "%Y")+
labs(x = "Month", y = "Number of People", fill = "Type")
It plots well, but the x axis is not in ascending order. I have tried the scale_x_date function, but the plot stays the same. I want the months of 2018 to come first, followed by the months of 2019 and 2020, that is, the x axis ordered by year like this:
Aug-18, Jul-18, Aug-19, Jul-19, Aug-20, Jul-20.

To solve your issue, you need to convert your Period column to a date format.
For example, you can use the parse_date function from the lubridate package:
library(lubridate)
library(tidyr)
library(dplyr)
df %>% mutate(Date = parse_date(as.character(Period), format = "%b-%y")) %>%
pivot_longer(cols = Medical.admissions:Inpatient.admissions, names_to = "Var", values_to = "Val")
# A tibble: 18 x 5
Hospital Period Date Var Val
<fct> <fct> <date> <chr> <dbl>
1 Buge Hospital Jul-18 2018-07-01 Medical.admissions 12
2 Buge Hospital Jul-18 2018-07-01 Surgical.admissions 10
3 Buge Hospital Jul-18 2018-07-01 Inpatient.admissions 9
4 Buge Hospital Aug-18 2018-08-01 Medical.admissions 56
5 Buge Hospital Aug-18 2018-08-01 Surgical.admissions 2
6 Buge Hospital Aug-18 2018-08-01 Inpatient.admissions 5
7 Greta Hospital Jul-19 2019-07-01 Medical.admissions 0
8 Greta Hospital Jul-19 2019-07-01 Surgical.admissions 0
9 Greta Hospital Jul-19 2019-07-01 Inpatient.admissions 6
10 Greta Hospital Aug-19 2019-08-01 Medical.admissions 40
11 Greta Hospital Aug-19 2019-08-01 Surgical.admissions 50
12 Greta Hospital Aug-19 2019-08-01 Inpatient.admissions 0
13 Makor Hospital Jul-20 2020-07-01 Medical.admissions 5
14 Makor Hospital Jul-20 2020-07-01 Surgical.admissions 20
15 Makor Hospital Jul-20 2020-07-01 Inpatient.admissions 60
16 Makor Hospital Aug-20 2020-08-01 Medical.admissions 56
17 Makor Hospital Aug-20 2020-08-01 Surgical.admissions 56
18 Makor Hospital Aug-20 2020-08-01 Inpatient.admissions 96
You can then use scale_x_date to set the appropriate labeling options on your x axis:
library(lubridate)
library(tidyr)
library(dplyr)
library(ggplot2)
df %>% mutate(Date = parse_date(as.character(Period), format = "%b-%y")) %>%
pivot_longer(cols = Medical.admissions:Inpatient.admissions, names_to = "Var", values_to = "Val") %>%
ggplot(aes(x = Date, y = Val, fill= Var, group = Var))+
geom_col()+
scale_x_date(date_breaks = "month", date_labels = "%b %Y")+
labs(x = "Month", y = "Number of People", fill = "Type")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Does it answer your question?
EDIT: Using lubridate v1.7.8
In lubridate version 1.7.8, parse_date no longer exists. You will have to replace it with parse_date_time2 as follows:
library(lubridate)
library(dplyr)
df %>% mutate(Date = ymd(parse_date_time2(as.character(Period), orders = "%b-%y"))) %>% ....
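The reason the original plot looks unordered is that Period is text, so it is treated as a discrete variable and its categories are sorted alphabetically rather than chronologically. As a minimal base R sketch (it does not depend on the lubridate version, and it assumes an English locale so that %b matches "Jul" and "Aug"), the strings can be parsed by prepending a day, and the parsed dates can also be used to build a chronologically ordered factor if a discrete axis is preferred. The variable names period, date and period_f are illustrative only.

```r
# Period strings from the question
period <- c("Jul-18", "Aug-18", "Jul-19", "Aug-19", "Jul-20", "Aug-20")

# as.Date() needs a day, so prepend "01-" before parsing
# (%b is locale dependent; English month abbreviations are assumed)
date <- as.Date(paste0("01-", period), format = "%d-%b-%y")

# Alphabetical order, which is what a text axis would use
sort(period)

# Chronological order expressed as factor levels
period_f <- factor(period, levels = unique(period[order(date)]))
levels(period_f)
```

Mapping date to the x axis works with scale_x_date, while mapping period_f keeps a discrete axis with one evenly spaced tick per period.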

Related

How to label min and max values per group in ggplot?

I have a dataset that counts the number of posts per month per year. It looks like this:
monthdate year n
<date> <dbl> <int>
1 2020-01-01 2001 133
2 2020-01-01 2002 129
3 2020-01-01 2003 149
4 2020-01-01 2004 96
5 2020-01-01 2005 94
6 2020-01-01 2006 109
7 2020-01-01 2007 158
8 2020-01-01 2008 138
9 2020-01-01 2009 83
(monthdate as a date is needed only for rendering month names in ggplot).
The plot is generated like this:
posts %>% mutate(monthdate = as.Date(paste("2020", month, '01', sep = "-"))) %>%
group_by(monthdate, year) %>% summarise(n = n()) %>%
ggplot(aes(x = monthdate, y = n)) +
geom_point(stat = 'identity') +
geom_smooth(method = "loess") +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
The resulting plot image is not included here.
I want to add year labels to the topmost and bottommost outliers, so that for each month you can see which year produced the fewest and the most posts. What is an efficient way to do it?
Ok, I found the solution. Pretty simple:
posts %>% mutate(monthdate = as.Date(paste("2020", month, '01', sep = "-"))) %>%
group_by(monthdate, year) %>% summarise(n = n()) %>%
# label only the rows holding the monthly maximum or minimum; all others get NA
group_by(monthdate) %>%
mutate(lab = case_when(n == max(n) | n == min(n) ~ year)) %>%
ggplot(aes(x = monthdate, y = n)) +
geom_point(stat = 'identity') +
geom_smooth(method = "loess") +
scale_x_date(date_breaks = "1 month", date_labels = "%b") +
xlab('Month') +
ylab('Number of posts') +
geom_text(aes(label = lab))
The resulting plot (image not included here) shows the year next to the highest and lowest point of each month.
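The labelling step relies on case_when returning NA for rows that match no condition, so only the per-month extremes end up with a label. The same logic can be sketched in base R on a small made-up data frame; the toy values and the helper column names below are assumptions for illustration, not the asker's data.

```r
# Toy counts: two months, three years each (made-up values)
posts_n <- data.frame(
  month = rep(1:2, each = 3),
  year  = rep(2001:2003, times = 2),
  n     = c(133, 129, 149, 90, 120, 100)
)

# ave() returns the group-wise max/min aligned with the original rows
is_extreme <- with(posts_n,
  n == ave(n, month, FUN = max) | n == ave(n, month, FUN = min))

# Keep the year only for the extreme rows; everything else becomes NA
posts_n$lab <- ifelse(is_extreme, posts_n$year, NA)
posts_n
```

Rows whose lab is NA are simply not drawn by geom_text, which is why only the extremes carry a year in the plot.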

Subset data by Year from period column

I have data with a month-and-year column. How can I subset it by year? Here is a sample dataset:
# Libraries
library(ggplot2)
library(reshape2)
# Data
df <- data.frame("Hospital" = c("Buge Hospital", "Buge Hospital", "Greta Hospital", "Greta Hospital",
"Makor Hospital", "Makor Hospital"),
"Period" = c("Jul-18","Aug-18", "Jul-19","Aug-19", "Jul-20","Aug-20"),
"Medical admissions" = c(12,56,0,40,5,56),
"Surgical admissions" = c(10,2,0,50,20,56),
"Inpatient admissions" = c(9,5,6,0,60,96))
I have tried this, but I get an empty dataset:
data_18 <- subset(df, format(as.Date(df$Period, format="%b/%Y"),"%Y")== 2018)
I want to pull out the monthly data for each year so that I can observe trends for that period.
The expected result is a subset containing only the data for a single year, for example the monthly data for 2018.
I am not sure if this is what you are looking for:
data <- Filter(nrow,split(df,list(gsub(".*-","",df$Period),df$Hospital)))
data_18 <- data[grepl("^18",names(data))]
which gives
> data
$`18.Buge Hospital`
Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
1 Buge Hospital Jul-18 12 10 9
2 Buge Hospital Aug-18 56 2 5
$`19.Greta Hospital`
Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
3 Greta Hospital Jul-19 0 0 6
4 Greta Hospital Aug-19 40 50 0
$`20.Makor Hospital`
Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
5 Makor Hospital Jul-20 5 20 60
6 Makor Hospital Aug-20 56 56 96
and
> data_18
$`18.Buge Hospital`
Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
1 Buge Hospital Jul-18 12 10 9
2 Buge Hospital Aug-18 56 2 5
EDIT
If you just want to subset the data for 2018 (thanks to @G. Grothendieck):
data_18 <- subset(df, grepl("18", Period))
I think what you were trying for is:
subset(df, format(as.Date(paste('1', Period), '%d %b-%y'), "%Y") == 2018)
# Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
#1 Buge Hospital Jul-18 12 10 9
#2 Buge Hospital Aug-18 56 2 5
Or using zoo's yearmon.
library(zoo)
subset(df, floor(as.yearmon(Period, "%b-%y")) == 2018)
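For reference, the attempt in the question returns an empty data set because the format string does not match the data: "%b/%Y" expects a slash and a four-digit year, and the string has no day, so as.Date yields NA for every row and subset drops rows whose condition is NA. A minimal sketch of the difference, assuming an English locale for %b (the names bad and fixed are illustrative):

```r
period <- c("Jul-18", "Aug-18", "Jul-19")

# Format from the question: wrong separator, wrong year width, no day
bad <- as.Date(period, format = "%b/%Y")
bad

# Prepend a day and use "-" with a two-digit year (%y)
fixed <- as.Date(paste("1", period), format = "%d %b-%y")
format(fixed, "%Y")

# The comparison now selects the 2018 rows
format(fixed, "%Y") == "2018"
```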
There are several possibilities, for example with strsplit or with tidyverse as follows:
library(tidyr)
library(dplyr)
df %>% separate(Period, into=c("Month", "Year"), "-") %>% filter(Year == 18)
and if you want to summarize, plot or something, use group_by instead of filter, for example:
df %>%
separate(Period, into=c("Month", "Year"), "-") %>%
group_by(Year) %>%
summarize(sum(Medical.admissions))
And for a more pedestrian approach in response to your desire to subset on both year and month, and to reflect how the approach in your own code could be made to work:
# Libraries
library(ggplot2)
library(reshape2)
library(lubridate)
library(dplyr) # provides %>% and mutate()
# Data
df <- data.frame("Hospital" = c("Buge Hospital", "Buge Hospital", "Greta Hospital", "Greta Hospital",
"Makor Hospital", "Makor Hospital"),
"Period" = c("Jul-18","Aug-18", "Jul-19","Aug-19", "Jul-20","Aug-20"),
"Medical admissions" = c(12,56,0,40,5,56),
"Surgical admissions" = c(10,2,0,50,20,56),
"Inpatient admissions" = c(9,5,6,0,60,96),
stringsAsFactors = FALSE)
# Wrangle the data to get valid date, year and month variables; subsetting or grouping by year is then straightforward, e.g. with dplyr::group_by(year, month)
df1 <-
df %>%
mutate(date = as.Date(paste0("01-", Period),format = "%d-%b-%y"),
year = year(date),
month = month(date))

pie chart for selected combobox item

I want a chart like this (image not included).
I have plotted a pie chart in a dashboard, but I want to plot a pie chart for the item selected in a combobox, using plotly.
My data:
State=c ('USA', 'Belgium', 'France','Russia')
totalcases= c(553, 226, 742,370)
totalrecovered=c(12,22,78,21)
totaldeath=c(48,24,12,22)
DTF = data.frame(State,totalcases,totalrecovered,totaldeath)
My code to plot one pie-chart:
labels=c("unrecovered","death","recovered")
USA=filter(DTF,DTF$State=="USA" )
USA=c(USA$Totalcases,USA$Totaldeath,USA$Totalrecovred)
p1= plot_ly(labels = ~labels,
values = ~USA, type = 'pie',
marker = list(colors = brewer.pal(7,"Spectral")))
p1
Thanks.
The problem is that your dataset is not in a tidy (long) format, and the column names used in your code do not match those in the data frame. Try this:
library(plotly)
library(RColorBrewer)
library(dplyr)
library(tidyr)
State=c ('USA', 'Belgium', 'France','Russia')
totalcases= c(553, 226, 742,370)
totalrecovered=c(12,22,78,21)
totaldeath=c(48,24,12,22)
DTF = data.frame(State,totalcases,totalrecovered,totaldeath)
dtf_long <- DTF %>%
pivot_longer(-State, names_to = "labels") %>%
mutate(labels = gsub("total", "", labels),
labels = ifelse(labels == "cases", "unrecovered", labels))
dtf_long
#> # A tibble: 12 x 3
#> State labels value
#> <fct> <chr> <dbl>
#> 1 USA unrecovered 553
#> 2 USA recovered 12
#> 3 USA death 48
#> 4 Belgium unrecovered 226
#> 5 Belgium recovered 22
#> 6 Belgium death 24
#> 7 France unrecovered 742
#> 8 France recovered 78
#> 9 France death 12
#> 10 Russia unrecovered 370
#> 11 Russia recovered 21
#> 12 Russia death 22
usa <- filter(dtf_long, State == "USA")
p1 <- usa %>%
plot_ly(labels = ~labels,
values = ~value, type = 'pie',
marker = list(colors = brewer.pal(7, "Spectral")))
p1
Created on 2020-04-04 by the reprex package (v0.3.0)
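The answer above hard-codes "USA". To drive the chart from a combobox, the filtering step can be wrapped in a function that takes the selected state as an argument. The sketch below uses base R only; pie_input is a hypothetical helper name, and how the selected value reaches it (for example from a dashboard input widget) is an assumption outside what the original answer shows.

```r
DTF <- data.frame(
  State          = c("USA", "Belgium", "France", "Russia"),
  totalcases     = c(553, 226, 742, 370),
  totalrecovered = c(12, 22, 78, 21),
  totaldeath     = c(48, 24, 12, 22)
)

# Hypothetical helper: labels and values for one selected state,
# using the same label mapping as the answer above
pie_input <- function(data, state) {
  row <- data[data$State == state, ]
  data.frame(
    labels = c("unrecovered", "recovered", "death"),
    value  = c(row$totalcases, row$totalrecovered, row$totaldeath)
  )
}

pie_input(DTF, "France")
```

The returned data frame has the same labels and value columns that the plot_ly call above maps with ~labels and ~value, so it could be passed to that call in place of the usa object.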

Visualize rank-change using alluvial in R ggalluvial

I have a pretty basic df in which I have calculated the rank-change of values between two timestamps:
value rank_A rank_B group
1 A 1 1 A
2 B 2 3 A
3 C 3 2 B
4 D 4 4 B
5 E 5 8 A
6 F 6 5 C
7 G 7 6 C
8 H 8 7 A
What makes it a bit tricky (for me) is plotting the values on the Y-axis.
ggplot(df_alluvial, aes(y = value, axis1 = rank_A, axis2 = rank_B))+
geom_alluvium(aes(fill = group), width = 1/12)+
...
As of now, I can plot the rank-change and the groups successfully, but they are not linked to my value-names - there are no axis names and I don't know how to add them.
In the end it should look similar to this:
https://www.reddit.com/r/GraphicalExcellence/comments/4imh5f/alluvial_diagram_population_size_and_rank_of_uk/
Thanks for your advice!
Your update made the question more clear to me.
The y parameter should be a numerical value, and the data should be in 'long' format. I'm not sure how to change your data to fulfill these requirements. Therefore, I create some new data in this example. I have tried to make the data similar to the data in the plot that you have linked to.
Labels and stratum refer to the city-names. You can use geom_text to label the strata.
# Load libraries
library(tidyverse)
library(ggalluvial)
# Create some data
df_alluvial <- tibble(
city = rep(c("London", "Birmingham", "Manchester"), 4),
year = rep(c(1901, 1911, 1921, 1931), each = 3),
size = c(0, 10, 100, 10, 15, 100, 15, 20, 100, 30, 25, 100))
# Notice the data is in long-format
df_alluvial
#> # A tibble: 12 x 3
#> city year size
#> <chr> <dbl> <dbl>
#> 1 London 1901 0
#> 2 Birmingham 1901 10
#> 3 Manchester 1901 100
#> 4 London 1911 10
#> 5 Birmingham 1911 15
#> 6 Manchester 1911 100
#> 7 London 1921 15
#> 8 Birmingham 1921 20
#> 9 Manchester 1921 100
#> 10 London 1931 30
#> 11 Birmingham 1931 25
#> 12 Manchester 1931 100
ggplot(df_alluvial,
aes(x = as.factor(year), stratum = city, alluvium = city,
y = size,
fill = city, label = city))+
geom_stratum(alpha = .5)+
geom_alluvium()+
geom_text(stat = "stratum", size = 3)
If you want to sort the cities based on their size, you can add decreasing = TRUE to all layers in the plot.
ggplot(df_alluvial,
aes(x = as.factor(year), stratum = city, alluvium = city,
y = size,
fill = city, label = city))+
geom_stratum(alpha = .5, decreasing = TRUE)+
geom_alluvium(decreasing = TRUE)+
geom_text(stat = "stratum", size = 3, decreasing = TRUE)
Created on 2019-11-08 by the reprex package (v0.3.0)
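As a possible starting point for the data in the question, the wide rank_A and rank_B columns can be stacked into the long layout described above, with one row per value and timestamp. This is only a reshaping sketch in base R; the column names time and rank are assumptions, and choosing a suitable numeric y for the plot is left open.

```r
# Data from the question (wide: one column per timestamp)
df_wide <- data.frame(
  value  = LETTERS[1:8],
  rank_A = 1:8,
  rank_B = c(1, 3, 2, 4, 8, 5, 6, 7),
  group  = c("A", "A", "B", "B", "A", "C", "C", "A")
)

# Stack the two rank columns: one row per value and timestamp
df_long <- data.frame(
  value = rep(df_wide$value, times = 2),
  group = rep(df_wide$group, times = 2),
  time  = rep(c("A", "B"), each = nrow(df_wide)),
  rank  = c(df_wide$rank_A, df_wide$rank_B)
)
df_long
```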

generate seasonal plot, but with fiscal year start/end dates

Hello! Is there a way to index a chart to start and end at specific points
(which may be out of numeric order)?
I have data that begins October 1st and ends September 30th the following year. The series repeats over multiple past years, and I want to build a daily seasonality chart. The challenge is that the X axis does not run from low to high; it runs 10-11-12-1-2-3-4-5-6-7-8-9.
Question 1:
Can you order the index by month 10-11-12-1-2-3-4-5-6-7-8-9
while staying compatible with %m-%d formatting? The real problem is in
daily format, but for the sake of brevity I am only using months.
The result should look something like this (image not included; sorry, I had to use Excel).
Question 2:
Can we remove the connected chart lines, or will the solution to 1, naturally fix
question 2? examples in the attempts below.
Question 3:
Can the final formatting of the solution allow taking a moving average, or other
mutations of the initial data? The table in attempt #2 would allow taking the average of each month across years. Since July 17 is 6 and July 18 is 12, we would plot a 9 in the chart, etc. for the entire plot.
Question 4:
Is there an xts equivalent to solve this problem?
THANK YOU, THANK YOU, THANK YOU!
library(ggplot2)
library(plotly)
library(tidyr)
library(reshape2)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by="month")
values <- c(2,3,4,3,4,5,6,4,5,6,7,8,9,10,8,9,10,11,12,13,11,12,13,14)
YearEnd <-c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,
2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018)
df <- data.frame(Date,values,YearEnd)
## PLOT THE TIMESERIES
plot_ly(df, x = ~Date, y = ~values, type = "scatter", mode = "lines")
## PLOT THE DATA BY MONTH: attempt 1
df$Month <- format(df$Date, format="%m")
df2 <- df %>%
select(values, Month, YearEnd)
plot_ly(df2, x = ~Month, y = ~values, type = "scatter", mode = "lines",
connectgaps = FALSE)
## Plot starts on the 10th month, which is good, but the index is
## in standard order, not 10-11-12-1-2-3-4-5-6-7-8-9
## It also still connects the gaps, bad.
## CREATE A PIVOTTABLE: attempt 2
table <- spread(df2,YearEnd, values)
df3 <- melt(table , id.vars = 'Month', variable.name = 'series')
plot_ly(df3, x = ~Month, y = ~values, type = "scatter", mode = "lines",
connectgaps = FALSE)
## now the data are in the right order, but the index is still wrong
## I also do not understand how plotly is ordering it correctly, as 2
## is not the starting point in January.
You just need to set the desired levels for Month inside factor():
library(magrittr)
library(tidyverse)
library(lubridate)
library(plotly)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by = "month")
values <- c(2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 8, 9, 10, 11, 12, 13, 11, 12, 13, 14)
YearEnd <- c(
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018
)
df <- data.frame(Date, values, YearEnd)
# to fiscal year order
df %<>%
mutate(
Month = month(Date),
YearEnd = factor(YearEnd)) %>%
mutate(Month = factor(Month,
levels = c(10:12, 1:9),
labels = c(month.abb[10:12], month.abb[1:9])))
df
#> Date values YearEnd Month
#> 1 2016-10-01 2 2017 Oct
#> 2 2016-11-01 3 2017 Nov
#> 3 2016-12-01 4 2017 Dec
#> 4 2017-01-01 3 2017 Jan
#> 5 2017-02-01 4 2017 Feb
#> 6 2017-03-01 5 2017 Mar
#> 7 2017-04-01 6 2017 Apr
#> 8 2017-05-01 4 2017 May
#> 9 2017-06-01 5 2017 Jun
#> 10 2017-07-01 6 2017 Jul
#> 11 2017-08-01 7 2017 Aug
#> 12 2017-09-01 8 2017 Sep
...
p1 <- ggplot(df, aes(
x = Month, y = values,
color = YearEnd,
group = YearEnd)) +
geom_line() +
theme_classic(base_size = 12)
ggplotly(p1)
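The same factor levels also cover Question 3, the average of each month across years. A minimal base R sketch using the monthly data from the question follows; monthly_mean is an illustrative name. Because Month is a factor with fiscal-year levels, the averages come back in Oct-to-Sep order.

```r
Date <- seq(as.Date("2016-10-01"), as.Date("2018-09-01"), by = "month")
values <- c(2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8,
            9, 10, 8, 9, 10, 11, 12, 13, 11, 12, 13, 14)

# Month as a factor whose levels follow the fiscal year (Oct ... Sep)
m <- as.integer(format(Date, "%m"))
Month <- factor(m, levels = c(10:12, 1:9), labels = month.abb[c(10:12, 1:9)])

# Mean per month across both fiscal years, returned in level order
monthly_mean <- tapply(values, Month, mean)
monthly_mean

# July: (6 + 12) / 2 = 9, as expected in the question
monthly_mean["Jul"]
```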
Edit: to plot by Julian day, we use a similar method to the 3rd one from this answer
# Generate random data
set.seed(2018)
date = seq(from = as.Date("2016-10-01"), to = as.Date("2018-09-30"),
by = "days")
values = c(rnorm(length(date)/2, 8, 1.5), rnorm(length(date)/2, 16, 2))
dat <- data.frame(date, values)
df <- dat %>%
as_tibble() %>% # tbl_df() is deprecated in recent dplyr versions
mutate(jday = factor(yday(date)),
Month = month(date),
Year = year(date),
# only create label for the 1st day of the month
myLabel = case_when(day(date) == 1L ~ format(date, "%b-%d"),
TRUE ~ NA_character_)) %>%
# create fiscal year column
mutate(fcyear = case_when(Month > 9 ~ as.factor(Year + 1),
TRUE ~ as.factor(Year))) %>%
mutate(Month = factor(Month,
levels = c(10:12, 1:9),
labels = c(month.abb[10:12], month.abb[1:9])))
df
#> # A tibble: 730 x 7
#> date values jday Month Year myLabel fcyear
#> <date> <dbl> <fct> <fct> <dbl> <chr> <fct>
#> 1 2016-10-01 7.37 275 Oct 2016 Oct-01 2017
#> 2 2016-10-02 5.68 276 Oct 2016 <NA> 2017
#> 3 2016-10-03 7.90 277 Oct 2016 <NA> 2017
#> 4 2016-10-04 8.41 278 Oct 2016 <NA> 2017
#> 5 2016-10-05 10.6 279 Oct 2016 <NA> 2017
#> 6 2016-10-06 7.60 280 Oct 2016 <NA> 2017
#> 7 2016-10-07 11.1 281 Oct 2016 <NA> 2017
#> 8 2016-10-08 9.30 282 Oct 2016 <NA> 2017
#> 9 2016-10-09 7.08 283 Oct 2016 <NA> 2017
#> 10 2016-10-10 8.96 284 Oct 2016 <NA> 2017
#> # ... with 720 more rows
# Create a row number for plotting to make sure ggplot plot in
# the exact order of a fiscal year
df1 <- df %>%
group_by(fcyear) %>%
mutate(order = row_number()) %>%
ungroup()
df1
#> # A tibble: 730 x 8
#> date values jday Month Year myLabel fcyear order
#> <date> <dbl> <fct> <fct> <dbl> <chr> <fct> <int>
#> 1 2016-10-01 7.37 275 Oct 2016 Oct-01 2017 1
#> 2 2016-10-02 5.68 276 Oct 2016 <NA> 2017 2
#> 3 2016-10-03 7.90 277 Oct 2016 <NA> 2017 3
#> 4 2016-10-04 8.41 278 Oct 2016 <NA> 2017 4
#> 5 2016-10-05 10.6 279 Oct 2016 <NA> 2017 5
#> 6 2016-10-06 7.60 280 Oct 2016 <NA> 2017 6
#> 7 2016-10-07 11.1 281 Oct 2016 <NA> 2017 7
#> 8 2016-10-08 9.30 282 Oct 2016 <NA> 2017 8
#> 9 2016-10-09 7.08 283 Oct 2016 <NA> 2017 9
#> 10 2016-10-10 8.96 284 Oct 2016 <NA> 2017 10
#> # ... with 720 more rows
# plot with `order` as x-axis
p2 <- ggplot(df1,
aes(x = order, y = values,
color = fcyear,
group = fcyear)) +
geom_line() +
theme_classic(base_size = 12) +
xlab(NULL)
# now replace `order` label with `myLabel` created above
x_break <- df1$order[!is.na(df1$myLabel)][1:12]
x_label <- df1$myLabel[x_break]
x_label
#> [1] "Oct-01" "Nov-01" "Dec-01" "Jan-01" "Feb-01" "Mar-01" "Apr-01"
#> [8] "May-01" "Jun-01" "Jul-01" "Aug-01" "Sep-01"
p3 <- p2 +
scale_x_continuous(
breaks = x_break,
labels = x_label) +
theme(axis.text.x = element_text(angle = 90)) +
scale_color_brewer("Fiscal Year", palette = "Dark2") +
xlab(NULL)
p3
ggplotly(p3)
Created on 2018-09-09 by the reprex package (v0.2.0.9000).
Consider this an appendix to Tung's excellent answer. Here I've made it obvious how to alter the code for different start and end months of financial (or production) years, which vary by country (and industry), using the parameter EndMonth. I've also added an annual average, which seems like a pretty obvious thing to want as well (though outside the OP's request).
library(tidyverse)
library(lubridate)
## Generate random data
set.seed(2018)
date = seq(from = as.Date("2016-06-01"), to = as.Date("2016-06-01")+729,
by = "days") # about 2 years, but even number of days
values = c(rnorm(length(date)/2, 8, 1.5), rnorm(length(date)/2, 16, 2))
dat <- data.frame(date, values)
EndMonth <- 5 #i.e. if last month of financial year is May, use 5 for 5th month of calendar year
df <- dat %>%
as_tibble() %>% # tbl_df() is deprecated in recent dplyr versions
mutate(jday = factor(yday(date)),
Month = month(date),
Year = year(date),
# only create label for the 1st day of the month
myLabel = case_when(day(date) == 1L ~ format(date, "%b%e"),
TRUE ~ NA_character_)) %>%
# create fiscal year column
mutate(fcyear = case_when(Month > EndMonth ~ as.factor(Year + 1),
TRUE ~ as.factor(Year))) %>%
mutate(Month = factor(Month,
levels = c((EndMonth+1):12, 1:(EndMonth)),
labels = c(month.abb[(EndMonth+1):12], month.abb[1:EndMonth])))
df
#make 2 (or n) year average
df_mean <- df %>%
group_by(jday) %>%
mutate(values = mean(values, na.rm=TRUE)) %>%
filter(fcyear %in% c("2017")) %>% #note hard code for first fcyear in dataset
mutate(fcyear = "Average")
#Add average to data frame
df <- bind_rows(df, df_mean)
# Create a row number for plotting to make sure ggplot plot in
# the exact order of a fiscal year
df1 <- df %>%
group_by(fcyear) %>%
mutate(order = row_number()) %>%
ungroup()
df1
# plot with `order` as x-axis
p2 <- ggplot(df1,
aes(x = order, y = values,
color = fcyear,
group = fcyear)) +
geom_line() +
theme_classic(base_size = 12) +
xlab(NULL)
p2
# now replace `order` label with `myLabel` created above
x_break <- df1$order[!is.na(df1$myLabel)][1:12]
x_label <- df1$myLabel[x_break]
x_label
p3 <- p2 +
scale_x_continuous(
breaks = x_break,
labels = x_label) +
theme(axis.text.x = element_text(angle = 90)) +
scale_color_brewer("Fiscal Year", palette = "Dark2") +
xlab(NULL)
p3
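The fiscal-year rule used in both answers (months after EndMonth belong to the next year's fiscal year) can be isolated into a small function. This is a base R sketch; fiscal_year is a hypothetical helper name, and end_month plays the role of EndMonth above.

```r
# Hypothetical helper: fiscal year label for a vector of dates.
# Months after `end_month` are assigned to the following year.
fiscal_year <- function(date, end_month = 5) {
  m <- as.integer(format(date, "%m"))
  y <- as.integer(format(date, "%Y"))
  ifelse(m > end_month, y + 1L, y)
}

# Financial year ending in May
fiscal_year(as.Date(c("2016-06-01", "2017-05-31", "2017-06-01")))

# Financial year ending in September, as in the original question
fiscal_year(as.Date(c("2016-10-01", "2017-09-30")), end_month = 9)
```

With end_month = 12 the function reduces to the calendar year, since no month is greater than 12.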

Resources