Plotting a non-standard year (water year) with ggplot2 - r

Building on this question and the use of "water year" in R, I have a question about plotting in ggplot2 with a common date axis over many years. A water year is defined as starting on October 1 and ending on September 30, which is a more sensible fit for the hydrological cycle.
So say I have this data set:
library(dplyr)
library(ggplot2)
library(lubridate)
df <- data.frame(Date=seq.Date(as.Date("1910/1/1"), as.Date("1915/1/1"), "days"),
y=rnorm(1827,100,1))
Then here is the wtr_yr function:
wtr_yr <- function(dates, start_month=10) {
# Convert dates into POSIXlt
dates.posix = as.POSIXlt(dates)
# Year offset
offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
# Water year
adj.year = dates.posix$year + 1900 + offset
# Return the water year
adj.year
}
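As a quick check of the boundary behaviour, the sketch below repeats the function above (so it runs on its own) and applies it to the days on either side of October 1. The example dates are illustrative only.

```r
# Same logic as the wtr_yr function above, repeated so the snippet is self-contained
wtr_yr <- function(dates, start_month = 10) {
  dates.posix <- as.POSIXlt(dates)
  # POSIXlt months are 0-based (January is 0), hence start_month - 1
  offset <- ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  dates.posix$year + 1900 + offset
}

# Illustrative dates around the water-year boundary
boundary <- as.Date(c("1910-09-30", "1910-10-01", "1911-09-30"))
wy <- wtr_yr(boundary)
print(data.frame(Date = boundary, water_year = wy))
```

With the default start month, September 30, 1910 falls in water year 1910, while October 1, 1910 and September 30, 1911 both fall in water year 1911.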
What I would like to do is use colour as a grouping variable, then make an x axis that consists only of month and day information. Usually I've done it like so (using the lubridate package):
ymd(paste0("1900","-",month(df$Date),"-",day(df$Date)))
This works fine if the year is arranged normally. However, in the water year scenario each water year spans two calendar years. So ideally I'd like a plot that runs from October 1 to September 30, with a separate line for each water year and all the correct water years maintained. Here is where I am so far:
df1 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
mutate(CDate=as.Date(paste0("1900","-",month(Date),"-",day(Date))))
df1 %>%
ggplot(aes(x=CDate, y=y, colour=wtr_yrVAR)) +
geom_point()
Plotted this way, the date axis obviously spans January to December. Any ideas on how I can force ggplot2 to plot these along water year lines?

Here is a method that works:
df3 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
#seq along dates starting with the beginning of your water year
mutate(CDate=as.Date(paste0(ifelse(month(Date) < 10, "1901", "1900"),
"-", month(Date), "-", day(Date))))
Then:
df3 %>%
ggplot(., aes(x = CDate, y = y, colour = wtr_yrVAR)) +
geom_point() +
scale_x_date(date_labels = "%b %d")
This gives a plot whose x axis runs from October through September, with one colour per water year.
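One caveat: neither 1900 nor 1901 is a leap year, so a February 29 in the data has no valid counterpart on that placeholder axis and would typically end up as NA. A minimal sketch of a leap-safe variant follows, in base R. The helper name water_year_axis_date and the placeholder years 1903/1904 are assumptions chosen for illustration, not part of the answer above.

```r
# Hypothetical helper: map real dates onto one placeholder water year.
# The placeholder years are chosen so that February always falls in 1904,
# which is a leap year, so February 29 survives the mapping.
water_year_axis_date <- function(dates, start_month = 10) {
  m <- as.integer(format(dates, "%m"))
  first_year <- if (start_month <= 2) 1904L else 1903L
  placeholder <- ifelse(m >= start_month, first_year, first_year + 1L)
  as.Date(paste0(placeholder, format(dates, "-%m-%d")), format = "%Y-%m-%d")
}

x <- as.Date(c("1911-10-01", "1912-02-29", "1912-09-30"))
axis_dates <- water_year_axis_date(x)
print(axis_dates)
```

The result can be used in place of CDate; with scale_x_date(date_labels = "%b %d") the placeholder year never appears on the axis.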

Not very elegant, but this should work:
df1 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
mutate(CDdate= as.Date(as.numeric(Date - as.Date(paste0(wtr_yrVAR,"-10-01"))), origin = "1900-10-01"))
df1 %>%
ggplot(aes(x = CDdate, y = y, colour = wtr_yrVAR)) +
geom_line() +
scale_x_date(date_breaks = "1 month", date_labels = "%b",
limits = c(as.Date("1899-09-30"), as.Date("1900-10-01"))) +
theme_bw()

Related

Plotting one daily time serie per year in R (ggplot2)

Similar to this question: Split up time series per year for plotting, which was done in Python, I want to display a daily time series as multiple lines, one per year. How can I achieve this in R?
library(ggplot2)
library(dplyr)
# Dummy data
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
# Basic line plot
p <- ggplot(df, aes(x=day, y=value)) +
geom_line() +
xlab("")
p
One solution is using ggplot2, but date_labels are displayed incorrectly:
library(tidyverse)
library(lubridate)
p <- df %>%
mutate(date = as.Date(day)) %>% # the dummy data stores its dates in `day`
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, value, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
An alternative solution is to use gg_season from the feasts package:
library(feasts)
library(tsibble)
library(dplyr)
tsibbledata::aus_retail %>%
filter(
State == "Victoria",
Industry == "Cafes, restaurants and catering services"
) %>%
gg_season(Turnover)
References:
Split up time series per year for plotting
R - How to create a seasonal plot - Different lines for years
If you want your x axis to represent the months from January to December, then perhaps getting the yday of the date and adding it to the first of January of an arbitrary year would be simplest:
library(tidyverse)
library(lubridate)
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
df %>%
mutate(year = factor(year(day)), date = as.Date('2017-01-01') + yday(day) - 1) %>% # yday() is 1-based, so subtract 1 to keep January 1 on January 1
ggplot(aes(date, value, color = year)) +
geom_line() +
scale_x_date(breaks = seq(as.Date('2017-01-01'), by = 'month', length = 12),
date_labels = '%b')
Created on 2023-02-07 with reprex v2.0.2
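One thing to keep in mind with a day-of-year mapping is that in a leap year every date after February 29 has a day-of-year one higher than in a non-leap year, so the same calendar date lands one day later on the shared axis. The base R sketch below illustrates this with two example dates; it uses the 0-based yday field of POSIXlt (lubridate's yday() is 1-based), and the base date is an arbitrary choice.

```r
# Arbitrary non-leap base year for the shared axis
base <- as.Date("2017-01-01")

# March 1 in a leap year and in a non-leap year
d <- as.Date(c("2016-03-01", "2017-03-01"))

# POSIXlt's yday is 0-based (January 1 is 0), so it can be added directly
doy <- as.POSIXlt(d)$yday
mapped <- base + doy
print(data.frame(original = d, day_of_year = doy, mapped = mapped))
```

For daily data the one-day shift is usually invisible on a monthly axis, but it matters if lines from different years are compared date by date.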
I tend to think simple is better:
transform(df, year = format(day, "%Y")) |>
ggplot(aes(x=day, y=value, group=year, color=year)) +
geom_line() +
xlab(NULL)
optionally removing the year legend with + guides(colour = "none").

Expand axis dates to a full month in each facet

I am plotting router statistics (collected from merlin speed monitoring tool).
The graphs are faceted by year-month, and I want each month's x axis to expand to the entire month, even when I only have part of a month's data.
In the example below, the data for January 2022 is incomplete (just 6 hours or so of data).
The code I have tried:
library(tidyverse)
library(scales)
X.df <- read.csv(url("https://pastebin.com/raw/sGAzEDe6")) %>%
mutate(date = as.POSIXct(date, origin="1970-01-01"))
ggplot(X.df , aes(date, Download, colour = Download)) +
geom_line()+
facet_wrap(~ month, scale="free_x", ncol = 1) +
scale_colour_gradient(low="red",high="green", limits=c(0.0, 50), oob = squish) +
scale_x_datetime(date_labels = "%d/%m", breaks = "7 day", minor_breaks = "1 day") +
coord_cartesian(ylim = c(0, 60))
Again, I want the range of the x axis in each facet to cover the entire month. Thus, I want the X axis for the 2021-12 facet to run from 1st Dec 2021 to 31st Dec 2021, and the X axis for the 2022-01 facet to run from 1st Jan 2022 to 31st Jan 2022.
Is there some way of forcing this within ggplot2?
An additional, smaller self-contained example to try your code on:
X.df <- tribble(
~date, ~month, ~Download,
"2021-12-01T00:30:36Z","2021-12",20.13,
"2021-12-07T06:30:31Z","2021-12",38.95,
"2021-12-14T08:00:31Z","2021-12",38.44,
"2021-12-21T09:30:29Z","2021-12",28.57,
"2021-12-28T16:00:31Z","2021-12",30.78,
"2021-12-31T13:00:28Z","2021-12",55.45,
"2022-01-01T00:00:28Z","2022-1",55.44,
"2022-01-01T02:30:29Z","2022-1",55.63,
"2022-01-01T03:00:29Z","2022-1",55.75,
"2022-01-01T05:00:29Z","2022-1",55.8,
"2022-01-07T03:00:29Z","2022-1",53.6,
"2022-01-07T05:00:29Z","2022-1",51.8
)
As always, thanks in advance. Pete
Update II (prior versions removed):
In your data there is only one January 2022 date.
We complete the dates of January 2022 in the data frame using complete() from the tidyr package.
library(tidyverse)
library(lubridate)
X.df %>%
mutate(date = ymd(date)) %>%
group_by(month(date)) %>%
complete(date = seq(min(date), max(ceiling_date(date, unit = "month") - ddays(1)), by = 'day')) %>%
fill(month) %>%
ggplot(aes(x = date, Download, colour = Download)) +
geom_line()+
facet_wrap(~ month, scale="free_x", ncol = 1) +
scale_colour_gradient(low="red",high="green", limits=c(0.0, 50), oob = squish) +
scale_x_date(date_breaks = "1 day", date_labels = "%d/%m", expand = c(0, 0)) +
coord_cartesian(ylim = c(0, 60))
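Another option, sketched here under stated assumptions rather than taken from the answer above, is to leave the data untouched and add an invisible geom_blank() layer holding the first and last day of each month, so that each free x scale is stretched to the full month. The toy data, the Date-valued x axis (the question uses date-times) and the facet key derived from the date are all assumptions made for illustration.

```r
# Toy data (assumed): Date-valued x, facet key derived from the date itself
obs <- data.frame(
  date = as.Date(c("2021-12-01", "2021-12-14", "2021-12-31",
                   "2022-01-01", "2022-01-07")),
  Download = c(20.13, 38.44, 55.45, 55.44, 53.6)
)
obs$month <- format(obs$date, "%Y-%m")

# First and last day of every month present in the data
firsts <- unique(as.Date(format(obs$date, "%Y-%m-01"), format = "%Y-%m-%d"))
lasts <- do.call(c, lapply(firsts, function(d) {
  seq(d, by = "month", length.out = 2)[2] - 1  # day before the next month starts
}))
bounds <- data.frame(
  month = rep(format(firsts, "%Y-%m"), 2),
  date = c(firsts, lasts)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(obs, aes(date, Download)) +
    geom_line() +
    # geom_blank draws nothing but still contributes to each facet's x range
    geom_blank(data = bounds, aes(x = date), inherit.aes = FALSE) +
    facet_wrap(~ month, scales = "free_x", ncol = 1) +
    scale_x_date(date_labels = "%d/%m")
  print(p)
}
```

The facet key in bounds has to match the key in the data exactly; note that the small example in the question mixes "2021-12" with "2022-1", which is why the key is rebuilt from the date here.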

How to align facet_wrap

I am having a problem aligning my graphics using facet_wrap(). I have multiple years of data but I am subsetting to display only two years. For some unknown reason the upper plot is shifted to the left and the lower plot is shifted to the right (See attached snapshot). My dataset is very large to post here so it can be downloaded from the link below if someone is willing to help me to align the plots. https://login.filesanywhere.com/fs/v.aspx?v=8c6b678a5c61707ab0ae
Here is the snapshot:
This is what I have tried:
library(writexl)
library(readxl)
library(tidyverse)
library(lubridate)
mydat <- read_excel("all.xlsx", sheet="Sheet1")
#subset 2 years
start <- 1998
end <- 1999
a <- dplyr::filter(mydat, year %in% start:end)
ggplot(a,aes(date,Salinity,color=Box)) +
geom_line(size=.8) +
theme(legend.position = 'none') +
scale_x_date(date_breaks = "1 month",date_labels = "%b",expand=c(0,0.5)) +
labs(x="",y= "test") +
facet_wrap(~year,ncol=1)
I will be subsetting up to 10 years in the future. I am also wondering what the best way is to subset multiple years, using dplyr or base R. Thanks in advance.
I get the following error after trying your suggestion:
# subset years (1998 to 2000)
start <- 1998
end <- 2000
a <- mydat %>%
dplyr::filter(year %in% start:end) %>% # if you need to subset the years
mutate(date = as.Date(gsub("\\d{4}", "0000", date)))
ggplot(a,aes(date, Salinity,color=Box)) +
geom_line(size=.8) +
scale_x_date(date_breaks = "1 month",date_labels = "%b",expand=c(0,0.5)) +
labs(x="",y= "test") +
facet_wrap(~year,ncol=1,scales="free_x")
Error in charToDate(x) :
character string is not in a standard unambiguous format
Maybe 'date' doesn't like the zeros?
Question: Does it work for you using the dataset from the link?
The plot from the OP appears shifted because the date column in the data contains different years. Thus, ggplot displays those dates chronologically along a single x-axis that spans all the years.
A solution is to change the year in all the dates in the date column to be the same (e.g., "1111"), while the actual year information is retained in the year column. The values in the year column are then used to facet the plot.
# example data
mydat <- data.frame(
date = as.Date(
c("1998-10-1", "1998-11-1", "1999-10-1", "1999-11-1", "1999-12-1")),
value = 1:5,
year = c(1998, 1998, 1999, 1999, 1999))
mydat
# date value year
#1 1998-10-01 1 1998
#2 1998-11-01 2 1998
#3 1999-10-01 3 1999
#4 1999-11-01 4 1999
#5 1999-12-01 5 1999
# solution by changing the year in all the dates the same
library(ggplot2)
mydat$date <- as.Date(gsub("\\d{4}", "1111", mydat$date))
ggplot(mydat, aes(date, value)) +
geom_line(size=.8) +
scale_x_date(date_breaks = "1 month",date_labels = "%b",expand=c(0,0.5)) +
labs(x="",y= "test") +
facet_wrap(~year, ncol = 1)
In gsub("\\d{4}", "1111", mydat$date), a regular expression is used for the pattern. It finds four consecutive digits and substitutes them with "1111".
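Note that 1111 is not a leap year, so a February 29 in the data has no valid counterpart after the substitution. A minimal sketch of a leap-safe variant follows; the helper name to_common_year and the placeholder year 2000 (a leap year) are assumptions chosen for illustration, and the pattern is anchored to the start of the string so that only the year can match.

```r
# Hypothetical helper: replace the year of each date with one placeholder year
to_common_year <- function(dates, placeholder_year = 2000L) {
  as.Date(
    sub("^\\d{4}", as.character(placeholder_year), format(dates, "%Y-%m-%d")),
    format = "%Y-%m-%d"
  )
}

d <- as.Date(c("1998-10-01", "1996-02-29"))

# With a non-leap placeholder the leap day cannot be represented
with_1111 <- as.Date(sub("^\\d{4}", "1111", format(d, "%Y-%m-%d")),
                     format = "%Y-%m-%d")

# With a leap-year placeholder every calendar day has a counterpart
with_2000 <- to_common_year(d)

print(data.frame(original = d, with_1111 = with_1111, with_2000 = with_2000))
```

Since the axis labels use "%b", the choice of placeholder year is not visible in the plot.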
EDIT
If you don't want to make a hard change to the actual date data, you can use pipe operators (%>%) and other functions from the dplyr package:
library(ggplot2)
library(dplyr) # or library(tidyverse) works, too
# subset 2 years
start <- 1998
end <- 1999
mydat %>%
filter(year %in% start:end) %>% # if you need to subset the years
mutate(date = as.Date(gsub("\\d{4}", "1111", date))) %>%
ggplot(aes(date, value)) +
geom_line(size = 0.8) +
scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0,0.5)) +
labs(x = "", y = "test") +
facet_wrap(~year, ncol = 1)
UPDATE NOTE
I noticed that the year 0000 (or just 0) and any year of 9999 or above do not work with the OP's dataset. The year may have to be an integer between 1 and 9998. As long as the arbitrary year is the same throughout the date column, the plot will work as intended.

Why can't I get the right horizontal axis labels on my ggplot2 chart?

I am trying to do a faceted plot of a grouped dataframe with ggplot2, using geom_line(). My dataframe has a Date column and I would like to have dates on the horizontal axis. If I just use Date in aes(x=Date, ...) I get nice labels on the horizontal axis. However, the line has an almost horizontal section where the date jumps from the end of one group to the beginning of the next group. This code and chart shows that:
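library(dplyr)
library(ggplot2)
library(lubridate) # provides month()
dts <- seq.Date(as.Date("2020-01-01"), as.Date("2021-12-31"), by="day")
mos <- month(dts) # month() is vectorised, so sapply() is not needed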
df <- data.frame(Date=dts, Month=mos)
nr <- nrow(df)
df$X <- rep(1, nr)
df %>%
group_by(Month) -> dfgrp
dfgrp %>%
group_by(Month) %>%
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=Date, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
I would not like my chart to have those almost-horizontal lines when the date changes by a large amount. I was able to generate a chart without those lines using integers on aes() as follows:
dfgrp %>%
mutate(Time = 1:n() %>% as.integer(),
Z = cumsum(X)) %>%
ggplot(aes(x=Time, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
scale_x_continuous(breaks = seq(from=1, to=nr, by=10) %>% as.integer(),
labels = function(x) as.character(dfgrp$Date[x])) +
theme(axis.text.x = element_text(angle=45, size=7))
The line on the chart looks like I want it but the dates on the horizontal axis are not correct: they end in February 2020 in every facet while the dates in the dataframe end in December 2021 and the dates in the first chart begin and end on different months in different facets.
I tried many things but nothing worked. Any suggestions on how to have a chart with dates like in the first chart above and lines like in the second chart above?
Help will be much appreciated.
You may want to adjust the dates so that they all fall in the same year, while keeping the original year as a separate variable:
library(lubridate)
dfgrp %>%
group_by(Month) %>%
mutate(year = year(Date),
adj_date = ymd(paste(2020, month(Date), day(Date)))) %>%
# 2020 was leap year so 2/29 won't be lost
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=adj_date, y=Z, color = year, group = year)) +
geom_line(size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))

ggplot2 overlayed line chart by year?

Starting with the following dataset:
$ Orders,Year,Date
1608052.2,2019,2019-08-02
1385858.4,2018,2018-07-27
1223593.3,2019,2019-07-25
1200356.5,2018,2018-01-20
1198226.3,2019,2019-07-15
837866.1,2019,2019-07-02
I am trying to make an overlaid line chart with these criteria: the x-axis is days or months, the y-axis is the sum of Orders, and the grouping / colours are by year.
Attempts:
1) No overlay
dataset %>%
ggplot( aes(x=`Merge Date`, y=`$ Orders`, group=`Merge Date (Year)`, color=`Merge Date (Year)`)) +
geom_line()
2) ggplot month grouping
dataset %>%
mutate(Date = as.Date(Date)) %>%
mutate(Year = format(Date,'%Y')) %>%
mutate(Month = format(Date,'%b')) -> dataset2
ggplot(data=dataset2, aes(x=Month, y=`$ Orders`, group=Year, color=factor(Year))) +
geom_line(size=.75) +
ylab("Volume")
The lubridate package is your answer. Extract month from the Date field and turn it into a variable. This code worked for me:
library(tidyverse)
library(lubridate)
dataset <- read_delim("OrderValue,Year,Date\n1608052.2,2019,2019-08-02\n1385858.4,2018,2018-07-27\n1223593.3,2019,2019-07-25\n1200356.5,2018,2018-01-20\n1198226.3,2019,2019-07-15\n837866.1,2019,2019-07-02", delim = ",")
dataset <- dataset %>%
mutate(theMonth = month(Date))
ggplot(dataset, aes(x = as.factor(theMonth), y = OrderValue, group = as.factor(Year), color = as.factor(Year))) +
geom_line()
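Because the question asks for the sum of orders on the y-axis, it may help to aggregate to one value per year and month before plotting; otherwise several orders in the same month become separate points on the same x position. The sketch below does this in base R with the sample rows from the question. The column name OrderValue follows the answer above, and the monthly data frame is an assumption introduced for illustration.

```r
# Sample rows from the question
orders <- data.frame(
  OrderValue = c(1608052.2, 1385858.4, 1223593.3, 1200356.5, 1198226.3, 837866.1),
  Year = c(2019, 2018, 2019, 2018, 2019, 2019),
  Date = as.Date(c("2019-08-02", "2018-07-27", "2019-07-25",
                   "2018-01-20", "2019-07-15", "2019-07-02"))
)
orders$Month <- as.integer(format(orders$Date, "%m"))

# One row per Year/Month combination, holding the summed order value
monthly <- aggregate(OrderValue ~ Year + Month, data = orders, FUN = sum)

# Ordered month labels so the x axis runs Jan..Dec rather than alphabetically
monthly$MonthLabel <- factor(month.abb[monthly$Month], levels = month.abb)
print(monthly)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(monthly, aes(MonthLabel, OrderValue,
                           group = factor(Year), colour = factor(Year))) +
    geom_line() +
    geom_point() +
    scale_x_discrete(drop = FALSE) +  # keep months that have no orders
    labs(x = NULL, y = "Sum of orders", colour = "Year")
  print(p)
}
```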
