Similar to this question: Split up time series per year for plotting, which was done in Python, I want to display a daily time series as multiple lines, one per year. How can I achieve this in R?
library(ggplot2)
library(dplyr)
# Dummy data
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
# Basic line plot
p <- ggplot(df, aes(x=day, y=value)) +
geom_line() +
xlab("")
p
Out:
One solution is using ggplot2, but date_labels are displayed incorrectly:
library(tidyverse)
library(lubridate)
p <- df %>%
# mutate(date = ymd(date)) %>%
mutate(date = as.Date(day)) %>% # the dummy data stores its dates in `day`
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, value, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
Out:
An alternative solution is to use gg_season() from the feasts package:
library(feasts)
library(tsibble)
library(dplyr)
tsibbledata::aus_retail %>%
filter(
State == "Victoria",
Industry == "Cafes, restaurants and catering services"
) %>%
gg_season(Turnover)
Out:
References:
Split up time series per year for plotting
R - How to create a seasonal plot - Different lines for years
If you want your x axis to represent the months from January to December, then perhaps getting the yday of the date and adding it to the first of January of an arbitrary year would be simplest:
library(tidyverse)
library(lubridate)
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
df %>%
# yday() is 1 for January 1, so subtract 1 to keep January 1 on January 1
mutate(year = factor(year(day)), date = as.Date('2017-01-01') + yday(day) - 1) %>%
ggplot(aes(date, value, color = year)) +
geom_line() +
scale_x_date(breaks = seq(as.Date('2017-01-01'), by = 'month', length = 12),
date_labels = '%b')
Created on 2023-02-07 with reprex v2.0.2
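As a rough base R sketch of the same idea, the dates can be moved into one reference year by rebuilding them from their month and day. The helper name `to_common_year` and the choice of 2000 as the reference year are assumptions made for illustration, not part of the answer above; a leap year is used so that 29 February still has a valid counterpart.

```r
# Hypothetical helper: map every date onto the same reference year so that
# lines for different years can share one x axis.
to_common_year <- function(dates, ref_year = 2000L) {
  # ref_year defaults to a leap year, so "02-29" remains a valid date.
  as.Date(sprintf("%d-%s", ref_year, format(dates, "%m-%d")))
}

days <- as.Date(c("2016-12-31", "2017-01-01", "2016-02-29"))
common <- to_common_year(days)
common
```

The resulting column could then be mapped to x, with the original year mapped to colour and `scale_x_date(date_labels = "%b")` hiding the placeholder year.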
I tend to think simple is better:
transform(df, year = format(day, "%Y")) |>
ggplot(aes(x=day, y=value, group=year, color=year)) +
geom_line() +
xlab(NULL)
optionally removing the year legend with + guides(colour = "none").
Related
I am trying to do a faceted plot of a grouped dataframe with ggplot2, using geom_line(). My dataframe has a Date column and I would like to have dates on the horizontal axis. If I just use Date in aes(x=Date, ...) I get nice labels on the horizontal axis. However, the line has an almost horizontal section where the date jumps from the end of one group to the beginning of the next group. This code and chart shows that:
dts <- seq.Date(as.Date("2020-01-01"), as.Date("2021-12-31"), by="day")
mos <- sapply(dts, month)
df <- data.frame(Date=dts, Month=mos)
nr <- nrow(df)
df$X <- rep(1, nr)
df %>%
group_by(Month) -> dfgrp
dfgrp %>%
group_by(Month) %>%
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=Date, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
I would not like my chart to have those almost-horizontal lines when the date changes by a large amount. I was able to generate a chart without those lines using integers on aes() as follows:
dfgrp %>%
mutate(Time = 1:n() %>% as.integer(),
Z = cumsum(X)) %>%
ggplot(aes(x=Time, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
scale_x_continuous(breaks = seq(from=1, to=nr, by=10) %>% as.integer(),
labels = function(x) as.character(dfgrp$Date[x])) +
theme(axis.text.x = element_text(angle=45, size=7))
The line on the chart looks like I want it but the dates on the horizontal axis are not correct: they end in February 2020 in every facet while the dates in the dataframe end in December 2021 and the dates in the first chart begin and end on different months in different facets.
I tried many things but nothing worked. Any suggestions on how to have a chart with dates like in the first chart above and lines like in the second chart above?
Help will be much appreciated.
You may want to adjust the dates to be in the same year, but noting the original year as a variable:
library(lubridate)
dfgrp %>%
group_by(Month) %>%
mutate(year = year(Date),
adj_date = ymd(paste(2020, month(Date), day(Date)))) %>%
# 2020 was leap year so 2/29 won't be lost
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=adj_date, y=Z, color = year, group = year)) +
geom_line(size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
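The comment about 2020 being a leap year can be checked with a small base R sketch. The example dates and variable names below are made up for illustration; the point is only that a non-leap target year turns 29 February into NA, which ggplot2 would then drop.

```r
# Rebuild the same month-day values in a leap and a non-leap target year.
d <- as.Date(c("2020-02-28", "2020-02-29", "2021-03-01"))
month_day <- format(d, "%m-%d")

# Leap-year target: every month-day combination is a valid date.
in_2020 <- as.Date(paste0("2020-", month_day), format = "%Y-%m-%d")
# Non-leap target: "2021-02-29" does not exist and becomes NA.
in_2021 <- as.Date(paste0("2021-", month_day), format = "%Y-%m-%d")

data.frame(original = d, in_2020, in_2021)
```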
I have the dataframe below:
name<-c("John","John","John","John2","John2","John2")
Dealer<-c("ASD","ASD","ASD","ASDG","ASDF","ASD")
Date<-c("2020-01-03","2020-01-04","2020-01-05","2020-02-03","2020-02-04","2020-02-05")
dataset<-data.frame(name,Dealer,Date)
and I want a monthly trend visualization of the count of name, filterable by Dealer.
I have gotten as far as the code below, but I do not know how to find the count of each name. I feel that I have to convert my dataframe somehow.
library(ggplot2)
ggplot(dataset, aes(x = Date, y = , color = Dealer)) +
geom_line() +
scale_x_date(date_breaks = "1 months", date_labels = "%b '%y") +
theme_minimal()
*edited dataframe with a dataset with all values being the same in name and Dealer
name<-c("John","John","John","John","John","John","John")
Dealer<-c("ASD","ASD","ASD","ASD","ASD","ASD","ASD")
Date<-c("2020-01-03","2020-01-04","2020-01-05","2020-01-06","2020-01-07","2020-01-08","2020-01-09")
dataset<-data.frame(name,Dealer,Date)
Maybe something like this:
library(tidyverse); library(lubridate)
dataset %>%
# Convert "Date" into date form
mutate(Date = ymd(Date)) %>%
# Count how many occasions of each name-Dealer-month combo
count(name, Dealer, month = floor_date(Date, "month")) %>%
# Add rows for missing months for each existing name-Dealer combo
complete(month, nesting(name, Dealer), fill = list(n = 0)) %>%
ggplot(aes(month, n, color = name)) +
geom_line() +
scale_x_date(date_breaks = "1 months", date_labels = "%b\n'%y") +
theme_minimal() +
facet_wrap(~Dealer)
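To see what the counting step produces before it reaches ggplot, here is a minimal base R sketch of roughly the same aggregation, using the first dataset from the question. It is an approximation rather than an exact equivalent: it does not add zero rows for missing months the way `complete()` does.

```r
dataset <- data.frame(
  name   = c("John", "John", "John", "John2", "John2", "John2"),
  Dealer = c("ASD", "ASD", "ASD", "ASDG", "ASDF", "ASD"),
  Date   = as.Date(c("2020-01-03", "2020-01-04", "2020-01-05",
                     "2020-02-03", "2020-02-04", "2020-02-05"))
)

# First day of each month, the base R counterpart of floor_date(Date, "month").
dataset$month <- as.Date(format(dataset$Date, "%Y-%m-01"))

# One row per name-Dealer-month combination, with the row count in `n`.
monthly <- aggregate(n ~ name + Dealer + month,
                     data = transform(dataset, n = 1L), FUN = sum)
monthly
```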
Starting with the following dataset:
$ Orders,Year,Date
1608052.2,2019,2019-08-02
1385858.4,2018,2018-07-27
1223593.3,2019,2019-07-25
1200356.5,2018,2018-01-20
1198226.3,2019,2019-07-15
837866.1,2019,2019-07-02
Trying to make a similar format as:
with the criteria: X-axis will be days or months, y-axis will be sum of Orders, grouping / colors will be by year.
Attempts:
1) No overlay
dataset %>%
ggplot( aes(x=`Merge Date`, y=`$ Orders`, group=`Merge Date (Year)`, color=`Merge Date (Year)`)) +
geom_line()
2) ggplot month grouping
dataset %>%
mutate(Date = as.Date(`Date`)) %>%
mutate(Year = format(Date,'%Y')) %>%
mutate(Month = format(Date,'%b')) -> dataset2
ggplot(data=dataset2, aes(x=Month, y=`$ Orders`, group=Year, color=factor(Year))) +
geom_line(size=.75) +
ylab("Volume")
The lubridate package is your answer. Extract month from the Date field and turn it into a variable. This code worked for me:
library(tidyverse)
library(lubridate)
dataset <- read_delim("OrderValue,Year,Date\n1608052.2,2019,2019-08-02\n1385858.4,2018,2018-07-27\n1223593.3,2019,2019-07-25\n1200356.5,2018,2018-01-20\n1198226.3,2019,2019-07-15\n837866.1,2019,2019-07-02", delim = ",")
dataset <- dataset %>%
mutate(theMonth = month(Date))
ggplot(dataset, aes(x = as.factor(theMonth), y = OrderValue, group = as.factor(Year), color = as.factor(Year))) +
geom_line()
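Since the question asks for the sum of orders on the y-axis, one possible extension is to aggregate per year and month before plotting. The sketch below uses base R `aggregate()` on the six sample rows; the column names `OrderValue` and `Month` and the numeric month axis labelled with `month.abb` are assumptions of this sketch. The plotting part only runs if ggplot2 is installed.

```r
orders <- data.frame(
  OrderValue = c(1608052.2, 1385858.4, 1223593.3, 1200356.5, 1198226.3, 837866.1),
  Date = as.Date(c("2019-08-02", "2018-07-27", "2019-07-25",
                   "2018-01-20", "2019-07-15", "2019-07-02"))
)
orders$Year  <- format(orders$Date, "%Y")
orders$Month <- as.integer(format(orders$Date, "%m"))

# Sum the orders within each year-month combination.
monthly_sum <- aggregate(OrderValue ~ Year + Month, data = orders, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(monthly_sum,
              aes(x = Month, y = OrderValue, color = Year, group = Year)) +
    geom_line() +
    geom_point() +
    # Keep months in calendar order and label them Jan..Dec.
    scale_x_continuous(breaks = 1:12, labels = month.abb, limits = c(1, 12))
  print(p)
}
```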
I have a dataframe in R where:
Date MeanVal
2002-01 37.70722
2002-02 43.50683
2002-03 45.31268
2002-04 14.96000
2002-05 29.95932
2002-09 52.95333
2002-10 12.15917
2002-12 53.55144
2003-03 41.15083
2003-04 21.26365
2003-05 33.14714
2003-07 66.55667
.
.
2011-12 40.00518
And when I plot a time series using ggplot with:
ggplot(mean_data, aes(Date, MeanVal, group =1)) + geom_line()+xlab("") +
ylab("Mean Value")
I am getting:
but as you can see, the x axis scale is not very neat at all. Is there any way I could just scale it by year (2002,2003,2004..2011)?
Let's use lubridate's parse_date_time() to convert your Date to a date class:
library(tidyverse)
library(lubridate)
mean_data %>%
mutate(Date = parse_date_time(as.character(Date), "Y-m")) %>%
ggplot(aes(Date, MeanVal)) +
geom_line()
Similarly, we can convert to an xts and use autoplot():
library(timetk)
mean_data %>%
mutate(Date = parse_date_time(as.character(Date), "Y-m")) %>%
tk_xts(silent = TRUE) %>%
autoplot()
This achieves the plot above as well.
library(dplyr)
library(ggplot2)
mean_data %>%
# use mutate() from dplyr to strip the month and cast the
# remaining year value to an integer
mutate(Date = as.integer(gsub('-.*', '', Date))) %>%
ggplot(aes(Date, MeanVal, group = 1)) + geom_line() + xlab("") +
ylab("Mean Value")
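Another option that keeps the monthly resolution while labelling the axis by year is to convert the strings to Date and ask `scale_x_date()` for yearly breaks. This is a sketch on a five-row subset of the data from the question; it assumes the `Date` column holds "YYYY-MM" strings, and the plotting part only runs if ggplot2 is installed.

```r
mean_data <- data.frame(
  Date    = c("2002-01", "2002-02", "2002-03", "2003-03", "2003-04"),
  MeanVal = c(37.70722, 43.50683, 45.31268, 41.15083, 21.26365)
)

# Append a day so that base R can parse the year-month strings as dates.
mean_data$Date <- as.Date(paste0(mean_data$Date, "-01"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mean_data, aes(Date, MeanVal)) +
    geom_line() +
    # One tick per year, labelled with the four-digit year only.
    scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
    labs(x = NULL, y = "Mean Value")
  print(p)
}
```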
Building on this question and the use of "water year" in R, I have a question regarding plotting in ggplot2 with a common date axis over many years. A water year is defined as starting on October 1 and ending on September 30. It is a little more sensible for the hydrological cycle.
So say I have this data set:
library(dplyr)
library(ggplot2)
library(lubridate)
df <- data.frame(Date=seq.Date(as.Date("1910/1/1"), as.Date("1915/1/1"), "days"),
y=rnorm(1827,100,1))
Then here is the wtr_yr function:
wtr_yr <- function(dates, start_month=10) {
# Convert dates into POSIXlt
dates.posix = as.POSIXlt(dates)
# Year offset
offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
# Water year
adj.year = dates.posix$year + 1900 + offset
# Return the water year
adj.year
}
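To make the boundary behaviour concrete, here is a compact restatement of the same rule, written with `format()` so that the snippet runs on its own. The name `water_year` is hypothetical; the logic is meant to mirror `wtr_yr()` above: dates from the start month onward belong to the following calendar year's water year.

```r
# Compact sketch of the water-year rule (name is hypothetical).
water_year <- function(dates, start_month = 10) {
  yr  <- as.integer(format(dates, "%Y"))
  mon <- as.integer(format(dates, "%m"))
  # Add 1 to the calendar year for months at or after the start month.
  yr + (mon >= start_month)
}

boundary <- as.Date(c("1910-09-30", "1910-10-01", "1911-01-15", "1911-09-30"))
water_year(boundary)
# 1910 1911 1911 1911
```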
What I would like to do is use colour as a grouping variable, then make an x axis that consists only of month and day information. Usually I've done it like so (using the lubridate package):
ymd(paste0("1900","-",month(df$Date),"-",day(df$Date)))
This works fine if the year is arranged normally. However, in this water year scenario, each water year spans two calendar years. So ideally I'd like a plot that goes from October 1 to September 30 and plots a separate line for each water year, keeping every observation in its correct water year. Here is where I am so far:
df1 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
mutate(CDate=as.Date(paste0("1900","-",month(Date),"-",day(Date))))
df1 %>%
ggplot(aes(x=CDate, y=y, colour=wtr_yrVAR)) +
geom_point()
Plotted that way, the date axis obviously spans from Jan to Dec. Any ideas how I can force ggplot2 to plot these along the water year lines?
Here is a method that works:
df3 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
#seq along dates starting with the beginning of your water year
mutate(CDate=as.Date(paste0(ifelse(month(Date) < 10, "1901", "1900"),
"-", month(Date), "-", day(Date))))
Then:
df3 %>%
ggplot(., aes(x = CDate, y = y, colour = wtr_yrVAR)) +
geom_point() +
scale_x_date(date_labels = "%b %d")
Which gives:
Not very elegant, but this should work:
df1 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
mutate(CDdate= as.Date(as.numeric(Date - as.Date(paste0(wtr_yrVAR,"-10-01"))), origin = "1900-10-01"))
df1 %>% ggplot(aes(x =CDdate, y=y, colour=wtr_yrVAR)) +
geom_line() +
scale_x_date(date_breaks = "1 month", date_labels = "%b",
limits = c(as.Date("1899-09-30"), as.Date("1900-10-01"))) +
theme_bw()
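A variation on the placeholder-year approach, sketched in base R: since neither 1900 nor 1901 is a leap year, a 29 February observation cannot be placed on those axes. Using 1999 for October to December and 2000 for January to September avoids that. The helper name `water_axis_date` and the 1999/2000 pair are assumptions of this sketch, and it only makes sense when the start month falls after February.

```r
# Hypothetical helper: common x-axis dates running Oct 1 -> Sep 30.
# Months at or after the start month go to 1999, earlier months to the leap
# year 2000, so "02-29" remains a valid date.
water_axis_date <- function(dates, start_month = 10) {
  mon <- as.integer(format(dates, "%m"))
  ref_year <- ifelse(mon >= start_month, 1999L, 2000L)
  as.Date(sprintf("%d-%s", ref_year, format(dates, "%m-%d")))
}

x <- as.Date(c("1911-10-01", "1912-02-29", "1912-09-30"))
axis_dates <- water_axis_date(x)
axis_dates
```

The result can be used in place of `CDate` in the plots above, again with `scale_x_date(date_labels = "%b")` to hide the placeholder years.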