Smarter way to relevel a column containing alphanumeric dates in R

For the data below, I am looking for a neat way to help R understand that Dec 2009 should come before Jan 2010, and so on. I can achieve this by releveling the factor column manually, but is there another way in case I have a broad range of dates?
library(dplyr)
library(ggplot2)
df <- data.frame(date = c("Jan-2010", "Feb-2010", "Mar-2010", "Apr-2010" ,"Dec-2009"),
value = c(2,1,4,3, 2))
df %>%
ggplot() +
geom_line(aes(x=date,
y=value),
group=1)

lubridate is great for these types of issues. We can use its my() function, which parses month-year strings:
library(lubridate)
df <- data.frame(date = c("Jan-2010", "Feb-2010",
"Mar-2010", "Apr-2010" ,"Dec-2009"), value = c(2,1,4,3, 2))
df$date <- lubridate::my(df$date)
df %>% ggplot() +
geom_line(aes(x=date,
y=value),
group=1)+
scale_x_date(date_labels="%b %Y")

Try the code below:
df %>%
mutate(date = zoo::as.yearmon(date, "%b-%Y")) %>%
ggplot() +
geom_line(aes(x=date,
y=value),
group=1) +
# label the yearmon axis in the original "Mon-YYYY" style
zoo::scale_x_yearmon(format = "%b-%Y")
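If the column has to stay a factor (for a discrete axis, for instance), the levels can also be derived from the strings themselves instead of being typed by hand. The following is a minimal base R sketch, assuming every value has the exact form "Mon-YYYY" with an English three-letter month abbreviation; the helper column name date_f is made up for illustration.

```r
df <- data.frame(
  date  = c("Jan-2010", "Feb-2010", "Mar-2010", "Apr-2010", "Dec-2009"),
  value = c(2, 1, 4, 3, 2)
)

# Split "Mon-YYYY" into a month index (via the built-in month.abb) and a year
mon <- match(substr(df$date, 1, 3), month.abb)
yr  <- as.integer(substr(df$date, 5, 8))

# Order the labels chronologically and use that order as the factor levels
ord <- order(yr, mon)
df$date_f <- factor(df$date, levels = unique(df$date[ord]))

levels(df$date_f)
# "Dec-2009" "Jan-2010" "Feb-2010" "Mar-2010" "Apr-2010"
```

Mapping date_f to the x aesthetic then keeps the original labels while plotting them in chronological order.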

Related

Plotting one daily time series per year in R (ggplot2)

Similar to this question: Split up time series per year for plotting, which was done in Python, I want to display the daily time series as multiple lines by year. How can I achieve this in R?
library(ggplot2)
library(dplyr)
# Dummy data
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
# Basic line plot
p <- ggplot(df, aes(x=day, y=value)) +
geom_line() +
xlab("")
p
One solution uses ggplot2, but the date labels are displayed incorrectly:
library(tidyverse)
library(lubridate)
p <- df %>%
# mutate(date = ymd(day)) %>%
mutate(date = as.Date(day)) %>% # the dummy data stores its dates in `day`
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, value, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
An alternative solution is to use gg_season() from the feasts package:
library(feasts)
library(tsibble)
library(dplyr)
tsibbledata::aus_retail %>%
filter(
State == "Victoria",
Industry == "Cafes, restaurants and catering services"
) %>%
gg_season(Turnover)
References:
Split up time series per year for plotting
R - How to create a seasonal plot - Different lines for years
If you want your x axis to represent the months from January to December, then perhaps getting the yday of the date and adding it to the first of January of an arbitrary year would be simplest:
library(tidyverse)
library(lubridate)
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
df %>%
# yday() is 1 on 1 January, so subtract 1 to avoid shifting every date by a day
mutate(year = factor(year(day)), date = as.Date('2017-01-01') + yday(day) - 1) %>%
ggplot(aes(date, value, color = year)) +
geom_line() +
scale_x_date(breaks = seq(as.Date('2017-01-01'), by = 'month', length = 12),
date_labels = '%b')
Created on 2023-02-07 with reprex v2.0.2
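A variant of the same idea avoids day-of-year arithmetic altogether: keep the month and day of each date and swap in one fixed year. This is only a sketch in base R; the reference year 2000 is an arbitrary choice (picked because it is a leap year, so 29 February stays valid), and the name common_date is made up for illustration.

```r
day  <- as.Date("2017-06-14") - 0:364
year <- format(day, "%Y")               # one curve per calendar year

# Rebuild each date with the same month and day but a fixed reference year
common_date <- as.Date(format(day, "2000-%m-%d"))

table(year)
range(common_date)
```

common_date can then go on the x axis with scale_x_date(date_labels = "%b"), while year is mapped to colour.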
I tend to think simple is better:
transform(df, year = format(day, "%Y")) |>
ggplot(aes(x=day, y=value, group=year, color=year)) +
geom_line() +
xlab(NULL)
optionally removing the year legend with + guides(colour = "none").

Lubridate month() for multiple years

I'd like to make a plot for 2019 and 2020, but I'm running into a problem with the month() function from lubridate.
If I run this:
df %>%
mutate (Month = month(Date, label=T)) %>%
group_by(Month, Var1) %>%
summarize (sum = sum(numeric_variable)) %>%
ggplot(aes(Month, sum)) +
geom_col() +
facet_wrap(. ~ Var1, scales ="free_y")
The data for January 2019 and 2020 and other months are combined in the plot, which makes sense since they're both labelled as 'Jan' in the Month variable for both 2019 and 2020.
How can I best separate the months for 2019 and 2020 while still keeping the labels ('jan', 'feb') and the order of my Month variable? Do I have to reorder them as a factor manually, or is there a better way?
Lubridate is nice for some things, but I much prefer zoo::as.yearmon for months and years. There is even a nice scale_x_yearmon function for ggplot:
library(zoo)
df %>%
mutate (Month = zoo::as.yearmon(Date)) %>%
group_by(Month, Var1) %>%
summarize (sum = sum(numeric_variable)) %>%
ggplot(aes(Month, sum)) +
geom_col() +
facet_wrap(. ~ Var1, scales ="free_y") +
zoo::scale_x_yearmon(format = "%b")
Sample data:
set.seed(123)
df <- data.frame(Date = rep(seq(as.Date("2019-01-01"),as.Date("2020-12-31"), by = "day"),2),
Var1 = rep(LETTERS[1:2],each = 731),
numeric_variable = round(runif(2*731,1,100)))
Thanks Ian, you set me on the right path. scale_x_yearmon didn't work for me, though. But since the order was right, I could just convert the output of zoo's as.yearmon to a factor and work from there.
df %>%
mutate (Month = as.factor(zoo::as.yearmon(Date))) %>%
group_by(Month, Var1) %>%
ggplot(aes(Month, numeric_variable)) +
geom_col() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
I now have 'jan 2019' to 'dec 2019' and 'jan 2020' to 'dec 2020' on the x-axis, which is good enough for now. Issue solved.
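For comparison, the grouping key can also be built without zoo. This is a base R sketch using the sample data from the answer above; "%Y-%m" strings sort chronologically as plain text, so the month order survives without manual releveling. The column name YearMonth is an assumption made for illustration.

```r
set.seed(123)
df <- data.frame(
  Date = rep(seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day"), 2),
  Var1 = rep(LETTERS[1:2], each = 731),
  numeric_variable = round(runif(2 * 731, 1, 100))
)

# "2019-01", "2019-02", ... keeps January 2019 and January 2020 apart
df$YearMonth <- format(df$Date, "%Y-%m")

# One row per month and group, summing the numeric variable
monthly <- aggregate(numeric_variable ~ YearMonth + Var1, data = df, FUN = sum)
head(monthly, 3)
```

The trade-off is that the axis then shows "2019-01" rather than "Jan 2019"; the yearmon approach keeps the friendlier labels.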

Order of weekdays in a bar chart using lubridate and ggplot

When creating a bar plot where the x-axis is weekdays (using lubridate and ggplot), the order of my weekdays doesn't follow the chronological "mon, tues, wed... "
I've tried to replicate the problem but, as you can see below, in my replication the order of the labels on the x-axis is fine.
library(lubridate)
library(dplyr)
my_data <- data.frame(dates = sample(seq(as.Date('2010/01/01'), as.Date('2020/01/01'), by="day"), 100),
group = rep(c(1,2,3,4), times = 5))
my_data <- my_data %>%
mutate(Weekdays = wday(dates, label = TRUE)) %>%
filter(Weekdays != "Sat" &
Weekdays != "Sun")
ggplot(my_data, aes(Weekdays))+
geom_bar()+
facet_wrap(~group)
The code below is what I have used with the real data. As you can see the week days are not in order. I'm not sure where my actual code differs from the attempted replication above.
library(ggplot2)
df1 <- filter(df, wday != "Sat")
ggplot(df1, aes(x = wday))+
geom_bar()+
facet_wrap(~Grouping)+
theme(axis.text.x = element_text(angle =90, hjust = 1))
Any help / advice would be most appreciated.
I think this would be the solution:
library(forcats) # provides fct_reorder()
my_data %>%
mutate(weekday = fct_reorder(weekdays(dates, abbreviate = TRUE), # generating week days
wday(dates))) %>%
filter(!weekday %in% c("Sat", "Sun")) %>% # here you filter saturday and sunday
ggplot(aes(weekday)) +
geom_bar() +
facet_wrap(~ group)
As you can see, wday() returns the day numbers in order, which fct_reorder() can then use to sort the factor levels.
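One plausible explanation for the difference between the replication and the real data (an assumption, since the real df is not shown) is that the real wday column is a plain character vector, which ggplot2 orders alphabetically, whereas wday(..., label = TRUE) returns an ordered factor. A small base R sketch of the difference, using a made-up vector of labels:

```r
# Hypothetical weekday labels stored as plain text
wday_chr <- c("Wed", "Mon", "Fri", "Tue", "Thu", "Mon")

# Without explicit levels the factor is sorted alphabetically
levels(factor(wday_chr))
# "Fri" "Mon" "Thu" "Tue" "Wed"

# Explicit levels restore the chronological order
wday_fct <- factor(wday_chr, levels = c("Mon", "Tue", "Wed", "Thu", "Fri"))
levels(wday_fct)
# "Mon" "Tue" "Wed" "Thu" "Fri"
```

Checking class(df$wday) on the real data would confirm or rule out this cause.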

ggplot2 overlayed line chart by year?

Starting with the following dataset:
$ Orders,Year,Date
1608052.2,2019,2019-08-02
1385858.4,2018,2018-07-27
1223593.3,2019,2019-07-25
1200356.5,2018,2018-01-20
1198226.3,2019,2019-07-15
837866.1,2019,2019-07-02
Trying to make a similar format as:
with the criteria: X-axis will be days or months, y-axis will be sum of Orders, grouping / colors will be by year.
Attempts:
1) No overlay
dataset %>%
ggplot( aes(x=`Merge Date`, y=`$ Orders`, group=`Merge Date (Year)`, color=`Merge Date (Year)`)) +
geom_line()
2) ggplot month grouping
dataset %>%
mutate(Date = as.Date(Date)) %>%
mutate(Year = format(Date,'%Y')) %>%
mutate(Month = format(Date,'%b')) -> dataset2
ggplot(data=dataset2, aes(x=Month, y=`$ Orders`, group=Year, color=factor(Year))) +
geom_line(size=.75) +
ylab("Volume")
The lubridate package is your answer. Extract month from the Date field and turn it into a variable. This code worked for me:
library(tidyverse)
library(lubridate)
dataset <- read_delim("OrderValue,Year,Date\n1608052.2,2019,2019-08-02\n1385858.4,2018,2018-07-27\n1223593.3,2019,2019-07-25\n1200356.5,2018,2018-01-20\n1198226.3,2019,2019-07-15\n837866.1,2019,2019-07-02", delim = ",")
dataset <- dataset %>%
mutate(theMonth = month(Date))
ggplot(dataset, aes(x = as.factor(theMonth), y = OrderValue, group = as.factor(Year), color = as.factor(Year))) +
geom_line()
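Since the stated criterion is the sum of orders per period, it may help to aggregate before plotting and to keep the months as an ordered factor so that the axis shows names in calendar order. The following is a base R sketch on the six sample rows; the column name OrderValue follows the answer above, while Month and totals are names made up for illustration.

```r
dataset <- read.csv(text = "OrderValue,Year,Date
1608052.2,2019,2019-08-02
1385858.4,2018,2018-07-27
1223593.3,2019,2019-07-25
1200356.5,2018,2018-01-20
1198226.3,2019,2019-07-15
837866.1,2019,2019-07-02")

# Month as a factor whose levels follow the calendar (built-in month.abb)
month_num <- as.integer(format(as.Date(dataset$Date), "%m"))
dataset$Month <- factor(month.abb[month_num], levels = month.abb)

# Sum of orders per month and year
totals <- aggregate(OrderValue ~ Month + Year, data = dataset, FUN = sum)
totals
```

totals can then be passed to ggplot with Month on the x axis and factor(Year) as the group and colour.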

Time series label in R

I have a dataframe in R where:
Date MeanVal
2002-01 37.70722
2002-02 43.50683
2002-03 45.31268
2002-04 14.96000
2002-05 29.95932
2002-09 52.95333
2002-10 12.15917
2002-12 53.55144
2003-03 41.15083
2003-04 21.26365
2003-05 33.14714
2003-07 66.55667
.
.
2011-12 40.00518
And when I plot a time series using ggplot with:
ggplot(mean_data, aes(Date, MeanVal, group = 1)) + geom_line() + xlab("") +
ylab("Mean Value")
I am getting:
but as you can see, the x axis scale is not very neat at all. Is there any way I could just scale it by year (2002,2003,2004..2011)?
Let's use lubridate's parse_date_time() to convert your Date to a date class:
library(tidyverse)
library(lubridate)
mean_data %>%
mutate(Date = parse_date_time(as.character(Date), "Y-m")) %>%
ggplot(aes(Date, MeanVal)) +
geom_line()
Similarly, we can convert to an xts and use autoplot():
library(timetk)
mean_data %>%
mutate(Date = parse_date_time(as.character(Date), "Y-m")) %>%
tk_xts(silent = T) %>%
autoplot()
This achieves the plot above as well.
library(dplyr)
mean_data %>%
mutate(Date = as.integer(gsub('-.*', '', Date))) %>%
# use dplyr's mutate() to strip the month and cast the
# remaining year value to an integer
ggplot(aes(Date, MeanVal, group = 1)) + geom_line() + xlab("") +
ylab("Mean Value")
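If pulling in lubridate is not desirable, the "YYYY-MM" strings can also be turned into real dates in base R by appending a day. This is a minimal sketch on a few rows of the data shown in the question; using the first of the month is an arbitrary choice.

```r
mean_data <- data.frame(
  Date    = c("2002-01", "2002-02", "2002-12", "2003-03"),
  MeanVal = c(37.70722, 43.50683, 53.55144, 41.15083)
)

# Append "-01" so each string becomes a complete ISO date, then convert
mean_data$Date <- as.Date(paste0(mean_data$Date, "-01"))

class(mean_data$Date)
mean_data
```

With a Date column in place, ggplot2's scale_x_date(date_breaks = "1 year", date_labels = "%Y") gives one label per year, and the missing months keep their true spacing on the axis.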
