ggplot2: Facetting by year and aligning x-axis dates by month

I am trying to plot daily data with days of the week (Monday:Sunday) on the y-axis, week of the year on the x-axis with monthly labels (January:December), and facet by year with each facet as its own row. I want the week of the year to align between the facets. I also want each tile to be square.
Here is a toy dataset to work with:
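library(tidyverse)
library(lubridate)
my_data <- tibble(
  Date = seq(as.Date("1/11/2013", "%d/%m/%Y"),
             as.Date("31/12/2014", "%d/%m/%Y"),
             "days"),
  # tibble() lets a column refer to an earlier one, so size Value from Date
  Value = runif(length(Date))
)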
One solution I came up with is to use lubridate::week() to number the weeks and plot by week. This correctly aligns the x-axis between the facets. The problem is, I can't figure out how to label the x-axis with monthly labels.
my_data %>%
mutate(Week = week(Date)) %>%
mutate(Weekday = wday(Date, label = TRUE, week_start = 1)) %>%
mutate(Year = year(Date)) %>%
ggplot(aes(fill = Value, x = Week, y = Weekday)) +
geom_tile() +
theme_bw() +
facet_grid(Year ~ .) +
coord_fixed()
Alternatively, I tried plotting by the first day of the week using lubridate::floor_date and lubridate::round_date. In this solution, the x-axis is correctly labeled, but the weeks don't align between the two years. Also, the tiles aren't perfectly square, though I think this could be fixed by playing around with the coord_fixed ratio.
my_data %>%
mutate(Week = floor_date(Date, week_start = 1),
Week = round_date(Week, "week", week_start = 1)) %>%
mutate(Weekday = wday(Date, label = TRUE, week_start = 1)) %>%
mutate(Year = year(Date)) %>%
ggplot(aes(fill = Value, x = Week, y = Weekday)) +
geom_tile() +
theme_bw() +
facet_grid(Year ~ .) +
scale_x_datetime(name = NULL, labels = label_date("%b")) +
coord_fixed(7e5)
Any suggestions of how to get the columns to align correctly by week of the year while labeling the months correctly?

The concept is slightly flawed, since the same week of the year is not guaranteed to fall in the same month in every year. However, you can get a "close enough" solution by using the labels and breaks arguments of scale_x_continuous. The idea is to write a labelling function that takes a week number, adds 7 times that value as a number of days to 1 January of an arbitrary year, and then formats the result as a month name using strftime:
my_data %>%
mutate(Week = week(Date)) %>%
mutate(Weekday = wday(Date, label = TRUE, week_start = 1)) %>%
mutate(Year = year(Date)) %>%
ggplot(aes(fill = Value, x = Week, y = Weekday)) +
geom_tile() +
theme_bw() +
facet_grid(Year ~ .) +
coord_fixed() +
scale_x_continuous(labels = function(x) {
strftime(as.Date("2000-01-01") + 7 * x, "%B")
}, breaks = seq(1, 52, 4.2))
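As a possible refinement of the breaks above, here is a minimal sketch that places each break at the week in which a month starts, rather than at evenly spaced positions. The names month_starts and month_breaks are hypothetical, and 2001 is an arbitrary non-leap reference year (an assumption, not part of the original answer), so the positions can be off by a day or so in leap years.

```r
library(lubridate)

# Reference dates: the first day of each month in an arbitrary non-leap year.
month_starts <- as.Date(sprintf("2001-%02d-01", 1:12))

# lubridate::week() counts the complete 7-day periods since 1 January, plus
# one, so this is the week number in which each month begins.
month_breaks <- week(month_starts)
print(month_breaks)

# These could then replace the breaks and labels in the scale, e.g.
# scale_x_continuous(breaks = month_breaks, labels = month.abb)
```

Because month.abb is a base R constant, the labels do not depend on the session locale, unlike strftime with "%B".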
Another option, if you're sick of reinventing the wheel, is to use the calendarHeat function in the GitHub-only makeR package:
# install_github() comes from the remotes package (also re-exported by devtools)
remotes::install_github("jbryer/makeR")
library(makeR)
calendarHeat(my_data$Date, my_data$Value)

Related

Create subplots with ggplot2

So I need to work with the hflights package and make a subplot for every weekday showing the delay of the airplanes (cancelled flights excluded). The problem is that I'm not able to reproduce both the x- and y-axis (x: month, y: delay in min). I tried to use facet_wrap and facet_grid, but I'm not familiar with those functions because I don't use ggplot2 that often.
The plot will be clearer if you name the months and weekdays, arrange them in the correct order, and use a logarithmic scale on the y axis. You can use facet_grid to create subplots for each weekday.
library(hflights)
library(tidyverse)
weekday <- c('Sunday', 'Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday')
hflights %>%
mutate(WeekDay = factor(weekday[DayOfWeek], weekday),
Month = factor(month.abb[Month], month.abb)) %>%
filter(Cancelled == 0) %>%
ggplot(aes(Month, DepDelay)) +
geom_boxplot() +
scale_y_log10() +
facet_wrap(.~WeekDay) +
labs(x = 'Month', y = 'Departure delay (log scale)')
To get a single line going through each panel, you need an average for each unique combination of month and weekday. The simplest way to get this is via geom_smooth:
hflights %>%
mutate(WeekDay = factor(weekday[DayOfWeek], weekday),
Month = factor(month.abb[Month], month.abb)) %>%
filter(Cancelled == 0) %>%
ggplot(aes(Month, DepDelay, group = WeekDay)) +
geom_smooth(se = FALSE, color = 'black') +
facet_wrap(.~WeekDay) +
labs(x = 'Month', y = 'Departure delay (min)')
You can also summarize the data yourself and use geom_line:
hflights %>%
filter(Cancelled == 0) %>%
mutate(WeekDay = factor(weekday[DayOfWeek], weekday),
Month = factor(month.abb[Month], month.abb)) %>%
group_by(Month, WeekDay) %>%
# na.rm guards against any remaining missing delays
summarize(Delay = mean(DepDelay, na.rm = TRUE), .groups = 'drop') %>%
ggplot(aes(Month, Delay, group = WeekDay)) +
geom_line(color = 'black') +
facet_wrap(.~WeekDay) +
labs(x = 'Month', y = 'Mean departure delay (min)')
Since you didn't post the code, let's assume you have saved a plot that shows everything in a single panel as a.
a + facet_grid(rows = vars(weekday))
("weekday" is the name of the column that holds the weekdays; replace it if yours is named differently.)
If this isn't what you were looking for, it would be great if you could post some code...
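Suppose you want to show the mean ArrDelay. Compute the mean per month and weekday first; calling mean() inside aes() would give every row the same overall mean.
hflights %>%
  filter(Cancelled != 1) %>%
  group_by(Month, DayOfWeek) %>%
  summarize(MeanArrDelay = mean(ArrDelay, na.rm = TRUE), .groups = 'drop') %>%
  ggplot(aes(x = as.factor(Month), y = MeanArrDelay)) +
  geom_col() +
  labs(x = 'Month', y = 'Mean arrival delay') +
  facet_wrap(~DayOfWeek)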

How to order time in y axis

I have a data frame (a tibble) like this:
library(tidyverse)
library(lubridate)
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
value=c("on", "off", "on", "off"))
x$day<- as.factor(day(x$date))
x$time <- paste0(str_pad(hour(x$date),2,pad="0"),":",str_pad(minute(x$date),2,pad="0"))
When I plot the data:
x %>% ggplot() + geom_col(aes(x=day,y=time, fill=value))
the times on the y axis do not line up with the bars. Each time is supposed to sit next to its bar segment.
I tried using as.factor(time) but that didn't solve it.
I also tried to add a numeric scale:
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
fake_y=c(1,1,1,1),
value=c("on", "off", "on", "off"))
x %>% ggplot() + geom_col(aes(x=day,y=fake_y, fill=value))
but then the order of the on/off bars is lost.
How can I fix this?
Since you are looking for a time line, you would probably be best with geom_segment rather than geom_col. The reason is that since you might have multiple 'on' or 'off' values in a single day, it would be difficult to get these to stack correctly. You would also need to diff the on-off times to get them to stack. Furthermore, your labels would be wrong using columns if "off" represents the time of going from an on state to an off state.
When working with times in R, it is often best to keep them in a time format for plotting. If you convert times to character strings before plotting, they will be treated as discrete categories, so they will not be spaced in proportion to the elapsed time.
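To illustrate the point about spacing, here is a small sketch with made-up times (the vector times_chr is hypothetical, not the asker's data). It compares the positions a discrete axis would use with the seconds-since-midnight values that an hms column keeps.

```r
# Made-up clock times: three are 5 minutes apart, the last is much later.
times_chr <- c("07:04", "07:09", "07:14", "09:30")

# As character/factor, each time just becomes the next category: 1, 2, 3, 4.
discrete_pos <- as.integer(factor(times_chr))
print(discrete_pos)

# As hms, each time is stored as seconds since midnight, so the gaps between
# values reflect the real elapsed time.
times_hms <- hms::parse_hm(times_chr)
gaps_sec <- diff(as.numeric(times_hms))
print(gaps_sec)
```

On a discrete axis the 09:30 entry would sit one step after 07:14, while on a continuous time axis it sits proportionally further away.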
Since you want to have the day along one axis, you will need quite a bit of data manipulation to ensure that you record the state at the start of each day and the end of each day, but it can be achieved by doing:
p <- x %>%
mutate(date = as.POSIXct(date)) %>%
mutate(day = as.factor(day(date))) %>%
group_by(day) %>%
group_modify(~ add_row(.x,
date = floor_date(as.POSIXct(first(.x$date)), 'day'),
value = ifelse(first(.x$value) == 'on', 'off', 'on'),
.before = 1)) %>%
group_modify(~ add_row(.x,
date = ceiling_date(as.POSIXct(last(.x$date)), 'day') - 1,
value = last(.x$value))) %>%
mutate(ends = lead(date)) %>%
filter(!is.na(ends)) %>%
mutate(date = hms::as_hms(date), ends = hms::as_hms(ends)) %>%
ggplot(aes(x = day, y = date)) +
geom_segment(aes(xend = day, yend = ends, color = value),
size = 20) +
coord_cartesian(ylim = c(25120, 26500)) +
labs(y = 'time') +
guides(color = guide_legend(override.aes = list(size = 8)))
p
And of course, you can easily flip the co-ordinates if you wish, and apply theme elements to make the plot more appealing:
p + coord_flip(ylim = c(25120, 26500)) +
scale_color_manual(values = c('deepskyblue4', 'orange')) +
theme_light(base_size = 16)

Why can't I get the right horizontal axis labels on my ggplot2 chart?

I am trying to do a faceted plot of a grouped dataframe with ggplot2, using geom_line(). My dataframe has a Date column and I would like to have dates on the horizontal axis. If I just use Date in aes(x=Date, ...) I get nice labels on the horizontal axis. However, the line has an almost horizontal section where the date jumps from the end of one group to the beginning of the next group. This code and chart shows that:
dts <- seq.Date(as.Date("2020-01-01"), as.Date("2021-12-31"), by="day")
mos <- sapply(dts, month)
df <- data.frame(Date=dts, Month=mos)
nr <- nrow(df)
df$X <- rep(1, nr)
df %>%
group_by(Month) -> dfgrp
dfgrp %>%
group_by(Month) %>%
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=Date, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
I would not like my chart to have those almost-horizontal lines when the date changes by a large amount. I was able to generate a chart without those lines using integers on aes() as follows:
dfgrp %>%
mutate(Time = 1:n() %>% as.integer(),
Z = cumsum(X)) %>%
ggplot(aes(x=Time, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
scale_x_continuous(breaks = seq(from=1, to=nr, by=10) %>% as.integer(),
labels = function(x) as.character(dfgrp$Date[x])) +
theme(axis.text.x = element_text(angle=45, size=7))
The line on the chart looks like I want it but the dates on the horizontal axis are not correct: they end in February 2020 in every facet while the dates in the dataframe end in December 2021 and the dates in the first chart begin and end on different months in different facets.
I tried many things but nothing worked. Any suggestions on how to have a chart with dates like in the first chart above and lines like in the second chart above?
Help will be much appreciated.
You may want to adjust the dates to be in the same year, but noting the original year as a variable:
library(lubridate)
dfgrp %>%
group_by(Month) %>%
mutate(year = year(Date),
adj_date = ymd(paste(2020, month(Date), day(Date)))) %>%
# 2020 was leap year so 2/29 won't be lost
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=adj_date, y=Z, color = year, group = year)) +
geom_line(size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
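As a quick check of what the date adjustment does, the sketch below maps a few made-up dates onto 2020 while keeping the original year as a separate variable. It uses lubridate::make_date instead of ymd(paste(...)); that substitution is my assumption and is not part of the answer above.

```r
library(lubridate)

# Made-up dates from two different years.
dates <- as.Date(c("2020-02-29", "2021-02-28", "2021-12-31"))

# Keep the original year for colouring/grouping ...
orig_year <- year(dates)

# ... and move every date into the same (leap) year so they share one axis.
adj_date <- make_date(2020, month(dates), day(dates))

print(data.frame(dates, orig_year, adj_date))
```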

Create a monthly trend line chart with the count of a value as y variable

I have the dataframe below:
name<-c("John","John","John","John2","John2","John2")
Dealer<-c("ASD","ASD","ASD","ASDG","ASDF","ASD")
Date<-c("2020-01-03","2020-01-04","2020-01-05","2020-02-03","2020-02-04","2020-02-05")
dataset<-data.frame(name,Dealer,Date)
and I want a monthly trend visualization of the count of name, filterable by Dealer.
I have got as far as the code below, but I do not know how to get the count of each name. I feel that I have to reshape my dataframe somehow.
library(ggplot2)
ggplot(dataset, aes(x = Date, y = , color = Dealer)) +
geom_line() +
scale_x_date(date_breaks = "1 months", date_labels = "%b '%y") +
theme_minimal()
*edited dataframe with a dataset with all values being the same in name and Dealer
name<-c("John","John","John","John","John","John","John")
Dealer<-c("ASD","ASD","ASD","ASD","ASD","ASD","ASD")
Date<-c("2020-01-03","2020-01-04","2020-01-05","2020-01-06","2020-01-07","2020-01-08","2020-01-09")
dataset<-data.frame(name,Dealer,Date)
Maybe something like this:
library(tidyverse); library(lubridate)
dataset %>%
# Convert "Date" into date form
mutate(Date = ymd(Date)) %>%
# Count how many occasions of each name-Dealer-month combo
count(name, Dealer, month = floor_date(Date, "month")) %>%
# Add rows for missing months for each existing name-Dealer combo
complete(month, nesting(name, Dealer), fill = list(n = 0)) %>%
ggplot(aes(month, n, color = name)) +
geom_line() +
scale_x_date(date_breaks = "1 months", date_labels = "%b\n'%y") +
theme_minimal() +
facet_wrap(~Dealer)
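To see what the count and complete steps produce before plotting, here is a sketch that runs them on the first example data frame from the question and prints the intermediate table. The name monthly_counts is hypothetical.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# The first example data frame from the question.
dataset <- data.frame(
  name   = c("John", "John", "John", "John2", "John2", "John2"),
  Dealer = c("ASD", "ASD", "ASD", "ASDG", "ASDF", "ASD"),
  Date   = c("2020-01-03", "2020-01-04", "2020-01-05",
             "2020-02-03", "2020-02-04", "2020-02-05")
)

monthly_counts <- dataset %>%
  mutate(Date = ymd(Date)) %>%
  # One row per name-Dealer-month combination that occurs, with its count n
  count(name, Dealer, month = floor_date(Date, "month")) %>%
  # Add the months that are missing for each name-Dealer pair, with n = 0
  complete(month, nesting(name, Dealer), fill = list(n = 0))

print(monthly_counts)
```

Using nesting(name, Dealer) restricts the fill to name-Dealer pairs that actually occur, so no rows are invented for combinations that never appear in the data.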

ggplot2 overlayed line chart by year?

Starting with the following dataset:
$ Orders,Year,Date
1608052.2,2019,2019-08-02
1385858.4,2018,2018-07-27
1223593.3,2019,2019-07-25
1200356.5,2018,2018-01-20
1198226.3,2019,2019-07-15
837866.1,2019,2019-07-02
I am trying to produce a chart in a format similar to an example image (not reproduced here), with these criteria: the x-axis will be days or months, the y-axis will be the sum of Orders, and grouping/colors will be by year.
Attempts:
1) No overlay
dataset %>%
ggplot( aes(x=`Merge Date`, y=`$ Orders`, group=`Merge Date (Year)`, color=`Merge Date (Year)`)) +
geom_line()
2) ggplot month grouping
dataset %>%
mutate(Date = as.Date(`Date`)) %>%
mutate(Year = format(Date,'%Y')) %>%
mutate(Month = format(Date,'%b')) -> dataset2
ggplot(data=dataset2, aes(x=Month, y=`$ Orders`, group=Year, color=factor(Year))) +
geom_line(size=.75) +
ylab("Volume")
The lubridate package is your answer. Extract month from the Date field and turn it into a variable. This code worked for me:
library(tidyverse)
library(lubridate)
dataset <- read_delim("OrderValue,Year,Date\n1608052.2,2019,2019-08-02\n1385858.4,2018,2018-07-27\n1223593.3,2019,2019-07-25\n1200356.5,2018,2018-01-20\n1198226.3,2019,2019-07-15\n837866.1,2019,2019-07-02", delim = ",")
dataset <- dataset %>%
mutate(theMonth = month(Date))
ggplot(dataset, aes(x = as.factor(theMonth), y = OrderValue, group = as.factor(Year), color = as.factor(Year))) +
geom_line()
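Since the question asks for the sum of orders, one possible extension is to total the values per year and month before plotting. This is only a sketch under that assumption; the names monthly_totals and TotalOrders are hypothetical, and the data frame is rebuilt by hand from the rows in the question.

```r
library(dplyr)
library(lubridate)

# The six rows from the question, entered by hand.
dataset <- data.frame(
  OrderValue = c(1608052.2, 1385858.4, 1223593.3,
                 1200356.5, 1198226.3, 837866.1),
  Year = c(2019, 2018, 2019, 2018, 2019, 2019),
  Date = as.Date(c("2019-08-02", "2018-07-27", "2019-07-25",
                   "2018-01-20", "2019-07-15", "2019-07-02"))
)

monthly_totals <- dataset %>%
  # label = TRUE gives an ordered factor, so months sort in calendar order
  mutate(Month = month(Date, label = TRUE)) %>%
  group_by(Year, Month) %>%
  summarise(TotalOrders = sum(OrderValue), .groups = "drop")

print(monthly_totals)

# The totals could then be plotted with one line per year, e.g.
# ggplot(monthly_totals, aes(Month, TotalOrders,
#                            group = factor(Year), color = factor(Year))) +
#   geom_line()
```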

Resources