plotting daily distribution of a time series data in R - r

I have a time series data (date column and a value column). I am trying for a daily distribution plot.
In the below image is the weekly distribution plot that plots the values of the days of the week. Similarly I am trying to plot a daily distribution plot where x axis would be months, y axis is the value and the plot has 10 lines where each line gives you the date 1, date 2 , date 3 and so on until date 10 (since 30 days in one subplot will be clumsy so i wanted to divide the plots into 3 , 1-10, 11-20 and 21-31)
Code for weekly distribution for reference:
#dummy data
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2021-12-31")
date_seq <- seq(from = start_date, to = end_date, by = "day")
set.seed(123)
value <- round(runif(length(date_seq), min = 10000, max = 100000000), 0)
df <- data.frame(date = date_seq, value = value)
df$week_number <- as.numeric(format(as.Date(df$date), "%U")) + 1
df$weekday <- weekdays(as.Date(df$date))
df$year <- as.numeric(format(as.Date(df$date), "%Y"))
years <- unique(df$year)
# Create a list of ggplots, one for each year
plots <- lapply(years, function(y) {
year_df <- df[df$year == y, ]
ggplot(year_df, aes(x = week_number, y = value, color = weekday)) +
geom_line() +
scale_color_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) +
ggtitle(paste("Weekday Distribution", y)) +
xlab("Week number") +
ylab("Value") +
theme(legend.key.size = unit(0.4, "cm")) +
theme(plot.title = element_text(hjust = 0.5, vjust = 1.5))
library(cowplot)
plot_grid(plotlist = plots, ncol = 1)
So at the end, there will be three plots(1 to 10 dates, 11 to 20 dates and 21 to 31 dates) and each plot would contain 2 subplots (as the dates ranges from 2020 to 2021). Can anyone help me with this?

Below how I would do this. The lubridate package is your friend. For the grouping, use cuts.
The result is a (in my opinion) pretty useless clutter of lines. But this is not the only reason why I do not endorse this visualisation. I feel this somehow defeats the point of a time series... one point is to visualise the auto-correlation of your data. Artificially separating out only specific days from each month impacts drastically on this particular advantage (and maybe: reason) of using a time series. You're not only losing information, but also making your own analytical life much more complicated.
library(ggplot2)
library(dplyr)
library(lubridate)
df %>%
mutate(day = mday(date),
day_group = cut(day, c(1,11,21, 31), incl = T),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = month, y = value, color = day, group=interaction(day, day_group))) +
geom_line() +
theme(legend.key.size = unit(0.4, "cm"),
plot.title = element_text(hjust = 0.5, vjust = 1.5),
axis.text.x = element_text(angle = 90)) +
facet_wrap(year~day_group)
I feel you want to show how the "typical" 1st day compares with the 2nd, etc. For this, an aggregate visualisation might be more useful. (Still not a good idea, but at least you get a better idea of your data). This you can do with "stat_summary" which you pass to geom_smooth which has a geometry that combines geom_line and geom_ribbon.
df %>%
mutate(day = mday(date),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = day, y = value)) +
geom_smooth(stat= "summary", alpha = .5, color = "black") +
facet_grid(~year)
#> No summary function supplied, defaulting to `mean_se()`
#> No summary function supplied, defaulting to `mean_se()`

Following on tjebo's answer, I would also suggest to if you must you can simply highlight a line of code that would convey something out of the clutter of lines, here is an example if you want to highlight the 11th day from the rest.
Plot
df %>%
mutate(day = mday(date),
day_group = cut(day, c(1,11,21, 31), incl = T),
month = month(date, label = T, abbr = T),
highlight = ifelse(day == 11, "Yes", "No")) %>%
ggplot(aes(x = month, y = value, color = highlight, group=interaction(day, day_group))) +
geom_line() +
theme_bw()+
theme(plot.title = element_text(hjust = 1, vjust = 2),
axis.text.x = element_text(angle = 90)) +
scale_color_manual(breaks = c("Yes", "No"),
labels = c("11th Day", "Other"),
values = c("Yes" = "red2", "No" = "grey60")) +
facet_wrap(year~day_group) +
guides(color = guide_legend(order = 1))

Related

Draw arrow on ggplot with dates as variable, by specifying co-ordinates rather than using the units of the x- and y-variables

I am attempting to plot the blood test results for a patient in a time series. I have managed to do this and included a reference range between two shaded y-intercepts. My problem is that the annotate() or geom_segment() calls want me to specify, in the units of my independent variable, which is, unhelpfully, a date (YYYY-MM-DD).
Is it possible to get R to ignore the units of the x- and y-axis and specify the arrow co-ordinates as if they were on a grid?
result <- runif(25, min = 2.0, max = 3.5)
start_date <- ymd("2021-08-16")
end_date <- ymd("2022-10-29")
date <- sample(seq(start_date, end_date, by = "days"), 25, replace = TRUE)
q <- data.table(numbers, date)
ggplot(q, aes(x = date, y = result)) +
geom_line() +
geom_point(aes(x = date, y = result), shape = 21, size = 3) +
scale_x_date(limits = c(min(q$date), max(q$date)),
breaks = date_breaks("1 month"),
labels = date_format("%b %Y")) +
ylab("Corrected calcium (mmol/L")+
xlab("Date of blood test") +
ylim(1,4)+
geom_ribbon(aes(ymin=2.1, ymax=2.6), fill="grey", alpha=0.2, colour="grey")+
geom_vline(xintercept=as.numeric(q$date[c(3, 2)]),
linetype=4, colour="black") +
theme(axis.text.x = element_text(angle = 45)) + theme_prism(base_size = 10) +
annotate("segment", x = 1, y = 2, xend = 3, yend = 4, arrow = arrow(length = unit(0.15, "cm")))
The error produced is Error: Invalid input: date_trans works with objects of class Date only.
I can confirm that:
> class(q$date)
[1] "Date"
I've just gone with test co-ordinates (1,2,3,4) for the annotate("segment"...), ideally I want to be able to get the arrow to point to a specific data point on the plot to indicate when the patient went on treatment.
Many thanks,
Sandro
You don't need to convert to points or coordinates. Just use the actual values from your data frame. I am just subsetting within annotate using a hard coded index (you can also automate this of course), but you will need to "remind" R that you are dealing with dates - thus the added lubridate::as_date call.
library(ggplot2)
library(lubridate)
result <- runif(25, min = 2.0, max = 3.5)
start_date <- ymd("2021-08-16")
end_date <- ymd("2022-10-29")
date <- sample(seq(start_date, end_date, by = "days"), 25, replace = TRUE)
q <- data.frame(result, date)
## I am arranging the data frame by date
q <- dplyr::arrange(q, date)
ggplot(q, aes(x = date, y = result)) +
geom_line() +
## for start use a random x and y so it starts whereever you want it to start
## for end, use the same row from your data frame, in this case row 20
annotate(geom = "segment",
x = as_date(q$date[2]), xend = as_date(q$date[20]),
y = min(q$result), yend = q$result[20],
arrow = arrow(),
size = 2, color = "red")

How to (re)arrange panels of facet_wrap/_grid?

R noob here. I have been stumped on this graph all day and solutions like this and this this seem to hold my answer but I cannot get them to work for me.
I have a data frame that is a large version of the below sample which I am trying to plot using ggplot.
# create data
df <- data.frame(
"ID" = rep(1:5, each = 4),
"Date" = c(seq(as.Date("2019/09/18"), by = "day", length.out = 4),
seq(as.Date("2019/09/18"), by = "day", length.out = 4),
seq(as.Date("2020/08/07"), by = "day", length.out = 4),
seq(as.Date("2020/09/12"), by = "day", length.out = 4),
seq(as.Date("2020/09/29"), by = "day", length.out = 4)),
"MaxDepth" = round(runif(20, min = 10, max = 50), 1),
"Trip" = rep(1:5, each = 4)
)
# plot using ggplot
ggplot(df, aes(Date, MaxDepth, col = factor(Trip))) +
geom_line() +
facet_grid(ID ~ format(Date, "%Y"), scales = "free_x") +
scale_y_reverse() +
scale_x_date(date_labels = "%b") +
labs(title = "Daily maximum depth\n",
x = "",
y = "Depth [m]\n",
col = "Fishing trip")
This turns out nicely as a two column, eleven row faceted graph with the fishing trips as colours.
However, it includes a lot of empty panels which I would like to avoid by creating a one column graph still with all eleven ID rows but that are separated by the same split label the two columns had. I.e. I would like the two individuals that were in the LHS 2019 plot to have that 2019 label on top, separated by the 2020 label from the other 9 individuals.
.
Hope this is clear. Please correct me or let me know what to improve for a better question.
Grateful for any help! Even if those are suggestions that this is not a good way of representation or something like this is simply not possible. Thank you all!
Here is a possible way. I am not sure whether it works for your real data.
library(ggplot2)
library(patchwork)
library(dplyr)
plot_fun <- function(dtt){
ggplot(dtt, aes(Date, MaxDepth, col = factor(Trip))) +
geom_line() +
facet_grid(ID ~ format(Date, "%Y"), scales = "free_x") +
scale_y_reverse() +
scale_x_date(date_labels = "%b") +
labs(x = NULL, y = NULL, col = "Fishing trip")
}
p1 <- plot_fun(df %>% filter(format(Date, '%Y') == '2019'))
p2 <- plot_fun(df %>% filter(format(Date, '%Y') == '2020'))
p1 / p2
ggsave('~/Downloads/test.png', width = 6, height = 6)

R and ggplot: shade the color of the categorical variable

I would like to make a graph that represents the frequency of an event per hour of each day of the week. How do I change the color of the day of the week? I would like to switch from black to red to the proximity of the weekend
weekday_hour_pickup <- my_data %>%
mutate(hour_pick = hour(new_pickup))%>%
group_by(hour_pick, weekday) %>%
count() %>%
ggplot(aes(hour_pick, n, color = weekday)) +
geom_line(size = 1.5) +
labs(x = "hour of the day", y = "count")
plot(weekday_hour_pickup)
This a good option to make your descrite values as if they have a gradient
# call libraries
library(dplyr)
library(ggplot2)
# define weekdays
weekdays <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
# define 7 colours from black to red
colours <- scales::seq_gradient_pal(low = "#000000", high = "#ff0000", space = "Lab")(1:7/7)
# create values for scale colour manual
values <- setNames(colours, weekdays)
# create fake data
data <- tibble(hour_pick = rep(1:24, 7),
n = rnorm(7*24) + rep(1:7, each = 24),
# weekday must be a factor if you want your legend to be sorted!
weekday = factor(rep(weekdays, each = 24), levels = weekdays, ordered = TRUE))
ggplot(data) +
geom_line(aes(x = hour_pick, y = n, colour = weekday)) +
theme_light() +
scale_colour_manual(values = values)
First, to simulate what your data might look like
TIMES = seq(from=mdy_hm("May 11, 1996 12:05"),
to=mdy_hm("May 18, 1996 12:05"),length.out=10000)
days.of.week <- weekdays(as.Date(4,"1970-01-01",tz="GMT")+0:6)
n = 1000
set.seed(100)
i = sample(1:length(TIMES),n)
my_data <- data.frame(
new_pickup=TIMES[i],
weekday = weekdays(TIMES[i])
)
Before you do plot, you need to factor the weekday variable in the correct order:
my_data$weekday <- factor(my_data$weekday,levels=days.of.week)
Then we set the color, in this case I set the extremes to be black and firebrick. You might needa play around with this
COLS <- alpha(colorRampPalette(c("black","firebrick"))(7),0.6)
names(COLS) <- days.of.week
Then we plot
weekday_hour_pickup <- my_data %>%
mutate(hour_pick = hour(new_pickup))%>%
group_by(hour_pick, weekday) %>%
count() %>%
ggplot(aes(hour_pick, n, color = weekday)) +
geom_line(size = 1) +
labs(x = "hour of the day", y = "count")+
scale_color_manual(values=COLS) +
theme_bw()
Not a very nice plot because I don't have your data, but you can adjust the COLS to get the contrast you need.

How to plot a subset of forecast in R?

I have a simple R script to create a forecast based on a file.
Data has been recorded since 2014 but I am having trouble trying to accomplish below two goals:
Plot only a subset of the forecast information (starting on 11/2017 onwards).
Include month and year in a specific format (i.e. Jun 17).
Here is the link to the dataset and below you will find the code made by me so far.
# Load required libraries
library(forecast)
library(ggplot2)
# Load dataset
emea <- read.csv(file="C:/Users/nsoria/Downloads/AMS Globales/EMEA_Depuy_Finanzas.csv", header=TRUE, sep=';', dec=",")
# Create time series object
ts_fin <- ts(emea$Value, frequency = 26, start = c(2014,11))
# Pull out the seasonal, trend, and irregular components from the time series
model <- stl(ts_fin, s.window = "periodic")
# Predict the next 3 bi weeks of tickets
pred <- forecast(model, h = 5)
# Plot the results
plot(pred, include = 5, showgap = FALSE, main = "Ticket amount", xlab = "Timeframe", ylab = "Quantity")
I appreciate any help and suggestion to my two points and a clean plot.
Thanks in advance.
Edit 01/10 - Issue 1:
I added the screenshot output for suggested code.
Plot1
Edit 01/10 - Issue 2:
Once transformed with below code, it somehow miss the date count and mess with the results. Please see two screenshots and compare the last value.
Screenshot 1
Screenshot 2
Plotting using ggplot2 w/ ggfortify, tidyverse, lubridate and scales packages
library(lubridate)
library(tidyverse)
library(scales)
library(ggfortify)
# Convert pred from list to data frame object
df1 <- fortify(pred) %>% as_tibble()
# Convert ts decimal time to Date class
df1$Date <- as.Date(date_decimal(df1$Index), "%Y-%m-%d")
str(df1)
# Remove Index column and rename other columns
# Select only data pts after 2017
df1 <- df1 %>%
select(-Index) %>%
filter(Date >= as.Date("2017-01-01")) %>%
rename("Low95" = "Lo 95",
"Low80" = "Lo 80",
"High95" = "Hi 95",
"High80" = "Hi 80",
"Forecast" = "Point Forecast")
df1
### Updated: To connect the gap between the Data & Forecast,
# assign the last non-NA row of Data column to the corresponding row of other columns
lastNonNAinData <- max(which(complete.cases(df1$Data)))
df1[lastNonNAinData, !(colnames(df1) %in% c("Data", "Fitted", "Date"))] <- df1$Data[lastNonNAinData]
# Or: use [geom_segment](http://ggplot2.tidyverse.org/reference/geom_segment.html)
plt1 <- ggplot(df1, aes(x = Date)) +
ggtitle("Ticket amount") +
xlab("Time frame") + ylab("Quantity") +
geom_ribbon(aes(ymin = Low95, ymax = High95, fill = "95%")) +
geom_ribbon(aes(ymin = Low80, ymax = High80, fill = "80%")) +
geom_point(aes(y = Data, colour = "Data"), size = 4) +
geom_line(aes(y = Data, group = 1, colour = "Data"),
linetype = "dotted", size = 0.75) +
geom_line(aes(y = Fitted, group = 2, colour = "Fitted"), size = 0.75) +
geom_line(aes(y = Forecast, group = 3, colour = "Forecast"), size = 0.75) +
scale_x_date(breaks = scales::pretty_breaks(), date_labels = "%b %y") +
scale_colour_brewer(name = "Legend", type = "qual", palette = "Dark2") +
scale_fill_brewer(name = "Intervals") +
guides(colour = guide_legend(order = 1), fill = guide_legend(order = 2)) +
theme_bw(base_size = 14)
plt1

ggplot change order of continuous y axis values

I am trying to plot shift data by hour (integer) ordered by 3 different shifts worked (8-16, 16-24, 24-8) by day as the x-axis. The hours I have are 24hr format and I want to plot them not in numerical order (0-24) but by the shift order (8-16, 16-24, 24-8).
Here is the code to create the data and make the plot. I want to put the 0-8 chunk above the 16-24 chunk.
set.seed(123)
Hour = sample(0:24, 500, replace=T)
Day = sample(0:1, 500, replace=T)
dat <- as.tibble(cbind(Hour, Day)) %>%
mutate(Day = factor(ifelse(Day == 0, "Mon", "Tues")),
Shift = cut(Hour, 3, labels = c("0-8", "8-16", "16-24")),
Exposure = factor(sample(0:1, 500, replace=T)))
ggplot(dat, aes(x = Day, y = Hour)) +
geom_jitter(aes(color = Exposure, shape = Exposure)) +
geom_hline(yintercept = 8) +
geom_hline(yintercept = 16) +
theme_classic()
Current plot
It is an interesting problem, and I have tried recoding a new hour variable that is in the order that I want but then I'm not sure how to plot it displaying the standard 24hr variable.
How would i accomplish this ordering?
Not sure if I completely understand, but if you facet your table on the Shift column, it should do what you want. First you must factor the Shift column to the order you specify:
dat$Shift <- factor(dat$Shift, levels = c("0-8", "16-24", "8-16"))
ggplot(dat, aes(x = Day, y = Hour)) +
geom_jitter(aes(color = Exposure, shape = Exposure)) +
facet_grid(Shift ~ ., scales = "free") +
theme_classic()
set.seed(123)
Hour = sample(0:24, 500, replace=T)
Day = sample(0:1, 500, replace=T)
dat <- as.tibble(cbind(Hour, Day)) %>%
mutate(Day = factor(ifelse(Day == 0, "Mon", "Tues")),
Shift = cut(Hour, 3, labels = c("0-8", "8-16", "16-24")),
Exposure = factor(sample(0:1, 500, replace=T)))
dat$Shift <- factor(dat$Shift, levels=rev(levels(dat$Shift)))
ggplot(dat, aes(x = Day, y = Shift)) +
geom_jitter(aes(color = Exposure, shape = Exposure)) +
geom_hline(yintercept = 8) +
geom_hline(yintercept = 16) +
theme_classic()
You just need to reverse the level.

Resources