Plot a 24 hour cycle monthly for multiple variables? - r

I have data that can be mimicked in the following manner:
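library(reshape2)
library(ggplot2)
set.seed(1234)
# 12 months x 24 hours = 288 rows; nrow(foo) cannot be used before foo exists
foo <- data.frame(month = rep(month.name, each = 24),
hour = rep(1:24, 12),
value1 = rnorm(12 * 24, 60, 1),
value2 = rnorm(12 * 24, 60, 1))
foo <- melt(foo, id = c('month', 'hour'))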
I would like to create a plot for the entire year using ggplot that displays the 24 hour cycle of each variable per month.
Here's what I've tried so far:
t.plot <- ggplot(foo,
aes(interaction(month,hour), value, group = interaction(variable,hour)))
t.plot <- t.plot + geom_line(aes(colour = variable))
print(t.plot)
I get the plot below, in which the data are misaligned. With such a small SD, the first 24 values should all be close to 60, but they are all over the place. I don't understand what is causing this discrepancy.
https://www.dropbox.com/s/rv6uxhe7wk7q35w/foo.png
When I plot:
plot(interaction(foo$month,foo$hour)[1:24], foo$value[1:24])
I get the shape that I would expect; however, the x-axis is very strange and not what I was expecting.
Any help?

The solution is to store your dates as actual dates (not as an interaction of factors), for example:
library(lubridate)
library(reshape2)
Date <- as.Date(dmy('01-01-2000') + seq_len(24*365)*hours(1))
foo <- data.frame(Date = Date,
value1 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1),
value2 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1))
foo_melt <- melt(foo, id = 'Date')
# then you can use `scale_x_date`, and R and ggplot2 will know the values are dates
# load the scales library to access date_format and date_breaks
library(ggplot2)
library(scales)
ggplot(foo_melt, aes(x=Date, y=value, colour = variable)) +
geom_line() +
scale_x_date(breaks = date_breaks('month'),
labels = date_format('%b'), expand =c(0,0))
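To see why the interaction() axis looked scrambled, it may help to inspect the factor levels directly. The sketch below uses a made-up example with two months and three hours (the variable names are illustrative, not taken from the question). By default interaction() varies its first factor fastest, so the x positions run through every month for hour 1 before moving on to hour 2.

```r
# Toy data: two months, three hours each (illustrative only)
month <- factor(rep(c("Jan", "Feb"), each = 3), levels = c("Jan", "Feb"))
hour <- rep(1:3, times = 2)

# Default: the first factor (month) varies fastest in the level order
default_levels <- levels(interaction(month, hour))
print(default_levels)

# lex.order = TRUE keeps all hours of one month together
lex_levels <- levels(interaction(month, hour, lex.order = TRUE))
print(lex_levels)
```

Even with lex.order = TRUE the axis is still a discrete factor, which is why a real date column or one facet per month tends to be easier to work with.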
Edit 1: average day per month
You can use facet_wrap to facet by month:
# using your created foo data set
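# map full month names to abbreviations, kept in calendar order
# (works whether month is stored as character or as a factor)
foo$month <- factor(foo$month, levels = month.name, labels = month.abb)
ggplot(foo, aes(x = hour, y=value, colour = variable)) +
facet_wrap(~month) + geom_line() +
scale_x_continuous(expand = c(0,0))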
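If the raw data are hourly timestamps rather than one row per month and hour, the "average day" has to be computed before faceting. A minimal sketch, assuming a hypothetical data frame `hourly` with columns `stamp` and `value` (simulated here), which averages by month and hour with base R's aggregate():

```r
library(ggplot2)

# Simulated hourly series for the leap year 2000 (hypothetical data)
set.seed(1)
stamps <- seq(as.POSIXct("2000-01-01 00:00", tz = "UTC"),
              by = "hour", length.out = 24 * 366)
hourly <- data.frame(stamp = stamps,
                     value = rnorm(length(stamps), 60, 1))

# Derive month (in calendar order) and hour of day from the timestamp
lt <- as.POSIXlt(hourly$stamp, tz = "UTC")
hourly$month <- factor(month.abb[lt$mon + 1], levels = month.abb)
hourly$hour <- lt$hour

# Mean value for each month/hour combination: one "average day" per month
avg_day <- aggregate(value ~ month + hour, data = hourly, FUN = mean)

p <- ggplot(avg_day, aes(x = hour, y = value)) +
  geom_line() +
  facet_wrap(~month)
print(p)
```

Building the month factor from month.abb rather than from format(..., "%b") avoids depending on the session locale.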

Related

plotting daily distribution of a time series data in R

I have time series data (a date column and a value column). I am trying to make a daily distribution plot.
The image below shows a weekly distribution plot, with one line per day of the week. Similarly, I am trying to make a daily distribution plot where the x-axis is the month, the y-axis is the value, and each line represents a day of the month (date 1, date 2, date 3, and so on). Since 30 lines in one subplot would be cluttered, I want to split the plot into three: days 1-10, 11-20 and 21-31.
Code for weekly distribution for reference:
library(ggplot2)
# dummy data
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2021-12-31")
date_seq <- seq(from = start_date, to = end_date, by = "day")
set.seed(123)
value <- round(runif(length(date_seq), min = 10000, max = 100000000), 0)
df <- data.frame(date = date_seq, value = value)
df$week_number <- as.numeric(format(as.Date(df$date), "%U")) + 1
df$weekday <- weekdays(as.Date(df$date))
df$year <- as.numeric(format(as.Date(df$date), "%Y"))
years <- unique(df$year)
# Create a list of ggplots, one for each year
plots <- lapply(years, function(y) {
year_df <- df[df$year == y, ]
ggplot(year_df, aes(x = week_number, y = value, color = weekday)) +
geom_line() +
scale_color_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) +
ggtitle(paste("Weekday Distribution", y)) +
xlab("Week number") +
ylab("Value") +
theme(legend.key.size = unit(0.4, "cm")) +
theme(plot.title = element_text(hjust = 0.5, vjust = 1.5))
}) # close the function and the lapply() call
library(cowplot)
plot_grid(plotlist = plots, ncol = 1)
So in the end there will be three plots (dates 1 to 10, 11 to 20 and 21 to 31), each containing two subplots (as the dates range from 2020 to 2021). Can anyone help me with this?
Below is how I would do this. The lubridate package is your friend. For the grouping, use cut().
The result is a (in my opinion) pretty useless clutter of lines. But this is not the only reason why I do not endorse this visualisation. I feel this somehow defeats the point of a time series... one point is to visualise the auto-correlation of your data. Artificially separating out only specific days from each month impacts drastically on this particular advantage (and maybe: reason) of using a time series. You're not only losing information, but also making your own analytical life much more complicated.
library(ggplot2)
library(dplyr)
library(lubridate)
df %>%
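mutate(day = mday(date),
# (0,10], (10,20], (20,31] gives the 1-10, 11-20 and 21-31 groups asked for
day_group = cut(day, breaks = c(0, 10, 20, 31), labels = c("1-10", "11-20", "21-31")),
month = month(date, label = TRUE, abbr = TRUE)) %>%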
ggplot(aes(x = month, y = value, color = day, group=interaction(day, day_group))) +
geom_line() +
theme(legend.key.size = unit(0.4, "cm"),
plot.title = element_text(hjust = 0.5, vjust = 1.5),
axis.text.x = element_text(angle = 90)) +
facet_wrap(year~day_group)
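When grouping with cut(), it is worth checking which days land in which bin, since the intervals are right-closed by default. A small base-R sketch, assuming the breakpoints c(0, 10, 20, 31) for the 1-10, 11-20 and 21-31 split described in the question (the labels are illustrative):

```r
days <- 1:31

# Intervals are (0,10], (10,20], (20,31]
day_group <- cut(days, breaks = c(0, 10, 20, 31),
                 labels = c("1-10", "11-20", "21-31"))

# Number of days that fall into each group
group_sizes <- table(day_group)
print(group_sizes)
```

With breakpoints such as c(1, 11, 21, 31) and include.lowest = TRUE, day 11 would instead fall into the first group, so the choice of breaks matters at the boundaries.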
I feel you want to show how the "typical" 1st day compares with the 2nd, etc. For this, an aggregate visualisation might be more useful. (It is still not a good idea, but at least you get a better sense of your data.) You can do this by passing stat = "summary" to geom_smooth, whose geometry combines geom_line and geom_ribbon.
df %>%
mutate(day = mday(date),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = day, y = value)) +
geom_smooth(stat= "summary", alpha = .5, color = "black") +
facet_grid(~year)
#> No summary function supplied, defaulting to `mean_se()`
#> No summary function supplied, defaulting to `mean_se()`
Following on from tjebo's answer: if you must use this plot, you can highlight a single line so that something stands out from the clutter. Here is an example that highlights the 11th day.
Plot
df %>%
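mutate(day = mday(date),
# (0,10], (10,20], (20,31] gives the 1-10, 11-20 and 21-31 groups
day_group = cut(day, breaks = c(0, 10, 20, 31), labels = c("1-10", "11-20", "21-31")),
month = month(date, label = TRUE, abbr = TRUE),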
highlight = ifelse(day == 11, "Yes", "No")) %>%
ggplot(aes(x = month, y = value, color = highlight, group=interaction(day, day_group))) +
geom_line() +
theme_bw()+
theme(plot.title = element_text(hjust = 1, vjust = 2),
axis.text.x = element_text(angle = 90)) +
scale_color_manual(breaks = c("Yes", "No"),
labels = c("11th Day", "Other"),
values = c("Yes" = "red2", "No" = "grey60")) +
facet_wrap(year~day_group) +
guides(color = guide_legend(order = 1))

How can I work with stat_density and a timeseries (Posixct on x axis)?

Based on this example:
#example from https://ggplot2.tidyverse.org/reference/geom_tile.html
cars <- ggplot(mtcars, aes(mpg,factor(cyl)))
cars + stat_density(aes(fill = after_stat(density)), geom = "raster", position = "identity")
I wanted to create a plot with the density plotted vertically per hour of my dataset. The original dataset is very long. I also want to display the single data points and a mean as a line.
Here is a simplified basic version of the code:
# reproducible example for density plot
library(reshape2)
library(ggplot2)
library(scales)
startdate <- as.POSIXct("2020-01-01 01:00", tz="UTC")
enddate <- as.POSIXct("2020-01-01 05:00", tz="UTC")
#dataframe
df <- data.frame(x = seq.POSIXt(startdate, enddate, "hour"),
y1 = c(1,2,3,4,5),
y2 = c(2,4,6,8,10),
y3 = c(3,6,9,12,15))
df$mean <- rowMeans(df[,-1])
df_melt <- melt(df, id.vars = 1, measure.vars = c(2,3,4))
#plot
g1 <- ggplot(data = df_melt, aes(factor(x), value)) +
stat_density(aes(fill = after_stat(ndensity)),
geom = "raster", position = "identity", orientation = "y") +
geom_point()
g1
This works, but the original dataset has so many hours that the x-axis labels become unreadable. I also want to control the date format of the labels and the limits of the plot. Before working with stat_density, I did that with scale_x_datetime. But for the density plot I have to use factor(x) instead of the original x, which is POSIXct. So the following scale produces an error, because x is a factor and not a date:
#scale x datetime (does not work)
g1 <- g1 + scale_x_datetime(labels = date_format("%b/%d", tz="UTC"),
limits = c(startdate, enddate),
breaks = function(x)
seq.POSIXt(from = startdate, to = enddate, by = "2 days"),
date_minor_breaks = "12 hours",
expand = c(0,0))
g1
I managed to use scale_x_discrete, but this makes it hard to set the label format and limits with the bigger dataset:
#scale x discrete
g1 <- g1 + scale_x_discrete(limits = c(as.character(df$x)),
breaks = as.character(df$x)[c(2,4)])
g1
The next problem with factors is then that I cannot add the mean of every hour as geom_line as every factor consists of 1 observation only.
#plot mean
g1 + geom_point(data = df, aes(factor(x), mean), col = "red")
g1 + geom_line(data = df, aes(factor(x), mean), col = "red")
So, is there a way to produce the desired plot with density per hour, overplotted points and overplotted mean line? And I want to edit the x labels and limits as comfortably as possible. Maybe there is a way to use x instead of factor(x)...
I think the solution might be as simple as dropping the factor() and setting an explicit group in the density. Does the following work for your real case?
library(reshape2)
library(ggplot2)
library(scales)
#> Warning: package 'scales' was built under R version 4.0.3
startdate <- as.POSIXct("2020-01-01 01:00", tz="UTC")
enddate <- as.POSIXct("2020-01-01 05:00", tz="UTC")
#dataframe
df <- data.frame(x = seq.POSIXt(startdate, enddate, "hour"),
y1 = c(1,2,3,4,5),
y2 = c(2,4,6,8,10),
y3 = c(3,6,9,12,15))
df$mean <- rowMeans(df[,-1])
df_melt <- melt(df, id.vars = 1, measure.vars = c(2,3,4))
#plot
ggplot(data = df_melt, aes(x, value)) +
stat_density(aes(fill = after_stat(ndensity),
group = x),
geom = "raster", position = "identity", orientation = "y") +
geom_point()
Created on 2021-01-29 by the reprex package (v0.3.0)
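With x kept as POSIXct, the remaining wishes from the question (a mean line and datetime axis formatting) should be achievable with ordinary layers. A sketch under that assumption, reusing the example data; the break spacing and label format are arbitrary choices and may need adjusting for the real dataset:

```r
library(ggplot2)
library(reshape2)

startdate <- as.POSIXct("2020-01-01 01:00", tz = "UTC")
enddate <- as.POSIXct("2020-01-01 05:00", tz = "UTC")

df <- data.frame(x = seq(startdate, enddate, by = "hour"),
                 y1 = c(1, 2, 3, 4, 5),
                 y2 = c(2, 4, 6, 8, 10),
                 y3 = c(3, 6, 9, 12, 15))
df$mean <- rowMeans(df[, -1])  # hourly mean across y1..y3
df_melt <- melt(df, id.vars = "x", measure.vars = c("y1", "y2", "y3"))

p <- ggplot(df_melt, aes(x, value)) +
  # one vertical density strip per timestamp
  stat_density(aes(fill = after_stat(ndensity), group = x),
               geom = "raster", position = "identity", orientation = "y") +
  geom_point() +
  # x is continuous now, so a line across hours is possible
  geom_line(data = df, aes(x, mean), colour = "red") +
  scale_x_datetime(date_breaks = "2 hours", date_labels = "%H:%M",
                   timezone = "UTC")
print(p)
```

Because the mean line uses the wide data frame `df` while the density uses `df_melt`, each layer gets exactly one row per point it needs to draw.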

ggplot change order of continuous y axis values

I am trying to plot shift data by hour (integer) ordered by 3 different shifts worked (8-16, 16-24, 24-8) by day as the x-axis. The hours I have are 24hr format and I want to plot them not in numerical order (0-24) but by the shift order (8-16, 16-24, 24-8).
Here is the code to create the data and make the plot. I want to put the 0-8 chunk above the 16-24 chunk.
library(dplyr)
library(tibble)
library(ggplot2)
set.seed(123)
Hour = sample(0:24, 500, replace=T)
Day = sample(0:1, 500, replace=T)
dat <- as_tibble(cbind(Hour, Day)) %>%
mutate(Day = factor(ifelse(Day == 0, "Mon", "Tues")),
Shift = cut(Hour, 3, labels = c("0-8", "8-16", "16-24")),
Exposure = factor(sample(0:1, 500, replace=T)))
ggplot(dat, aes(x = Day, y = Hour)) +
geom_jitter(aes(color = Exposure, shape = Exposure)) +
geom_hline(yintercept = 8) +
geom_hline(yintercept = 16) +
theme_classic()
Current plot
It is an interesting problem, and I have tried recoding a new hour variable that is in the order that I want but then I'm not sure how to plot it displaying the standard 24hr variable.
How would I accomplish this ordering?
Not sure if I completely understand, but if you facet your table on the Shift column, it should do what you want. First you must factor the Shift column to the order you specify:
dat$Shift <- factor(dat$Shift, levels = c("0-8", "16-24", "8-16"))
ggplot(dat, aes(x = Day, y = Hour)) +
geom_jitter(aes(color = Exposure, shape = Exposure)) +
facet_grid(Shift ~ ., scales = "free") +
theme_classic()
set.seed(123)
Hour = sample(0:24, 500, replace=T)
Day = sample(0:1, 500, replace=T)
dat <- as_tibble(cbind(Hour, Day)) %>%
mutate(Day = factor(ifelse(Day == 0, "Mon", "Tues")),
Shift = cut(Hour, 3, labels = c("0-8", "8-16", "16-24")),
Exposure = factor(sample(0:1, 500, replace=T)))
dat$Shift <- factor(dat$Shift, levels=rev(levels(dat$Shift)))
ggplot(dat, aes(x = Day, y = Shift)) +
geom_jitter(aes(color = Exposure, shape = Exposure)) +
geom_hline(yintercept = 8) +
geom_hline(yintercept = 16) +
theme_classic()
You just need to reverse the levels.
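An alternative along the lines the question hints at (recode the hour, then relabel the axis) might look like the sketch below. It assumes hours run 0 to 23 and that the desired bottom-to-top order is 8-16, 16-24, 0-8; the column name `ShiftHour` and the helper vectors are hypothetical.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(
  Hour = sample(0:23, 500, replace = TRUE),
  Day = factor(sample(c("Mon", "Tues"), 500, replace = TRUE)),
  Exposure = factor(sample(0:1, 500, replace = TRUE))
)

# Shift the clock so that 08:00 sits at the bottom of the axis
dat$ShiftHour <- (dat$Hour - 8) %% 24

# Place breaks on the shifted scale, but label them with real clock hours
axis_breaks <- seq(0, 24, by = 4)
axis_labels <- (axis_breaks + 8) %% 24

p <- ggplot(dat, aes(x = Day, y = ShiftHour)) +
  geom_jitter(aes(color = Exposure, shape = Exposure), height = 0) +
  geom_hline(yintercept = c(8, 16)) +  # shift boundaries at 16:00 and 00:00
  scale_y_continuous(breaks = axis_breaks, labels = axis_labels, name = "Hour") +
  theme_classic()
print(p)
```

This keeps a single continuous panel, at the cost of an axis whose labels wrap around midnight.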

Change X axis scale in R

I have monthly data, but the x-axis displays it at two-year intervals. I would like it displayed monthly.
I plotted it with ggfortify and ggplot2 using the following code:
library(ggplot2)
library(ggfortify)
library(forecast) # provides forecast()
spendingARIMA <- arima(spendingSaas, order = c(2, 1, 0))
finalForecastSpending <- forecast(spendingARIMA, h = 6, level = 30)
autoplot(finalForecastSpending)
Something like this should work
dates <- seq(as.POSIXct("2016-1-1"), as.POSIXct("2018-1-1"), by="month") # make dates
df <- data.frame(dates = dates, value = rnorm(25)) # make data frame
ggplot(df, aes(x = dates, y=value)) +
geom_point() +
scale_x_datetime(date_breaks = "month") + #This is the key line
theme(axis.text.x=element_text(angle = -90, hjust = 1, vjust=.3)) #rotate x-axis 90 deg.

R heat map of annual time series by entire year

I am trying to make a heatmap of several years of daily averages of salinity in an estuary in R.
I would like the format to include month on the x-axis and year on the y-axis, so each Jan 1st directly above another Jan. 1st. In other words, NOT like a typical annual calendar style (not like this: http://www.r-bloggers.com/ggplot2-time-series-heatmaps/).
So far I have only been able to plot by the day of the year using:
d <- read.xlsx('GC salinity transposed.xlsx', sheetName = "vert-3", header = TRUE,
stringsAsFactors = FALSE, colClasses = c("integer", "integer", "numeric"), endRow = 2254)
ggplot(d, aes(x = Day.Number, y = Year)) +
geom_tile(aes(fill = Salinity)) +
scale_fill_gradient(name = 'Mean Daily Salinity', low = 'white', high = 'blue') +
theme(axis.title.y = element_blank())
And get this:
heat map not quite right
Could someone please tell me a better way to do this - a way that would include month, rather than day of the year along the x-axis? Thank you. New to R.
The lubridate package comes in handy for stuff like this. Does this code do what you want? I'm assuming you only have one salinity reading per month and there's no need to average across multiple values in the same month.
library(lubridate)
library(ggplot2)
# Define some data
df <- data.frame(date = seq.Date(from = as.Date("2015-01-01"), by = 1, length.out = 400),
salinity = runif(400, min=5, max=7))
# Create fields for plotting
df$day <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date),
"-",
ifelse(day(df$date)<10,"0",""),
day(df$date))
df$month <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date))
df$year <- year(df$date)
#Plot results by month
ggplot(data=df) +
geom_tile(aes(x = month, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
#Plot results by day
ggplot(data=df) +
geom_tile(aes(x = day, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
Results by month:
Results by day (do you really want this? It's very hard to read with 366 x-axis values):
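Another option, closer to the layout in the question, is to keep the day of the year on a continuous x-axis and only relabel it with month names. A sketch assuming the question's column names (`Day.Number`, `Year`, `Salinity`) on simulated data; the break positions are the month start days of a non-leap year, so in leap years the labels after February sit one day early:

```r
library(ggplot2)

# Simulated daily salinity (hypothetical data standing in for the spreadsheet)
set.seed(42)
d <- data.frame(date = seq(as.Date("2015-01-01"), by = "day", length.out = 730))
d$Salinity <- runif(nrow(d), min = 5, max = 7)
d$Year <- as.integer(format(d$date, "%Y"))
d$Day.Number <- as.integer(format(d$date, "%j"))

# Day of year on which each month starts in a non-leap year
month_starts <- as.integer(format(as.Date(sprintf("2015-%02d-01", 1:12)), "%j"))

p <- ggplot(d, aes(x = Day.Number, y = factor(Year), fill = Salinity)) +
  geom_tile() +
  scale_x_continuous(breaks = month_starts, labels = month.abb,
                     expand = c(0, 0)) +
  scale_fill_gradient(name = "Mean Daily Salinity", low = "white", high = "blue") +
  labs(x = NULL, y = NULL)
print(p)
```

Each 1 January then sits directly above the previous one, while the axis shows only twelve labels instead of 366.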

Resources