Automatically set data representative breaks in ggplot with facet_grid - r

Here's a reproducible example taken from the R Graph Gallery:
library(ggplot2)
library(dplyr)
library(viridis)
library(Interpol.T)
library(lubridate)
library(ggExtra)
library(tidyr)
data <- data(Trentino_hourly_T,package = "Interpol.T")
names(h_d_t)[1:5]<- c("stationid","date","hour","temp","flag")
df <- as_tibble(h_d_t) %>%
filter(stationid =="T0001")
df$date<-ymd(df$date)
df <- df %>% mutate(date = ymd(date),
year = year(date),
month = month(date, label=TRUE),
day = day(date))
rm(list=c("h_d_t","mo_bias","Tn","Tx",
"Th_int_list","calibration_l",
"calibration_shape","Tm_list"))
df <- df %>%
filter(between(date, as.Date("2004-02-13"), as.Date("2004-04-29")) | between(date, as.Date("2005-02-13"), as.Date("2005-04-29")))
df <-df %>% select(stationid,day,hour,month,year,temp)%>%
fill(temp)
statno <-unique(df$stationid)
######## Plotting starts here#####################
p <-ggplot(df, aes(day,hour,fill=temp))+
geom_tile(color= "white",size=0.1) +
scale_fill_viridis(name="Hrly Temps C",option ="C") +
facet_grid(year~month, scales = "free") +
scale_y_continuous(trans = "reverse", breaks = unique(df$hour)) +
theme_minimal(base_size = 8) +
labs(title= paste("Hourly Temps - Station",statno), x="Day", y="Hour Commencing") +
theme(legend.position = "bottom",
plot.title=element_text(size = 14, hjust = 0),
axis.text.y=element_text(size=6),
strip.background = element_rect(colour="white"),
axis.ticks=element_blank(),
axis.text=element_text(size=7),
legend.text=element_text(size=6))+
removeGrid()
What bothers me is that the x-axis breaks don't explicitly show the first and last day of each month; even worse, they show a February 30th, a March 0th and an April 0th.
My goal is to use a function that automatically and explicitly shows the REAL first and last day of each plotted month (in the example, February 13th - February 29th, March 1st - March 31st and April 1st - April 29th) with 4 to 6 breaks within each month.
As this plot will be shown in a shiny app where the user can change the time period plotted, the solution REALLY needs to be automated.
Here are some things I've tried:
library(scales)
p + scale_x_continuous(breaks =breaks_pretty())
But it doesn't change much.
I've tried to write my own function but something horrible happened:
breaksFUN <- function(x){
round(seq(min(x), max(x), length.out = 5), 0)
}
p + scale_x_continuous(breaks =breaksFUN)
Thank you in advance.

Thank you Axeman for your contribution, it really helped! It works for my example, but I encountered some issues when trying it on my own data. I modified it and it now works properly; here's my solution, inspired by Axeman's:
breaksFUN <- function(x) {
s <- round(c(seq(min(x) + 1.5, max(x) - 5.5, length.out = 4), max(x) - 1.5))
s[s == 0] <- 1
s[s > 31] <- 31
s <- round(seq(range(s)[1], range(s)[2], length.out = 5))
unique(s)
}
p + scale_x_continuous(breaks = breaksFUN)
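To see what the function above returns, it can be called directly on a pair of axis limits. This is only a minimal sketch: the limits below are assumed values (tiles extending half a day beyond the data, plus ggplot2's default 5% continuous expansion), not numbers read from the actual plot.

```r
# Same break function as above, repeated so the sketch runs on its own
breaksFUN <- function(x) {
  s <- round(c(seq(min(x) + 1.5, max(x) - 5.5, length.out = 4), max(x) - 1.5))
  s[s == 0] <- 1
  s[s > 31] <- 31
  s <- round(seq(range(s)[1], range(s)[2], length.out = 5))
  unique(s)
}

# Assumed (hypothetical) panel limits for days 13-29 and days 1-31
feb_limits <- c(11.65, 30.35)
mar_limits <- c(-1.05, 33.05)

feb_breaks <- breaksFUN(feb_limits)
mar_breaks <- breaksFUN(mar_limits)
print(feb_breaks)
print(mar_breaks)
```

With these assumed limits, the first and last breaks land on the first and last plotted day of each panel (13 and 29, then 1 and 31). The +1.5, -5.5 and -1.5 offsets are tuned to this amount of padding, so a different expand setting would probably need different offsets.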

Related

plotting daily distribution of a time series data in R

I have time series data (a date column and a value column). I am trying to make a daily distribution plot.
The image below shows a weekly distribution plot of the values by day of the week. Similarly, I am trying to make a daily distribution plot where the x axis is the month, the y axis is the value, and the plot has 10 lines, one for each day of the month from the 1st to the 10th. Since 30 days in one subplot would be cluttered, I want to split the plots into three: days 1-10, 11-20 and 21-31.
Code for weekly distribution for reference:
#dummy data
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2021-12-31")
date_seq <- seq(from = start_date, to = end_date, by = "day")
set.seed(123)
value <- round(runif(length(date_seq), min = 10000, max = 100000000), 0)
df <- data.frame(date = date_seq, value = value)
df$week_number <- as.numeric(format(as.Date(df$date), "%U")) + 1
df$weekday <- weekdays(as.Date(df$date))
df$year <- as.numeric(format(as.Date(df$date), "%Y"))
years <- unique(df$year)
# Create a list of ggplots, one for each year
plots <- lapply(years, function(y) {
year_df <- df[df$year == y, ]
ggplot(year_df, aes(x = week_number, y = value, color = weekday)) +
geom_line() +
scale_color_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) +
ggtitle(paste("Weekday Distribution", y)) +
xlab("Week number") +
ylab("Value") +
theme(legend.key.size = unit(0.4, "cm")) +
theme(plot.title = element_text(hjust = 0.5, vjust = 1.5))
})
library(cowplot)
plot_grid(plotlist = plots, ncol = 1)
So in the end, there will be three plots (days 1 to 10, 11 to 20 and 21 to 31), and each plot will contain 2 subplots (as the dates range from 2020 to 2021). Can anyone help me with this?
Below is how I would do this. The lubridate package is your friend. For the grouping, use cut().
The result is a (in my opinion) pretty useless clutter of lines. But this is not the only reason why I do not endorse this visualisation. I feel this somehow defeats the point of a time series... one point is to visualise the auto-correlation of your data. Artificially separating out only specific days from each month impacts drastically on this particular advantage (and maybe: reason) of using a time series. You're not only losing information, but also making your own analytical life much more complicated.
library(ggplot2)
library(dplyr)
library(lubridate)
df %>%
mutate(day = mday(date),
day_group = cut(day, c(1,11,21, 31), include.lowest = TRUE),
month = month(date, label = TRUE, abbr = TRUE)) %>%
ggplot(aes(x = month, y = value, color = day, group=interaction(day, day_group))) +
geom_line() +
theme(legend.key.size = unit(0.4, "cm"),
plot.title = element_text(hjust = 0.5, vjust = 1.5),
axis.text.x = element_text(angle = 90)) +
facet_wrap(year~day_group)
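Note that breaks of c(1, 11, 21, 31) with the lowest value included put day 11 in the first group and day 21 in the second. If the groups should be exactly 1-10, 11-20 and 21-31 as described in the question, a sketch in base R could look like this (the breaks and label strings are my own choice, not part of the answer above):

```r
# Days of the month to be grouped
day <- 1:31

# Right-closed intervals (0,10], (10,20], (20,31] give 1-10, 11-20, 21-31
day_group <- cut(day,
                 breaks = c(0, 10, 20, 31),
                 labels = c("1-10", "11-20", "21-31"))

# Count how many days fall into each group
print(table(day_group))
```

The same breaks and labels arguments can be dropped into the mutate() call in place of the original cut().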
I feel you want to show how the "typical" 1st day compares with the 2nd, etc. For this, an aggregate visualisation might be more useful. (Still not a good idea, but at least you get a better idea of your data.) You can do this by passing stat = "summary" to geom_smooth, whose geometry combines geom_line and geom_ribbon.
df %>%
mutate(day = mday(date),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = day, y = value)) +
geom_smooth(stat= "summary", alpha = .5, color = "black") +
facet_grid(~year)
#> No summary function supplied, defaulting to `mean_se()`
#> No summary function supplied, defaulting to `mean_se()`
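For reference, mean_se() summarises each x value as the mean plus or minus one standard error. Below is a rough base R equivalent of what the line and ribbon represent, using the dummy data from the question. Unlike the faceted plot, this sketch pools both years, and the object names are purely illustrative.

```r
# Dummy data as in the question
dates <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
set.seed(123)
vals <- round(runif(length(dates), min = 10000, max = 100000000), 0)

# Day of the month for every observation
dom <- as.integer(format(dates, "%d"))

# Mean and standard error of the mean per day of the month
m  <- tapply(vals, dom, mean)
se <- tapply(vals, dom, function(v) sqrt(var(v) / length(v)))

summary_df <- data.frame(day  = as.integer(names(m)),
                         y    = as.numeric(m),
                         ymin = as.numeric(m - se),
                         ymax = as.numeric(m + se))
print(head(summary_df))
```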
Following on from tjebo's answer: if you must use this plot, you can highlight a single line so that it conveys something amid the clutter. Here is an example that highlights the 11th day.
Plot
df %>%
mutate(day = mday(date),
day_group = cut(day, c(1,11,21, 31), include.lowest = TRUE),
month = month(date, label = TRUE, abbr = TRUE),
highlight = ifelse(day == 11, "Yes", "No")) %>%
ggplot(aes(x = month, y = value, color = highlight, group=interaction(day, day_group))) +
geom_line() +
theme_bw()+
theme(plot.title = element_text(hjust = 1, vjust = 2),
axis.text.x = element_text(angle = 90)) +
scale_color_manual(breaks = c("Yes", "No"),
labels = c("11th Day", "Other"),
values = c("Yes" = "red2", "No" = "grey60")) +
facet_wrap(year~day_group) +
guides(color = guide_legend(order = 1))

exclude weekends from x axis in heatmap

I have coded a heatmap using ggplot tiles, with sequential days on the x axis. The problem I am trying to solve is to remove weekends from the heatmap and show only weekdays. One solution I have found is to convert the dates to factors, but if I do that, how can I format the labels in scale_x_discrete in %d%m date format? Is there a way to keep the dates as dates instead of turning them into factors?
Below is an example:
randomString <- function(n=5,length=3) {
randomStringX <- c(1:n)
for(i in 1:n) {
randomStringX[i] <- paste(sample(c(LETTERS),length,replace = TRUE),collapse = "")
}
return(randomStringX)
}
randomString()
data.frame(client = randomString(),
day = rep(seq.Date(Sys.Date() - 10, length.out = 10, by = "1 day"), 2)) %>%
mutate(sales = round(rnorm(20, 492, 300), 1)) %>%
mutate(scale = cut(sales, breaks = c(0, 100, 200, 300, max(sales)),
labels = c("0-100", "100-200", "200-300", "+300"))) %>%
ggplot(., aes(x = day, y = client, fill = scale)) +
geom_tile() +
scale_x_date(date_breaks = "1 day")
Thanks in advance
You can exclude data from weekends using the is.weekend function from chron.
The weekend dates themselves can be excluded from the x axis using the bdscale package.
library(chron)
library(bdscale)
library(scales)
library(ggplot2)
library(dplyr)
df <- data.frame(client = randomString(), day = rep(seq.Date(
Sys.Date() - 10, length.out = 10, by = "1 day"), 2)) %>%
mutate(sales = round(rnorm(20, 492, 300), 1)) %>%
mutate(scale =
cut(
sales,
breaks = c(0, 100, 200, 300, max(sales)),
labels = c("0-100", "100-200", "200-300", "+300")
)) %>%
filter(is.weekend(day) == FALSE)
ggplot(df, aes(x = day, y = client, color = scale, fill = scale)) +
geom_tile() +
# scale_x_date(date_breaks = "1 day") +
theme(axis.text.x = element_text(angle = 45)) +
scale_x_bd(business.dates = sort(df$day), max.major.breaks = 30, labels=scales::date_format('%d %b'))
Removing weekend data can also be done with the wday function from lubridate:
filter(!wday(day) %in% c(1,7))
Sunday and Saturday are stored as 1 and 7 respectively. Credit to @AHart.
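On the question of keeping %d%m-style labels after converting dates to factors: one option is to drop weekends first and then build a factor whose levels follow date order. This is a base R sketch only; the start date is arbitrary, and the "%d/%m" label format is an assumption about what is wanted.

```r
# Two weeks of consecutive days, starting on a Monday
days <- seq.Date(as.Date("2024-01-01"), by = "1 day", length.out = 14)

# "%u" gives the ISO weekday number as text: "1" = Monday ... "7" = Sunday
is_weekday <- !(format(days, "%u") %in% c("6", "7"))
weekdays_only <- days[is_weekday]

# Formatted labels as a factor; levels keep chronological order
labels_chr <- format(weekdays_only, "%d/%m")
day_label <- factor(labels_chr, levels = unique(labels_chr))
print(day_label)
```

Such a factor can be mapped to a discrete x axis directly. Labels in this format repeat across years, so include the year if the data span more than one.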

gganimate: make points stay several frames before and after

I have a data.frame containing timestamped events of different kinds, geolocated. I know how to plot an animation of each event as a point, hour by hour, with gganimate (*). It would be something like:
df = data.frame("id" = runif(500, 1e6, 1e7),
'lat' = runif(500, 45, 45.1),
'long'= runif(500, 45, 45.1),
'datetime'= seq.POSIXt(from=Sys.time(), to=Sys.time()+1e5, length.out=500),
'hour'=format(seq.POSIXt(from=Sys.time(), to=Sys.time()+1e5, length.out=500), "%H"),
'event'=paste0("type", rpois(500, 1)))
ggplot(data=df) +
aes(x=long, y=lat, color=factor(event)) +
geom_point() +
transition_states(hour, state_length = 1, transition_length = 0)
Now I would like to make points "stay" on screen longer. For instance, if an event is at 5:00pm, I want it to be displayed in the animation from 2pm until 8pm (3 frames before and after its position, and if possible fading in and out). I don't know how to do that with gganimate. I tried transition_length, but it makes the points "move", which makes no sense here!
Thanks,
(*) Edit: I thought of adding 6 duplicated rows for each row, and modifying the hour by -1 to +3, but it's a lot heavier and can't deal with fade in/out
library(magrittr)
df %<>% mutate(hour = hour + 2) %>% bind_rows(df)
df %<>% mutate(hour = hour + 1) %>% bind_rows(df)
df %<>% mutate(hour = hour - 4) %>% bind_rows(df)
df %<>% mutate(hour = hour %% 24 )
You can use transition_components and specify 3 hours as the enter / exit length for each point.
Data:
set.seed(123)
n <- 50 # 500 in question
df = data.frame(
id = runif(n, 1e6, 1e7),
lat = runif(n, 45, 45.1),
long = runif(n, 45, 45.1),
datetime = seq.POSIXt(from=Sys.time(), to=Sys.time()+1e5, length.out=n),
hour = format(seq.POSIXt(from=Sys.time(), to=Sys.time()+1e5, length.out=n), "%H"),
event = paste0("type", rpois(n, 1)))
Code:
df %>%
mutate(hour = as.numeric(as.character(hour))) %>%
ggplot() +
aes(x=long, y=lat, group = id, color=factor(event)) +
# I'm using geom_label to indicate the time associated with each
# point & highlight the transition times before / afterwards.
# replace with geom_point() as needed
geom_label(aes(label = as.character(hour))) +
# geom_point() +
transition_components(hour,
enter_length = 3,
exit_length = 3) +
enter_fade() +
exit_fade() +
ggtitle("Time: {round(frame_time)}")
This approach works with a datetime variable as well:
df %>%
ggplot() +
aes(x = long, y = lat, group = id, color = factor(event)) +
geom_label(aes(label = format(datetime, "%H:%M"))) +
transition_components(datetime,
enter_length = as_datetime(hm("3:0")),
exit_length = as_datetime(hm("3:0"))) +
enter_fade() +
exit_fade() +
ggtitle("Time: {format(frame_time, '%H:%M')}")
gganimate does not appear to be set up to handle leaving points on the plot. I think that you are going to have to go the manual route.
Here is a (slightly kludgy) approach to duplicate the rows including setting the times at which they should display and the offset (to be used for alpha to control fade):
df_withRange <-
df %>%
mutate(hour = parse_number(hour)) %>%
split(1:nrow(.)) %>%
lapply(function(x){
lapply(-3:3, function(this_time){
x %>%
mutate(frame_time = hour + this_time
, offset = this_time
, abs_offset = abs(this_time))
}) %>%
bind_rows()
}) %>%
bind_rows() %>%
mutate(
frame_time = ifelse(frame_time > 23, frame_time - 24, frame_time)
, frame_time = ifelse(frame_time < 0, frame_time + 24, frame_time)
)
Then, this code sets up the plot:
ggplot(data=df_withRange
, aes(x=long
, y=lat
, color=factor(event)
, alpha = abs_offset
)) +
geom_point() +
transition_states(frame_time) +
labs(title = 'Hour: {closest_state}') +
scale_alpha_continuous(range = c(1,0.2))
The plot:
There is still a lot of clean up to do (e.g., the fade levels, etc.), but that should be a start at least
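The row duplication and the wrap-around at midnight can also be expressed compactly with expand.grid() and the modulo operator. This is a minimal base R sketch of that idea on three made-up hours, not a drop-in replacement for the pipeline above; the column names simply mirror the ones used there.

```r
# Three example event hours (made up for illustration)
hour <- c(0, 5, 22)

# Every combination of event hour and offset from -3 to +3
frames <- expand.grid(hour = hour, offset = -3:3)

# Wrap into 0-23: e.g. 22 + 3 becomes 1, and 0 - 3 becomes 21
frames$frame_time <- (frames$hour + frames$offset) %% 24

# Distance from the true event time, usable for an alpha fade
frames$abs_offset <- abs(frames$offset)

print(frames[order(frames$hour, frames$offset), ])
```

In R, %% returns a non-negative result for a positive divisor, so a single expression covers both of the ifelse() corrections.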

R heat map of annual time series by entire year

I am trying to make a heatmap of several years of daily averages of salinity in an estuary in R.
I would like the format to include month on the x-axis and year on the y-axis, so each Jan 1st directly above another Jan. 1st. In other words, NOT like a typical annual calendar style (not like this: http://www.r-bloggers.com/ggplot2-time-series-heatmaps/).
So far I have only been able to plot by the day of the year using:
d <- read.xlsx('GC salinity transposed.xlsx', sheetName = "vert-3", header = TRUE,
stringsAsFactors = FALSE, colClasses = c("integer", "integer", "numeric"), endRow = 2254)
ggplot(d, aes(x = Day.Number, y = Year)) +
geom_tile(aes(fill = Salinity)) +
scale_fill_gradient(name = 'Mean Daily Salinity', low = 'white', high = 'blue') +
theme(axis.title.y = element_blank())
And get this:
[Image: heat map not quite right]
Could someone please tell me a better way to do this - a way that would include month, rather than day of the year along the x-axis? Thank you. New to R.
The lubridate package comes in handy for stuff like this. Does this code do what you want? I'm assuming you only have one salinity reading per month and there's no need to average across multiple values in the same month.
library(lubridate)
library(ggplot2)
# Define some data
df <- data.frame(date = seq.Date(from = as.Date("2015-01-01"), by = 1, length.out = 400),
salinity = runif(400, min=5, max=7))
# Create fields for plotting
df$day <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date),
"-",
ifelse(day(df$date)<10,"0",""),
day(df$date))
df$month <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date))
df$year <- year(df$date)
#Plot results by month
ggplot(data=df) +
geom_tile(aes(x = month, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
#Plot results by day
ggplot(data=df) +
geom_tile(aes(x = day, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
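The zero-padded month and month-day keys built with paste0() and ifelse() above can also be produced with format(), which pads automatically. A short base R sketch on the same date sequence (variable names are illustrative):

```r
# Same 400-day sequence as in the example data
date <- seq.Date(from = as.Date("2015-01-01"), by = 1, length.out = 400)

# "%m" and "%d" are zero-padded month and day of month
day   <- format(date, "%m-%d")
month <- format(date, "%m")
year  <- as.integer(format(date, "%Y"))

print(head(data.frame(date, day, month, year)))
```

Because the keys are zero-padded strings, they sort in calendar order on a discrete axis.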
Results by month:
Results by day (do you really want this? It's very hard to read with 366 x-axis values):

Plot a 24 hour cycle monthly for multiple variables?

I have data that can be mimicked in the following manner:
library(reshape2)
set.seed(1234)
# 12 months x 24 hours; nrow(foo) cannot be used before foo exists
n <- 12 * 24
foo <- data.frame(month = rep(month.name, each = 24),
hour = rep(1:24, 12),
value1 = rnorm(n, 60, 1),
value2 = rnorm(n, 60, 1))
foo <- melt(foo, id = c('month', 'hour'))
I would like to create a plot for the entire year using ggplot that displays the 24 hour cycle of each variable per month.
Here's what I've tried so far:
t.plot <- ggplot(foo,
aes(interaction(month,hour), value, group = interaction(variable,hour)))
t.plot <- t.plot + geom_line(aes(colour = variable))
print(t.plot)
I get this, which throws the data into misalignment. For such a small SD you see that the first 24 values should be nearer to 60, but they are all over the place. I don't understand what's causing this discrepancy.
https://www.dropbox.com/s/rv6uxhe7wk7q35w/foo.png
when I plot:
plot(interaction(foo$month,foo$hour)[1:24], foo$value[1:24])
I get the shape that I would expect; however, the x axis is very strange and not what I was expecting.
Any help?
The solution is to store your dates as dates (not as an interaction of factors), e.g.:
library(lubridate)
library(reshape2)
Date <- as.Date(dmy('01-01-2000') + seq_len(24*365)*hours(1))
foo <- data.frame(Date = Date,
value1 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1),
value2 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1))
foo_melt <- melt(foo, id = 'Date')
# then you can use `scale_x_date` and `r` and ggplot2 will know they are dates
# load scales library to access date_format and date_breaks
library(scales)
ggplot(foo_melt, aes(x=Date, y=value, colour = variable)) +
geom_line() +
scale_x_date(breaks = date_breaks('month'),
labels = date_format('%b'), expand =c(0,0))
Edit 1: average day per month
You can use facet_wrap to facet by month:
# using your created foo data set (after melt)
# turn the full month names into a calendar-ordered factor with abbreviated labels
foo$month <- factor(foo$month, levels = month.name, labels = month.abb)
ggplot(foo, aes(x = hour, y = value, colour = variable)) +
facet_wrap(~month) + geom_line() +
scale_x_continuous(expand = c(0,0))
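If the data are hourly timestamps rather than an already aggregated month/hour table, the "average day per month" has to be computed first. A minimal base R sketch, assuming simulated hourly values for the year 2000 (the data and object names are made up for illustration):

```r
# One year of hourly timestamps (2000 is a leap year: 366 days)
stamps <- seq(as.POSIXct("2000-01-01 00:00", tz = "UTC"),
              by = "hour", length.out = 24 * 366)

set.seed(1234)
obs <- data.frame(
  # month.abb indexed by month number avoids locale-dependent month names
  month = factor(month.abb[as.integer(format(stamps, "%m"))], levels = month.abb),
  hour  = as.integer(format(stamps, "%H")),
  value = rnorm(length(stamps), 60, 1)
)

# Mean value for every month/hour combination: one 24-hour cycle per month
avg_day <- aggregate(value ~ month + hour, data = obs, FUN = mean)
print(head(avg_day))
```

The resulting avg_day table has one row per month and hour, so it can be plotted with hour on the x axis and faceted by month in the same way as above.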
