ggplot change order of continuous y axis values - r

I am trying to plot shift data by hour (integer) ordered by 3 different shifts worked (8-16, 16-24, 24-8) by day as the x-axis. The hours I have are 24hr format and I want to plot them not in numerical order (0-24) but by the shift order (8-16, 16-24, 24-8).
Here is the code to create the data and make the plot. I want to put the 0-8 chunk above the 16-24 chunk.
library(dplyr)
library(tibble)
library(ggplot2)
set.seed(123)
Hour <- sample(0:24, 500, replace = TRUE)
Day <- sample(0:1, 500, replace = TRUE)
# as_tibble() replaces the deprecated as.tibble()
dat <- as_tibble(cbind(Hour, Day)) %>%
  mutate(Day = factor(ifelse(Day == 0, "Mon", "Tues")),
         Shift = cut(Hour, 3, labels = c("0-8", "8-16", "16-24")),
         Exposure = factor(sample(0:1, 500, replace = TRUE)))
ggplot(dat, aes(x = Day, y = Hour)) +
  geom_jitter(aes(color = Exposure, shape = Exposure)) +
  geom_hline(yintercept = 8) +
  geom_hline(yintercept = 16) +
  theme_classic()
Current plot
It is an interesting problem, and I have tried recoding a new hour variable that is in the order I want, but then I'm not sure how to plot it while still displaying the standard 24-hour values.
How would I accomplish this ordering?

Not sure if I completely understand, but if you facet your table on the Shift column, it should do what you want. First you must factor the Shift column to the order you specify:
dat$Shift <- factor(dat$Shift, levels = c("0-8", "16-24", "8-16"))
ggplot(dat, aes(x = Day, y = Hour)) +
  geom_jitter(aes(color = Exposure, shape = Exposure)) +
  facet_grid(Shift ~ ., scales = "free") +
  theme_classic()

library(dplyr)
library(tibble)
library(ggplot2)
set.seed(123)
Hour <- sample(0:24, 500, replace = TRUE)
Day <- sample(0:1, 500, replace = TRUE)
dat <- as_tibble(cbind(Hour, Day)) %>%
  mutate(Day = factor(ifelse(Day == 0, "Mon", "Tues")),
         Shift = cut(Hour, 3, labels = c("0-8", "8-16", "16-24")),
         Exposure = factor(sample(0:1, 500, replace = TRUE)))
# Reverse the factor levels so the shifts are drawn in the opposite order
dat$Shift <- factor(dat$Shift, levels = rev(levels(dat$Shift)))
# Shift is a discrete axis here, so the numeric reference lines at 8 and 16 are left out
ggplot(dat, aes(x = Day, y = Shift)) +
  geom_jitter(aes(color = Exposure, shape = Exposure)) +
  theme_classic()
You just need to reverse the levels.
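Another option, sketched here as an assumption rather than something taken from the answers above: keep the hour numeric, rotate it so the axis starts at 08:00, and relabel the breaks with the original 24-hour values. The names `shift_start`, `HourShifted` and `relabel_hour` are made up for this illustration, and the data are simplified to hours 0-23.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(
  Hour = sample(0:23, 500, replace = TRUE),
  Day = factor(sample(c("Mon", "Tues"), 500, replace = TRUE)),
  Exposure = factor(sample(0:1, 500, replace = TRUE))
)

# Rotate the clock so that 08:00 sits at the bottom of the axis
shift_start <- 8
dat$HourShifted <- (dat$Hour - shift_start) %% 24

# Map a rotated axis position back to the hour it represents
relabel_hour <- function(b) (b + shift_start) %% 24

p <- ggplot(dat, aes(x = Day, y = HourShifted)) +
  geom_jitter(aes(color = Exposure, shape = Exposure), height = 0) +
  geom_hline(yintercept = c(8, 16)) +  # shift boundaries at 16:00 and 24:00
  scale_y_continuous(breaks = seq(0, 24, by = 4),
                     labels = relabel_hour,
                     limits = c(0, 24)) +
  labs(y = "Hour") +
  theme_classic()
p
```

Read from bottom to top, the axis then runs 8-16, 16-24, 0-8, which puts the 0-8 chunk above the 16-24 chunk while the tick labels still show ordinary clock hours.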

Related

plotting daily distribution of a time series data in R

I have time series data (a date column and a value column) and am trying to make a daily distribution plot.
The image below shows the weekly distribution plot, which plots the values for each day of the week. Similarly, I am trying to make a daily distribution plot where the x-axis is the month, the y-axis is the value, and the plot has 10 lines, one per day of the month (day 1, day 2, day 3, and so on up to day 10). Thirty-odd days in one subplot would be cluttered, so I want to split the plots into three: days 1-10, 11-20 and 21-31.
Code for weekly distribution for reference:
library(ggplot2)
library(cowplot)
# dummy data
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2021-12-31")
date_seq <- seq(from = start_date, to = end_date, by = "day")
set.seed(123)
value <- round(runif(length(date_seq), min = 10000, max = 100000000), 0)
df <- data.frame(date = date_seq, value = value)
df$week_number <- as.numeric(format(as.Date(df$date), "%U")) + 1
df$weekday <- weekdays(as.Date(df$date))
df$year <- as.numeric(format(as.Date(df$date), "%Y"))
years <- unique(df$year)
# Create a list of ggplots, one for each year
plots <- lapply(years, function(y) {
  year_df <- df[df$year == y, ]
  ggplot(year_df, aes(x = week_number, y = value, color = weekday)) +
    geom_line() +
    scale_color_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) +
    ggtitle(paste("Weekday Distribution", y)) +
    xlab("Week number") +
    ylab("Value") +
    theme(legend.key.size = unit(0.4, "cm")) +
    theme(plot.title = element_text(hjust = 0.5, vjust = 1.5))
})  # closes the function body and the lapply() call
plot_grid(plotlist = plots, ncol = 1)
So in the end there will be three plots (days 1 to 10, 11 to 20 and 21 to 31), and each plot will contain 2 subplots (as the dates range from 2020 to 2021). Can anyone help me with this?
Below is how I would do this. The lubridate package is your friend. For the grouping, use cuts.
The result is a (in my opinion) pretty useless clutter of lines. But this is not the only reason why I do not endorse this visualisation. I feel this somehow defeats the point of a time series... one point is to visualise the auto-correlation of your data. Artificially separating out only specific days from each month impacts drastically on this particular advantage (and maybe: reason) of using a time series. You're not only losing information, but also making your own analytical life much more complicated.
library(ggplot2)
library(dplyr)
library(lubridate)
df %>%
  mutate(day = mday(date),
         # with these breaks the bins are [1,11], (11,21] and (21,31]
         day_group = cut(day, c(1, 11, 21, 31), include.lowest = TRUE),
         month = month(date, label = TRUE, abbr = TRUE)) %>%
  ggplot(aes(x = month, y = value, color = day, group = interaction(day, day_group))) +
  geom_line() +
  theme(legend.key.size = unit(0.4, "cm"),
        plot.title = element_text(hjust = 0.5, vjust = 1.5),
        axis.text.x = element_text(angle = 90)) +
  facet_wrap(year ~ day_group)
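One detail worth checking in the grouping step: with breaks `c(1, 11, 21, 31)` and the lowest value included, day 11 lands in the first bin and day 21 in the second, which is not quite the 1-10 / 11-20 / 21-31 split the question asks for. A minimal base R sketch of the difference follows; the shifted breaks and the label strings are my own choice, not part of the answer.

```r
day <- 1:31

# Breaks as used above: bins are [1,11], (11,21], (21,31]
original <- cut(day, c(1, 11, 21, 31), include.lowest = TRUE)

# Shifted breaks: bins are (0,10], (10,20], (20,31]
adjusted <- cut(day, breaks = c(0, 10, 20, 31),
                labels = c("1-10", "11-20", "21-31"))

# Compare where days 10, 11, 20 and 21 end up
data.frame(day, original, adjusted)[c(10, 11, 20, 21), ]
table(adjusted)
```

Swapping the `adjusted` definition into the `mutate()` call should give panels that match the requested ranges.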
I feel you want to show how the "typical" 1st day compares with the 2nd, etc. For this, an aggregate visualisation might be more useful. (Still not a good idea, but at least you get a better idea of your data.) You can do this by passing stat = "summary" to geom_smooth, whose geometry combines geom_line and geom_ribbon.
df %>%
  mutate(day = mday(date),
         month = month(date, label = TRUE, abbr = TRUE)) %>%
  ggplot(aes(x = day, y = value)) +
  geom_smooth(stat = "summary", alpha = .5, color = "black") +
  facet_grid(~year)
#> No summary function supplied, defaulting to `mean_se()`
#> No summary function supplied, defaulting to `mean_se()`
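For readers wondering what the `mean_se()` default in that message draws: it summarises each x position as the mean plus and minus one standard error. Here is a small base R sketch of the same arithmetic on a made-up vector (the vector `x` and the name `band` are only for illustration).

```r
# Hypothetical values observed on one day of the month
x <- c(2, 4, 6, 8)

m  <- mean(x)
se <- sd(x) / sqrt(length(x))  # standard error of the mean

# Centre line and ribbon limits for this x position
band <- c(y = m, ymin = m - se, ymax = m + se)
band
```

The line in the plot follows `y`, and the ribbon spans `ymin` to `ymax`.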
Following on from tjebo's answer: if you must use this plot, you can highlight a single line so that something stands out from the clutter. Here is an example that highlights the 11th day against the rest.
Plot
df %>%
  mutate(day = mday(date),
         day_group = cut(day, c(1, 11, 21, 31), include.lowest = TRUE),
         month = month(date, label = TRUE, abbr = TRUE),
         highlight = ifelse(day == 11, "Yes", "No")) %>%
  ggplot(aes(x = month, y = value, color = highlight, group = interaction(day, day_group))) +
  geom_line() +
  theme_bw() +
  theme(plot.title = element_text(hjust = 1, vjust = 2),
        axis.text.x = element_text(angle = 90)) +
  scale_color_manual(breaks = c("Yes", "No"),
                     labels = c("11th Day", "Other"),
                     values = c("Yes" = "red2", "No" = "grey60")) +
  facet_wrap(year ~ day_group) +
  guides(color = guide_legend(order = 1))

Reorder the data based on one column in ggplot

I merged 5 data frames with rbind and want to draw a dot plot in ggplot. With the code I have, the data are ordered by the second column in descending order. What I need instead is an ordering on the second column with all the positive values first in descending order, then the negative values in descending order of their absolute value, and finally all the small values. Overall, I want to categorize the data into three groups: significant positive, significant negative and not significant. In my bubble plot, the top should hold all the significant positive values, then the significant negative ones, and the bottom of the plot only the non-significant ones. I generated some data to clarify:
library(dplyr)
library(forcats)
library(ggplot2)
df <- data.frame(
  Weekday = c("Fri", "Tues", "Mon", "Thurs", "Mon", "Tues", "Wed", "Fri", "Wed", "Thurs", "Fri"),
  Quarter = c(rep("Q1", 3), rep("Q2", 5), rep("Q3", 3)),
  Delay = runif(11, -2, 5),
  pval = runif(11, 0, 1))
df$Quarter <- factor(df$Quarter, levels = c("Q1", "Q2", "Q3"))
df %>%
  mutate(
    # order weekdays by their mean Delay in Q2
    Weekday = fct_reorder2(
      .f = Weekday,
      .x = Delay,
      .y = Quarter,
      .fun = function(x, y) {mean(x[y == "Q2"])}
    )) %>%
  ggplot(aes(x = Quarter, y = Weekday)) +
  geom_point(aes(size = -log10(pval), color = Delay), alpha = 0.8) +
  scale_size_binned(range = c(-2, 12)) +
  scale_color_gradient(low = "mediumblue", high = "red2", space = "Lab") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 25, hjust = 1, size = 10)) +
  ylab(NULL) + xlab(NULL)
Thank you.
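One possible way to get the three-group ordering, offered as a rough sketch only: classify each row, then sort by group and by descending absolute delay. The 0.05 cutoff, the toy data with one row per label, and the names `alpha`, `group` and `ordered_labels` are all assumptions for illustration, not taken from the question.

```r
# Toy data: one row per label, values chosen by hand
df <- data.frame(
  label = c("a", "b", "c", "d", "e", "f"),
  Delay = c(3.0, -1.5, 0.2, 4.5, -0.5, -1.9),
  pval  = c(0.01, 0.02, 0.60, 0.03, 0.40, 0.001),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # assumed significance cutoff

# Classify rows, with the groups in the order they should appear top to bottom
df$group <- ifelse(df$pval >= alpha, "not significant",
                   ifelse(df$Delay > 0, "significant positive",
                          "significant negative"))
df$group <- factor(df$group,
                   levels = c("significant positive",
                              "significant negative",
                              "not significant"))

# Within each group, the largest absolute Delay comes first
ord <- order(df$group, -abs(df$Delay))
ordered_labels <- df$label[ord]
ordered_labels

# ggplot2 draws the first factor level at the bottom of a discrete y axis,
# so reverse the levels to put the first label at the top of the plot
df$label <- factor(df$label, levels = rev(ordered_labels))
```

With `y = label` in the plot, the significant positive rows then sit at the top, followed by the significant negative rows and finally the non-significant ones.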

How to (re)arrange panels of facet_wrap/_grid?

R noob here. I have been stumped on this graph all day, and solutions like this and this seem to hold my answer, but I cannot get them to work for me.
I have a data frame that is a large version of the below sample which I am trying to plot using ggplot.
# create data
df <- data.frame(
  "ID" = rep(1:5, each = 4),
  "Date" = c(seq(as.Date("2019/09/18"), by = "day", length.out = 4),
             seq(as.Date("2019/09/18"), by = "day", length.out = 4),
             seq(as.Date("2020/08/07"), by = "day", length.out = 4),
             seq(as.Date("2020/09/12"), by = "day", length.out = 4),
             seq(as.Date("2020/09/29"), by = "day", length.out = 4)),
  "MaxDepth" = round(runif(20, min = 10, max = 50), 1),
  "Trip" = rep(1:5, each = 4)
)
# plot using ggplot
ggplot(df, aes(Date, MaxDepth, col = factor(Trip))) +
  geom_line() +
  facet_grid(ID ~ format(Date, "%Y"), scales = "free_x") +
  scale_y_reverse() +
  scale_x_date(date_labels = "%b") +
  labs(title = "Daily maximum depth\n",
       x = "",
       y = "Depth [m]\n",
       col = "Fishing trip")
This turns out nicely as a two column, eleven row faceted graph with the fishing trips as colours.
However, it includes a lot of empty panels which I would like to avoid by creating a one column graph still with all eleven ID rows but that are separated by the same split label the two columns had. I.e. I would like the two individuals that were in the LHS 2019 plot to have that 2019 label on top, separated by the 2020 label from the other 9 individuals.
Hope this is clear. Please correct me or let me know what to improve for a better question.
Grateful for any help! Even if those are suggestions that this is not a good way of representation or something like this is simply not possible. Thank you all!
Here is a possible way. I am not sure whether it works for your real data.
library(ggplot2)
library(patchwork)
library(dplyr)
plot_fun <- function(dtt) {
  ggplot(dtt, aes(Date, MaxDepth, col = factor(Trip))) +
    geom_line() +
    facet_grid(ID ~ format(Date, "%Y"), scales = "free_x") +
    scale_y_reverse() +
    scale_x_date(date_labels = "%b") +
    labs(x = NULL, y = NULL, col = "Fishing trip")
}
p1 <- plot_fun(df %>% filter(format(Date, '%Y') == '2019'))
p2 <- plot_fun(df %>% filter(format(Date, '%Y') == '2020'))
p1 / p2
ggsave('~/Downloads/test.png', width = 6, height = 6)
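If pulling in patchwork is not an option, a single-column `facet_wrap()` on year and ID might get close, though I have not checked it against the real data. In this sketch the `Year` column is something I added, and each strip repeats the year next to the ID rather than showing one shared year header, so it is only an approximation of the requested layout.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  ID = rep(1:5, each = 4),
  Date = c(seq(as.Date("2019-09-18"), by = "day", length.out = 4),
           seq(as.Date("2019-09-18"), by = "day", length.out = 4),
           seq(as.Date("2020-08-07"), by = "day", length.out = 4),
           seq(as.Date("2020-09-12"), by = "day", length.out = 4),
           seq(as.Date("2020-09-29"), by = "day", length.out = 4)),
  MaxDepth = round(runif(20, min = 10, max = 50), 1),
  Trip = rep(1:5, each = 4)
)

# Helper column so the year can be used as a facetting variable
df$Year <- format(df$Date, "%Y")

# One panel per Year/ID combination that actually occurs, stacked in one column;
# free x scales let every panel show only its own date range
p <- ggplot(df, aes(Date, MaxDepth, col = factor(Trip))) +
  geom_line() +
  facet_wrap(~ Year + ID, ncol = 1, scales = "free_x") +
  scale_y_reverse() +
  scale_x_date(date_labels = "%b") +
  labs(x = NULL, y = "Depth [m]", col = "Fishing trip")
p
```

Because only existing combinations get a panel, the empty panels from the two-column grid should disappear.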

How to find clusters of values over threshold for timeseries

I have a time series and need to find clusters of values over a threshold, then plot each cluster on a separate plot.
Here is my code example. Unfortunately, I don't know how to generate well-clustered values.
library(tibble)
library(ggplot2)
library(scales)  # date_format()
# generate sample data
Sys.setlocale("LC_ALL", "English")
set.seed(8)
Values <- sample(0:100, 24241, replace = TRUE)
Values <- rpois(24241, lambda = 60)
start <- as.POSIXct("2012-01-15 06:10:00")
interval <- 15  # seconds between observations
end <- start + as.difftime(4, units = "days") + as.difftime(5, units = "hours")
DateTimes <- seq(from = start, by = interval, to = end)
my_data_sample <- tibble(datetime = DateTimes, Value = Values)
threshold <- 82
ggplot(data = my_data_sample, aes(x = datetime, y = Value)) +
  geom_line(size = 1, color = "darkgreen") +
  geom_hline(yintercept = threshold, linetype = "dashed", color = "red") +
  theme_bw() +
  labs(
    x = "",
    y = "",
    title = paste("Threshold:", threshold)
  ) +
  scale_x_datetime(date_breaks = "8 hour", labels = date_format("%b %d - %H:%M")) +
  theme(axis.text.x = element_text(angle = 25, vjust = 1.0, hjust = 1.0))
Here is what I need:
I need to find clusters of values over the threshold (consecutive or near each other), sort those clusters by length in seconds (longest clusters) or by sum of values (most powerful clusters), and plot, let's say, the top 3 of those time periods on separate plots.
Any suggestions on how to do that?
You can find runs that follow some expectation using run-length encoding (RLE). At the RLE level, you can filter out runs that are too short on either side. You can play with the run_threshold value until it matches your data.
# Put some actual deviating runs in the data
my_data_sample$Value[5001:5100] <- rpois(100, lambda = 80)
my_data_sample$Value[10001:11000] <- rpois(1000, lambda = 80)
threshold <- 82
rle <- rle(my_data_sample$Value > threshold)
# Find sub-threshold values in between super-threshold values,
# convert these to other class
run_threshold <- 20
rle$values[!rle$values & rle$lengths < run_threshold] <- TRUE
# Restructure rle
rle <- rle(inverse.rle(rle))
# Find short super-threshold values to filter
run_threshold <- 5
rle$values[rle$values & rle$lengths < run_threshold] <- FALSE
rle <- rle(inverse.rle(rle))
# Find run starts and ends
rle_end <- cumsum(rle$lengths)
rle_start <- rle_end - rle$lengths + 1
# Format as data.frame for ggplot
rle_df <- data.frame(
  min = my_data_sample$datetime[rle_start],
  max = my_data_sample$datetime[rle_end],
  value = rle$values
)
ggplot(data = my_data_sample, aes(x = datetime, y = Value)) +
  geom_line(size = 1, color = "darkgreen") +
  geom_rect(aes(xmin = min, xmax = max, ymin = 0, ymax = 10, fill = value),
            data = rle_df, inherit.aes = FALSE) +
  geom_hline(yintercept = threshold, linetype = "dashed", color = "red") +
  theme_bw() +
  labs(
    x = "",
    y = "",
    title = paste("Threshold:", threshold)
  ) +
  scale_x_datetime(date_breaks = "8 hour", labels = date_format("%b %d - %H:%M")) +
  theme(axis.text.x = element_text(angle = 25, vjust = 1.0, hjust = 1.0))
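The question also asks for the clusters to be ranked by duration or by summed value. The following sketch shows how that ranking could be done on top of the run starts and ends, using a tiny hand-made series so the result is easy to check. The data and the names `clusters`, `duration_s`, `total` and `top` are made up for this example, and the gap-merging steps above are skipped for brevity.

```r
# Toy series sampled every 15 seconds
value <- c(1, 9, 9, 9, 1, 1, 9, 9, 1, 9, 9, 9, 9, 1)
time <- as.POSIXct("2012-01-15 06:10:00", tz = "UTC") +
  15 * (seq_along(value) - 1)
threshold <- 5

# Run-length encode the over-threshold indicator
runs <- rle(value > threshold)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# One row per run, with its duration in seconds and its summed values
clusters <- data.frame(
  start = time[run_start],
  end = time[run_end],
  above = runs$values
)
clusters$duration_s <- as.numeric(
  difftime(clusters$end, clusters$start, units = "secs")
)
clusters$total <- mapply(function(s, e) sum(value[s:e]), run_start, run_end)

# Keep only over-threshold runs and take the three longest
clusters <- clusters[clusters$above, ]
top <- head(clusters[order(-clusters$duration_s), ], 3)
top
```

Ordering by `-clusters$total` instead would rank by "power", and each row's `start` and `end` can be used to subset the original data for a separate plot per cluster.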

Plot a 24 hour cycle monthly for multiple variables?

I have data that can be mimicked in the following manner:
library(reshape2)
set.seed(1234)
n <- 12 * 24  # 24 hourly values for each of the 12 months
foo <- data.frame(month = rep(month.name, each = 24),
                  hour = rep(1:24, 12),
                  value1 = rnorm(n, 60, 1),
                  value2 = rnorm(n, 60, 1))
foo <- melt(foo, id = c('month', 'hour'))
I would like to create a plot for the entire year using ggplot that displays the 24 hour cycle of each variable per month.
Here's what I've tried so far:
t.plot <- ggplot(foo,
aes(interaction(month,hour), value, group = interaction(variable,hour)))
t.plot <- t.plot + geom_line(aes(colour = variable))
print(t.plot)
I get this, which throws the data into misalignment. For such a small SD you see that the first 24 values should be nearer to 60, but they are all over the place. I don't understand what's causing this discrepancy.
https://www.dropbox.com/s/rv6uxhe7wk7q35w/foo.png
when I plot:
plot(interaction(foo$month,foo$hour)[1:24], foo$value[1:24])
I get the shape that I would expect; however, the x-axis is very strange and not what I was expecting.
Any help?
The solution is to store your dates as dates (not as an interaction of factors), e.g.:
library(lubridate)
library(reshape2)
Date <- as.Date(dmy('01-01-2000') + seq_len(24*365)*hours(1))
foo <- data.frame(Date = Date,
                  value1 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1),
                  value2 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1))
foo_melt <- melt(foo, id = 'Date')
# then you can use `scale_x_date` and `r` and ggplot2 will know they are dates
# load scales library to access date_format and date_breaks
library(scales)
ggplot(foo_melt, aes(x = Date, y = value, colour = variable)) +
  geom_line() +
  scale_x_date(breaks = date_breaks('month'),
               labels = date_format('%b'), expand = c(0, 0))
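As for why the `interaction(month, hour)` axis in the question looked scrambled, one likely contributor is the level order: month names sort alphabetically as a factor, and `interaction()` by default lets the first factor vary fastest. A small base R sketch with two months and two hours, chosen only for illustration:

```r
month <- factor(c("January", "January", "February", "February"))
hour <- c(1, 2, 1, 2)

# Default: alphabetical months, first factor varies fastest
scrambled <- levels(interaction(month, hour))
scrambled

# Calendar-ordered months, with the hours nested inside each month
month_cal <- factor(month, levels = month.name[1:2])
ordered <- levels(interaction(month_cal, hour, lex.order = TRUE))
ordered
```

The first result interleaves the months hour by hour, while the second keeps each month's hours together in calendar order.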
Edit 1: average day per month
You can use facet_wrap to facet by month
# using your created foo data set
# map the full month names to abbreviations, in calendar order
foo$month <- factor(month.abb[match(foo$month, month.name)], levels = month.abb)
ggplot(foo, aes(x = hour, y = value, colour = variable)) +
  facet_wrap(~month) + geom_line() +
  scale_x_continuous(expand = c(0, 0))
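If the real data are timestamped observations rather than one value per month and hour, the "average day" has to be computed before facetting. A minimal sketch of that aggregation in base R, assuming hourly timestamps in a column I have called `stamp` (the simulated data and all column names are illustrative):

```r
set.seed(1)

# Two months of simulated hourly observations
stamp <- seq(as.POSIXct("2000-01-01 00:00", tz = "UTC"),
             as.POSIXct("2000-02-29 23:00", tz = "UTC"),
             by = "hour")
obs <- data.frame(stamp = stamp, value = rnorm(length(stamp), 60, 1))

# Derive a calendar-ordered month factor and the hour of day
obs$month <- factor(month.abb[as.integer(format(obs$stamp, "%m"))],
                    levels = month.abb)
obs$hour <- as.integer(format(obs$stamp, "%H"))

# Mean value for every month/hour combination present in the data
avg_day <- aggregate(value ~ month + hour, data = obs, FUN = mean)
head(avg_day)
```

`avg_day` has one row per month and hour, so it can be passed to the `facet_wrap(~month)` plot in place of `foo`.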
