R heat map of annual time series by entire year - r

I am trying to make a heatmap of several years of daily averages of salinity in an estuary in R.
I would like the format to include month on the x-axis and year on the y-axis, so each Jan 1st directly above another Jan. 1st. In other words, NOT like a typical annual calendar style (not like this: http://www.r-bloggers.com/ggplot2-time-series-heatmaps/).
So far I have only been able to plot by the day of the year using:
{r}
d <- read.xlsx('GC salinity transposed.xlsx', sheetName = "vert-3", header = TRUE, stringsAsFactors = FALSE, colClasses = c("integer", "integer", "numeric"), endRow = 2254)
{r}
ggplot(d, aes(x = Day.Number, y = Year)) + geom_tile(aes(fill = Salinity)) + scale_fill_gradient(name = 'Mean Daily Salinity', low = 'white', high = 'blue') + theme(axis.title.y = element_blank())
And get this:
heat map not quite right
Could someone please tell me a better way to do this - a way that would include month, rather than day of the year along the x-axis? Thank you. New to R.

The lubridate package comes in handy for stuff like this. Does this code do what you want? I'm assuming you only have one salinity reading per month and there's no need to average across multiple values in the same month.
library(lubridate)
library(ggplot2)
# Define some data
df <- data.frame(date = seq.Date(from = as.Date("2015-01-01"), by = 1, length.out = 400),
salinity = runif(400, min=5, max=7))
# Create fields for plotting
df$day <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date),
"-",
ifelse(day(df$date)<10,"0",""),
day(df$date))
df$month <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date))
df$year <- year(df$date)
library(lubridate)
library(ggplot2)
# Define some data
df <- data.frame(date = seq.Date(from = as.Date("2015-01-01"), by = 1, length.out = 400),
salinity = runif(400, min=5, max=7))
# Create fields for plotting
df$day <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date),
"-",
ifelse(day(df$date)<10,"0",""),
day(df$date))
df$month <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date))
df$year <- year(df$date)
#Plot results by month
ggplot(data=df) +
geom_tile(aes(x = month, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
#Plot results by day
ggplot(data=df) +
geom_tile(aes(x = day, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
Results by month:
Results by day (do you really want this? It's very hard to read with 366 x-axis values):

Related

plotting daily distribution of a time series data in R

I have a time series data (date column and a value column). I am trying for a daily distribution plot.
In the below image is the weekly distribution plot that plots the values of the days of the week. Similarly I am trying to plot a daily distribution plot where x axis would be months, y axis is the value and the plot has 10 lines where each line gives you the date 1, date 2 , date 3 and so on until date 10 (since 30 days in one subplot will be clumsy so i wanted to divide the plots into 3 , 1-10, 11-20 and 21-31)
Code for weekly distribution for reference:
#dummy data
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2021-12-31")
date_seq <- seq(from = start_date, to = end_date, by = "day")
set.seed(123)
value <- round(runif(length(date_seq), min = 10000, max = 100000000), 0)
df <- data.frame(date = date_seq, value = value)
df$week_number <- as.numeric(format(as.Date(df$date), "%U")) + 1
df$weekday <- weekdays(as.Date(df$date))
df$year <- as.numeric(format(as.Date(df$date), "%Y"))
years <- unique(df$year)
# Create a list of ggplots, one for each year
plots <- lapply(years, function(y) {
year_df <- df[df$year == y, ]
ggplot(year_df, aes(x = week_number, y = value, color = weekday)) +
geom_line() +
scale_color_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) +
ggtitle(paste("Weekday Distribution", y)) +
xlab("Week number") +
ylab("Value") +
theme(legend.key.size = unit(0.4, "cm")) +
theme(plot.title = element_text(hjust = 0.5, vjust = 1.5))
library(cowplot)
plot_grid(plotlist = plots, ncol = 1)
So at the end, there will be three plots(1 to 10 dates, 11 to 20 dates and 21 to 31 dates) and each plot would contain 2 subplots (as the dates ranges from 2020 to 2021). Can anyone help me with this?
Below how I would do this. The lubridate package is your friend. For the grouping, use cuts.
The result is a (in my opinion) pretty useless clutter of lines. But this is not the only reason why I do not endorse this visualisation. I feel this somehow defeats the point of a time series... one point is to visualise the auto-correlation of your data. Artificially separating out only specific days from each month impacts drastically on this particular advantage (and maybe: reason) of using a time series. You're not only losing information, but also making your own analytical life much more complicated.
library(ggplot2)
library(dplyr)
library(lubridate)
df %>%
mutate(day = mday(date),
day_group = cut(day, c(1,11,21, 31), incl = T),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = month, y = value, color = day, group=interaction(day, day_group))) +
geom_line() +
theme(legend.key.size = unit(0.4, "cm"),
plot.title = element_text(hjust = 0.5, vjust = 1.5),
axis.text.x = element_text(angle = 90)) +
facet_wrap(year~day_group)
I feel you want to show how the "typical" 1st day compares with the 2nd, etc. For this, an aggregate visualisation might be more useful. (Still not a good idea, but at least you get a better idea of your data). This you can do with "stat_summary" which you pass to geom_smooth which has a geometry that combines geom_line and geom_ribbon.
df %>%
mutate(day = mday(date),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = day, y = value)) +
geom_smooth(stat= "summary", alpha = .5, color = "black") +
facet_grid(~year)
#> No summary function supplied, defaulting to `mean_se()`
#> No summary function supplied, defaulting to `mean_se()`
Following on tjebo's answer, I would also suggest to if you must you can simply highlight a line of code that would convey something out of the clutter of lines, here is an example if you want to highlight the 11th day from the rest.
Plot
df %>%
mutate(day = mday(date),
day_group = cut(day, c(1,11,21, 31), incl = T),
month = month(date, label = T, abbr = T),
highlight = ifelse(day == 11, "Yes", "No")) %>%
ggplot(aes(x = month, y = value, color = highlight, group=interaction(day, day_group))) +
geom_line() +
theme_bw()+
theme(plot.title = element_text(hjust = 1, vjust = 2),
axis.text.x = element_text(angle = 90)) +
scale_color_manual(breaks = c("Yes", "No"),
labels = c("11th Day", "Other"),
values = c("Yes" = "red2", "No" = "grey60")) +
facet_wrap(year~day_group) +
guides(color = guide_legend(order = 1))

I need help plotting using smooth.spline in r

I am using spline.smooth in r and need some help. Here are my codes:
my_df_1 <- data.frame(date = seq(as.Date("2020/1/15"), by = "day", length.out = 100),
rate = runif(100, min=0, max=0.7),
count=sample(2:50, 100, replace = TRUE))
fit <- smooth.spline(my_df_1$date, my_df_1$rate)
plot(my_df_1$date, my_df_1$rate)
lines(fit)
First, I want to change the size of points on plot using column count. Second, is there a way to plot smooth spline in ggplot? Third, how can I format my x-axis for the date. I want to have a day and month in the x-axis. Thanks in advance everyone!
How about this:
my_df_1 %>%
ggplot(aes(x = date, y = rate)) +
geom_smooth() +
geom_point(aes(size = count)) +
scale_x_date(date_labels = "%d-%B")

adjust x axis dates to match start of the ("fiscal") year

Is there any way to shift the dates of a seasonal graph so that they match an arbitrary fiscal year (for example MARCH/FEB instead of DEC/JAN)?
I have this of far:
startMonth = 3 # march
startDate = as.Date(paste0(year(today()), '-', startMonth, '-1'))
dts = seq.Date(from = today() - 500, to = today(), by = 'day')
dat = data.frame(date = dts, value = runif(n = length(dts), min = 1, max = 10))
dat$month = month(dat$date)
dat$year = year(dat$date)
dat$yearPlot = ifelse(test = dat$month < startMonth, yes = (dat$year - 1), no = dat$year)
dat$year = as.character(dat$year)
dat$ydaydiff = yday(dat$date) - yday(startDate)
dat$datePlot1 = ifelse(dat$ydaydiff < 0, dat$ydaydiff + 365, dat$ydaydiff)
dat$datePlot1 = as.Date('0001-01-01') + days(dat$datePlot1)
dat$yearPlot = as.character(dat$yearPlot)
ggplot(dat) +
geom_path(aes(x = datePlot1, y = value, color = yearPlot)) +
scale_x_date(date_labels = '%b', )
Which makes this plot:
However I'd like the x-axis to start at March instead of Jan. Is there any way to adjust this? I thought of using the month column in dat but not sure how to implement.
Here is a not-too-pretty solution. The logic somewhat follows your own: find the starting date/time for each fiscal year (March 1 = time 1) and the last date/time (Feb 28 = time 365). Use this separate 'time' variable as your x-axis, then re-label the tick marks. You can change the scale_x_continuous() breaks and labels to get your desired dates along the x-axis.
t <- data.frame(date=seq.Date(as.Date('2018-03-01'),as.Date('2020-02-28'),by='days'),
fy=1)
t$fy[t$date>='2019-03-01'] <- 2
t <- t %>% group_by(fy) %>% mutate(time=seq(1:n()))
dat <- left_join(dat,t)
dat %>% ggplot(.) +
geom_path(aes(x = time, y = value, color = factor(fy),group=fy)) +
scale_x_continuous(breaks = c(1,100,200,300),labels=c('March 1','June 8','Sept 16','Dec 25'))
The breaks_width argument to scale_x_date() allows you to offset the breaks by a few months in a year.
The labels argument accepts a function to format the labels as a fiscal year, e.g. to convert a date 2019-03-01 to "19/20".
# Function to create fiscal year labels like "14/15" for the 2014/15 fiscal year
fiscal_year <- function(x) {
year_number <- lubridate::year(x)
paste(substr(year_number, 3, 4),
substr(year_number + 1, 3, 4),
sep = "/")
}
ggplot(dat) +
geom_path(aes(x = date, y = value, color = yearPlot)) +
scale_x_date(labels = fiscal_year, # Use the function to create the labels
breaks = scales::breaks_width("1 year", offset = 90)) # Offset by 90 days to March

How to Create a Graph of Statistical Time Series

I have data in the following format:
Date Year Month Day Flow
1 1953-10-01 1953 10 1 530
2 1953-10-02 1953 10 2 530
3 1953-10-03 1953 10 3 530
I would like to create a graph like this:
Here is my current image and code:
library(ggplot2)
library(plyr)
library(reshape2)
library(scales)
## Read Data
df <- read.csv("Salt River Flow.csv")
## Convert Date column to R-recognized dates
df$Date <- as.Date(df$Date, "%m/%d/%Y")
## Finds Water Years (Oct - Sept)
df$WY <- as.POSIXlt(as.POSIXlt(df$Date)+7948800)$year+1900
## Normalizes Water Years so stats can be applied to just months and days
df$w <- ifelse(month(df$Date) %in% c(10,11,12), 1903, 1904)
##Creates New Date (dat) Column
df$dat <- as.Date(paste(df$w,month(df$Date),day(df$Date), sep = "-"))
## Creates new data frame with summarised data by MonthDay
PlotData <- ddply(df, .(dat), summarise, Min = min(Flow), Tenth = quantile(Flow, p = 0.05), TwentyFifth = quantile(Flow, p = 0.25), Median = quantile(Flow, p = 0.50), Mean = mean(Flow), SeventyFifth = quantile(Flow, p = 0.75), Ninetieth = quantile(Flow, p = 0.90), Max = max(Flow))
## Melts data so it can be plotted with ggplot
m <- melt(PlotData, id="dat")
## Plots
p <- ggplot(m, aes(x = dat)) +
geom_ribbon(aes(min = TwentyFifth, max = Median), data = PlotData, fill = alpha("black", 0.1), color = NA) +
geom_ribbon(aes(min = Median, max = SeventyFifth), data = PlotData, fill = alpha("black", 0.5), color = NA) +
scale_x_date(labels = date_format("%b"), breaks = date_breaks("month"), expand = c(0,0)) +
geom_line(data = subset(m, variable == "Mean"), aes(y = value), size = 1.2) +
theme_bw() +
geom_line(data = subset(m, variable %in% c("Min","Max")), aes(y = value, group = variable)) +
geom_line(data = subset(m, variable %in% c("Ninetieth","Tenth")), aes(y = value, group = variable), linetype = 2) +
labs(x = "Water Year", y = "Flow (cfs)")
p
I am very close but there are some issues I'm having. First, if you can see a way to improve my code, please let me know. The main problem I ran into was that I needed two dataframes to make this graph: one melted, and one not. The unmelted dataframe was necessary (I think) to create the ribbons. I tried many ways to use the melted dataframe for the ribbons, but there was always a problem with the aesthetic length.
Second, I know to have a legend - and I want one, I need to have something in the aesthetics of each line/ribbon, but I am having trouble getting that to work. I think it would involve scale_fill_manual.
Third, and I don't know if this is possible, I would like to have each month label in between the tick marks, not on them (like in the above image).
Any help is greatly appreciated (especially with creating more efficient code).
Thank you.
Something along these lines might get you close with base:
library(lubridate)
library(reshape2)
# simulating data...
Date <- seq(as.Date("1953-10-01"),as.Date("2010-10-01"),by="day")
Year <- year(Date)
Month <- month(Date)
Day <- day(Date)
set.seed(1)
Flow <- rpois(length(Date), 2000)
Data <- data.frame(Date=Date,Year=Year,Month=Month,Day=Day,Flow=Flow)
# use acast to get it in a convenient shape:
PlotData <- acast(Data,Year~Month+Day,value.var="Flow")
# apply for quantiles
Quantiles <- apply(PlotData,2,function(x){
quantile(x,probs=c(1,.9,.75,.5,.25,.1,0),na.rm=TRUE)
})
Mean <- colMeans(PlotData, na.rm=TRUE)
# ugly way to get month tick separators
MonthTicks <- cumsum(table(unlist(lapply(strsplit(names(Mean),split="_"),"[[",1))))
# and finally your question:
plot(1:366,seq(0,max(Flow),length=366),type="n",xlab = "Water Year",ylab="Discharge",axes=FALSE)
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["75%",])),border=NA,col=gray(.6))
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["25%",])),border=NA,col=gray(.4))
lines(1:366,Quantiles["90%",], col = gray(.5), lty=4)
lines(1:366,Quantiles["10%",], col = gray(.5))
lines(1:366,Quantiles["100%",], col = gray(.7))
lines(1:366,Quantiles["0%",], col = gray(.7), lty=4)
lines(1:366,Mean,lwd=3)
axis(1,at=MonthTicks, labels=NA)
text(MonthTicks-15,-100,1:12,pos=1,xpd=TRUE)
axis(2)
The plotting code really isn't that tricky. You'll need to clean up the aesthetics, but polygon() is usually my strategy for shaded regions in plots (confidence bands, whatever).
Perhaps this will get you closer to what you're looking for, using ggplot2 and plyr:
library(ggplot2)
library(plyr)
library(lubridate)
library(scales)
df$MonthDay <- df$Date - years( year(df$Date) + 100 ) #Normalize points to same year
df <- ddply(df, .(Month, Day), mutate, MaxDayFlow = max(Flow) ) #Max flow on day
df <- ddply(df, .(Month, Day), mutate, MinDayFlow = min(Flow) ) #Min flow on day
p <- ggplot(df, aes(x=MonthDay) ) +
geom_smooth(size=2,level=.8,color="black",aes(y=Flow)) + #80% conf. interval
geom_smooth(size=2,level=.5,color="black",aes(y=Flow)) + #50% conf. interval
geom_line( linetype="longdash", aes(y=MaxDayFlow) ) +
geom_line( linetype="longdash", aes(y=MinDayFlow) ) +
labs(x="Month",y="Flow") +
scale_x_date( labels = date_format("%b") ) +
theme_bw()
Edit: Fixed X scale and X scale label
(Partial answer with base plotting function and not including the min, max, or mean.) I suspect you will need to construct a dataset before passing to ggplot, since that is typical for that function. I already do something similar and then pass the resulting matrix to matplot. (It doesn't do that kewl highlighting, but maybe ggplot can do it>
HDL.mon.mat <- aggregate(dfrm$Flow,
list( dfrm$Year + dfrm$Month/12),
quantile, prob=c(0.1,0.25,0.5,0.75, 0.9), na.rm=TRUE)
matplot(HDL.mon.mat[,1], HDL.mon.mat$x, type="pl")

Plot a 24 hour cycle monthly for multiple variables?

I have data that can be mimicked in the following manner:
set.seed(1234)
foo <- data.frame(month = rep(month.name, each = 24),
hour = rep(seq(1:24), 12),
value1 = rnorm(nrow(foo), 60, 1),
value2 = rnorm(nrow(foo), 60, 1))
foo <- melt(foo, id = c('month', 'hour'))
I would like to create a plot for the entire year using ggplot that displays the 24 hour cycle of each variable per month.
Here's what I've tried so far:
t.plot <- ggplot(foo,
aes(interaction(month,hour), value, group = interaction(variable,hour)))
t.plot <- t.plot + geom_line(aes(colour = variable))
print(t.plot)
I get this, which throws the data into misalignment. For such a small SD you see that the first 24 values should be nearer to 60, but they are all over the place. I don't understand what's causing this discrepancy.
https://www.dropbox.com/s/rv6uxhe7wk7q35w/foo.png
when I plot:
plot(interaction(foo$month,foo$hour)[1:24], foo$value[1:24])
I get the shape that I would expect however the xaxis is very strange and not what I was expecting.
Any help?
The solution is to set your dates to be dates (not an interaction of a factor)
eg
library(lubridate)
library(reshape2)
Date <- as.Date(dmy('01-01-2000') + seq_len(24*365)*hours(1))
foo <- data.frame(Date = Date,
value1 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1),
value2 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1))
foo_melt <- melt(foo, id = 'Date')
# then you can use `scale_x_date` and `r` and ggplot2 will know they are dates
# load scales library to access date_format and date_breaks
library(scales)
ggplot(foo_melt, aes(x=Date, y=value, colour = variable)) +
geom_line() +
scale_x_date(breaks = date_breaks('month'),
labels = date_format('%b'), expand =c(0,0))
Edit 1 average day per month
you can use facet_wrap to facet by month
# using your created foo data set
levels(foo$month) <- sort(month.abb)
foo$month <- factor(foo$month, levels = month.abb)
ggplot(foo, aes(x = hour, y=value, colour = variable)) +
facet_wrap(~month) + geom_line() +
scale_x_continuous(expand = c(0,0)))

Resources