I am trying to plot multiple time periods on the same time-series graph by month. This is my data: https://pastebin.com/458t2YLg. I wanted to avoid a dput() example, because reducing the sample while keeping the structure of the original data would have caused confusion. Here is a glimpse of what it looks like:
date fl_all_cumsum
671 2015-11-02 0.785000
672 2015-11-03 1.046667
673 2015-11-04 1.046667
674 2015-11-05 1.099000
675 2015-11-06 1.099000
676 2015-11-07 1.099000
677 2015-11-08 1.151333
Basically, it is daily data that spans several years. My goal is to compare the cumulative snow gliding (fl_all_cumsum) of several winter seasons (
It is very similar to this question: "ggplot: Multiple years on same plot by month". However, there are some differences: 1) the time periods are not years but winter seasons (1 October xxxx to 30 June xxxx+1); 2) because I care only about the winter periods, I would like the x-axis to run only from October to the end of June of the following year; 3) the data is not consistent (there are a lot of NA gaps during the months).
I managed to produce this:
library(zoo)
library(lubridate)
library(ggplot2)
library(scales)
library(patchwork)
library(dplyr)
library(data.table)
startTime <- as.Date("2016-10-01")
endTime <- as.Date("2017-06-30")
start_end <- c(startTime,endTime)
ggplot(data = master_dataset, aes(x = date, y = fl_all_cumsum))+
geom_line(size = 1, na.rm=TRUE)+
ggtitle("Cumulative Seasonal Gliding Distance")+
labs(color = "")+
xlab("Month")+
ylab("Accumulated Distance [mm]")+
scale_x_date(limits=start_end,breaks=date_breaks("1 month"),labels=date_format("%d %b"))+
theme(axis.text.x = element_text(angle = 50, size = 10 , vjust = 0.5),
axis.text.y = element_text(size = 10, vjust = 0.5),
panel.background = element_rect(fill = "gray100"),
plot.background = element_rect(fill = "gray100"),
panel.grid.major = element_line(colour = "lightblue"),
plot.margin = unit(c(1, 1, 1, 1), "cm"),
plot.title = element_text(hjust = 0.5, size = 22))
This actually works well visually, as the x-axis goes from October to June as desired; however, I did it by setting limits,
startTime <- as.Date("2016-10-01")
endTime <- as.Date("2017-06-30")
start_end <- c(startTime,endTime)
and then setting breaks of 1 month.
scale_x_date(limits=start_end,breaks=date_breaks("1 month"),labels=date_format("%d %b"))+
Needless to say, this technique will not work if I want to include other winter seasons and a legend.
I also tried to assign a season to certain time periods and then use them as a factor:
master_dataset <- master_dataset %>%
mutate(season = case_when(date>=as.Date('2015-11-02')&date<=as.Date('2016-06-30')~"season 2015-16",
date>=as.Date('2016-11-02')&date<=as.Date('2017-06-30')~"season 2016-17",
date>=as.Date('2017-10-13')&date<=as.Date('2018-06-30')~"season 2017-18",
date>=as.Date('2018-10-18')&date<=as.Date('2019-06-30')~"season 2018-19"))
ggplot(master_dataset, aes(month(date, label=TRUE, abbr=TRUE), fl_all_cumsum, group=factor(season),colour=factor(season)))+
geom_line()+
labs(x="Month", colour="Season")+
theme_classic()
As you can see, I managed to include the other seasons in the graph but there are several issues now:
1) Grouped by month, it aggregates the daily values and I lose the daily dynamics in the graph (note how the lines move in monthly steps).
2) The x-axis runs in calendar order (January to December), which breaks my visualization (remember I care about how the winter season develops, so I need the x-axis to run from October to the end of June; see the first graph I produced).
3) Not a big issue, but because the data has NA gaps, the legend also shows a factor level "NA".
I am not a programmer, so I can't wrap my mind around how to code this. In a perfect world, I would like to have something like the first graph I produced, but with all winter seasons included and a legend. Does someone have a solution for this? Thanks in advance.
Zorin
This is indeed kind of a pain and rather fiddly. I create "fake dates" that are the same as your date column, but the year is set to 2015/2016 (using 2016 for the dates that will fall in February so leap days are not lost). Then we plot all the data, telling ggplot that it's all 2015-2016 so it gets plotted on the same axis, but we don't label the year. (The season labels are used and are not "fake".)
## Configure some constants:
start_month = 10 # first month on x-axis
end_month = 6 # last month on x-axis
fake_year_start = 2015 # year we'll use for start_month-December
fake_year_end = fake_year_start + 1 # year we'll use for January-end_month
fake_limits = c( # x-axis limits for plot
ymd(paste(fake_year_start, start_month, "01", sep = "-")),
ceiling_date(ymd(paste(fake_year_end, end_month, "01", sep = "-")), unit = "month")
)
df = df %>%
mutate(
## add (real) year and month columns
year = year(date),
month = month(date),
## add the year for the season start and end
season_start = ifelse(month >= start_month, year, year - 1),
season_end = season_start + 1,
## create season label
season = paste(season_start, substr(season_end, 3, 4), sep = "-"),
## add the appropriate fake year
fake_year = ifelse(month >= start_month, fake_year_start, fake_year_end),
## make a fake_date that is the same as the real date
## except set all the years to the fake_year
fake_date = date,
fake_date = "year<-"(fake_date, fake_year)
) %>%
filter(
## drop irrelevant data
month >= start_month | month <= end_month,
!is.na(fl_all_cumsum)
)
ggplot(df, aes(x = fake_date, y = fl_all_cumsum, group = season,colour= season))+
geom_line()+
labs(x="Month", colour = "Season")+
scale_x_date(
limits = fake_limits,
breaks = scales::date_breaks("1 month"),
labels = scales::date_format("%d %b")
) +
theme_classic()
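To see what the fake-date step does in isolation, here is a minimal sketch on three made-up dates. The helper names `to_fake_date()` and `season_label()` are hypothetical and not part of the code above. Base R is used only so the snippet runs without extra packages. It assumes the same October start and 2015/2016 fake years as above.

```r
# Hypothetical helpers illustrating the fake-date idea; not from the answer above.
# Assumes the season starts in October and the fake season spans 2015-2016,
# so that 29 February maps onto the leap year 2016 and is not lost.
to_fake_date <- function(date, start_month = 10, fake_year_start = 2015) {
  lt <- as.POSIXlt(date)
  m <- lt$mon + 1                      # POSIXlt months are 0-based
  fake_year <- ifelse(m >= start_month, fake_year_start, fake_year_start + 1)
  as.Date(sprintf("%d-%02d-%02d", fake_year, m, lt$mday))
}

season_label <- function(date, start_month = 10) {
  lt <- as.POSIXlt(date)
  y <- lt$year + 1900                  # POSIXlt years count from 1900
  season_start <- ifelse(lt$mon + 1 >= start_month, y, y - 1)
  paste(season_start, substr(season_start + 1, 3, 4), sep = "-")
}

dates  <- as.Date(c("2019-10-15", "2020-02-29", "2018-01-05"))
fake   <- to_fake_date(dates)
labels <- season_label(dates)
print(data.frame(date = dates, fake_date = fake, season = labels))
```

The first two dates land in the same season ("2019-20") even though they fall in different calendar years, which is what lets each season be drawn as one continuous line.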
I have time series data (a date column and a value column). I am trying to make a daily distribution plot.
The image below shows a weekly distribution plot, which plots the values for each day of the week. Similarly, I am trying to make a daily distribution plot where the x-axis is the month, the y-axis is the value, and the plot has 10 lines, one for each day of the month (date 1, date 2, date 3 and so on up to date 10). Since 30 lines in one subplot would be cluttered, I want to split the plots into three: days 1-10, 11-20 and 21-31.
Code for weekly distribution for reference:
#dummy data
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2021-12-31")
date_seq <- seq(from = start_date, to = end_date, by = "day")
set.seed(123)
value <- round(runif(length(date_seq), min = 10000, max = 100000000), 0)
df <- data.frame(date = date_seq, value = value)
df$week_number <- as.numeric(format(as.Date(df$date), "%U")) + 1
df$weekday <- weekdays(as.Date(df$date))
df$year <- as.numeric(format(as.Date(df$date), "%Y"))
years <- unique(df$year)
# Create a list of ggplots, one for each year
plots <- lapply(years, function(y) {
year_df <- df[df$year == y, ]
ggplot(year_df, aes(x = week_number, y = value, color = weekday)) +
geom_line() +
scale_color_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) +
ggtitle(paste("Weekday Distribution", y)) +
xlab("Week number") +
ylab("Value") +
theme(legend.key.size = unit(0.4, "cm")) +
theme(plot.title = element_text(hjust = 0.5, vjust = 1.5))
})
library(cowplot)
plot_grid(plotlist = plots, ncol = 1)
So in the end there will be three plots (dates 1 to 10, 11 to 20 and 21 to 31), and each plot will contain 2 subplots (as the dates range from 2020 to 2021). Can anyone help me with this?
Below is how I would do this. The lubridate package is your friend. For the grouping, use cut().
The result is a (in my opinion) pretty useless clutter of lines. But this is not the only reason why I do not endorse this visualisation. I feel this somehow defeats the point of a time series... one point is to visualise the auto-correlation of your data. Artificially separating out only specific days from each month impacts drastically on this particular advantage (and maybe: reason) of using a time series. You're not only losing information, but also making your own analytical life much more complicated.
library(ggplot2)
library(dplyr)
library(lubridate)
df %>%
mutate(day = mday(date),
day_group = cut(day, c(1,11,21, 31), incl = T),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = month, y = value, color = day, group=interaction(day, day_group))) +
geom_line() +
theme(legend.key.size = unit(0.4, "cm"),
plot.title = element_text(hjust = 0.5, vjust = 1.5),
axis.text.x = element_text(angle = 90)) +
facet_wrap(year~day_group)
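One detail about the grouping is worth checking. With breaks `c(1, 11, 21, 31)` the intervals are closed on the right, so day 11 falls in the first group and day 21 in the second. If groups of days 1-10, 11-20 and 21-31 are wanted, a different set of breaks is needed. The sketch below compares the two. The alternative breaks and labels are an assumption about the intended grouping, not part of the code above.

```r
day <- 1:31

# Breaks used above: intervals are [1,11], (11,21], (21,31],
# so day 11 joins the first group.
groups_above <- cut(day, c(1, 11, 21, 31), include.lowest = TRUE)

# Alternative breaks (assumed intent): days 1-10, 11-20, 21-31.
groups_alt <- cut(day, breaks = c(0, 10, 20, 31),
                  labels = c("1-10", "11-20", "21-31"))

print(table(groups_above))
print(table(groups_alt))
```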
I feel you want to show how the "typical" 1st day compares with the 2nd, etc. For this, an aggregate visualisation might be more useful. (Still not a good idea, but at least you get a better idea of your data.) You can do this by passing stat = "summary" to geom_smooth, whose geometry combines geom_line and geom_ribbon.
df %>%
mutate(day = mday(date),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = day, y = value)) +
geom_smooth(stat= "summary", alpha = .5, color = "black") +
facet_grid(~year)
#> No summary function supplied, defaulting to `mean_se()`
#> No summary function supplied, defaulting to `mean_se()`
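For reference, the default `mean_se()` summary named in the messages above is the mean plus or minus one standard error. The base-R sketch below computes the same quantities by hand for the dummy data from the question. As a simplification it pools both years instead of faceting by year, and the name `summary_df` is only illustrative.

```r
# Rebuild the dummy data from the question
set.seed(123)
dates <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
value <- round(runif(length(dates), min = 10000, max = 100000000), 0)
day   <- as.integer(format(dates, "%d"))   # day of month, 1..31

# Mean and standard error of the mean for each day of the month
avg <- as.numeric(tapply(value, day, mean))
se  <- as.numeric(tapply(value, day, function(v) sd(v) / sqrt(length(v))))

summary_df <- data.frame(
  day  = sort(unique(day)),
  ymin = avg - se,   # lower edge of the ribbon
  y    = avg,        # the line
  ymax = avg + se    # upper edge of the ribbon
)
print(head(summary_df))
```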
Following on from tjebo's answer: if you must use this plot, you can highlight a single line so that something stands out from the clutter. Here is an example that highlights the 11th day against the rest.
Plot
df %>%
mutate(day = mday(date),
day_group = cut(day, c(1,11,21, 31), incl = T),
month = month(date, label = T, abbr = T),
highlight = ifelse(day == 11, "Yes", "No")) %>%
ggplot(aes(x = month, y = value, color = highlight, group=interaction(day, day_group))) +
geom_line() +
theme_bw()+
theme(plot.title = element_text(hjust = 1, vjust = 2),
axis.text.x = element_text(angle = 90)) +
scale_color_manual(breaks = c("Yes", "No"),
labels = c("11th Day", "Other"),
values = c("Yes" = "red2", "No" = "grey60")) +
facet_wrap(year~day_group) +
guides(color = guide_legend(order = 1))
Here's a reproducible example taken from the R Graph Gallery:
library(ggplot2)
library(dplyr)
library(viridis)
library(Interpol.T)
library(lubridate)
library(ggExtra)
library(tidyr)
data <- data(Trentino_hourly_T,package = "Interpol.T")
names(h_d_t)[1:5]<- c("stationid","date","hour","temp","flag")
df <- as_tibble(h_d_t) %>%
filter(stationid =="T0001")
df$date<-ymd(df$date)
df <- df %>% mutate(date = ymd(date),
year = year(date),
month = month(date, label=TRUE),
day = day(date))
rm(list=c("h_d_t","mo_bias","Tn","Tx",
"Th_int_list","calibration_l",
"calibration_shape","Tm_list"))
df <- df %>%
filter(between(date, as.Date("2004-02-13"), as.Date("2004-04-29")) | between(date, as.Date("2005-02-13"), as.Date("2005-04-29")))
df <-df %>% select(stationid,day,hour,month,year,temp)%>%
fill(temp)
statno <-unique(df$stationid)
######## Plotting starts here#####################
p <-ggplot(df, aes(day,hour,fill=temp))+
geom_tile(color= "white",size=0.1) +
scale_fill_viridis(name="Hrly Temps C",option ="C") +
facet_grid(year~month, scales = "free") +
scale_y_continuous(trans = "reverse", breaks = unique(df$hour)) +
theme_minimal(base_size = 8) +
labs(title= paste("Hourly Temps - Station",statno), x="Day", y="Hour Commencing") +
theme(legend.position = "bottom",
plot.title=element_text(size = 14, hjust = 0),
axis.text.y=element_text(size=6),
strip.background = element_rect(colour="white"),
axis.ticks=element_blank(),
axis.text=element_text(size=7),
legend.text=element_text(size=6))+
removeGrid()
What is bothering me is that the x-axis breaks don't explicitly show the first and last day of each month; even worse, they show a February 30th, a March 0th and an April 0th.
My goal is to use a function that automatically and explicitly shows the REAL first and last day of each plotted month (in the example February 13th - February 29th, March 1st - March 31st and April 1st - April 29th), with 4 to 6 breaks within each month.
As this plot will be shown in a shiny app where the user can change the plotted time period, the solution REALLY needs to be automated.
Here are some things I've tried:
library(scales)
p + scale_x_continuous(breaks =breaks_pretty())
But it doesn't change much.
I've tried to write my own function but something horrible happened:
breaksFUN <- function(x){
round(seq(min(x), max(x), length.out = 5), 0)
}
p + scale_x_continuous(breaks =breaksFUN)
Thank you in advance.
Thank you Axeman for your contribution, it really helped! It works for my example, but I encountered some issues when trying it on my own data. I modified it and it works properly now. Here's my solution, inspired by Axeman:
breaksFUN <- function(x) {
s <- round(c(seq(min(x) + 1.5, max(x) - 5.5, length.out = 4), max(x) - 1.5))
s[s == 0] <- 1
s[s > 31] <- 31
s <- round(seq(range(s)[1], range(s)[2], length.out = 5))
unique(s)
}
p + scale_x_continuous(breaks = breaksFUN)
I know the plot generated by the code below is pretty messy because the data are drawn from a uniform distribution. I want to select only the 5 years in which the highest values of variable A were observed. That means the values of variable X in those years may not be high, since the selection is conditioned on variable A. Multiple peaks in any particular year should be counted as one occurrence. Once the top 5 years are identified, I would like to have a plot like the one below.
library(tidyverse)
library(lubridate)
set.seed(1500)
FakeData <- data.frame(Date = seq(as.Date("2001-01-01"), to= as.Date("2010-12-31"), by="day"),
A = runif(3652, 10,100),
X = runif(3652, 5,500)) %>%
mutate(Year = year(Date), JDay = yday(Date))
ggplot(FakeData, aes(JDay, A, col=as.factor(Year)))+
geom_line()+
theme_bw() + labs(col = "Year", x = "Month of the year", y = "Levels (m)")+
theme(axis.text = element_text(size = 14), axis.title = element_text(size = 16, face = "bold"))+
scale_x_continuous(breaks = c(1,32,60,91,121,152,182,213,244,274,305,335),
labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),
limits = c(0,365), expand = c(0, 0))
Output
For example, in the figure below I am plotting almost 38 years of data; however, I am interested only in the 5 years with the highest values of the level, so I want to select only those five years for further analysis.
Is this what you are looking for?
library(tidyverse)
library(lubridate)
Find the maximum value for A in each year, sort in descending order, and extract the top five years to use as a filter to the complete dataset for plotting.
top_yrs <-
FakeData %>%
group_by(Year) %>%
summarise(maxA = max(A)) %>%
arrange(desc(maxA)) %>%
top_n(5)
FakeData %>%
filter(Year %in% top_yrs$Year) %>%
ggplot(aes(JDay, A, col=as.factor(Year)))+
geom_line()+
theme_bw() + labs(col = "Year", x = "Month of the year", y = "Levels (m)")+
theme(axis.text = element_text(size = 14), axis.title = element_text(size = 16, face = "bold"))+
scale_x_continuous(breaks = c(1,32,60,91,121,152,182,213,244,274,305,335),
labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),
limits = c(0,365), expand = c(0, 0))
#> Warning: Removed 1 row(s) containing missing values (geom_path).
Created on 2020-07-08 by the reprex package (v0.3.0)
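As a side note, `top_n()` has been superseded in dplyr 1.0.0 and later by `slice_max()`, which names the ordering column explicitly. A minimal sketch of the same selection, assuming a dplyr version that provides `slice_max()`, and rebuilding only the columns needed here:

```r
library(dplyr)

# Rebuild the relevant part of the example data (column X is not needed here)
set.seed(1500)
FakeData <- data.frame(
  Date = seq(as.Date("2001-01-01"), as.Date("2010-12-31"), by = "day")
)
FakeData$A    <- runif(nrow(FakeData), 10, 100)
FakeData$Year <- as.integer(format(FakeData$Date, "%Y"))

# One row per year holding that year's maximum, then keep the five largest.
# with_ties = FALSE guarantees exactly five rows even if two maxima tie.
top_yrs <- FakeData %>%
  group_by(Year) %>%
  summarise(maxA = max(A)) %>%
  slice_max(maxA, n = 5, with_ties = FALSE)

# Keep only the daily rows belonging to those years
top_data <- FakeData %>% filter(Year %in% top_yrs$Year)
print(top_yrs)
```

Because each year contributes a single maximum, several peaks within one year count as one occurrence, as requested in the question.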
I have an example dataframe composed of:
[Image: example data frame with Date and Ccount columns]
I have used ggplot2 to plot dates on the x-axis with a count on the y-axis:
df_ggplot <- read.csv("ggplot_ex.csv", header = T, na.strings = "", fileEncoding = "UTF-8-BOM")
df_ggplot$Date <- mdy(df_ggplot$Date)
df_ggplot$Ccount <- as.numeric(as.character(df_ggplot$Ccount))
ggplot(df_ggplot, aes(x=Date, y = Ccount)) +
geom_line() +
geom_point()
[Image: ggplot example output]
I want points that occur less than 4 weeks after the previous point to turn red. Can anyone help? In this example, the second point would be red, as it occurs about 2 weeks after the previous point.
You probably have to do the calculation in the dataframe before the plot (make sure your Date column is in the correct date format).
One option you can try:
df_ggplot <- df_ggplot %>%
mutate(time_diff = difftime(time1 = Date, time2 = lag(x = Date, n = 1), units = "weeks"),
is_red = as.factor(time_diff < 4))
will give you the points that must be flagged.
Date Ccount time_diff is_red
1 2019-08-17 20000 NA weeks <NA>
2 2019-08-30 15000 1.857143 weeks TRUE
3 2019-09-30 25000 4.285714 weeks FALSE
Then you can plot, using the colors you want.
ggplot(df_ggplot, aes(x = Date, y = Ccount)) +
geom_line() +
geom_point(aes(color = is_red)) +
scale_color_manual(values = c("black", "red"), na.value = "black")
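If the `NA` flag on the first point is unwanted, the comparison can be made NA-safe before plotting. The sketch below uses base R and the three dates from the printed output above. The names `gap_days` and `is_red` are illustrative, and the 28-day threshold is just the 4-week rule expressed in days.

```r
dates <- as.Date(c("2019-08-17", "2019-08-30", "2019-09-30"))

# Days since the previous point; the first point has no predecessor
gap_days <- c(NA, as.numeric(diff(dates), units = "days"))

# TRUE only when a previous point exists and is less than 4 weeks (28 days) back
is_red <- !is.na(gap_days) & gap_days < 28

print(data.frame(Date = dates, gap_days = gap_days, is_red = is_red))
```

With a plain logical flag and no missing values, `scale_color_manual()` no longer needs the `na.value` argument.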
I am trying to make a heatmap of several years of daily averages of salinity in an estuary in R.
I would like the format to include month on the x-axis and year on the y-axis, so each Jan 1st directly above another Jan. 1st. In other words, NOT like a typical annual calendar style (not like this: http://www.r-bloggers.com/ggplot2-time-series-heatmaps/).
So far I have only been able to plot by the day of the year using:
d <- read.xlsx('GC salinity transposed.xlsx', sheetName = "vert-3", header = TRUE, stringsAsFactors = FALSE, colClasses = c("integer", "integer", "numeric"), endRow = 2254)
ggplot(d, aes(x = Day.Number, y = Year)) + geom_tile(aes(fill = Salinity)) + scale_fill_gradient(name = 'Mean Daily Salinity', low = 'white', high = 'blue') + theme(axis.title.y = element_blank())
And get this:
[Image: heat map, not quite right]
Could someone please tell me a better way to do this - a way that would include month, rather than day of the year along the x-axis? Thank you. New to R.
The lubridate package comes in handy for stuff like this. Does this code do what you want? I'm assuming you only have one salinity reading per month and there's no need to average across multiple values in the same month.
library(lubridate)
library(ggplot2)
# Define some data
df <- data.frame(date = seq.Date(from = as.Date("2015-01-01"), by = 1, length.out = 400),
salinity = runif(400, min=5, max=7))
# Create fields for plotting
df$day <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date),
"-",
ifelse(day(df$date)<10,"0",""),
day(df$date))
df$month <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date))
df$year <- year(df$date)
#Plot results by month
ggplot(data=df) +
geom_tile(aes(x = month, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
#Plot results by day
ggplot(data=df) +
geom_tile(aes(x = day, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
Results by month:
Results by day (do you really want this? It's very hard to read with 366 x-axis values):
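The zero-padded labels built with `paste0()` and `ifelse()` above can also be produced directly with `format()`, which pads month and day to two digits. A minimal base-R sketch of the same three columns, offered as an equivalent shortcut rather than a change to the approach:

```r
# Same date range as the example above; salinity is omitted for brevity
df <- data.frame(date = seq.Date(from = as.Date("2015-01-01"),
                                 by = "day", length.out = 400))

df$day   <- format(df$date, "%m-%d")             # e.g. "01-01", zero-padded
df$month <- format(df$date, "%m")                # e.g. "01"
df$year  <- as.integer(format(df$date, "%Y"))    # numeric, for scale_y_continuous

print(head(df, 3))
```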