Convert YYYY-MM-DD to YYYY-YY Qx in R - r

I'm trying to plot data by quarter then display in ggplot. Dates in dataset are of the format YYYY-MM-DD, and I want the ggplot x-axis to display the financial year like YYYY-YY Qx. The financial year starts July 1.
Data is in long format. This is where I've got to:
Data set named: TOX
TREE_ID PM_Date variable value
1: 2013000584 2013-04-02 elm 0
2: 2013000498 2013-06-11 elm 1
3: 2013000123 2013-09-03 maple 0
4: 2013000642 2014-02-15 maple 0
5: 2013000778 2016-07-08 maple 1
PM_Dateq <- as.yearqtr(TOX$PM_Date, format)
Tox_longer_yr <- TOX [,list(value=sum(value)), by=list(PM_Dateq, variable)]
ggplot(Tox_longer_yr, aes(x = PM_Dateq, y = value, colour = variable)) +
geom_line()
The x-axis currently displays as:
2015, 2016, 2017...etc
(Though it is grouped into quarters in ggplot correctly.)
I want the x-axis to look like:
2015-16 Q3, 2015-16 Q4, 2016-17 Q1, 2016-17 Q2...etc
So an event happening on 2016-02-13 would be grouped into "2015-16 Q3".

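How about something like this.
library(dplyr)
library(ggplot2)
library(lubridate)
df %>%
  mutate(
    PM_Date = as.Date(PM_Date),
    # Financial year starts 1 July, so Jan-Jun belong to the year that began the previous July
    fy_start = year(PM_Date) - (month(PM_Date) < 7),
    # Shift months so that July is 0, then bin into 3-month blocks: Jul-Sep = Q1, ..., Apr-Jun = Q4
    fy_qtr = ((month(PM_Date) - 7) %% 12) %/% 3 + 1,
    Qtr = sprintf("%d-%02d Q%d", fy_start, (fy_start + 1) %% 100, fy_qtr)) %>%
  group_by(Qtr, variable) %>%
  summarise(value = sum(value), .groups = "drop") %>%
  ggplot(aes(x = Qtr, y = value, colour = variable, group = variable)) +
  geom_line()
Explanation: We construct a new Qtr variable of the form YYYY-YY Qx. Because the financial year starts on 1 July, dates from January to June belong to the financial year that began in the previous calendar year, so fy_start subtracts one from the calendar year for those months. The quarter comes from shifting the month so that July becomes 0 and binning the result into 3-month blocks. We use lubridate for easy extraction of the year and month. An event on 2016-02-13 is therefore labelled "2015-16 Q3".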
Sample data
df <- read.table(text =
"TREE_ID PM_Date variable value
2013000584 2013-04-02 elm 0
2013000498 2013-06-11 elm 1
2013000123 2013-09-03 maple 0
2013000642 2014-02-15 maple 0
2013000778 2016-07-08 maple 1", header = TRUE)
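The financial-year rule from the question (year starts 1 July, so 2016-02-13 falls in "2015-16 Q3") can also be sketched as a small base R helper that needs no extra packages. The function name fy_quarter_label is hypothetical and not part of the original answer; it assumes the input can be coerced with as.Date.

```r
# Hypothetical helper (not from the original answer): label dates with a
# financial-year quarter, assuming the financial year starts on 1 July.
fy_quarter_label <- function(dates) {
  dates <- as.Date(dates)
  yr  <- as.integer(format(dates, "%Y"))
  mon <- as.integer(format(dates, "%m"))
  # Jan-Jun belong to the financial year that began the previous July
  fy_start <- yr - (mon < 7L)
  # Shift so July is 0, then bin into 3-month blocks (Jul-Sep = Q1, ..., Apr-Jun = Q4)
  qtr <- ((mon - 7L) %% 12L) %/% 3L + 1L
  sprintf("%d-%02d Q%d", fy_start, (fy_start + 1L) %% 100L, qtr)
}

labels <- fy_quarter_label(c("2016-02-13", "2016-07-08", "2013-06-11"))
print(labels)
```

Because the labels start with the four-digit financial year, they sort chronologically as plain strings, which is what a discrete ggplot axis uses by default.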

Related

Plotting monthly average over time of a column grouped by another column - RStudio

I have a dataframe df_have that looks like this:
Arrival_Date Cust_ID Wait_Time_Mins Cust_Priority
<chr> <int> <int> <int>
1 1/01/2010 612345 114 1
2 1/01/2010 415911 146 4
3 1/01/2010 445132 13 2
4 1/01/2010 515619 72 3
5 1/01/2010 725521 155 4
6 1/01/2010 401404 100 5
... ... ... ...
And I want to create five line graphs, one for each of the unique values in Cust_Priority, overlaid on the same plot, showing the average Wait_Time_Mins by Cust_Priority by month.
How would I do this?
I know how
You can use floor_date to change the date to 1st day of the month. Then for each Cust_Priority in each Month get the average wait time and create a line plot.
We use scale_x_date to format the labels on X-axis.
library(dplyr)
library(lubridate)
library(ggplot2)
df %>%
#If the date is in mdy format use mdy() function to change Arrival_Date to date
mutate(Arrival_Date = dmy(Arrival_Date),
date = floor_date(Arrival_Date, 'month')) %>%
group_by(Cust_Priority, date) %>%
summarise(Wait_Time_Mins = mean(Wait_Time_Mins), .groups = 'drop') %>%
ggplot(aes(date, Wait_Time_Mins, color = factor(Cust_Priority),
group = Cust_Priority)) +
geom_line() +
labs(x = "Month", y = "Average wait time",
title = "Average wait time for each month", color = "Customer Priority") +
scale_x_date(date_labels = '%b - %Y', date_breaks = '1 month')
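For readers who want to check the summarising step on its own, here is a minimal base R sketch of the same idea: truncate each date to the first of its month, then average per priority and month. The toy rows are invented for illustration and only reuse the question's column names; format(d, "%Y-%m-01") stands in for floor_date.

```r
# Toy data (invented for illustration), using the question's column names
df_have <- data.frame(
  Arrival_Date   = c("1/01/2010", "15/01/2010", "3/02/2010", "20/02/2010"),
  Wait_Time_Mins = c(100, 50, 30, 90),
  Cust_Priority  = c(1L, 1L, 1L, 2L)
)

# Parse day/month/year strings, then snap each date to the first of its month
d <- as.Date(df_have$Arrival_Date, format = "%d/%m/%Y")
df_have$month_start <- as.Date(format(d, "%Y-%m-01"))

# Mean wait time for each priority within each month
monthly <- aggregate(Wait_Time_Mins ~ Cust_Priority + month_start,
                     data = df_have, FUN = mean)
print(monthly)
```

The resulting data frame has one row per priority and month, which is the shape the line plot above expects.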

Add factor column for POSIXct Date format

I have the following df with the Date column having hourly marks for an entire year:
Date TD RN D.RN Press Temp G.Temp. Rad
1 2018-01-01 00:00:00 154.0535 9.035156 1.416667 950.7833 7.000000 60.16667 11.27000
2 2018-01-01 01:00:00 154.5793 9.663900 1.896667 951.2000 6.766667 59.16667 11.23000
3 2018-01-01 01:59:59 154.5793 7.523438 2.591667 951.0000 6.066667 65.16667 11.23500
4 2018-01-01 02:59:59 154.0535 7.994792 2.993333 951.1833 5.733333 64.00000 11.16833
5 2018-01-01 03:59:59 154.4041 6.797526 3.150000 951.4833 5.766667 57.83333 11.13500
6 2018-01-01 04:59:59 155.1051 12.009766 3.823333 951.0833 5.216667 61.33333 11.22167
I want to add a factor column 'Quarters' that indicates each quarter according to the 'Date'.
As far as I understand I can do that by:
Radiation$Quarter<-cut(Radiation$Date, breaks = "quarters", labels = c("Q1", "Q2", "Q3", "Q4"))
But I also want to add a factor column 'Day/Night' which indicates whether it's day or night, having:
Day → 8am - 8pm
Night → 8pm - 8am
It seems like with the cut() function there's no way to indicate time ranges.
You can use an ifelse/case_when statement after extracting hour from time.
library(dplyr)
library(lubridate)
df %>%
mutate(hour = hour(Date),
label = case_when(hour >= 8 & hour <= 19 ~ 'Day',
TRUE ~ 'Night'))
In base R :
df$hour = as.integer(format(df$Date, '%H'))
transform(df, label = ifelse(hour >= 8 & hour <= 19, 'Day', 'Night'))
We can also do
library(dplyr)
library(lubridate)
df %>%
mutate(hour = hour(Date),
label = case_when(between(hour, 8, 19) ~ "Day", TRUE ~ "Night"))
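Since the question asks for a factor column, a quick way to check the boundaries (08:00 counts as day, 20:00 as night) is a base R sketch like the one below. The timestamps are made up, and the time zone is fixed to UTC here only so the extracted hour is unambiguous.

```r
# Made-up timestamps around the 8am and 8pm boundaries (UTC assumed)
Date <- as.POSIXct(c("2018-01-01 07:59:59", "2018-01-01 08:00:00",
                     "2018-01-01 19:59:59", "2018-01-01 20:00:00"),
                   tz = "UTC")

# Extract the hour of day as an integer (0-23)
hour <- as.integer(format(Date, "%H"))

# Hours 8..19 cover 08:00:00 up to 19:59:59; everything else is night
label <- factor(ifelse(hour >= 8 & hour <= 19, "Day", "Night"),
                levels = c("Day", "Night"))
print(data.frame(Date, hour, label))
```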

R Coding for ggridges

I am new to coding in R so please excuse the simple question. I am trying to run ggridges geom in R to create monthly density plots. The code is below, but it creates a plot with the months in the wrong order:
The code references a csv data file with 3 columns (see image) - MST, Aeco_5a, and month: Any suggestions on how to fix this would be greatly appreciated. Here is my code:
> library(ggridges)
> read_csv("C:/Users/Calvin Johnson/Desktop/Aeco_Price_2017.csv")
Parsed with column specification:
cols(
MST = col_character(),
Month = col_character(),
Aeco_5a = col_double()
)
# A tibble: 365 x 3
MST Month Aeco_5a
<chr> <chr> <dbl>
1 1/1/2017 January 3.2678
2 1/2/2017 January 3.2678
3 1/3/2017 January 3.0570
4 1/4/2017 January 2.7811
5 1/5/2017 January 2.6354
6 1/6/2017 January 2.7483
7 1/7/2017 January 2.7483
8 1/8/2017 January 2.7483
9 1/9/2017 January 2.5905
10 1/10/2017 January 2.6902
# ... with 355 more rows
>
> mins<-min(Aeco_Price_2017$Aeco_5a)
> maxs<-max(Aeco_Price_2017$Aeco_5a)
>
> ggplot(Aeco_Price_2017,aes(x = Aeco_5a,y=Month,height=..density..))+
+ geom_density_ridges(scale=3) +
+ scale_x_continuous(limits = c(mins,maxs))
This has two parts: (1) you want your months to be factor instead of chr, and (2) you need to order the factors the way we typically order months.
With some reproducible data:
library(ggridges)
library(ggplot2)
library(dplyr)
library(tidyr)
df <- sapply(month.abb, function(x) { rnorm(10, rnorm(1), sd = 1)})
df <- as_tibble(df) %>% gather(key = "month")
Then you need to mutate month to be a factor, and use the levels defined by the actual order they show up in the data.frame (unique gives the unique levels in the dataset, and orders them in the way they're ordered in your data ("Jan", "Feb", ...)). Then you need to reverse them, because this way "Jan" will be at the bottom (it's the first factor).
df %>%
# switch to factor, and define the levels the way you want them to show up
# in the ggplot; "Dec", "Nov", "Oct", ...
mutate(month = factor(month, levels = rev(unique(df$month)))) %>%
ggplot(aes(x = value, y = month)) +
geom_density_ridges()
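The data in the question stores full month names ("January", ...), possibly not in calendar order once rows are filtered or shuffled. A minimal sketch under that assumption uses R's built-in month.name constant for the levels instead of relying on the order of appearance; the Month vector here is invented for illustration.

```r
# Invented example values; the question's Month column holds full month names
Month <- c("March", "January", "December", "February")

# Reverse calendar order so that January ends up at the top of the ridge plot
Month_f <- factor(Month, levels = rev(month.name))
print(levels(Month_f))
```

With this, the order no longer depends on how the rows happen to be arranged in the file.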

R ggplot by month and values group by Week

With ggplot2, I would like to create a multiplot (facet_grid) where each plot is the weekly count values for the month.
My data are like this :
day_group count
1 2012-04-29 140
2 2012-05-06 12595
3 2012-05-13 12506
4 2012-05-20 14857
For this dataset I have created two other columns, Month and Week, based on day_group:
day_group count Month Week
1 2012-04-29 140 Apr 17
2 2012-05-06 12595 May 18
3 2012-05-13 12506 May 19
4 2012-05-20 14857 May 2
Now I would like for each Month to create a barplot where I have the sum of the count values aggregated by week. So for example for a year I would have 12 plots with 4 bars (one per week).
Below is what I use to generate the plot :
ggplot(data = count_by_day, aes(x=day_group, y=count)) +
stat_summary(fun.y="sum", geom = "bar") +
scale_x_date(date_breaks = "1 month", date_labels = "%B") +
facet_grid(facets = Month ~ ., scales="free", margins = FALSE)
So far, my plot looks like this
https://dl.dropboxusercontent.com/u/96280295/Rplot.png
As you can see, the x-axis is not what I'm looking for. Instead of showing only weeks 1, 2, 3 and 4, each panel displays all the months.
Do you know what I must change to get what I'm looking for ?
Thanks for your help
Okay, now that I see what you want, I wrote a small program to illustrate it. The key to your order of month problem is making month a factor with the levels in the right order:
library(dplyr)
library(ggplot2)
#initialization
set.seed(1234)
sday <- as.Date("2012-01-01")
eday <- as.Date("2012-07-31")
# List of the first day of the months
mfdays <- seq(sday,length.out=12,by="1 month")
# list of months - this is key to keeping the order straight
mlabs <- months(mfdays)
# list of first weeks of the months
mfweek <- trunc((mfdays-sday)/7)
names(mfweek) <- mlabs
# Generate a bunch of event-days, and then months, then week numbs in our range
n <- 1000
edf <-data.frame(date=sample(seq(sday,eday,by=1),n,T))
edf$month <- factor(months(edf$date),levels=mlabs) # use the factor in the right order
edf$week <- 1 + as.integer(((edf$date-sday)/7) - mfweek[edf$month])
# Now summarize with dplyr
ndf <- group_by(edf,month,week) %>% summarize( count = n() )
ggplot(ndf) + geom_bar(aes(x=week,y=count),stat="identity") + facet_wrap(~month,nrow=1)
Yielding:
(As an aside, I am kind of proud I did this without lubridate ...)
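A simpler way to get a week-within-month number, shown here as a sketch rather than as the method used above, is to bin the day of the month into blocks of seven. This is a different convention from the answer's week offsets: days 29 to 31 land in a fifth week, so some months get five bars. The dates are the four rows from the question.

```r
# The four dates shown in the question
day_group <- as.Date(c("2012-04-29", "2012-05-06", "2012-05-13", "2012-05-20"))

# Days 1-7 -> week 1, 8-14 -> week 2, ..., 29-31 -> week 5
day_of_month  <- as.integer(format(day_group, "%d"))
week_in_month <- (day_of_month - 1L) %/% 7L + 1L

# Month as an ordered-by-calendar factor (numeric labels avoid locale issues)
month_label <- factor(format(day_group, "%m"), levels = sprintf("%02d", 1:12))
print(data.frame(day_group, month_label, week_in_month))
```

Using week_in_month on the x-axis and faceting by month_label gives one small bar chart per month.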
I think you have to do this but I am not sure I understand your question:
ggplot(data = count_by_day, aes(x=Week, y=count, group= Month, color=Month))

How to plot values for a subgroup based on time in R?

So currently I have a dataset that looks like this:
yearMon V1
1 012011 2.534161
2 012012 1.818421
3 012013 1.635179
4 012014 1.609195
5 012015 1.794979
6 022011 3.408389
7 022012 1.756303
8 022013 1.577855
9 022014 1.511905
10 022015 1.748879
11 032011 2.664336
12 032012 1.912023
13 032013 1.408602
14 032014 1.646091
15 032015 1.705069
16 042011 2.532895
17 042012 3.342926
18 042013 3.056657
I want to plot the averages for a certain month every year, IE the averages for March 2011, March 2012, March 2013, March 2014 all in one graph, and repeat this for each of the 12 months. How would I go about doing this?
1) monthplot Convert the data to zoo (using "yearmon" class -- we also show in the comments an alternative converter) and then to "ts" class and then use monthplot (in the base of R) with the "ts" object (or further below we use autoplot.zoo (which uses the ggplot2 package) with the zoo object).
library(zoo)
# to_yearmon <- function(x) as.yearmon((x %% 10000) + (x %/% 10000 - 1) / 12)
to_yearmon <- function(x) as.yearmon(sub("(.*)(....)$", "\\2-\\1", x))
ser_zoo <- read.zoo(ser_df, FUN = to_yearmon) # convert data frame to zoo
ser_ts <- as.ts(ser_zoo) # convert zoo to ts
monthplot(ser_ts)
2) autoplot.zoo We show how to plot (i) one line per year (2011, 2012, ...) all in one chart and (ii) in separate panels and (iii) one line per month (1, 2, 3, ...) all in one chart and (iv) separate panels.
We create a data frame ser_df2 with 3 columns representing month, year and the value of the series. Then we convert this long form series to a wide form, ser_zoo2, with times 1, 2, 3, ... representing the months and one column per year. We also convert this long form series to a second wide form, ser_zoo3, with times 2011, 2012, ... representing years and one column per month. By plotting each of these in a single panel and in multiple panels we get 2x2 = 4 charts which we show below.
library(ggplot2)
library(gridExtra)
ser_df2 <- data.frame(month = cycle(ser_zoo),
year = floor(as.numeric(time(ser_zoo))),
ser = coredata(ser_zoo))
ser_zoo2 <- read.zoo(ser_df2, index = 1, split = 2) # split into one column per year
p1 <- autoplot(ser_zoo2, facet = NULL)
p2 <- autoplot(ser_zoo2)
ser_zoo3 <- read.zoo(ser_df2, index = 2, split = 1) # split into one column per month
p3 <- autoplot(ser_zoo3, facet = NULL)
p4 <- autoplot(ser_zoo3)
grid.arrange(p1, p3, p2, p4, ncol = 2)
Note: We used this as the input data frame ser_df:
Lines <- "
yearMon V1
1 012011 2.534161
2 012012 1.818421
3 012013 1.635179
4 012014 1.609195
5 012015 1.794979
6 022011 3.408389
7 022012 1.756303
8 022013 1.577855
9 022014 1.511905
10 022015 1.748879
11 032011 2.664336
12 032012 1.912023
13 032013 1.408602
14 032014 1.646091
15 032015 1.705069
16 042011 2.532895
17 042012 3.342926
18 042013 3.056657
"
ser_df <- read.table(text = Lines)
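Note that read.table reads yearMon as an integer, so "012011" becomes 12011 and the leading zero is lost. The arithmetic in the commented-out converter above still works on that form. A minimal sketch of the decoding in plain R, with a few values taken from the data:

```r
# yearMon values as read.table stores them (leading zero dropped)
yearMon <- c(12011L, 22012L, 32013L, 42011L)

# The last four digits are the year, the leading digits are the month
year  <- yearMon %% 10000L
month <- yearMon %/% 10000L

# Decimal year, the numeric form that as.yearmon accepts (January = .0)
decimal_year <- year + (month - 1) / 12
print(data.frame(yearMon, year, month, decimal_year))
```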
Here is a way to do it more explicitly with ggplot:
library(dplyr)
library(ggplot2)
library(lubridate)
data %>%
mutate(date =
yearMon %>%
parse_date_time("%m%y"),
month =
date %>%
format("%B") %>%
ordered(month.name),
year =
date %>%
format("%Y") %>%
as.numeric) %>%
ggplot +
aes(x = year, y = V1, color = month) +
geom_line()

Resources