Creating Calendar df in R

I am currently creating a Calendar df to join to my other dfs, and I originally coded it like this:
library(lubridate)  # needed for quarter()
Date <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "days")
Calendar <- data.frame(Date)
Calendar$DateNo <- format(Calendar$Date, format = "%d")
Calendar$NameDay <- format(Calendar$Date, format = "%A")
Calendar$MonthNo <- format(Calendar$Date, format = "%m")
Calendar$NameMonth <- format(Calendar$Date, format = "%B")
Calendar$NameMonthShort <- format(Calendar$Date, format = "%b")
Calendar$Week <- format(Calendar$Date, format = "%V")
Calendar$Year <- format(Calendar$Date, format = "%Y")
Calendar$Quarter <- quarter(Calendar$Date, with_year = FALSE, fiscal_start = 7)
Calendar$Month_Year <- paste(Calendar$NameMonthShort, Calendar$Year, sep = "-")
Calendar$Quarter_Year <- paste(Calendar$Quarter, Calendar$Year, sep = "-")
After running into some issues plotting my data with ggplot, I came across an alternative way of creating it using the lubridate package with mutate(). My new code is as follows:
library(dplyr)
library(lubridate)
Date <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "days")
Calendar <- data.frame(Date)
Calendar <- Calendar %>%
  mutate(
    DateNo = day(Date),
    NameDay = wday(Date, label = TRUE),
    MonthNo = month(Date),
    NameMonth = month(Date, label = TRUE),
    NameMonthShort = month(Date, label = TRUE),
    Week = week(Date),
    Year = year(Date),
    Quarter = quarter(Date, with_year = FALSE, fiscal_start = 7))
The issues I am running into are that I can't get the unabbreviated day and month names, and I'm not sure whether I can add Month_Year/Quarter_Year inside the mutate() so that the values are stored as factors. Is it possible to add those columns there, or do I have to add them the way I did previously? Thanks!

You might find it easier to use the built-in as.POSIXlt(); no lubridate is needed. Just apply it to your sequence and you get a list-like object
Date <- as.POSIXlt(seq(as.Date("2020-01-01"), as.Date("2020-06-30"), by="7 days"))
## Note: shortened (weekly, January to June 2020) for brevity
whose components already hold the desired information and can be accessed with $:
attr(Date, "names")
# [1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday" "isdst"
Some minor conversions are needed because of the storage format ($mon is 0-based and $year counts years since 1900), and there are helper functions such as weekdays(), quarters(), and strftime(). We can also use the built-in constants month.name and month.abb.
Calendar <- data.frame(Date,
DateNo=Date$mday,
NameDay=weekdays(Date),
MonthNo=Date$mon + 1,
NameMonth=month.name[Date$mon + 1],
NameMonthShort=month.abb[Date$mon + 1],
Week=strftime(Date, "%V"),
Year=1900 + Date$year,
Quarter=quarters(Date)
)
Result
Calendar
# Date DateNo NameDay MonthNo NameMonth NameMonthShort Week Year Quarter
# 1 2020-01-01 1 Wednesday 1 January Jan 01 2020 Q1
# 2 2020-01-08 8 Wednesday 1 January Jan 02 2020 Q1
# 3 2020-01-15 15 Wednesday 1 January Jan 03 2020 Q1
# 4 2020-01-22 22 Wednesday 1 January Jan 04 2020 Q1
# 5 2020-01-29 29 Wednesday 1 January Jan 05 2020 Q1
# 6 2020-02-05 5 Wednesday 2 February Feb 06 2020 Q1
# 7 2020-02-12 12 Wednesday 2 February Feb 07 2020 Q1
# 8 2020-02-19 19 Wednesday 2 February Feb 08 2020 Q1
# 9 2020-02-26 26 Wednesday 2 February Feb 09 2020 Q1
# 10 2020-03-04 4 Wednesday 3 March Mar 10 2020 Q1
# 11 2020-03-11 11 Wednesday 3 March Mar 11 2020 Q1
# 12 2020-03-18 18 Wednesday 3 March Mar 12 2020 Q1
# 13 2020-03-25 25 Wednesday 3 March Mar 13 2020 Q1
# 14 2020-04-01 1 Wednesday 4 April Apr 14 2020 Q2
# 15 2020-04-08 8 Wednesday 4 April Apr 15 2020 Q2
# 16 2020-04-15 15 Wednesday 4 April Apr 16 2020 Q2
# 17 2020-04-22 22 Wednesday 4 April Apr 17 2020 Q2
# 18 2020-04-29 29 Wednesday 4 April Apr 18 2020 Q2
# 19 2020-05-06 6 Wednesday 5 May May 19 2020 Q2
# 20 2020-05-13 13 Wednesday 5 May May 20 2020 Q2
# 21 2020-05-20 20 Wednesday 5 May May 21 2020 Q2
# 22 2020-05-27 27 Wednesday 5 May May 22 2020 Q2
# 23 2020-06-03 3 Wednesday 6 June Jun 23 2020 Q2
# 24 2020-06-10 10 Wednesday 6 June Jun 24 2020 Q2
# 25 2020-06-17 17 Wednesday 6 June Jun 25 2020 Q2
# 26 2020-06-24 24 Wednesday 6 June Jun 26 2020 Q2
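The same POSIXlt approach can also cover the two points left open in the question: full day names, and Month_Year/Quarter_Year columns stored as factors. This is a minimal sketch over the question's full 2020 to 2021 range, not part of the original answer; deriving the levels with unique() assumes the rows are sorted by date, and weekdays() returns names in the current locale.

```r
# Full daily sequence from the question, stored as POSIXlt
Date <- as.POSIXlt(seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "days"))

Calendar <- data.frame(
  Date = as.Date(Date),
  NameDay = weekdays(Date),                  # full weekday name (locale-dependent)
  NameMonth = month.name[Date$mon + 1],      # $mon is 0-based
  NameMonthShort = month.abb[Date$mon + 1],
  Year = 1900 + Date$year,                   # $year counts from 1900
  Quarter = quarters(Date)                   # calendar quarters "Q1" to "Q4"
)

# paste() returns character vectors; factor(levels = unique(...)) keeps the
# levels in date order (rows are sorted) instead of alphabetical order
month_year <- paste(Calendar$NameMonthShort, Calendar$Year, sep = "-")
quarter_year <- paste(Calendar$Quarter, Calendar$Year, sep = "-")
Calendar$Month_Year <- factor(month_year, levels = unique(month_year))
Calendar$Quarter_Year <- factor(quarter_year, levels = unique(quarter_year))
```

Because the factor levels follow the date order, a ggplot2 discrete axis built on these columns is drawn chronologically rather than alphabetically.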

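Long month names are easy to add by passing abbr = FALSE to month(); wday() takes the same argument for full day names.
Pasting quarters or months onto years can be done in the same mutate() call, because columns created earlier in a mutate() are available to the arguments after them. A second mutate(), as below, works too and keeps the steps separate.
Edit: Since paste() creates character vectors, not factors, you need to convert them with factor() and specify the levels yourself (otherwise they are sorted alphabetically):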
monthlevels = c(
'Jan-2020','Feb-2020','Mar-2020','Apr-2020','May-2020','Jun-2020',
'Jul-2020','Aug-2020','Sep-2020','Oct-2020','Nov-2020','Dec-2020',
'Jan-2021','Feb-2021','Mar-2021','Apr-2021','May-2021','Jun-2021',
'Jul-2021','Aug-2021','Sep-2021','Oct-2021','Nov-2021','Dec-2021')
quarterlevels = c('1-2020','2-2020','3-2020','4-2020','1-2021','2-2021','3-2021','4-2021')
library(dplyr)
library(lubridate)
Calendar %>%
  mutate(
    DateNo = day(Date),
    NameDay = wday(Date, label = TRUE),
    MonthNo = month(Date),
    NameMonth = month(Date, label = TRUE, abbr = FALSE), ## added abbr = FALSE
    NameMonthShort = month(Date, label = TRUE),
    Week = week(Date),
    Year = year(Date),
    Quarter = quarter(Date, with_year = FALSE, fiscal_start = 7)) %>%
  ## added second mutate() to paste fields created by the first mutate
  mutate(
    QuarterYear = factor(paste(Quarter, Year, sep = "-"), levels = quarterlevels),
    MonthYear = factor(paste(NameMonthShort, Year, sep = "-"), levels = monthlevels)
  ) %>% head()
Returns:
Date DateNo NameDay MonthNo NameMonth NameMonthShort Week Year Quarter
1 2020-01-01 1 Wed 1 January Jan 1 2020 3
2 2020-01-02 2 Thu 1 January Jan 1 2020 3
3 2020-01-03 3 Fri 1 January Jan 1 2020 3
4 2020-01-04 4 Sat 1 January Jan 1 2020 3
5 2020-01-05 5 Sun 1 January Jan 1 2020 3
6 2020-01-06 6 Mon 1 January Jan 1 2020 3
QuarterYear MonthYear
1 3-2020 Jan-2020
2 3-2020 Jan-2020
3 3-2020 Jan-2020
4 3-2020 Jan-2020
5 3-2020 Jan-2020
6 3-2020 Jan-2020
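Instead of typing monthlevels and quarterlevels by hand, the levels can be taken from the data. A minimal base-R sketch, assuming the rows are sorted by date: unique() keeps values in order of first appearance, so the levels come out chronologically. With fiscal_start = 7 this matters, because January to March 2020 is fiscal quarter 3 (see Quarter in the output above), so the chronological order starts with "3-2020" rather than "1-2020". The fiscal-quarter arithmetic is a hand-written stand-in meant to mirror quarter(Date, fiscal_start = 7), not lubridate's implementation.

```r
dates <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "days")
lt <- as.POSIXlt(dates)

# Fiscal quarter with the fiscal year starting in July ($mon is 0-based):
# Jul-Sep -> 1, Oct-Dec -> 2, Jan-Mar -> 3, Apr-Jun -> 4
fiscal_start <- 7
fiscal_quarter <- ((lt$mon - (fiscal_start - 1)) %% 12) %/% 3 + 1
year <- lt$year + 1900

quarter_year <- paste(fiscal_quarter, year, sep = "-")
month_year <- paste(month.abb[lt$mon + 1], year, sep = "-")

# Levels in order of first appearance, i.e. chronological for sorted dates
QuarterYear <- factor(quarter_year, levels = unique(quarter_year))
MonthYear <- factor(month_year, levels = unique(month_year))
levels(QuarterYear)
```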

Related

R: Turn Months into Quarters

I have a dataset that looks like this:
> ex
# A tibble: 10 × 2
tenor delivery_window
<chr> <chr>
1 month Nov 22
2 quarter Jan 22
3 year Cal 24
4 year Cal 22
5 month Feb 22
6 quarter Jan 21
7 month Sep 22
8 quarter Jan 21
9 month Jun 21
10 month Aug 21
And I want to turn it into something like this:
> ex
# A tibble: 10 × 3
tenor delivery_window new_tenor
<chr> <chr> <chr>
1 month Nov 22 Nov 22
2 quarter Jan 22 Q1 22
3 year Cal 24 Cal 24
4 year Cal 22 Cal 22
5 month Feb 22 Feb 22
6 quarter Jan 21 Q1 21
7 month Sep 22 Sep 22
8 quarter Jan 21 Q1 21
9 month Jun 21 Jun 21
10 month Aug 21 Aug 21
That is, if the tenor is quarter, I want to show only the quarter corresponding to the delivery window, not the month. Monthly and Yearly tenors can remain as they are.
Can someone please give me a hint as to how to achieve this? Thank you in advance.
EDIT
The new_tenor should be Q1 YY for months from Jan to Mar, Q2 YY for Apr to Jun, Q3 YY for Jul to Sep, and Q4 YY for Oct to Dec.
We can convert to yearqtr with as.yearqtr() (from zoo), then use case_when() to replace the elements of 'delivery_window' with the converted value when the tenor is 'quarter':
library(dplyr)
library(stringr)
library(zoo)
ex <- ex %>%
  mutate(new_tenor = case_when(
    tenor == 'quarter' ~ str_replace(
      as.yearqtr(paste('1', delivery_window), '%d %b %Y'),
      "(\\d+) (\\w+)", "\\2 \\1"),
    TRUE ~ delivery_window))
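If you'd rather avoid zoo and stringr, the same mapping can be sketched in base R. This assumes every quarterly delivery_window has the form "Mon YY" with an English month abbreviation (the built-in month.abb does not depend on the locale); the example data are rebuilt from the question as a plain data.frame.

```r
ex <- data.frame(
  tenor = c("month", "quarter", "year", "year", "month",
            "quarter", "month", "quarter", "month", "month"),
  delivery_window = c("Nov 22", "Jan 22", "Cal 24", "Cal 22", "Feb 22",
                      "Jan 21", "Sep 22", "Jan 21", "Jun 21", "Aug 21"),
  stringsAsFactors = FALSE
)

# Month abbreviation ("Jan") and two-digit year ("22") from "Mon YY"
mon <- substr(ex$delivery_window, 1, 3)
yy <- substr(ex$delivery_window, 5, 6)

# match() against month.abb gives 1..12 (NA for "Cal");
# integer division then maps 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
q <- (match(mon, month.abb) - 1) %/% 3 + 1

# Only quarterly rows are rewritten; the rest keep delivery_window
is_quarter <- ex$tenor == "quarter"
ex$new_tenor <- ex$delivery_window
ex$new_tenor[is_quarter] <- paste0("Q", q[is_quarter], " ", yy[is_quarter])
```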

Conditional rolling counting function

I would like to implement a rolling count of the working days in a month. Weekends (Saturday and Sunday) should be assigned NA.
A replicable example:
# Change the locale if you are in a non-English location like me
Sys.setlocale("LC_TIME", "C")
workdays <- c("Mon","Tue","Wed","Thu","Fri")
dataset <- data.frame(Date = seq(as.Date("2020-03-01"),as.Date("2020-04-01")-1,"days"))
dataset$Day <- format(dataset$Date,format="%d")
dataset$WeekDay <- format(dataset$Date,format="%a")
dataset$Month <- format(dataset$Date,format="%m")
dataset$Year <- format(dataset$Date,format="%y")
dataset$Workday <- dataset$WeekDay %in% workdays
I wanted to use dplyr, grouping by month and year, to count the working days conditionally.
dataset %>%
group_by(Month,Year) %>%
mutate(WorkdayNo = ???)
In my example, the first ten rows should then look like this:
[1] NA 1 2 3 4 5 NA NA 6 7 (...)
cumsum() with if_else() should help:
library(dplyr)
dataset %>%
group_by(Month,Year) %>%
mutate(WorkdayNo = if_else(Workday, cumsum(Workday), NA_integer_)) %>%
ungroup
# Date Day WeekDay Month Year Workday WorkdayNo
# <date> <chr> <chr> <chr> <chr> <lgl> <int>
# 1 2020-03-01 01 Sun 03 20 FALSE NA
# 2 2020-03-02 02 Mon 03 20 TRUE 1
# 3 2020-03-03 03 Tue 03 20 TRUE 2
# 4 2020-03-04 04 Wed 03 20 TRUE 3
# 5 2020-03-05 05 Thu 03 20 TRUE 4
# 6 2020-03-06 06 Fri 03 20 TRUE 5
# 7 2020-03-07 07 Sat 03 20 FALSE NA
# 8 2020-03-08 08 Sun 03 20 FALSE NA
# 9 2020-03-09 09 Mon 03 20 TRUE 6
#10 2020-03-10 10 Tue 03 20 TRUE 7
# … with 21 more rows
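For comparison, a base-R sketch of the same running count uses ave() to apply cumsum() within each month. It rebuilds the example data and, as a substitution not in the question, derives Workday from the numeric POSIXlt weekday instead of abbreviated names, so no Sys.setlocale() call is needed.

```r
dataset <- data.frame(Date = seq(as.Date("2020-03-01"), as.Date("2020-04-01") - 1, "days"))

# POSIXlt $wday runs from 0 (Sunday) to 6 (Saturday), independent of locale
wday <- as.POSIXlt(dataset$Date)$wday
dataset$Workday <- wday %in% 1:5

# Year-month key so the count restarts every month
month_key <- format(dataset$Date, "%Y-%m")

# Running count of workdays within each month; weekends become NA
running <- ave(as.integer(dataset$Workday), month_key, FUN = cumsum)
dataset$WorkdayNo <- ifelse(dataset$Workday, running, NA_integer_)

head(dataset$WorkdayNo, 10)
```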

Assigning Values in R by Date Range

I am trying to create a "week" variable in my dataset of daily observations that takes a new value (1, 2, 3, et cetera) every time a new Monday starts. My observations begin on April 6th, 2020, and the dates are stored in "YYYY-MM-DD" format as Date objects (created with as.Date()). In this example, an observation between April 6th and April 12th would be a "1", an observation between April 13th and April 19th would be a "2", et cetera.
I am aware of the week() function in lubridate, but unfortunately it doesn't work for my purposes: a year is not exactly 52 weeks long, so "week 53" would only be a day or two long. In other words, I would like the days from December 28th, 2020 to January 3rd, 2021 to be categorized as the same week.
Does anyone have a good solution to this problem? I appreciate any insight folks might have.
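This will also work (grouping by ISO year rather than calendar year, so that 2020-12-28 to 2021-01-03 stay in one week and the group ids follow date order):
df <- data.frame(date = as.Date("2020-04-06") + 0:365)
library(dplyr)
library(lubridate)
df %>% group_by(d = isoyear(date), week = isoweek(date)) %>%
  mutate(week = cur_group_id()) %>% ungroup() %>% select(-d)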
# A tibble: 366 x 2
date week
<date> <int>
1 2020-04-06 1
2 2020-04-07 1
3 2020-04-08 1
4 2020-04-09 1
5 2020-04-10 1
6 2020-04-11 1
7 2020-04-12 1
8 2020-04-13 2
9 2020-04-14 2
10 2020-04-15 2
# ... with 356 more rows
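Subtract the minimum date from each date, divide the difference by 7, and use floor() to get one number for each block of 7 days. This lines up with Monday-to-Sunday weeks because the earliest date (2020-04-06) is a Monday.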
x <- as.Date(c('2020-04-06','2020-04-07','2020-04-13','2020-12-28','2021-01-03'))
as.integer(floor((x - min(x))/7) + 1)
#[1] 1 1 2 39 39
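Counting from the minimum date only lines up with Monday-to-Sunday weeks when that date is a Monday. If the data might start mid-week, one option is to count from the Monday on or before the first date. A base-R sketch; the example dates (starting on a Wednesday) are illustrative.

```r
x <- as.Date(c("2020-04-08", "2020-04-12", "2020-04-13", "2020-12-28", "2021-01-03"))

# Roll the earliest date back to its Monday: POSIXlt $wday is
# 0 = Sunday ... 6 = Saturday, so (wday + 6) %% 7 is days since Monday
start <- min(x)
first_monday <- start - (as.POSIXlt(start)$wday + 6) %% 7

# Whole weeks elapsed since that Monday, numbered from 1
week <- as.integer(floor(as.numeric(x - first_monday) / 7)) + 1L
week
```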
Maybe lubridate::isoweek() and lubridate::isoyear() are what you want?
Some data:
df1 <- data.frame(date = seq.Date(as.Date("2020-04-06"),
as.Date("2021-01-04"),
by = "1 day"))
Example code:
library(dplyr)
library(lubridate)
df1 <- df1 %>%
mutate(week = isoweek(date),
year = isoyear(date)) %>%
group_by(year) %>%
mutate(week2 = 1 + (week - min(week))) %>%
ungroup()
head(df1, 8)
# A tibble: 8 x 4
date week year week2
<date> <dbl> <dbl> <dbl>
1 2020-04-06 15 2020 1
2 2020-04-07 15 2020 1
3 2020-04-08 15 2020 1
4 2020-04-09 15 2020 1
5 2020-04-10 15 2020 1
6 2020-04-11 15 2020 1
7 2020-04-12 15 2020 1
8 2020-04-13 16 2020 2
tail(df1, 8)
# A tibble: 8 x 4
date week year week2
<date> <dbl> <dbl> <dbl>
1 2020-12-28 53 2020 39
2 2020-12-29 53 2020 39
3 2020-12-30 53 2020 39
4 2020-12-31 53 2020 39
5 2021-01-01 53 2020 39
6 2021-01-02 53 2020 39
7 2021-01-03 53 2020 39
8 2021-01-04 1 2021 1
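In the output above, week2 restarts at 1 when the ISO year changes (2021-01-04). If the count should keep running across the year boundary instead, one base-R sketch is to combine the ISO year and week into a single key with format() and number the keys in order of appearance. This assumes the rows are sorted by date; %G is the ISO week-based year, which keeps 2021-01-01 to 2021-01-03 in 2020's week 53.

```r
df1 <- data.frame(date = seq(as.Date("2020-04-06"), as.Date("2021-01-10"), by = "1 day"))

# ISO week-based year and ISO week as one key, e.g. "2020-53"
iso_key <- format(df1$date, "%G-%V")

# Number the keys in order of first appearance (dates are sorted),
# so the count continues past the year boundary instead of restarting
df1$week_seq <- match(iso_key, unique(iso_key))
```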

Apply automation in R to change rows of numbers into date

I have created a simple data.frame with 1 column:
x <- as.data.frame(replicate(1, sample(1:27, 1250, replace = TRUE)))
So x is a column with repeated values from 1 to 27.
I wish to change these values into dates, eg.
x[x==1]<-"30 June 2018"
x[x==2]<-"1 July 2018"
x[x==3]<-"2 July 2018"
Is there a faster way to do this?
I believe I can do this using apply(), but I don't have much experience with apply.
Thank you for your suggestions.
Here's one way with as.Date(). The origin counts as day 0, so with origin = "2018-06-29" a value of 2 becomes 2018-07-01, as in your example:
x$date <- as.Date(x$V1, origin = "2018-06-29")
head(x)
V1 date
1 5 2018-07-04
2 19 2018-07-18
3 13 2018-07-12
4 9 2018-07-08
5 10 2018-07-09
6 21 2018-07-20
If you want the format used in your post:
x$date <- format(as.Date(x$V1, origin = "2018-06-29"), "%d %B %Y")
head(x)
V1 date
1 5 04 July 2018
2 19 18 July 2018
3 13 12 July 2018
4 9 08 July 2018
5 10 09 July 2018
6 21 20 July 2018
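Indexing a lookup vector with the values is another vectorized option, and it also works when the mapping isn't a simple date offset (the lookup could just as well be any character vector). A minimal sketch, assuming value 1 is meant to be 30 June 2018 (June has no 31st) and each following value is the next day; set.seed() is only there to make the example reproducible.

```r
set.seed(1)  # reproducible example data only
x <- as.data.frame(replicate(1, sample(1:27, 1250, replace = TRUE)))

# Lookup table: element i holds the date for value i
# (1 -> 2018-06-30, 2 -> 2018-07-01, ...)
lookup <- seq(as.Date("2018-06-30"), by = "day", length.out = 27)

# Integer indexing maps every value in one vectorized step, no apply() needed
x$date <- lookup[x$V1]
x$label <- format(x$date, "%d %B %Y")  # month name follows the current locale
head(x)
```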

Using regex and tidyr in R to split column variable on first instance of match

I'm trying to split a column in an R data frame whose values contain more than one space, but I want to split on the first space only. An example data frame:
df <- data.frame(game = c(1, 2, 3, 4, 5, 6), date = c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5", "Thursday Apr 6", "Friday Apr 7", "Saturday Apr 8"))
I'm trying to use tidyr to split the df 'date' column on just the first space so that the day is in its own column:
game day date
1 1 Monday Apr 3
2 2 Tuesday Apr 4
3 3 Wednesday Apr 5
4 4 Thursday Apr 6
5 5 Friday Apr 7
6 6 Saturday Apr 8
The above is the problem. The below is what I've tried and what is going wrong.
According to the tidyr documentation, the default value of 'sep' is 'a regular expression that matches any sequence of non-alphanumeric values.' So if I just do:
df %>% separate(date, c("day", "date"))
That splits on the space, but it splits on both spaces (e.g. the space after 'Monday' and the space after 'Apr' in 'Monday Apr 3'). The result is:
game day date
1 1 Monday Apr
2 2 Tuesday Apr
3 3 Wednesday Apr
4 4 Thursday Apr
5 5 Friday Apr
6 6 Saturday Apr
Warning message:
Too many values at 6 locations: 1, 2, 3, 4, 5, 6
I can add the regex to select just the first space (and I checked that this regex worked in Sublime Text):
df %>% separate(date, c("day", "date"), sep='^[^\\s]*\\K\\s')
But that gives me:
game day date
1 1 Monday Apr 3 <NA>
2 2 Tuesday Apr 4 <NA>
3 3 Wednesday Apr 5 <NA>
4 4 Thursday Apr 6 <NA>
5 5 Friday Apr 7 <NA>
6 6 Saturday Apr 8 <NA>
Warning message:
Too few values at 6 locations: 1, 2, 3, 4, 5, 6
So what is going wrong? Or how do I make this work? Or what obvious thing am I not understanding?
You need to set the extra argument to "merge":
library(tidyr)
df %>% separate(date, c("day", "date"), extra = "merge")
# game day date
#1 1 Monday Apr 3
#2 2 Tuesday Apr 4
#3 3 Wednesday Apr 5
#4 4 Thursday Apr 6
#5 5 Friday Apr 7
#6 6 Saturday Apr 8
We can also do this easily in base R:
cbind(df[1], read.csv(text=sub("\\s+", ",", df$date),
header=FALSE, col.names = c("day", "date")))
# game day date
#1 1 Monday Apr 3
#2 2 Tuesday Apr 4
#3 3 Wednesday Apr 5
#4 4 Thursday Apr 6
#5 5 Friday Apr 7
#6 6 Saturday Apr 8
Another option is extract() from tidyr:
library(tidyr)
extract(df, date, into = c("day", "date"), "(\\S+)\\s+(.*)")
# game day date
#1 1 Monday Apr 3
#2 2 Tuesday Apr 4
#3 3 Wednesday Apr 5
#4 4 Thursday Apr 6
#5 5 Friday Apr 7
#6 6 Saturday Apr 8
Psidom has you covered regarding your first warning message about too many values. As for your second approach, where you ended up with too few values, that's partly because \\K doesn't work with stringi (which uses ICU regular expressions), and stringi is what separate uses. You can check this yourself with stringi::stri_split_regex(df$date, '^[^\\s]*\\K\\s'). So that regex produces no splits, and you get the warning message about too few values.
You could specify sep as:
# a space not followed by a digit
df %>% separate(date, c("day", "date"), sep = "\\s(?!\\d)")
# game day date
#1 1 Monday Apr 3
#2 2 Tuesday Apr 4
#3 3 Wednesday Apr 5
#4 4 Thursday Apr 6
#5 5 Friday Apr 7
#6 6 Saturday Apr 8
Some alternative regular expressions:
You can't use \\K, but if you need to use a variable-length look-behind, the quantifier needs to be bounded:
# a space preceded by 3 - 6 characters and "day".
# 3 - 6 characters allows "Monday" and "Wednesday"
"(?<=.{3,6}day)\\s"
# same idea
"(?<=\\S{3,6}day)\\s"
# same idea
"(?<=.?.?.?...day)\\s"
# same idea, but using ^ to anchor and not using "day"
"(?<=^\\S{0,9})\\s"
# space followed by some other characters, a space, digit(s) and the end of the line
"\\s(?=.+\\s\\d+$)"
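To sanity-check the look-ahead patterns without tidyr, base R's strsplit() with perl = TRUE works as a quick test. This is a sketch with a caveat: perl = TRUE uses PCRE, not the ICU engine that stringi uses, so features such as \\K behave differently, and the look-behind patterns are left out because older PCRE versions only allow fixed-length look-behind.

```r
dates <- c("Monday Apr 3", "Wednesday Apr 5", "Saturday Apr 8")

# The two look-ahead patterns from above, as R strings
patterns <- c(
  no_digit_after = "\\s(?!\\d)",
  before_last_token = "\\s(?=.+\\s\\d+$)"
)

# Each pattern should match only the first space, giving day + date
for (p in patterns) {
  parts <- strsplit(dates, p, perl = TRUE)
  print(do.call(rbind, parts))
}
```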
