I am trying to calculate the number of business days between two dates. I can already count the days excluding Saturdays and Sundays using the answer to this question (Calculate the number of weekdays between 2 dates in R), and now I want to exclude national holidays as well. How can I add national holidays to this code?
I used this code to calculate weekdays.
Nweekdays <- function(a, b) {
sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday"))}
Updated your function a bit so holidays can be added...
Nweekdays <- function(a, b, holidays, weekend) {
possible_days <- seq(a, b, "days")
# Count all days that are not weekend and
# are not holidays
sum(!weekdays(possible_days) %in% weekend & !possible_days %in% holidays)
}
weekend <- c("Saturday", "Sunday")
holidays <- as.Date(c("2017-12-31", "2017-12-24", "2017-07-04"))
Nweekdays(as.Date("2017-08-01"), as.Date("2017-12-31"), holidays, weekend)
[1] 109
While the Gregorian calendar is pretty global, the definition of weekend and holidays is dependent on country, region, etc.
Having had some issues with the bizdays package, I came across this solution. I have tweaked it in two ways, one of which addresses the error Marie describes in the comments.
First improvement:
weekend <- c("Saturday", "Sunday") is language dependent, so I switched to lubridate's wday function, which refers to days by number (with lubridate's default week start, 1 is Sunday and 7 is Saturday). I also made Saturdays and Sundays the default free days and added an option to include the last date or not:
library(lubridate) ## lubridate for wday function
CountWorkdays <- function(from, to, holidays = c(), free = c(7,1), include_last = FALSE) {
# Create list of all days
possible_days <- seq(from, to, "days")
# Include last? If not, remove last item.
if (!include_last) {
possible_days <- possible_days[-length(possible_days)]
}
# Count all days that are not weekend and are not holidays
return(sum(!wday(possible_days) %in% free & !possible_days %in% holidays))
}
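If you would rather not depend on lubridate, the same idea can be sketched in base R. This is only an illustration: the name count_workdays_base is made up here, and it relies on as.POSIXlt()$wday, which numbers days from 0 (Sunday) to 6 (Saturday), so the default free days become c(0, 6) instead of c(7, 1).

```r
# Hypothetical base-R variant of CountWorkdays (name chosen for illustration)
count_workdays_base <- function(from, to, holidays = as.Date(character()),
                                free = c(0, 6), include_last = FALSE) {
  # All calendar days in the range
  possible_days <- seq(from, to, by = "days")
  # Drop the last day unless it should be counted
  if (!include_last) {
    possible_days <- possible_days[-length(possible_days)]
  }
  # Day of week as a number: 0 = Sunday, ..., 6 = Saturday (locale independent)
  day_number <- as.POSIXlt(possible_days)$wday
  # Count days that are neither free days nor holidays
  sum(!(day_number %in% free) & !(possible_days %in% holidays))
}

count_workdays_base(as.Date("2021-01-15"), as.Date("2021-01-31"),
                    holidays = as.Date(c("2021-01-18", "2021-01-20")),
                    include_last = TRUE)
# 9
```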
Second improvement: If you want to use this function on a data frame you can use mapply, sapply or equivalent functions, but you can also vectorise the function so that it accepts vectors (it is then also usable inside dplyr::mutate). It is important to state which arguments are treated as vectors and which are not. I chose to vectorise the from and to dates; the other arguments are assumed to be the same for every row. (One situation where this might not hold is when each row has its own contractual working days, for people who work fewer than five days a week.)
CountWorkdaysV <- Vectorize(CountWorkdays, c("from", "to"))
This last adjustment seems to work, but I am not really sure about performance impacts so check before you adopt this function.
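If performance does turn out to matter, one possible alternative is to build the calendar once and answer every row from a cumulative sum, instead of calling seq() per row. The sketch below is an assumption-laden illustration, not a drop-in replacement: count_workdays_cum is a made-up name, it expects Date vectors with from <= to in every row, it always counts the last date (like include_last = TRUE), and it uses base R day numbers (0 = Sunday, 6 = Saturday).

```r
# Hypothetical vectorised counter based on one shared calendar
count_workdays_cum <- function(from, to, holidays = as.Date(character()),
                               free = c(0, 6)) {
  start <- min(from)
  # One calendar covering every date that can occur in any row
  cal <- seq(start, max(to), by = "day")
  is_work <- !(as.POSIXlt(cal)$wday %in% free) & !(cal %in% holidays)
  # cum[k + 1] = number of workdays among the first k calendar days
  cum <- c(0, cumsum(is_work))
  # Number of calendar days strictly before `from`
  before_from <- as.integer(as.numeric(from) - as.numeric(start))
  # Number of calendar days up to and including `to`
  upto_to <- as.integer(as.numeric(to) - as.numeric(start)) + 1L
  cum[upto_to + 1L] - cum[before_from + 1L]
}

from <- as.Date(c("2021-01-15", "2021-01-04"))
to   <- as.Date(c("2021-01-31", "2021-01-08"))
count_workdays_cum(from, to, holidays = as.Date(c("2021-01-18", "2021-01-20")))
# 9 5
```

The calendar is only as long as the overall date span, so the cost per row is two lookups rather than one seq() call.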
Hope this helps somebody who stumbles upon this older question via Google like I did.
2019, 2020, and 2021 US Federal Holidays from https://www.opm.gov/policy-data-oversight/pay-leave/federal-holidays/
holidays <- as.Date(c("2019-01-01", "2019-01-21", "2019-02-18", "2019-05-27", "2019-07-04", "2019-09-02", "2019-10-14", "2019-11-11", "2019-11-28", "2019-12-25",
"2020-01-01", "2020-01-20", "2020-02-17", "2020-05-25", "2020-07-03", "2020-09-07", "2020-10-12", "2020-11-11", "2020-11-26", "2020-12-25",
"2021-01-01", "2021-01-18", "2021-01-20", "2021-02-15", "2021-05-31", "2021-06-18", "2021-07-05", "2021-09-06", "2021-10-11", "2021-11-11", "2021-11-25", "2021-12-24"))
Example Use:
CountWorkdaysV(as.Date("2021-01-15"), as.Date("2021-01-31"), holidays = holidays, include_last = TRUE)
# 9 days
I have a data set that has event dates at multiple locations:
year<-rep(2010:2021,3)
month<-rep(1:3,12)
loc<-rep(letters[1:3],each=12)
event_date<-as.Date(paste(year,month, "01", sep="-"), "%Y-%m-%d") # separator and format must match
event_data<-data.frame(loc,year,event_date)
I want to expand the data set so that, for each month of each year, I have a time-since-event variable for each location (preferably in months, but days is fine). I have tried the code below, but it gives negative values at the change of year; I want the time since event to keep counting up until the event in the following year (no negative values).
library(dplyr) # for left_join
months<-expand.grid(year=unique(year),month=1:12)
month_data<-left_join(event_data, months, by = "year")
month_data$date<-as.Date(paste(month_data$year,month_data$month, "01", sep="-"), "%Y-%m-%d")
month_data$diff<-month_data$date-month_data$event_date
May not be the neatest, but this has done the trick. Instead of expand.grid, I have written a loop that creates a sequence of dates by month between each pair of consecutive event dates:
library(lubridate) # for year()
res<-list()
for(i in unique(event_data$loc)) {
  event_data1<-event_data[event_data$loc==i,] # outer loop splitting dataset by location
  r_list<-list() # reset per location so no entries carry over from the previous one
  for (j in seq_along(event_data1$event_date)) {
    if (is.na(event_data1$event_date[j+1])) {
      # seq to [j+1] fails as there is no date after the last one, so add a year's worth of months
      dates<-as.Date(seq(event_data1$event_date[j],(event_data1$event_date[j]+365), by = 'month'))
      dates<-head(dates, -1)
      camp<-rep(year(dates)[1], length(dates))
      diff<- 0:(length(camp)-1)
      loc<-rep(event_data1$loc[j], length(dates))
      r_list[[j]]<-data.frame(dates,camp,diff,loc)
    } else {
      dates<-as.Date(seq(event_data1$event_date[j],event_data1$event_date[j+1], by = 'month')) # sequence from one event to the next by month
      dates<-head(dates, -1) # the last date in the seq is in fact the next event, so it is removed
      camp<-rep(year(dates)[1], length(dates)) # year of event
      diff<- 0:(length(camp)-1) # months since event
      loc<-rep(event_data1$loc[j], length(dates)) # location
      r_list[[j]]<-data.frame(dates,camp,diff,loc)
    }
  }
  res<-c(res,r_list)
}
data_new<-do.call("rbind",(res))
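A loop-free way to get "months since the most recent event" is to look up the latest event on or before each date with findInterval. The sketch below is only one possible approach under stated assumptions: months_since_event is a made-up helper, it works on a single location's event dates, and it returns NA for dates before the first event.

```r
# Hypothetical helper: whole months since the latest event on or before each grid date
months_since_event <- function(events, grid) {
  events <- sort(events)
  # Index of the latest event <= each grid date (0 if there is none yet)
  idx <- findInterval(as.numeric(grid), as.numeric(events))
  idx[idx == 0] <- NA
  last_event <- events[idx]
  g <- as.POSIXlt(grid)
  e <- as.POSIXlt(last_event)
  # Difference in calendar months
  (g$year - e$year) * 12 + (g$mon - e$mon)
}

events <- as.Date(c("2010-01-01", "2011-02-01"))
grid <- seq(as.Date("2010-01-01"), by = "month", length.out = 15)
months_since_event(events, grid)
# 0 1 2 3 4 5 6 7 8 9 10 11 12 0 1
```

To cover several locations, the same helper could be applied to each location's dates in turn, for example after split()-ting the data by loc.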
I have some experimental data on CO2 values over a few days in a room which are time and date-stamped. I would like to break it up into a series of "experiments" based on an experiment list of when each experiment happened.
e.g.
Data
df<-data.frame(CO2.ppm.=runif(10), Date.time.=as.POSIXct(" 2019-2-08 07:00:00") + runif(n=10, min=0, max=3600))
List of experiments with start and stop times:
ExpertimentList<- data.frame(StartTime=c("2019-2-08 07:10:00", "2019-2-08 07:15:00", "2019-2-08 08:30:00"), StopTime=c("2019-2-08 07:12:00","2019-2-08 07:16:00","2019-2-08 08:15:00"),ExptID=c(1,2,3))
Note there is time when CO2 is measured but no experiment is happening. E.g. between 07:12:00 and 07:15:00.
I would like to split df$Date.time. by ExperimentList's StartTime and StopTime
So far I've converted everything to integers
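df$Date.time.<-as.integer(df$Date.time.)
# the start/stop times are stored as text, so parse them before converting to integer seconds
ExpertimentList$StartTime<-as.integer(as.POSIXct(ExpertimentList$StartTime))
ExpertimentList$StopTime<-as.integer(as.POSIXct(ExpertimentList$StopTime))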
Then looking at cut
breakz<-dplyr::arrange(paste(Experiment_List$StartTime,Experiment_List$StopTime)%>%as_tibble())
cut(df$Dev.Date.Time,breaks=unique(breakz$value))
But I can't filter out the data when no experiment was taking place. Any thoughts are much appreciated.
Expected output:
set.seed(143)
data.frame(CO2.ppm.=runif(10), Date.time.=sort(as.POSIXct(" 2019-2-08 07:00:00") + runif(n=10, min=0, max=3600)),ExptID=c(NA,NA,NA,1,NA,NA,NA,NA,NA,NA))
ANSWER:
I found that I ran out of memory with @Ronak's answer, so I split the data.frame into chunks of 10000 rows:
df<-split(df, (as.numeric(rownames(df))-1) %/% 10000)
Then, based on @Ronak's answer, I put the code into a function and used mclapply from the parallel package.
#Left join: rows that fall outside every experiment are kept, with NA in the experiment columns
fuzzyJoinFunction<-function(a){
a<-fuzzy_left_join(a, Experiment_List,
by = c('Dev.Date.Time' = 'StartTime', 'Dev.Date.Time'= 'StopTime'),
match_fun = c(`>=`, `<=`))
a
}
df<-rbindlist(mclapply(X=df,FUN=fuzzyJoinFunction,mc.cores=4))
We can use fuzzyjoin::fuzzy_inner_join to keep only the rows which are in range.
library(dplyr)
library(fuzzyjoin)
#All the datetime values should be of type POSIXct.
ExpertimentList %>%
mutate(across(c(StartTime, StopTime), lubridate::ymd_hms)) -> ExpertimentList
fuzzy_inner_join(df, ExpertimentList,
by = c('Date.time.' = 'StartTime', 'Date.time.'= 'StopTime'),
match_fun = c(`>=`, `<=`))
To get all df values in the final output with NA for ExptID use fuzzy_left_join.
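If the join itself is what exhausts memory, a plain loop over the experiments is another option, since it never builds a row-by-row cross product. This is a minimal base-R sketch under assumptions: assign_experiment is a made-up name, all times are POSIXct, and where intervals overlap the later experiment in the list wins.

```r
# Hypothetical helper: label each timestamp with the experiment whose
# [start, stop] interval contains it, or NA if it lies outside all of them
assign_experiment <- function(times, start, stop, id) {
  out <- rep(NA, length(times))
  for (k in seq_along(id)) {
    hit <- times >= start[k] & times <= stop[k]
    out[hit] <- id[k]
  }
  out
}

times <- as.POSIXct(c("2019-02-08 07:05:00", "2019-02-08 07:11:00",
                      "2019-02-08 07:15:30"), tz = "UTC")
start <- as.POSIXct(c("2019-02-08 07:10:00", "2019-02-08 07:15:00"), tz = "UTC")
stop  <- as.POSIXct(c("2019-02-08 07:12:00", "2019-02-08 07:16:00"), tz = "UTC")
assign_experiment(times, start, stop, id = c(1, 2))
# NA 1 2
```

Rows labelled NA can then be dropped with is.na() if only the measurements taken during an experiment are wanted.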
I'm trying to write a function, though I'm not sure this is the right approach. I'm new to RStudio, and for each customer I want two new columns: the number of invoices paid prior to the creation date of each new invoice, and the number of those invoices that were paid late.
My data:
set.seed(123)
names<- rep(LETTERS[1:2], each = 16)
id<- seq(1,32)
daysp<- runif(1:32,1,32)
startdate <-c("20-02-2018","01-03-2018","13-03-2018","20-03-2018","28-03-2018","05-04-2018","10-04-2018","13-04-2018",
"16-04-2018","19-04-2018","04-05-2018","14-05-2018","23-05-2018","04-06-2018","12-06-2018","19-06-2018",
"26-04-2018","02-05-2018","07-05-2018","07-05-2018","07-05-2018","14-05-2018","29-05-2018","12-06-2018",
"12-06-2018","18-06-2018","11-07-2018","11-07-2018","17-07-2018","30-07-2018","03-08-2018","07-08-2018")
startdate<-as.Date(startdate,"%d-%m-%Y" )
paydate<- startdate + daysp
class <- c("Payed", "Payed","Payed", "Delayed","Payed", "Delayed","Delayed", "Delayed","Payed", "Delayed",
"Payed", "Delayed","Payed", "Delayed","Payed", "Delayed","Payed", "Delayed","Payed", "Delayed",
"Payed", "Delayed","Payed", "Delayed","Payed", "Delayed","Delayed", "Delayed","Payed", "Delayed",
"Payed", "Delayed")
df<-data.frame(names,id,daysp,startdate,paydate,class)
My expected result looks like this:
nopip<-c(0,0,1,1,3,3,4,4,4,5,7,10,10,12,12,14,0,0,2,2,2,2,3,6,6,6,9,9,10,12,13,14)
nopip_delayed<-c(0,0,0,0,0,0,1,1,1,2,3,5,5,6,6,6,0,0,1,1,1,1,1,3,3,3,4,4,5,6,7,8)
as in this data frame:
df<-cbind(df,nopip,nopip_delayed)
Thanks in advance
There are several ways to accomplish this, but here is one using base R, which is a good foundation to build on.
This uses sapply to step through the rows of the data.frame and, for each row, count the rows with the same name whose pay date is prior to that row's start date. (sapply returns a plain numeric vector, whereas lapply would create a list column.)
df$nopip2 <- sapply(seq_len(nrow(df)), function(x) sum(df$names == df$names[x] & df$paydate < df$startdate[x]))
This follows the same steps as the previous call, but adds a check that the class is 'Delayed'.
df$nopip_delayed2 <- sapply(seq_len(nrow(df)), function(x) sum(df$names == df$names[x] & df$paydate < df$startdate[x] & df$class == 'Delayed'))
Confirming calculated results are same as desired output
> setequal(df$nopip, df$nopip2)
[1] TRUE
> setequal(df$nopip_delayed, df$nopip_delayed2)
[1] TRUE
Added an example that sums daysp over the same invoices that are counted in nopip:
df$nopip_daysp <- sapply(seq_len(nrow(df)), function(x) sum((df$names == df$names[x] & df$paydate < df$startdate[x]) * df$daysp))
As a side note, iterating over a data.frame like this compares every row against every other row, so it becomes expensive when the number of rows is large. The same conditions carry over directly, though, if you later need to move to a faster approach.
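To check the counting logic on a case small enough to verify by hand, the two steps can be wrapped in one function. This is a sketch only: count_prior and the four-row inv table are made up for illustration, and vapply is used so the result is guaranteed to be an integer vector.

```r
# Small made-up invoice table
inv <- data.frame(
  names = c("A", "A", "A", "B"),
  startdate = as.Date(c("2018-01-01", "2018-01-10", "2018-01-30", "2018-01-15")),
  paydate = as.Date(c("2018-01-05", "2018-01-25", "2018-02-05", "2018-01-16")),
  class = c("Payed", "Delayed", "Payed", "Payed")
)

# Hypothetical helper: per row, count the same customer's invoices
# that were paid before this invoice was created
count_prior <- function(d, only_delayed = FALSE) {
  vapply(seq_len(nrow(d)), function(i) {
    prior <- d$names == d$names[i] & d$paydate < d$startdate[i]
    if (only_delayed) prior <- prior & d$class == "Delayed"
    sum(prior)
  }, integer(1))
}

count_prior(inv)                      # 0 1 2 0
count_prior(inv, only_delayed = TRUE) # 0 0 1 0
```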
I'm trying to use the bizdays package to generate a vector of business days between two dates.
fer = as.data.frame(as.Date(fer[1:938]))
#Define default calendar
bizdays.options$set(default.calendar=fer)
dt1 = as.Date(Sys.Date())
dt2 = as.Date(Sys.Date()-(365*10)) #sample 10 year window
#Create date vector
datas = bizseq(dt2, dt1)
I get this error: "Error in bizseq.Date(dt2, dt1) : Given date out of range."
The same happens with every other bizdays function.
Any ideas?
I had a similar problem, but could not apply the accepted answer to my case. What worked for me was to make sure that the first and last holidays in the vector holidays cover (or extend beyond) the range of dates passed to bizdays():
library(bizdays)
This works (from_date and to_date both lie within the first and last holiday provided by holidays):
holidays <- c("2016-08-10", "2016-08-13")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
#1
This does not work (to_date lies outside of the last holiday of holidays):
holidays <- c("2016-08-10", "2016-08-11")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
# Error in bizdays.Date(from, to, cal) : Given date out of range.
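Based purely on the behaviour shown in the two examples above, a small pre-check can flag dates that would fall outside the span of the holiday vector before bizdays() is called. This is a sketch in base R, not part of the bizdays package: within_holiday_range is a made-up name, and it assumes the usable range runs from the earliest to the latest holiday.

```r
# Hypothetical pre-check: TRUE where a date lies between the first and last holiday
within_holiday_range <- function(dates, holidays) {
  dates <- as.Date(dates)
  holidays <- as.Date(holidays)
  dates >= min(holidays) & dates <= max(holidays)
}

within_holiday_range(c("2016-08-11", "2016-08-12"), c("2016-08-10", "2016-08-13"))
# TRUE TRUE
within_holiday_range(c("2016-08-11", "2016-08-12"), c("2016-08-10", "2016-08-11"))
# TRUE FALSE
```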
If fer is the holidays, you can try with:
bizdays.options$set(default.calendar=Calendar(holidays=fer))
I am new to R and struggling with the fact that functions are able to operate on whole vectors without having to explicitly specify this.
My goal
I have a data frame calls with multiple columns, one of which is a “date” column. Now I want to add a new column, “daytime”, that labels the daytime the particular entry’s date falls into:
> calls
call_id length date direction daytime
1 258 531 1400594572974 outgoing afternoon
2 259 0 1375555528144 unanswered evening
3 260 778 1385922648396 incoming evening
What I have done so far
I have already implemented methods that return a vector of booleans like that:
# Operates on POSIXlt timestamps
is.earlymorning <- function(date) {
hour(floor_date(date, "hour")) >= 5 & hour(floor_date(date, "hour")) < 9
}
For the timestamps 2014-05-20 16:02:52, 2013-08-03 20:45:28 and 2013-12-01 19:30:48, is.earlymorning would thus return FALSE, FALSE, FALSE. What I am currently struggling with is implementing a function that actually returns labels. What I would like the function to do is the following:
# rawDate is a long value of the date as ms since 1970
Daytime <- function(rawDate) {
date <- as.POSIXlt(as.numeric(rawDate) / 1000, origin = "1970-01-01")
if (is.earlymorning(date)) {
"earlymorning"
} else if (is.morning(date)) {
"morning"
} else if (is.afternoon(date)) {
"afternoon"
} else if (is.evening(date)) {
"evening"
} else if (is.earlynight(date)) {
"earlynight"
} else if (is.latenight(date)) {
"latenight"
}
}
The problem
Obviously, my above approach does not work since the if-conditions would operate on whole vectors in my example. Is there an elegant way to solve this problem? I am sure I am confusing or missing some important points, but as I mentioned I am pretty new to R.
In short, what I want to implement is a function that returns a vector of labels according to a vector of date values:
# Insert new column with daytime labels
calls$daytime <- Daytime(calls$date)
# or something like that:
calls$daytime <- sapply(calls$date, Daytime)
# Daytime(c(1400594572974, 1375555528144, 1385922648396)) => ("afternoon", "evening", "evening")
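To make the problem concrete, here is a small sketch of what happens when a whole logical vector reaches `if`, next to the element-wise ifelse. The variable names are made up for illustration. From R 4.2.0 a condition of length greater than one is an error; older versions warn and use only the first element, so tryCatch is used to capture either outcome.

```r
hours <- c(6, 14, 20)
# Element-wise comparison gives one TRUE/FALSE per hour
in_early <- hours >= 5 & hours < 9

# `if` expects a single TRUE/FALSE, so a vector condition fails (or warns)
outcome <- tryCatch(
  if (in_early) "earlymorning" else "other",
  error = function(e) "error",
  warning = function(w) "warning"
)

# ifelse evaluates the condition for every element
labels <- ifelse(in_early, "earlymorning", "other")
labels
# "earlymorning" "other" "other"
```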
One approach would be to use cut rather than ifelse. I am not entirely sure how you want to label hours, but this will give you the idea. foo is your data (i.e., calls).
library(dplyr)
# Following your idea
ana <- transform(foo, date = as.POSIXlt(as.numeric(date) / 1000, origin = "1970-01-01"))
ana %>%
  mutate(hour = cut(as.numeric(format(date, "%H")),
                    breaks = c(0, 4, 8, 12, 16, 20, 24),
                    labels = c("late night", "early morning",
                               "morning", "afternoon",
                               "evening", "early night"),
                    right = FALSE # intervals are [0,4), [4,8), ...; otherwise hour 0 becomes NA
                    )
         )
# call_id length date direction daytime hour
#1 258 531 2014-05-20 23:02:52 outgoing afternoon early night
#2 259 0 2013-08-04 03:45:28 unanswered evening late night
#3 260 778 2013-12-02 03:30:48 incoming evening late night
There is no need to have 6 different functions to establish which period of the day a given date is. It suffices to define a vector which matches the hour with the daytime. For instance:
Daytime<-function(rawDate) {
#change the vector according to your definition of the daytime.
#the first value corresponds to hour 0 and the last to hour 23
hours<-c(rep("latenight",5),rep("earlymorning",4),rep("morning",4),rep("afternoon",4),rep("evening",4),rep("earlynight",3))
hours[as.POSIXlt(as.numeric(rawDate) / 1000, origin = "1970-01-01")$hour+1]
}
Given Thomas' hint, I solved my problem in the following (admittedly inelegant) way:
Daytime <- function(rawDates) {
dates <- as.POSIXlt(as.numeric(rawDates) / 1000, origin = "1970-01-01")
ifelse(is.earlymorning(dates), "earlymorning",
ifelse(is.morning(dates), "morning",
ifelse(is.afternoon(dates), "afternoon",
ifelse(is.evening(dates), "evening",
ifelse(is.earlynight(dates), "earlynight",
ifelse(is.latenight(dates), "latenight",
"N/A")
)
)
)
)
)
}
With more labels this approach will soon become unmaintainable. Right now it serves my purposes and I will leave it at that, since I must focus on analysing the data as soon as possible. But I will let you know if I have time left and find a less complicated solution! Thank you for your quick response, Thomas.
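One way to keep this maintainable as labels are added is to store the starting hour of each period next to its label and let findInterval pick the period. This is a sketch under assumptions: daytime_label and daytime_from_ms are made-up names, only the 5 to 9 o'clock "earlymorning" window comes from the question, the other boundaries follow the lookup vector in the answer above, and the time zone is fixed to UTC for reproducibility.

```r
# Hypothetical helper: map an hour (0-23) to a daytime label
daytime_label <- function(hour) {
  starts <- c(0, 5, 9, 13, 17, 21)  # first hour of each period (assumed boundaries)
  labels <- c("latenight", "earlymorning", "morning",
              "afternoon", "evening", "earlynight")
  labels[findInterval(hour, starts)]
}

# Hypothetical wrapper for raw timestamps in milliseconds since 1970
daytime_from_ms <- function(raw_ms, tz = "UTC") {
  dates <- as.POSIXlt(as.numeric(raw_ms) / 1000, tz = tz, origin = "1970-01-01")
  daytime_label(dates$hour)
}

daytime_from_ms(c(1400594572974, 1375555528144, 1385922648396))
# "afternoon" "evening" "evening"
```

Adding a period then means adding one starting hour and one label, rather than another nested ifelse.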