Detect UK holidays in date data - r

I am working in R. I have 20 years of data and would like to check whether each given date is a UK holiday, creating a categorical variable (TRUE/FALSE).
I used this code:
library(timeDate)
c <- timeDate(data$Date)
b <- isHoliday(c, holidays = GBBankHoliday(), wday = 1:6)
or
b <- isHoliday(c, holidays = HolidayLONDON(), wday = 1:6)
but it detects only Sundays (not Christmas or other holidays).
Does anyone have an idea what to do?

You can try writing wrapper functions around the holiday functions in the package, extracting the holiday dates for each year, and cross-referencing those dates with your data:
library(timeDate)
c <- timeDate(data$Date)
b <- isHoliday(c, holidays = GBBankHoliday(), wday = 1:6)
years <- list(Year = c(2019,2018,2017,2016))
year_fun <- function(year){timeDate::.easter(year)}
purrr::map(years, year_fun)
$Year
GMT
[1] [2019-04-21] [2018-04-01] [2017-04-16] [2016-03-27]

I created a new binary variable in my data called "holidays". It is TRUE if the date is a UK holiday and FALSE otherwise, stored as a factor. The code is very simple:
library(timeDate)
data$holidays<-as.factor(data$Date %in% (as.Date(holidayLONDON(1990:2010))))
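The membership test in that one-liner can be sketched in base R. The `dates` and `uk_holidays` vectors below are small hand-written assumptions for illustration, not output from timeDate. A likely reason the original attempt flagged only Sundays is that the holiday functions were called without a year, so the holiday list did not cover the 20 years of data, while `wday = 1:6` treats Sunday as a non-working day.

```r
# Hypothetical dates to classify (stand-in for data$Date)
dates <- as.Date(c("2009-12-24", "2009-12-25", "2010-01-01", "2010-01-04"))

# Hand-written holiday vector covering the same period (illustrative only);
# in practice it must span every year present in the data
uk_holidays <- as.Date(c("2009-12-25", "2009-12-28", "2010-01-01"))

# TRUE where the date appears in the holiday vector
is_holiday <- dates %in% uk_holidays
print(data.frame(dates, is_holiday))
```

The key point is that the holiday vector has to cover all years in the data, which is what `holidayLONDON(1990:2010)` does in the answer above.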

Related

NA output using add.bizdays in R

I created a custom calendar using the create.calendar function from the bizdays package, and a vector Date_A with every day of 2023. I need to add 8 working days to each date and store the result in Date_B, so I used the add.bizdays function. However, the output is NA for the last 19 values, although they should be valid dates. For example, for "2023-12-13" in Date_A the value in Date_B should be "2023-12-26", but it is NA. How can I get valid results instead of those missing values?
The code is as follows:
library(bizdays)
library(lubridate)
feriados_CR <- as.Date(c(
"2023-01-01","2023-01-02","2023-01-03","2023-01-04","2023-01-05","2023-01-06","2023-04-06",
"2023-04-07","2023-04-10","2023-05-01","2023-07-24","2023-08-02","2023-08-14","2023-09-03",
"2023-09-15","2023-12-01","2023-12-25"))
calendar <- create.calendar("Costa_Rica",
holidays = feriados_CR,
weekdays = c("saturday", "sunday"))
Date_A <- seq(ymd("2023-01-01"), ymd("2023-12-31"), by = "days")
Date_B <- add.bizdays(Date_A, 8, "Costa_Rica")
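As a cross-check of the expected result, here is a minimal base R sketch that adds working days by stepping forward one day at a time. The helper name `add_workdays` is hypothetical and this is not how bizdays works internally. One thing worth checking in the original code: when `start.date` and `end.date` are not passed to `create.calendar`, bizdays takes the calendar range from the holidays vector, so the calendar here would end at 2023-12-25, which may explain the NA values. Passing `end.date` explicitly is an untested suggestion.

```r
feriados_CR <- as.Date(c(
  "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05",
  "2023-01-06", "2023-04-06", "2023-04-07", "2023-04-10", "2023-05-01",
  "2023-07-24", "2023-08-02", "2023-08-14", "2023-09-03", "2023-09-15",
  "2023-12-01", "2023-12-25"))

# Hypothetical helper: move forward day by day, counting only days that
# are neither Saturday/Sunday nor listed in `holidays`
add_workdays <- function(start, n, holidays) {
  d <- start
  added <- 0
  while (added < n) {
    d <- d + 1
    is_weekend <- as.POSIXlt(d)$wday %in% c(0, 6)  # 0 = Sunday, 6 = Saturday
    if (!is_weekend && !(d %in% holidays)) added <- added + 1
  }
  d
}

result <- add_workdays(as.Date("2023-12-13"), 8, feriados_CR)
print(result)
```

The loop is slow for long vectors but makes the counting rule explicit, which is useful for verifying what a calendar package returns.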

I need help writing a function to count the number of holidays within a time period using lubridate in R

I am attempting to write a function that counts the number of holidays a person worked in my organization between their start and term date in the year 2017. My organization recognized 6 holidays that year:
New Year's Day - 2017-01-02
Memorial Day - 2017-05-29
Independence Day - 2017-07-04
Labor Day - 2017-09-04
Thanksgiving Day - 2017-11-23
Christmas Day - 2017-12-25
I used lubridate and dplyr to combine my year, month, and day columns into complete dates like so:
dates<- data %>% mutate("Term Date" = make_date(month = `Term Month`,
day = data$`Term Day`,
year =data$`Term Year`),
"Start Date"= make_date(month = data$`Start Month`,
day = data$`Start Day`,
year = data$`Start Year`))
I then went on to attempt to write my function.
holidays <- function(x){
z<- 0
if( ymd("2017-01-01") %within% interval(dates$`Start Date`, dates$`Term Date`)){
z <- z + 1
}
print(z)
}
This was only my first step. My goal was to first make my function work for New Year's Day and then build in the other holidays step by step using if statements. I was unable to get the apply function to work correctly and am unsure whether my function even works. I attempted to apply the function like so:
apply(dates,2,holidays)
But I got an error.
Does anyone have any advice?
Putting the holidays in a vector:
holidays <- as.Date(c('2017-01-02', '2017-05-29', '2017-07-04', '2017-09-04', '2017-11-23', '2017-12-25'))
Extracting the day of the year (to make it independent of the year); "%j" stands for day of year:
holidays <- format(as.Date(holidays), "%j")
Generating some random data to test (1000 uniformly distributed work entries in 2017, 5 employees):
d <- data.frame(
'date' = as.Date(as.integer(runif(1000, 17167, 17531)), origin = '1970-01-01'),
'emp' = sample(LETTERS[1:5], 1000, replace = T)
)
Keeping only the entries that fall on a holiday:
h <- d[format(d$date, "%j") %in% holidays, ]
Counting number of holidays worked per employee using aggregate():
aggregate(h$date, list(h$emp), length)
# Group.1 x
#1 A 3
#2 B 4
#3 C 2
#4 D 5
#5 E 1
NB: this will work for 2017 but not for leap years (one workaround that doesn't involve altering the code too much is to change the year in the holiday vector manually).
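One way around the leap-year caveat, sketched here as an assumption rather than as part of the original answer, is to compare month and day ("%m-%d") instead of day of year ("%j"). The `worked` vector is hypothetical test data.

```r
holidays <- as.Date(c("2017-01-02", "2017-05-29", "2017-07-04",
                      "2017-09-04", "2017-11-23", "2017-12-25"))

# Month-day keys do not shift after 29 February, unlike day-of-year numbers
holiday_md <- format(holidays, "%m-%d")

# Hypothetical work dates, including one in the leap year 2016
worked <- as.Date(c("2016-07-04", "2016-07-05", "2017-12-25"))
on_holiday <- format(worked, "%m-%d") %in% holiday_md
print(on_holiday)

# For comparison: with "%j", 4 July 2016 is day 186 but 4 July 2017 is day 185
print(format(as.Date(c("2016-07-04", "2017-07-04")), "%j"))
```

This only suits holidays with a fixed calendar date. Floating holidays such as Memorial Day or Thanksgiving fall on different dates each year, so they would still need a per-year holiday vector.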

Format historical data for forecasting with calendar variables

I have hourly time series data for the year 2015. This data corresponds to power consumption of a big commercial building. I want to use this data to predict the usage for the year 2016. To develop a forecasting model, I need to format this data in a suitable format.
I am planning to use following features to predict the 2016 usage: (1) day of week, (2) time of the day (3) temperature, (4) year 2015 usage.
I am able to create the first 3 features but the fourth one seems tricky.
How should I arrange the 2015 data so that, for a particular day of 2016, I can use the data of the corresponding day of 2015? My concerns are:
(1) I should not use the data of a 2015 weekend day to predict the usage of a working day.
(2) There are some days in 2015 where the data is missing for the entire day. For the corresponding day in 2016, how should I account for these missing readings?
Here, I have created dummy data corresponding to the year 2015 and 2016.
library(xts)
set.seed(123)
seq1 <- seq(as.POSIXct("2015-01-01"),as.POSIXct("2015-12-31"), by = "hour")
data1 <- xts(rnorm(length(seq1),150,5),seq1)
seq2 <- seq(as.POSIXct("2016-01-01"),as.POSIXct("2016-09-30"), by = "hour")
data2 <- xts(rnorm(length(seq2),140,5),seq2)
Let me give an example to clarify my problem:
Suppose model is: lm( output ~ dayofweek + timeofday + temperature + lastyearusage, data = xxx)
Now suppose I want to predict the usage on 2 Oct 2016 (dayY) using the lastyearusage on 2 Oct 2015 (dayX). The issue in this step is: how should I ensure that dayX is not a weekend day if dayY is a working day? I am sure that if I use dayX to predict dayY without checking the day type, the output will get messy.
There might already be a function in a package to do this, but I post here a custom function that adds all these kinds of calendar variables (including the weekend info) to a data.frame containing a date/hour column. Fake data:
df <- data.frame(datetime=seq(as.POSIXlt("2013/01/01 00:00:00"), as.POSIXlt("2013/12/31 23:00:00"), by="hour"), variable=rnorm(8760))
#### datetime variable
#### 1 2013-01-01 00:00:00 1.68959052
#### 2 2013-01-01 01:00:00 0.02023722
#### 3 2013-01-01 02:00:00 -0.42080942
The code for the function:
CreateCalendarVariables = function(df, id_column=NULL) {
df <- data.frame(df)
if (is.null(id_column)) stop("Id column for the datetime variable is a mandatory argument")
temp <- df[, id_column]
if ( !(class(temp)[1] %in% c("Date", "POSIXct", "POSIXt", "POSIXlt")) ){
stop("the indicated datetime variable doesn't have the suitable format")
}
require(lubridate)
df['year'] <- year(temp)
df['.quarter'] <- quarter(temp)
df['.month'] <- month(temp)
df['.week'] <- week(temp)
df['.DMY'] <- as.Date(temp)
df['.dayinyear'] <- yday(temp)
df['.dayinmonth'] <- mday(temp)
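wd <- wday(temp, label = TRUE, abbr = FALSE)  # avoids %>%, which lubridate does not provide
df['.weekday'] <- factor(wd, levels = levels(wd)[c(2, 3, 4, 5, 6, 7, 1)])  # reorder so the week starts on Monday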
df['.is_we'] <- df$.weekday %in% c("Saturday", "Sunday")
if(class(temp)[1] != "Date"){
df['.hour'] <- factor(hour(temp))
}
return(df)
}
Then you just have to specify the number of the column containing the datetime. If your model needs these variables as factors, feel free to adapt the code:
CreateCalendarVariables(df, 2)
#### Error in CreateCalendarVariables(df, 2) :
#### the indicated datetime variable doesn't have the suitable format
CreateCalendarVariables(df, 1)
#### datetime variable year .quarter .month .week .DMY .dayinyear .dayinmonth .weekday .is_we .hour
#### 1 2013-01-01 00:00:00 1.68959052 2013 1 1 1 2012-12-31 1 1 Tuesday FALSE 0
#### 2 2013-01-01 01:00:00 0.02023722 2013 1 1 1 2013-01-01 1 1 Tuesday FALSE 1
To answer your last question: if an entire level is missing from the calibration dataset (e.g. one whole week, when you're using .week as a predictor), you'll need to impute the data first.
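For the question's first concern (pairing each 2016 day with a 2015 day of the same type), one simple option is to look back 364 days instead of one calendar year, since 364 = 52 * 7 keeps the weekday unchanged. The sketch below works on daily dates with a tiny made-up `usage_2015` table. All names and values are hypothetical, and holidays would still need separate handling.

```r
# Hypothetical target days in 2016
target <- as.Date(c("2016-10-02", "2016-10-03", "2016-02-29"))

# Going back exactly 52 weeks lands on the same weekday in the previous year
reference <- target - 364
same_weekday <- as.POSIXlt(target)$wday == as.POSIXlt(reference)$wday

# Made-up 2015 daily usage; days absent from this table count as missing
usage_2015 <- data.frame(date = as.Date(c("2015-10-04", "2015-10-05")),
                         usage = c(150, 148))

# match() returns NA when the reference day has no 2015 reading
last_year_usage <- usage_2015$usage[match(reference, usage_2015$date)]
print(data.frame(target, reference, same_weekday, last_year_usage))
```

The NA entries mark exactly the days that need imputing before the model can use lastyearusage as a predictor.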

Convert Julian date to calendar dates within a data frame

I have a data frame
> df
Age year sex
12 80210 F
13 9123 M
I want to convert the year value 80210 to 26june1982. How can I do this so that the new data frame contains year as a day-month-year date converted from Julian days?
You can convert Julian dates to dates using as.Date and specifying the appropriate origin:
as.Date(8210, origin=as.Date("1960-01-01"))
#[1] "1982-06-24"
However, 80210 needs an origin pretty long ago.
You should subtract the origin from the year column.
as.Date(c(80210,9123)-80210,origin='1982-06-26')
[1] "1982-06-26" "1787-11-08"
There are some options for doing this job in the R package date.
See for example on page 4, the function date.mmddyy, which says:
Given a vector of Julian dates, this returns them in the form “10/11/89”, “28/7/54”, etc.
Try this code:
age = c(12,13)
year = c(8210,9123)
sex = c("F","M")
df = data.frame(age, year, sex) # no cbind(), which would coerce every column to character
library(date)
date = date.mmddyy(year, sep = "/")
df2 = transform(df,year=date) #hint provided by jilber
df2
age year sex
1 12 6/24/82 F
2 13 12/23/84 M
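The same conversion can be done on the whole column in base R without an extra package. This sketch assumes the day counts start at 1960-01-01, the origin used in the `as.Date` answer above, and it uses 8210 rather than the question's 80210. The column names `date` and `date_dmy` are made up for illustration.

```r
df <- data.frame(age = c(12, 13), year = c(8210, 9123), sex = c("F", "M"))

# Convert day counts to Date objects, assuming an origin of 1960-01-01
df$date <- as.Date(df$year, origin = "1960-01-01")

# Numeric day-month-year text, which avoids locale-dependent month names
df$date_dmy <- format(df$date, "%d-%m-%Y")
print(df)
```

Keeping a real Date column alongside the formatted text makes later sorting and date arithmetic straightforward.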

Set day of week to be used by to.weekly

I am trying to convert a time series of daily data (only business days) contained in an xts object into a time series of weekly data. Specifically, I want the resulting time series to contain the end of week entries (meaning last business day of a week) of the original data. I've been trying to achieve this using the function to.weekly of the xts package.
In the discussion regarding another question (Wrong week-ending date using 'to.weekly' function in 'xts' package) the below example code achieved exactly what I need. However, when I run the code, to.weekly uses Mondays as a representative for the weekly data.
I am wondering which global setting might allow me to force to.weekly to use Friday as a week's representative.
Example code:
library(lubridate); library(xts)
test.dates <- seq(as.Date("2000-01-01"),as.Date("2011-10-01"),by='days')
test.dates <- test.dates[wday(test.dates)!=1 & wday(test.dates)!=7] #Remove weekends
test.data <- rnorm(length(test.dates),mean=1,sd=2)
test.xts <- xts(x=test.data,order.by=test.dates)
test.weekly <- to.weekly(test.xts)
test.weekly[wday(test.weekly, label = TRUE, abbr = TRUE) != "Fri"]
test.dates <- test.dates[wday(test.dates)==6]
tail(wday(test.dates, label = TRUE, abbr = TRUE))
#[1] Fri Fri Fri Fri Fri Fri
#Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
OK. With the unstated requirements added to the problem:
require(timeDate)
require(lubridate)
startDate <- as.Date("2000-01-03")
endDate <- as.Date("2011-10-01")
AllDays <- as.timeDate(seq(startDate, endDate, by="day"))
is.wrk <- isBizday(AllDays, holidays = holidayNYSE(), wday = 1:5)
is.wrkdt <- as.Date(names(is.wrk)[is.wrk])
endweeks <- tapply(is.wrkdt, paste(year(is.wrkdt),week(is.wrkdt), sep = ""), max)
head(as.Date(endweeks, origin="1970-01-01"))
# 1 2 3 4 5 6
#"2011-01-06" "2011-01-13" "2011-01-20" "2011-01-27" "2011-02-03" "2011-02-10"
So you want:
as.Date(endweeks, origin="1970-01-01")
I had the same problem and I found a two-lines solution.
You first need to retain only weekdays, Monday to Friday (in case your data set also contains weekend days):
test.dates <- test.dates[ wday(test.dates) %in% c(2:6) ]
Then you have two alternatives. First, you can use to.weekly() which retains the most recent business day, i.e. not necessarily constrained to wday(test.dates)==6
test.weekly <- to.weekly(test.xts)
Or you can use the function endpoints() which works on multi-columns xts objects and deals much better with NA's because it does not remove missing data (preventing the warning "missing values removed from data")
test.weekly <- test.xts[endpoints(test.xts,on='weeks')[-1],]
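To make the "last available business day of each week" idea concrete without xts, here is a minimal base R sketch. It groups dates into Monday-to-Sunday weeks with integer arithmetic and keeps the last date in each group. The `days` vector is hypothetical, and this is not how `to.weekly` or `endpoints` are implemented.

```r
# Hypothetical business days: two weeks in January 2000,
# with Friday 2000-01-14 missing (e.g. a holiday)
days <- as.Date(c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
                  "2000-01-07", "2000-01-10", "2000-01-11", "2000-01-12",
                  "2000-01-13"))
days <- sort(days)

# 1970-01-01 (day 0) was a Thursday, so adding 3 makes each
# block of 7 consecutive day numbers run from Monday to Sunday
week_id <- (as.integer(days) + 3) %/% 7

# Keep the last date within each week
week_end <- days[!duplicated(week_id, fromLast = TRUE)]
print(week_end)
```

In the second week the representative is the Thursday, because the Friday is absent from the data. That matches the behaviour described above of keeping the most recent business day rather than forcing a Friday.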
