Set day of week to be used by to.weekly - r

I am trying to convert a time series of daily data (only business days) contained in an xts object into a time series of weekly data. Specifically, I want the resulting time series to contain the end of week entries (meaning last business day of a week) of the original data. I've been trying to achieve this using the function to.weekly of the xts package.
In the discussion regarding another question (Wrong week-ending date using 'to.weekly' function in 'xts' package) the below example code achieved exactly what I need. However, when I run the code, to.weekly uses Mondays as a representative for the weekly data.
I am wondering which global setting might allow me to force to.weekly to use Friday as a week's representative.
Example code:
library(lubridate); library(xts)
test.dates <- seq(as.Date("2000-01-01"),as.Date("2011-10-01"),by='days')
test.dates <- test.dates[wday(test.dates)!=1 & wday(test.dates)!=7] #Remove weekends
test.data <- rnorm(length(test.dates),mean=1,sd=2)
test.xts <- xts(x=test.data,order.by=test.dates)
test.weekly <- to.weekly(test.xts)
test.weekly[wday(test.weekly, label = TRUE, abbr = TRUE) != "Fri"]

test.dates <- test.dates[wday(test.dates)==6]
tail(wday(test.dates, label = TRUE, abbr = TRUE))
#[1] Fri Fri Fri Fri Fri Fri
#Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
OK. With the unstated requirements added to the problem:
require(timeDate)
require(lubridate)
startDate <- as.Date("2000-01-03")
endDate <- as.Date("2011-10-01")
AllDays <- as.timeDate(seq(startDate, endDate, by="day"))
is.wrk <- isBizday(AllDays, holidays = holidayNYSE(), wday = 1:5)
is.wrkdt <- as.Date(names(is.wrk)[is.wrk])
endweeks <- tapply(is.wrkdt, paste(year(is.wrkdt),week(is.wrkdt), sep = ""), max)
head(as.Date(endweeks, origin="1970-01-01"))
# 1 2 3 4 5 6
#"2011-01-06" "2011-01-13" "2011-01-20" "2011-01-27" "2011-02-03" "2011-02-10"
So you want:
as.Date(endweeks, origin="1970-01-01")

I had the same problem and found a two-line solution.
You first need to retain only business days (this filter keeps Monday to Friday, i.e. it drops weekend entries):
test.dates <- test.dates[ wday(test.dates) %in% c(2:6) ]
Then you have two alternatives. First, you can use to.weekly() which retains the most recent business day, i.e. not necessarily constrained to wday(test.dates)==6
test.weekly <- to.weekly(test.xts)
Or you can use the function endpoints() which works on multi-columns xts objects and deals much better with NA's because it does not remove missing data (preventing the warning "missing values removed from data")
test.weekly <- test.xts[endpoints(test.xts,on='weeks')[-1],]
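For readers who want to see what "last available business day of each week" means without xts, here is a minimal base-R sketch of the same idea. It assumes weeks are ISO weeks (Monday to Sunday, via the "%G-%V" format), and the dropped Friday is an invented holiday used purely for illustration; endpoints(x, on = "weeks") performs the analogous grouping on an xts index.

```r
# Two business weeks in January 2021 (2021-01-04 is a Monday).
d <- seq(as.Date("2021-01-04"), as.Date("2021-01-15"), by = "day")
d <- d[!(as.POSIXlt(d)$wday %in% c(0, 6))]  # drop Sunday (0) and Saturday (6)
d <- d[d != as.Date("2021-01-08")]          # assumption: pretend this Friday is a holiday

# Label every date with its ISO year-week, then keep the last row of each week.
wk <- format(d, "%G-%V")
last_idx <- as.integer(tapply(seq_along(d), wk, max))
week_ends <- d[last_idx]

print(week_ends)
```

The first week ends on the Thursday because its Friday is missing, which is the behaviour described above: the most recent business day is retained rather than a fixed weekday.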

Related

add_months function in Spark R

I have a variable of the form "2020-09-01". I need to increase and decrease this by 3 months and 5 months and store the results in other variables. I need the syntax in SparkR, but any other method will also work. Thanks.
In R following code works fine
y <- as.Date(load_date,"%Y-%m-%d") %m+% months(i)
The code below didn't work. The error says:
unable to find an inherited method for function ‘add_months’ for signature ‘"Date", "numeric"’
loaddate = 202009
year <- substr(loaddate,1,4)
month <- substr(loaddate,5,6)
load_date <- paste(year,month,"01",sep = "-")
y <- as.Date(load_date,"%Y%m%d")
y1 <- add_months(y,-3)
Expected Result - 2020-06-01
The lubridate package makes dealing with dates much easier. Here I have shuffled as.Date up a step, then simply subtract 3 months.
library(lubridate)
loaddate = 202009
year <- substr(loaddate,1,4)
month <- substr(loaddate,5,6)
load_date <- as.Date(paste(year,month,"01",sep = "-"))
new_date <- load_date - months(3)
new_date
Output:
Date[1:1], format: "2020-06-01"
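If adding a package is not an option, the same month shift can be sketched in base R with seq() on a Date. This is only a sketch under stated assumptions: shift_months is a hypothetical helper name, it operates on ordinary R Date objects rather than SparkR columns, and it is only safe for days that exist in every month (such as the 1st).

```r
loaddate <- 202009

# Build the first day of the month directly from the yyyymm value.
load_date <- as.Date(paste0(loaddate, "01"), format = "%Y%m%d")

# Hypothetical helper: step n months forward (n > 0) or backward (n < 0).
shift_months <- function(d, n) {
  seq(d, by = paste(n, "months"), length.out = 2)[2]
}

minus3 <- shift_months(load_date, -3)
plus5  <- shift_months(load_date, 5)

print(minus3)
print(plus5)
```

For end-of-month dates (for example the 31st) seq() can roll over into the following month, which is one reason the lubridate operators are often preferred.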

Detect UK holidays in date data

I am working in R. I have 20 years of data and I would like to check whether each given date is a UK holiday, creating a categorical variable (TRUE/FALSE).
I used this code:
library(timeDate)
c <- timeDate(data$Date)
b <- isHoliday(c, holidays = GBBankHoliday(), wday = 1:6)
or
b <- isHoliday(c, holidays = holidayLONDON(), wday = 1:6)
but it detects only Sundays (not Christmas or other holidays).
Does anyone have an idea what to do?
You can try creating wrapper functions for various holidays in the package, and extracting the dates for the holidays, and cross-referencing those dates for your analysis:
library(timeDate)
c <- timeDate(data$Date)
b <- isHoliday(c, holidays = GBBankHoliday(), wday = 1:6)
years <- list(Year = c(2019,2018,2017,2016))
year_fun <- function(year){timeDate::.easter(year)}
purrr::map(years, year_fun)
$Year
GMT
[1] [2019-04-21] [2018-04-01] [2017-04-16] [2016-03-27]
I created a new binary variable in my data called "holidays". If the date is a UK holiday the value is TRUE; if the date is not a holiday the value is FALSE. The code is very simple:
library(timeDate)
data$holidays<-as.factor(data$Date %in% (as.Date(holidayLONDON(1990:2010))))
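The lookup itself is just a membership test between two Date vectors, which can be sketched without any package. The holiday vector below is a hand-typed, deliberately incomplete subset of 2019 UK bank holidays used only for illustration; in practice it would come from a calendar source such as holidayLONDON().

```r
# Assumption: illustrative, incomplete list of 2019 UK bank holidays.
uk_holidays <- as.Date(c("2019-01-01", "2019-04-22", "2019-12-25", "2019-12-26"))

# Toy data with a Date column, mirroring data$Date in the question.
data <- data.frame(Date = as.Date(c("2019-12-24", "2019-12-25", "2019-04-22")))

# Both sides are plain Date objects, so %in% compares like with like.
data$is_holiday <- data$Date %in% uk_holidays

print(data)
```

Converting both the data column and the holiday calendar to the same class before comparing avoids mismatches between timeDate and Date objects.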

R: Best way around as.POSIXct() in apply function

I'm trying to set up a new variable that incorporates the difference (in number of days) between a known date and the end of a given year. Dummy data below:
> Date.event <- as.POSIXct(c("12/2/2000","8/2/2001"), format = "%d/%m/%Y", tz = "Europe/London")
> Year = c(2000,2001)
> Dates.test <- data.frame(Date.event,Year)
> Dates.test
Date.event Year
1 2000-02-12 2000
2 2001-02-08 2001
I've tried applying a function to achieve this, but it returns an error
> Time.dif.fun <- function(x) {
+ as.numeric(as.POSIXct(sprintf('31/12/%s', s= x['Year']),format = "%d/%m/%Y", tz = "Europe/London") - x['Date.event'])
+ }
> Dates.test$Time.dif <- apply(
+ Dates.test, 1, Time.dif.fun
+ )
Error in unclass(e1) - e2 : non-numeric argument to binary operator
It seems that apply() does not like as.POSIXct(), as testing a version of the function that only derives the end of year date, it is returned as a numeric in the form '978220800' (e.g. for end of year 2000). Is there any way around this? For the real data the function is a bit more complex, including conditional instances using different variables and sometimes referring to previous rows, which would be very hard to do without apply.
Here are some alternatives:
1) Your code works with these changes. We factored out s, not because it is necessary, but only because the following line gets very hard to read without that due to its length. Note that if x is a data frame then so is x["Year"] but x[["Year"]] is a vector as is x$Year. Since the operations are all vectorized we do not need apply.
Although we have not made this change, it would be a bit easier to define s as s <- paste0(x$Year, "-12-31") in which case we could omit the format argument in the following line owing to the use of the default format.
Time.dif.fun <- function(x) {
s <- sprintf('31/12/%s', x[['Year']])
as.numeric(as.POSIXct(s, format = "%d/%m/%Y", tz = "Europe/London") -x[['Date.event']])
}
Time.dif.fun(Dates.test)
## [1] 323 326
2) Convert to POSIXlt, set the year, month and day to the end of the year and subtract. Note that the year component uses years since 1900 and the mon component uses Jan = 0, Feb = 1, ..., Dec = 11. See ?as.POSIXlt for details on these and other components:
lt <- as.POSIXlt(Dates.test$Date.event)
lt$year <- Dates.test$Year - 1900
lt$mon <- 11
lt$mday <- 31
as.numeric(lt - Dates.test$Date.event)
## [1] 323 326
3) Another possibility is:
with(Dates.test, as.numeric(as.Date(paste0(Year, "-12-31")) - as.Date(Date.event)))
## [1] 323 326
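The error in the question can be traced to how apply() handles a data frame: it first converts it with as.matrix(), and a mix of date-time and numeric columns becomes a character matrix, so each row reaches the function as strings. The sketch below, which rebuilds the dummy data with ISO-formatted dates, shows that coercion and then a vectorized alternative.

```r
Dates.test <- data.frame(
  Date.event = as.POSIXct(c("2000-02-12", "2001-02-08"), tz = "Europe/London"),
  Year = c(2000, 2001)
)

# Each row handed to the function by apply() is a character vector.
row_classes <- apply(Dates.test, 1, function(x) class(x))
print(row_classes)

# Vectorized alternative: build all end-of-year dates at once and subtract.
year_end <- as.POSIXct(paste0(Dates.test$Year, "-12-31"), tz = "Europe/London")
days_left <- as.numeric(difftime(year_end, Dates.test$Date.event, units = "days"))
print(days_left)
```

When row-wise logic is unavoidable, iterating over seq_len(nrow(df)) and indexing the data frame keeps each column's class intact.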
You could use the difftime function:
Dates.test$diff_days <- difftime(as.POSIXct(paste0(Dates.test[,2],"-12-31"),format = "%Y-%m-%d", tz = "Europe/London"),Dates.test[,1],units="days")
You can use ISOdate to build the end-of-year date, and difftime(..., units='days') to get the days until the end of the year.
From ?difftime:
Limited arithmetic is available on "difftime" objects: they can be
added or subtracted, and multiplied or divided by a numeric vector.
If you want to do more than the limited arithmetic, just coerce with as.numeric(), but you will have to stick with whatever units you specified.
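By convention, you may wish to use the beginning of the next year (midnight at the end of New Year's Eve) as your endpoint for that year. Note that ISOdate() defaults to 12:00 GMT rather than midnight, which is why the differences in the example below end in .5. For example: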
Dates.test <- data.frame(
Date.event = as.POSIXct(c("12/2/2000","8/2/2001"),
format = "%d/%m/%Y", tz = "Europe/London")
)
# helper to get the year of a date (data.table::year() would also work)
year <- function(x) as.POSIXlt(x)$year + 1900L
Dates.test$Date.end <- ISOdate(year(Dates.test$Date.event)+1,1,1)
# if you don't want class 'difftime', wrap it in as.numeric(), as in:
Dates.test$Date.diff <- as.numeric(
difftime(Dates.test$Date.end,
Dates.test$Date.event,
units='days')
)
Dates.test
# Date.event Date.end Date.diff
# 1 2000-02-12 2001-01-01 12:00:00 324.5
# 2 2001-02-08 2002-01-01 12:00:00 327.5
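The half days in that output come from ISOdate(), which fills in hour = 12 when no time is supplied. A minimal sketch of a midnight endpoint, assuming the end of year should be expressed in the same time zone as the events, uses ISOdatetime() with an explicit time:

```r
Date.event <- as.POSIXct(c("12/2/2000", "8/2/2001"),
                         format = "%d/%m/%Y", tz = "Europe/London")

# Year of each event, taken from the POSIXlt representation.
yr <- as.POSIXlt(Date.event)$year + 1900L

# Midnight at the start of the following year, in the events' time zone.
Date.end <- ISOdatetime(yr + 1L, 1, 1, 0, 0, 0, tz = "Europe/London")

Date.diff <- as.numeric(difftime(Date.end, Date.event, units = "days"))
print(Date.diff)
```

With this endpoint the result is a whole number of days for these two events, one more than the differences measured to 31 December.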
The apply() family are basically a clean way of doing for loops, and you should strive for more efficient, vectorized solutions.

Format historical data for forecasting with calendar variables

I have hourly time series data for the year 2015. This data corresponds to power consumption of a big commercial building. I want to use this data to predict the usage for the year 2016. To develop a forecasting model, I need to format this data in a suitable format.
I am planning to use following features to predict the 2016 usage: (1) day of week, (2) time of the day (3) temperature, (4) year 2015 usage.
I am able to create the first 3 features but the fourth one seems tricky.
How should I arrange the 2015 data so that, for a particular day of 2016, I can use the data from the corresponding day of 2015? My concerns are:
1) I should not use weekend data from 2015 to predict the usage of a working day.
2) There are some days in 2015 where data is missing for the entire day. For the corresponding day in 2016, how should I account for these missing readings?
Here, I have created dummy data corresponding to the year 2015 and 2016.
library(xts)
set.seed(123)
seq1 <- seq(as.POSIXct("2015-01-01"),as.POSIXct("2015-12-31"), by = "hour")
data1 <- xts(rnorm(length(seq1),150,5),seq1)
seq2 <- seq(as.POSIXct("2016-01-01"),as.POSIXct("2016-09-30"), by = "hour")
data2 <- xts(rnorm(length(seq2),140,5),seq2)
Let me give an example to clarify my problem:
Suppose model is: lm( output ~ dayofweek + timeofday + temperature + lastyearusage, data = xxx)
Now suppose I want to predict the usage on 2 Oct 2016 (dayY), using the lastyearusage on 2 Oct 2015 (dayX). In this step, the issue is: how should I ensure that dayX is not a weekend day if dayY is a working day? I am sure that if I use dayX to predict dayY without checking the day type, the output will get messy.
There might already be a function in a package to do this, but I post here a custom function that adds all these kinds of calendar variables (including the weekend info) to a data.frame containing a date/hour column. Fake data:
df <- data.frame(datetime=seq(as.POSIXlt("2013/01/01 00:00:00"), as.POSIXlt("2013/12/31 23:00:00"), by="hour"), variable=rnorm(8760))
#### datetime variable
#### 1 2013-01-01 00:00:00 1.68959052
#### 2 2013-01-01 01:00:00 0.02023722
#### 3 2013-01-01 02:00:00 -0.42080942
The code for the function:
CreateCalendarVariables = function(df, id_column=NULL) {
df <- data.frame(df)
if (is.null(id_column)) stop("Id column for the datetime variable is a mandatory argument")
temp <- df[, id_column]
if ( !(class(temp)[1] %in% c("Date", "POSIXct", "POSIXt", "POSIXlt")) ){
stop("the indicated datetime variable doesn't have the suitable format")
}
require(lubridate)
require(magrittr) # provides the %>% pipe used below
df['year'] <- year(temp)
df['.quarter'] <- quarter(temp)
df['.month'] <- month(temp)
df['.week'] <- week(temp)
df['.DMY'] <- as.Date(temp)
df['.dayinyear'] <- yday(temp)
df['.dayinmonth'] <- mday(temp)
df['.weekday'] <- wday(temp, label=T, abbr=FALSE) %>% factor(., levels=levels(.)[c(2,3,4,5,6,7,1)])
df['.is_we'] <- df$.weekday %in% c("Saturday", "Sunday")
if(class(temp)[1] != "Date"){
df['.hour'] <- factor(hour(temp))
}
return(df)
}
Then you just have to specify the index of the column containing the datetime variable. If your model needs these variables in factor format, feel free to adapt the code:
CreateCalendarVariables(df, 2)
#### Error in CreateCalendarVariables(df, 2) :
#### the indicated datetime variable doesn't have the suitable format
CreateCalendarVariables(df, 1)
#### datetime variable year .quarter .month .week .DMY .dayinyear .dayinmonth .weekday .is_we .hour
#### 1 2013-01-01 00:00:00 1.68959052 2013 1 1 1 2012-12-31 1 1 Tuesday FALSE 0
#### 2 2013-01-01 01:00:00 0.02023722 2013 1 1 1 2013-01-01 1 1 Tuesday FALSE 1
To answer your last question: if an entire level is missing from the calibration dataset (e.g. one whole week while you are using .week as a predictor), you'll need to impute the data first.
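The core of the calendar-variable idea can also be sketched in base R using the fields of POSIXlt, which avoids any package dependency. The function name add_calendar_vars and its column names are hypothetical and cover only a subset of the variables above; the two timestamps are made up for the demonstration.

```r
# Hypothetical helper: append a few calendar variables derived from a datetime column.
add_calendar_vars <- function(df, col) {
  lt <- as.POSIXlt(df[[col]])
  df$year      <- lt$year + 1900L         # POSIXlt counts years since 1900
  df$month     <- lt$mon + 1L             # POSIXlt months run 0..11
  df$dayinyear <- lt$yday + 1L            # POSIXlt days of year run 0..365
  df$wday      <- lt$wday                 # 0 = Sunday, ..., 6 = Saturday
  df$is_we     <- lt$wday %in% c(0L, 6L)  # weekend flag
  df$hour      <- lt$hour
  df
}

demo <- data.frame(
  datetime = as.POSIXct(c("2015-10-02 09:00:00", "2015-10-03 18:00:00"), tz = "UTC")
)
out <- add_calendar_vars(demo, "datetime")
print(out)
```

A weekend flag like is_we is what lets a model, or a pre-filtering step, avoid pairing a working day with a weekend day from the previous year.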

Get vector of Tuesdays, but if Tuesday falls on a holiday, then replace it with Wednesday in R

I would like to find all of the Tuesdays between two dates. But if the Tuesday falls on a user-defined list of holidays, then I would like Wednesday instead.
This code works in my tests, but it is pretty janky and I am afraid it will fail silently.
low.date <- "1996-01-01"
high.date <- "1997-01-01"
holidays = c("01-01", "07-04", "12-25")
tues <- seq(as.Date(low.date), as.Date(high.date), by = 1)
tues <- subset(tues, format(tues, "%a") == "Tue")
tues <- ifelse(format(tues, "%m-%d") %in% holidays, tues + 1, tues)
tues <- as.Date(tues, origin = "1970-01-01")
Thanks! I see answers pointing to the timeDate package, but I only see methods for finding business days or holidays. Is there a cleaner/safer logic than what I'm using?
It's difficult to modify the logic of your solution, but here is a different form using the wday function from the lubridate package.
hol_tue <- wday(tues) == 3L & format(tues, "%m-%d") %in% holidays
wday(tues)[hol_tue] <- 4
Slightly inconveniently, in the lubridate package the day count starts from Sunday, with Sunday being day 1, as opposed to POSIXlt, where it is 0.
POSIXlt in the base package gives you access to wday as a number, which is a little safer since names of days change from system to system.
low.date <- "1996-01-01"
high.date <- "1997-01-01"
holidays <- c("01-01", "07-04", "12-25")
all.days <- seq(as.Date(low.date), as.Date(high.date), by = "day")
# Tuesday is Day 2 of the week
all.tues <- all.days[as.POSIXlt(all.days)$wday == 2]
tues.holidays <- format(all.tues, "%m-%d") %in% holidays
all.tues[tues.holidays] <- all.tues[tues.holidays] + 1
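To address the worry about silent failure, the same logic can be wrapped in a function that stops if a shifted date still lands on a holiday (for example when the Wednesday is also on the list). This is a sketch: tuesdays_or_next is a hypothetical name, and the date range is chosen only because 25 December 2018 and 1 January 2019 both fell on a Tuesday.

```r
# Hypothetical helper: all Tuesdays in [from, to], moved to Wednesday on holidays.
tuesdays_or_next <- function(from, to, holidays) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  tues <- days[as.POSIXlt(days)$wday == 2]         # POSIXlt: 0 = Sunday, 2 = Tuesday
  is_hol <- format(tues, "%m-%d") %in% holidays
  tues[is_hol] <- tues[is_hol] + 1                 # shift to Wednesday
  # Fail loudly instead of silently if a replacement day is itself a holiday.
  stopifnot(!any(format(tues, "%m-%d") %in% holidays))
  tues
}

res <- tuesdays_or_next("2018-12-01", "2019-01-15", c("01-01", "07-04", "12-25"))
print(res)
```

Because the dates stay of class Date throughout, no conversion back from numbers with an origin is needed.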

Resources