R bizdays: trouble making it work

I'm trying to use the bizdays package to generate a vector of business days between two dates.
fer = as.data.frame(as.Date(fer[1:938]))
#Define default calendar
bizdays.options$set(default.calendar=fer)
dt1 = as.Date(Sys.Date())
dt2 = as.Date(Sys.Date()-(365*10)) #sample 10 year window
#Create date vector
datas = bizseq(dt2, dt1)
I get this error: "Error in bizseq.Date(dt2, dt1) : Given date out of range."
The same happens with any of the bizdays functions.
Any ideas?

I had a similar problem, but could not apply the accepted answer to my case. What worked for me was to make sure that the first and last holidays in the holidays vector span (or exceed) the range of dates passed to bizdays():
library(bizdays)
This works (from_date and to_date both lie within the first and last holiday provided by holidays):
holidays <- c("2016-08-10", "2016-08-13")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
#1
This does not work (to_date lies outside of the last holiday of holidays):
holidays <- c("2016-08-10", "2016-08-11")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
# Error in bizdays.Date(from, to, cal) : Given date out of range.
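The rule described above can be checked before the calendar is built. The following is a minimal base-R sketch, not part of bizdays: covers_range() is a hypothetical helper name, and it only encodes the observation that the first and last holiday must bracket the dates.

```r
# Hypothetical helper (not part of bizdays): TRUE when every date in
# `dates` lies between the first and last holiday, inclusive.
covers_range <- function(dates, holidays) {
  dates <- as.Date(dates)
  holidays <- as.Date(holidays)
  all(dates >= min(holidays) & dates <= max(holidays))
}

# Range inside the holidays: the working case
ok <- covers_range(c("2016-08-11", "2016-08-12"),
                   c("2016-08-10", "2016-08-13"))

# to_date after the last holiday: the failing case
bad <- covers_range(c("2016-08-11", "2016-08-12"),
                    c("2016-08-10", "2016-08-11"))
```

If the check fails, either extend the holiday vector or look at the start.date and end.date arguments that the bizdays documentation describes for calendar creation; whether they are available depends on the installed version.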

If fer is the holidays, you can try with:
bizdays.options$set(default.calendar=Calendar(holidays=fer))


tsibble -- how do you get around implicit gaps when there are none

I am new to the tsibble package. I have monthly data that I coerced to a tsibble to use the fable package. A few issues I am having:
It appears the index variable (from my testing) is not of class Date even though I applied lubridate's ymd function to it.
The has_gaps function returns FALSE, but when I model on the data I get the error ".data contains implicit gaps in time".
library(dplyr)
library(fable)
library(lubridate)
library(tsibble)
test <- data.frame(
YearMonth = c(20160101, 20160201, 20160301, 20160401, 20160501, 20160601,
20160701, 20160801, 20160901, 20161001, 20161101, 20161201),
Claims = c(13032647, 1668005, 24473616, 13640769, 17891432, 11596556,
23176360, 7885872, 11948461, 16194792, 4971310, 18032363),
Revenue = c(12603367, 18733242, 5862766, 3861877, 15407158, 24534258,
15633646, 13720258, 24944078, 13375742, 4537475, 22988443)
)
test_ts <- test %>%
mutate(YearMonth = ymd(YearMonth)) %>%
as_tsibble(
index = YearMonth,
regular = FALSE #because it picks up gaps when I set it to TRUE
)
# Are there any gaps?
has_gaps(test_ts, .full = T)
model_new <- test_ts %>%
model(
snaive = SNAIVE(Claims))
Warning messages:
1: 1 error encountered for snaive
[1] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.
Any help will be appreciated.
You have a daily index, but you want a monthly index. The simplest way is to use the tsibble::yearmonth() function, but you will need to convert the date to character first.
library(dplyr)
library(tsibble)
test <- data.frame(
YearMonth = c(20160101, 20160201, 20160301, 20160401, 20160501, 20160601,
20160701, 20160801, 20160901, 20161001, 20161101, 20161201),
Claims = c(13032647, 1668005, 24473616, 13640769, 17891432, 11596556,
23176360, 7885872, 11948461, 16194792, 4971310, 18032363),
Revenue = c(12603367, 18733242, 5862766, 3861877, 15407158, 24534258,
15633646, 13720258, 24944078, 13375742, 4537475, 22988443)
)
test_ts <- test %>%
mutate(YearMonth = yearmonth(as.character(YearMonth))) %>%
as_tsibble(index = YearMonth)
It looks like as_tsibble isn't able to recognize the interval properly in the YearMonth column because it is a Date class object. The 'Index' section of the help page hints that this might be the problem:
For a tbl_ts of regular interval, a choice of index representation has to be made. For example, a monthly data should correspond to time index created by yearmonth or zoo::yearmon, instead of Date or POSIXct.
Like that excerpt suggests you can get around the problem with yearmonth(). But that requires a little string manipulation first to get it into a format that will parse properly.
test_ts <- test %>%
mutate(YearMonth = gsub("(.{2})01$", "-\\1", YearMonth) %>%
yearmonth()
) %>%
as_tsibble(
index = YearMonth
)
Now the model should run error free! Not sure why the has_gaps() test is saying everything is okay in your example...
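One way to see why a Date index can look gappy: first-of-month dates are not evenly spaced in days, so when read as a daily series they appear to have missing days. A small base-R sketch using the question's YearMonth values (the variable names are illustrative, and no tsibble call is involved):

```r
ym <- c(20160101, 20160201, 20160301, 20160401, 20160501, 20160601,
        20160701, 20160801, 20160901, 20161001, 20161101, 20161201)

# Parse the numeric yyyymmdd values as dates
dates <- as.Date(as.character(ym), format = "%Y%m%d")

# Day gaps between consecutive months: 29, 30 or 31, never constant
day_gaps <- as.integer(diff(dates))
sort(unique(day_gaps))

# Year-month labels, by contrast, advance by exactly one month each step
ym_labels <- format(dates, "%Y-%m")
```

This is only an illustration of the spacing problem; yearmonth() remains the index type the help page recommends for monthly data.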

R - Date/Time Calculations

My Question is divided into 2 parts:
1st part:
I have a function, get_data(), which I use to pull information for a date range.
get_data <- function (fac_num, start_date, end_date) {
if (!(is.null(fac_num) | is.null(start_date) | is.null(end_date))) {
if(end_date - start_date > 7) {
start_date <- end_date - 7
#start_date <- as.Date('2017-07-05')
#end_date <- as.Date('2017-07-06')
#fac_num <- "005"
}
new_start_date <- paste0(start_date,' 05:00:00')
new_end_date <- paste0(end_date + 1,' 05:00:00')
qry <- paste0("SELECT FAC_NUM, USER_ID, APPL_ID, FUNC_ID, ST_ID, NXT_ST_ID, RESP_PRMT_DATA,
ST_DT_TM, END_DT_TM, RESP_PRMT_TY_CDE,
REQ_INP_DATA FROM OPSDBA.STG_RFS_INTERACTION WHERE TRANS_ST_DT_TM >= DATE'",
start_date,"' AND TRANS_ST_DT_TM BETWEEN TO_TIMESTAMP('",new_start_date,"', 'YYYY-MM-DD HH:MI:SS') AND TO_TIMESTAMP('",new_end_date,"', 'YYYY-MM-DD HH:MI:SS')
AND APPL_ID='CTS' AND FAC_NUM='",fac_num,"'")
and then I perform calculations on the result.
Further on in my program, I use this get_data() function to pull data for a new set of analyses.
rf_log_perform <- get_data(display_facility_decode(input$facNum2),
input$dateRange2, input$dateRange2 + 1)
Here, since I am using a single date instead of a range, I added one day to it so that get_data() would work.
I then wanted to modify the date range so that it does not show anything past 11:59 PM for the selected date.
rf_log_perform$date <- ifelse(strftime(rf_log_perform$st_dt_tm, format="%H:%M:%S")<'05:00:00',
format(as.POSIXct(strptime(rf_log_perform$st_dt_tm - 1*86400 , '%Y-%m-%d %H:%M:%S')),format = '%Y-%m-%d'),
format(as.POSIXct(strptime(rf_log_perform$st_dt_tm , '%Y-%m-%d %H:%M:%S')),format = '%Y-%m-%d'))
By using the get_data() function, I am able to pull data for the range 08/29/2017, 05:00:00 to 08/30/2017, 05:00:00, which is considered one day in my example.
But for my calculations, I want to discard everything beyond 08/29/2017, 11:59:59 PM, for more accurate results.
For this purpose, I added the ifelse statement above. But it isn't behaving as I expect, and I am confused about why not.
Unfortunately I still cannot comment on the main question.
I encourage you to make two adjustments to your question to improve the chances of getting an answer:
1) Please make your example reproducible, e.g. provide date ranges, wrap your code in a well-defined function, etc.
2) Explain what you are trying to achieve. What is your intention and expected result?
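Pending a reproducible example, here is a minimal sketch of one reading of the question: an operational day runs from 05:00 to 05:00 the next morning, and rows past midnight should be dropped. The timestamps and the UTC timezone are made up for illustration; only the st_dt_tm name comes from the question.

```r
# Made-up timestamps around the 05:00 boundary (UTC assumed throughout)
st_dt_tm <- as.POSIXct(c("2017-08-29 04:59:59", "2017-08-29 05:00:00",
                         "2017-08-29 23:59:59", "2017-08-30 00:00:00",
                         "2017-08-30 04:59:59"), tz = "UTC")

# Operational day: shift back 5 hours, then take the calendar date
op_day <- as.Date(st_dt_tm - 5 * 3600, tz = "UTC")

# Calendar day of the raw timestamp
cal_day <- as.Date(st_dt_tm, tz = "UTC")

# Keep rows of the selected operational day that fall before midnight
selected <- as.Date("2017-08-29")
keep <- op_day == selected & cal_day == selected
st_dt_tm[keep]
```

Comparing Date values directly avoids the string comparison and the format()/strptime() round trip inside ifelse(), which is one place where the original expression could go wrong.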

Calculating Business Days

I am trying to calculate business days between two dates. I successfully calculated the days excluding Saturday and Sunday using this question (Calculate the number of weekdays between 2 dates in R), and now I am trying to add national holidays to this code. How can I do that?
I used this code to calculate weekdays.
Nweekdays <- function(a, b) {
sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday"))}
Updated your function a bit so holidays can be added...
Nweekdays <- function(a, b, holidays, weekend) {
possible_days <- seq(a, b, "days")
# Count all days that are not weekend and
# are not holidays
sum(!weekdays(possible_days) %in% weekend & !possible_days %in% holidays)
}
weekend <- c("Saturday", "Sunday")
holidays <- as.Date(c("2017-12-31", "2017-12-24", "2017-07-04"))
Nweekdays(as.Date("2017-08-01"), as.Date("2017-12-31"), holidays, weekend)
[1] 109
While the Gregorian calendar is pretty global, the definition of weekend and holidays is dependent on country, region, etc.
Having some issues with the bizdays package, I came across this solution. I have tweaked it in two ways; one of them addresses the error that Marie reports in the comments.
First improvement:
weekend <- c("Saturday", "Sunday") is language dependent, so I switched to the wday function and use numbers to reference days. I also made Saturday and Sunday the default free days and added an option to include the last date or not:
library(lubridate) ## lubridate for wday function
CountWorkdays <- function(from, to, holidays = c(), free = c(7,1), include_last = FALSE) {
# Create list of all days
possible_days <- seq(from, to, "days")
# Include last? If not, remove last item.
if (!include_last) {
possible_days <- possible_days[-length(possible_days)]
}
# Count all days that are not weekend and are not holidays
return(sum(!wday(possible_days) %in% free & !possible_days %in% holidays))
}
Second improvement: If you want to use this function on a data frame you can use mapply, sapply or equivalent functions, but you can also vectorise the function so that it accepts vectors (it is then also usable inside dplyr::mutate). It is important to decide which arguments are treated as vectors and which are not. I chose to vectorise the from and to dates; the others are assumed equal for every row. (A situation where this might not hold is when you consider contract working days per row for people who work fewer than five days a week.)
CountWorkdaysV <- Vectorize(CountWorkdays, c("from", "to"))
This last adjustment seems to work, but I am not really sure about performance impacts so check before you adopt this function.
Hope this helps somebody who stumbles upon this older question via Google like I did.
2019, 2020, and 2021 US Federal Holidays from https://www.opm.gov/policy-data-oversight/pay-leave/federal-holidays/
holidays <- as.Date(c("2019-01-01", "2019-01-21", "2019-02-18", "2019-05-27", "2019-07-04", "2019-09-02", "2019-10-14", "2019-11-11", "2019-11-28", "2019-12-25",
"2020-01-01", "2020-01-20", "2020-02-17", "2020-05-25", "2020-07-03", "2020-09-07", "2020-10-12", "2020-11-11", "2020-11-26", "2020-12-25",
"2021-01-01", "2021-01-18", "2021-01-20", "2021-02-15", "2021-05-31", "2021-06-18", "2021-07-05", "2021-09-06", "2021-10-11", "2021-11-11", "2021-11-25", "2021-12-24"))
Example Use:
CountWorkdaysV(as.Date("2021-01-15"), as.Date("2021-01-31"), holidays = holidays, include_last = TRUE)
# 9 days
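If loading lubridate only for wday() is undesirable, the same locale-independent count can be sketched in base R. This is an alternative, not the answer's code: count_workdays_base() is a hypothetical name, it includes the last day by default, and it relies on as.POSIXlt()$wday numbering days from 0 (Sunday) to 6 (Saturday).

```r
# Hypothetical base-R variant; `free` uses POSIXlt numbering (0 = Sunday)
count_workdays_base <- function(from, to, holidays = as.Date(character(0)),
                                free = c(0, 6), include_last = TRUE) {
  days <- seq(from, to, by = "day")
  if (!include_last) days <- days[-length(days)]
  wd <- as.POSIXlt(days)$wday
  # Count days that are neither free weekdays nor holidays
  sum(!(wd %in% free) & !(days %in% holidays))
}

# January 2021 holidays taken from the list above
jan_2021 <- as.Date(c("2021-01-01", "2021-01-18", "2021-01-20"))
n <- count_workdays_base(as.Date("2021-01-15"), as.Date("2021-01-31"), jan_2021)
n  # 9, matching the example above
```

Like the original, this handles one from/to pair at a time, so it would still need Vectorize() or mapply() to run over data frame columns.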

invalid 'tz' value, problems with time zone

I'm working with minute data of NASDAQ, it has the index "2015-07-13 12:05:00 EST". I adjusted the system time with Sys.setenv(TZ = 'EST').
I want to program a simple buy/hold/sell strategy, therefore I create a vector of flat positions as a foundation.
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
Then I want to apply a constraint, that in a certain time window, positions are bound to be flat, which in my case means equal to 1.
pos_flat["T13:41/T14:00"] <- 1
And this returns the error:
"Error in as.POSIXlt.POSIXct(.POSIXct(.index(x)), tz = indexTZ(x)) :invalid 'tz' value".
I also get this error doing other calculations, I just used this example because it is easy and shows the problem.
As extra information:
> Sys.timezone
function (location = TRUE)
{
tz <- Sys.getenv("TZ", names = FALSE)
if (nzchar(tz))
return(tz)
if (location)
return(.Internal(tzone_name()))
z <- as.POSIXlt(Sys.time())
zz <- attr(z, "tzone")
if (length(zz) == 3L)
zz[2L + z$isdst]
else zz[1L]
}
<bytecode: 0x03648ff4>
<environment: namespace:base>
I don't understand the problem with the tz value... Any ideas?
The source of your "invalid 'tz' value" error is that, for whatever reason, R doesn't accept tz = df$var. If you set tz = 'America/New_York' or some other single character value, then it will work.
Better answer (instead of using force_tz below) for converting UTC times to various timezones based on location. It is also simpler and better than looping or using a nested ifelse. I subset and change tz based on a timezone column (which my data already has; if yours does not, you can create it). Just make sure you account for all timezones in your data (unique(df$timezone)).
df$datetime2[df$timezone == 'America/New_York'] <- format(df$datetime, tz="America/New_York")[df$timezone == 'America/New_York']
df$datetime2[df$timezone == 'America/Chicago'] <- format(df$datetime, tz="America/Chicago")[df$timezone == 'America/Chicago']
df$datetime2[df$timezone == 'America/Denver'] <- format(df$datetime, tz="America/Denver")[df$timezone == 'America/Denver']
df$datetime2[df$timezone == 'America/Los_Angeles'] <- format(df$datetime, tz="America/Los_Angeles")[df$timezone == 'America/Los_Angeles']
Previous solution: Converting to Local Time in R - Vector of Timezones
require(lubridate)
require(dplyr)
df = data.frame(timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), localzone = c("America/Los_Angeles", "America/New_York"), stringsAsFactors = F)
df$moment = as.POSIXct(df$timestring, format="%Y-%m-%d %H:%M:%S", tz="UTC")
df = df %>% rowwise() %>% mutate(localtime = force_tz(moment, localzone))
df
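Because format() takes a single tz value, a column of zone names has to be applied one row at a time. A base-R sketch that reuses the sample data above; the localtime column is built here for illustration, and the zone names are assumed to exist in the system's timezone database.

```r
df <- data.frame(
  timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"),
  localzone  = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors = FALSE
)
df$moment <- as.POSIXct(df$timestring, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Format each instant in its own zone. The result is character, not POSIXct,
# because a POSIXct vector carries only one tzone attribute.
df$localtime <- vapply(
  seq_len(nrow(df)),
  function(i) format(df$moment[i], tz = df$localzone[i], usetz = TRUE),
  character(1)
)
```

This converts the UTC instant to local clock time. That is a different operation from force_tz(), which keeps the clock time and changes the instant.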
You are getting errors because "EST" is not a valid timezone specification. It's an abbreviation that's often used when printing and displaying timezones.
The index is printed as "2015-07-13 12:05:00 EST" because "EST" probably represents Eastern Standard Time in the United States. If you want to set the TZ environment variable to that timezone, you should use Sys.setenv() with Country/City notation:
Sys.setenv(TZ = "America/New_York")
You can also set the timezone in the xts constructor:
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ), tzone = "America/New_York")
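Which names a given machine accepts can be checked against its timezone database with OlsonNames(). A minimal base-R sketch; is_known_tz() is a hypothetical helper, and the list it consults varies by platform and tzdata version.

```r
# TRUE when `tz` is a name listed in the system's timezone database
is_known_tz <- function(tz) tz %in% OlsonNames()

is_known_tz("America/New_York")
is_known_tz("Not/A_Zone")   # deliberately bogus name

# Print one UTC instant in New York local time
x <- as.POSIXct("2015-07-13 16:05:00", tz = "UTC")
format(x, tz = "America/New_York", usetz = TRUE)
```

Running the check on whatever value TZ currently holds is a quick way to rule the environment variable in or out as the cause.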
Your error occurs because of a misinterpretation of the time object. You need to have UNIX timestamps in order to use something like
pos_flat["T13:41/T14:00"] <- 1
Try a conversion of your indices by doing something like this:
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
As you want to use EST, you have to change your environment variables (if you are not living in EST timezone). So all in all, this should work:
Sys.setenv(TZ = 'EST')
#load stuff
#...
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
pos_flat["T13:41/T14:00"] <- 1
For further information, have a look at the POSIXct and POSIXlt structures in R.
Best regards

Write a function with date as input in R

I would like to write a function that takes a date as its input argument and returns the day, month, week and week of the year. My sample code throws an error. Kindly help me in this regard, thank you.
My sample code is as follows:
myFunction <- function(date){
date <-as.numeric(as.Date(date, format = "%m/%d/%Y",origin = "1899-12-30"))
date$month<- strftime(date,"%m")
date$day<- strftime(date,"%d")
data$week<-strftime(date,"%w")
date$week_year<-strftime(date,"%W")
return(date$day,date$month,date$week,date$week_year)
}
When I call the function, it shows this error:
myFunction(2016-07-26)
Error in as.POSIXlt.numeric(x, tz = tz) : 'origin' must be supplied
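Your input should be a string: unquoted, 2016-07-26 is evaluated as arithmetic (2016 minus 7 minus 26). Using lubridate you could write
library(lubridate)
myFunction <- function(date) {
  # Parse a "yyyy-mm-dd" string into a Date
  t0 <- ymd(date)
  # Return the components as a named list
  list(day = day(t0), month = month(t0), week = week(t0),
       wday = wday(t0, label = FALSE), year = year(t0))
}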
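For comparison, the same idea can be sketched without lubridate. date_parts() is a hypothetical name, the ISO "yyyy-mm-dd" input format is an assumption, and the week number follows strftime's %W convention (weeks start on Monday), which may differ from lubridate's week().

```r
# Unquoted, the "date" is plain subtraction
not_a_date <- 2016 - 07 - 26   # 1983

# Hypothetical base-R helper; expects an ISO "yyyy-mm-dd" string
date_parts <- function(date) {
  d <- as.Date(date, format = "%Y-%m-%d")
  list(
    day       = as.integer(format(d, "%d")),
    month     = as.integer(format(d, "%m")),
    weekday   = as.integer(format(d, "%u")),  # 1 = Monday ... 7 = Sunday
    week_year = as.integer(format(d, "%W")),  # week of year, Monday start
    year      = as.integer(format(d, "%Y"))
  )
}

parts <- date_parts("2016-07-26")
```

Returning a named list makes the pieces addressable as parts$day, parts$month and so on, which the original return() with several arguments could not do.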
