I have a Date object in R and would like to add 1 business day to this date. If the result is a holiday, I would like the date to be incremented to the next non-holiday date. Let's assume I mean NYSE holidays. How can I do this?
Example:
mydate = as.Date("2013-12-24")
mydate + 1 #this is a holiday so I want this to roll over to the 26th instead
I might use a combination of timeDate::isBizday() and data.table's roll=-Inf to set up a lookup calendar, like this:
library(data.table)
library(timeDate)
## Set up a calendar for 2013 & 2014
cal <- data.table(date=seq(from=as.Date("2013-01-01"), by=1, length=730),
key="date")
cal2 <- copy(cal)
cal2[,nextBizDay:=date+1]
cal2 <- cal2[isBizday(as.timeDate(nextBizDay)),]
cal <- cal2[cal,,roll=-Inf]
## Check that it works
x <- as.Date("2013-12-21")+1:10
cal[J(x),]
# date nextBizDay
# 1: 2013-12-22 2013-12-23
# 2: 2013-12-23 2013-12-24
# 3: 2013-12-24 2013-12-26
# 4: 2013-12-25 2013-12-26
# 5: 2013-12-26 2013-12-27
# 6: 2013-12-27 2013-12-30
# 7: 2013-12-28 2013-12-30
# 8: 2013-12-29 2013-12-30
# 9: 2013-12-30 2013-12-31
# 10: 2013-12-31 2014-01-01
## Or perhaps:
lu <- with(cal, setNames(nextBizDay, date))
lu[as.character(x[1:6])]
# 2013-12-22 2013-12-23 2013-12-24 2013-12-25 2013-12-26 2013-12-27
# "2013-12-23" "2013-12-24" "2013-12-26" "2013-12-26" "2013-12-27" "2013-12-30"
lubridate will not help you, as it has no notion of business days.
At least two packages do, and they both have a financial bent:
RQuantLib has exchange calendars for many exchanges (but it is a pretty large package)
timeDate also has calendars
Both packages have decent documentation which will permit you to set this up from working examples.
A third option (for simple uses) is to just store a local calendar out a few years and use that.
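For the timeDate route, a minimal helper could look like the sketch below. It simply rolls a date forward until it lands on a business day; the assumption is that holidayNYSE() covers the years you care about (here hard-coded to 2013-2014).

```r
library(timeDate)

## Sketch: roll a Date forward to the next NYSE business day.
## Assumption: holidayNYSE(2013:2014) covers the dates involved.
nextBizDate <- function(d, holidays = holidayNYSE(2013:2014)) {
  d <- as.Date(d) + 1
  while (!isBizday(as.timeDate(d), holidays = holidays)) {
    d <- d + 1
  }
  d
}

nextBizDate(as.Date("2013-12-24"))  # Dec 25 is a holiday, so "2013-12-26"
```

This is the same idea as the lookup-table approach above, just computed on demand instead of precomputed.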
Edit: Here is a quick RQuantLib example:
R> library(RQuantLib)
R> adjust(calendar="TARGET", dates=Sys.Date()+2:6, bdc = 0)
2013-12-22 2013-12-23 2013-12-24 2013-12-25 2013-12-26
"2013-12-23" "2013-12-23" "2013-12-24" "2013-12-27" "2013-12-27"
R>
It just moves the given day (from argument dates) forward to the next biz day.
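If you want to actually add a business day (rather than adjust a non-business date), RQuantLib also has advance(). A sketch, assuming "UnitedStates/NYSE" is among RQuantLib's calendar names:

```r
library(RQuantLib)

## Sketch: add 1 business day on the NYSE calendar.
## Assumption: "UnitedStates/NYSE" is a valid RQuantLib calendar name.
advance(calendar = "UnitedStates/NYSE",
        dates = as.Date("2013-12-24"),
        n = 1, timeUnit = 0)  # timeUnit 0 = days; should skip the Dec 25 holiday
```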
holidayNYSE(year = getRmetricsOptions("currentYear")) gives you the NYSE holiday calendar; also check out isHoliday() from the timeDate package.
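For example, a quick check with timeDate (a sketch; the year is passed explicitly instead of getRmetricsOptions("currentYear")):

```r
library(timeDate)

## Sketch: is a given date an NYSE holiday (i.e. not a business day)?
isHoliday(as.timeDate(as.Date("2013-12-25")), holidays = holidayNYSE(2013))  # Christmas
isHoliday(as.timeDate(as.Date("2013-12-26")), holidays = holidayNYSE(2013))  # regular Thursday
```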
I am currently working with a dataset of taxi trips by drivers in NYC. For every trip I have the driver ID, the pickup date and time, and the dropoff date and time. I want to calculate the waiting time between the dropoff time of one trip and the pickup time of the next trip. To do this I have to compute the time difference between the two columns with a lag of one (because the dropoff time refers to the previous trip and the pickup time to the following trip), grouped by driver ID (so that I am not computing the time difference between trips of two different drivers).
A possible data set looks like this:
hack_license = c("303F79923DA5DA7A10DF15E2D91CDCF7","697ABFCDF7E7C77A01183C857132F2A4","697ABFCDF7E7C77A01183C857132F2A4","697ABFCDF7E7C77A01183C857132F2A4","ABE23CA71E2DE84972281BA1C70B6EBB","ABE23CA71E2DE84972281BA1C70B6EBB","BA83D7C383EAA4F9D78A1A8B83CB3E92","BA83D7C383EAA4F9D78A1A8B83CB3E92","D476A1872F1F6594BD638C274483ED06","D476A1872F1F6594BD638C274483ED06")
pickup_datetime = c("2013-12-31 23:01:07","2013-12-31 23:04:00","2013-12-31 23:31:00","2013-12-31 23:40:00","2013-12-31 23:16:39","2013-12-31 23:24:05","2013-12-31 23:09:10","2013-12-31 23:26:26","2013-12-31 23:13:00","2013-12-31 23:22:00")
dropoff_datetime = c("2013-12-31 23:20:33","2013-12-31 23:28:00","2013-12-31 23:33:00","2013-12-31 23:48:00","2013-12-31 23:22:29","2013-12-31 23:28:37","2013-12-31 23:21:24","2013-12-31 23:36:54","2013-12-31 23:20:00","2013-12-31 23:27:00")
data <- data.frame(hack_license,pickup_datetime,dropoff_datetime)
I tried to use dplyr and lubridate like this, but it doesn't work.
data %>%
  group_by(hack_license) %>%
  mutate(waiting_time_in_secs = difftime(pickup_datetime,
                                         lag(dropoff_datetime), units = 'secs'))
Maybe some of you can help me out here. Would be great!
You can create a datetime column for both pickup and dropoff and for each hack_license calculate the difference in time between the current pickup time and previous drop off time.
library(dplyr)
library(lubridate)
data <- data %>%
mutate(pickup_datetime = ymd_hms(pickup_datetime),
dropoff_datetime = ymd_hms(dropoff_datetime)) %>%
group_by(hack_license) %>%
mutate(waiting_time_in_secs = as.numeric(difftime(pickup_datetime,
lag(dropoff_datetime), units = 'secs')))
data
# hack_license pickup_datetime dropoff_datetime waiting_time_in_secs
# <chr> <dttm> <dttm> <dbl>
# 1 303F79923DA5DA7A10DF15E2D91CDCF7 2013-12-31 23:01:07 2013-12-31 23:20:33 NA
# 2 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:04:00 2013-12-31 23:28:00 NA
# 3 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:31:00 2013-12-31 23:33:00 180
# 4 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:40:00 2013-12-31 23:48:00 420
# 5 ABE23CA71E2DE84972281BA1C70B6EBB 2013-12-31 23:16:39 2013-12-31 23:22:29 NA
# 6 ABE23CA71E2DE84972281BA1C70B6EBB 2013-12-31 23:24:05 2013-12-31 23:28:37 96
# 7 BA83D7C383EAA4F9D78A1A8B83CB3E92 2013-12-31 23:09:10 2013-12-31 23:21:24 NA
# 8 BA83D7C383EAA4F9D78A1A8B83CB3E92 2013-12-31 23:26:26 2013-12-31 23:36:54 302
# 9 D476A1872F1F6594BD638C274483ED06 2013-12-31 23:13:00 2013-12-31 23:20:00 NA
#10 D476A1872F1F6594BD638C274483ED06 2013-12-31 23:22:00 2013-12-31 23:27:00 120
I have data on production months and relative frequency: the production months on the x-axis and the relative frequency on the y-axis. You can see increases and decreases over this course, and my goal is to place these climbs and descents into intervals. There are a few procedures concerned with detecting such abnormalities; I have already looked into them and implemented the "hill climbing" algorithm. I do get intervals, but they are not great, so now I want to extend this algorithm to get better ones. I also tried packages like strucchange (e.g. its breakpoints() function), but these always give me errors. Since I'm neither a computer scientist nor a mathematician, it would be great to get some advice!
My code for hill climbing:
hillclimbing1 <- function(month, amount) {
  res <- c()        # breakpoint positions (between observations)
  val <- amount[1]  # last accepted reference value
  j <- 1
  for (i in seq_along(month)) {
    ## flag a break when the series moves more than 30% away from the reference
    if (abs(amount[i] - val) > abs(val * 0.3)) {
      val <- amount[i]
      res[j] <- i - 0.5
      j <- j + 1
    }
  }
  return(res)
}
My dataframe looks like this:
month amount
2012-07-01 0.0000000
2012-08-01 1.1111111
2012-09-01 0.2985075
2012-10-01 0.5141388
2012-11-01 0.0000000
2012-12-01 0.0000000
2013-01-01 0.6849315
2013-02-01 1.9762846
2013-03-01 1.1799410
2013-04-01 0.2881844
2013-05-01 0.2617801
2013-06-01 1.2285012
2013-07-01 1.2285012
2013-08-01 1.3539652
2013-09-01 1.6694491
2013-10-01 2.4000000
2013-11-01 2.5065963
2013-12-01 2.4869110
2014-01-01 2.0497804
2014-02-01 1.4044944
2014-03-01 3.9443155
2014-04-01 2.9748284
2014-05-01 3.0623020
2014-06-01 2.2044088
2014-07-01 2.9686175
2014-08-01 3.1304348
2014-09-01 3.9028621
2014-10-01 2.3942538
2014-11-01 2.9021559
2014-12-01 4.6280992
2015-01-01 3.8616251
2015-02-01 3.0252101
2015-03-01 3.7565740
2015-04-01 4.0977714
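For reference, here is a self-contained sketch of how the function above behaves on a short toy series (the real call would use the month and amount columns of the data frame; the toy values are made up):

```r
## The hillclimbing1 function from above, repeated so the snippet runs standalone.
hillclimbing1 <- function(month, amount) {
  res <- c()
  val <- amount[1]
  j <- 1
  for (i in seq_along(month)) {
    if (abs(amount[i] - val) > abs(val * 0.3)) {
      val <- amount[i]
      res[j] <- i - 0.5
      j <- j + 1
    }
  }
  res
}

amount <- c(0.5, 0.55, 1.2, 1.25, 0.4)
hillclimbing1(seq_along(amount), amount)  # 2.5 4.5: breaks before obs 3 and 5
```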
EDIT:
After using the min/max method I get the following plot (not shown here).
Is there any way to get rid of intervals no. 2 and 3?
I am trying to convert the timestamps in the stock data from Google Finance API to a more usable datetime format.
I have used data.table::fread to read the data here:
fread(<url>)
datetime open high low close volume
1: a1497619800 154.230 154.2300 154.2300 154.2300 500
2: 1 153.720 154.3200 153.7000 154.2500 1085946
3: 2 153.510 153.8000 153.2000 153.7700 34882
4: 3 153.239 153.4800 153.1400 153.4800 24343
5: 4 153.250 153.3000 152.9676 153.2700 20212
As you can see, the "datetime" format is rather strange. The format is described in this link:
The full timestamps are denoted by the leading 'a'. Like this: a1092945600. The number after the 'a' is a Unix timestamp. [...]
The numbers without a leading 'a' are "intervals". So, for example, the second row in the data set below has an interval of 1. You can multiply this number by our interval size [...] and add it to the last Unix Timestamp.
In my case, the "interval size" is 300 seconds (5 minutes). This format is restarted at the start of each new day and so trying to format it is quite difficult!
I can pull out the index positions of the day starts by using grep and searching for "a":
newDay <- grep(df$V1, pattern = "a")
Then my idea was to split the dataframe into chunks by these index positions, extend the Unix times for each day separately, and then combine them back into a data.table before storing.
data.table's split method looks like it will do the job, but I am unsure how to supply it the day breaks to split by index positions, or whether there is a more logical way to achieve the same result without having to break it down day by day.
Thanks.
You may use grepl to search for "a" in "datetime", which results in a logical vector. Take the cumsum of that vector to create a grouping variable - for each "a" (TRUE), the counter will increase by one.
Within each group, convert the first element to POSIXct, using an appropriate format and origin (and timezone, tz?). Add multiples of the 'interval size' (300 sec), using zero for the first element and the "datetime" multiples for the others.
d[ , time := {
t1 <- as.POSIXct(datetime[1], format = "a%s", origin = "1970-01-01")
.(t1 + c(0, as.numeric(datetime[-1]) * 300))
}
, by = .(cumsum(grepl("^a", datetime)))]
d
# datetime time
# 1: a1497619800 2017-06-16 15:30:00
# 2: 1 2017-06-16 15:35:00
# 3: 2 2017-06-16 15:40:00
# 4: 3 2017-06-16 15:45:00
# 5: 4 2017-06-16 15:50:00
# 6: a1500000000 2017-07-14 04:40:00
# 7: 3 2017-07-14 04:55:00
# 8: 5 2017-07-14 05:05:00
# 9: 7 2017-07-14 05:15:00
Some toy data:
d <- fread(input = "datetime
a1497619800
1
2
3
4
a1500000000
3
5
7")
With:
DT[grep('^a', date), datetime := as.integer(gsub('\\D+','',date))
][, datetime := zoo::na.locf(datetime)
][nchar(date) < 4, datetime := datetime + (300 * as.integer(date))
][, datetime := as.POSIXct(datetime, origin = '1970-01-01', tz = 'America/New_York')][]
you get:
date close high low open volume datetime
1: a1500298200 153.57 153.7100 153.57 153.5900 1473 2017-07-17 09:30:00
2: 1 153.51 153.8700 153.33 153.7500 205057 2017-07-17 09:35:00
3: 2 153.49 153.7800 153.34 153.5800 70023 2017-07-17 09:40:00
4: 3 153.68 153.7300 153.42 153.5400 53050 2017-07-17 09:45:00
5: 4 153.06 153.7500 153.06 153.7200 120899 2017-07-17 09:50:00
---
2348: 937 143.94 144.0052 143.91 143.9917 36651 2017-08-25 15:40:00
2349: 938 143.90 143.9958 143.90 143.9400 40769 2017-08-25 15:45:00
2350: 939 143.94 143.9500 143.87 143.8900 56616 2017-08-25 15:50:00
2351: 940 143.97 143.9700 143.89 143.9400 56381 2017-08-25 15:55:00
2352: 941 143.74 143.9700 143.74 143.9655 179811 2017-08-25 16:00:00
Used data:
DT <- fread('https://www.google.com/finance/getprices?i=300&p=30d&f=d,t,o,h,l,c,v&df=cpct&q=IBM', skip = 7, header = FALSE)
setnames(DT, 1:6, c('date','close','high','low','open','volume'))
I use an xts object. The index of the object is as below. There is one for every hour of the day for a year.
"2011-01-02 18:59:00 EST"
"2011-01-02 19:58:00 EST"
"2011-01-02 20:59:00 EST"
In columns are values associated with each index entry. What I want to do is calculate the standard deviation of the value for all Mondays at 18:59 for the complete year. There should be 52 values for the year.
I'm able to search for the day of the week using the weekdays() function, but my problem is searching for the time, such as 18:59:00 or any other time.
You can do this by using interaction to create a factor from the combination of weekdays and .indexhour, then use split to select the relevant observations from your xts object.
set.seed(21)
x <- .xts(rnorm(1e4), seq(1, by=60*60, length.out=1e4))
groups <- interaction(weekdays(index(x)), .indexhour(x))
output <- lapply(split(x, groups), function(x) c(count=length(x), sd=sd(x)))
output <- do.call(rbind, output)
head(output)
# count sd
# Friday.0 60 1.0301030
# Monday.0 59 0.9204670
# Saturday.0 60 0.9842125
# Sunday.0 60 0.9500347
# Thursday.0 60 0.9506620
# Tuesday.0 59 0.8972697
You can use the .index* family of functions (don't forget the '.' in front of 'index'!):
fxts[.indexmon(fxts)==0] # it's zero-based (!) and gives you all the January values
fxts[.indexmday(fxts)==1] # beginning of month
fxts[.indexwday(fxts)==1] # Mondays
require(quantmod)
> fxts
value
2011-01-02 19:58:00 1
2011-01-02 20:59:00 2
2011-01-03 18:59:00 3
2011-01-09 19:58:00 4
2011-01-09 20:59:00 5
2011-01-10 18:59:00 6
2011-01-16 18:59:00 7
2011-01-16 19:58:00 8
2011-01-16 20:59:00 9
fxts[.indexwday(fxts)==1] #this gives you all the Mondays
for subsetting the time you use
fxts["T19:30/T20:00"] # this will give you the time period you are looking for
and here you combine weekday and time period
fxts["T18:30/T21:00"][.indexwday(fxts["T18:30/T21:00"])==1] # subset the time window, then keep Mondays
> value
2011-01-03 18:59:00 3
2011-01-10 18:59:00 6
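Putting the two ideas together for the original goal (the sd of all Monday 18:59 values), here is a sketch on toy hourly data (the index and values are made up; real data would come from your xts object):

```r
library(xts)

## Sketch: sd of all Monday 18:59 observations, combining time-of-day
## subsetting with .indexwday().
set.seed(1)
idx <- seq(as.POSIXct("2011-01-03 18:59:00", tz = "UTC"),
           by = "1 hour", length.out = 24 * 7 * 8)  # 8 weeks of hourly stamps
x <- xts(rnorm(length(idx)), order.by = idx)

mondays <- x["T18:59/T18:59"]                 # all 18:59 observations
mondays <- mondays[.indexwday(mondays) == 1]  # keep Mondays only
c(count = length(mondays), sd = sd(mondays))
```

On a full year of data this would give the 52 values per weekday/time combination you are after.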
I have been using data.table for practically everything I was using data.frames for, as it is much, much faster on big in-memory data (several million rows). However, I'm not quite sure how to add days or months to an IDate column without using apply (which is very slow).
A minimal example:
dates = c("2003-01-01", "2003-02-01", "2003-03-01", "2003-06-01", "2003-12-01",
"2003-04-01", "2003-05-01", "2003-07-01", "2003-09-01", "2003-08-01")
dt = data.table(idate1=as.IDate(dates))
Now, let's say I want to create a column with dates 6 months ahead. Normally, for a single IDate, I would do this:
seq(dt$idate1[1],by="6 months",length=2)[2]
But this won't work as from= must be of length 1:
dt[,idate2:=seq(idate1,by="6 months",length=2)[2]]
Is there an efficient way of doing it to create column idate2 in dt?
Thanks a lot,
RR
One way is to use the mondate package: add the months to a mondate and then convert back to an IDate object.
require(mondate)
dt = data.table(idate1=as.IDate(dates))
dt[, idate2 := as.IDate(mondate(as.Date(idate1)) + 6)]
# idate1 idate2
# 1: 2003-01-01 2003-07-01
# 2: 2003-02-01 2003-08-02
# 3: 2003-03-01 2003-09-01
# 4: 2003-06-01 2003-12-02
# 5: 2003-12-01 2004-06-01
# 6: 2003-04-01 2003-10-02
# 7: 2003-05-01 2003-11-01
# 8: 2003-07-01 2004-01-01
# 9: 2003-09-01 2004-03-02
# 10: 2003-08-01 2004-02-01
Although, I suppose that there might be other better solutions.
You can use lubridate,
library(lubridate)
dt[, idate2 := as.IDate(idate1 %m+% months(6))]
idate1 idate2
1: 2003-01-01 2003-07-01
2: 2003-02-01 2003-08-01
3: 2003-03-01 2003-09-01
4: 2003-06-01 2003-12-01
5: 2003-12-01 2004-06-01
6: 2003-04-01 2003-10-01
7: 2003-05-01 2003-11-01
8: 2003-07-01 2004-01-01
9: 2003-09-01 2004-03-01
10: 2003-08-01 2004-02-01