Financial time difference (in R)

I am working with financial asset pricing data and I am interested in finding the time difference between the dates of two consecutive operations on the market.
A simple example could be:
first operation -> client X buys stock A on 2020-01-02 09:00:00
second operation -> client X buys stock B on 2020-01-03 09:00:00
Here is my problem:
I am looking for a function that computes the FINANCIAL time difference between these two datetime objects.
Hence, I am not interested in a simple calendar time difference (which can be computed in R with the well-known difftime() function), but in a time difference that respects the financial or trading day, i.e. (roughly speaking) a day that starts at 09:00 and ends at 18:00.
Therefore, for the simple example above this function should return 9 hours (if the unit of reference is hours), instead of the 24 hours that the usual calendar time difference would suggest.
A more complex version of this function would also take into account the market holidays and exclude weekends from computation.
Here is an example using R:
d1 <- as.POSIXct("2020-01-02 09:00:00", tz = "UTC")
d2 <- as.POSIXct("2020-01-03 09:00:00", tz = "UTC")
difftime(d2, d1, units = "hours")
This produces a time difference of 24 hours.
However, the expected result would be just 9 hours, since the financial (or market trading) day ends on 2020-01-02 at 18:00 and starts again the day after at 09:00, so there should be only 9 hours of trading between the two timestamps.
I mainly work in R, so I would appreciate advice for that language, but pointers to something similar in other languages would also be very useful.
Thank you for your help and time.
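A minimal sketch of the kind of function described above, assuming a fixed 09:00-18:00 UTC trading window and ignoring weekends and market holidays (the function name and defaults below are invented for illustration):

# Sum, for each calendar day between d1 and d2, the overlap of [d1, d2]
# with an assumed 09:00-18:00 UTC trading window. Weekends and holidays
# are deliberately ignored in this sketch.
trading_hours_diff <- function(d1, d2, open = 9, close = 18) {
  days <- seq(as.Date(d1), as.Date(d2), by = "day")
  total <- 0
  for (i in seq_along(days)) {
    win_start <- as.POSIXct(paste(days[i], sprintf("%02d:00:00", open)),  tz = "UTC")
    win_end   <- as.POSIXct(paste(days[i], sprintf("%02d:00:00", close)), tz = "UTC")
    lo <- max(d1, win_start)
    hi <- min(d2, win_end)
    if (hi > lo) total <- total + as.numeric(difftime(hi, lo, units = "hours"))
  }
  total
}

d1 <- as.POSIXct("2020-01-02 09:00:00", tz = "UTC")
d2 <- as.POSIXct("2020-01-03 09:00:00", tz = "UTC")
trading_hours_diff(d1, d2)
#> [1] 9

Skipping weekends and exchange holidays would amount to filtering the days vector, e.g. against a holiday calendar from a package such as bizdays or RQuantLib.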

Related

How do I transform half-hourly data that does not span the whole day into a Time Series in R?

This is my first question on stackoverflow, sorry if the question is poorly put.
I am currently developing a project where I predict how much a person drinks each day. The data I have looks like this:
The menge column represents how much water the person has actually drunk in each 30-minute interval (so the first value is the amount from 8:00 until just before 8:30, and so on). This is a one-day sample from 3 months of data. The day starts at 8 AM and ends at 8 PM.
I am trying to forecast the Time Series for each day. For example, given the first one or two time steps, we would predict the whole day and then we know how much in total the person has drunk until 8 PM.
I am trying to model this data as a Time Series object in R (Google Colab), in order to use Croston's Method for the forecasting. Using the ts() function, what should I set the frequency to knowing that:
The data is half-hourly
The data is from 8:00 till 20:00 each day (Does not span the whole day)
Would I need to make the data span the whole day by adding 0 values? Are there maybe better approaches for this? Thank you in advance.
When using the ts() function, the frequency defines the number of (usually regularly spaced) observations within a given time period. In your example, the observations are every 30 minutes between 8 AM and 8 PM, and the time period is 1 day. Choosing a period of 1 day assumes that the pattern over each day is of most interest here; you could also use 1 week.
So within each day of your data (8 AM-8 PM) you have 24 observations (24 half-hours), and a suitable frequency for this data would therefore be 24.
You could also pad the data with 0 values; however, this isn't necessary and would complicate the model. If you padded the data so that it has observations for every half-hour of the day, the frequency would then be 48.
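A sketch of what that might look like, assuming 3 months of half-hourly values for the 8:00-20:00 window; the numbers below are made up, and croston() from the forecast package stands in for whichever Croston implementation you end up using:

library(forecast)

set.seed(1)
intake    <- rpois(90 * 24, lambda = 1) * 50  # made-up ml per half hour, 90 days
intake_ts <- ts(intake, frequency = 24)       # 24 half-hour slots per service day

fc <- croston(intake_ts, h = 24)              # forecast the next full day
sum(fc$mean)                                  # expected total drunk by 8 PM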

R, intraday data, convert datetime to a different timezone

I have a dataset with intraday data where three important variables are Country, Datetime, Price.
An example could be:
Sweden, 2019-12-23 09:08:00, 105.31
This data is downloaded from Bloomberg, and it appears to use my local time (Denmark). For example, for Australia I have a market which starts at 23:00, which does not make sense unless it is European time. I would like to convert the times in the data to the local time of each country. Of course, I could add or subtract some hours, but the offset is not fixed: some countries have summer/winter time while others don't, and countries that do observe it may switch on different days (for example, I think there is about one week between the time change in the US and in Europe). Do you have any advice on how to transform my dataset into the local timezones? So if it says "2019-12-23 09:08:00", I would like to know that in that particular country it was 09:08 in the morning (and not in my country). I really hope there is a smart R function for this.
Thanks in advance!
You could use lubridate::force_tz and lubridate::with_tz:
dat <- as.POSIXct("2021-05-01 12:00:00",tz = "UTC")
lubridate::force_tz(dat,tz="CET")
#> [1] "2021-05-01 12:00:00 CEST"
lubridate::with_tz(lubridate::force_tz(dat,tz="UTC"))
#> [1] "2021-05-01 14:00:00 CEST"

How to specify interval or frequency with tsibble and fable for service hours?

I want to forecast the number of customers entering a shop during service hours. I have hourly data for
Monday to Friday
8:00 to 18:00
Thus, I assume my time series is in fact regular, but atypical in a sense, since I have 10 hours a day and 5 days a week.
I am able to do modeling with this regular 24/7 time series by setting non-service hours to zero, but I find this inefficient and also incorrect, because the times are not missing. Rather, they do not exist.
Using the old ts-framework I was able to explicitly specify
myTS <- ts(x, frequency = 10)
However, within the new tsibble/fable framework this is not possible. It detects hourly data and expects 24 hours per day, not 10. Every subsequent function reminds me of implicit gaps in time. Manually overriding the interval attribute works:
> attr(ts, "interval") <- new_interval(hour = 10)
> has_gaps(ts)
# A tibble: 1 x 1
  .gaps
  <lgl>
1 FALSE
But has no effect on modeling:
model(ts,
      snaive = SNAIVE(customers ~ lag("week")))
I still get the same error message:
1 error encountered for snaive
[1] .data contains implicit gaps in time. You should check your data and
convert implicit gaps into explicit missing values using tsibble::fill_gaps()
if required.
Any help would be appreciated.
This question actually corresponds to this gh issue. As far as I know, there are no R packages that allow users to construct a custom schedule, for example one that specifies particular intra-day periods and days. A couple of packages provide specific calendars (like business dates), but none offers a way to set up intra-day schedules. tsibble will gain a calendar argument so that custom calendars can respect structural missings once such a package becomes available, but there is currently no support for that.
As you stated, it's hourly data, so the data interval should be 1 hour, not 10 hours. The frequency in ts(), by contrast, is the seasonal period used for modelling: 10 hours per day.
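A workaround sketch of my own (not from the answer above), under the assumption that treating the 10 service hours as consecutive observations is acceptable: index the tsibble by a running observation number so there are no implicit gaps. The data below are made up:

library(dplyr)
library(tsibble)
library(fable)

# Made-up example: 4 weeks of service hours (10 hours x 5 days x 4 weeks)
set.seed(42)
shop <- tibble(customers = rpois(10 * 5 * 4, lambda = 20)) %>%
  mutate(obs = row_number()) %>%           # 1, 2, 3, ... across days
  as_tsibble(index = obs)

shop %>%
  model(snaive = SNAIVE(customers ~ lag(50))) %>%  # lag in observations: 50 = one service week
  forecast(h = 10)                                 # the next service day

The trade-off is that lags and forecast horizons are then expressed in numbers of observations rather than calendar periods.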

XTS: split FX intraday bar data by trading days

I want to apply a function to 20 trading days' worth of hourly FX data (as one example amongst many).
I started off with rollapply(data, width=20*24, FUN=FUN, by=24). That seemed to be working well; I could even assert I always got 480 bars passed in... until I realized that wasn't what I wanted. The start and end times of those 480 bars were drifting over the years, due to changes in daylight saving time and market holidays.
So, what I want is a function that treats a day as running from 22:00 to 22:00 for each day we have data for (21:00 to 21:00 in N.Y. summer time; my data timezone is UTC, and the day start is defined at 5 pm ET).
So, I made my own rollapply function with this at its core:
ep=endpoints(data,on=on,k=k)
sp=ep[1:(length(ep)-width)]+1
ep=ep[(width+1):length(ep)]
xx <- lapply(1:length(ep), function(ix) FUN(.subset_xts(data,sp[ix]:ep[ix]),...) )
I then called this with on="days", k=1 and width=20.
This has two problems:
"Days" here means calendar days, not trading days! So, instead of typically 4 weeks of data, I get just under 3 weeks of data.
The cutoff is midnight UTC. I cannot work out how to change it to use the 22:00 (or 21:00) cutoff.
UPDATE: Problem 1 above is wrong! The XTS endpoints function does work in trading days, not calendar days. The reason I thought otherwise is that the timezone issue made it look like a 6-day trading week: Sun to Fri. Once the timezone problem was fixed (see my self-answer), using width=20 and on="days" does indeed give me 4 weeks of data.
(The "typically" there is important: when there is a trading holiday during those 4 weeks, I expect to receive 4 weeks and 1 day's worth of data, i.e. always exactly 20 trading days.)
I started working on a function to cut the data into weeks, thinking I could then cut them into five 24hr chunks, but this feels like the wrong approach, and surely someone has invented this wheel before me?
Here is how to get the daybreak right:
x2=x
index(x2)=index(x2)+(7*3600)
indexTZ(x2)='America/New_York'
I.e. just setting the timezone puts the daybreak at 17:00; we want it at 24:00, so add 7 hours first.
With help from:
time zones in POSIXct and xts, converting from GMT in R
Here is the full function:
rollapply_chunks.FX.xts <- function(data, width, FUN, ..., on = "days", k = 1) {
  data <- try.xts(data)
  x2 <- data
  index(x2) <- index(x2) + (7*3600)
  indexTZ(x2) <- 'America/New_York'
  ep <- endpoints(x2, on = on, k = k)  #The end point of each calendar day (when on="days").
                                       #Each entry points to the final bar of the day. ep[1]==0.
  if (length(ep) < 2) {
    stop("Cannot divide data up")
  } else if (length(ep) == 2) {  #Can only fit one chunk in.
    sp <- 1; ep <- ep[-1]
  } else {
    sp <- ep[1:(length(ep)-width)] + 1
    ep <- ep[(width+1):length(ep)]
  }
  xx <- lapply(1:length(ep), function(ix) FUN(.subset_xts(data, sp[ix]:ep[ix]), ...))
  xx <- do.call(rbind, xx)  #Join them up as one big matrix/data.frame.
  tt <- index(data)[ep]     #Implicit align="right". Use sp for align="left"
  res <- xts(xx, tt)
  return(res)
}
You can see we use the modified index to split up the original data. (If R uses copy-on-write under the covers, then the only extra memory requirement should be for a copy of the index, not of the data.)
(Legal bit: please consider it licensed under MIT, but explicit permission given to use in the GPL-2 XTS package if that is desired.)
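A hypothetical usage sketch, assuming fx is an hourly xts object of FX bars with a UTC index (it is not defined here):

library(xts)
# Apply a summary function to each rolling chunk of 20 trading days:
res <- rollapply_chunks.FX.xts(fx, width = 20, FUN = function(x) colMeans(x))
head(res)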

How do I extract a specific, recurring time from 1-minute tick data in R?

For instance, let's say I want to extract the price at 09:04:00 every day from a time series that is formatted as:
DateTime | Price
2011-04-09 09:01:00 | 100.00
2011-04-09 09:02:00 | 100.10
2011-04-09 09:03:00 | 100.13
(NB: there is no | in the actual data; I've just included it here to illustrate that the DateTime is the index and Price is the coredata, and that the two are distinct within the xts object)
and put those extracted values into an xts vector... what is the most efficient way to do this?
Also, if I have a five-year time series of a cross-border spread where, due to time differences, the spread opens at different times during the year (say 9 am during winter and 10 am during summer), how can I get R to take account of those time differences and recognise either 9:00-16:30 or 10:00-16:30 as the same "day" interval?
In other words, I want to convert an intraday, 1-minute tick data file to daily OHLC data. Normally I would just use xts and to.period to do this, but, given the time difference noted above, that gives odd day start/end times due to the shifting session start.
Any advice greatly appreciated!
You can use the "T" prefix with xts subsetting to specify a time interval for each day. You must specify an interval; a single time will not work.
set.seed(21)
x <- xts(cumprod(1 + rnorm(2*60*24)/100),
         as.POSIXct("2011-04-09 09:01:00") + 60*(1:(2*60*24)))
x["T09:01:59/T09:02:01"]
#                          [,1]
# 2011-04-09 09:02:00 0.9980737
# 2011-04-10 09:02:00 1.0778835
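For the OHLC part of the question, a hedged follow-up sketch using the x series from the example above; the 09:00-16:30 session window and the exchange timezone are assumptions for illustration:

library(xts)
# If the index is first converted to the exchange's timezone (e.g.
# tzone(x) <- "Europe/London"), the 09:00-16:30 window stays aligned with the
# session across daylight-saving changes.
session    <- x["T09:00:00/T16:30:00"]   # keep only in-session bars
daily_ohlc <- to.period(session, period = "days", OHLC = TRUE)
head(daily_ohlc)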
