lubridate mismatched index class - r

Before I start, I would like to state that I have seen the FAQ entry at https://tsibble.tidyverts.org/articles/faq.html, however I am still unable to get to a workable solution.
I am using the "Aggregates (Bars)" output from polygon.io (https://polygon.io/docs/get_v2_aggs_ticker__forexTicker__range__multiplier___timespan___from___to__anchor)
Due to licensing/copyright restrictions I am unable to post the data here, but if you go to the docs link above, they have a sample there (no login required).
Polygon.io provides the timestamp as follows:
t (integer): The Unix msec timestamp for the start of the aggregate window.
My attempt so far looks like this:
library(fpp3)
library(fable.prophet)
library(jsonlite)
library(curl)
library(tidyverse)
library(lubridate)
myURL <- # *N.B. For sample data please see doc link in SO question*
myData <- fromJSON(myURL)
myData$ticker
myData$results
parse1 <- myData$results %>%
  select(t, c) %>%
  mutate(dt = as_datetime(t/1000), .keep = "unused", .before = 1)
print(head(parse1))
parse2 <- as_tsibble(parse1,index=dt)
However this yields:
Error: Can't obtain the interval due to the mismatched index class.

The issue is that tsibble cannot detect a regular interval from the datetime column 'dt'. We can convert it to Date class with as.Date and it works:
library(dplyr)
library(tsibble)
parse1 %>%
  mutate(dt = as.Date(dt)) %>%
  as_tsibble(index = dt)
Output:
# A tsibble: 120 x 2 [1D]
dt c
<date> <dbl>
1 2021-01-03 1.22
2 2021-01-04 1.23
3 2021-01-05 1.23
4 2021-01-06 1.23
5 2021-01-07 1.23
6 2021-01-08 1.22
7 2021-01-09 1.22
8 2021-01-10 1.22
9 2021-01-11 1.22
10 2021-01-12 1.22
# … with 110 more rows
I was able to replicate the same error from the OP's post:
as_tsibble(parse1,index=dt)
Error: Can't obtain the interval due to the mismatched index class.
ℹ Please see `vignette("FAQ")` for details.
Run `rlang::last_error()` to see where the error occurred.
The issue comes from this piece of code inside as_tsibble:
...
if (unknown_interval(interval) && (nrows > vec_size(key_data))) {
  abort(c(
    "Can't obtain the interval due to the mismatched index class.",
    i = "Please see `vignette(\"FAQ\")` for details."
  ))
}
...
There is a `regular` argument to specify whether the interval is regular or not. By default, it is TRUE. Here, the interval is not regular.
Thus, set `regular = FALSE`:
as_tsibble(parse1,index=dt, regular = FALSE)
# A tsibble: 120 x 2 [!] <GMT>
dt c
<dttm> <dbl>
1 2021-01-03 00:00:00 1.22
2 2021-01-04 00:00:00 1.23
3 2021-01-05 00:00:00 1.23
4 2021-01-06 00:00:00 1.23
5 2021-01-07 00:00:00 1.23
6 2021-01-08 00:00:00 1.22
7 2021-01-09 00:00:00 1.22
8 2021-01-10 00:00:00 1.22
9 2021-01-11 00:00:00 1.22
10 2021-01-12 00:00:00 1.22
# … with 110 more rows
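If the daily bars are what you want to model, the two observations above can also be combined in the original pipeline. Here is a minimal sketch (assuming the polygon.io aggregates really are one bar per day) that casts the msec timestamp straight to Date, so the tsibble gets a regular [1D] index:
library(dplyr)
library(lubridate)
library(tsibble)
# Minimal sketch, assuming one bar per day: casting the Unix msec
# timestamp to Date lets as_tsibble detect a regular daily interval.
parse2 <- myData$results %>%
  select(t, c) %>%
  mutate(dt = as.Date(as_datetime(t / 1000)), .keep = "unused", .before = 1) %>%
  as_tsibble(index = dt)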

Related

Calculating the difference of a column compared to a specific reference row

I have a data frame with data for every minute of every weekday during the year and want to calculate the difference based on a specific reference line each day (which is 08:30:00 in this example, and Data1 is the column I want to compute the difference for). Usually I would use diff and lag, but those only give the difference to the n previous rows, not to one specific reference row.
As the entire data set has about 1 million entries, I think using lag and diff in a recursive function (where I could check the condition for the starting line and then walk forward) would be too time consuming. Another idea I had is building a second data frame with only the reference line for each day (which would only contain line 3 in this sample) and then joining it to the original data frame as a new column containing the starting value. Then I could easily calculate the difference between the two columns.
Date Time Data1 Diff
1 2022-01-03 08:28:00 4778.14 0
2 2022-01-03 08:29:00 4784.23 0
3 2022-01-03 08:30:00 4785.15 0
4 2022-01-03 08:31:00 4785.01 -0.14
5 2022-01-03 08:32:00 4787.83 2.68
6 2022-01-03 08:33:00 4788.80 3.65
You can subtract the Data1 value of the row where Time is "08:30:00" as follows. This assumes Time is a character column.
dat$diff <- dat$Data1 - dat$Data1[[match("08:30:00", dat$Time)]]
dat
Date Time Data1 Diff diff
1 2022-01-03 08:28:00 4778.14 0.00 -7.01
2 2022-01-03 08:29:00 4784.23 0.00 -0.92
3 2022-01-03 08:30:00 4785.15 0.00 0.00
4 2022-01-03 08:31:00 4785.01 -0.14 -0.14
5 2022-01-03 08:32:00 4787.83 2.68 2.68
6 2022-01-03 08:33:00 4788.80 3.65 3.65
For data with multiple dates, you can do the same operation for each day using dplyr::group_by():
library(dplyr)
dat %>%
  group_by(Date) %>%
  mutate(diff = Data1 - Data1[[match("08:30:00", Time)]]) %>%
  ungroup()
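The join approach mentioned in the question also works and can be clearer if some days might be missing an 08:30:00 row. A rough sketch (ref and ref_value are just illustrative names):
library(dplyr)
# Build a one-row-per-day reference table of the 08:30:00 values,
# join it back onto the full data, then take the difference.
ref <- dat %>%
  filter(Time == "08:30:00") %>%
  select(Date, ref_value = Data1)
dat %>%
  left_join(ref, by = "Date") %>%
  mutate(diff = Data1 - ref_value)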

Doing operations down columns in R, indexing by another column

I am trying to compute the hedging error for an options pricing model. Each day, I will compute an equivalent position that one should take when hedging against this option in the market, let's call it X_s, and compute the cash position of the hedge, let's call it X_0, for every given day. This doesn't present any issues since I can mapply() a function that calculates all the necessary partials given my parameters, stock price, etc. to compute X_s and X_0. Where I am starting to run into issues is when trying to compute the hedging error for my models. Here's a subset of my data that I'm looking at:
date optionid px_last r X_s_position X_0_cash mp_ba
1 2020-03-03 127117475 3003.37 0.011587702 0.642588548 -1783.881169 146.05
2 2020-03-03 131373646 3003.37 0.011587702 0.527107056 -1477.947518 105.15
3 2020-03-06 127117475 2972.37 0.008128021 0.566540143 -1558.566925 125.40
4 2020-03-09 127117475 2746.56 0.004745339 0.133284145 -332.122900 33.95
5 2020-03-10 127117475 2882.23 0.005884274 0.413389283 -1125.632994 65.85
6 2020-03-11 127117475 2741.38 0.006223502 0.131700734 -333.691757 27.35
7 2020-03-12 127117475 2480.64 0.003787032 0.003680431 -8.179825 0.95
So, let's say we're looking at optionid == 127117475. On the first observation date we won't have any hedge error, so we go to the next observation on 2020-03-06. The hedge error on that day would be
0.642588548*2972.37 + -1783.881169*exp(0.011587702*as.numeric(as.Date("2020-03-06") - as.Date("2020-03-03"))/365) - 105.15
So in row 3, in the new 'hedge error' column I want to create, the value would be 20.80985. In general, to calculate the hedge error for the next observation of optionid == 127117475, I take the previous observation's X_s_position, multiply it by the next spot price (px_last), add the X_0_cash value multiplied by exp(r*(difference in days between the two observations)/365), and then subtract the next observation of the option price (mp_ba).
Perhaps like so? Should the mp_ba in your example be 125.40?
library(dplyr)
df %>%
  group_by(optionid) %>%
  mutate(hedge_error = lag(X_s_position)*px_last + X_0_cash*exp(lag(r)*as.numeric(date - lag(date))/365) - mp_ba)
Result
# A tibble: 7 × 8
# Groups: optionid [2]
date optionid px_last r X_s_position X_0_cash mp_ba hedge_error
<date> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2020-03-03 127117475 3003. 0.0116 0.643 -1784. 146. NA
2 2020-03-03 131373646 3003. 0.0116 0.527 -1478. 105. NA
3 2020-03-06 127117475 2972. 0.00813 0.567 -1559. 125. 226.
4 2020-03-09 127117475 2747. 0.00475 0.133 -332. 34.0 1190.
5 2020-03-10 127117475 2882. 0.00588 0.413 -1126. 65.8 -807.
6 2020-03-11 127117475 2741. 0.00622 0.132 -334. 27.4 772.
7 2020-03-12 127117475 2481. 0.00379 0.00368 -8.18 0.95 318.
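One caveat: in the worked example in the question, the previous row's X_0_cash (-1783.881169) and r are carried forward, so X_0_cash may also need to be lagged. If that reading is correct, the mutate would become something like this sketch:
library(dplyr)
# Variant that also lags X_0_cash, matching the worked example in the
# question (previous position and cash carried to the next date).
df %>%
  group_by(optionid) %>%
  mutate(hedge_error = lag(X_s_position)*px_last +
           lag(X_0_cash)*exp(lag(r)*as.numeric(date - lag(date))/365) -
           mp_ba)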

Convert tibble to xts for analysis with performanceanalytics package

I have a tibble with a date and return column, that looks as follows:
> head(return_series)
# A tibble: 6 x 2
date return
<chr> <dbl>
1 2002-01 0.0292
2 2002-02 0.0439
3 2002-03 0.0240
4 2002-04 0.00585
5 2002-05 -0.0169
6 2002-06 -0.0686
I first add the day to the date column with the following code:
library(zoo)
return_series$date <- as.Date(as.yearmon(return_series$date))
# A tibble: 6 x 2
date return
<date> <dbl>
1 2002-01-01 0.0292
2 2002-02-01 0.0439
3 2002-03-01 0.0240
4 2002-04-01 0.00585
5 2002-05-01 -0.0169
6 2002-06-01 -0.0686
My goal is to convert the return_series tibble to xts data to use it for further analysis with the PerformanceAnalytics package. But when I use the command as.xts I receive the following error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
How can I change the format to xts, or is there another possibility to work with the PerformanceAnalytics package instead of converting to xts?
Thank you very much for your help!
You need to follow the xts documentation more closely:
> tb <- as_tibble(data.frame(date=as.Date("2002-01-01") + (0:5)*30,
+ return=rnorm(6)))
> tb
# A tibble: 6 × 2
date return
<date> <dbl>
1 2002-01-01 0.223
2 2002-01-31 -0.352
3 2002-03-02 0.149
4 2002-04-01 1.42
5 2002-05-01 -1.04
6 2002-05-31 0.507
>
> x <- xts(tb[,-1], order.by=as.POSIXct(tb[[1]]))
> x
return
2001-12-31 18:00:00 0.222619
2002-01-30 18:00:00 -0.352288
2002-03-01 18:00:00 0.149319
2002-03-31 18:00:00 1.421967
2002-04-30 19:00:00 -1.035087
2002-05-30 19:00:00 0.507046
>
An xts object prefers a POSIXct datetime object, which you can convert from a Date object. For a (closely-related) zoo object you could keep Date.
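Applied to the return_series from the question (after the as.Date(as.yearmon(...)) step above), a minimal sketch would be:
library(xts)
# Sketch: index the returns by the (converted) date column.
return_xts <- xts(return_series$return, order.by = as.POSIXct(return_series$date))
colnames(return_xts) <- "return"
# return_xts can now be used with PerformanceAnalytics,
# e.g. PerformanceAnalytics::Return.cumulative(return_xts)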

Create Time-Series Model with data every 15minutes

I'm working on an SHM system where I have data every 15 minutes coming from sensors on a structure. I have a set of observations where there is no damage and another where some kind of damage was simulated. My objective is to take the undamaged data and use it to forecast. The forecast is then compared to the undamaged data, and this difference will then be used to create control charts.
However, my undamaged data covers around 5 months and the damaged state around 8 months. I tried to explore the forecast package using multiple seasonality (msts) of 96 (1 day) and 35060 (1 year), since I believe there is a connection to temperature.
The models I created that followed some kind of pattern resembling reality had a small amplitude, while the real data was much more volatile.
Can someone point me in the right direction as to what to do next and how to do it?
PS: When using the ts function, even though I try to make it start at 2018-04-27 14:15:00, the plotted ts object always starts at 2018-01-01. I think this is more aesthetic than anything, but setting it right would be appreciated.
ts and msts objects are not well suited to high frequency data. I suggest you try using tsibble objects via the tsibble package (http://tsibble.tidyverts.org). With tsibble, the time index is explicit. Here is an example using 30 minute data.
library(tsibble)
library(feasts)
library(ggplot2)
tsibbledata::vic_elec
#> # A tsibble: 52,608 x 5 [30m] <UTC>
#> Time Demand Temperature Date Holiday
#> <dttm> <dbl> <dbl> <date> <lgl>
#> 1 2012-01-01 00:00:00 4263. 21.0 2012-01-01 TRUE
#> 2 2012-01-01 00:30:00 4049. 20.7 2012-01-01 TRUE
#> 3 2012-01-01 01:00:00 3878. 20.6 2012-01-01 TRUE
#> 4 2012-01-01 01:30:00 4036. 20.4 2012-01-01 TRUE
#> 5 2012-01-01 02:00:00 3866. 20.2 2012-01-01 TRUE
#> 6 2012-01-01 02:30:00 3694. 20.1 2012-01-01 TRUE
#> 7 2012-01-01 03:00:00 3562. 19.6 2012-01-01 TRUE
#> 8 2012-01-01 03:30:00 3433. 19.1 2012-01-01 TRUE
#> 9 2012-01-01 04:00:00 3359. 19.0 2012-01-01 TRUE
#> 10 2012-01-01 04:30:00 3331. 18.8 2012-01-01 TRUE
#> # … with 52,598 more rows
tsibbledata::vic_elec %>% autoplot(Demand)
Created on 2019-11-27 by the reprex package (v0.3.0)
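For the 15-minute sensor data in the question, the same idea applies. A minimal sketch, assuming a data frame shm with a POSIXct column timestamp and a measurement column (the names are illustrative):
library(tsibble)
# Sketch: declare the 15-minute timestamp as the index; tsibble then
# detects the [15m] interval from the data itself.
shm_ts <- as_tsibble(shm, index = timestamp)
# Models from the fable package can then be fitted directly on shm_ts.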

Make Quarterly Time Series in Julia

In Julia, we can create a Time Array with the following code:
d = [date(1980,1,1):date(2015,1,1)];
t = TimeArray(d,rand(length(d)),["test"])
This would give us daily data. What about getting quarterly or yearly time series?
Simply use the optional step capability of Base.range in combination with a Period type (such as Month or Year):
julia> [Date(1980,1,1):Month(3):Date(2015,1,1)]
141-element Array{Date{ISOCalendar},1}:
1980-01-01
1980-04-01
1980-07-01
1980-10-01
1981-01-01
1981-04-01
...
And change the step as necessary
julia> [Date(1980,1,1):Year(1):Date(2015,1,1)]
36-element Array{Date{ISOCalendar},1}:
1980-01-01
1981-01-01
1982-01-01
...
0.3.x vs 0.4.x
In version 0.3.x the Dates module is provided by the Dates package, but in version 0.4.x the Dates module is built in. Another (current) subtle difference is that Year and Month must be accessed as Dates.Year and Dates.Month in version 0.4.x.
I know this question is a bit old, but it's worth adding that there is another time series package called Temporal* that has this functionality available.
Here's some example usage:
using Temporal, Base.Dates
date_array = collect(today()-Day(365):Day(1):today())
random_walk = cumsum(randn(length(date_array))) + 100.0
Construct the time series object (type TS). The last argument gives the column names; if it is not supplied, default column names are autogenerated.
ts_data = TS(random_walk, date_array, :RandomWalk)
# Index RandomWalk
# 2016-08-24 99.8769
# 2016-08-25 99.1643
# 2016-08-26 98.8918
# 2016-08-27 97.7265
# 2016-08-28 97.9675
# 2016-08-29 97.7151
# 2016-08-30 97.0279
# ⋮
# 2017-08-17 81.2998
# 2017-08-18 82.0658
# 2017-08-19 82.1941
# 2017-08-20 81.9021
# 2017-08-21 81.8163
# 2017-08-22 81.5406
# 2017-08-23 81.2229
# 2017-08-24 79.2867
Get the last observation of every quarter (similar logic exists for weeks, months, and years using eow, eom, and eoy respectively):
eoq(ts_data) # get the last observation at every quarter
# 4x1 Temporal.TS{Float64,Date}: 2016-09-30 to 2017-06-30
# Index RandomWalk
# 2016-09-30 88.5629
# 2016-12-31 82.1014
# 2017-03-31 84.9065
# 2017-06-30 92.1997
You can also use aggregation functions to collapse the data over the same kinds of periods as above.
collapse(ts_data, eoq, fun=mean) # get the average value every quarter
# 4x1 Temporal.TS{Float64,Date}: 2016-09-30 to 2017-06-30
# Index RandomWalk
# 2016-09-30 92.5282
# 2016-12-31 86.8291
# 2017-03-31 89.1391
# 2017-06-30 90.3982
* (Disclaimer: I'm the package author.)
Quarterly isn't supported yet, but other time periods such as week, month and year are supported. There is a method called collapse that is used to convert a TimeArray to a larger time frame.
d = [Date(1980,1,1):Date(2015,1,1)];
t = TimeArray(d,rand(length(d)),["test"])
c = collapse(t, last, period=year)
Returns the following
36x1 TimeArray{Float64,1} 1980-12-31 to 2015-01-01
test
1980-12-31 | 0.94
1981-12-31 | 0.37
1982-12-31 | 0.12
1983-12-31 | 0.64
⋮
2012-12-31 | 0.43
2013-12-31 | 0.81
2014-12-31 | 0.88
2015-01-01 | 0.55
Also, note that date has been deprecated in favor of Date, as an updated package now provides the date/time functions underneath.
