Convert tibble to xts for analysis with performanceanalytics package - r

I have a tibble with a date and return column, that looks as follows:
> head(return_series)
# A tibble: 6 x 2
date return
<chr> <dbl>
1 2002-01 0.0292
2 2002-02 0.0439
3 2002-03 0.0240
4 2002-04 0.00585
5 2002-05 -0.0169
6 2002-06 -0.0686
I first add the day to the date column with the following code:
return_series$date <- as.Date(as.yearmon(return_series$date))
# A tibble: 6 x 2
date return
<date> <dbl>
1 2002-01-01 0.0292
2 2002-02-01 0.0439
3 2002-03-01 0.0240
4 2002-04-01 0.00585
5 2002-05-01 -0.0169
6 2002-06-01 -0.0686
My goal is to convert the return_series tibble to an xts object so that I can analyze it further with the PerformanceAnalytics package. But when I call as.xts on it, I receive the following error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
How can I convert the data to xts, or is there another way to work with the PerformanceAnalytics package without converting to xts?
Thank you very much for your help!

You need to follow the xts documentation more closely and pass the values and the time index separately to xts():
> tb <- as_tibble(data.frame(date=as.Date("2002-01-01") + (0:5)*30,
+ return=rnorm(6)))
> tb
# A tibble: 6 × 2
date return
<date> <dbl>
1 2002-01-01 0.223
2 2002-01-31 -0.352
3 2002-03-02 0.149
4 2002-04-01 1.42
5 2002-05-01 -1.04
6 2002-05-31 0.507
>
> x <- xts(tb[,-1], order.by=as.POSIXct(tb[[1]]))
> x
return
2001-12-31 18:00:00 0.222619
2002-01-30 18:00:00 -0.352288
2002-03-01 18:00:00 0.149319
2002-03-31 18:00:00 1.421967
2002-04-30 19:00:00 -1.035087
2002-05-30 19:00:00 0.507046
>
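An xts object needs a time-based index, supplied through order.by. Here the Date column is converted to POSIXct, which is why the index above shows times shifted by the local time zone (midnight UTC printed in local time). xts also accepts a Date index directly, as does the closely related zoo class, so for monthly data you can keep Date.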
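As a minimal sketch of the Date-based route, assuming the xts package is installed (it attaches zoo, which provides as.yearmon()): the three rows below are copied from the question, and the name returns_xts is only illustrative. The original error most likely arises because as.xts() on a data frame looks for the time index in the row names, which for a tibble are just "1", "2", and so on.

```r
library(xts)  # attaches zoo as well, which provides as.yearmon()

# First three rows of the question's data, as a plain data frame
return_series <- data.frame(
  date   = c("2002-01", "2002-02", "2002-03"),
  return = c(0.0292, 0.0439, 0.0240)
)

# Year-month strings -> first day of each month as Date
idx <- as.Date(as.yearmon(return_series$date))

# Values and index are passed separately; the Date index avoids time-zone shifts
returns_xts <- xts(return_series$return, order.by = idx)
colnames(returns_xts) <- "return"

returns_xts
```

An object built this way can then be handed to PerformanceAnalytics functions that expect an xts series of returns.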

Related

Filter interval dates in R with dplyr

I have a dataset with dates in tibble format from tidyverse/dplyr.
library(tidyverse)
A = seq(from = as.Date("2019/1/1"),to=as.Date("2022/1/1"), length.out = 252*3)
length(A)
x = rnorm(252*3)
d = tibble(A,x);d
This results in:
# A tibble: 756 x 2
A x
<date> <dbl>
1 2019-01-01 1.43
2 2019-01-02 0.899
3 2019-01-03 0.658
4 2019-01-05 -0.0720
5 2019-01-06 -1.99
6 2019-01-08 -0.743
7 2019-01-09 0.426
8 2019-01-11 0.00675
9 2019-01-12 0.967
10 2019-01-14 -0.606
# ... with 746 more rows
I also have a date of interest, say:
start = as.Date("2021/12/15");start
I want to subset the dataset from this specific date (start) back one year, where a year has 252 observations.
I tried:
d%>%
dplyr::filter(A<start)%>%
dplyr::slice_tail(n=252)
but I don't like it, because my real dataset has more than one factor label, and with this approach I would get only 252 observations overall.
I also tried:
LAST_YEAR = start-365
d%>%
dplyr::filter(A <= start & A >= LAST_YEAR)
which works, but I want to use the 252 count. Imagine that I want to go 2 years (252*2) back and find how many observations I have in that specific time interval.
Any help on how I can do that?
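One possible reading of the question is "keep the last 252 observations before start within each factor level". A rough sketch of that reading follows, assuming dplyr is installed; the grouping column label and the variable years_back are hypothetical and do not appear in the original data.

```r
library(dplyr)

set.seed(1)
dates <- seq(from = as.Date("2019/1/1"), to = as.Date("2022/1/1"), length.out = 252 * 3)

# Hypothetical data with two factor levels sharing the same dates
d <- tibble(
  label = rep(c("a", "b"), each = length(dates)),
  A     = rep(dates, times = 2),
  x     = rnorm(2 * length(dates))
)

start      <- as.Date("2021/12/15")
years_back <- 1  # set to 2 for 252 * 2 observations per level

last_window <- d %>%
  group_by(label) %>%                       # one window per factor level
  filter(A < start) %>%                     # only rows before the date of interest
  slice_tail(n = 252 * years_back) %>%      # last n rows within each level
  ungroup()

count(last_window, label)
```

Because slice_tail() respects the grouping, each level contributes its own 252 rows instead of 252 rows in total. The sketch assumes the rows are already sorted by date within each level.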

R convert "Y-m-d" or "m/d/Y" to the same format

I have a huge (~10.000.000 rows) dataframe with a column that contains dates, e.g.:
df <- data.frame(StartDate = as.character(c("2014-08-20 11:59:38",
"2014-08-21 16:17:44",
"2014-08-22 19:02:10",
"9/1/2014 08:05:13",
"9/2/2014 15:13:28",
"9/3/2014 00:22:01")))
The problem is that date formats are mixed - I would like to standardise them so as to get:
StartDate
1 2014-08-20
2 2014-08-21
3 2014-08-22
4 2014-09-01
5 2014-09-02
6 2014-09-03
1. as.Date() approach
as.Date("2014-08-31 23:59:38", "%m/%d/%Y")
as.Date("9/1/2014 00:00:28", "%m/%d/%Y")
gives
[1] NA
[1] "2014-09-01"
2. lubridate approach
dmy("9/1/2014 00:00:28")
mdy("9/1/2014 00:00:28")
dmy("2014-08-31 23:59:38")
mdy("2014-08-31 23:59:38")
in each case returns
[1] NA
Warning message:
All formats failed to parse. No formats found.
Is there any neat solution to that?
It may be easier to use parse_date from the parsedate package:
library(parsedate)
df$StartDate <- as.Date(parse_date(df$StartDate))
-output
> df$StartDate
[1] "2014-08-20" "2014-08-21" "2014-08-22" "2014-09-01" "2014-09-02" "2014-09-03"
I have just found out that anytime::anydate extracts the dates directly and straightforwardly:
library(anytime)
library(tidyverse)
df %>%
mutate(Date = anydate(StartDate))
#> StartDate Date
#> 1 2014-08-20 11:59:38 2014-08-20
#> 2 2014-08-21 16:17:44 2014-08-21
#> 3 2014-08-22 19:02:10 2014-08-22
#> 4 9/1/2014 08:05:13 2014-09-01
#> 5 9/2/2014 15:13:28 2014-09-02
#> 6 9/3/2014 00:22:01 2014-09-03
Another solution, based on lubridate:
library(tidyverse)
library(lubridate)
df %>%
mutate(Date = if_else(!str_detect(StartDate,"/"),
date(ymd_hms(StartDate, quiet = TRUE)), date(mdy_hms(StartDate, quiet = TRUE))))
#> StartDate Date
#> 1 2014-08-20 11:59:38 2014-08-20
#> 2 2014-08-21 16:17:44 2014-08-21
#> 3 2014-08-22 19:02:10 2014-08-22
#> 4 9/1/2014 08:05:13 2014-09-01
#> 5 9/2/2014 15:13:28 2014-09-02
#> 6 9/3/2014 00:22:01 2014-09-03
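For comparison, the same idea can be sketched in base R without extra packages, assuming (as in the sample data) that every value containing a slash is month/day/year and everything else is year-month-day. The names has_slash and parsed are illustrative only.

```r
StartDate <- c("2014-08-20 11:59:38", "2014-08-22 19:02:10",
               "9/1/2014 08:05:13", "9/3/2014 00:22:01")

# Decide the format per element from the separator it uses
has_slash <- grepl("/", StartDate, fixed = TRUE)

# Start with an all-NA Date vector and fill each subset with its own format;
# the trailing time-of-day text is ignored because the format stops at the date
parsed <- rep(as.Date(NA), length(StartDate))
parsed[has_slash]  <- as.Date(StartDate[has_slash],  format = "%m/%d/%Y")
parsed[!has_slash] <- as.Date(StartDate[!has_slash], format = "%Y-%m-%d")

parsed
```

Filling by index keeps the Date class, which ifelse() would drop.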

Disaggregate daily time series into hourly values using R

I'm working with a dataset that contains daily data of water flow. The data goes from 1-10-1998 to 30-03-2020 and looks like this:
Date QA
1998-10-01 315
1998-10-02 245
1998-10-03 179
1998-10-04 186
1998-10-05 262
1998-10-06 199
1998-10-07 319
(...)
The class(Date) is "Date" and the class(QA) is "numeric".
My goal is to turn this daily data into hourly data. For this I used the function 'td' from the package 'tempdisagg' of R:
library(tempdisagg)
td(QA~1,to="hour",method="denton-cholette")
My problem is in the definition of QA as a time series variable. When I define it as 'ts' and apply the function to disaggregate the data, the following error appears:
QA_ts <- ts(QA, start = decimal_date(as.Date("1998-10-01")), frequency = 365)
td(QA_ts ~ 1, to = "hour",method="denton-cholette")
Error in td(QA_ts ~ 1, to = "hour",method="denton-cholette") :
use a time series class other than 'ts' to deal with 'hour'
And when I define QA as another format such as "xts" or "msts" I get the following error:
newQA <- xts(QA,Date)
td(newQA ~1, to="hour",method="denton-cholette")
Error in seq.Date(lf[1], lf.end, by = to) : 'to' must be a "Date" object
I think I'm doing something wrong when defining QA as time series but I can't solve this issue.
Can anybody help me out?
thanks,
Date needs to be of class POSIXct, rather than Date, to convert to hourly frequency. Here is a reproducible example:
x <- structure(list(time = structure(c(10227, 10258, 10286, 10317,
10347, 10378, 10408), class = "Date"), value = c(315, 245, 179,
186, 262, 199, 319)), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
Disaggregate to days:
library(tempdisagg)
m0 <- td(x ~ 1, to = "day", method = "fast")
#> Loading required namespace: tsbox
predict(m0)
#> # A tibble: 212 x 2
#> time value
#> <date> <dbl>
#> 1 1998-01-01 10.4
#> 2 1998-01-02 10.3
#> 3 1998-01-03 10.3
#> 4 1998-01-04 10.3
#> 5 1998-01-05 10.3
#> 6 1998-01-06 10.3
#> 7 1998-01-07 10.3
#> 8 1998-01-08 10.3
#> 9 1998-01-09 10.3
#> 10 1998-01-10 10.3
#> # … with 202 more rows
If you want to disaggregate to hours, time needs to be POSIXct:
x$time <- as.POSIXct(x$time)
m1 <- td(x ~ 1, to = "hour", method = "fast")
predict(m1)
#> # A tibble: 5,087 x 2
#> time value
#> <dttm> <dbl>
#> 1 1998-01-01 01:00:00 0.431
#> 2 1998-01-01 02:00:00 0.431
#> 3 1998-01-01 03:00:00 0.431
#> 4 1998-01-01 04:00:00 0.431
#> 5 1998-01-01 05:00:00 0.431
#> 6 1998-01-01 06:00:00 0.431
#> 7 1998-01-01 07:00:00 0.431
#> 8 1998-01-01 08:00:00 0.431
#> 9 1998-01-01 09:00:00 0.431
#> 10 1998-01-01 10:00:00 0.431
#> # … with 5,077 more rows
Here is a slightly more complex example for hourly disaggregation.
This post explains conversion to high-frequency in more detail.
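A small base R sketch of the Date to POSIXct step on its own may help, since the result depends on the time zone. It assumes you want each day to start at midnight UTC; the dates are taken from the question and the names d and p are illustrative.

```r
d <- as.Date(c("1998-10-01", "1998-10-02"))

# Going through the character representation pins the result to midnight UTC
p <- as.POSIXct(format(d), tz = "UTC")

class(p)
format(p, "%Y-%m-%d %H:%M", tz = "UTC")

# Consecutive days are exactly 24 hours apart
as.numeric(diff(p), units = "hours")
```

Calling as.POSIXct() directly on a Date also yields midnight UTC, but the values may print in the local time zone, which can make them look shifted by several hours.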

How to convert monthly time-series in R

I am working on a monthly-based time-series data set:
> head(data, n=10)
# A tibble: 10 x 2
Month Inflation
<dttm> <dbl>
1 1979-01-01 00:00:00 0.0258
2 1979-02-01 00:00:00 0.0234
3 1979-03-01 00:00:00 0.0055
4 1979-04-01 00:00:00 0.0302
5 1979-05-01 00:00:00 0.0305
6 1979-06-01 00:00:00 0.0232
7 1979-07-01 00:00:00 0.025
8 1979-08-01 00:00:00 0.0234
9 1979-09-01 00:00:00 0.0074
10 1979-10-01 00:00:00 0.0089
However, the data does not yet seem to be recognized as time-series data, as its structure shows:
> str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 479 obs. of 2 variables:
$ Month : POSIXct, format: "1979-01-01" "1979-02-01" "1979-03-01" "1979-04-01" ...
$ Inflation: num 0.0258 0.0234 0.0055 0.0302 0.0305 0.0232 0.025 0.0234 0.0074 0.0089 ...
When I tried to convert it using xts function, it gave me this error:
> inflation <- xts(data[,-1], order.by=as.Date(data[,1], "%m/%d/%Y"))
Error in as.Date.default(data[, 1], "%m/%d/%Y") :
do not know how to convert 'data[, 1]' to class “Date”
Please help me with the most appropriate way of data conversion.
Thanks
# You have something like this (simulated monthly data):
months <- seq(as.Date("1979-01-01"), as.Date("2000-01-01"), by = "month")
data <- data.frame(
Month = months,
Inflation = rnorm(length(months))) # one value per month
# Create the ts object;
# choose the start and end dates appropriately
tseries <- ts(data$Inflation, start = c(1979,1), end = c(2000,1), frequency = 12)
plot(tseries)
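If an xts object is preferred over ts, the error in the question can be avoided by extracting the column as a vector: data[, 1] on a tibble returns a one-column tibble, which as.Date() cannot convert. The following is a sketch under the assumption that xts is installed, using the first three rows from the question in a plain data frame.

```r
library(xts)

# First three rows of the question's data; Month is already POSIXct
data <- data.frame(
  Month     = as.POSIXct(c("1979-01-01", "1979-02-01", "1979-03-01"), tz = "UTC"),
  Inflation = c(0.0258, 0.0234, 0.0055)
)

# data$Month (or data[[1]]) is a vector, so as.Date() works and needs no format string
inflation <- xts(data$Inflation, order.by = as.Date(data$Month))
colnames(inflation) <- "Inflation"

inflation
```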

identify date format in R before converting

I have a simple data set which has a date column and a value column. I noticed that the date sometimes comes in mmddyy (%m/%d/%y) format and other times in mmddYYYY (%m/%d/%Y) format. What is the best way to standardize the dates so that I can do other calculations without this formatting causing issues?
I tried the answers provided in "Changing date format in R" and in "How to change multiple Date formats in same column". Neither of these was able to fix the problem.
Below is a sample of the data
Date, Market
12/17/09,1.703
12/18/09,1.700
12/21/09,1.700
12/22/09,1.590
12/23/2009,1.568
12/24/2009,1.520
12/28/2009,1.500
12/29/2009,1.450
12/30/2009,1.450
12/31/2009,1.450
1/4/2010,1.440
When I read it into a new vector using something like this
dt <- as.Date(inp$Date, format="%m/%d/%y")
I get the following output for the above segment
dt Market
2009-12-17 1.703
2009-12-18 1.700
2009-12-21 1.700
2009-12-22 1.590
2020-12-23 1.568
2020-12-24 1.520
2020-12-28 1.500
2020-12-29 1.450
2020-12-30 1.450
2020-12-31 1.450
2020-01-04 1.440
As you can see, the year jumps from 2009 to 2020 at 12/23 because of the change in format. Any help is appreciated. Thanks.
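Reduce every four-digit year to its last two digits first, so that all dates share the %m/%d/%y format, and then parse:
> dat$Date <- gsub("[0-9]{2}([0-9]{2})$", "\\1", dat$Date)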
> dat$Date <- as.Date(dat$Date, format = "%m/%d/%y")
> dat
Date Market
# 1 2009-12-17 1.703
# 2 2009-12-18 1.700
# 3 2009-12-21 1.700
# 4 2009-12-22 1.590
# 5 2009-12-23 1.568
# 6 2009-12-24 1.520
# 7 2009-12-28 1.500
# 8 2009-12-29 1.450
# 9 2009-12-30 1.450
# 10 2009-12-31 1.450
# 11 2010-01-04 1.440
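An alternative that keeps the century information can be sketched in base R: detect which strings end in a four-digit year and parse each subset with its own format. The names raw, four_digit, and dt are illustrative, and the values are a subset of the sample data.

```r
raw <- c("12/17/09", "12/22/09", "12/23/2009", "12/31/2009", "1/4/2010")

# TRUE where the string ends in a slash followed by exactly four digits
four_digit <- grepl("/[0-9]{4}$", raw)

# Fill an all-NA Date vector by subset so the Date class is preserved
dt <- rep(as.Date(NA), length(raw))
dt[four_digit]  <- as.Date(raw[four_digit],  format = "%m/%d/%Y")
dt[!four_digit] <- as.Date(raw[!four_digit], format = "%m/%d/%y")

dt
```

This matters mainly when the data spans centuries, because %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx, so truncating a four-digit year can change it.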
