I've got a data frame with the following data:
>PRICE
DATE CLOSE
1 20070103 54.700
2 20070104 54.770
3 20070105 55.120
4 20070108 54.870
5 20070109 54.860
6 20070110 54.270
7 20070111 54.770
8 20070112 55.360
9 20070115 55.760
...
As you can see my DATE column represents a date (yyyyMMdd) and my CLOSE column represents prices.
I now have to calculate CalmarRatio, from the PerformanceAnalytics package.
I'm new to R, so I don't understand everything, but from what I have googled so far, the R parameter of that function needs to be a time-series-like object.
Is there any way I can convert my data frame to a time-series object, given that there might not be data for every date in a period (only for the ones I specify)?
Your DATE column may represent a date, but it is actually either a character, factor, integer, or a numeric vector.
First, you need to convert the DATE column to a Date object. Then you can create an xts object from the CLOSE and DATE columns of your PRICE data.frame. Finally, you can use the xts object to calculate returns and the Calmar ratio.
PRICE <- structure(list(
DATE = c(20070103L, 20070104L, 20070105L, 20070108L, 20070109L,
20070110L, 20070111L, 20070112L, 20070115L),
CLOSE = c(54.7, 54.77, 55.12, 54.87, 54.86, 54.27, 54.77, 55.36, 55.76)),
.Names = c("DATE", "CLOSE"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))
library(PerformanceAnalytics) # loads/attaches xts
# Convert DATE to Date class
PRICE$DATE <- as.Date(as.character(PRICE$DATE),format="%Y%m%d")
# create xts object
x <- xts(PRICE$CLOSE,PRICE$DATE)
CalmarRatio(Return.calculate(x))
# [,1]
# Calmar Ratio 52.82026
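On the worry about missing dates: an xts index only has to be ordered, not regularly spaced, so weekends and holidays can simply be absent. Here is a minimal sketch, assuming the xts package is installed. The four values are taken from the PRICE data above, and the variable names are only illustrative.

```r
library(xts)

# Four trading days from the question; 6 and 7 January 2007 (a weekend) are absent
dates <- as.Date(c("20070103", "20070104", "20070105", "20070108"),
                 format = "%Y%m%d")
close <- c(54.70, 54.77, 55.12, 54.87)

x <- xts(close, order.by = dates)

# Gaps between consecutive observations, in days; the index is irregular
gaps <- as.numeric(diff(index(x)))
print(gaps)
```

No padding or NA filling is needed before passing such an object on to Return.calculate().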
Most people find working with the base time series (ts) class to be a big pain. You should consider using the zoo class from package zoo. It will not complain about missing times, only about duplicates. The PerformanceAnalytics functions are almost certainly going to be expecting 'zoo' or its descendant class 'xts'.
pricez <- read.zoo(text=" DATE CLOSE
1 20070103 54.700
2 20070104 54.770
3 20070105 55.120
4 20070108 54.870
5 20070109 54.860
6 20070110 54.270
7 20070111 54.770
8 20070112 55.360
9 20070115 55.760
")
index(pricez) <- as.Date(as.character(index(pricez)), format="%Y%m%d")
pricez
2007-01-03 2007-01-04 2007-01-05 2007-01-08 2007-01-09 2007-01-10 2007-01-11 2007-01-12 2007-01-15
54.70 54.77 55.12 54.87 54.86 54.27 54.77 55.36 55.76
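Once the prices are in a zoo object with a Date index, simple returns can be computed directly on it. The following is a sketch under the assumption that plain (non-log) returns are wanted. It builds the object with zoo() instead of read.zoo() to stay self-contained.

```r
library(zoo)

dates <- as.Date(c("20070103", "20070104", "20070105", "20070108"),
                 format = "%Y%m%d")
pricez <- zoo(c(54.70, 54.77, 55.12, 54.87), order.by = dates)

# Simple returns: ratio of consecutive prices minus 1
# (diff() on a zoo object with arithmetic = FALSE divides instead of subtracting)
ret <- diff(pricez, arithmetic = FALSE) - 1
print(ret)
```

The result has one observation fewer than the price series, indexed by the later date of each pair.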
An alternative solution is to use the tidyquant package, which allows the functionality of the financial packages, including time series functionality, to be used with data frames. The following example shows how you can get the Calmar Ratio for multiple assets. The tidyquant vignettes go into more detail on how to use the package.
library(tidyquant)
# Get prices
price_tbl <- c("FB", "AMZN", "NFLX", "GOOG") %>%
tq_get(get = "stock.prices",
from = "2010-01-01",
to = "2016-12-31")
price_tbl
#> # A tibble: 6,449 × 8
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2012-05-18 42.05 45.00 38.00 38.23 573576400 38.23
#> 2 FB 2012-05-21 36.53 36.66 33.00 34.03 168192700 34.03
#> 3 FB 2012-05-22 32.61 33.59 30.94 31.00 101786600 31.00
#> 4 FB 2012-05-23 31.37 32.50 31.36 32.00 73600000 32.00
#> 5 FB 2012-05-24 32.95 33.21 31.77 33.03 50237200 33.03
#> 6 FB 2012-05-25 32.90 32.95 31.11 31.91 37149800 31.91
#> 7 FB 2012-05-29 31.48 31.69 28.65 28.84 78063400 28.84
#> 8 FB 2012-05-30 28.70 29.55 27.86 28.19 57267900 28.19
#> 9 FB 2012-05-31 28.55 29.67 26.83 29.60 111639200 29.60
#> 10 FB 2012-06-01 28.89 29.15 27.39 27.72 41855500 27.72
#> # ... with 6,439 more rows
# Convert to period returns
return_tbl <- price_tbl %>%
group_by(symbol) %>%
tq_transmute(ohlc_fun = Ad,
mutate_fun = periodReturn,
period = "daily")
return_tbl
#> Source: local data frame [6,449 x 3]
#> Groups: symbol [4]
#>
#> symbol date daily.returns
#> <chr> <date> <dbl>
#> 1 FB 2012-05-18 0.00000000
#> 2 FB 2012-05-21 -0.10986139
#> 3 FB 2012-05-22 -0.08903906
#> 4 FB 2012-05-23 0.03225806
#> 5 FB 2012-05-24 0.03218747
#> 6 FB 2012-05-25 -0.03390854
#> 7 FB 2012-05-29 -0.09620809
#> 8 FB 2012-05-30 -0.02253811
#> 9 FB 2012-05-31 0.05001770
#> 10 FB 2012-06-01 -0.06351355
#> # ... with 6,439 more rows
# Calculate performance
return_tbl %>%
tq_performance(Ra = daily.returns,
performance_fun = CalmarRatio)
#> Source: local data frame [4 x 2]
#> Groups: symbol [4]
#>
#> symbol CalmarRatio
#> <chr> <dbl>
#> 1 FB 0.50283172
#> 2 AMZN 0.91504597
#> 3 NFLX 0.14444744
#> 4 GOOG 0.05068483
Whether you want to convert a data frame (or any time series) to an xts or zoo object, as in the answers above, or to any other time series class (such as a ts object), the tsbox package makes coercion easy:
PRICE <- structure(list(
DATE = c(20070103L, 20070104L, 20070105L, 20070108L, 20070109L,
20070110L, 20070111L, 20070112L, 20070115L),
CLOSE = c(54.7, 54.77, 55.12, 54.87, 54.86, 54.27, 54.77, 55.36, 55.76)),
.Names = c("DATE", "CLOSE"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))
library(tsbox)
ts_xts(PRICE)
#> [time]: 'DATE' [value]: 'CLOSE'
#> Loading required namespace: xts
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> CLOSE
#> 2007-01-03 54.70
#> 2007-01-04 54.77
#> 2007-01-05 55.12
#> 2007-01-08 54.87
#> 2007-01-09 54.86
#> 2007-01-10 54.27
#> 2007-01-11 54.77
#> 2007-01-12 55.36
#> 2007-01-15 55.76
ts_ts(PRICE)
#> [time]: 'DATE' [value]: 'CLOSE'
#> Time Series:
#> Start = 2007.00547581401
#> End = 2007.0383306981
#> Frequency = 365.2425
#> [1] 54.70 54.77 55.12 NA NA 54.87 54.86 54.27 54.77 55.36 NA
#> [12] NA 55.76
This answer, based on Joshua Ulrich's answer, creates a time series from the built-in airquality dataset, which contains "Daily air quality measurements in New York, May to September 1973".
> head(airquality,3)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
Convert Month and Day to a vector of class "Date"
airqualitydate = as.Date(sprintf("1973%02.0f%02.0f", airquality$Month, airquality$Day),
format="%Y%m%d")
Create the time series object
ts_airquality <- xts(airquality, airqualitydate)
head(ts_airquality, 3)
Ozone Solar.R Wind Temp Month Day
1973-05-01 41 190 7.4 67 5 1
1973-05-02 36 118 8.0 72 5 2
1973-05-03 12 149 12.6 74 5 3
Plot to illustrate the different output of the plot.xts() function. (compare to plot(airquality))
plot(ts_airquality$Ozone, main="Ozone (ppb)")
lines(ts_airquality$Temp, on=NA, main="Temperature (degrees F)")
Note, the base R ts() method is mostly suited for quarterly or yearly data.
As explained in an answer to "starting a daily time series in R":
"Time Series Object does not work well with creating daily time series. I will suggest you use the zoo library."
In particular the xts package is an extension to zoo.
I seem to have some trouble converting my data frame data into a time series. I have a typical data set consisting of date, export quantity, GDP, FDI etc.
# A tibble: 252 x 10
Date `Maize Exports (m/t)` `Rainfall (mm)` `Temperature (C)` `Exchange rate (R/$)` `Maize price (R)` `FDI (Million R)` GDP (Million~1 Oil p~2 Infla~3
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2000-05-01 00:00:00 21000 30.8 14.4 0.144 678. 4337 9056 192. 5.1
2 2000-06-01 00:00:00 54000 14.9 14.0 0.147 583. -4229 9056 205. 5.1
3 2000-07-01 00:00:00 134000 11.1 12.6 0.144 518. -4229 8841 196. 5.9
4 2000-08-01 00:00:00 213000 6.1 15.3 0.143 526. -4229 8841 205. 6.8
5 2000-09-01 00:00:00 123000 38.5 17.8 0.138 576. 6315 8841 234. 6.8
6 2000-10-01 00:00:00 94000 61.9 20.1 0.132 636. 6315 4487 231. 7.1
7 2000-11-01 00:00:00 192000 93.9 19.9 0.129 685. 6315 4487 250. 7.1
8 2000-12-01 00:00:00 134000 85.6 22.3 0.132 747. -2143 4487 192. 7
9 2001-01-01 00:00:00 133000 92.4 23.4 0.0875 1066. -5651 7365 226. 5
10 2001-02-01 00:00:00 168000 51 22.0 0.0879 1042. -5651 7365 233. 5.9
I've installed the right packages (readxl), I've used the as.Date function to ensure my Date is recognized as such, and I've used the as.ts function to convert the dataset. However, after using the as.ts function, the date column is turned into a seemingly random number and is no longer a date. What am I doing wrong? Please help!
Date Maize Exports (m/t) Rainfall (mm) Temperature (C) Exchange rate (R/$) Maize price (R) FDI (Million R) GDP (Million R) Oil prices (R/barrel)
[1,] 957139200 21000 30.8 14.36 0.1435235 677.88 4337 9056 192.35
[2,] 959817600 54000 14.9 13.96 0.1474926 583.48 -4229 9056 205.36
[3,] 962409600 134000 11.1 12.61 0.1437298 518.10 -4229 8841 196.38
[4,] 965088000 213000 6.1 15.27 0.1433075 525.59 -4229 8841 204.66
[5,] 967766400 123000 38.5 17.83 0.1382170 576.08 6315 8841 233.64
[6,] 970358400 94000 61.9 20.10 0.1322751 635.79 6315 4487 231.27
In short, nothing is wrong. This response should really be a comment, but I wanted the space of a full answer to explain.
Behind each date is a numeric value counted from an origin, and as.ts() keeps only that number; this is just R's way of handling it. Since you originally imported from Excel, the origins may not line up if you try to cross-check the values (see below).
You didn't make your question reproducible, but I put some similar data together to demonstrate what's going on:
Data
df <- data.frame(date = as.Date(c("2000-05-01",
"2000-06-01",
"2000-07-01",
"2000-08-01",
"2000-09-01",
"2000-10-01",
"2000-11-01")),
maize = c(21, 54, 132, 213, 123, 94, 192) * 1000,
rainfall = c(30, 14, 11, 6, 38, 61, 93))
tb <- tidyr::as_tibble(df)
Turning this into a time series object using as.ts()
tb_ts <- as.ts(tb)
# Time Series:
# Start = 1
# End = 7
# Frequency = 1
# date maize rainfall
# 1 11078 21000 30
# 2 11109 54000 14
# 3 11139 132000 11
# 4 11170 213000 6
# 5 11201 123000 38
# 6 11231 94000 61
# 7 11262 192000 93
Since I created these data in R, the "origin" is January 1, 1970, and we can see this in numerical dates from the time series object and convert them back into date formats:
as.Date(tb_ts[1:7], origin = '1970-01-01')
# [1] "2000-05-01" "2000-06-01" "2000-07-01" "2000-08-01"
# [5] "2000-09-01" "2000-10-01" "2000-11-01"
Note that for dates imported from Excel as serial numbers, the origin is December 30th, 1899 (i.e., as.Date(xx, origin = "1899-12-30")). Applying that origin to the values above gives the wrong dates:
as.Date(tb_ts[1:7], origin = "1899-12-30")
# [1] "1930-04-30" "1930-05-31" "1930-06-30" "1930-07-31"
# [5] "1930-08-31" "1930-09-30" "1930-10-31"
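The round trip between dates and their underlying numbers can be checked in base R alone. This is a small sketch with three of the dates used above and no extra packages:

```r
d <- as.Date(c("2000-05-01", "2000-06-01", "2000-07-01"))

# A Date is stored as the number of days since 1970-01-01
n <- as.numeric(d)
print(n)  # 11078 11109 11139, matching the as.ts() output above

# Supplying the same origin recovers the calendar dates
back <- as.Date(n, origin = "1970-01-01")
print(back)
```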
The function worked as it's supposed to. Keeping the date format you're familiar with isn't practical for execution, so it converts the dates to a different value, usually something like the number of days (or minutes or seconds) since a certain year, usually Jan. 1 1970. For example, here is a little set to make the point:
# a test vector of dates
> del1 <- seq(as.Date("2012-04-01"), length.out=4, by=30)
# looks like
> del1
[1] "2012-04-01" "2012-05-01" "2012-05-31" "2012-06-30"
# use the as.ts
> as.ts(del1)
Time Series:
Start = 1
End = 4
Frequency = 1
[1] 15431 15461 15491 15521
So you can see that the dates, which are 30 days apart, are converted to a series of integer values that differ by 30.
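If the goal is a regular monthly series, one option (a sketch, not the only approach) is to leave the date column out and describe the calendar through the start and frequency arguments of ts(). The values are the first six maize export figures from the question.

```r
# First six monthly maize export values from the question, starting May 2000
maize <- c(21000, 54000, 134000, 213000, 123000, 94000)

# frequency = 12 marks the data as monthly; start = c(2000, 5) is May 2000
maize_ts <- ts(maize, start = c(2000, 5), frequency = 12)
print(maize_ts)

# The calendar lives in the series attributes, not in a column
print(start(maize_ts))  # 2000 5
print(end(maize_ts))    # 2000 10
```

This only fits data without gaps. For irregular dates, the zoo or xts classes are the more natural choice.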
I have a tibble with a date and return column, that looks as follows:
> head(return_series)
# A tibble: 6 x 2
date return
<chr> <dbl>
1 2002-01 0.0292
2 2002-02 0.0439
3 2002-03 0.0240
4 2002-04 0.00585
5 2002-05 -0.0169
6 2002-06 -0.0686
I first add the day to the date column with the following code:
return_series$date <- as.Date(as.yearmon(return_series$date))
# A tibble: 6 x 2
date return
<date> <dbl>
1 2002-01-01 0.0292
2 2002-02-01 0.0439
3 2002-03-01 0.0240
4 2002-04-01 0.00585
5 2002-05-01 -0.0169
6 2002-06-01 -0.0686
My goal is to convert the return_series tibble to xts data to use it for further analysis with the PerformanceAnalytics package. But when I use the command as.xts I receive the following error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
How can I change the format to xts or is there an other possibility to work with the PerformanceAnalytics package instead of converting to xts?
Thank you very much for your help!
You need to follow the xts documentation more closely:
> tb <- as_tibble(data.frame(date=as.Date("2002-01-01") + (0:5)*30,
+ return=rnorm(6)))
> tb
# A tibble: 6 × 2
date return
<date> <dbl>
1 2002-01-01 0.223
2 2002-01-31 -0.352
3 2002-03-02 0.149
4 2002-04-01 1.42
5 2002-05-01 -1.04
6 2002-05-31 0.507
>
> x <- xts(tb[,-1], order.by=as.POSIXct(tb[[1]]))
> x
return
2001-12-31 18:00:00 0.222619
2002-01-30 18:00:00 -0.352288
2002-03-01 18:00:00 0.149319
2002-03-31 18:00:00 1.421967
2002-04-30 19:00:00 -1.035087
2002-05-30 19:00:00 0.507046
>
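An xts object needs a time-based index supplied through order.by. POSIXct works, as shown, but converting a Date to POSIXct can shift the printed values by your local time zone (note the 18:00 and 19:00 timestamps above). A Date index is also accepted by xts, as it is by the closely related zoo class, so you can keep Date if you only have daily or monthly data.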
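For comparison, here is a minimal sketch that keeps the Date class for the index. It assumes the xts package is installed and uses the first three values from the question.

```r
library(xts)

dates <- as.Date(c("2002-01-01", "2002-02-01", "2002-03-01"))
ret <- data.frame(return = c(0.0292, 0.0439, 0.0240))

# order.by takes the Date vector as-is: no POSIXct conversion, no time zone
x <- xts(ret, order.by = dates)
print(x)
print(class(index(x)))
```

The key point is that the dates go into order.by rather than staying as a column of the data.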
I'm working with a dataset that contains daily data of water flow. The data goes from 1-10-1998 to 30-03-2020 and looks like this:
Date QA
1998-10-01 315
1998-10-02 245
1998-10-03 179
1998-10-04 186
1998-10-05 262
1998-10-06 199
1998-10-07 319
(...)
The class(Date) is "Date" and the class(QA) is "numeric".
My goal is to turn this daily data into hourly data. For this I used the function 'td' from the package 'tempdisagg' of R:
library(tempdisagg)
td(QA~1,to="hour",method="denton-cholette")
My problem is in the definition of QA as a time series variable. When I define it as 'ts' and apply the function to disaggregate the data, the following error appears:
QA_ts <- ts(QA, start = decimal_date(as.Date("1998-10-01")), frequency = 365)
td(QA_ts ~ 1, to = "hour",method="denton-cholette")
Error in td(QA_ts ~ 1, to = "hour",method="denton-cholette") :
use a time series class other than 'ts' to deal with 'hour'
And when I define QA as another format such as "xts" or "msts" I get the following error:
newQA <- xts(QA,Date)
td(newQA ~1, to="hour",method="denton-cholette")
Error in seq.Date(lf[1], lf.end, by = to) : 'to' must be a "Date" object
I think I'm doing something wrong when defining QA as time series but I can't solve this issue.
Can anybody help me out?
thanks,
Date needs to be of class POSIXct, rather than Date, to convert to hourly frequency. Here is a reproducible example:
x <- structure(list(time = structure(c(10227, 10258, 10286, 10317,
10347, 10378, 10408), class = "Date"), value = c(315, 245, 179,
186, 262, 199, 319)), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
Disaggregate to days:
library(tempdisagg)
m0 <- td(x ~ 1, to = "day", method = "fast")
#> Loading required namespace: tsbox
predict(m0)
#> # A tibble: 212 x 2
#> time value
#> <date> <dbl>
#> 1 1998-01-01 10.4
#> 2 1998-01-02 10.3
#> 3 1998-01-03 10.3
#> 4 1998-01-04 10.3
#> 5 1998-01-05 10.3
#> 6 1998-01-06 10.3
#> 7 1998-01-07 10.3
#> 8 1998-01-08 10.3
#> 9 1998-01-09 10.3
#> 10 1998-01-10 10.3
#> # … with 202 more rows
If you want to disaggregate to hours, time needs to be POSIXct:
x$time <- as.POSIXct(x$time)
m1 <- td(x ~ 1, to = "hour", method = "fast")
predict(m1)
#> # A tibble: 5,087 x 2
#> time value
#> <dttm> <dbl>
#> 1 1998-01-01 01:00:00 0.431
#> 2 1998-01-01 02:00:00 0.431
#> 3 1998-01-01 03:00:00 0.431
#> 4 1998-01-01 04:00:00 0.431
#> 5 1998-01-01 05:00:00 0.431
#> 6 1998-01-01 06:00:00 0.431
#> 7 1998-01-01 07:00:00 0.431
#> 8 1998-01-01 08:00:00 0.431
#> 9 1998-01-01 09:00:00 0.431
#> 10 1998-01-01 10:00:00 0.431
#> # … with 5,077 more rows
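What the Date to POSIXct conversion does can be checked in base R. This is a small sketch using the first three dates from the question:

```r
d <- as.Date(c("1998-10-01", "1998-10-02", "1998-10-03"))

# Date counts days; POSIXct counts seconds, so sub-daily steps are representable
p <- as.POSIXct(d)
print(class(p))

# Each Date maps to midnight UTC of that day (86400 seconds per day)
print(as.numeric(p) == as.numeric(d) * 86400)

# Formatting in UTC avoids the local time-zone shift in the displayed values
print(format(p, tz = "UTC", usetz = TRUE))
```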
Here is a slightly more complex example for hourly disaggregation.
This post explains conversion to high-frequency in more detail.
I do time series decomposition and I want to save the resulting objects in a dataframe. It works if I store the results in an object and use it to make the dataframe afterwards:
# needed packages
library(tidyverse)
library(forecast)
# some "time series"
vec <- 1:1000 + rnorm(1000)
# store pipe results
pipe_out <-
# do decomposition
decompose(msts(vec, start= c(2001, 1, 1), seasonal.periods= c(7, 365.25))) %>%
# relevant data
.$seasonal
# make a dataframe with the stored seasonal data
data.frame(ts= pipe_out)
But doing the same as a one-liner fails:
decompose(msts(vec, start= c(2001, 1, 1), seasonal.periods= c(7, 365.25))) %>%
data.frame(ts= .$seasonal)
I get the error
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ‘"decomposed.ts"’ to a data.frame
I thought that the pipe simply moves forward the things that came up in the last step which saves us storing those things in objects. If so, shouldn't both codes result in the very same output?
EDIT (from comments)
The first code works but it is a bad solution because if one wants to extract all the vectors of the decomposed time series one would need to do it in multiple steps. Something like the following would be better:
decompose(msts(vec, start= c(2001, 1, 1),
seasonal.periods= c(7, 365.25))) %>%
data.frame(seasonal= .$seasonal, x=.$x, trend=.$trend, random=.$random)
It's unclear from your example whether you want to extract $x or $seasonal. Either way, you can extract part of a list either with the `[[`() function in base or the alias extract2() in magrittr, as you prefer. You should then use the . when you create a data.frame in the last step.
Cleaning up the code a bit to be consistent with the piping, the following works:
library(magrittr)
library(tidyverse)
library(forecast)
vec <- 1:1000 + rnorm(1000)
vec %>%
msts(start = c(2001, 1, 1), seasonal.periods= c(7, 365.25)) %>%
decompose %>%
`[[`("seasonal") %>%
# extract2("seasonal") %>% # Another option, uncomment if preferred
data.frame(ts = .) %>%
head # Just for the reprex, remove as required
#> ts
#> 1 -1.17332998
#> 2 0.07393265
#> 3 0.37631946
#> 4 0.30640395
#> 5 1.04279779
#> 6 0.20470768
Created on 2019-11-28 by the reprex package (v0.3.0)
Edit based on comment:
To do what you mention in the comments, you need to use curly brackets (see e.g. here for an explanation why). Hence, the following works:
library(magrittr)
library(tidyverse)
library(forecast)
vec <- 1:1000 + rnorm(1000)
vec %>%
msts(start= c(2001, 1, 1), seasonal.periods = c(7, 365.25)) %>%
decompose %>%
{data.frame(seasonal = .$seasonal,
trend = .$trend)} %>%
head
#> seasonal trend
#> 1 -0.4332034 NA
#> 2 -0.6185832 NA
#> 3 -0.5899566 NA
#> 4 0.7640938 NA
#> 5 -0.4374417 NA
#> 6 -0.8739449 NA
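The effect of the braces can be seen on a plain list, without any time series involved. This is a sketch in which `parts` is a hypothetical stand-in for the decomposed object:

```r
library(magrittr)

# Hypothetical stand-in for the list-like result of decompose()
parts <- list(seasonal = c(0.1, -0.2, 0.3), trend = c(1, 2, 3))

# Without braces, %>% also inserts `parts` as the first argument, i.e.
# data.frame(parts, seasonal = ..., trend = ...). With braces, the
# expression is evaluated as written and `.` appears only where you put it.
out <- parts %>% {
  data.frame(seasonal = .$seasonal, trend = .$trend)
}
print(out)
```

With a decomposed.ts object in place of the list, the implicit first argument is what data.frame() fails to coerce.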
However, for your specific use case, it may be clearer and easier to use magrittr::extract and then simply bind_cols:
vec %>%
msts(start= c(2001, 1, 1), seasonal.periods = c(7, 365.25)) %>%
decompose %>%
magrittr::extract(c("seasonal", "trend")) %>%
bind_cols %>%
head
#> # A tibble: 6 x 2
#> seasonal trend
#> <dbl> <dbl>
#> 1 -0.433 NA
#> 2 -0.619 NA
#> 3 -0.590 NA
#> 4 0.764 NA
#> 5 -0.437 NA
#> 6 -0.874 NA
Created on 2019-11-29 by the reprex package (v0.3.0)
With daily data, decompose() does not work well because it will only handle the annual seasonality and will give relatively poor estimates of it. If the data involve human behaviour, it will probably have both weekly and annual seasonal patterns.
Also, msts objects are not great for daily data either because they don't store the dates explicitly.
I suggest you use tsibble objects with an STL decomposition instead. Here is an example using your data.
library(tidyverse)
library(tsibble)
library(feasts)
mydata <- tsibble(
day = as.Date(seq(as.Date("2001-01-01"), length=1000, by=1)),
vec = 1:1000 + rnorm(1000)
)
#> Using `day` as index variable.
mydata
#> # A tsibble: 1,000 x 2 [1D]
#> day vec
#> <date> <dbl>
#> 1 2001-01-01 0.161
#> 2 2001-01-02 2.61
#> 3 2001-01-03 1.37
#> 4 2001-01-04 3.15
#> 5 2001-01-05 4.43
#> 6 2001-01-06 7.35
#> 7 2001-01-07 7.10
#> 8 2001-01-08 10.0
#> 9 2001-01-09 9.16
#> 10 2001-01-10 10.2
#> # … with 990 more rows
# Compute a decomposition
mydata %>% STL(vec)
#> # A dable: 1,000 x 7 [1D]
#> # STL Decomposition: vec = trend + season_year + season_week + remainder
#> day vec trend season_year season_week remainder season_adjust
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001-01-01 0.161 14.7 -14.6 0.295 -0.193 14.5
#> 2 2001-01-02 2.61 15.6 -14.2 0.0865 1.04 16.7
#> 3 2001-01-03 1.37 16.6 -15.5 0.0365 0.240 16.9
#> 4 2001-01-04 3.15 17.6 -13.0 -0.0680 -1.34 16.3
#> 5 2001-01-05 4.43 18.6 -13.4 -0.0361 -0.700 17.9
#> 6 2001-01-06 7.35 19.5 -12.4 -0.122 0.358 19.9
#> 7 2001-01-07 7.10 20.5 -13.4 -0.181 0.170 20.7
#> 8 2001-01-08 10.0 21.4 -12.7 0.282 1.10 22.5
#> 9 2001-01-09 9.16 22.2 -13.8 0.0773 0.642 22.9
#> 10 2001-01-10 10.2 22.9 -12.7 0.0323 -0.0492 22.9
#> # … with 990 more rows
Created on 2019-11-30 by the reprex package (v0.3.0)
The output is a dable (decomposition table) which behaves like a dataframe most of the time. So you can extract the trend column, or either of the seasonal component columns in the usual way.
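In later feasts releases, as far as I am aware, STL() is a model specification that is passed to model(), and the decomposition table is retrieved with components(). The following sketch assumes such a release and that tsibble, feasts and dplyr are installed. The object names are only illustrative.

```r
library(tsibble)
library(feasts)
library(dplyr)

set.seed(1)
mydata <- tsibble(
  day = seq(as.Date("2001-01-01"), length.out = 1000, by = 1),
  vec = 1:1000 + rnorm(1000),
  index = day
)

# Assumption: a feasts release where STL() is used inside model()
cmp <- mydata %>%
  model(stl = STL(vec)) %>%
  components()

# The decomposition table can then be handled like a data frame
trend_df <- cmp %>%
  as_tibble() %>%
  select(day, trend)
print(head(trend_df))
```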
I want to use the Prophet() function in R, but I cannot convert my column "YearWeek" to a Date column.
I have a column "YearWeek" that stores values from 201401 up to 201937 i.e. starting in 2014 week 1 up to 2019 week 37.
I don't know how to declare this column as a date in the form yyyy-ww needed to use the Prophet() function.
Does anyone know how to do this?
Thank you in advance.
One solution could be to append a 01 to the end of your yyyy-ww formatted dates.
Data:
library(tidyverse)
df <- cross2(2014:2019, str_pad(1:52, width = 2, pad = 0)) %>%
map_df(set_names, c("year", "week")) %>%
transmute(date = paste(year, week, sep = "")) %>%
arrange(date)
head(df)
#> # A tibble: 6 x 1
#> date
#> <chr>
#> 1 201401
#> 2 201402
#> 3 201403
#> 4 201404
#> 5 201405
#> 6 201406
Now let's append the 01 and convert to date:
df %>%
mutate(date = paste(date, "01", sep = ""),
new_date = as.Date(date, "%Y%U%w"))
#> # A tibble: 312 x 2
#> date new_date
#> <chr> <date>
#> 1 20140101 2014-01-05
#> 2 20140201 2014-01-12
#> 3 20140301 2014-01-19
#> 4 20140401 2014-01-26
#> 5 20140501 2014-02-02
#> 6 20140601 2014-02-09
#> 7 20140701 2014-02-16
#> 8 20140801 2014-02-23
#> 9 20140901 2014-03-02
#> 10 20141001 2014-03-09
#> # ... with 302 more rows
Created on 2019-10-10 by the reprex package (v0.3.0)
More info about a numeric week of the year can be found here.
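A base R variant of the same idea is sketched below. It assumes the weeks follow the %U convention (week 1 starts on the first Sunday of the year) and that the Monday of each week is an acceptable representative date. If the data use ISO week numbers instead, the results would be off.

```r
# Example YearWeek values in the yyyyww form described in the question
yearweek <- c(201401L, 201402L, 201937L)

# Append weekday "1" (Monday, %u) so each year-week pair maps to a single day
monday <- as.Date(paste0(yearweek, "1"), format = "%Y%U%u")
print(monday)

# Consecutive weeks should be 7 days apart
print(as.numeric(diff(monday[1:2])))
```

Week-based parsing can behave differently across platforms, so it is worth spot-checking a few converted values against a calendar.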