Firstly, apologies for what is probably a very easy question. I have been following an example to plot STL and have come up with a nice line chart. I would like to extract the data points so I can use them in Tableau in this format:
(sorry, having trouble getting tables to display)
My time series is generated from a count in the same format as the table above, so I assume it is quite simple to stitch it back together, but I am not very experienced with data manipulation in R yet. I am happy with the actual seasonal plot, it's just the matter of tying it all back up into something I can use.
I cannot provide my data, but I can provide the following from a tutorial which does the same thing:
library(xts)
## load co2 data set
load(url("https://userpage.fu-berlin.de/soga/300/30100_data_sets/KeelingCurve.Rdata"))
library(lubridate)
start <- c(year(xts::first(co2)), month(xts::first(co2)))
start
end <- c(year(xts::last(co2)), month(xts::last(co2)))
end
# creation of a ts object
co2 <- ts(data = as.vector(coredata(co2)),
start = start,
end = end, frequency = 12)
# set up stl function
fit <- stl(co2, s.window = "periodic")
I am able to extract the list of y-axis values using:
seasonal_stl <- fit$time.series[,1]
What I would like to do is reconstruct that into a table of Month, Year and the seasonal value. Can anyone suggest how to do that? Many thanks in advance.
You can use the tsibble package to convert the ts object into a data frame in the form you want.
ts(fit$time.series, start=start, frequency=12) |>
tsibble::as_tsibble() |>
tidyr::pivot_wider(names_from = "key", values_from = "value") |>
tibble::as_tibble()
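If you would rather stay in base R, the same table can be sketched with `time()` and `cycle()`. This is a minimal sketch under an assumption: it uses the built-in `datasets::co2` series as a stand-in for the downloaded Keeling data, and the names `fit_demo`, `seasonal_demo` and `seasonal_table` are made up for illustration.

```r
# Assumption: the built-in datasets::co2 series stands in for the Keeling data.
fit_demo <- stl(datasets::co2, s.window = "periodic")
seasonal_demo <- fit_demo$time.series[, "seasonal"]

seasonal_table <- data.frame(
  # time() returns decimal years; the small offset guards against rounding error
  Year = as.integer(floor(time(seasonal_demo) + 1e-6)),
  # cycle() returns the position within the year (1 = January for monthly data)
  Month = month.abb[cycle(seasonal_demo)],
  Seasonal = as.numeric(seasonal_demo)
)
head(seasonal_table)
```

The resulting data frame can then be written out for Tableau with something like `write.csv(seasonal_table, "seasonal.csv", row.names = FALSE)`.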
But you might find it easier to use the tsibble and feasts packages from the start, like this.
library(tsibble)
library(feasts)
library(lubridate)
## load co2 data set
load(url("https://userpage.fu-berlin.de/soga/300/30100_data_sets/KeelingCurve.Rdata"))
start <- c(year(xts::first(co2)), month(xts::first(co2)))
# creation of a tsibble object
co2 <- ts(co2, start=start, frequency=12) |>
as_tsibble()
# Fit STL
fit <- co2 |>
model(stl = STL(value ~ season(window = "periodic")))
# Extract components
components(fit)
#> # A dable: 711 x 7 [1M]
#> # Key: .model [1]
#> # : value = trend + season_year + remainder
#> .model index value trend season_year remainder season_adjust
#> <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 stl 1958 Mar 316. 315. 1.46 -0.551 314.
#> 2 stl 1958 Apr 317. 315. 2.59 -0.0506 315.
#> 3 stl 1958 May 318. 315. 3.00 -0.514 315.
#> 4 stl 1958 Jun 317. 315. 2.28 -0.286 315.
#> 5 stl 1958 Jul 316. 315. 0.668 -0.00184 315.
#> 6 stl 1958 Aug 315. 315. -1.48 1.13 316.
#> 7 stl 1958 Sep 313. 315. -3.16 1.01 316.
#> 8 stl 1958 Oct 313. 315. -3.25 0.468 316.
#> 9 stl 1958 Nov 313. 316. -2.05 -0.148 315.
#> 10 stl 1958 Dec 315. 316. -0.860 -0.0377 316.
#> # … with 701 more rows
Created on 2023-01-26 with reprex v2.0.2
I have a very long dataset of numerous stocks for many years, similar to this one:
one_ticker = tq_get("AAPL", from = "2021-06-01")
one_ticker <- one_ticker %>%
mutate(day = day(date),
month = month(date),
year = year(date))
symbol date open high low close volume adjusted day month year
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
1 AAPL 2021-06-01 125. 125. 124. 124. 67637100 124. 1 6 2021
2 AAPL 2021-06-02 124. 125. 124. 125. 59278900 125. 2 6 2021
3 AAPL 2021-06-03 125. 125. 123. 124. 76229200 123. 3 6 2021
4 AAPL 2021-06-04 124. 126. 124. 126. 75169300 126. 4 6 2021
5 AAPL 2021-06-07 126. 126. 125. 126. 71057600 126. 7 6 2021
6 AAPL 2021-06-08 127. 128. 126. 127. 74403800 126. 8 6 2021
7 AAPL 2021-06-09 127. 128. 127. 127. 56877900 127. 9 6 2021
8 AAPL 2021-06-10 127. 128. 126. 126. 71186400 126. 10 6 2021
9 AAPL 2021-06-11 127. 127. 126. 127. 53522400 127. 11 6 2021
10 AAPL 2021-06-14 128. 131. 127. 130. 96906500 130. 14 6 2021
I want first to calculate the biWeekly adjusted price return within each month:
-first biWeekly interval: days 1-15
-second biWeekly interval: days 16-30
Calculate the adjusted returns standard deviation within each quarter.
Here are the results (for Apple last 6 months):
1. Adjusted_biWeekly_Returns
[1] 0.043128324
[2] 0.052324355
[3] 0.081663817
[4] -0.003620508
[5] 0.026136504
[6] 0.004698278
[7] -0.022818187
[8] -0.048995111
[9] 0.0153523
[10] 0.022176775
Explanations:
[1] 129.257401/123.913231-1 = 0.043128324
(15/06/2021 adjusted price / 01/06/2021 adjusted price)
[5] 148.882721/145.090561-1 = 0.026136504
(13/08/2021 & 02/08/2021) - because there was no trading on the 15th and the 1st.
2. Quarterly Standard Deviation:
1/06/2021 - 1/09/2021 0.028944365 ([1]-[6] standard deviation)
1/09/2021 - 1/01/2022 Not available yet.
How can I calculate it in R?
*there is the tq_transmute function which is very useful for weekly but not biWeekly calculations
You could do each step separately and make use of the to.weekly and apply.quarterly functions as done in the code below:
library(tidyverse)
library(tidyquant)
library(xts)
library(PerformanceAnalytics)
library(TTR)
one_ticker = tq_get("AAPL", from = "2021-06-01")
one_ticker <- one_ticker %>%
mutate(day = day(date),
month = month(date),
year = year(date))
aapl_adj <- xts(x = one_ticker$adjusted,
order.by = one_ticker$date)
aapl_adj_weekly <- to.weekly(aapl_adj, OHLC = FALSE) # convert to weekly
idx <- seq_along(aapl_adj_weekly) %% 2 > 0 # index that keeps every second week
aapl_adj_biweekly <- aapl_adj_weekly[idx, ] # extract bi-weekly values
aapl_adj_biweekly_returns <- TTR::ROC(aapl_adj_biweekly, type = "discrete", na.pad = FALSE)
aapl_adj_biweekly_returns
# compute quarterly standard deviation
xts::apply.quarterly(aapl_adj_biweekly_returns, sd)
e1
2021-06-18 NA
2021-09-24 0.03159961
2021-11-16 0.02900001
If you don't need to downsample to a biweekly frequency, you could just run this in one go for each ticker. This has the advantage of better estimates of the return standard deviation too, since you use all the available data points instead of only biweekly data:
# fast version without downsampling and annualized standard deviation
aapl_adj |> TTR::ROC(type = "discrete", na.pad = FALSE) |> xts::apply.quarterly(FUN = PerformanceAnalytics::sd.annualized)
e1
2021-06-30 0.1551537
2021-09-30 0.2063587
2021-11-16 0.1701798
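If the intervals really need to follow the calendar definition from the question (days 1-15 and day 16 to month end, using the first and last available trading day in each), one possible base R sketch is shown below. It runs on synthetic prices rather than real AAPL quotes, and the names `prices`, `period` and `half_month_returns` are hypothetical. Note that it puts day 31 into the second half of the month, which the question does not specify.

```r
set.seed(1)
# Synthetic weekday-only prices (illustration data, not real AAPL quotes)
dates <- seq(as.Date("2021-06-01"), as.Date("2021-08-31"), by = "day")
dates <- dates[!as.POSIXlt(dates)$wday %in% c(0, 6)]
prices <- data.frame(
  date = dates,
  adjusted = 100 * cumprod(1 + rnorm(length(dates), mean = 0, sd = 0.01))
)

# Label each row with its calendar half-month, e.g. "2021-06-H1"
day_of_month <- as.integer(format(prices$date, "%d"))
prices$period <- paste0(format(prices$date, "%Y-%m"), "-H",
                        ifelse(day_of_month <= 15, 1, 2))

# Return per half-month: last available price over first available price, minus 1
half_month_returns <- sapply(
  split(prices$adjusted, prices$period),
  function(p) p[length(p)] / p[1] - 1
)
half_month_returns

# Standard deviation of those returns over the June-August block
sd(half_month_returns)
```

With real data, `prices` would be the `date` and `adjusted` columns returned by `tq_get()`, and the standard deviation would be computed per quarter instead of over the whole vector.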
I'm still getting used to working in R and thought constructing a "simple" MACD-screener would be a great way to get into some of the inner workings of R. However, I have encountered the following problem.
I've been able to calculate the MACD and signal line for a single stock. Now, in order to scan multiple stocks, I have to generalize the code. My question is: how can I use a variable (e.g. the name of the stock currently being looked at) in the "$-notation"?
After this I'm planning to write a for loop iterating over the names of the stocks in a list object. Is this a practical way of doing it?
Below I've inserted the code I have so far. In this code I'm looking to replace the "QQQ" with a variable.
library(quantmod)
tickers <- c('QQQ','SPY','AAPL','MMM')
ema.s = 12
ema.l = 26
ema.k = 9
ema.t = 200
getSymbols(tickers, from = '2021-01-6',
to = "2021-10-21",warnings = FALSE,
auto.assign = TRUE)
QQQ$QQQ.EMA.S <- EMA(QQQ[,6], n = ema.s)
QQQ$QQQ.EMA.L <- EMA(QQQ[,6], n = ema.l)
QQQ$QQQ.MACD <- QQQ$QQQ.EMA.S - QQQ$QQQ.EMA.L
QQQ$QQQ.SIG <- EMA(QQQ$QQQ.MACD, n = ema.k)
You can use tidyquant to do all of this in one go.
library(tidyquant)
ema.s = 12
ema.l = 26
tickers <- c('QQQ','SPY','AAPL','MMM')
# get all the data in a tibble
stock_data <- tq_get(tickers,
from = '2021-01-6',
to = "2021-10-21")
stock_data <- stock_data %>%
group_by(symbol) %>%
tq_mutate(select = adjusted,
mutate_fun = MACD,
nFast = ema.s, # TTR::MACD() names these arguments nFast and nSlow
nSlow = ema.l)
stock_data
# A tibble: 800 x 10
# Groups: symbol [4]
symbol date open high low close volume adjusted macd signal
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 QQQ 2021-01-06 307 312. 306. 308. 52809600 306. NA NA
2 QQQ 2021-01-07 310. 316. 310. 315. 30394800 314. NA NA
3 QQQ 2021-01-08 317. 319. 315. 319. 33955800 318. NA NA
4 QQQ 2021-01-11 316. 317. 314. 314. 32746400 313. NA NA
5 QQQ 2021-01-12 314. 316. 311. 314. 29266800 313. NA NA
6 QQQ 2021-01-13 314. 317. 314. 316. 22898400 315. NA NA
7 QQQ 2021-01-14 316. 318. 314. 314. 23500100 313. NA NA
8 QQQ 2021-01-15 314. 315. 311. 312. 35118700 311. NA NA
9 QQQ 2021-01-19 314. 317. 313. 316. 24537000 315. NA NA
10 QQQ 2021-01-20 320. 325. 317. 324. 30728100 323. NA NA
If you want to do this in base R functions combined with only quantmod functions, check the quantmod tag, there are a few posts that use lapply to do this. If you don't find what you need, let me know.
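On the literal question of using a variable with the `$` notation: `$` does not evaluate its right-hand side, but `[[` does, so `x[[name]]` works where `x$name` cannot. Below is a minimal sketch of that idea, assuming the series are kept in a named list. It uses synthetic prices and a hand-written `ema()` helper seeded with the first value, so its numbers will differ from `TTR::EMA()`; all names here are hypothetical.

```r
# Simple recursive EMA seeded with the first observation
# (an illustration only; TTR::EMA() initialises differently)
ema <- function(x, n) {
  alpha <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

set.seed(42)
tickers <- c("QQQ", "SPY")
# Synthetic prices, one data frame per ticker, stored in a named list
stocks <- lapply(setNames(tickers, tickers), function(tk) {
  data.frame(adjusted = 100 + cumsum(rnorm(60)))
})

for (tk in tickers) {
  px <- stocks[[tk]]$adjusted              # [[ ]] accepts a variable, $ does not
  macd_col <- paste0(tk, ".MACD")
  sig_col <- paste0(tk, ".SIG")
  stocks[[tk]][[macd_col]] <- ema(px, 12) - ema(px, 26)
  stocks[[tk]][[sig_col]] <- ema(stocks[[tk]][[macd_col]], 9)
}

head(stocks[["QQQ"]])
```

Keeping the series in one list also avoids creating a separate global object per ticker, which makes the loop easier to extend.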
I am building multiple forecasts in R and I am trying to select certain columns from the forecast output. Below is what the fable looks like:
> head(forData)
# A fable: 6 x 8 [1M]
# Key: .model [1]
.model Month ABC .mean DateVar PCT Ind1 Ind2
<chr> <mth> <dist> <dbl> <dttm> <dbl> <dbl> <dbl>
1 average 2021 Jul N(0.31, 0.0017) 0.315 2021-07-01 00:00:00 3.25 0 0
2 average 2021 Aug N(0.33, 0.0024) 0.328 2021-08-01 00:00:00 3.25 0 0
3 average 2021 Sep N(0.33, 0.0029) 0.329 2021-09-01 00:00:00 3.25 0 0
4 average 2021 Oct N(0.32, 0.0038) 0.322 2021-10-01 00:00:00 3.25 0 0
5 average 2021 Nov N(0.33, 0.0044) 0.328 2021-11-01 00:00:00 3.25 0 0
6 average 2021 Dec N(0.33, 0.0051) 0.326 2021-12-01 00:00:00 3.25 0 0
When I try to use dplyr to select any columns I get the following error:
> forData %>% select(Month, .mean)
Error: Can't subset columns that don't exist.
x Column `ABC` doesn't exist.
The code below gives me a vector of both Month and .mean so I assume the names are correct but I can't understand the error it gives.
forData$Month
forData$.mean
We can select the columns after converting to a tibble (the backquotes around .mean are optional):
forData %>%
as_tibble %>%
select(Month, `.mean`)
The underlying issue here (which is not clear from the error, I'll try to improve this) is that a <fable> must contain a distribution column. By selecting Month and .mean, you are removing the ABC (distribution) column, which is required. If you no longer want the distribution, you will need to convert to a different data class. There are two main options here:
a <tsibble> with as_tsibble() (which requires the time column Month that you still have)
a <tibble> with as_tibble() (which has no requirements on the columns it contains)
I have spent a day searching for the answer to this question and still cannot figure out how this works (I am relatively new to R).
The data:
I have the daily revenue of a store. The starting date is November 2017, and the end date is February 2020. (It is not a typical Jan - Dec every year data). There is no missing value, and every day's sale is recorded. There are 2 columns: date (in proper date format) and revenue (in numerical format).
I am trying to build a time series forecasting model for my sales data. One pre-requisite is that I need to transform my data into the ts object. All those posts online I have seen dealt with yearly or monthly data, yet I have not yet seen anyone mention daily data.
I tried to convert my data to a ts object this way (I named my data "d"):
d_ts <- ts(d$revenue, start=min(d$date), end = max(d$date), frequency = 365)
I then got really weird results as such:
Start = c(17420, 1)
End = c(18311, 1)
Frequency = 365
[1] 1174.77 214.92 10.00 684.86 7020.04 11302.50 30613.55 29920.98 24546.49 22089.89 30291.65 32993.05 26517.11 39670.38 30361.32 17510.72
[17] 9888.76 3032.27 1229.74 2426.36 ....... [ reached getOption("max.print") -- omitted 324216 entries ]
There are 892 days in this dataset, so how come the ts object's dimension is 325,216 x 1?
I looked into this book called "Hands-On Time-Series with R" and found the following excerpt:
[image of the book excerpt, not reproduced here]
This basically means the ts() object does NOT work for daily data. Is this why my ts() conversion is messed up?
My questions are ...
(1) How can I make my daily revenue data to be a time series object before feeding into a model, if ts() does not work for daily data? All those time-series models require my data to be in time-series format though.
(2) Does the fact that my data does not start on Jan 2017 & end on Dec 2019 (i.e. those perfect 12 months in a year data shown in many online posts) have any complications? If so, what should I adjust so that the time series forecasting would be meaningful?
I have been stuck on this issue and cannot wrap my head around it. I really, really appreciate your help!
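The ts function can work with any time interval; the interval is defined by the start and end points. Because you're passing dates, one time unit corresponds to one day, as this is how dates are stored internally (as the number of days since 1970-01-01). With frequency = 365, ts() therefore creates 365 observations per day, which is why your object is so long. The help file at ?ts also shows examples of how to use annual or quarterly data.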
To read in your daily data correctly you need to set frequency=1. Using some data similar in structure to what you've got:
#Compile a dataframe like yours
library(lubridate)
set.seed(0)
d <- data.frame(date=seq.Date(dmy("01/11/2017"), by="day", length.out=892))
d$revenue <- runif(892)
head(d)
#date revenue
# 1 2017-11-01 0.8966972
# 2 2017-11-02 0.2655087
# 3 2017-11-03 0.3721239
# 4 2017-11-04 0.5728534
# 5 2017-11-05 0.9082078
# 6 2017-11-06 0.2016819
#Convert to timeseries object
d_ts <- ts(d$revenue, start=min(d$date), end = max(d$date), frequency = 1)
d_ts
# Time Series:
# Start = 17471
# End = 18362
# Frequency = 1
# [1] 0.896697200 0.265508663 0.372123900 0.572853363 0.908207790 0.201681931 0.898389685 0.944675269 0.660797792
# [10] 0.629114044 0.061786270 0.205974575 0.176556753 0.687022847 0.384103718 0.769841420 0.497699242 0.717618508
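To see where the length reported in the question comes from, the arithmetic can be checked directly. This is a small sketch that plugs in the Start and End values printed in the question; the dummy value `0` is simply recycled by `ts()`, and the names `start_day` and `end_day` are made up for illustration.

```r
# Start and End as printed in the question (dates stored as day counts)
start_day <- 17420
end_day <- 18311

# ts() creates (end - start) * frequency + 1 time points and recycles the data
(end_day - start_day) * 365 + 1

# Same count via ts() itself, using a dummy value that gets recycled
length(ts(0, start = start_day, end = end_day, frequency = 365))

# With one observation per time unit (one day) the length matches the data
length(ts(0, start = start_day, end = end_day, frequency = 1))
```

The first two expressions give the very large length from the question, while the last one gives the number of days in the data.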
With daily data, you are better off using a tsibble class rather than a ts class. There are modelling and forecasting tools available via the fable package.
library(tsibble)
library(fable)
set.seed(1)
d_tsibble <- data.frame(
date = seq(as.Date("2017-11-01"), by = "day", length.out = 892),
revenue = rnorm(892)
) %>%
as_tsibble(index = date)
d_tsibble
#> # A tsibble: 892 x 2 [1D]
#> date revenue
#> <date> <dbl>
#> 1 2017-11-01 -0.626
#> 2 2017-11-02 0.184
#> 3 2017-11-03 -0.836
#> 4 2017-11-04 1.60
#> 5 2017-11-05 0.330
#> 6 2017-11-06 -0.820
#> 7 2017-11-07 0.487
#> 8 2017-11-08 0.738
#> 9 2017-11-09 0.576
#> 10 2017-11-10 -0.305
#> # … with 882 more rows
d_tsibble %>%
model(
arima = ARIMA(revenue)
) %>%
forecast(h = "14 days")
#> # A fable: 14 x 4 [1D]
#> # Key: .model [1]
#> .model date revenue .distribution
#> <chr> <date> <dbl> <dist>
#> 1 arima 2020-04-11 -0.0178 N(-1.8e-02, 1.1)
#> 2 arima 2020-04-12 -0.0117 N(-1.2e-02, 1.1)
#> 3 arima 2020-04-13 -0.00765 N(-7.7e-03, 1.1)
#> 4 arima 2020-04-14 -0.00501 N(-5.0e-03, 1.1)
#> 5 arima 2020-04-15 -0.00329 N(-3.3e-03, 1.1)
#> 6 arima 2020-04-16 -0.00215 N(-2.2e-03, 1.1)
#> 7 arima 2020-04-17 -0.00141 N(-1.4e-03, 1.1)
#> 8 arima 2020-04-18 -0.000925 N(-9.2e-04, 1.1)
#> 9 arima 2020-04-19 -0.000606 N(-6.1e-04, 1.1)
#> 10 arima 2020-04-20 -0.000397 N(-4.0e-04, 1.1)
#> 11 arima 2020-04-21 -0.000260 N(-2.6e-04, 1.1)
#> 12 arima 2020-04-22 -0.000171 N(-1.7e-04, 1.1)
#> 13 arima 2020-04-23 -0.000112 N(-1.1e-04, 1.1)
#> 14 arima 2020-04-24 -0.0000732 N(-7.3e-05, 1.1)
Created on 2020-04-01 by the reprex package (v0.3.0)
I do time series decomposition and I want to save the resulting objects in a dataframe. It works if I store the results in an object and use it to make the dataframe afterwards:
# needed packages
library(tidyverse)
library(forecast)
# some "time series"
vec <- 1:1000 + rnorm(1000)
# store pipe results
pipe_out <-
# do decomposition
decompose(msts(vec, start= c(2001, 1, 1), seasonal.periods= c(7, 365.25))) %>%
# relevant data
.$seasonal
# make a dataframe with the stored seasonal data
data.frame(ts= pipe_out)
But doing the same as a one-liner fails:
decompose(msts(vec, start= c(2001, 1, 1), seasonal.periods= c(7, 365.25))) %>%
data.frame(ts= .$seasonal)
I get the error
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ‘"decomposed.ts"’ to a data.frame
I thought that the pipe simply moves forward the things that came up in the last step which saves us storing those things in objects. If so, shouldn't both codes result in the very same output?
EDIT (from comments)
The first code works but it is a bad solution because if one wants to extract all the vectors of the decomposed time series one would need to do it in multiple steps. Something like the following would be better:
decompose(msts(vec, start= c(2001, 1, 1),
seasonal.periods= c(7, 365.25))) %>%
data.frame(seasonal= .$seasonal, x=.$x, trend=.$trend, random=.$random)
It's unclear from your example whether you want to extract $x or $seasonal. Either way, you can extract part of a list either with the `[[`() function in base or the alias extract2() in magrittr, as you prefer. You should then use the . when you create a data.frame in the last step.
Cleaning up the code a bit to be consistent with the piping, the following works:
library(magrittr)
library(tidyverse)
library(forecast)
vec <- 1:1000 + rnorm(1000)
vec %>%
msts(start = c(2001, 1, 1), seasonal.periods= c(7, 365.25)) %>%
decompose %>%
`[[`("seasonal") %>%
# extract2("seasonal") %>% # Another option, uncomment if preferred
data.frame(ts = .) %>%
head # Just for the reprex, remove as required
#> ts
#> 1 -1.17332998
#> 2 0.07393265
#> 3 0.37631946
#> 4 0.30640395
#> 5 1.04279779
#> 6 0.20470768
Created on 2019-11-28 by the reprex package (v0.3.0)
Edit based on comment:
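To do what you mention in the comments, you need to wrap the call in curly brackets. Without them, magrittr still passes the left-hand side as the first argument whenever the dot only appears inside nested expressions such as .$seasonal. Hence, the following works: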
library(magrittr)
library(tidyverse)
library(forecast)
vec <- 1:1000 + rnorm(1000)
vec %>%
msts(start= c(2001, 1, 1), seasonal.periods = c(7, 365.25)) %>%
decompose %>%
{data.frame(seasonal = .$seasonal,
trend = .$trend)} %>%
head
#> seasonal trend
#> 1 -0.4332034 NA
#> 2 -0.6185832 NA
#> 3 -0.5899566 NA
#> 4 0.7640938 NA
#> 5 -0.4374417 NA
#> 6 -0.8739449 NA
However, for your specific use case, it may be clearer and easier to use magrittr::extract and then simply bind_cols:
vec %>%
msts(start= c(2001, 1, 1), seasonal.periods = c(7, 365.25)) %>%
decompose %>%
magrittr::extract(c("seasonal", "trend")) %>%
bind_cols %>%
head
#> # A tibble: 6 x 2
#> seasonal trend
#> <dbl> <dbl>
#> 1 -0.433 NA
#> 2 -0.619 NA
#> 3 -0.590 NA
#> 4 0.764 NA
#> 5 -0.437 NA
#> 6 -0.874 NA
Created on 2019-11-29 by the reprex package (v0.3.0)
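The first-argument rule behind the original error can be seen with a plain list, which `data.frame()` is able to coerce. This is a minimal sketch with made-up names (`lst`, `without_braces`, `with_braces`):

```r
library(magrittr)

lst <- list(a = 1:3, b = 4:6)

# The dot only appears inside a nested expression, so the left-hand side is
# still inserted as the first argument: data.frame(lst, ts = lst$a)
without_braces <- lst %>% data.frame(ts = .$a)
names(without_braces)

# Curly brackets switch off the first-argument insertion:
# data.frame(ts = lst$a)
with_braces <- lst %>% { data.frame(ts = .$a) }
names(with_braces)
```

With a `decomposed.ts` object on the left-hand side, the first form fails because `data.frame()` cannot coerce that object, which is the error shown in the question.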
With daily data, decompose() does not work well because it will only handle the annual seasonality and will give relatively poor estimates of it. If the data involve human behaviour, it will probably have both weekly and annual seasonal patterns.
Also, msts objects are not great for daily data either because they don't store the dates explicitly.
I suggest you use tsibble objects with an STL decomposition instead. Here is an example using your data.
library(tidyverse)
library(tsibble)
library(feasts)
mydata <- tsibble(
day = as.Date(seq(as.Date("2001-01-01"), length=1000, by=1)),
vec = 1:1000 + rnorm(1000)
)
#> Using `day` as index variable.
mydata
#> # A tsibble: 1,000 x 2 [1D]
#> day vec
#> <date> <dbl>
#> 1 2001-01-01 0.161
#> 2 2001-01-02 2.61
#> 3 2001-01-03 1.37
#> 4 2001-01-04 3.15
#> 5 2001-01-05 4.43
#> 6 2001-01-06 7.35
#> 7 2001-01-07 7.10
#> 8 2001-01-08 10.0
#> 9 2001-01-09 9.16
#> 10 2001-01-10 10.2
#> # … with 990 more rows
# Compute a decomposition
# (newer feasts versions use: mydata %>% model(STL(vec)) %>% components())
mydata %>% STL(vec)
#> # A dable: 1,000 x 7 [1D]
#> # STL Decomposition: vec = trend + season_year + season_week + remainder
#> day vec trend season_year season_week remainder season_adjust
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001-01-01 0.161 14.7 -14.6 0.295 -0.193 14.5
#> 2 2001-01-02 2.61 15.6 -14.2 0.0865 1.04 16.7
#> 3 2001-01-03 1.37 16.6 -15.5 0.0365 0.240 16.9
#> 4 2001-01-04 3.15 17.6 -13.0 -0.0680 -1.34 16.3
#> 5 2001-01-05 4.43 18.6 -13.4 -0.0361 -0.700 17.9
#> 6 2001-01-06 7.35 19.5 -12.4 -0.122 0.358 19.9
#> 7 2001-01-07 7.10 20.5 -13.4 -0.181 0.170 20.7
#> 8 2001-01-08 10.0 21.4 -12.7 0.282 1.10 22.5
#> 9 2001-01-09 9.16 22.2 -13.8 0.0773 0.642 22.9
#> 10 2001-01-10 10.2 22.9 -12.7 0.0323 -0.0492 22.9
#> # … with 990 more rows
Created on 2019-11-30 by the reprex package (v0.3.0)
The output is a dable (decomposition table) which behaves like a dataframe most of the time. So you can extract the trend column, or either of the seasonal component columns in the usual way.