I have a dataset that contains multiple series: 50 products (50 columns), where each column holds the daily sales of one product.
I want to forecast these products using ETS, so I wrote the code below, but when I run it I get only one time series and some output that I do not understand. Thanks in advance :)
y <- read.csv("QAO2.csv", header = FALSE, fileEncoding = "latin1")
y <- ts(y[, -1], frequency = 12, start = c(2007, 1))
ns <- ncol(y)
for(i in 1:ns)
  fit.ets <- ets(y[, i])
print(fit.ets)
f.ets <- forecast(fit.ets, h = 12)
print(f.ets)
plot(f.ets)
This is what the fable package is designed to do. Here is an example using 50 series of monthly data from 2007. Although you say you have daily data, the code you provide assumes monthly data (frequency 12).
library(fable)
library(dplyr)
library(tidyr)
library(ggplot2)
y <- ts(matrix(rnorm(175 * 50), ncol = 50), frequency = 12, start = c(2007, 1)) %>%
  as_tsibble() %>%
  rename(Month = index, Sales = value)
y
#> # A tsibble: 8,750 x 3 [1M]
#> # Key: key [50]
#> Month key Sales
#> <mth> <chr> <dbl>
#> 1 2007 Jan Series 1 1.06
#> 2 2007 Feb Series 1 0.495
#> 3 2007 Mar Series 1 0.332
#> 4 2007 Apr Series 1 0.157
#> 5 2007 May Series 1 -0.120
#> 6 2007 Jun Series 1 -0.0846
#> 7 2007 Jul Series 1 -0.743
#> 8 2007 Aug Series 1 0.714
#> 9 2007 Sep Series 1 1.73
#> 10 2007 Oct Series 1 -0.212
#> # … with 8,740 more rows
fit.ets <- y %>% model(ETS(Sales))
fit.ets
#> # A mable: 50 x 2
#> # Key: key [50]
#> key `ETS(Sales)`
#> <chr> <model>
#> 1 Series 1 <ETS(A,N,N)>
#> 2 Series 10 <ETS(A,N,N)>
#> 3 Series 11 <ETS(A,N,N)>
#> 4 Series 12 <ETS(A,N,N)>
#> 5 Series 13 <ETS(A,N,N)>
#> 6 Series 14 <ETS(A,N,N)>
#> 7 Series 15 <ETS(A,N,N)>
#> 8 Series 16 <ETS(A,N,N)>
#> 9 Series 17 <ETS(A,N,N)>
#> 10 Series 18 <ETS(A,N,N)>
#> # … with 40 more rows
f.ets <- forecast(fit.ets, h=12)
f.ets
#> # A fable: 600 x 5 [1M]
#> # Key: key, .model [50]
#> key .model Month Sales .mean
#> <chr> <chr> <mth> <dist> <dbl>
#> 1 Series 1 ETS(Sales) 2021 Aug N(-0.028, 1.1) -0.0279
#> 2 Series 1 ETS(Sales) 2021 Sep N(-0.028, 1.1) -0.0279
#> 3 Series 1 ETS(Sales) 2021 Oct N(-0.028, 1.1) -0.0279
#> 4 Series 1 ETS(Sales) 2021 Nov N(-0.028, 1.1) -0.0279
#> 5 Series 1 ETS(Sales) 2021 Dec N(-0.028, 1.1) -0.0279
#> 6 Series 1 ETS(Sales) 2022 Jan N(-0.028, 1.1) -0.0279
#> 7 Series 1 ETS(Sales) 2022 Feb N(-0.028, 1.1) -0.0279
#> 8 Series 1 ETS(Sales) 2022 Mar N(-0.028, 1.1) -0.0279
#> 9 Series 1 ETS(Sales) 2022 Apr N(-0.028, 1.1) -0.0279
#> 10 Series 1 ETS(Sales) 2022 May N(-0.028, 1.1) -0.0279
#> # … with 590 more rows
f.ets %>%
  filter(key == "Series 1") %>%
  autoplot(y) +
  labs(title = "Series 1")
Created on 2021-08-05 by the reprex package (v2.0.0)
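To run the same workflow on the CSV from the question, a sketch along these lines should work (assuming, as the question's own code does, that the first column should be dropped and the data really are monthly with frequency 12):
library(fable)
library(tsibble)
library(dplyr)
# Read the wide CSV (50 product columns) and reshape into a long tsibble
sales <- read.csv("QAO2.csv", header = FALSE, fileEncoding = "latin1")
sales <- ts(sales[, -1], frequency = 12, start = c(2007, 1)) %>%
  as_tsibble() %>%
  rename(Month = index, Product = key, Sales = value)
# One ETS model per product, then 12-step-ahead forecasts
fit <- sales %>% model(ETS(Sales))
fc <- forecast(fit, h = 12)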
I have data resembling the following structure, where the when variable denotes the day of measurement:
## Generate data.
set.seed(1986)
n <- 1000
y <- rnorm(n)
when <- as.POSIXct(strftime(seq(as.POSIXct("2021-11-01 23:00:00 UTC", tryFormats = "%Y-%m-%d"),
                                as.POSIXct("2022-11-01 23:00:00 UTC", tryFormats = "%Y-%m-%d"),
                                length.out = n), format = "%Y-%m-%d"))
dta <- data.frame(y, when)
head(dta)
#> y when
#> 1 -0.04625141 2021-11-01
#> 2 0.28000082 2021-11-01
#> 3 0.25317063 2021-11-01
#> 4 -0.96411077 2021-11-02
#> 5 0.49222664 2021-11-02
#> 6 -0.69874551 2021-11-02
I need to compute averages of y over time. For instance, the following computes daily averages:
## Compute daily averages of y.
library(dplyr)
daily_avg <- dta %>%
  group_by(when) %>%
  summarise(daily_mean = mean(y)) %>%
  ungroup()
daily_avg
#> # A tibble: 366 × 2
#> when daily_mean
#> <dttm> <dbl>
#> 1 2021-11-01 00:00:00 0.162
#> 2 2021-11-02 00:00:00 -0.390
#> 3 2021-11-03 00:00:00 -0.485
#> 4 2021-11-04 00:00:00 -0.152
#> 5 2021-11-05 00:00:00 0.425
#> 6 2021-11-06 00:00:00 0.726
#> 7 2021-11-07 00:00:00 0.855
#> 8 2021-11-08 00:00:00 0.0608
#> 9 2021-11-09 00:00:00 -0.995
#> 10 2021-11-10 00:00:00 0.395
#> # … with 356 more rows
I am having a hard time computing weekly averages. Here is what I have tried so far:
## Fail - compute weekly averages of y.
library(lubridate)
dta$week <- week(dta$when) # This is wrong.
dta[165: 171, ]
#> y when week
#> 165 0.9758333 2021-12-30 52
#> 166 -0.8630091 2021-12-31 53
#> 167 0.3054031 2021-12-31 53
#> 168 1.2814421 2022-01-01 1
#> 169 0.1025440 2022-01-01 1
#> 170 1.3665411 2022-01-01 1
#> 171 -0.5373058 2022-01-02 1
Using the week function from the lubridate package ignores the fact that my data span two calendar years. So, if I were to use code similar to what I used for the daily averages, I would aggregate observations belonging to different years (but to the same week number). How can I solve this?
You can use %V (from ?strptime) for weeks, combining it with the year.
dta %>%
  group_by(week = format(when, format = "%Y-%V")) %>%
  summarize(weekly_mean = mean(y)) %>%
  ungroup()
# # A tibble: 54 x 2
# week weekly_mean
# <chr> <dbl>
# 1 2021-44 0.179
# 2 2021-45 0.0477
# 3 2021-46 0.0340
# 4 2021-47 0.356
# 5 2021-48 0.0544
# 6 2021-49 -0.0948
# 7 2021-50 -0.0419
# 8 2021-51 0.209
# 9 2021-52 0.251
# 10 2022-01 -0.197
# # ... with 44 more rows
There are different variants of "week", depending on your preference.
%V
Week of the year as decimal number (01–53) as defined in ISO 8601.
If the week (starting on Monday) containing 1 January has four or more
days in the new year, then it is considered week 1. Otherwise, it is
the last week of the previous year, and the next week is week 1.
(Accepted but ignored on input.)
%W
Week of the year as decimal number (00–53) using Monday as the first
day of week (and typically with the first Monday of the year as day 1
of week 1). The UK convention.
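One caveat at year boundaries: a %V week can belong to the previous or next calendar year, so strictly it should be paired with the ISO week-based year %G rather than %Y (support for these conversion specifications can vary by platform). A quick check:
# 2022-01-01 is a Saturday, so it falls in ISO week 52 of 2021
format(as.Date("2022-01-01"), "%Y-%V")  # "2022-52": calendar year with ISO week
format(as.Date("2022-01-01"), "%G-%V")  # "2021-52": consistent ISO week-based year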
You can extract year and week from the dates and group by both:
dta %>%
  mutate(year = year(when),
         week = week(when)) %>%
  group_by(year, week) %>%
  summarise(y_mean = mean(y)) %>%
  ungroup()
# # A tibble: 54 x 3
# year week y_mean
# <dbl> <dbl> <dbl>
# 1 2021 44 -0.222
# 2 2021 45 0.234
# 3 2021 46 0.0953
# 4 2021 47 0.206
# 5 2021 48 0.192
# 6 2021 49 -0.0831
# 7 2021 50 0.0282
# 8 2021 51 0.196
# 9 2021 52 0.132
# 10 2021 53 -0.279
# # ... with 44 more rows
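A third option, sketched here with lubridate's floor_date(): bucket each timestamp to the Monday that starts its week, which keeps the group key as an actual date and sidesteps the year-boundary problem entirely.
dta %>%
  group_by(week_start = floor_date(as.Date(when), unit = "week", week_start = 1)) %>%
  summarise(weekly_mean = mean(y)) %>%
  ungroup()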
I'm just going to chalk this up to my ignorance, but sometimes the pivot_* functions drive me crazy.
I have a tibble:
# A tibble: 12 x 3
year term estimate
<dbl> <chr> <dbl>
1 2018 intercept -29.8
2 2018 daysuntilelection 8.27
3 2019 intercept -50.6
4 2019 daysuntilelection 7.40
5 2020 intercept -31.6
6 2020 daysuntilelection 6.55
7 2021 intercept -19.0
8 2021 daysuntilelection 4.60
9 2022 intercept -10.7
10 2022 daysuntilelection 6.41
11 2023 intercept 120
12 2023 daysuntilelection 0
that I would like to flip to:
# A tibble: 6 x 3
year intercept daysuntilelection
<dbl> <dbl> <dbl>
1 2018 -29.8 8.27
2 2019 -50.6 7.40
3 2020 -31.6 6.55
4 2021 -19.0 4.60
5 2022 -10.7 6.41
6 2023 120 0
Normally pivot_wider should be able to do this as x %>% pivot_wider(!year, names_from = "term", values_from = "estimate"), but instead it returns a two-column tibble of list-cols and a bunch of warnings.
# A tibble: 1 x 2
intercept daysuntilelection
<list> <list>
1 <dbl [6]> <dbl [6]>
Warning message:
Values from `estimate` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = {summary_fun}` to summarise duplicates.
* Use the following dplyr code to identify duplicates.
{data} %>%
dplyr::group_by(term) %>%
dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
dplyr::filter(n > 1L)
Where do I go wrong here? Help!
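For what it's worth, the likely culprit is that the unnamed !year is being matched to pivot_wider's first argument, id_cols, which then deselects year and leaves no columns to uniquely identify the rows, hence the list-cols. A sketch of the fix: let id_cols take its default (every column not used in names_from/values_from, i.e. year):
library(tidyr)
x %>% pivot_wider(names_from = term, values_from = estimate)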
Next to the solutions offered in the comments, data.table's dcast is a very fast way to pivot your data. If the pivot_ functions drive you crazy, maybe this is a nice alternative for you:
x <- read.table(text = "
1 2018 intercept -29.8
2 2018 daysuntilelection 8.27
3 2019 intercept -50.6
4 2019 daysuntilelection 7.40
5 2020 intercept -31.6
6 2020 daysuntilelection 6.55
7 2021 intercept -19.0
8 2021 daysuntilelection 4.60
9 2022 intercept -10.7
10 2022 daysuntilelection 6.41
11 2023 intercept 120
12 2023 daysuntilelection 0")
names(x) <- c("id", "year", "term", "estimate")
library(data.table)
dcast(as.data.table(x), year ~ term)
#> Using 'estimate' as value column. Use 'value.var' to override
#> year daysuntilelection intercept
#> 1: 2018 8.27 -29.8
#> 2: 2019 7.40 -50.6
#> 3: 2020 6.55 -31.6
#> 4: 2021 4.60 -19.0
#> 5: 2022 6.41 -10.7
#> 6: 2023 0.00 120.0
DATA
df <- read.table(text = "
1 2018 intercept -29.8
2 2018 daysuntilelection 8.27
3 2019 intercept -50.6
4 2019 daysuntilelection 7.40
5 2020 intercept -31.6
6 2020 daysuntilelection 6.55
7 2021 intercept -19.0
8 2021 daysuntilelection 4.60
9 2022 intercept -10.7
10 2022 daysuntilelection 6.41
11 2023 intercept 120
12 2023 daysuntilelection 0")
CODE
library(tidyverse)
df %>%
  pivot_wider(names_from = V3, values_from = V4, values_fill = 0) %>%
  group_by(V2) %>%
  summarise_all(sum, na.rm = TRUE)
OUTPUT
V2 V1 intercept daysuntilelection
<int> <int> <dbl> <dbl>
1 2018 3 -29.8 8.27
2 2019 7 -50.6 7.4
3 2020 11 -31.6 6.55
4 2021 15 -19 4.6
5 2022 19 -10.7 6.41
6 2023 23 120 0
I'm encountering an issue attempting to extract the 90/95% confidence intervals from forecast models built on a key variable holding 5 groups, across a total of 4 forecasting models.
The primary problem is that I'm not familiar with how R treats and works with dist and hilo object types.
The original tsibble contains 60 months for each of the 5 groups (300 observations):
>groups
# A tsibble: 300 x 3 [1M]
# Key: Group [5]
Month Group Measure
<mth> <chr> <dbl>
1 2016 May Group1 8.75
2 2016 Jun Group1 8.5
3 2016 Jul Group1 7
4 2016 Aug Group1 10
5 2016 Sep Group1 2
6 2016 Oct Group1 6
7 2016 Nov Group1 8
8 2016 Dec Group1 0
9 2017 Jan Group1 16
10 2017 Feb Group1 9
... with 290 more rows
I form a model with different forecast methods, as well as a combination model:
groups %>%
  model(ets = ETS(Measure),
        mean = MEAN(Measure),
        snaive = SNAIVE(Measure)) %>%
  mutate(combination = (ets + mean + snaive) / 3) -> groups_avg
This results in a mable with the following structure:
>groups_avg
# A mable: 5 x 5
# Key: Group [5]
Group ets mean snaive combination
<chr> <model> <mode> <model> <model>
1 Group1 <ETS(A,N,N)> <MEAN> <SNAIVE> <COMBINATION>
2 Group2 <ETS(A,N,N)> <MEAN> <SNAIVE> <COMBINATION>
3 Group3 <ETS(M,N,N)> <MEAN> <SNAIVE> <COMBINATION>
4 Group4 <ETS(A,N,N)> <MEAN> <SNAIVE> <COMBINATION>
5 Group5 <ETS(A,N,N)> <MEAN> <SNAIVE> <COMBINATION>
I then forecast this out 6 months:
groups_avg %>% forecast(h = 6, level = c(90, 95)) -> groups_fc
Then I generate what I think the output tsibble should be:
> groups_fc %>% hilo(level = c(90, 95)) -> groups_hilo
> groups_hilo
# A tsibble: 120 x 7 [1M]
# Key: Group, .model [20]
Group .model Month Measure .mean `90%` `95%`
<chr> <chr> <mth> <dist> <dbl> <hilo> <hilo>
1 Group1 ets 2021 May N(12, 21) 11.6 [4.1332418, 19.04858]90 [ 2.704550, 20.47727]95
2 Group1 ets 2021 Jun N(12, 21) 11.6 [4.0438878, 19.13793]90 [ 2.598079, 20.58374]95
3 Group1 ets 2021 Jul N(12, 22) 11.6 [3.9555794, 19.22624]90 [ 2.492853, 20.68897]95
4 Group1 ets 2021 Aug N(12, 22) 11.6 [3.8682807, 19.31354]90 [ 2.388830, 20.79299]95
5 Group1 ets 2021 Sep N(12, 23) 11.6 [3.7819580, 19.39986]90 [ 2.285970, 20.89585]95
6 Group1 ets 2021 Oct N(12, 23) 11.6 [3.6965790, 19.48524]90 [ 2.184235, 20.99758]95
7 Group1 mean 2021 May N(8, 21) 7.97 [0.3744124, 15.56725]90 [-1.080860, 17.02253]95
8 Group1 mean 2021 Jun N(8, 21) 7.97 [0.3744124, 15.56725]90 [-1.080860, 17.02253]95
9 Group1 mean 2021 Jul N(8, 21) 7.97 [0.3744124, 15.56725]90 [-1.080860, 17.02253]95
10 Group1 mean 2021 Aug N(8, 21) 7.97 [0.3744124, 15.56725]90 [-1.080860, 17.02253]95
# ... with 110 more rows
As I've done with more simply structured forecasts, I tried to write these forecast results to a csv.
> write.csv(groups_hilo, dir)
Error: Can't convert <hilo> to <character>.
Run `rlang::last_error()` to see where the error occurred.
But I am quite lost on how to coerce the generated 90/95% confidence intervals into a format that I can export. Has anyone encountered this issue?
Please let me know if I should include any more information!
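One way out, as a sketch: hilo() columns can be unpacked into plain numeric lower/upper bounds with fabletools' unpack_hilo(), after which the result is an ordinary tibble that write.csv() handles without complaint (dir here is the asker's output path):
groups_fc %>%
  hilo(level = c(90, 95)) %>%
  unpack_hilo(c(`90%`, `95%`)) %>%  # splits each <hilo> into numeric _lower/_upper columns
  as_tibble() %>%
  write.csv(dir, row.names = FALSE)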
Issue:
Using fable I can easily produce forecasts on a time series with a grouped structure, and I can even use fable's aggregate_key/reconcile syntax to produce a coherent top-level forecast. However, I'm unable to easily access the aggregate forecasts using this method, and the alternative I'm using involves abandoning the fable (forecast table) structure. Can anyone tell me if there's an easier or intended way to do this using the package? As you can see in the examples, I'm able to get there using other methods, but I'd like to know if there's a better way. Any help gratefully received!
Approach 1:
My efforts to summarise the forecast without using aggregate_key/reconcile have mainly used dplyr's group_by and summarise; however, the prediction interval for the forecast is formatted as a normal distribution object, which doesn't seem to support summing using this method. To get around this I've been using hilo and unpack_hilo to extract bounds for different prediction intervals, which can then be summed in the usual way. However, I'd really like to retain the fable structure and the distribution objects, which is impossible using this method.
Approach 2:
The alternative, using aggregate_key/reconcile, only seems to support aggregation using min_trace. I understand that this method is for optimal reconciliation, whereas what I want is a simple bottom-up aggregate forecast. It feels like there should be an easy way to get bottom-up forecasts using this syntax, but I haven't found one so far. Moreover, even using min_trace I'm unsure how to access the aggregate forecast itself, as you can see in the example!
Example using approach 1:
library(fable)
#> Loading required package: fabletools
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
lung_deaths_agg <- as_tsibble(cbind(mdeaths, fdeaths))
fc_1 <- lung_deaths_agg %>%
  model(lm = TSLM(value ~ trend() + season())) %>%
  forecast()
fc_1
#> # A fable: 48 x 5 [1M]
#> # Key: key, .model [2]
#> key .model index value .mean
#> <chr> <chr> <mth> <dist> <dbl>
#> 1 fdeaths lm 1980 Jan N(794, 5940) 794.
#> 2 fdeaths lm 1980 Feb N(778, 5940) 778.
#> 3 fdeaths lm 1980 Mar N(737, 5940) 737.
#> 4 fdeaths lm 1980 Apr N(577, 5940) 577.
#> 5 fdeaths lm 1980 May N(456, 5940) 456.
#> 6 fdeaths lm 1980 Jun N(386, 5940) 386.
#> 7 fdeaths lm 1980 Jul N(379, 5940) 379.
#> 8 fdeaths lm 1980 Aug N(335, 5940) 335.
#> 9 fdeaths lm 1980 Sep N(340, 5940) 340.
#> 10 fdeaths lm 1980 Oct N(413, 5940) 413.
#> # ... with 38 more rows
fc_1 %>%
  hilo() %>%
  unpack_hilo(c(`80%`, `95%`)) %>%
  as_tibble() %>%
  group_by(index) %>%
  summarise(across(c(.mean, ends_with("upper"), ends_with("lower")), sum))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 24 x 6
#> index .mean `80%_upper` `95%_upper` `80%_lower` `95%_lower`
#> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1980 Jan 2751. 3089. 3267. 2414. 2236.
#> 2 1980 Feb 2687. 3024. 3202. 2350. 2171.
#> 3 1980 Mar 2535. 2872. 3051. 2198. 2020.
#> 4 1980 Apr 2062. 2399. 2577. 1725. 1546.
#> 5 1980 May 1597. 1934. 2113. 1260. 1082.
#> 6 1980 Jun 1401. 1738. 1916. 1064. 885.
#> 7 1980 Jul 1343. 1680. 1858. 1006. 827.
#> 8 1980 Aug 1200. 1538. 1716. 863. 685.
#> 9 1980 Sep 1189. 1527. 1705. 852. 674.
#> 10 1980 Oct 1482. 1819. 1998. 1145. 967.
#> # ... with 14 more rows
Example using approach 2:
fc_2 <- lung_deaths_agg %>%
  aggregate_key(key, value = sum(value)) %>%
  model(lm = TSLM(value ~ trend() + season())) %>%
  reconcile(lm = min_trace(lm)) %>%
  forecast()
fc_2
#> # A fable: 72 x 5 [1M]
#> # Key: key, .model [3]
#> key .model index value .mean
#> <chr> <chr> <mth> <dist> <dbl>
#> 1 fdeaths lm 1980 Jan N(794, 5606) 794.
#> 2 fdeaths lm 1980 Feb N(778, 5606) 778.
#> 3 fdeaths lm 1980 Mar N(737, 5606) 737.
#> 4 fdeaths lm 1980 Apr N(577, 5606) 577.
#> 5 fdeaths lm 1980 May N(456, 5606) 456.
#> 6 fdeaths lm 1980 Jun N(386, 5606) 386.
#> 7 fdeaths lm 1980 Jul N(379, 5606) 379.
#> 8 fdeaths lm 1980 Aug N(335, 5606) 335.
#> 9 fdeaths lm 1980 Sep N(340, 5606) 340.
#> 10 fdeaths lm 1980 Oct N(413, 5606) 413.
#> # ... with 62 more rows
fc_2 %>% as_tibble() %>% select(key) %>% slice(50:55)
#> # A tibble: 6 x 1
#> key
#> <chr>
#> 1 <aggregated>
#> 2 <aggregated>
#> 3 <aggregated>
#> 4 <aggregated>
#> 5 <aggregated>
#> 6 <aggregated>
fc_2 %>% as_tibble() %>% select(key) %>% filter(key == "<aggregated>")
#> # A tibble: 0 x 1
#> # ... with 1 variable: key <chr>
Approach 1:
Working with distributions requires more care than working with plain numbers when adding things together. More specifically, the means of Normal distributions can be added without issue:
library(distributional)
mean(dist_normal(2,3) + dist_normal(4,1))
#> [1] 6
mean(dist_normal(2,3)) + mean(dist_normal(4,1))
#> [1] 6
Created on 2020-07-03 by the reprex package (v0.3.0)
However the quantiles (used to produce your 80% and 95% intervals) cannot be added this way: the standard deviation of a sum of independent Normals is sqrt(σ1² + σ2²), not σ1 + σ2, so summing quantiles overstates the spread:
library(distributional)
quantile(dist_normal(2,3) + dist_normal(4,1), 0.9)
#> [1] 10.05262
quantile(dist_normal(2,3), 0.9) + quantile(dist_normal(4,1), 0.9)
#> [1] 11.12621
Created on 2020-07-03 by the reprex package (v0.3.0)
If you want to aggregate distributions, you'll need to compute the sum on the distribution itself:
library(fable)
library(dplyr)
lung_deaths_agg <- as_tsibble(cbind(mdeaths, fdeaths))
fc_1 <- lung_deaths_agg %>%
  model(lm = fable::TSLM(value ~ trend() + season())) %>%
  forecast()
fc_1 %>%
  summarise(value = sum(value), .mean = mean(value))
#> # A fable: 24 x 3 [1M]
#> index value .mean
#> <mth> <dist> <dbl>
#> 1 1980 Jan N(2751, 40520) 2751.
#> 2 1980 Feb N(2687, 40520) 2687.
#> 3 1980 Mar N(2535, 40520) 2535.
#> 4 1980 Apr N(2062, 40520) 2062.
#> 5 1980 May N(1597, 40520) 1597.
#> 6 1980 Jun N(1401, 40520) 1401.
#> 7 1980 Jul N(1343, 40520) 1343.
#> 8 1980 Aug N(1200, 40520) 1200.
#> 9 1980 Sep N(1189, 40520) 1189.
#> 10 1980 Oct N(1482, 40520) 1482.
#> # … with 14 more rows
Created on 2020-07-03 by the reprex package (v0.3.0)
Note that this will require the development versions of fabletools (>=0.2.0.9000) and distributional (>=0.1.0.9000) as I have added new features to make this example work.
Approach 2:
Experimental support for bottom up reconciliation is available using fabletools:::bottom_up(). This is currently an internal function as I'm still working on some details of how reconciliation can be done more generally in fabletools.
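A sketch of how it might slot into the same pipeline (usage assumed to mirror min_trace; being internal, the interface may change, and later fabletools releases export it as bottom_up()):
fc_bu <- lung_deaths_agg %>%
  aggregate_key(key, value = sum(value)) %>%
  model(lm = TSLM(value ~ trend() + season())) %>%
  reconcile(bu = fabletools:::bottom_up(lm)) %>%  # bottom-up instead of min_trace
  forecast()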
Matching aggregated values should be done with is_aggregated().
fc_2 <- lung_deaths_agg %>%
  aggregate_key(key, value = sum(value)) %>%
  model(lm = TSLM(value ~ trend() + season())) %>%
  reconcile(lm = min_trace(lm)) %>%
  forecast()
fc_2 %>%
  filter(is_aggregated(key))
#> # A fable: 24 x 5 [1M]
#> # Key: key, .model [1]
#> key .model index value .mean
#> <chr> <chr> <mth> <dist> <dbl>
#> 1 <aggregated> lm 1980 Jan N(2751, 24989) 2751.
#> 2 <aggregated> lm 1980 Feb N(2687, 24989) 2687.
#> 3 <aggregated> lm 1980 Mar N(2535, 24989) 2535.
#> 4 <aggregated> lm 1980 Apr N(2062, 24989) 2062.
#> 5 <aggregated> lm 1980 May N(1597, 24989) 1597.
#> 6 <aggregated> lm 1980 Jun N(1401, 24989) 1401.
#> 7 <aggregated> lm 1980 Jul N(1343, 24989) 1343.
#> 8 <aggregated> lm 1980 Aug N(1200, 24989) 1200.
#> 9 <aggregated> lm 1980 Sep N(1189, 24989) 1189.
#> 10 <aggregated> lm 1980 Oct N(1482, 24989) 1482.
#> # … with 14 more rows
Created on 2020-07-03 by the reprex package (v0.3.0)
Comparing an aggregated vector with "<aggregated>" is ambiguous, as your key's character value may be "<aggregated>" without the value being <aggregated>. I've now updated fabletools to match "<aggregated>" with aggregated values with a warning and hint, so this code now gives:
fc_2 %>%
  filter(key == "<aggregated>")
#> Warning: <aggregated> character values have been converted to aggregated values.
#> Hint: If you're trying to compare aggregated values, use `is_aggregated()`.
#> # A fable: 24 x 5 [1M]
#> # Key: key, .model [1]
#> key .model index value .mean
#> <chr> <chr> <mth> <dist> <dbl>
#> 1 <aggregated> lm 1980 Jan N(2751, 24989) 2751.
#> 2 <aggregated> lm 1980 Feb N(2687, 24989) 2687.
#> 3 <aggregated> lm 1980 Mar N(2535, 24989) 2535.
#> 4 <aggregated> lm 1980 Apr N(2062, 24989) 2062.
#> 5 <aggregated> lm 1980 May N(1597, 24989) 1597.
#> 6 <aggregated> lm 1980 Jun N(1401, 24989) 1401.
#> 7 <aggregated> lm 1980 Jul N(1343, 24989) 1343.
#> 8 <aggregated> lm 1980 Aug N(1200, 24989) 1200.
#> 9 <aggregated> lm 1980 Sep N(1189, 24989) 1189.
#> 10 <aggregated> lm 1980 Oct N(1482, 24989) 1482.
#> # … with 14 more rows
Created on 2020-07-03 by the reprex package (v0.3.0)
I would like to generate forecasts using auto.arima; however, I don't see future dates populated. How can I get future forecasts with dates? I have weekly data and want to generate forecasts up to Dec 2020.
I am using the forecast package in R.
fit <- auto.arima(zoo_ts)
fcast <- forecast(fit, h = 83)
I need weekly forecasts from July 2019, with dates at weekly intervals. I am not providing any data; it would be great if anyone could share how to do this.
The forecast package uses ts objects, which are not great for weekly data. The time index is stored numerically in terms of years. So 2019.5385 means week 28 of 2019 (as 28/52 = 0.5385).
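A quick illustration of the awkwardness:
# The time index of a weekly ts is a fraction of a year, not a date
x <- ts(rnorm(5), start = c(2019, 1), frequency = 52)
as.numeric(time(x))
#> [1] 2019.000 2019.019 2019.038 2019.058 2019.077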
An alternative is to use the fable and tsibble packages. Here is an example using weekly data.
library(tsibble)
library(fable)
library(fpp3) # For the data
# Fit the model
fit <- us_gasoline %>% model(arima = ARIMA(Barrels))
# Produce forecasts
fcast <- forecast(fit, h = 83)
fcast
#> # A fable: 83 x 4 [1W]
#> # Key: .model [1]
#> .model Week Barrels .distribution
#> <chr> <week> <dbl> <dist>
#> 1 arima 2017 W04 8.30 N(8.3, 0.072)
#> 2 arima 2017 W05 8.44 N(8.4, 0.077)
#> 3 arima 2017 W06 8.53 N(8.5, 0.082)
#> 4 arima 2017 W07 8.59 N(8.6, 0.086)
#> 5 arima 2017 W08 8.48 N(8.5, 0.091)
#> 6 arima 2017 W09 8.49 N(8.5, 0.096)
#> 7 arima 2017 W10 8.61 N(8.6, 0.101)
#> 8 arima 2017 W11 8.52 N(8.5, 0.106)
#> 9 arima 2017 W12 8.58 N(8.6, 0.111)
#> 10 arima 2017 W13 8.47 N(8.5, 0.115)
#> # … with 73 more rows
The time index is stored in weeks here. This can be converted to a date using as.Date:
# Convert weekly index to a date
fcast %>% mutate(date = as.Date(Week))
#> # A fable: 83 x 5 [1W]
#> # Key: .model [1]
#> .model Week Barrels .distribution date
#> <chr> <week> <dbl> <dist> <date>
#> 1 arima 2017 W04 8.30 N(8.3, 0.072) 2017-01-23
#> 2 arima 2017 W05 8.44 N(8.4, 0.077) 2017-01-30
#> 3 arima 2017 W06 8.53 N(8.5, 0.082) 2017-02-06
#> 4 arima 2017 W07 8.59 N(8.6, 0.086) 2017-02-13
#> 5 arima 2017 W08 8.48 N(8.5, 0.091) 2017-02-20
#> 6 arima 2017 W09 8.49 N(8.5, 0.096) 2017-02-27
#> 7 arima 2017 W10 8.61 N(8.6, 0.101) 2017-03-06
#> 8 arima 2017 W11 8.52 N(8.5, 0.106) 2017-03-13
#> 9 arima 2017 W12 8.58 N(8.6, 0.111) 2017-03-20
#> 10 arima 2017 W13 8.47 N(8.5, 0.115) 2017-03-27
#> # … with 73 more rows
Created on 2019-10-16 by the reprex package (v0.3.0)