hts accuracy of forecast and method of splitting training and testing - r

I have found two ways of training and testing hts, and I am not sure which one is appropriate:
1. In this question (link given),
MASE Extraction Hierarchical Data ('hts' and 'forecast' packages R),
you divided the time series into train and test sets and then applied hts.
2. In this question (link provided),
Hierarchical Time Series,
you applied hts to the entire time series and then split it into train and test sets.
My question is: which of these is correct? Please help.
Also, I would like to know how I can check the accuracy of the forecasts I make. That is, how far into the future can the forecast be considered statistically appropriate for the hts model I choose? Any pointers to articles or an example would be helpful. I would like to determine whether I should forecast 4, 8, or 12 weeks ahead.
Thanks in advance
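For illustration, here is a minimal sketch of the split-then-forecast workflow. It assumes the hts package and its bundled annual example hierarchy htseg2 (1992 to 2007), not the asker's weekly data, and the 5-step horizon is only chosen to match that example. Note that hts() itself only builds the hierarchy and does not fit anything, so building the object on the full series and then calling window() on it should amount to the same split, as long as the forecasts are produced from the training window only.

```r
library(hts)       # assumed to be installed; attaches forecast as well
library(forecast)

# Hold out the last 5 years of the example hierarchy as a test set
train <- window(htseg2, end = 2002)
test  <- window(htseg2, start = 2003)

# Forecast from the training window only, then reconcile (optimal combination)
fc <- forecast(train, h = 5, method = "comb")

# Accuracy measures for every series, averaged over the test window
acc <- accuracy(fc, test)

# Absolute error of the top-level series at each horizon 1..5
top_err <- abs(aggts(test, levels = 0) - aggts(fc, levels = 0))
print(top_err)
```

With weekly data one could set h = 12 and compare the errors at steps 4, 8 and 12 to see how quickly accuracy degrades. A single split gives only one error per horizon, so repeating this over several forecast origins would give a more reliable picture.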

Related

When predicting using R ARIMA object, how to declare the time series' history?

Suppose I fit an AR(p) model using R's arima function from the stats package, on a sample x_1,...,x_n. In theory, predicting x_{n+1} with this model requires access to x_n,...,x_{n-p+1}.
How does the model know which observation I want to predict? What if I actually wanted to predict x_n based on x_{n-1},...,x_{n-p}, and how would my code differ in that case? Can I make in-sample forecasts, similar to Python's functionality?
If my questions imply that I think about forecasting in a wrong way, please kindly correct my understanding of the subject.
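A small sketch of how this can look with stats::arima, using simulated AR(2) data as a stand-in (the series, seed and coefficients are made up for illustration). The fitted object keeps the series it was estimated on, so predict() always starts right after the last observation. To forecast x_n instead, one option is to hand arima() the series truncated at n-1 and fix the coefficients to those already estimated.

```r
set.seed(1)
n <- 200
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = n)  # simulated example data

# The fit stores the series, so forecasts start right after x_n
fit <- arima(x, order = c(2, 0, 0))
pred_next <- predict(fit, n.ahead = 1)$pred   # forecast of x_{n+1}

# In-sample one-step-ahead predictions: observed values minus residuals
# (the first few values are only approximate because of filter start-up)
fitted_vals <- x - residuals(fit)

# Forecast x_n from x_1..x_{n-1}, reusing the coefficients estimated above
fit_trunc <- arima(x[1:(n - 1)], order = c(2, 0, 0),
                   fixed = coef(fit), transform.pars = FALSE, method = "ML")
pred_n <- predict(fit_trunc, n.ahead = 1)$pred
```

Here pred_n should agree with the last element of fitted_vals. Because the coefficients were estimated with x_n included, this is an in-sample prediction; for a genuine out-of-sample forecast of x_n, drop the fixed argument so the model is re-estimated on the truncated series.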

R - Forecast multiple time-series (15K Products)

Hi Stack Overflow community.
I have 5 years of weekly price data for more than 15K products (5 * 52 * 15K records). Each product is a univariate time series. The objective is to forecast the price of each product.
I am familiar with univariate time series analysis, in which we can visualize each series, plot its ACF and PACF, and forecast it. But this manual approach is not feasible with 15K different time series: I cannot visualize each series and its ACF and PACF, forecast each product separately, and tweak each model by hand.
I am looking for some recommendations and directions to solve this multi-series forecasting problem using R (preferable). Any help and support will be appreciated.
Thanks in advance.
I would suggest you use auto.arima from the forecast package.
This way you don't have to search for the right ARIMA model.
auto.arima: Returns best ARIMA model according to either AIC, AICc or BIC value. The function conducts a search over possible models within the order constraints provided.
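library(forecast)
fit <- auto.arima(WWWusage)    # selects the ARIMA orders automatically
plot(forecast(fit, h = 20))    # forecast 20 steps ahead and plot the result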
Instead of WWWusage you could put one of your time series, to fit an ARIMA model.
With forecast you then perform the forecast - in this case 20 time steps ahead (h=20).
auto.arima basically chooses the ARIMA parameters for you (by default according to AICc, the corrected Akaike information criterion).
You would have to try whether it is too computationally expensive for you. But in general it is not that uncommon to forecast that many time series.
Another thing to keep in mind is that there may well be some cross-correlation between the time series. So from a forecasting precision standpoint, it could make sense not to treat this as a purely univariate forecasting problem.
This setting sounds quite similar to the M5 forecasting competition that was recently held on Kaggle. The goal was to produce point forecasts of the unit sales of various products sold in the USA by Walmart.
So there were a lot of sales time series to forecast. In this case the winner did not do a univariate forecast. Here is a link to a description of the winning solution. Since the setting seems so similar to yours, it probably makes sense to read a bit in the Kaggle forum of this challenge; there might even be useful notebooks (code examples) available.
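As a rough sketch of the auto.arima suggestion applied to many series, the loop below fits one model per column. The three simulated random-walk "products", the column names and the helper forecast_one are hypothetical stand-ins for the real 15K series, and seasonal = FALSE is an assumption made here only to keep the fits fast.

```r
library(forecast)

set.seed(42)
# Hypothetical stand-in data: 3 products, 5 years of weekly prices each
prices <- replicate(3, 100 + cumsum(rnorm(260)))
colnames(prices) <- paste0("product_", 1:3)

# Fit one ARIMA model to a single series and return its point forecasts
forecast_one <- function(y, h = 12) {
  fit <- auto.arima(ts(y, frequency = 52), seasonal = FALSE)
  as.numeric(forecast(fit, h = h)$mean)
}

# One column of forecasts per product (rows are horizons 1..12)
fc <- sapply(as.data.frame(prices), forecast_one)
dim(fc)
```

Since each series is fitted independently, the loop could be spread over several cores, for example with parallel::mclapply on Unix-like systems, if the run time on the full data turns out to be a problem.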

Multivariate ARIMA (MARIMA) modelling in R

I am currently using the marima package for R, written by Henrik Spliid, to forecast multivariate time series with ARIMA.
Overview can be found here:
https://cran.r-project.org/web/packages/marima/marima.pdf
http://orbit.dtu.dk/files/123996117/marima.anv.talk.pdf
When using the Marima function, it is required to define both the order of AR(p) and MA(q) first.
My question is, how can I determine appropriate values for p and q?
I know when it comes to univariate ARIMA analysis, that auto.arima gives a good suggestion for p and q. However, when I use auto.arima for every single univariate time series I want to analyze, there are (slightly) different suggestions for each time series. (For example (2,2,1) for the first, (1,1,1) for the second and so on)
Since I want to analyze all of the time series together in one multivariate ARIMA model, and I can only choose a single value each for p and q (if I understood it correctly), I wonder how I can choose those values in the most reliable way.
Could I just try to run the model a couple times and see what values for p and q work best (e.g. by testing the residuals of the forecast)?
What are your suggestions?
I would appreciate any help!

R Forecast package for anomaly detection

I'm trying to detect anomalies from training data.
First, I train a model according to a given time series using the forecast package:
train <- forecast(ts(sin((2*pi)*seq(from=0,to=10,by=0.01)),frequency=100))
Then, once I get new time series, I try to see how well they fit the trained data, and use that to find anomalies.
Currently I'm using the accuracy function, which doesn't seem to be the right tool for the job:
test <- ts(sin((2*pi)*seq(from=0,to=20,by=0.01))+sin((3*pi)*seq(from=0,to=20,by=0.01)),frequency=100)
accuracy(train,test)
I also thought of somehow analyzing the residuals of the new dataset according to the trained model.
Does anyone have any good ideas as to how to optimize this task?
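One way the residual idea could be sketched, under several assumptions: an ARIMA model chosen by auto.arima stands in for the model in the question, the sine data get some added noise so the residual spread is meaningful, the series is treated as non-seasonal for simplicity, and the 3-sigma threshold and the injected spike at index 500 are arbitrary illustrative choices. Arima(new, model = fit) applies the trained model to new data without re-estimating it.

```r
library(forecast)

set.seed(1)
grid <- seq(from = 0, to = 10, by = 0.01)

# Training series: noisy sine wave (noise level is an assumption)
train_y <- ts(sin(2 * pi * grid) + rnorm(length(grid), sd = 0.05))
fit <- auto.arima(train_y)

# New series from the same process, with one injected anomaly
new_y <- ts(sin(2 * pi * grid) + rnorm(length(grid), sd = 0.05))
new_y[500] <- new_y[500] + 2

# Apply the trained model to the new data without re-estimating parameters
refit <- Arima(new_y, model = fit)
res <- residuals(refit)

# Flag points whose one-step-ahead error is large relative to training errors
threshold <- 3 * sd(residuals(fit))
anomalies <- which(abs(res) > threshold)
```

The injected point should show up in anomalies, possibly together with a few neighbours (a spike also disturbs the next predictions) and occasional false alarms, so the threshold would need tuning on real data.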

double or triple seasonality ARIMA modelling in R

I'm trying to find out whether there is any package in R that handles multiple seasonality for ARIMA models and, if not, whether there is any way of working around it.
I have an hourly series and would like to test seasonality for lags=c(24,7*24,30*24)
Many thanks in advance.
You can use a TBATS model; here's an example:
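A minimal sketch with the forecast package, assuming simulated hourly data with daily and weekly cycles (four weeks of made-up observations). The monthly period of 30*24 from the question is left out only because the simulated series is too short for it, and the Box-Cox, trend and ARMA-error options are switched off purely to keep the fit quick.

```r
library(forecast)

set.seed(1)
hours <- 1:(24 * 7 * 4)   # four weeks of hourly observations (simulated)
y <- 10 + 3 * sin(2 * pi * hours / 24) +
  2 * sin(2 * pi * hours / (24 * 7)) +
  rnorm(length(hours), sd = 0.5)

# msts attaches several seasonal periods to one series
y_msts <- msts(y, seasonal.periods = c(24, 24 * 7))

# TBATS models each seasonal period with trigonometric terms
fit <- tbats(y_msts, use.box.cox = FALSE, use.trend = FALSE,
             use.arma.errors = FALSE, use.parallel = FALSE)

fc <- forecast(fit, h = 48)   # forecast the next two days
plot(fc)
```

With a longer series, a third period could be added as seasonal.periods = c(24, 7*24, 30*24). TBATS is not an ARIMA model in the strict sense, although it can include ARMA errors when use.arma.errors is left at its default.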
