Apply Ensemble for timeseries forecasting - r

I'm using several time series models, such as ARIMA, Holt-Winters, and Prophet. Now I want to build an ensemble of all of them and produce combined forecasts. What is the best way to apply ensembling to time series? Please help; I'm new to this.

There is a recent package called tsensembler (full disclosure: I am the author).
link for documentation with useful examples
link for github
It essentially trains a set of regression models to predict the next value(s) of the time series, and combines them automatically using a metalearning approach. The scientific basis was presented at the ECML-PKDD 2017 conference, where it won the best student machine learning paper award.

I suggest exploring the opera package.
install.packages("opera")
Here is the vignette:
https://cran.r-project.org/web/packages/opera/vignettes/opera-vignette.html

The following link provides sample code and walks through using two ensemble packages in R. The packages are 'opera' and 'forecastHybrid'.
Opera & forecastHybrid
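If you want to see the basic idea before picking a package, here is a minimal sketch of the simplest ensemble, an equal-weight average of two forecasts. It uses only base R (the stats package) and the built-in AirPassengers series. The model orders, the hold-out year, and the equal weights are my own assumptions for illustration, not something taken from opera, forecastHybrid, or tsensembler.

```r
# Equal-weight forecast combination using only base R (stats package).
y     <- AirPassengers
train <- window(y, end = c(1959, 12))    # fit on data through 1959
test  <- window(y, start = c(1960, 1))   # hold out the final year
h     <- length(test)

# Two base models (illustrative choices): a seasonal ARIMA on the log scale
# and multiplicative Holt-Winters on the original scale.
fit_arima <- arima(log(train), order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
fit_hw    <- HoltWinters(train, seasonal = "multiplicative")

# Forecast h steps ahead; back-transform the ARIMA forecast from logs.
fc_arima <- exp(as.numeric(predict(fit_arima, n.ahead = h)$pred))
fc_hw    <- as.numeric(predict(fit_hw, n.ahead = h))

# The ensemble is the plain average of the two forecasts.
fc_ens <- (fc_arima + fc_hw) / 2

# Compare mean absolute error on the hold-out year.
mae    <- function(fc) mean(abs(as.numeric(test) - fc))
errors <- c(arima = mae(fc_arima), hw = mae(fc_hw), ensemble = mae(fc_ens))
print(round(errors, 2))
```

The averaged forecast can never have a larger mean absolute error than the worse of the two models. Packages such as opera go further and learn the weights from past errors instead of fixing them at 1/2.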


Is it possible to visualize an individual tree from a random forest obtained via tidymodels?

Good day,
For presentation purposes, I would like to plot a couple of decision trees from a random forest (with about 100 trees). I found a post from last year that makes it clear this is not really possible, or at least that there is no function for it in tidymodels: R: Tidymodels: Is it possible to plot the trees for a random forest model in tidy models?
I'm wondering if somebody has found a way! I remember I could easily do this using the caret package, but tidymodels makes everything so convenient that I was hoping someone had a solution.
Many thanks!
Summarizing which trees can be plotted with tidymodels, based on the comments and other Stack Overflow posts:
Decision trees. There are some options, but the function rpart.plot() seems to be the most popular.
Individual tree from a random forest. It doesn't seem to be possible to plot one (yet) within the tidymodels environment. See this post: here
XGBoost models. See Julia's comment:
You should be able to use a function like xgb.plot.tree() with a
trained tidymodels workflow or parsnip model by extracting out the
underlying object created with the xgboost engine. You can do this
with extract_fit_engine()
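For the decision tree case, a minimal sketch of the extract-then-plot pattern might look like this. It assumes the parsnip, rpart, and rpart.plot packages are installed and uses the built-in mtcars data purely as a placeholder. The roundint = FALSE argument is my own addition to avoid a warning that rpart.plot can emit when the model was fit through parsnip.

```r
library(parsnip)      # model specification and fitting (part of tidymodels)
library(rpart.plot)   # plotting for rpart trees; also attaches rpart

# Fit a single regression tree through parsnip with the rpart engine.
tree_fit <- decision_tree(mode = "regression") |>
  set_engine("rpart") |>
  fit(mpg ~ ., data = mtcars)

# Pull out the underlying rpart object created by the engine.
engine_fit <- extract_fit_engine(tree_fit)

# Plot it with the engine package's own plotting function.
rpart.plot(engine_fit, roundint = FALSE)
```

The same extraction step is what the quoted comment describes for XGBoost: extract the engine object first, then hand it to the engine package's own plotting function.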

Stanford dataset to CoreML

I have a dataset, downloaded from link.
I know about coremltools (created by Apple).
The question is:
Is it possible to convert the Stanford dataset to CoreML?
If yes, can somebody give me instructions?
Thanks in advance!
This question is asked so often, that finally I've decided to draw a diagram.
Explanation:
Dataset is a "fuel" that you put into your model to make it work.
Model is a machine learning algorithm: neural network, decision tree etc.
Supported ML frameworks and models are listed here together with the instructions for conversion.
You can make your own .mlmodel file from your own dataset with a Python script and a Python library called coremltools. You can train your model using sklearn, Keras, etc., and choose what it uses for training, such as SVM, kNN, regression, and so on. Then you save it as a .mlmodel file and drop that into your project. This video is helpful:
https://youtu.be/T4t73CXB7CU

Cointelation (hybrid method between COINTegration and corrELATION technique) on two series using R

Cointelation is a hybrid method between COINTegration and corrELATION techniques.
Ref:
http://wilmott.com/pdfs/MVMC_CSL_05_2012.pdf
There are some great packages in R on cointegration but I could not find anything on cointelation on http://www.rseek.org/ or in http://cran.r-project.org/web/views/
I have two time series
library(quantmod)
getSymbols("YHOO",src="google")
getSymbols("GOOG",src="yahoo")
How can I perform a cointelation on the two series using R?
Is there any package, functions, code or references in R dealing with cointelation?
I would be grateful for any help on this.
There is a new article that came out on October 23rd. It can be found here:
http://onlinelibrary.wiley.com/doi/10.1002/wilm.10252/abstract
Have you done some research before posting?
To do things manually: http://cran.r-project.org/web/packages/tsDyn/vignettes/ThCointOverview.pdf
Otherwise you can check the pair trading package.
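As a starting point for the "manual" route, here is a minimal sketch of the two ingredients side by side: the correlation of first differences and an Engle-Granger style cointegration check. To be clear, this is plain cointegration plus plain correlation, not the cointelation model from the paper. The data are simulated, and the seed, series length, and coefficient are arbitrary assumptions. Only base R is used.

```r
# Simulated example: y shares the stochastic trend of x (assumed setup).
set.seed(1)
n <- 500
x <- cumsum(rnorm(n))     # a random walk
y <- 2 * x + rnorm(n)     # cointegrated with x by construction

# Correlation ingredient: correlation of the first differences ("returns").
ret_cor <- cor(diff(x), diff(y))

# Cointegration ingredient, step 1: cointegrating regression in levels.
e <- residuals(lm(y ~ x))

# Step 2: Dickey-Fuller regression on the residuals (no constant):
#   diff(e)_t = rho * e_{t-1} + error
de      <- diff(e)
elag    <- e[-n]
df_fit  <- lm(de ~ 0 + elag)
df_stat <- summary(df_fit)$coefficients["elag", "t value"]

print(c(return_correlation = ret_cor, df_statistic = df_stat))
```

A strongly negative statistic suggests stationary residuals, i.e. cointegration. Note that it has to be compared with Engle-Granger critical values rather than the usual t tables, because the residuals come from an estimated regression. With real prices you would replace x and y with the two (log) price series.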

"auto.arima" in SAS?

I used to run ARIMA models in R using "auto.arima" to identify the best ARIMA model that fits the data. Even without it, it's easy in R to write a function to perform a similar task. However, I have googled for the past few days, and I can't find a similar procedure in SAS. Does anyone know if there is an "auto.arima" in SAS? Or do I have to write one myself? Thank you!
Edit:
After days of searching online, the closest one that I can find is Automatic Model Selection in the Time Series Forecasting package. However, that function uses a GUI, and one still has to manually select all the different models to test. Does anyone know a command-line procedure or package to do this? Thank you.
SAS has PROC ARIMA, which is part of the SAS/ETS module (licensed separately). You can use either the Enterprise Guide PROC ARIMA node for a GUI interface to it, or Solutions->Analysis->Time Series Analysis for a Base SAS interface. The Base SAS interface is what I usually use; it has the advantage of comparing many models other than just ARIMA for fit.
To check whether you have the correct license, run the following code:
proc setinit;
run;
You should see something like this in the results if you have it licensed:
---SAS/ETS (01 JAN 2020)
SAS HPF (High-Performance Forecasting) is the best on the market for time series forecasting; nothing can beat its accuracy when you are trying to generate forecasts for multiple products ...
Run PROC HPFDIAGNOSE followed by PROC HPFENGINE. You will hate auto.arima after using this.
You might want to give PROC FORECAST a try.
I'm working on a similar problem where I have about 6,000 separate time series to forecast so modeling each one individually is out of the question. You can specify a BY variable in PROC FORECAST that lets you forecast many series at once pretty quickly (it ran my moderately large dataset in less than 3 seconds). And if you choose the STEPAR method, it will fit the best autoregressive model it can find to your data.
Here's a good overview of the FORECAST procedure: http://www.okstate.edu/sas/v8/saspdf/ets/chap12.pdf
Still not as awesome as auto.arima in R, but gets the job done.
Good luck!
SAS has high-performance forecasting procedures (PROC HPFDIAGNOSE + PROC HPFENGINE), which can not only select the best ARIMA model but also select the best among ARIMA, ESM, UCM, IDM, combination models, external models, etc. You can either let them automatically pick the best model based on the default selection criterion, or customize the selection criterion. There is a family of procedures to customize everything: PROC HPFDIAGNOSE, PROC HPFENGINE, PROC HPFARIMASPEC, etc. If you want to do more flexible time series analysis plus coding, you can also use PROC TIMEDATA with all the built-in time series packages, which allows you to program whatever you want and also do all the automatic modeling.
As mentioned above, it is the best on the market for time series forecasting, and nothing can beat its accuracy when you are trying to generate forecasts for multiple series. However, it is usually licensed with SAS Forecast Server or SAS Forecast Studio, which are enterprise forecasting solutions with a GUI. That's understandable, since other forecasting solutions built on R and Python that can handle automatic parallelization and automatic forecasting also charge money.
For the cloud computing version, there are also PROC TSMODEL and Visual Forecasting, which have both forecast accuracy and computation performance advantages. However, they are also for enterprise use and pricey. After all, they are targeted at markets that require forecasting for thousands or millions of time series.
For free versions, maybe the closest one would be PROC FORECAST.

Analysis of complex survey design with multiple plausible values

I am working with several large databases (e.g. PISA and NAEP) that use a complex survey design with replicate weights and multiple plausible values. I can address the former using the survey package. However, does there exist an R package/function to analyze the latter?
For reference, I have found this article to provide a good overview of the issue: http://www.ierinstitute.org/fileadmin/Documents/IERI_Monograph/IERI_Monograph_Volume_02_Chapter_01.pdf
I'm not sure how the general idea of 'plausible values' differs from using multiple imputation to generate several sets of imputed values (as the Amelia package does). But Thomas Lumley's mitools package can be used to combine the various sets of imputed values, and it may be possible to use it to combine your sets of plausible values to obtain the 'correct' standard errors of the estimates.
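For intuition about what that combining step does, here is a minimal base R sketch of Rubin's rules, the standard formulas for pooling multiply imputed results. The function name combine_pv and the five estimates and standard errors are hypothetical numbers made up for illustration; they are not PISA or NAEP results.

```r
# Rubin's rules: pool M estimates (one per plausible value) and their variances.
# combine_pv is a hypothetical helper name, not a function from any package.
combine_pv <- function(estimates, variances) {
  m       <- length(estimates)
  pooled  <- mean(estimates)                  # pooled point estimate
  within  <- mean(variances)                  # average sampling variance
  between <- var(estimates)                   # variance across plausible values
  total   <- within + (1 + 1 / m) * between   # total variance
  list(estimate = pooled, se = sqrt(total))
}

# Made-up example: the same statistic computed once per plausible value.
est <- c(500.2, 498.7, 501.1, 499.5, 500.5)   # hypothetical estimates
se  <- c(3.1, 3.0, 3.2, 3.1, 3.0)             # hypothetical standard errors

res <- combine_pv(est, se^2)
print(res)
```

The pooled standard error is larger than any single-run standard error would suggest, because it adds the variation between plausible values to the ordinary sampling variance. With replicate weights, each per-run variance would come from the survey package rather than from a simple formula.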
Daniel Caro developed an R package for large-scale assessments. You can find it here: http://cran.r-project.org/web/packages/intsvy/index.html
This is a code example using the regression command over the plausible values for mathematics:
## Not run:
# Table I.2.3a, p. 305, International Report 2012
pisa.reg.pv(pvlabel="MATH", x="ST04Q01", by = "IDCNTRYL", data=pisa)
Although, I'm not sure if this package can be used to analyze NAEP data.
I hope this serves your purposes, at least partially.
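As of survey version 3.36 there's withPV:
library(survey)   # svydesign(), svyglm(), withPV()
library(mitools)  # pisamaths data and MIcombine()
data(pisamaths, package="mitools")
# Two-stage design: students (STIDSTD) nested within schools (SCHOOLID)
des <- svydesign(id=~SCHOOLID+STIDSTD, strata=~STRATUM, nest=TRUE,
                 weights=~W_FSCHWT+condwt, data=pisamaths)
options(survey.lonely.psu="remove")
# Fit the same model once per plausible value of the maths score
results <- withPV(list(maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH),
                  data=des,
                  action=quote(svyglm(maths~ST04Q01*(PCGIRLS+SMRATIO)+MATHEFF+OPENPS, design=des)))
# Pool the five fits into one set of estimates and standard errors
summary(MIcombine(results))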
