Fable functions - theoretical questions - r

My master thesis is in health forecasting and I'm using R (fable, fabletools, fasster) to implement the methods.
For the theoretical part of the thesis, I need to know the heuristics and the theoretical basis of each function I use.
I have been using Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos and I have already read R documentation on these functions but I still have some doubts.
I need information like what theoretical method they follow (ARIMA, Moving Averages, ANN, etc), the mathematical expression they use and how it is decided which is the best fit (for automatic methods):
I use the following methods and gathered some information about each one.
I'm new in this field and I need some help.
Is this correct? Can anyone add anything else about any of the functions?
ARIMA() - MSARIMA model (meaning an ARIMA model that is sensible to seasonality and can take into account several external regressors:
SNAIVE()- Linear regression with seasonality;
NNETAR() - ANN model;
fasster()
ETS()
Thank you in advance!

The book you cite contains information on how SNAIVE, NNETAR, ETS, and ARIMA forecasts are calculated. It explains that for model classes such as ETS and ARIMA, the AICc is used to select a particular model. It gives equations for all these methods. Please read it.
fasster() is a new method that is not fully documented yet. The readme file (https://github.com/tidyverts/fasster) provides some information, and there is a talk by the author (https://www.youtube.com/watch?v=6YlboftSalY) explaining the state space modelling framework behind it.

Related

Is it possible to build a random forest with model based trees i.e., `mob()` in partykit package

I'm trying to build a random forest using model based regression trees in partykit package. I have built a model based tree using mob() function with a user defined fit() function which returns an object at the terminal node.
In partykit there is cforest() which uses only ctree() type trees. I want to know if it is possible to modify cforest() or write a new function which builds random forests from model based trees which returns objects at the terminal node. I want to use the objects in the terminal node for predictions. Any help is much appreciated. Thank you in advance.
Edit: The tree I have built is similar to the one here -> https://stackoverflow.com/a/37059827/14168775
How do I build a random forest using a tree similar to the one in above answer?
At the moment, there is no canned solution for general model-based forests using mob() although most of the building blocks are available. However, we are currently reimplementing the backend of mob() so that we can leverage the infrastructure underlying cforest() more easily. Also, mob() is quite a bit slower than ctree() which is somewhat inconvenient in learning forests.
The best alternative, currently, is to use cforest() with a custom ytrafo. These can also accomodate model-based transformations, very much like the scores in mob(). In fact, in many situations ctree() and mob() yield very similar results when provided with the same score function as the transformation.
A worked example is available in this conference presentation:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2017).
"Individual Treatment Effect Prediction Using Model-Based Random Forests."
Presented at Workshop "Psychoco 2017 - International Workshop on Psychometric Computing",
WU Wirtschaftsuniversität Wien, Austria.
URL https://eeecon.uibk.ac.at/~zeileis/papers/Psychoco-2017.pdf
The special case of model-based random forests for individual treatment effect prediction was also implemented in a dedicated package model4you that uses the approach from the presentation above and is available from CRAN. See also:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2019).
"model4you: An R Package for Personalised Treatment Effect Estimation."
Journal of Open Research Software, 7(17), 1-6.
doi:10.5334/jors.219

Difference between "mlp" and "mlpML"

I'm using the Caret package from R to create prediction models for maximum energy demand. What i need to use is neural network multilayer perceptron, but in the Caret package i found out there's 2 of the mlp method, which is "mlp" and "mlpML". what is the difference between the two?
I have read description from a book (Advanced R Statistical Programming and Data Models: Analysis, Machine Learning, and Visualization) but it still doesnt answer my question.
Caret has 238 different models available! However many of them are just different methods to call the same basic algorithm.
Besides mlp there are 9 other methods of calling a multi-layer-perceptron one of which is mlpML. The real difference is only in the parameters of the function call and which model you need depends on your use case and what you want to adapt about the basic model.
Chances are, if you don't know what mlpML or mlpWeightDecay,etc. does you are fine to just use the basic mlp.
Looking at the official documentation we can see that:
mlp(size) while mlpML(layer1,layer2,layer3) so in the first method you can only tune the size of the multi-layer-perceptron while in the second call you can tune each layer individually.
Looking at the source code here:
https://github.com/topepo/caret/blob/master/models/files/mlp.R
and here:
https://github.com/topepo/caret/blob/master/models/files/mlpML.R
It seems that the difference is that mlpML allows several hidden layers:
modelInfo <- list(label = "Multi-Layer Perceptron, with multiple layers",
while mlp has one single layer with hidden units.
The official documentation also hints at this difference. In my opinion, it is not particularly useful to have many different models that differ only very slightly, and the documentation does not explain those slight differences well.

How to create response surface using random forest model in R?

I have a made a rf model in R having six predictors and a response. The predictive model seems to be good enough but we also wanted to generate a response surface for this model.
attach(al_mf)
library(randomForest)
set.seed(1)
rfalloy=randomForest(Mf~.,data=al_mf,mtry=6,importance=TRUE)
rfalloy
rfpred=predict(rfalloy,al_mf$Mf)
rfpred
sse=sum((rfpred-mean(al_mf$Mf))^2)
sse
ssr=sum((rfpred-al_mf$Mf)^2)
ssr
Rsqaure=1-(ssr/(sse+ssr))
Rsqaure
importance(rfalloy)
At a general level, since you haven't provided too many specifics about exactly what you are looking for in your response surface, here are a few hopefully helpful starting points:
Have you taken a look at rsm? This documentation provides some good use cases for the package.
These in-class notes from a University of New Mexico stats lecture are full of code examples related to response surfaces. Just check out the table of contents and you'll probably find what you're looking for.
This StackOverflow post also provides an example using the rgl package.

Setting Contrasts for ANOVA in R

I've been attempting to perform an ANOVA in R recently on the attached data frame.
My question revolves around the setting of contrasts.
My design is a 3x5 within-subjects design.
There are 3 visual conditions under 'Circle1' and 5 audio under 'Beep1'.
Does anyone have any idea how I should set the contrasts? This is something I'm unfamiliar with as I'm making the transition from point and click stats in SPSS to coded in R.
Thanks for your time
Data file:
Reiterating my answer from another stackoverflow question that was flagged as similar, since you didn't provide any code, you might start by having a look at the contrast package in R. As they note in the document:
"The purpose of the contrast package is to provide a standardized interface for testing linear combinations of parameters from common regression models. The syntax mimics the contrast. Design function from the Design library. The contrast class has been extended in this package to linear models produced using the functions lm, glm, gls, lme and geese."
There is also a nice little tutorial here by Dr. William King who talks about factorial between subjects ANOVA and also includes an abundance of R code. This is wider scoped than you question but would be a great place to start (just to get context).
Finally, here is another resource that you can refer to which talks about setting up orthogonal contrasts in R.

On the issue of automatic time series fitting using R

we have to fit about 2000 or odd time series every month,
they have very idiosyncratic behavior in particular, some are arma/arima, some are ewma, some are arch/garch with or without seasonality and/or trend (only thing in common is the time series aspect).
one can in theory build ensemble model with aic or bic criterion to choose the best fit model but is the community aware of any library which attempts to solve this problem?
Google made me aware of the below one by Rob J Hyndman
link
but are they any other alternatives?
There are two automatic methods in the forecast package: auto.arima() which will handle automatic modelling using ARIMA models, and ets() which will automatically select the best model from the exponential smoothing family (including trend and seasonality where appropriate). The AIC is used in both cases for model selection. Neither handles ARCH/GARCH models though. The package is described in some detail in this JSS article: http://www.jstatsoft.org/v27/i03
Further to your question:
When will it be possible to use
forecast package functions, especially
ets function, with high dimensional
data(weekly data, for example)?
Probably early next year. The paper is written (see robjhyndman.com/working-papers/complex-seasonality) and we are working on the code now.
Thanks useRs, I have tried the forecast package, that too as a composite of arima and ets, but not to much acclaim from aic or bic(sbc), so i am now tempted to treat each of the time series to its own svm(support vector machine) because of its better genralization adaptability and also being able to add other variables apart from lags and non linear kernel functions
Any premonitions?

Resources