Ensemble machine learning model with NNETAR and BRNN in R

I used the forecast package to forecast a daily time series of a variable Y using its lagged values and a time series of an external parameter X. I found the nnetar model (a NARX model) to be the best in terms of overall performance. However, I was not able to predict the peaks of the time series well, despite various attempts at parameter tuning.
I then extracted the peak values of Y (above a threshold; of course this is no longer a regular time series) and the corresponding X values, and tried to fit a regression model (note: not an autoregression model) using various models in the caret package. I found that the prediction of peak values using a brnn (Bayesian regularized neural network) model, using only the X values, is better than that of nnetar, which uses both lagged values and X values.
Now my question is: how do I go from here to create an ensemble of these two models? That is, whenever the prediction from the brnn regression model (or any other regression model) is better, I want it to replace the nnetar prediction and move forward; I am mostly concerned about the peaks. Is this a commonly used approach?

Instead of trying to pick the one model that is superior at any given time, it is typically better to average the models, in order to include as many individual views as possible.
In the experiments I have been involved in, where we tried to pick one model that would outperform based on historical performance, a simple average typically turned out to be as good or better. This is in line with the typical results on this problem: https://otexts.com/fpp2/combinations.html
So, before you go more advanced, by trying to pick a specific model based on previous performance or by using a weighted average, consider a simple average of the two models.
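As a minimal sketch of such an average (the names y, x, x_future and pred_brnn are hypothetical stand-ins for your series, the external regressor, its future values, and the brnn model's predictions over the same horizon):
library(forecast)
fit_nn <- nnetar(y, xreg = x)                           # NARX-style model: lags of y plus x
fc_nn <- forecast(fit_nn, xreg = x_future)$mean         # nnetar forecasts
ens <- (as.numeric(fc_nn) + as.numeric(pred_brnn)) / 2  # simple (unweighted) average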
If you do want to continue with some form of selection/weighted averaging, have a look at the FFORMA package in R: https://github.com/pmontman/fforma
I have not tried that specific package (yet), but I have seen promising results in my tests using the original m4metalearning package.

Related

Imputation of missing data for MLM in R

Maybe someone can help me with this question. I conducted a follow-up study and now, obviously, have to deal with missing data. I am considering how best to impute the missing data for a multilevel model (MLM) in R (e.g., some participants completed the follow-up 2 survey but not the follow-up 1 survey, so I am missing level-1 predictors for my longitudinal analysis).
I read about Multiple Imputation of multilevel data using the pan package (Schafer & Yucel, 2002) and came across the following code:
imp <- panImpute(data, formula = fml, n.burn = 1000, n.iter = 100, m = 5)
Yet, I have trouble understanding it completely. Is there maybe another way to impute missing data in R? Or could somebody illustrate the imputation process in a bit more detail? That would be great. Do I have to conduct the imputation for every model I build in my MLM? (E.g., when I compare whether a random-intercept model or a random-intercept-and-random-slope model fits my data better, do I have to run the imputation code for every model, or do I run it once at the beginning of all my calculations?)
Thank you in advance
Is there maybe another way to impute missing data in R?
There are other packages. mice is the one that I normally use, and it does support multilevel data.
Do I have to conduct the imputation for every model I build in my MLM? (E.g., when I compare whether a random-intercept model or a random-intercept-and-random-slope model fits my data better, do I have to run the imputation code for every model, or do I run it once at the beginning of all my calculations?)
You have to specify the imputation model. Basically, that means you have to tell the software which variables are predicted by which other variables. Since you are comparing models with the same fixed effects and only changing the random effects (in particular, comparing models with and without random slopes), the imputation model should be the same in both cases. So the workflow is:
perform the imputations;
run the model on all the imputed datasets;
pool the results (typically using Rubin's rules).
So you will need to do this twice, ending up with two sets of pooled results, one for each model. The software should provide functionality for all of this.
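As a minimal sketch of that workflow with mice (the variables y, time and id and the random-intercept formula are illustrative; pooling lmer fits may additionally require the broom.mixed package):
library(mice)
library(lme4)
imp <- mice(data, m = 5, seed = 123)             # 1. perform the imputations
fits <- with(imp, lmer(y ~ time + (1 | id)))     # 2. run the model on each imputed dataset
summary(pool(fits))                              # 3. pool the results (Rubin's rules)
Note that mice also offers dedicated multilevel imputation methods (e.g. method = "2l.pan"), which are usually preferable for clustered data.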
Having said all of that, I would advise against choosing your model based on fit statistics and instead use expert knowledge. If you have strong theoretical reasons for expecting slopes to vary by group, then include random slopes. If not, then don't include them.

How do I test for equality of parameters for two time-series data sets described by the same NLS model?

I am currently fitting curves to the sales volumes of two successive product generations over an observation period, looking at two different markets. Through some research, I have identified a nonlinear model that fits the data well in both markets. To estimate the parameters of the model, I am using R and the nls2 package.
I have obtained regression output for the two markets individually.
Now, I would like to test whether the estimated parameters are different from each other on the two markets. That is, I would like to test each model parameter from one market against the corresponding parameter estimate from the other market.
Is there any function in R or in the nls2 package, that will allow me to do that? Or is there a smarter method?
Thank you in advance!
You can compare the parameters using MANOVA.
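For a concrete illustration, a standard alternative to MANOVA with nested nls fits is to pool the data from both markets and compare a model with common parameters against one with market-specific parameters using an F-test. A sketch, where the model y ~ a * exp(b * t), the data frame both, and the two-level factor market are all hypothetical:
fit_common <- nls(y ~ a * exp(b * t), data = both,
                  start = list(a = 1, b = 0.1))
fit_separate <- nls(y ~ a[market] * exp(b[market] * t), data = both,
                    start = list(a = c(1, 1), b = c(0.1, 0.1)))
anova(fit_common, fit_separate)  # a significant F-test suggests the parameters differ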

Quantile Regression with Time-Series Models (ARIMA-ARCH) in R

I am working on quantile forecasting with time-series data. The model I am using is ARIMA(1,1,2)-ARCH(2) and I am trying to get quantile regression estimates of my data.
So far, I have found the "quantreg" package for performing quantile regression, but I have no idea how to specify ARIMA-ARCH models as the model formula in the rq function.
The rq function seems to work for regressions with dependent and independent variables, but not for time series.
Is there another package in R that lets me combine time-series models with quantile regression? Any advice is welcome. Thanks.
I just put an answer on the Data Science forum.
It basically says that most of the ready-made packages use so-called exact tests based on distributional assumptions (independent, identically distributed Gaussian errors, or broader families).
You also have a family of resampling methods, in which you simulate samples with a distribution similar to that of your observed sample, fit your ARIMA(1,1,2)-ARCH(2), and repeat the process a great number of times. You then analyze this great number of forecasts and measure (as opposed to compute) your confidence intervals.
The resampling methods differ in how the simulated samples are generated. The most used are:
The jackknife: you "forget" one point at a time; that is, you generate n samples of size n-1, where n is the size of the observed sample.
The bootstrap: you simulate a sample by drawing n values from the original sample with replacement; some values will be drawn once, some twice or more, some never.
It is a (nontrivial) theorem that the expectations of the confidence intervals, like those of most of the usual statistical estimators, are the same for the simulated samples as for the original sample, with the difference that you can measure them over a great number of simulations.
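As a minimal illustration of the bootstrap idea on a simple statistic (a percentile confidence interval for a mean, with simulated stand-in data):
set.seed(1)
x <- rnorm(100)                                       # stand-in for the observed sample
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))                 # measured 95% interval
For a forecasting model, the same loop would refit the ARIMA-ARCH model on each resampled series and collect the forecasts, although resampling a time series requires block or residual-based schemes to preserve the dependence structure.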
I can try to address your question, although this is hard since you don't provide any code or data. Also, I guess by "put ARIMA-ARCH models" you actually mean that you want to make an integrated series stationary using an ARIMA(1,1,2) filter plus an ARCH(2) filter.
For an overview of R's time-series capabilities, you can refer to the CRAN Time Series task view.
You can apply these filters in R with the appropriate functions.
For instance, you could use the Arima() function from the forecast package, then compute the residuals with residuals() from the stats package. Next, you can use this filtered series as input for the garch() function from the tseries package. Other combinations are of course possible. Finally, you can apply quantile regression to this filtered series; for instance, check out the dynrq() function from the quantreg package, which allows time-series objects in its data argument.
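A minimal sketch of that pipeline, assuming y is your series (note that in tseries::garch() an ARCH(2) is specified as order = c(0, 2)):
library(forecast)
library(tseries)
library(quantreg)
fit_arima <- Arima(y, order = c(1, 1, 2))       # ARIMA(1,1,2) filter
res <- residuals(fit_arima)                     # the filtered series
fit_arch <- garch(res, order = c(0, 2))         # ARCH(2) on the residuals
fit_rq <- dynrq(res ~ L(res, 1), tau = c(0.1, 0.5, 0.9))  # quantile regression on the filtered series
summary(fit_rq)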

Mixed Logit fitted probabilities in RSGHB

My question has to do with using the RSGHB package to predict choice probabilities per alternative, applying mixed logit models (variation across respondents) with correlated coefficients.
I understand that the choice probabilities are simulated at the individual level, and that to get preference shares an average of the individual shares would do. All the sources I have found treat each prediction as a separate simulation, which makes the whole process cumbersome when many predictions are needed.
Since one can save the respondent-specific coefficient draws, wouldn't it be faster to simply apply the logit transform to each (vector of) coefficient draws? Once this is done, new or existing alternatives could be evaluated faster than by rerunning the whole simulation process for each required alternative. For the time being, the fitted() approach does not help me understand how prediction actually works.
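For what it is worth, a minimal sketch of the shortcut described in the question (all names are hypothetical: betas is an R x K matrix of saved respondent-level coefficient draws, X is a J x K attribute matrix for J alternatives):
probs_per_draw <- apply(betas, 1, function(b) {
  v <- X %*% b                 # utility of each alternative under this draw
  exp(v) / sum(exp(v))         # multinomial logit transform
})
pref_share <- rowMeans(probs_per_draw)  # average over draws gives preference shares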

In R, how to add an external variable to an ARIMA model?

Does anyone here know how I can specify additional external variables for an ARIMA model?
In my case, I am trying to build a volatility model, and I would like to add the squared returns to model an ARCH effect.
The reason I am not using GARCH models is that I am only interested in the volatility forecasts, and GARCH models place their errors on the returns, which is not the subject of my study.
I would like to add an external variable and see the R^2 and p-values, to check whether its coefficient is statistically significant.
I know this is a very old question, but for people like me who were wondering: you need to pass the external variables to the xreg argument, using cbind when there is more than one.
For example, with the forecast package:
library(forecast)
fit <- Arima(X, order = c(3, 1, 3), xreg = cbind(ts1, ts2, ts3))
Each external time series should be the same length as the original series.
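Since the question also asks about significance: objects returned by Arima() have coef() and vcov() methods, so (as a hedged suggestion) coeftest() from the lmtest package can provide approximate z-tests for the coefficients, including the external regressors:
library(lmtest)
coeftest(fit)  # fit as in the example above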
