In R, how to add an external variable to an ARIMA model? - r

Does anyone here know how I can specify additional external variables to an ARIMA model ?
In my case I am trying to make a volatility model and I would like to add the squared returns to model an ARCH.
The reason I am not using GARCH models, is that I am only interested in the volatility forecasts and the GARCH models present their errors on their returns which is not the subject of my study.
I would like to add an external variable and see the R^2 and p-values to see if the coefficient is statistically significant.

I know that this is a very old question but for people like me who were wondering this you need to use cbind with xreg.
For Example:
Arima(X,order=c(3,1,3),xreg = cbind(ts1,ts2,ts3))
Each external time series should be the same length as the original.

Related

Changing coefficients in logistic regression

I will try to explain my problem as best as i can. I am trying to externally validate a prediction model, made by one of my colleagues. For the external validation, I have collected data from a set of new patients.
I want to test the accuracy of the prediction model on this new dataset. Online i have found a way to do so, using the coef.orig function to extract de coefficients of the original prediction model (see the picture i added). Here comes the problem, it is impossible for me to repeat the steps my colleague did to obtain the original prediction model. He used multiple imputation and bootstrapping for the development and internal validation, making it very complex to repeat his steps. What I do have, is the computed intercept and coefficients from the original model. Step 1 from the picture i added, could therefor be skipped.
My question is, how can I add these coefficients into the regression model, without the use of the 'coef()' function?
Steps to externally validate the prediction model:
The coefficients I need to use:
I thought that the offset function would possibly be of use, however this does not allow me to set the intercept and all the coefficients for the variables at the same time

Imputation missing data for MLM in R

Maybe anyone can help me with this question. I conducted a follow-up study and obviously now have to face missing data. Now I am considering how to impute the missing data at best using MLM in R (f.e. participants concluded the follow up 2 survey, but not the follow up 1 survey, therefore I am missing L1 predictors for my longitudinal analysis).
I read about Multiple Imputation of multilevel data using the pan package (Schafer & Yucel, 2002) and came across the following code:
imp <- panImpute(data, formula = fml, n.burn = 1000, n.iter = 100, m = 5)
Yet, I have troubles understanding it completely. Is there maybe another way to impute missing data in R? Or maybe somebody could illustrate the process of the imputation method a bit more detailed, that would be so great! Do I have to conduct the imputation for every model I built in my MLM? (f.e. when I compared, whether a random intercept versus a random intercept and random slope model fits better for my data, do I have to use the imputation code for every model, or do I use it at the beginning of all my calculations?)
Thank you in advance
Is there maybe another way to impute missing data in R?
There are other packages. mice is the one that I normally use, and it does support multilevel data.
Do I have to conduct the imputation for every model I built in my MLM? (f.e. when I compared, whether a random intercept versus a random intercept and random slope model fits better for my data, do I have to use the imputation code for every model, or do I use it at the beginning of all my calculations?)
You have to specify the imputation model. Basically that means you have to tell the software which variables are predicted by which other variables. Since you are comparing models with the same fixed effect, and only changing the random effects (in particular comparing models with and without random slopes), the imputation model should be the same in both cases. So the workflow is:
perform the imputations;
run the model on all the imputed datasets,
pool the results (typically using Rubin's rules)
So you will need to do this twice, to end up with 2 sets of pooled results - one for each model. The software should provide functionality for doing all of this.
Having said all of that, I would advise against choosing your model based on fit statistics and instead use expert knowledge. If you have strong theoretical reasons for expecting slopes to vary by group, then include random slopes. If not, then don't include them.

When predicting using R ARIMA object, how to declare the time series' history?

Suppose I fit AR(p) model using R arima function from stats package. I fit it using a sample x_1,...,x_n. In theory, when predicting x_{n+1} using this model, it needs an access x_n,...x_{n-p}.
How does the model know which observation I want to predict? What if I wanted to actually predict x_n based on x_{n-1},...,x_{n-p-1} and how my code would differ in this case? Can I make in-sample forecasts, similar to Python's functionality?
If my questions imply that I think about forecasting in a wrong way, please kindly correct my understanding of the subject.

Ensemble machine learning model with NNETAR and BRNN

I used the forecast package to forecast the daily time-series of variable Y using its lag values and a time series of an external parameter X. I found nnetar model (a NARX model) was the best in terms of overall performance. However, I was not able to get the prediction of peaks of the time series well despite my various attempts with parameter tuning.
I then extracted the peak values (above a threshold) of Y (and of course this is not a regular time series anymore) and corresponding X values and tried to fit a regression model (note: not an autoregression model) using various models in carat package. I found out the prediction of peak values using brnn(Bidirectional recurrent neural networks) model just using X values is better than that of nnetar which uses both lag values and X values.
Now my question is how do I go from here to create ensamples of these two models (i.e whenever the prediction using brnn regression model ( or any other regression model) is better I want to replace the prediction using nnetar and move forward - I am mostly concerned about the peaks)? Is this a commonly used approach?
Instead of trying to pick one model that would be the superior at anytime, it's typically better to do an average of the models, in order to include as many individual views as possible.
In the experiments I've been involved in, where we tried to pick one model that would outperform, based on historical performance, it's typically shown that a simple average was as good or better. Which is in line with the typical results on this problem: https://otexts.com/fpp2/combinations.html
So, before you try to go more advanced at it by using trying to pick a specific model based on previous performance, or by using an weighted average, consider doing a simple average of the two models.
If you want to continue with a sort of selection/weighted averaging, try to have a look at the FFORMA package in R: https://github.com/pmontman/fforma
I've not tried the specific package (yet), but have seen promising results in my test using the original m4metalearning package.

Interpreting a log transformed multiple regression model in R

I am trying to build a model that can predict SalePrice using independent variables that denote various house features. I used Multiple Regression Model, and also found that some predictor variables needed to be transformed, as well as the response variable.
My final model is as follows;
Model Output
How do I interpret this result? Can I conclude that a one unit increase in Years Since Remodel causes a -2.905e-03 change in log of Sale Price? How do I make this interpretation easier to understand? Thank you.

Resources