arima.sim issues in R

I am working on making a prediction in R using time-series models.
I used the auto.arima function to find a model for my dataset (which is a ts object).
fit<-auto.arima(data)
I can then plot the results of the prediction for the 20 following dates using the forecast function:
plot(forecast(fit,h=20))
However, I would like to add external variables, and I cannot see how to do that with forecast, which is something of a black box to me as I am new to R.
So I tried to mimic it using the arima.sim function, and a problem arose:
HOW TO INITIALIZE THIS FUNCTION?
I got the model by setting model=as.list(coef(fit)), but the other parameters are still obscure to me.
I went through hundreds of pages, including on Stack Overflow, but nobody seems to really know what is going on.
How is it calculated? Why must n.start (the burn-in period) have length ar+ma and not just max(ar,ma)? What exactly is start.innov?
I thought I understood the case where there is only an AR part, but I cannot reproduce the results with an AR+MA filter.
My understanding, as far as the AR part is concerned, is that start.innov represents the errors between a filtered zero signal and the true signal. Is that true?
For example, if you want an AR of order 2 with initial conditions (a1,a2), you need to set
start.innov[1]=a1-ar1*0-ar2*0=a1
start.innov[2]=a2-ar1*start.innov[1]
and innov to rep(0,20). But what do you do when facing a full ARIMA model: how do you set innov to get exactly the same curves as forecast does?
Thanks for your help!

You seem to be confused between modelling and simulation. You are also wrong about auto.arima().
auto.arima() does allow exogenous variables via the xreg argument. Read the help file. You can include the exogenous variables for future periods using forecast.Arima(). Again, read the help file.
It is not clear at all why you are referring to arima.sim() here. It is for simulating ARIMA processes, not for modelling or forecasting.
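A minimal sketch of the xreg workflow described above, assuming the forecast package is installed. The data are simulated and the regressor name x is made up for illustration; the future regressor values would normally come from your own scenario rather than rnorm().

```r
library(forecast)

set.seed(1)
n <- 120
x <- rnorm(n)                                      # hypothetical external regressor
y <- ts(2 * x + arima.sim(list(ar = 0.5), n = n))  # simulated response with AR(1) errors

# Fit a regression with ARIMA errors; xreg takes a numeric matrix
fit <- auto.arima(y, xreg = cbind(x = x))

# Forecasting needs the regressor values for the future periods;
# the horizon is the number of rows supplied (20 here)
future_x <- cbind(x = rnorm(20))
fc <- forecast(fit, xreg = future_x)
plot(fc)
```

Note that arima.sim() appears here only to generate example data, which is the job it is meant for.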

Related

How can I get My.stepwise.glm to return the model outside the console?

I asked this question on RCommunity but haven't had anyone bite... so I'm here!
My current project involves predicting whether some trees will survive given future climate change scenarios. Against my better judgement (such as using Maxent), I've decided to pursue this with a GLM, which requires presence and absence data. Every time I generate my absence data (as I was only given presence data) using randomPoints from dismo, the resulting GLM has different significant variables. I found a package called My.stepwise that has a My.stepwise.glm function (here: My.stepwise.glm: Stepwise Variable Selection Procedure for Generalized Linear... in My.stepwise: Stepwise Variable Selection Procedures for Regression Analysis), which goes through a forward/backward selection process to find the best variables and returns a model ready for you.
My problem is that I don't want to run My.stepwise.glm just once and use the model it spits out for me. I'd like to run it roughly 100 times with different pseudo-absence data and see which variables it returns, then take the most frequent variables and move forward with building my model using those. The issue is that the My.stepwise.glm function ends with 'print(summary(initial.model))', and I would like to be able to access the output the way step() returns a list, where you can then say 'step$coefficients' and have the coefficients returned as numerics. Can anyone help me with this?
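The repeat-and-tally idea described above could be sketched as follows. This is only an illustration under stated assumptions: it uses base R's step() as a stand-in for My.stepwise.glm (step() returns the fitted model object), the data frame dat and its columns are simulated, and bootstrap resampling stands in for regenerating pseudo-absences with randomPoints.

```r
set.seed(42)
n <- 200
# Hypothetical data: x1 and x2 drive the response, x3 is pure noise
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - 1 * dat$x2))

n_runs <- 20
selected <- vector("list", n_runs)
for (i in seq_len(n_runs)) {
  # Stand-in for drawing a fresh pseudo-absence set: resample the rows
  d <- dat[sample(n, replace = TRUE), ]
  full <- glm(y ~ x1 + x2 + x3, data = d, family = binomial)
  # step() returns the selected model, so its terms can be inspected in code
  best <- step(full, direction = "both", trace = 0)
  selected[[i]] <- names(coef(best))[-1]   # drop the intercept
}

# How often each variable survived selection across the runs
var_counts <- sort(table(unlist(selected)), decreasing = TRUE)
print(var_counts)
```

The counts in var_counts can then be used to pick the most frequently selected variables for the final model.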

Holt-Winters in r with function hw. Question on value of beta parameter and forecasting phase

I have used the function hw to analyze a time series
fcast<-hw(my_series,h=12,level=95,seasonal="multiplicative",damped=TRUE,lambda=NULL)
By looking at the fcast$model$par I observe the values of alpha, beta, gamma, phi, and the initial states.
I've also looked at the contents of fcast$model$states to see the evolution of all the values. I've tried to reproduce the results in Excel in order to understand the whole procedure.
To achieve the same values of b (trend) as in fcast$model$states I observe that I have to use a formula like the one in the bibliography about the Holt-Winters method:
b(t)=beta2*(l(t)-l(t-1))+(1-beta2)*phi*b(t-1)
But, if in fcast$model$par beta=0.08128968, I find that in order to achieve the same results I have to use beta2=0.50593541.
What's the reason for that? I don't see any relationship between beta and beta2.
I have also found that in order to get the same forecast as the one obtained with the hw function I have to use the following formulas once the data are finished:
l(t)=l(t-1)+b(t-1)
b(t)=phi*b(t-1)
^y(t)=(l(t-1)+b(t-1))*s(t-m)
I haven't found any bibliography on this forecasting phase, explaining that some parameters are no longer used. For instance, in this case phi is still used for b(t), but not used anymore for l(t).
Can anyone refer to any bibliography where I can find this part explained?
So in the end I've been able to reproduce the whole set of data in Excel, but there's a couple of steps I would like to understand better.
Thank you!!

R Neural Network Forecasting - Aspect of Randomness?

I have been trying different methods of forecasting and stumbled upon the nnetar() function in the forecast package of R. I quickly realized that while this does work to forecast, it gives me something different every time I run it. Could anybody help to explain why this happens? I thought I had a decent understanding of neural nets, and I don't see what could make drastic differences in forecasts, unless the nnetar() function randomly selects the number of nodes or something. Any help?
By default, 20 networks are trained with random starting values, and their predictions are averaged when you use the function.
Because the function uses random starting values for each run, the forecasts will be different for each call too.
EDIT: new question from OP in the comments
In order to control the function and get the same random starting values each time, you can simply use the function set.seed() with the value of your choice.
For example:
set.seed(666)
forecast(nnetar(...),...)
set.seed(666)
forecast(nnetar(...),...)
set.seed(666)
forecast(nnetar(...),...)
will give the same results every time you run it with this "seed" value (666). You have to run set.seed(666) before every run of the rest of your code, of course.
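A runnable version of the same idea, assuming the forecast package is installed and using the built-in lynx series purely as example data:

```r
library(forecast)

# Same seed before each fit, so both fits start from the same random weights
set.seed(666)
f1 <- forecast(nnetar(lynx), h = 5)

set.seed(666)
f2 <- forecast(nnetar(lynx), h = 5)

# Compare the point forecasts of the two runs
print(all.equal(f1$mean, f2$mean))
```

Dropping the second set.seed() call is a quick way to see the run-to-run variation described in the question.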
EDIT 2: new new question from OP in the comments
In order to have 100 different networks to fit with random starting weights:
nnetar(...,repeats=100,...)

R - Predicting using the arimax function of the TSA package

I am trying to fit a transfer function model using R in order to apply the fitted model to a validation set of data, because SPSS doesn't allow me to (or I don't know how to) compute point forecasts the way the function Arima() from the forecast package does. It does let me apply the model, but it does not use the dependent variable's lagged values, which is why I am trying R.
Does anyone know how I could get that type of "updated" or validation forecast using the arimax() function? I am not looking for the following type of predictions:
predict(vixari011, n.ahead=12)
But rather these:
Arima(test$VIX, model = vixari)
From what I have been reading, there is no prediction function for the arimax() function. Any ideas about how I could forecast to evaluate point-by-point performance? I can only think of computing them manually in a spreadsheet...
I had the same problem. I know this post is old, but this may help someone.
I used this and it worked just fine:
forecast(fitted(arimax_ts_model), h=11)

R - How to get one "summary" prediction map instead of 5 when using 5-fold cross-validation in maxent model?

I hope I have come to the right forum. I'm an ecologist making species distribution models using the maxent (version 3.3.3, http://www.cs.princeton.edu/~schapire/maxent/) function in R, through the dismo package. I have used the argument "replicates = 5", which tells maxent to do a 5-fold cross-validation. When running maxent from the maxent.jar file directly (the maxent software), an html file with statistics is produced, including the prediction maps. In R, an html file is also produced, but the prediction maps have to be extracted afterwards, using the function "predict" in the dismo package. When I do this, I get 5 maps, due to the 5-fold cross-validation setting. However (and this is the problem), I want only one output map, one "summary" prediction map. I assume this is possible, although I don't know how maxent computes it. The maxent tutorial (see link above) says that:
"...you may want to avoid eating up disk space by turning off the “write output grids” option, which will suppress writing of output grids for the replicate runs, so that you only get the summary statistics grids (avg, stderr etc.)."
A list of arguments that can be put into R is found in this forum https://groups.google.com/forum/#!topic/maxent/yRBlvZ1_9rQ.
I have tried to use the argument "outputgrids=FALSE" both in the maxent function itself, and in the predict function, but it doesn't work. I still get 5 maps, even though I don't get any errors in R.
So my question is: How do I get one "summary" prediction map instead of the five prediction maps that result from the cross-validation?
I hope someone can help me with this, I am really stuck and haven't found any answers anywhere on the internet. Not even a discussion about this. Hope my question is clear. This is the R-script that I use:
model1<-maxent(x=predvars, p=presence_points, a=target_group_absence, path="//home//...//model1", args=c("replicates=5", "outputgrids=FALSE"))
model1map<-predict(model1, predvars, filename="//home//...//model1map.tif", outputgrids=FALSE)
Best regards,
Kristin
Sorry to be the bearer of bad news, but based on the source code, it looks like Dismo's predict function does not have the ability to generate a summary map.
Nitty-gritty details for those who care: When you call maxent with replicates set to something greater than 1, the maxent function returns a MaxEntReplicates object, rather than a normal MaxEnt object. When predict receives a MaxEntReplicates object, it just iterates through all of the models that it contains and calls predict on them individually.
So, what next? Fortunately, all is not lost! The reason that Dismo doesn't have this functionality is that for most kinds of model-building, there isn't actually a valid way to average parameters across your cross-validation models. I don't want to go so far as to say that that's definitely the case for MaxEnt specifically, but I suspect it is. As such, cross-validation is usually used more as a way of checking that your model building methodology works for your data than as a way of building your model directly (see this question for further discussion of that point). After verifying via cross-validation that models built using a given procedure seem to be accurate for the phenomenon you're modelling, it's customary to build a final model using all of your data. In theory this new model should only be better than models trained on a subset of your data.
So basically, assuming your cross-validated models look reasonable, you can run MaxEnt again with only one replicate. Your final result will be a model accuracy estimate based on the cross-validation and a map based on the second run with all of your data lumped together. Depending on what exactly your question is, there might be other useful summary statistics from the cross-validation that you want to use, but those are all things you've already seen in the html output.
I may have found this a couple of years late, but you could do something like this:
xm <- maxent(predictors, pres_train) # maxent model for the first partition
xm2 <- maxent(predictors, pres_train2) # model for the second partition (pres_train2 is a placeholder for that fold's presence points)
px <- predict(predictors, xm, ext=ext, progress='') # prediction from the first model
px2 <- predict(predictors, xm2, ext=ext, progress='') # prediction from the second model
models <- stack(px, px2) # stack the predictions from all the models
final_map <- mean(models) # cell-wise mean across all the predictions
plot(final_map) # plot the averaged map
xm, xm2, ... would be the maxent models for each partition in the cross-validation, and px, px2, ... would be the predicted maps.

Resources