ACF PACF Determination for ARIMA in R

Please help me confirm my understanding. For the graphs below, I believe
AR(p) = 0 and MA(q) = 0.
Is that correct?

First, let's learn more...
As Aashiq Reza pointed out with the linked description, I think the ACF and PACF plots that you shared look like an MA(2) process.
ARIMA(p, d, q) has three elements: p is the AR order, d is the degree of differencing, and q is the MA order. Because these orders define the lags in the model's regression formula, if both p and q are zero then the model is not really an ARIMA model anymore.
My suggestion: probabilistic model selection...
You can evaluate how well a model fits a time-series object using information criteria such as AIC and BIC. For example, given a preset grid of possible p and q values, you can fit each combination and compute the criterion for it. The model with the smallest criterion value is the best one. This link helps with the calculation in Python.
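As a rough illustration of that idea, here is a minimal R sketch (my own, not from the question) that loops over a small grid of candidate (p, q) orders with the forecast package's Arima() and keeps the AIC and BIC of each fit; y is assumed to be your series as a ts or numeric vector.
library(forecast)
# Candidate AR and MA orders to try (differencing fixed at 0 here)
orders <- expand.grid(p = 0:3, q = 0:3)
ic <- do.call(rbind, lapply(seq_len(nrow(orders)), function(i) {
  fit <- Arima(y, order = c(orders$p[i], 0, orders$q[i]))
  data.frame(p = orders$p[i], q = orders$q[i], AIC = fit$aic, BIC = fit$bic)
}))
ic[which.min(ic$BIC), ]  # the model with the smallest criterion
Note that forecast::auto.arima() automates essentially this kind of search.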

Related

Estimation of a state-space model with lags in the measurement equation in R

I'm trying to estimate a state-space model from this paper that has the following form:
Setting the order of the first lag polynomial to zero and the second one to one, we can reformulate it using terms from the MARSS package guide where applicable (x is the state, y is the observed variable, d is exogenous):
The MARSS package allows for estimation of a simpler model that doesn't include lagged variables in the measurement equation. Is there a way to estimate this one using MARSS or any other package without rewriting the estimation routine for this special case? Maybe there is a way to reformulate it so it could be "fed" to MARSS or some other package?
Take a look at how, say, the BSM structural time series model or an ARMA model is formulated as a MARSS model, i.e. a multivariate state-space model. That will give you an idea of how to rewrite your model in multivariate state-space form.
Basically, your x will look like
See how the x_2 is just a dummy that is forced to be x(t-1)?
Now the y equation
The d and a are your D and A. I wrote them in lower case to indicate that they are scalars, but they can be matrices in general (if y is multivariate, say). Your inputs are d_t and y_{t-1}. You prepare that 2 x T matrix as the input.
Be careful with your initial condition specification. It is probably best/easiest to set it at t=1 and estimate it, or use a diffuse prior.
You can fit this model with MARSS. You can also fit it with any Kalman filter function that allows you to pass inputs into the y equation (some do, some don't). KFAS::KFS() allows that via the SSMcustom() function.
In MARSS the model list will look like so
mod.list <- list(
  B = matrix(list("b", 1, 0, 0), 2, 2),  # state transition; the second row forces x_2(t) = x_1(t-1)
  U = matrix(0, 2, 1),                   # no drift in the state equation
  Q = matrix(list("q", 0, 0, 0), 2, 2),  # process error only on the first state
  Z = matrix(c("z", "c"), 1, 2),         # maps the two states to the single observation
  A = matrix(0),                         # no intercept in the y equation
  R = matrix("r"),                       # observation error variance
  D = matrix(c("d", "a"), 1, 2),         # coefficients on the two inputs
  x0 = matrix(c("x1", "x2"), 2, 1),      # initial states, estimated
  tinitx = 1,                            # initial condition defined at t = 1
  d = rbind(dt[2:TT], y[1:(TT - 1)])     # inputs: exogenous d_t and lagged y
)
dat <- y[2:TT]  # since you need y_{t-1} in the d (inputs)
fit <- MARSS(dat, model = mod.list)
It will probably complain that it wants initial conditions for x0. Anything will work; the EM algorithm isn't sensitive to that the way a BFGS or Newton algorithm is. But method="BFGS" is actually often better for this type of structural time-series model, and in that case pick a reasonable initial condition for x (reasonable = close to your data, in this case, I think).
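If you do go the BFGS route, here is a hedged sketch of what that might look like; the starting value (just the first data point, repeated for both state elements) is my own choice of a "reasonable" initial condition, not something from the original answer.
fit.bfgs <- MARSS(dat, model = mod.list, method = "BFGS",
                  inits = list(x0 = matrix(rep(dat[1], 2), 2, 1)))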

Simulating an ARMA Model Using R

My professor and I, who are new to time series analysis in R, are attempting to simulate an ARMA model. However, we are having trouble understanding where the parameters for the time series simulation come from. When simulating an ARMA model in R using the arima.sim() function, one required argument is model =, a list with components ar and ma giving the AR and MA coefficients respectively. The issue we are running into is that we do not know where these AR and MA coefficients come from. Would anyone happen to know where the coefficients arise from?
I have tried searching the internet for information regarding this issue. However, the only answer I have seen is that the coefficients come from
running an ACF and PACF. There has been no further explanation of what we are running the ACF and PACF over to generate these coefficients. Are we running ACF and PACF over previously simulated data or something else?
AR(1) Model Example Code
Ar.sm <- list(order = c(1, 0, 0), ar = 0.1)  # AR(1) with AR coefficient 0.1
Ar.lg <- list(order = c(1, 0, 0), ar = 0.1)  # as posted, identical to Ar.sm
AR1.sm <- arima.sim(model = Ar.sm, n = 50, sd = 0.1)  # sd is an argument of arima.sim itself, not of the model list
AR1.lg <- arima.sim(model = Ar.lg, n = 50, sd = 0.1)
Any help would be greatly appreciated. Additionally, if anyone has found any literature or videos explaining this more in depth, that would be fantastic. Thank you and have a nice day.
The ARMA model is actually a class of models where you get different models by using different parameters. If you are using an ARMA(p,q) model then this means you have p auto-regressive (AR) terms and q moving-average (MA) terms. The AR and MA coefficients in the model set the size of these terms. If you are merely simulating a model (as opposed to making inferences from data) then it is up to you to set the coefficients to whatever values you want to simulate with. You are correct that different coefficient values give different kinds of results that are closely connected to the ACF and PACF.
Since you are simulating, may I suggest that you just try to simulate some examples using coefficients of your choice, and vary the coefficients you put into your simulation to see the differences in what you get out. It would also be a useful exercise for you to construct the sample ACF and PACF of your simulated data, and see how these vary as you change the coefficient values going into your simulation. This will give you a better idea of the connection between the coefficients and the output of the model.
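For example, here is a minimal sketch of that exercise; the ARMA(1,1) coefficients are arbitrary choices of mine purely for illustration.
set.seed(123)
sim <- arima.sim(model = list(ar = 0.7, ma = -0.4), n = 500)  # simulate with chosen coefficients
acf(sim)   # sample autocorrelation function of the simulated series
pacf(sim)  # sample partial autocorrelation function
Changing ar and ma and re-running will change the shapes of both plots, which is exactly the connection between coefficients and output that you are asking about.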

Evaluate forecasts in terms of p-values and Pearson correlation

I am using R to do some evaluations of two different forecasting models. The basic idea of the evaluation is to compare the Pearson correlation and its corresponding p-value, computed with cor.test(). The graph below shows the final result of the correlation coefficient and its p-value.
We suggest that the model which has a lower correlation coefficient with a correspondingly lower p-value (less than 0.05) is better (or a higher correlation coefficient but with a rather high corresponding p-value).
So, in this case, overall, we would say that model 1 is better than model 2.
But the question here is: is there any other specific statistical method to quantify the comparison?
Thanks a lot!
Assuming you're working with time series data, since you called out a "forecast", I think what you're really looking for is backtesting of your forecast model. From Ruey S. Tsay's "An Introduction to Analysis of Financial Data with R", you might want to take a look at his backtest.R function.
backtest(m1, rt, orig, h, xre = NULL, fixed = NULL, inc.mean = TRUE)
# m1: a fitted time-series model object
# rt: the time series
# orig: the starting forecast origin
# h: forecast horizon
# xre: the independent variables
# fixed: parameter constraints
# inc.mean: flag for the constant term of the model
Backtesting allows you to see how well your models perform on past data, and Tsay's backtest.R provides RMSE and mean absolute error, which will give you another perspective outside of correlation. Caution: depending on the size of your data and the complexity of your model, this can be a very slow-running test.
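A hedged usage sketch, assuming you have downloaded Tsay's backtest.R script, that rt is your series, and that the forecast origin of 200 is an arbitrary choice of mine:
source("backtest.R")                       # Tsay's function, sourced from his script
m1 <- arima(rt, order = c(1, 0, 1))        # some fitted ARMA model for the series
bt <- backtest(m1, rt, orig = 200, h = 1)  # rolling 1-step forecasts; prints RMSE and MAE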
To compare models you'll normally look at RMSE, which is essentially the standard deviation of your model's errors. The RMSEs of the two models are directly comparable, and smaller is better.
An even better alternative is to set up training, testing, and validation sets before you build your models. If you train two models on the same training/test data, you can compare them against your validation set (which has never been seen by your models) to get a more accurate measurement of each model's performance, as in the sketch below.
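A minimal sketch of that train/holdout idea with the forecast package; the series y, its length, the 100/holdout split, and the two candidate models are all hypothetical choices of mine.
library(forecast)
y_train <- window(y, end = time(y)[100])    # first 100 observations for fitting
y_test  <- window(y, start = time(y)[101])  # remaining observations held out
m1 <- auto.arima(y_train)                   # candidate model 1
m2 <- ets(y_train)                          # candidate model 2 (exponential smoothing)
accuracy(forecast(m1, h = length(y_test)), y_test)  # RMSE, MAE, etc. on the holdout
accuracy(forecast(m2, h = length(y_test)), y_test)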
One final alternative: if you have a "cost" associated with an inaccurate forecast, apply those costs to your predictions and add them up. If one model performs poorly on a more expensive segment of data, you may want to avoid using it.
As a side note, your interpretation of the p-value as "lower is better" isn't quite right.
P-values address only one question: how likely are your data, assuming the null hypothesis is true? They do not measure support for the alternative hypothesis.
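To make that concrete, here is a small sketch of what cor.test() actually reports; the data are simulated purely for illustration.
set.seed(1)
actual <- rnorm(100)                   # "observed" values
pred   <- actual + rnorm(100, sd = 2)  # a deliberately noisy "forecast"
ct <- cor.test(actual, pred)
ct$estimate  # the Pearson correlation between forecast and actuals
ct$p.value   # probability of data at least this extreme if the true correlation were zero
A tiny p-value only says the correlation is unlikely to be exactly zero; it says nothing about whether the forecast errors are small.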

Pseudo R squared for cumulative link function

I have an ordinal dependent variable and am trying to use a number of independent variables to predict it. I use R. The function I use is clm in the ordinal package, to fit a cumulative link model with a probit link, to be precise:
I tried the function pR2 in the pscl package to get the pseudo R-squared, with no success.
How do I get pseudo R-squareds with the clm function?
Thanks so much for your help.
There are a variety of pseudo-R^2 measures. I don't like to use any of them because I do not see the results as having a meaning in the real world. They do not estimate effect sizes of any sort and they are not particularly good for statistical inference. Furthermore, in situations like this with multiple observations per entity, I think it is debatable which value for "n" (the number of subjects) or which degrees of freedom is appropriate. Some people use McFadden's R^2, which would be relatively easy to calculate, since clm returns a list with one of its elements named "logLik". You just need to know that the log-likelihood is only a multiplicative constant (-2) away from the deviance. If one had the model in the first example:
library(ordinal)
data(wine)
fm1 <- clm(rating ~ temp * contact, data = wine)  # full model
fm0 <- clm(rating ~ 1, data = wine)               # intercept-only (null) model
( McF.pR2 <- 1 - fm1$logLik/fm0$logLik )          # McFadden's pseudo-R^2
[1] 0.1668244
I had seen this question on CrossValidated and was hoping to see the more statistically sophisticated participants over there take this one on, but they saw it as a programming question and dumped it over here. Perhaps their opinion of R^2 as a worthwhile measure is as low as mine?
I recommend using the nagelkerke function from the rcompanion package to get pseudo R-squared values.
When your predictor or outcome variables are categorical or ordinal, the R-squared will typically be lower than with truly numeric data. R-squared is merely a weak indicator of a model's fit, and you shouldn't choose a model based on it alone.
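A minimal sketch, reusing the wine models fitted in the earlier answer and assuming the rcompanion package is installed:
library(rcompanion)
# Reports pseudo R-squared values (McFadden, Cox and Snell, Nagelkerke),
# with the intercept-only fit supplied as the null model
nagelkerke(fm1, null = fm0)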

From Auto.arima to forecast in R

I don't quite understand the syntax of how forecast() applies external regressors in the forecast package in R.
My fit looks like this:
fit <- auto.arima(Y,xreg=factors)
where Y is a timeSeries object 100 x 1 and factors is a timeSeries object 100 x 5.
When I go to forecast, I apply...
forecast(fit, h=horizon)
And I get an error:
Error in forecast.Arima(fit, h = horizon) : No regressors provided
Does it want me to add back the x regressors from the fit? I thought these were included in the fit object as fit$xreg. Does that mean it's asking for future values of the regressors, or that I should repeat the same values I used in the fitting set? The documentation doesn't cover the meaning of xreg in the forecast step.
I believe all this means I should use
forecast(fit, h=horizon,xreg=factors)
or
forecast(fit, h=horizon,xreg=fit$xreg)
Both give the same results. But I'm not sure whether the forecast step is interpreting the factors as future values or, appropriately, as previous ones. So:
Is this doing a forecast out of purely past values, as I expect?
Why do I have to specify the xreg values twice? It doesn't run if I exclude them, so it doesn't behave like an option.
Correct me if I am wrong, but I think you may not completely understand how the ARIMA model with regressors works.
When you forecast with a simple ARIMA model (without regressors), it simply uses past values of your time series to predict future values. In such a model, you could simply specify your horizon, and it would give you a forecast until that horizon.
When you use regressors to build an ARIMA model, you need to include future values of the regressors to forecast. For example, if you used temperature as a regressor, and you were predicting disease incidence, then you would need future values of temperature to predict disease incidence.
In fact, the documentation does talk about xreg specifically. Look up ?forecast.Arima and look at both the arguments h and xreg. You will see that if xreg is used, then h is ignored. Why? Because if your model uses xreg, then it needs those regressor values for forecasting.
So, in your code, h was simply ignored when you included xreg. Since you just supplied the same values you used to fit the model, it gave you predictions for that same set of regressors as if they were future values.
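In other words, here is a hedged sketch of how the call usually looks; future_factors is a hypothetical matrix of future regressor values (one row per forecast step) that you have to supply yourself.
library(forecast)
fit <- auto.arima(Y, xreg = factors)         # factors: the historical regressors
fc  <- forecast(fit, xreg = future_factors)  # horizon is taken from nrow(future_factors)
plot(fc)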
Related:
https://stats.stackexchange.com/questions/110589/arima-with-xreg-rebuilding-the-fitted-values-by-hand
I read that arima in R is borked; see Issues 3 and 4 here:
https://www.stat.pitt.edu/stoffer/tsa4/Rissues.htm
There, xreg was suggested as a way to derive the proper intercept.
I'm using Real Statistics for Excel to figure out what the actual constant is. I had a professor tell me you need to have a constant.
The two fits below derive the same forecasts. So it appears you can use xreg to get some descriptive information, but you would have to follow the Stack Exchange link above to manually derive the fitted values from them.
f <- auto.arima(lacondos[, 1])
f$coef
# Refit with the same (p, d, q) order, no constant, and a linear trend passed
# through xreg so the constant/drift appears as the xreg coefficient
g <- Arima(lacondos[, 1], order = c(t(matrix(arimaorder(f)))),
           include.constant = FALSE, xreg = 1:length(lacondos[, 1]))
g$coef
