I need to check second-order stationarity of a time series of length 7320 (I have 1800 such time series). These time series are displacements recorded at 1800 sites on a mountain.
I tried the Priestley-Subba Rao test in R via stationarity(). For one time series out of the 1800, I got these values:
p-value for T : 2.109424e-15
p-value for I+R : 9.447661e-06
p-value for T+I+R : 1.4099e-10
Could you please tell me how to interpret these? All I know is that if the p-value for T is 0, the null hypothesis that the time series is stationary is rejected. Also, for a second time series out of the 1800, I got these values:
p-value for T : 0
p-value for I+R : 1.458063e-09
p-value for T+I+R : 0
Could you tell me how to differentiate between the two? Both time series are from the same dataset. Also, is it possible that one time series is stationary and another is not, given that they come from the same site and were recorded at exactly the same time?
I also tried the wavelet spectrum test in R with the hwtos2() function, but this function requires the time series length to be a power of 2. Is there another, better test of stationarity that does not restrict the length of the time series?
The book "Nonstationarities in Hydrologic and Environmental Time Series" (Springer Ed.), at pag. 119, provides a good explanation for interpreting those p-values within the Priestley-Subba Rao test.
In general, you may also take a look at:
https://www.stat.tamu.edu/~suhasini/test_papers/priestley_subbarao70.pdf
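For reference, here is a minimal sketch of running the test itself. I am assuming stationarity() is the one from the fractal package (the question does not say which package was used), and x stands for one of the displacement series:
library(fractal)

x <- your_displacement_series   # illustrative name: one length-7320 series
psr <- stationarity(x)          # Priestley-Subba Rao test
print(psr)                      # shows the p-values for T, I+R and T+I+R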
Regarding other stationarity tests, you may have a look at the "weakly.stationary()"
function in the "analytics" package, and at the "costat" package, described at:
https://www.jstatsoft.org/article/view/v055i01
where there is a suggestion for handling time series of non-dyadic length (i.e., length not equal to 2^J
for some natural number J). On page 5:
"It should be made clear that this is not a limitation of wavelets per se, but of the computationally efficient algorithms used to compute the intended quantities. Data sets of other lengths can be handled by zero-padding or truncation"
Some interesting info at:
https://arxiv.org/pdf/1603.06415.pdf
I generated my own fictional Sales Data in order to execute a time series analysis.
It is supposed to represent a growing company, so I built in a trend. However, in several tutorials I read that non-stationary time series should not be forecast with the auto.arima function.
But I get results that make sense, and if I difference the data (which I also tried) the output doesn't make much sense.
So here is my question: can I use the auto.arima function with my data, which obviously has a trend?
Best regards and thanks in advance,
Francisco
# Fictional sales data: linear upward trend plus Gaussian noise
eps <- rnorm(100, 30, 20)
trend <- 3 * seq(1, 100, 1)
Sales <- trend + eps
timeframe <- seq(as.Date("2008/9/1"), by = "month", length.out = 100)
Data <- data.frame(Sales, timeframe)
plot(Data$timeframe, Data$Sales)
# Keep Sales as a univariate time series (avoids masking the base ts() function)
library(forecast)
sales_ts <- ts(Data$Sales)
plot(sales_ts, type = "o", col = "black")
md <- rwf(sales_ts, h = 12, drift = TRUE, level = c(80, 95))
auto.arima(sales_ts)
Using the forecast function allows us to plot the expected sales for the next year: plot(forecast(auto.arima(sales_ts), h = 12))
Using the forecast function with our automated ARIMA model can also help us plan for the next quarter:
forecast(auto.arima(sales_ts), h = 4)
plot(forecast(auto.arima(sales_ts)))
Another way would be to use the autoplot function:
fc <- forecast(sales_ts)
autoplot(fc)
The next step is to analyze our time series. I run the ADF test, which has the null hypothesis that the data is non-stationary.
So with the default 5% threshold, the p-value would have to be greater than 0.05 for us to fail to reject the null and treat the data as non-stationary.
library(tseries)
adf <- adf.test(sales_ts)
adf
The output suggests that the data is non-stationary:
Next, the ACF:
Acf(sales_ts)
The autocorrelation decreases almost steadily, which also points to non-stationary data. Running kpss.test should confirm that our data is non-stationary, since its null hypothesis is the opposite of the ADF test's.
So we expect a p-value smaller than 0.05:
kpss <- kpss.test(sales_ts)
kpss
We receive a p-value of 0.01, further evidence that the data has a trend.
ndiffs(sales_ts)             # suggested order of differencing
diff.data <- diff(sales_ts)
auto.arima(diff.data)
plot(forecast(diff.data))
To answer your question - yes, you can use the auto.arima() function in the forecast package on non-stationary data.
If you look at the help file for auto.arima() (by typing ?auto.arima) you will see that you can choose to specify the "d" parameter, which is the order of differencing: first order means you difference the data once, second order means you difference the data twice, and so on. You can also choose not to specify this parameter, in which case auto.arima() will determine the appropriate order of differencing using the KPSS test. There are other unit root tests, such as the Augmented Dickey-Fuller test, which you can use in auto.arima() by setting test = "adf". It really depends on your preference.
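For illustration, using the sales_ts series built above (the particular settings are only examples):
library(forecast)

fit_kpss <- auto.arima(sales_ts)                # d chosen via the default KPSS test
fit_adf  <- auto.arima(sales_ts, test = "adf")  # d chosen via the ADF test instead
fit_d1   <- auto.arima(sales_ts, d = 1)         # force first-order differencing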
You can refer to page 11 and subsequent pages for more information on the auto.arima function here:
https://cran.r-project.org/web/packages/forecast/forecast.pdf
I have been experimenting with an R package called rnn.
The code is here:
https://github.com/bquast/rnn
It has a very nice example for financial time series prediction.
I have read the code, and I understand it uses the sequence of the time series to predict the instrument's value for the next day in advance.
The following is an example run with 10 hidden nodes and 200 epochs:
[Figure: RNN financial time series prediction]
What I would expect as result is that the algorithm succeed, at least in part, to predict in advance the value of the instrument.
From what I can see, it apparently only approximates the value of the time series at the current day, without giving any prediction for the next day.
Is my expectation wrong?
This code is very simple, how would you improve it?
# Note: 1:input$training_amount + input$prediction_gap parses as
# (1:input$training_amount) + input$prediction_gap, i.e. the indices are shifted forward by prediction_gap
y <- X[, 1:input$training_amount + input$prediction_gap, as.numeric(input$target)]
matrix(y, ncol = input$training_amount)
y.train moves all the data forward by a day, so that is what is being trained on: next-day data for the currency pair you care about. With ncol = training_amount, there are too many columns (they now run up to training_amount + prediction_gap), so the first data points fall off; hence all the data gets moved forward by the prediction_gap.
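A toy sketch of that shifting idea outside the app (the names are illustrative, not the package's actual code):
set.seed(1)
prices <- cumsum(rnorm(200))                      # toy instrument series
gap <- 1                                          # predict one day ahead

x_today    <- prices[1:(length(prices) - gap)]    # inputs: today's values
y_tomorrow <- prices[(1 + gap):length(prices)]    # targets: next-day values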
I have half hourly data for 5 years measuring the electricity load.
I checked for stationarity with the ACF, which suggests the series is non-stationary. But when I used adf.test to check stationarity, it showed the opposite result:
adf.test(tsr1$LOAD.MW.,alternative="stationary")
# Augmented Dickey-Fuller Test
# data: tsr1$LOAD.MW.
# Dickey-Fuller = -9.7371, Lag order = 11, p-value = 0.01
# alternative hypothesis: stationary
Warning message:
In adf.test(tsr1$LOAD.MW., alternative = "stationary") :
p-value smaller than printed p-value
What should I go by? I have a feeling that the series is non-stationary. If it is, how do I make it stationary in R? I also tried the command
decompose(tsr), which gave an error:
ERROR : time series has no or less than 2 periods
What is the issue?
The first step should be to visually examine your time series to see if it is stationary, and then use the ADF test to "formally" test for stationarity. This is more or less the standard procedure, at least in the finance literature. (You could of course use another test such as the KPSS or PP.)
When plotting the ACF or PACF you are mainly examining the AR and MA structure of your series.
As your series seems to be stationary according to the ADF test, you don't have to "make it stationary". However, keep in mind that these tests might give you the "wrong" answer when dealing with small samples, seasonality or structural breaks.
If you don't want to rely solely on the ADF test, you could also consider the KPSS test, which has the opposite null hypothesis.
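For example, on your load series (kpss.test() is in the same tseries package as adf.test()):
library(tseries)

# ADF: null hypothesis = unit root (non-stationary); a small p-value rejects it
adf.test(tsr1$LOAD.MW., alternative = "stationary")

# KPSS: null hypothesis = stationarity; a large p-value here would agree with the ADF result
kpss.test(tsr1$LOAD.MW.)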
In a time series analysis, I tested 30 time series with 62 observations for a unit root with the ur.df test from the R package urca (Bernhard Pfaff), with the lag length decided by the AIC criterion. Without exception, a lag length of 1 was chosen. This seems highly implausible. Testing with a CADF test from the R package CADFtest (which performs an ordinary ADF test if x ~ 1 is chosen), again using the AIC criterion for lag length selection, the number of lags varies between 0 and 7. Can someone explain the tendency towards a uniform and short lag length in urca?
Furthermore, even when the lag lengths in ur.df and CADFtest are the same, the test statistics are not. For instance, for the time series lcon (natural logarithm of consumption per head) 1950-2010 in the Netherlands, the test statistics (constant and trend) are -1.5378 (1) with ur.df and -2.4331 (1) with CADFtest. adf.test from the R package tseries computes a test statistic equal to ur.df's (-1.5378, 1 lag). So whether a unit root is rejected depends on the package, which is not an optimal situation.
There seems to be a severe problem due to the sensitivity of the results to sample length. A few observations can change the result dramatically (e.g., comparing lag lengths p = 3 and p = 4, the series starts at y_t = 3 for the former and at y_t = 4 for the latter). Therefore the time series should start at a common date (as is also recommended for IC-based selection of lag length in VAR models in general). So if max.lag.y = 6, the provided time series needs to be truncated accordingly (i.e., y[-c(1:5)]). Unfortunately this is not the default. Hope this helps. Not sure if this is the only issue with CADFtest, though.
(see also https://stat.ethz.ch/pipermail/r-help/2011-November/295519.html )
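A literal sketch of that truncation step (y stands for the series under test; the settings are only illustrative):
library(CADFtest)

max_lag <- 6
y_common <- y[-c(1:(max_lag - 1))]   # i.e. y[-c(1:5)] when max.lag.y = 6
CADFtest(y_common ~ 1, type = "trend", max.lag.y = max_lag, criterion = "AIC")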
Best,
Hannes
I had the same problem. You need to specify the maximum number of lags, otherwise the default will be 1.
For example:
library(urca)
ur.df(variable, type = "drift", lags = 30, selectlags = "AIC")
I have a file containing 2,500 random numbers. Is it possible to rearrange these saved numbers in such a way that a specific autocorrelation is created? Let's say an autocorrelation at lag 1 of 0.2, an autocorrelation at lag 2 of 0.4, etc.
Any help is greatly appreciated!
To be more specific:
The time series of a daily return in percent of an asset has the following characteristics that I am trying to recreate:
Leptokurtic, symmetric distribution, let's say centered at a daily return of zero
No significant autocorrelations (because the sign of a daily return is not predictable)
Significant autocorrelations if the time series is squared
The aim is to produce a random time series which satisfies all these three characteristics. The only two inputs should be the leptokurtic distribution (this I have already created) and the specific autocorrelation of the squared resulting time series (e.g. the final squared time series should have an autocorrelation at lag 1 of 0.2).
I only know how to produce random numbers from my own mixed distribution. Naturally, if I square this resulting time series, there is no autocorrelation. I would like to find a way that takes this into account.
Generally the most straightforward way to create autocorrelated data is to generate the data so that it's autocorrelated. For example, you could create an autocorrelated path by always using the value at time p-1 as the mean for the random draw at time period p.
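For instance, a minimal sketch of that idea:
set.seed(1)
n <- 2500
x <- numeric(n)                              # x[1] starts at 0
for (p in 2:n) {
  x[p] <- rnorm(1, mean = x[p - 1], sd = 1)  # previous value is the mean of the next draw
}
acf(x)                                       # strongly autocorrelated by construction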
Rearranging is not only hard, but sort of odd conceptually. What are you really trying to do in the end? Giving some context might allow better answers.
There are functions for simulating correlated data: arima.sim() from the stats package and simulate.Arima() from the forecast package.
simulate.Arima() has the advantages that (1) it can simulate seasonal ARIMA models (sometimes called "SARIMA") and (2) it can simulate a continuation of an existing time series to which you have already fitted an ARIMA model. To use simulate.Arima(), you need to already have an Arima object.
UPDATE:
Type ?arima.sim, then scroll down to the examples.
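For instance, a small sketch (the AR coefficient 0.2 is only an illustration; for an AR(1) process the lag-1 autocorrelation is roughly that coefficient):
set.seed(1)
sim <- arima.sim(model = list(ar = 0.2), n = 2500)   # AR(1) with coefficient 0.2
acf(sim)                                             # lag-1 autocorrelation should be near 0.2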
Alternatively:
install.packages("forecast")
library(forecast)
fit <- auto.arima(USAccDeaths)
plot(USAccDeaths,xlim=c(1973,1982))
lines(simulate(fit, 36),col="red")