I have half-hourly data for 5 years measuring electricity load.
I checked for stationarity with the ACF, which suggests the series is non-stationary. But when I used adf.test to check stationarity, it showed the opposite result:
adf.test(tsr1$LOAD.MW.,alternative="stationary")
# Augmented Dickey-Fuller Test
# data: tsr1$LOAD.MW.
# Dickey-Fuller = -9.7371, Lag order = 11, p-value = 0.01
# alternative hypothesis: stationary
Warning message:
In adf.test(tsr1$LOAD.MW., alternative = "stationary") :
p-value smaller than printed p-value
What should I consider? I have a feeling that the series is non-stationary. If it is, how do I make it stationary using R? Also, I tried the command
decompose(tsr), which showed an error:
ERROR : time series has no or less than 2 periods
What is the issue?
The first step should be to visually examine your time series to see whether it is stationary, and thereafter use the ADF test to "formally" test for stationarity. This is more or less the standard procedure, at least in the finance literature. (You could of course use another test such as the KPSS or PP.)
When plotting the ACF or PACF you mainly examine the AR and MA representation of your series.
As your series seems to be stationary according to the ADF test, you don't have to "make it stationary". However, keep in mind that these tests can give you the "wrong" answer when dealing with small samples, seasonality or structural breaks.
If you don't want to rely solely on the ADF test, you could also consider the KPSS test, which has the opposite null hypothesis (stationarity).
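For illustration, here is a minimal sketch of running both tests side by side, and of the likely cause of your decompose() error: decompose() needs a ts object with a seasonal frequency of at least 2, so for half-hourly data you have to set the period explicitly (48 observations per day is an assumption about your sampling):
library(tseries)
# Unit-root test (H0: non-stationary) vs. KPSS (H0: stationary)
adf.test(tsr1$LOAD.MW., alternative = "stationary")
kpss.test(tsr1$LOAD.MW.)
# decompose() throws "time series has no or less than 2 periods" when the
# ts object has no seasonal frequency. For half-hourly data with daily
# seasonality (48 observations per day -- an assumption), build it as:
tsr <- ts(tsr1$LOAD.MW., frequency = 48)
plot(decompose(tsr))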
Related
I've downloaded this dataset, and when I plot it, it is clearly non-stationary:
df <- read.csv('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
plot(df,type="l")
But when I apply the Augmented Dickey-Fuller test I get a p-value of 0.01, implying that there is evidence to reject the null that the series is non-stationary. I am puzzled as to why this is happening. Is the confidence level basically too high, or is something else going on?
adf.test(df[,2])
#> Augmented Dickey-Fuller Test
#>
#> data: df[, 2]
#> Dickey-Fuller = -9.9714, Lag order = 5, p-value = 0.01
#> alternative hypothesis: stationary
Thanks, Nick Wray
Keep in mind that adf.test includes a constant and a linear trend in its test regression. Once you detrend your series, it does look stationary. Try the below:
# Fit a linear trend and subtract it from the series
df$index <- seq_len(nrow(df))
fit <- lm(milk_prod_per_cow_kg ~ index, data = df)
slope <- coef(fit)[2]
df$detrended <- df$milk_prod_per_cow_kg - df$index * slope
plot(df$detrended, type = "l")
As you can see, the series always returns to something near its initial value after a few observations. The ADF test checks whether large jumps in a series are persistent, i.e. affect all subsequent values, after the series has been detrended. In this case the jumps are clearly temporary.
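If you want to double-check this numerically, a short sketch (it reuses the detrended column created above):
library(tseries)
# The detrended series should still reject the unit-root null
adf.test(df$detrended)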
I don't have enough reputation to post images but here is an imgur link to a series that is non-stationary even after you de-trend it: https://i.imgur.com/gc5FEtX.png
Based on freely available data from FRED.
I am looking at count data over a period of 31 years across 274 connected plots. I suspect these data have spatial/temporal autocorrelation, and I am looking for a well-organized method for testing this.
So far I used DHARMa to check my model residuals which returned the following message:
testing for spatial autocorrelation requires unique x,y values - if
you have several observations per location, either use the
recalculateResiduals function to aggregate residuals per location, or
extract the residuals from the fitted object, and plot / test each of
them independently for spatially repeated subgroups (a typical
scenario would repeated spatial observation, in which case one could
plot / test each time step separately for temporal autocorrelation).
Note that the latter must be done by hand, outside
testSpatialAutocorrelation
As aggregating is not an option, I've decided to test each of them independently. The following output is the result of testing a single year for spatial autocorrelation:
> Moran.I(spat$Residuals, dists.inv)
$observed
[1] -0.007104585
$expected
[1] -0.003663004
$sd
[1] 0.004742504
$p.value
[1] 0.4680297
How can I interpret this output? And what would be a good method for testing every single year in my dataset? I thought about writing a loop, but I worry that would make things very hard to read.
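One compact way to run the per-year tests, sketched under the assumption that residuals, coordinates and year sit together in a data frame dat with columns x, y, year and Residuals (these names are hypothetical):
library(ape)
moran_by_year <- lapply(split(dat, dat$year), function(d) {
  # inverse-distance weights; locations must be unique within a year
  w <- 1 / as.matrix(dist(cbind(d$x, d$y)))
  diag(w) <- 0
  Moran.I(d$Residuals, w)
})
# p-value per year; flag years with evidence of spatial autocorrelation
p_moran <- sapply(moran_by_year, `[[`, "p.value")
names(p_moran)[p_moran < 0.05]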
The same thing applies to testing for temporal autocorrelation. This is one of the 274 plots:
> lmtest::dwtest(temp$Residuals ~ 1, order.by = temp$year)
Durbin-Watson test
data: temp$Residuals ~ 1
DW = 2.1637, p-value = 0.6775
alternative hypothesis: true autocorrelation is greater than 0
Is there a smart method of running multiple tests to quickly identify the affected years/plots?
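The same split-and-apply pattern would work per plot (again a sketch, using the hypothetical dat layout from above):
library(lmtest)
p_dw <- sapply(split(dat, dat$plot), function(d) {
  dwtest(d$Residuals ~ 1, order.by = d$year)$p.value
})
names(p_dw)[p_dw < 0.05]  # plots with evidence of temporal autocorrelation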
Also, how would I include the strength of the spatial/temporal autocorrelation separately per year or plot in the final model?
I need to check the second-order stationarity of a time series of length 7320 (I have 1800 such time series). These time series are displacements recorded at 1800 sites on a mountain.
I tried the Priestley-Subba Rao test in R, stationarity(). For one time series out of the 1800, I got these values:
p-value for T : 2.109424e-15
p-value for I+R : 9.447661e-06
p-value for T+I+R : 1.4099e-10
Could you please tell me how to interpret this? All I know is that if the p-value for T is 0, the null hypothesis of the time series being stationary is rejected. Also, for the second time series out of the 1800, I got these values:
p-value for T : 0
p-value for I+R : 1.458063e-09
p-value for T+I+R : 0
Could you tell me how to differentiate between the two? Both time series are from the same dataset. Also, is it possible for one time series to be stationary and another not, given that they are from the same site and recorded at exactly the same time?
I also tried the wavelet spectrum test in R, the hwtos2() function. But this function requires a time series whose length is a power of 2. Is there a better stationarity test that does not restrict the length of the time series?
The book "Nonstationarities in Hydrologic and Environmental Time Series" (Springer Ed.), at pag. 119, provides a good explanation for interpreting those p-values within the Priestley-Subba Rao test.
In general, you may also take a look at:
https://www.stat.tamu.edu/~suhasini/test_papers/priestley_subbarao70.pdf
Regarding other stationarity tests, you may have a look at the weakly.stationary() function within the "analytics" package, and at the "costat" package, documented at:
https://www.jstatsoft.org/article/view/v055i01
where there is a suggestion for handling time series of non-dyadic length (i.e., length not equal to 2^J for some natural number J). At page 5:
"It should be made clear that this is not a limitation of wavelets per se, but of the computationally efficient algorithms used to compute the intended quantities. Data sets of other lengths can be handled by zero-padding or truncation"
Some interesting info at:
https://arxiv.org/pdf/1603.06415.pdf
I generated my own fictional Sales Data in order to execute a time series analysis.
It is supposed to represent a growing company, and therefore I worked with a trend. However, reading through some tutorials, I often came across the claim that non-stationary time series should not be forecast with the auto.arima function.
But I receive results that make sense, and if I difference the data (which I did as well), the output doesn't make much sense.
So here is my question: can I use the auto.arima function with my data, which obviously has a trend?
Best regards and thanks in advance,
Francisco
library(forecast)

set.seed(1)                           # for reproducibility (added; any seed works)
eps <- rnorm(100, 30, 20)             # noise
trend <- 3 * seq_len(100)             # linear upward trend
Sales <- trend + eps
timeframe <- seq(as.Date("2008/9/1"), by = "month", length.out = 100)
Data <- data.frame(Sales, timeframe)
plot(Data$timeframe, Data$Sales)
sales_ts <- ts(Data$Sales)            # plain ts; avoids shadowing base::ts
plot(sales_ts, type = "o", col = "black")
md <- rwf(sales_ts, h = 12, drift = TRUE, level = c(80, 95))
auto.arima(sales_ts)
Using the forecast function allows us to plot the expected sales for the next year: plot(forecast(auto.arima(sales_ts), h = 12))
Using the forecast function with our automated ARIMA can help us plan for the next quarter:
forecast(auto.arima(sales_ts), h = 4)
plot(forecast(auto.arima(sales_ts)))
Another way would be to use the autoplot function:
fc <- forecast(sales_ts)
autoplot(fc)
The next step is to analyze our time series. I run the ADF test, which has the null hypothesis that the data are non-stationary.
So with the default 5% threshold, the p-value would have to be greater than 0.05 for the series to count as non-stationary.
library(tseries)
adf <- adf.test(sales_ts)
adf
The output suggests that the data is non-stationary. Next, the ACF:
Acf(sales_ts)
The autocorrelation decreases almost steadily, which also points to non-stationary data. Running kpss.test should confirm that our data is non-stationary, since its null hypothesis is the opposite of the ADF test's.
So we expect a p-value smaller than 0.05:
kpss <- kpss.test(sales_ts)
kpss
We receive a p-value of 0.01, further evidence that the data has a trend.
ndiffs(sales_ts)              # suggested order of differencing
diff.data <- diff(sales_ts)
auto.arima(diff.data)
plot(forecast(diff.data))
To answer your question - yes, you can use the auto.arima() function in the forecast package on non-stationary data.
If you look at the help file for auto.arima() (by typing ?auto.arima), you will see that you can choose to specify the d parameter, the order of differencing: first order means the data are differenced once, second order means they are differenced twice, and so on. If you leave this parameter unspecified, auto.arima() determines the appropriate order of differencing using the KPSS test. There are other unit-root tests, such as the Augmented Dickey-Fuller, which you can select in auto.arima() by setting test = "adf". It really depends on your preference.
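For instance, a sketch reusing the sales_ts series from the question:
library(forecast)
fit_kpss <- auto.arima(sales_ts)                # d chosen via the default KPSS test
fit_adf  <- auto.arima(sales_ts, test = "adf")  # d chosen via the ADF test instead
fit_d1   <- auto.arima(sales_ts, d = 1)         # or fix the order of differencing yourself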
You can refer to page 11 and subsequent pages for more information on the auto.arima function here:
https://cran.r-project.org/web/packages/forecast/forecast.pdf
In a time series analysis, I tested 30 time series of 62 observations each for a unit root with the ur.df test from the R package urca (Bernhard Pfaff), with the lag length decided by the AIC criterion. Without exception, a lag length of 1 was chosen. This seems highly implausible. Testing with a CADF test from the R package CADFtest (which performs an ordinary ADF test if x ~ 1 is chosen), with the AIC criterion for lag length selection, the number of lags varies between 0 and 7. Can someone explain the tendency towards a uniform and short lag length in urca?
Furthermore, even when the lag lengths in ur.df and CADFtest are the same, the test statistics are not. For instance, for the time series lcon (natural logarithm of consumption per head) 1950-2010 in the Netherlands, the test statistics (constant and trend) are -1.5378 (1 lag) with ur.df and -2.4331 (1 lag) with CADFtest. adf.test from the R package tseries computes a test statistic equal to that of ur.df (-1.5378, 1 lag). So rejection of a unit root depends on the package, which is not an optimal situation.
There seems to be a severe problem due to the sensitivity of the results to the sample length: a few observations can change the result dramatically (i.e. comparing lag lengths p = 3 and p = 4, the series starts at y_t = 3 for the former and at y_t = 4 for the latter). The time series should therefore start at a common date (as is also recommended for IC-based selection of the lag length in VAR models in general). So if max.lag.y = 6, the provided time series needs to be truncated accordingly (i.e. y[-c(1:5)]). Unfortunately this is not the default. Hope this helps. I'm not sure whether this is the only issue with CADFtest, though...
(see also https://stat.ethz.ch/pipermail/r-help/2011-November/295519.html )
Best,
Hannes
I had the same problem. You need to specify the maximum number of lags, otherwise the default will be 1.
for example
ur.df(variable, type = "drift", lags = 30, selectlags = "AIC")
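To make that concrete, a short sketch (variable is the asker's placeholder; with selectlags = "AIC" the lags argument is the maximum searched over):
library(urca)
test <- ur.df(variable, type = "drift", lags = 10, selectlags = "AIC")
summary(test)   # regression output, test statistic and critical values
test@teststat   # the test statistic(s), stored as a slot of the S4 object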