ARIMA forecasts with R - how to update data

I've been trying to develop an ARIMA model to forecast wind speed values. I have a four-year data series (from January 2008 to December 2011) of 10-minute data, which means there are 144 observations per day. I'm using the first three years (observations 1 to 157157) to build the model and the last year to validate it.
The thing is, I want to update the forecast. In other words, when one forecast is finished, more data is added to the dataset and another forecast is made. But the result looks as if I had just lagged the original series. Here's the code:
#1 - Load data:
z=read.csv('D:/Faculdade/Mestrado/Dissertação/velocidade/tudo_10m.csv', header=TRUE, dec=".")
vel=ts(z, start=c(2008,1), frequency=52000)  # note: 10-minute data gives 144 obs/day, i.e. 52,560 per 365-day year
# 5 - ARIMA Forecasts:
library(forecast)
n=157157
while(n<=157200){
  amostra <- vel[1:n]  # Only data until 2010 ("amostra" = sample); [ ] indexing drops the ts attributes
  pred <- auto.arima(amostra, seasonal=TRUE,
                     ic="aicc", stepwise=FALSE, trace=TRUE,
                     approximation=TRUE, xreg=NULL,
                     test="adf",
                     allowdrift=TRUE, lambda=NULL, parallel=TRUE, num.cores=4)
  velpred <- arima(pred)  # Is this step really necessary? (auto.arima already returns a fitted model; arima() expects a series)
  velpred
  predvel <- forecast(pred, h=12)  # h means the forecast steps ahead
  predvel
  plot(amostra, xlim=c(157158, n), ylim=c(0,20), col="blue", main="Forecasts and Observations", type="l", lty=1)
  # fitted() returns the in-sample one-step-ahead fits; the 12-step forecasts are in predvel$mean
  lines(fitted(predvel), xlim=c(157158, n), ylim=c(0,20), col="red", lty=2)
  n=n+12
}
But when I plot the results (I couldn't post the picture here), the plot shows the observed series and the forecast, and the forecast looks just like the observed series lagged by one step.
Can anyone help me examine my code and/or give tips on how to get the best out of my model? Thanks! (I hope my English is understandable...)
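A minimal sketch of the updating loop described above, assuming base R (`stats::arima` and `predict`) in place of `forecast::auto.arima`, a simulated AR(1) series standing in for the wind data, and an assumed initial training length of 500. The point it illustrates is that the rolling h-step forecasts are collected from `predict(...)$pred` at each origin, rather than from in-sample fitted values, which track the observed series one step behind.

```r
set.seed(123)
# Hypothetical stand-in for the wind-speed series: AR(1) noise around a mean of 8
vel <- 8 + arima.sim(model = list(ar = 0.8), n = 600)

n_train <- 500   # initial training length (assumed)
h <- 12          # steps ahead per forecast, as in the question
origins <- seq(n_train, length(vel) - h, by = h)

fc <- rep(NA_real_, length(vel))   # out-of-sample forecasts, aligned with vel
for (n in origins) {
  amostra <- vel[1:n]                                     # all data up to the origin
  fit <- arima(amostra, order = c(1, 0, 0))               # refit on the updated sample
  fc[(n + 1):(n + h)] <- predict(fit, n.ahead = h)$pred   # forecasts for the next h steps
}

# Observed series (blue) vs. rolling forecasts (red) over the test period
idx <- (n_train + 1):length(vel)
plot(idx, vel[idx], type = "l", col = "blue", xlab = "Observation", ylab = "Speed")
lines(idx, fc[idx], col = "red", lty = 2)
```

Each forecast block starts where the training data ends, so the red segments are genuine out-of-sample predictions; a model with little skill shows up as red segments that flatten toward the mean, not as a copy of the blue line.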

Related

Forecasting with tsCV and auto.arima: predicted values in R

I want to do an out-of-sample forecast experiment using the auto.arima function. Furthermore, time series cross-validation with a fixed rolling window size should be applied. The goal is to obtain forecasts for 1, 3 and 6 steps ahead.
library(forecast)
library(tseries)
#the time series
y1 = 2+ 0.15*(1:20) + rnorm(20,2)
y2 = y1[20]+ 0.3*(1:30) + rnorm(30,2)
y = as.ts(c(y1,y2))
#10obs in test set, 40obs in training set
ntest <- 10
ntrain <- length(y)-ntest
#auto.arima with some preferred specifications
farima <- function(x, h) {
  forecast(auto.arima(x, ic = "aic", test = c("adf"), seasonal = FALSE,
                      stepwise = FALSE, approximation = FALSE,
                      method = c("ML")), h = h)
}
# executing the following function gives the forecast errors as a matrix: one row per forecast origin, one column per horizon (1 to 6)
e <- tsCV(y, farima, h = 6, window = 40)
The predicted values are given by subtracting the error from the true value:
#predicted values
fc1 <- c(NA,y[2:50]-e[1:49,1])
fc1 <- fc1[41:50]
fc3 <- c(NA,y[2:50]-e[1:49,3])
fc3 <- fc3[41:50]
fc6 <- c(NA,y[2:50]-e[1:49,6])
fc6 <- fc6[41:50]
However, I'm curious whether the 3-step-ahead predicted values are coded correctly, since the first 3-step-ahead forecast should be the prediction of the 43rd observation.
Also, I don't understand why the matrix e has a 3-step-ahead error [3rd column] for observation 40. I thought the first 3-step-ahead forecast is obtained for observation 43, and thus there shouldn't be an error for observation 40.
Always read the help file:
Value
Numerical time series object containing the forecast errors as a vector (if h=1) and a matrix otherwise. The time index corresponds to the last period of the training data. The columns correspond to the forecast horizons.
So tsCV() returns errors in a matrix where the (i,j)th entry contains the error for forecast origin i and forecast horizon j. So the value in row 40 and column 3 is a 3-step error made at time 40, for time period 43.
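A minimal sketch of that indexing, assuming a hand-rolled loop that fills a matrix laid out as the help text describes (row = forecast origin i, column = horizon j). It does not call tsCV itself, and it uses `stats::arima` with an assumed ARIMA(0,1,1) instead of auto.arima. The forecast made at origin i for horizon h is then `y[i + h] - e[i, h]`, so the 3-step forecasts for observations 43 to 50 come from rows 40 to 47.

```r
set.seed(1)
# Series constructed as in the question
y1 <- 2 + 0.15 * (1:20) + rnorm(20, 2)
y2 <- y1[20] + 0.3 * (1:30) + rnorm(30, 2)
y <- c(y1, y2)

H <- 6; win <- 40; n <- length(y)
e  <- matrix(NA_real_, n, H)   # e[i, j]: error of the j-step forecast made at origin i
fc <- matrix(NA_real_, n, H)   # the forecasts themselves, kept only to check the recovery

for (i in win:(n - 1)) {
  train <- y[(i - win + 1):i]                 # fixed window of 40 obs ending at origin i
  fit <- arima(train, order = c(0, 1, 1))     # assumed model, stands in for auto.arima
  p <- as.numeric(predict(fit, n.ahead = H)$pred)
  for (j in 1:H) {
    if (i + j <= n) {                         # the target observation must exist
      fc[i, j] <- p[j]
      e[i, j] <- y[i + j] - p[j]
    }
  }
}

# The h-step forecast made at origin i is y[i + h] - e[i, h]
recover_fc <- function(h) {
  origins <- win:(n - h)
  y[origins + h] - e[origins, h]
}
fc3 <- recover_fc(3)   # targets 43..50, from origins 40..47
```

The key design choice is to shift the observation index by h while keeping the error row at the origin; using the same row range for every horizon (as in `y[2:50] - e[1:49, 3]`) pairs each error with the wrong target.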
Thanks for your help!
So for h = 1, 2, 3 steps ahead, the predicted values are the following:
#predicted values
#h=1
fc1 <- c(NA,y[41:50]-e[40:49,1])
fc1 <- fc1[2:11]
#h=2
fc2 <- c(NA,y[42:50]-e[40:49,2])
fc2 <- fc2[2:10]
#h=3
fc3 <- c(NA,y[43:50]-e[40:49,3])
fc3 <- fc3[2:9]
Is that correct?

Forecasting existing time series data

I have a data frame with time series data, called rData. The data are quarterly, and four years of data are available. I analyzed the data and fitted an ARIMA model to the series, and now I can compute forecasts for the following periods. But I want to create a new column in my data frame that shows the forecast value corresponding to each available time stamp, and then plot the two series against each other in R. Is there a way to compute these forecast values in R without individually analyzing all of the data prior to each time stamp? Also, how many cycles of data are necessary before forecasts can be computed?
Date <- seq(as.Date("2000-01-01"), as.Date("2003-12-31"), by = "quarter")
Sales <- c(2.8,2.1,4,4.5,3.8,3.2,4.8,5.4,4,3.6,5.5,5.8,4.3,3.9,6,6.4)
rData <- data.frame(Date, Sales)
tsData <- ts(data = rData$Sales, start = c(2000, 1), frequency = 4)
> tsData
Qtr1 Qtr2 Qtr3 Qtr4
2000 2.8 2.1 4.0 4.5
2001 3.8 3.2 4.8 5.4
2002 4.0 3.6 5.5 5.8
2003 4.3 3.9 6.0 6.4
library(forecast)
myModel <- auto.arima(tsData)
myForcast <- forecast(myModel, level = 95, h = 8)
The end result should be a data frame with an additional column, and a graph with two plots: one for the actual data and one for the forecast data. Something like this:
[Figure: Actual Data vs Forecast Data]
Did you mean something like this, for the past values? If so, just add this to your code:
extract_fitted_values <- myModel$fitted
plot(tsData, xlab = "Time", ylab = "Sales", type = "b", pch = 19)
lines(extract_fitted_values, col = "red")
As you see, you can extract the fitted values from the model fit.
Regarding your question: the data prior to the forecast time IS actually analyzed when you run auto.arima.
That is how the ARIMA model estimates its parameters (using past data) before it produces the forecasts. The auto.arima function additionally chooses the model specification automatically.
So the analysis of the prior data is a prerequisite for the subsequent forecasts. It is worth noting that the red line you see here represents the fitted values, i.e. the model uses all the data points, up to the last time point, to estimate the parameters that produce them.
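To make the fitted-values point concrete without the forecast package: for a model fitted with `stats::arima`, the residuals are the one-step-ahead errors, so the in-sample fitted values are simply the data minus the residuals, computed with parameters estimated from the whole sample. A minimal sketch on the question's Sales data; the AR(1) order and `method = "ML"` are assumptions, not what auto.arima would choose.

```r
Sales <- c(2.8, 2.1, 4, 4.5, 3.8, 3.2, 4.8, 5.4, 4, 3.6, 5.5, 5.8, 4.3, 3.9, 6, 6.4)
tsData <- ts(Sales, start = c(2000, 1), frequency = 4)

# Assumed model for illustration: AR(1) with a mean, estimated on all 16 quarters
fit <- arima(tsData, order = c(1, 0, 0), method = "ML")

# Residuals are one-step-ahead errors, so fitted = data - residuals
fitted_vals <- tsData - residuals(fit)

plot(tsData, type = "b", pch = 19, ylab = "Sales")
lines(fitted_vals, col = "red")
```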
Maybe see more here if that point is a bit unclear:
https://stats.stackexchange.com/questions/260899/what-is-difference-between-in-sample-and-out-of-sample-forecasts
If you wanted to do "out-of-sample" forecasts for the past data (2000-2003), that is also possible: you would fit the model on, say, 2000-2002, produce a 1-step forecast, then roll 1 quarter forward and repeat, and so on.
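A minimal sketch of that rolling scheme, in base R with `stats::arima` instead of auto.arima. The ARIMA(1,0,0) order, `method = "ML"`, and the expanding window that starts with the 12 quarters of 2000-2002 are assumptions chosen so that this tiny example fits reliably.

```r
Sales <- c(2.8, 2.1, 4, 4.5, 3.8, 3.2, 4.8, 5.4, 4, 3.6, 5.5, 5.8, 4.3, 3.9, 6, 6.4)
tsData <- ts(Sales, start = c(2000, 1), frequency = 4)

n_init <- 12                             # first training window: 2000 Q1 .. 2002 Q4
oos <- rep(NA_real_, length(tsData))     # out-of-sample one-step forecasts

for (i in n_init:(length(tsData) - 1)) {
  train <- window(tsData, end = time(tsData)[i])           # data up to origin i only
  fit <- arima(train, order = c(1, 0, 0), method = "ML")   # assumed model
  oos[i + 1] <- as.numeric(predict(fit, n.ahead = 1)$pred) # forecast for quarter i + 1
}

result <- data.frame(time = as.numeric(time(tsData)), actual = Sales, oos_forecast = oos)
print(result)
```

Unlike the fitted values, each entry of `oos_forecast` uses only data available before that quarter, which is what a genuine back-test needs; the first 12 rows stay NA because no forecast was made for them.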
If you want them into a data.frame and plot the real values vs the fitted + the predicted, you can try this:
df <- data.frame(
  # your data, padded with NAs for the forecast horizon
  real = c(tsData, rep(NA, length(data.frame(myForcast)$Point.Forecast))),
  # the fitted and the predicted values in one vector
  pred = c(myModel$fitted, data.frame(myForcast)$Point.Forecast),
  # the time index for the plot
  time = c(time(tsData), seq(2004, 2005.75, by = 0.25))
)
plot(df$real, xlab = "time", ylab = "real black, pred red", type = "b", pch = 19, xaxt = "n")
lines(df$pred, col = "red")
axis(1, at = 1:24, labels = df$time)
For the theory part, as already said, the fitted values are calculated when you fit your model. Fitting the model is the basis for forecasting, but of course you can obtain the fitted values without forecasting.

In R, auto.arima fails to capture seasonality

auto.arima() is giving me no seasonal component for my series, even though I can see that there is one. The function gives me a non-seasonal ARIMA model of order (5,0,0), so when I forecast with that model, it just gives the mean. The time series contains daily minimum temperatures in Melbourne, Australia, over ten years.
library(readr)
library(forecast)
temp <- read_csv("~/Downloads/Melbourne Minimum Temp.csv",
                 col_types = cols(Date = col_date(format = "%m/%d/%y"),
                                  Temp = col_number()))
# frequency defaults to 1 here, so no seasonal period is declared
t <- ts(temp$Temp, start = temp$Date[1], end = temp$Date[nrow(temp)])
auto.arima(t, trace = TRUE)
I tried using the data as a ts object, as an xts object, and as a plain vector.
I'm just pointing to a good, well-explained (as usual) blog post by Rob Hyndman:
https://robjhyndman.com/hyndsight/dailydata/
The part relevant to your question says (quoting the page exactly):
When the time series is long enough to take in more than a year, then
it may be necessary to allow for annual seasonality as well as weekly
seasonality. In that case, a multiple seasonal model such as TBATS is
required.
y <- msts(x, seasonal.periods=c(7,365.25))
fit <- tbats(y)
fc <- forecast(fit)
plot(fc)
This should capture the weekly pattern as well as the longer annual
pattern. The period 365.25 is the average length of a year allowing
for leap years. In some countries, alternative or additional year
lengths may be necessary.
I think it does exactly what you want.
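Not from the blog post: if you want to stay within ARIMA, one common alternative for long seasonal periods is dynamic harmonic regression, where the annual cycle enters as Fourier (sine/cosine) regressors through `xreg`. A minimal base-R sketch on a simulated daily series; the helper `fourier_terms()`, K = 2 harmonics and the AR(1) error model are assumptions for illustration (the forecast package has its own `fourier()` function for building these terms).

```r
set.seed(1)
n <- 3 * 365
tt <- 1:n
period <- 365.25
# Hypothetical daily series: annual cycle plus AR(1) noise
x <- 10 + 4 * sin(2 * pi * tt / period) + arima.sim(list(ar = 0.7), n = n)

# Hypothetical helper: K sine/cosine pairs for the given period
fourier_terms <- function(t, period, K) {
  X <- do.call(cbind, lapply(1:K, function(k)
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))))
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(1:K, each = 2))
  X
}

K <- 2
X <- fourier_terms(tt, period, K)
fit <- arima(x, order = c(1, 0, 0), xreg = X)   # AR(1) errors + Fourier regressors

h <- 30
X_new <- fourier_terms((n + 1):(n + h), period, K)   # regressors for future days
fc <- predict(fit, n.ahead = h, newxreg = X_new)
```

K controls how wiggly the annual pattern can be; a weekly cycle could be added the same way with a second set of terms using period 7.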
I also tried to simply create the time series with msts
y <- msts(x[1:1800], seasonal.periods=c(7,365.25))
(I cut the time series in half to be quicker)
and then run auto.arima() directly on it, forcing a seasonal component with D=1
fc = auto.arima(y,D=1,trace=T,stepwise = F)
It will take a while because I set stepwise = FALSE (if you want it to look at all combinations without shortcuts, you can set approximation = FALSE as well).
Series: y
ARIMA(1,0,3)(0,1,0)[365]
Coefficients:
         ar1      ma1      ma2      ma3
      0.9036  -0.3647  -0.3278  -0.0733
s.e.  0.0500   0.0571   0.0405   0.0310
sigma^2 estimated as 12.63:  log likelihood=-3854.1
AIC=7718.19   AICc=7718.23   BIC=7744.54
and then the forecast
for_fc = forecast(fc)
plot(for_fc)
I am adding a figure with the complete time series (red) on top of the output of
plot(for_fc)
and it seems to work decently, but it was just a quick test.

Time Series Decomposition on a few months of data?

I'm trying to decompose my data to see what the trend and seasonality effects are. I have 4 months of data, recorded daily. Data looks like:
date amount
11/1/2000 1700
11/2/2000 11087
11/3/2000 11248
11/4/2000 13336
11/5/2000 18815
11/6/2000 8820
11/7/2000 7687
11/8/2000 5514
11/9/2000 9591
11/10/2000 9676
11/11/2000 14782
11/12/2000 18554
And so forth to the end of Feb 2001. I read in the data like so and generate a timeseries object:
myvector <- read.table("clipboard", sep="\t", header=T)
myts <- ts(myvector$amount, start=c(2000,11), frequency=52)
I'm very confused as to how to read this data in as a time series object. The data is recorded daily, but if I use frequency=365, then try
fit <- stl(myts2, s.window="periodic")
I get:
Error in stl(myts2, s.window = "periodic") :
series is not periodic or has less than two periods
Every example I find does the conversion with multiple years' worth of data. Is this not possible in my case?
I know the next steps for plotting the trend and decomposition are:
fit <- stl(myts, s.window="periodic")
plot(fit)
Try seasonal differencing, which is similar to regular differencing except that it is applied at a seasonal lag:
An example:
data(austres)
plot(austres)
seasonal <- diff(austres, lag = 4, differences = 1)  # austres is quarterly, so the seasonal lag is 4
plot(seasonal)
d.seasonal <- diff(seasonal, differences = 2)
plot(d.seasonal)
Now the seasonal pattern has been differenced out, and the further regular differencing should leave an (approximately) stationary series.
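Not part of the answer above, but relevant to the stl() error in the question: stl() needs at least two full seasonal periods, and with about four months of daily data the natural period is a week, not a year. A minimal sketch, assuming a weekly cycle and using simulated data (the seven daily offsets, trend and noise level are hypothetical), declares `frequency = 7` so that 120 days give roughly 17 cycles.

```r
set.seed(42)
n_days <- 120   # 1 Nov 2000 to 28 Feb 2001
# Hypothetical weekly pattern (seven daily offsets) plus a slow trend and noise
dow_effect <- c(-2000, -3000, -1000, 0, 1000, 6000, 8000)
amount <- 10000 + 20 * seq_len(n_days) + rep(dow_effect, length.out = n_days) +
  rnorm(n_days, sd = 1500)

# frequency = 7: one seasonal cycle per week
myts7 <- ts(amount, frequency = 7)
fit7 <- stl(myts7, s.window = "periodic")
head(fit7$time.series)   # columns: seasonal, trend, remainder
```

`plot(fit7)` then shows the weekly seasonal, trend and remainder panels; any annual seasonality simply cannot be estimated from four months of data.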

Plot rolling forecast intraday time series custom interval

I would like to plot the historic forecasts of intraday 30-minute data from the SPY. Data is here.
I first plot the forecast from a model fitted to a time window of the last 10 days. h = 13 is the number of 30-minute intervals in a trading day.
library(forecast)
a.win <- window(spy.close,start = end(spy.close)[1]-10*1440*60,end =end(spy.close)[1])
a.fit <- auto.arima(a.win)
a.pred <- forecast(a.fit, h=13)
plot(a.pred, type="l", xlab="period", ylab="price",
main="Overlay historic forecasts & actual prices")
Then I shift the 10-day window back by one day, five times, fit the model, and plot the forecast mean on each run of the loop.
for (j in 1:5) {  ## Loop to overlay earlier forecasts
  result1 <- tryCatch({
    b.end <- end(spy.close)[1] - j*1440*60  ## Window the time series (1440*60 s = 1 day)
    b.start <- b.end[1] - 10*1440*60
    b.window <- window(spy.close, start=b.start, end=b.end)
    b.fit <- auto.arima(b.window)
    b.pred <- forecast(b.fit, h=13)
    lines(b.pred$mean, col="green", lty="dashed")
  }, error = function(e) {return(e$message)})  ## Skip errors
}
But something is messing up the time axis. I suspect the problem is that the data are irregular (trading hours run from 9:30 to 16:00, and markets are closed at weekends), but I could not find a suitable solution (even the forecast package documentation does not mention intraday time intervals). I hope someone can help...
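One common workaround, sketched here as an assumption rather than a forecast-package feature: index the bars by position (1, 2, 3, ...) so that nights and weekends take no space on the axis, fit and forecast on that index, and use the real timestamps only as axis labels. The data below are simulated (20 hypothetical trading days of 13 half-hour bars), and `stats::arima` with an assumed ARIMA(1,1,0) stands in for auto.arima.

```r
set.seed(7)
# Hypothetical data: 20 weekdays from 2024-01-01, 13 half-hour bars per day starting 09:30
days <- seq(as.Date("2024-01-01"), by = "day", length.out = 28)
days <- days[!format(days, "%u") %in% c("6", "7")][1:20]   # drop Saturdays and Sundays
bar_times <- as.POSIXct(unlist(lapply(as.character(days), function(d)
  as.POSIXct(paste(d, "09:30:00"), tz = "UTC") + (0:12) * 1800)),
  origin = "1970-01-01", tz = "UTC")
price <- 470 + cumsum(rnorm(length(bar_times), sd = 0.3))   # simulated random-walk prices

n <- length(price); per_day <- 13; h <- 13
# Plot against the bar index, not clock time, so overnight/weekend gaps disappear
plot(seq_len(n), price, type = "l", xaxt = "n", xlab = "", ylab = "price",
     main = "Overlay historic forecasts & actual prices")

fc_list <- vector("list", 5)
for (j in 1:5) {
  b_end <- n - j * per_day                          # forecast origin: end of an earlier day
  train <- price[(b_end - 10 * per_day + 1):b_end]  # the 10 trading days before the origin
  fit <- arima(train, order = c(1, 1, 0))           # assumed model, not auto.arima
  fc_list[[j]] <- as.numeric(predict(fit, n.ahead = h)$pred)
  lines((b_end + 1):(b_end + h), fc_list[[j]], col = "green", lty = "dashed")
}

# Real timestamps used only as labels: one tick at the first bar of each day
day_starts <- seq(1, n, by = per_day)
axis(1, at = day_starts, labels = format(bar_times[day_starts], "%m-%d"), las = 2)
```

Because every trading day occupies exactly 13 index positions, "shift back one day" becomes a fixed offset of 13 bars, which avoids converting calendar seconds into window boundaries.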
