How to decide the frequency while using the forecast function in R?

I have a series of daily data from 01-01-2014 to 31-01-2022. I want to predict the next 30 days. I am using auto.arima and it has some exogenous variables attached.
Here's the code:
datax$NMD1<-(datax$NMD1/1000000000)
#Build a time series from the data: NMD1 is the target, the other columns are exogenous variables.
ts1<- ts(datax, frequency = 1)
class(ts1)
colnames(ts1)
autoplot(ts1[,"NMD1"])
#defining the set of exogenous variables
#(as.matrix() only uses its first argument, so select all columns in one subset instead)
xset<- as.matrix(ts1[, c("1Y TD INTEREST RATE", "BSE", "Repo Rate", "MIBOR Rate", "1Y OIS Rate", "3M CD rate(PSU)", "2 Y GSec Rate")])
#Fitting the model
model1 <- auto.arima(ts1[,'NMD1'], xreg=xset, approximation = FALSE, allowmean = FALSE, allowdrift = FALSE)
summary(model1)
checkresiduals(model1)
fcast <- forecast(model1,xreg=xset, h=1)
print(summary(fcast))
autoplot(fcast)
My problems:
While my model seems to work fine, I don't understand what value of h I should use when forecasting. I also don't understand what frequency really means when defining a time series.
Please help.
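A minimal sketch of how the two settings interact, using base R's arima() and predict() on simulated data instead of the asker's dataset (the regressor names rate and index, and all values, are made up for illustration). In a ts object, frequency is the number of observations per seasonal cycle, so daily data with a weekly pattern would use 7. The forecast horizon is the number of future periods wanted, and future values of the regressors must be supplied for each of them. In the forecast package, forecast() on an ARIMA model with xreg ignores h and uses the number of rows of the xreg passed to it.

```r
set.seed(42)
n <- 200

# Simulated regressors (hypothetical stand-ins for the exogenous variables)
x <- cbind(rate = rnorm(n), index = rnorm(n))

# frequency = 7: seven daily observations per weekly cycle
y <- ts(10 + 2 * x[, "rate"] + rnorm(n), frequency = 7)

# Regression with AR(1) errors
fit <- arima(y, order = c(1, 0, 0), xreg = x)

# Forecasting 30 days ahead needs 30 rows of future regressor values
h <- 30
future_x <- cbind(rate = rnorm(h), index = rnorm(h))  # placeholder future values
fc <- predict(fit, n.ahead = h, newxreg = future_x)

length(fc$pred)
```

Passing the training regressors again as the future xreg, as in the question, would produce one forecast per historical row rather than a 30-day forecast.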

Related

Grey-Markov method in R

In R, I have loaded the built-in time series AirPassengers and split it into training and test data like this:
rm(list = ls())
data = AirPassengers
traindata = ts(data[1:(0.75*length(data))], frequency = 12)
testdata = ts(data[((0.75*length(data))+1):length(data)], frequency = 12)
From here I want to estimate future values of the time series from traindata using the Grey-Markov method. I know the Grey-Markov method consists of a Grey GM(1,1) forecasting model followed by a Markov chain refinement of its forecasts. Is there a function in R that performs the whole Grey-Markov method on its own, as auto.arima does for ARIMA models?
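For reference, the GM(1,1) stage mentioned above can be sketched in a few lines of base R. This is only an illustration of the standard GM(1,1) equations under the assumption of a positive series; gm11_forecast is a hypothetical helper, not a function from any package, and the Markov chain refinement is not included.

```r
# Hypothetical helper: GM(1,1) fit and h-step forecast for a positive series x0
gm11_forecast <- function(x0, h) {
  n  <- length(x0)
  x1 <- cumsum(x0)                      # accumulated generating series
  z  <- 0.5 * (x1[-1] + x1[-n])         # background values (means of neighbours)

  # Grey differential equation x0(k) + a * z(k) = b, estimated by least squares
  coefs <- coef(lm(x0[-1] ~ z))
  a <- -coefs[["z"]]
  b <- coefs[["(Intercept)"]]

  # Time response function for the accumulated series, k = 0, 1, ..., n + h - 1
  k <- 0:(n + h - 1)
  x1_hat <- (x0[1] - b / a) * exp(-a * k) + b / a

  # Difference back to the original scale and keep the h out-of-sample values
  x0_hat <- c(x0[1], diff(x1_hat))
  tail(x0_hat, h)
}

train <- as.numeric(window(AirPassengers, end = c(1957, 12)))
gm_fc <- gm11_forecast(train, h = 36)
```

GM(1,1) produces a smooth exponential trend, so it does not reproduce the monthly seasonality of AirPassengers.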

Forecasting of multivariate data through Vector Autoregression model

I am working with functional time series built from multivariate hourly data. I want to fit a FAR model of order greater than one, for which no R package is available, so I convert my data to functional form, obtain the functional principal components, and extract the corresponding FPC scores. I then fit a VAR model to those scores to forecast each of the 24 hours. The VAR gives me forecast values for 23 hours when I set phat=23, but when I set phat=24, to predict all 24 hours, the results are NA. The code is given below:
library(vars)
library(fda)
fdata<- function(mat){
nb = 27 # number of basis functions for the data
fbf = create.fourier.basis(rangeval=c(0,1), nbasis=nb) # basis for data
args=seq(0,1,length=24)
fdata1=Data2fd(args,y=t(mat),fbf) # functions generated from discretized y
return(fdata1)
}
prediction.ffpe = function(fdata1){
n = ncol(fdata1$coef)
D = nrow(fdata1$coef)
#center the data
#mu = mean.fd(fdata1)
data = center.fd(fdata1)
#ffpe = fFPE(fdata1, Pmax=10)
#p.hat = ffpe[2] #order of the model
d.hat=23
p.hat=6
#fPCA
fpca = pca.fd(data,nharm=D, centerfns=TRUE)
scores = fpca$scores[,1:d.hat]
# to avoid warnings from vars predict function below
colnames(scores) <- as.character(seq_len(d.hat))
# type = "const" is an argument of VAR(), not of predict()
VAR.pre= predict(VAR(scores, p = p.hat, type = "const"), n.ahead=1)$fcst
}
Kindly guide me on how I can solve this problem, or tell me what mistake I am making. Thanks.

Auto.Arima incorrectly predicts first point

I'm trying to complete a time series analysis of some reservoir data, using auto.arima with Fourier terms to account for seasonality as described here: https://otexts.com/fpp2/dhr.html#dhr
The code I used is shown below, and the dataset can be found here: https://www.dropbox.com/sh/563nu3daeid0agb/AAB6NSddVUKgBCCbQtuqXPsZa?dl=0
Reservoir = read.csv("Reservoir1.csv",TRUE,",")
#impute missing data from data set
Reservoir = imputeTS::na_interpolation(Reservoir)
#Create Time Series
Reservoir = ts(Reservoir[,2],frequency = (365.25),start = c(2013,116))
plots = list()
for (i in seq (10)) {
fit = auto.arima(Reservoir, xreg = fourier(Reservoir, K = i), seasonal = FALSE)
plots[[i]] = autoplot(forecast(fit, xreg = fourier(Reservoir, K = i, h=10))) +
xlab(paste("K=",i,"AICC=",round(fit[["aicc"]],2))) + ylab("")
}
gridExtra::grid.arrange(plots[[1]],plots[[2]],plots[[3]],plots[[4]],plots[[5]],
plots[[6]],plots[[7]],plots[[8]],plots[[9]],plots[[10]],
nrow=5)
bestfit = auto.arima(Reservoir, xreg=fourier(Reservoir, K=9), seasonal=FALSE)
summary(bestfit)
checkresiduals(bestfit)
plot(Reservoir,col="red")
lines(fitted(bestfit),col="blue")
The model fits well, except for the incorrect first prediction. I'm lost as to why only this value would be so far off. Or, is this an acceptable error?
The residuals are the one-step forecast errors using all previous observations. At time 1, the residual is the forecast error with no previous observations, so it is simply based on the fitted model. In fact, it is an artificially "good" forecast because the differencing means there is no way for the model to know the location of the data until there is an observation. But the way ARIMA models are implemented in R makes the first prediction use a little more information than it should.

Holt Winters For Weekly Volume And Errors In R

I'm trying to use Holt-Winters and the predict function on the weekly volume of a stock index over the last 10 years, but I keep getting an error. Can you help me, please?
This is what I'm trying to do now:
volumen<-read.csv(file.choose(), header = TRUE, sep = ";")
lines(volumen[,6])
HoltWinters(volumen)
This is the error I get on the third line:
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) :
the time series has no periods or has less than 2
For prediction I have the code below, but it does not work because of the previous error:
lines(predict(volumen.hw,n.ahead=12),col=2)
The data looks correct in RStudio. I decided to use file.choose() to make this code more universal, and I am using a *.csv file. Could someone advise what the code should look like to apply the Holt-Winters method and make predictions?
It's hard to be 100% sure, but
HoltWinters(lynx)
generates the same message as you are getting, while
HoltWinters(lynx, gamma = FALSE)
generates
Holt-Winters exponential smoothing with trend and without seasonal
component.
Call: HoltWinters(x = lynx, gamma = FALSE)
Smoothing parameters:
alpha: 1
beta : 0
gamma: FALSE
Coefficients:
  [,1]
a 3396
b   52
I learned this from reading the examples in the HoltWinters documentation.
First of all, it would be nice if you posted your data here (if it is not private).
Secondly, as far as I know you can only apply HoltWinters(), or the methods in the forecast package, to a vector or a time series, so passing the entire dataset (volumen) without selecting a column could cause a problem.
Finally, I recommend applying Holt-Winters to an auxiliary vector containing the data you want to study, and specifying the frequency of the time series:
# your_frequency and n_forecast are placeholders: set them for your own data
n_train <- floor(0.9 * nrow(volumen))
aux_train <- ts(volumen$variable[1:n_train], frequency = your_frequency)
prediction <- hw(aux_train, h = n_forecast)  # Holt-Winters via forecast::hw()
accuracy(prediction, volumen$variable[(n_train + 1):(n_train + n_forecast)])
I have finally won this battle: I deleted my code and started from scratch. Here is what I came up with:
dane2<-read.csv2(file.choose(), header = TRUE, sep = ";", dec=",")
dane2 <-ts(dane2[,5], start=c(2008,1),frequency=52)
past <- window(dane2, end = 2017)
future <- window(dane2, start = 2017)
model <- HoltWinters(past, seasonal = "additive")
model2 <- HoltWinters(past, seasonal = "multiplicative")
pred <- predict(model, n.ahead = 52)
pred2 <- predict(model2, n.ahead = 52)
dane2.hw<-HoltWinters(dane2)
predict(dane2.hw,n.ahead=52)
par(mfrow = c(2,1))
plot(model, predicted.values = pred)
lines(future, col="blue")
plot(model2, predicted.values = pred2)
lines(future, col="blue")
Now it works, so thank you for your answers.
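The role of frequency in this thread can be checked with built-in datasets. The sketch below uses only base R, with lynx (annual, frequency 1) and co2 (monthly, frequency 12) standing in for the asker's data: a seasonal Holt-Winters fit fails without at least two full periods, and works once the series carries a seasonal frequency.

```r
# lynx is annual (frequency 1), so a seasonal fit has no periods to work with
seasonal_fit <- tryCatch(
  HoltWinters(lynx),
  error = function(e) conditionMessage(e)  # keep the error message instead of stopping
)
is.character(seasonal_fit)  # TRUE when the call failed

# Dropping the seasonal component makes the same series usable
trend_fit <- HoltWinters(lynx, gamma = FALSE)

# co2 is monthly (frequency 12), so the seasonal model can be estimated
hw_fit  <- HoltWinters(co2)
hw_pred <- predict(hw_fit, n.ahead = 12)
```

This is why wrapping the column in ts(..., frequency = 52) resolved the original error for weekly data.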

arima model for multiple seasonalities in R

I'm learning to create a forecasting model for a time series that has multiple seasonalities. The dataset contains hourly data points, and I wish to include daily as well as weekly seasonality in my ARIMA model. The following is the subset of the dataset that I'm referring to:
data= c(4,4,1,2,6,21,105,257,291,172,72,10,35,42,77,72,133,192,122,59,29,25,24,5,7,3,3,0,7,15,91,230,284,147,67,53,54,55,63,73,114,154,137,57,27,31,25,11,4,4,4,2,7,18,68,218,251,131,71,43,55,62,63,80,120,144,107,42,27,11,10,16,8,10,7,1,4,3,12,17,58,59,68,76,91,95,89,115,107,107,41,40,25,18,14,15,6,12,2,4,1,6,9,14,43,67,67,94,100,129,126,122,132,118,68,26,19,12,9,5,4,2,5,1,3,16,89,233,304,174,53,55,53,52,59,92,117,214,139,73,37,28,15,11,8,1,2,5,4,22,103,258,317,163,58,29,37,46,54,62,95,197,152,58,32,30,17,9,8,1,3,1,3,16,109,245,302,156,53,34,47,46,54,65,102,155,116,51,30,24,17,10,7,4,8,0,11,0,2,225,282,141,4,87,44,60,52,74,135,157,113,57,44,26,29,17,8,7,4,4,2,10,57,125,182,100,33,27,41,39,35,50,69,92,66,30,11,10,11,9,6,5,10,4,1,7,9,17,24,21,29,28,48,38,30,21,26,25,35,10,9,4,4,4,3,5,4,4,4,3,5,10,16,28,47,63,40,49,28,22,18,27,18,10,5,8,7,3,2,2,4,1,4,19,59,167,235,130,57,45,46,42,40,49,64,96,54,27,17,18,15,7,6,2,3,1,2,21,88,187,253,130,77,47,49,48,53,77,109,147,109,45,41,35,16,13)
The code I'm trying to use is the following:
tsdata = ts (data, frequency = 24)
aicvalstemp = NULL
aicvals= NULL
for (i in 1:5) {
for (j in 1:5) {
xreg1 = fourier(tsdata,i,24)
xreg2 = fourier(tsdata,j,168)
xregs = cbind(xreg1,xreg2)
armodel = auto.arima(bike_TS_west, xreg = xregs)
aicvalstemp = cbind(i,j,armodel$aic)
aicvals = rbind(aicvals,aicvalstemp)
}
}
The cbind command in the above code fails because xreg1 and xreg2 have different numbers of rows. I even tried passing 1:length(data) to the fourier function, but that also gave me an error. If someone can correct the mistakes in the above code to produce a forecast of the next 24 hours using the ARIMA model with the minimum AIC value, it would be really helpful. It would also be great if you could include data splitting in your code by creating training and testing data sets. Thanks for your help.
I don't understand the desire to fit a weekly "season" to these data, as there is no evidence for one in the data subset you provided. Also, you should really log-transform the data because they do not reflect a Gaussian process as is.
So, here's how you could fit models with some form of hourly signal.
library(forecast)
## the data are not normal, so log transform to meet assumption of Gaussian errors
## (the series contains zeros, so use log(x + 1) to avoid -Inf)
ln_dat <- log1p(tsdata)
## number of hours to forecast
hrs_out <- 24
## max number of Fourier terms
max_F <- 5
## empty list for model fits
mod_res <- vector("list", max_F)
## fit models with increasing Fourier terms
for (i in 1:max_F) {
  xreg <- fourier(ln_dat, K = i)
  ## fit to the transformed series, not the raw counts
  mod_res[[i]] <- auto.arima(ln_dat, xreg = xreg)
}
## table of AIC results
aic_tbl <- data.frame(F=seq(max_F), AIC=sapply(mod_res, AIC))
## number of Fourier terms in best model
F_best <- which.min(aic_tbl$AIC)
## forecast from best model (on the log(x + 1) scale; back-transform with expm1())
fore <- forecast(mod_res[[F_best]], xreg=fourier(ln_dat, K = F_best, h = hrs_out))
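If both daily and weekly terms are still wanted, the row mismatch from the question goes away when every block of Fourier terms is generated over the same time index. The sketch below does this by hand in base R on simulated hourly data; fourier_terms and make_xreg are hypothetical helpers written for this example, and the ARIMA order is fixed at (1,0,0) only to keep it short. In the forecast package, the usual route would be an msts() object with seasonal.periods = c(24, 168) passed to fourier() with one K per period.

```r
# Hypothetical helper: K pairs of sine/cosine terms for one seasonal period
fourier_terms <- function(t, period, K) {
  out <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  }))
  colnames(out) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2), "_", period)
  out
}

# Daily (24 h) and weekly (168 h) terms built over the same time index
make_xreg <- function(t) cbind(fourier_terms(t, 24, 2), fourier_terms(t, 168, 1))

set.seed(1)
n    <- 24 * 7 * 4                      # four weeks of simulated hourly data
t_in <- seq_len(n)
y <- 50 + 20 * sin(2 * pi * t_in / 24) + 10 * cos(2 * pi * t_in / 168) +
  rnorm(n, sd = 3)

fit <- arima(y, order = c(1, 0, 0), xreg = make_xreg(t_in))

# Forecast the next 24 hours: the future regressors continue the time index
t_out <- n + seq_len(24)
fc <- predict(fit, n.ahead = 24, newxreg = make_xreg(t_out))
```

A train/test split follows the same pattern: fit on the first part of the index and build newxreg from the held-out part.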
