Defining a seasonal ARIMA model in R - r

This is my first post as I'm struggling with coding in R.
I'm investigating power prices in the Danish electricity market and want to compare the forecasting performance of different model selection techniques on those prices. My data is hourly power prices for the period 2021.09.01 - 2021.11.25.
However, I'm having trouble defining my seasonally adjusted ARIMA model.
After loading the necessary R packages, I execute the following code:
expforecast = function(yt, ar, ma, ini, n){
  if (is.list(yt)){yt=as.matrix(yt)}
  t <- length(yt)
  n_for <- t-n-ini+1
  e_arma <- rep(0, n_for)
  for(z in 1:n_for){
    y_t=yt[1:(ini+z-1)]
    model1 <- arima(y_t, order=c(ar,0,ma), method="ML")
    # predict() for arima fits takes n.ahead; an argument named h is silently ignored
    y1_hat <- predict(model1, n.ahead=n)
    # compare the n-step-ahead forecast with the observation n steps ahead
    e_arma[z] <- yt[(ini+z+n-1)] - y1_hat$pred[n]
  }
  return(e_arma)
}
El_spot <- read_excel("El-spot.xls")
View(El_spot)
NoOfHours <- as.numeric(ymd_hms("2021-11-25 00:00:00") - ymd_hms("2021-01-01 00:00:00"))*24
ymd_hms("2021-01-01 00:00:00") + hours(0:NoOfHours)
y <- ts(El_spot$c1, start=1, frequency=24*7)
Please bear in mind that I set the frequency to 24*7 (168 hours) to capture both daily and weekly seasonality (correct me if I'm wrong). With this frequency the series covers ~13 weekly periods.
Furthermore I run the following code to test for stationarity in my time-series:
decomp = stl(y, "periodic")
deseasonal_cnt <- seasadj(decomp)
plot(decomp)
adf.test(y, alternative = "stationary")
I get a p-value of 0.01, so the test rejects a unit root and the series appears stationary. However, the ACF and PACF plots show a clear autocorrelation pattern.
Acf(y, main='')
Pacf(y, main='')
I hope some of you skilled guys can assist me with this struggle and maybe recommend a seasonal ARIMA model to go forward with.

Related

How to decide the frequency while using the forecast function in R?

I have a series of daily data from 01-01-2014 to 31-01-2022. I want to predict the next 30 days. I am using auto.arima and it has some exogenous variables attached.
Here's the code: -
datax$NMD1<-(datax$NMD1/1000000000)
#Here to make an Arima series out of NMD 1. Exogenous variables here.
ts1<- ts(datax, frequency = 1)
class(ts1)
colnames(ts1)
autoplot(ts1[,"NMD1"])
#defining the set of exogenous variables
# cbind() combines all columns; as.matrix() would keep only its first argument
xset<- cbind(ts1[,"1Y TD INTEREST RATE"], ts1[,"BSE"], ts1[,"Repo Rate"], ts1[,"MIBOR Rate"], ts1[,"1Y OIS Rate" ], ts1[,"3M CD rate(PSU)"], ts1[,"2 Y GSec Rate"])
#Fitting the model
model1 <- auto.arima(ts1[,'NMD1'], xreg=xset, approximation = FALSE, allowmean = FALSE, allowdrift = FALSE)
summary(model1)
checkresiduals(model1)
fcast <- forecast(model1,xreg=xset, h=1)
print(summary(fcast))
autoplot(fcast)
My problems: -
While my model seems to work fine, I don't understand what value of h I should use when forecasting. I also don't understand what frequency really means when defining a time series.
Please help.

how to develop the crossbasis using a binary term

The indicator hw shows whether a day is a heat wave day. I intend to study the exposure-lag relationship of heat waves using dlnm in R. When I built the crossbasis and predicted the results, I got the following error:
Error in crosspred(hw.basis, model) :
coef/vcov not consistent with basis matrix. See help(crosspred)
My code:
hw.knots <- equalknots(hw, fun="ns",df=4)
hw.logknots <- logknots(10,fun="ns",df=4,intercept=TRUE)
hw.basis <- crossbasis(hw,lag=10, argvar=list(fun="ns",df=4), arglag=list(knots=hw.logknots))
model <- glm(ntr ~ hw.basis+ dow + ns(time,df=7*7),
family=quasipoisson(), data)
hw.pred<-crosspred(hw.basis,model)

ARIMA Issues in RStudio - ARIMA for Stocks

This is my first post here on this platform. I'm a student in Business Administration, so please bear with my beginner questions.
I'm currently creating ARIMA models for the closing prices of some stocks. However, when I plot the forecasts, all I get is a straight line with a little bit of drift. I don't get any clear pattern, no ups and downs in the forecast, just a straight line with drift.
I'm not sure whether I made a mistake somewhere.
install.packages("quantmod")
install.packages("tseries")
install.packages("timeSeries")
install.packages("forecast")
install.packages("MASS")
install.packages("ggplot2")
install.packages("zoo")
install.packages("xts")
library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(MASS)
library(ggplot2)
library(zoo)
library(xts)
# load data
energy = getSymbols(Symbols = "XLES.L", auto.assign = F, from = "2015-01-01", to = "2020-01-01")
# remove NAs
energy <- na.omit(energy$XLES.L.Close)
plot(energy)
# create TS
ts <- ts(energy, start = c(2015,01), frequency = 252)
plot(ts) #does not seem stationary
# check for stationarity
adf.test(ts) # --> not stationary, differencing required
#Create Arima Model
arima <- auto.arima(ts, d = 1)
arima
# Create Forecast (Out-Of-Sample for 20days/1month)
forecast_energy <- forecast(arima, h = 20)
plot(forecast_energy)
plot(forecast_energy, include = 50)
My questions are:
Why is it a straight line?
Is it necessary to create a time series with the ts function, since the imported data is already a time series (or is it not)?
Is what I did correct?
Here is the printed model:
> print(arima)
Series: ts
ARIMA(2,1,0)

Coefficients:
         ar1      ar2
      0.0125  -0.0502
s.e.  0.0283   0.0283

sigma^2 estimated as 20.19:  log likelihood=-3682.99
AIC=7371.98   AICc=7372   BIC=7387.4
Can someone please help me :)
Best regards
Noob
Here is an example of a simple signal that is able to break auto.arima:
library(forecast)
set.seed(1)
mynoise <- rnorm(252*5,0,sd = 100) # high short term noise, non integrated
mytrend <- 1:(252*5) # long term trend
mysignal <- mynoise+mytrend
mymodel <- auto.arima(mysignal)
plot(forecast(mymodel,50))
The first difference of the signal is u = 1 + e - lag(e), and likewise lag(u) = 1 + lag(e) - lag2(e).
Let epsilon = e - lag(e). This is a moving-average term of order 1 with coefficient -1 (not an AR(1)), so u is a constant drift of 1 plus that MA(1) term.
The drift of 1 is small relative to the noise and is unlikely to come out as significant, so the series tends to be identified as an ARIMA(0,1,1) without drift, with the moving-average coefficient estimated at around -1.
This is not a total failure: the model is decent for short-term predictions, but it makes silly long-term predictions.
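The argument about the differenced signal can be checked with base R alone. The sketch below is an illustration under assumptions, not part of the original answer: it uses stats::arima with the order fixed at (0,1,1) as a stand-in for auto.arima, and it rebuilds the same simulated signal.

```r
# Sketch: stats::arima stands in for forecast::auto.arima (an assumption)
set.seed(1)
n <- 252 * 5
mysignal <- rnorm(n, 0, sd = 100) + seq_len(n)  # noise plus linear trend

# First difference: u = 1 + e - lag(e)
u <- diff(mysignal)

# In theory, the lag-1 autocorrelation of an MA(1) with coefficient -1 is -0.5
rho1 <- acf(u, lag.max = 1, plot = FALSE)$acf[2]

# Fit ARIMA(0,1,1) without drift and pull out the moving-average coefficient
fit <- arima(mysignal, order = c(0, 1, 1))
ma1 <- unname(coef(fit)["ma1"])

# Point forecasts from an ARIMA(0,1,1) without drift repeat one level at every horizon
fc <- as.numeric(predict(fit, n.ahead = 50)$pred)
print(c(rho1 = rho1, ma1 = ma1, forecast_spread = diff(range(fc))))
```

A forecast spread of (numerically) zero is the flat line seen in the plot: the trend of 1 per step is lost because the model has no drift term.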
You are getting a straight-line forecast because the model cannot find any seasonality or strong autocorrelation in the data. The fitted ARIMA(2,1,0) has AR coefficients close to zero, so it behaves almost like a random walk, and the point forecast of a random walk is essentially the last observed value rather than a pattern of ups and downs.
It is very difficult for any model to forecast accurately when the historical data contain no clear seasonality or trend.
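A small simulation makes the flat forecast easier to see. This is only a sketch under assumptions: a simulated random walk stands in for the stock prices, and stats::arima with the order fixed at (2,1,0) stands in for auto.arima.

```r
# Sketch: a pure random walk as a stand-in for the price series (an assumption)
set.seed(42)
y <- cumsum(rnorm(1000))  # no trend, no seasonality

# Same order as the model printed in the question
fit <- arima(y, order = c(2, 1, 0))
fc <- predict(fit, n.ahead = 20)

# The AR coefficients come out close to zero
print(coef(fit))

# The point forecast settles at a constant level within a few steps,
# while the standard error keeps growing with the horizon
print(cbind(forecast = as.numeric(fc$pred), se = as.numeric(fc$se)))
print(tail(y, 1))  # last observed value, for comparison
```

The growing standard errors are where the uncertainty about future ups and downs shows up; the point forecast itself carries no such pattern.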

arima model for multiple seasonalities in R

I'm learning to build a forecasting model for a time series that has multiple seasonalities. The dataset contains hourly data points, and I want to include both daily and weekly seasonality in my ARIMA model. The following is a subset of the dataset:
data= c(4,4,1,2,6,21,105,257,291,172,72,10,35,42,77,72,133,192,122,59,29,25,24,5,7,3,3,0,7,15,91,230,284,147,67,53,54,55,63,73,114,154,137,57,27,31,25,11,4,4,4,2,7,18,68,218,251,131,71,43,55,62,63,80,120,144,107,42,27,11,10,16,8,10,7,1,4,3,12,17,58,59,68,76,91,95,89,115,107,107,41,40,25,18,14,15,6,12,2,4,1,6,9,14,43,67,67,94,100,129,126,122,132,118,68,26,19,12,9,5,4,2,5,1,3,16,89,233,304,174,53,55,53,52,59,92,117,214,139,73,37,28,15,11,8,1,2,5,4,22,103,258,317,163,58,29,37,46,54,62,95,197,152,58,32,30,17,9,8,1,3,1,3,16,109,245,302,156,53,34,47,46,54,65,102,155,116,51,30,24,17,10,7,4,8,0,11,0,2,225,282,141,4,87,44,60,52,74,135,157,113,57,44,26,29,17,8,7,4,4,2,10,57,125,182,100,33,27,41,39,35,50,69,92,66,30,11,10,11,9,6,5,10,4,1,7,9,17,24,21,29,28,48,38,30,21,26,25,35,10,9,4,4,4,3,5,4,4,4,3,5,10,16,28,47,63,40,49,28,22,18,27,18,10,5,8,7,3,2,2,4,1,4,19,59,167,235,130,57,45,46,42,40,49,64,96,54,27,17,18,15,7,6,2,3,1,2,21,88,187,253,130,77,47,49,48,53,77,109,147,109,45,41,35,16,13)
The code I'm trying to use is following:
tsdata = ts (data, frequency = 24)
aicvalstemp = NULL
aicvals= NULL
for (i in 1:5) {
for (j in 1:5) {
xreg1 = fourier(tsdata,i,24)
xreg2 = fourier(tsdata,j,168)
xregs = cbind(xreg1,xreg2)
armodel = auto.arima(bike_TS_west, xreg = xregs)
aicvalstemp = cbind(i,j,armodel$aic)
aicvals = rbind(aicvals,aicvalstemp)
}
}
The cbind call in the code above fails because xreg1 and xreg2 have different numbers of rows. I even tried passing 1:length(data) to the fourier function, but that also gave me an error. If someone can fix the mistakes in the code above to produce a forecast of the next 24 hours using the ARIMA model with the minimum AIC value, it would be really helpful. If you can also include data splitting in your code by creating training and testing data sets, it would be totally awesome. Thanks for your help.
I don't understand the desire to fit a weekly "season" to these data as there is no evidence for one in the data subset you provided. Also, you should really log-transform the data because they do not reflect a Gaussian process as is.
So, here's how you could fit models with some form of hourly signal.
## the data are not normal, so log transform to meet assumption of Gaussian errors
## (the series contains zeros, so log(x + 1) is used to avoid -Inf)
ln_dat <- log1p(tsdata)
## number of hours to forecast
hrs_out <- 24
## max number of Fourier terms
max_F <- 5
## empty list for model fits
mod_res <- vector("list", max_F)
## fit models on the log scale with increasing numbers of Fourier terms
for (i in 1:max_F) {
  xreg <- fourier(ln_dat, K = i)
  mod_res[[i]] <- auto.arima(ln_dat, xreg = xreg)
}
## table of AIC results
aic_tbl <- data.frame(F=seq(max_F), AIC=sapply(mod_res, AIC))
## number of Fourier terms in best model
F_best <- which.min(aic_tbl$AIC)
## forecast from best model (on the log scale; back-transform with expm1())
fore <- forecast(mod_res[[F_best]], xreg=fourier(ln_dat, K = F_best, h = hrs_out))
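If both a daily and a weekly cycle are wanted after all, the two sets of Fourier regressors need the same number of rows before cbind() can combine them. The following is a minimal base-R sketch of that idea, not the answerer's method: fourier_terms is a hypothetical helper written for this example, the hourly series is simulated, the number of harmonics is fixed at 2 per period instead of being chosen by AIC, and stats::arima with a fixed AR(1) error replaces auto.arima. It also holds out the last 24 hours as a test set.

```r
# Hypothetical helper (not from any package): sine/cosine pairs for one period
fourier_terms <- function(t, period, K) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2), "_", period)
  out
}

# Simulated hourly series with a daily (24) and a weekly (168) cycle (an assumption)
set.seed(123)
n <- 168 * 4
t_in <- seq_len(n)
y <- 10 + 5 * sin(2 * pi * t_in / 24) + 3 * cos(2 * pi * t_in / 168) + rnorm(n)

# Hold out the last 24 hours as the test set
h <- 24
train_idx <- seq_len(n - h)
test_idx <- (n - h + 1):n

# Both blocks are built on the same time index, so cbind() lines up row by row
X <- cbind(fourier_terms(t_in, 24, K = 2), fourier_terms(t_in, 168, K = 2))

# Regression on the Fourier terms with AR(1) errors, fitted on the training part
fit <- arima(y[train_idx], order = c(1, 0, 0), xreg = X[train_idx, ])

# Forecast the held-out 24 hours using the regressors for those hours
fc <- predict(fit, n.ahead = h, newxreg = X[test_idx, ])
rmse <- sqrt(mean((y[test_idx] - as.numeric(fc$pred))^2))
print(rmse)
```

Looping over the number of harmonics for each period and comparing AIC(fit) would bring this closer to the selection procedure in the question.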

Create Forecast and Check Accuracy

I have data of the form SaleDateTime = '2015-01-02 23:00:00.000' SaleCount=4.
I'm trying to create an hourly forecast for the next 12 hours, using the code below.
I'm new to forecasting and could definitely appreciate some advice.
I'm trying to partition the data, train a model, plot the forecast with x axis of the form '2015-01-02 23:00:00.000', and test the accuracy of the model on a test time series.
I'm getting the error message below, when I try to run the accuracy as shown. Does anyone know why I'm getting the error message below?
When I run the plot as shown below it has an x axis from 0 to 400, does anyone know how to show that as something like '2015-01-02 23:00:00.000'? I would also like to narrow the plot to the last say 3 months.
My understanding is that if you don't specify a model for forecast, then it tries to fit the best model it can to the data for the forecast. Is that correct?
How do I filter for the same timeseries range with the forecast as the ts1Test that I'm trying to run accuracy on, is it something like ts(fcast2, start=2001, end = 8567) ?
Since I'm using the zoo package is the as.POSIXct step unnecessary, could I just do eventdata <- zoo(Value, order.by = SaleDateTime) instead?
library("forecast")
library("zoo")
SampleData<-SampleData
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
##Partitioning data Training/Testing
ts1SampleTrain<-eventdata[1:2000,]
ts1Train<-ts(ts1SampleTrain$SaleCount, frequency=24)
ts1SampleTest<-eventdata[2001:28567,]
ts1Test<-ts(ts1SampleTest$SaleCount, frequency=24)
#Training Model
fcast2<-forecast(ts1Train,h=8567)
plot(fcast2)
accuracy(fcast2,ts1Test)
New Error:
Error in -.default(xx, ff[1:n]) : non-numeric argument to binary operator
To make your accuracy test run, make sure the length of your test data ts1Test matches the forecast horizon h in fcast2<-forecast(ts1Train,h=8567). Right now you have 26567 data points in the test set vs. a horizon of 8567.
Following your approach, the next toy example will work:
library(forecast)
library(zoo)
Value <- rnorm(1100)
rDateTime <- seq(as.POSIXct('2012-01-01 00:00:00'), along.with=Value, by='hour')
eventDate <- ts(zoo(Value, order.by=rDateTime), frequency = 24)
tsTrain <-eventDate[1:1000]
tsTest <- eventDate[1001:1100]
fcast<-forecast(tsTrain,h=100)
accuracy(fcast, tsTest)
                        ME         RMSE          MAE          MPE         MAPE         MASE         ACF1
Training set -2.821378e-04 9.932745e-01 7.990188e-01 1.003861e+02 1.007542e+02 7.230356e-01 4.638487e-02
Test set        0.02515008   1.02271839   0.86072703  99.79208174 100.14023919           NA           NA
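The same length check can be written out by hand with base R only. This is a sketch under assumptions rather than a replacement for accuracy(): the data are simulated, and a fixed AR(1) fitted with stats::arima stands in for whatever model forecast() would pick.

```r
# Sketch with simulated hourly data (assumption: AR(1) as a stand-in model)
set.seed(7)
value <- as.numeric(arima.sim(list(ar = 0.6), n = 1100))

train <- ts(value[1:1000], frequency = 24)
test <- value[1001:1100]

fit <- arima(train, order = c(1, 0, 0))

# The horizon is set to the length of the test set, so both vectors line up
h <- length(test)
pred <- as.numeric(predict(fit, n.ahead = h)$pred)

# Error measures computed directly from the forecast errors
err <- test - pred
metrics <- c(ME = mean(err), RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
print(metrics)
```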
Concerning your other two questions:
Use of POSIX timestamps and zoo package. You don't need them to use forecast. ts(Value, frequency) would suffice.
Plotting time series object with datetimes as your labels. The following code snippet should get you started in this direction. Look for axis function that provides the desired behavior:
par(mar=c(6,2,1,1)) # bottom, left, top, right margins
plot(tsTrain, type="l", xlab="", xaxt="n")
axis(side=1, at=seq(1,1000,100), label=format(rDateTime[seq(1,1000,100)], "%Y-%m-%d"), las=2)
