ARIMA Issues in RStudio - ARIMA for Stocks - r

This is my first post on this platform. I'm a student in Business Administration, so please bear with my beginner questions.
I'm currently building ARIMA models for the closing prices of some stocks. However, when I plot the forecasts, all I get is a straight line with a little bit of drift. There are no clear patterns, no ups and downs in the forecast, just a straight line with drift.
I'm not sure whether I made a mistake somewhere.
install.packages("quantmod")
install.packages("tseries")
install.packages("timeSeries")
install.packages("forecast")
install.packages("MASS")
install.packages("ggplot2")
install.packages("zoo")
install.packages("xts")
library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(MASS)
library(ggplot2)
library(zoo)
library(xts)
# load data
energy = getSymbols(Symbols = "XLES.L", auto.assign = F, from = "2015-01-01", to = "2020-01-01")
# remove NAs
energy <- na.omit(energy$XLES.L.Close)
plot(energy)
# create TS
ts <- ts(energy, start = c(2015,01), frequency = 252)
plot(ts) # does not seem stationary
# check for stationarity
adf.test(ts) # --> not stationary, differencing required
# create ARIMA model
arima <- auto.arima(ts, d = 1)
arima
# Create Forecast (Out-Of-Sample for 20days/1month)
forecast_energy <- forecast(arima, h = 20)
plot(forecast_energy)
plot(forecast_energy, include = 50)
My questions are:
Why is it a straight line?
Is it necessary to create a time series with the ts function, given that the imported data is already a time series (or is it not)?
Is what I did correct?
Here is the printed model:
> print(arima)
Series: ts
ARIMA(2,1,0)
Coefficients:
ar1 ar2
0.0125 -0.0502
s.e. 0.0283 0.0283
sigma^2 estimated as 20.19: log likelihood=-3682.99
AIC=7371.98 AICc=7372 BIC=7387.4
Can someone please help me :)
Best regards
Noob

Here is an example of a simple signal that is able to break auto.arima:
library(forecast)
set.seed(1)
mynoise <- rnorm(252*5, 0, sd = 100) # high short-term noise, not integrated
mytrend <- 1:(252*5) # long-term trend
mysignal <- mynoise + mytrend
mymodel <- auto.arima(mysignal)
plot(forecast(mymodel, 50))
The first difference of the signal is u_t = 1 + e_t - e_{t-1}, and likewise u_{t-1} = 1 + e_{t-1} - e_{t-2}.
Let epsilon_t = e_t - e_{t-1}. This is an MA(1) term with a coefficient of -1, and the constant 1 is small compared with the noise.
So the process is likely to be seen as an ARIMA(0,1,1) in which the drift of 1 is not very significant, and auto.arima then estimates an ARIMA(0,1,1) with the moving average term around -1.
This is not a total failure: it is decent for short-term predictions, but it makes silly long-term predictions.
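As a rough check of the point above, the sketch below fits the same kind of signal with base R's arima() instead of auto.arima. The order (0,1,1) is fixed by assumption rather than selected automatically, and the names fit, ma1 and fc are only illustrative. Under these assumptions the MA coefficient should come out close to -1, and the 50-step forecast is flat even though the underlying trend keeps rising by 1 per step.

```r
# Sketch with base R only (stats::arima), assuming the order (0,1,1)
# that auto.arima is said to pick for this kind of signal.
set.seed(1)
n <- 252 * 5
mysignal <- seq_len(n) + rnorm(n, mean = 0, sd = 100)  # trend + heavy noise

# ARIMA(0,1,1) without drift
fit <- arima(mysignal, order = c(0, 1, 1))
ma1 <- unname(coef(fit)["ma1"])
print(ma1)  # expected to be close to -1

# For an ARIMA(0,1,1) without drift, every forecast horizon gets the same value
fc <- predict(fit, n.ahead = 50)$pred
print(range(fc))
```

Comparing the flat forecast with the true trend (which would be n + 50 at the end of the horizon) shows why the long-term predictions look silly.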

You are getting a straight-line forecast because your model is not able to find any seasonality in the data. When this happens, the forecast quickly settles at a constant level (plus drift, if the model includes it), which is why you are getting a straight line.
It is very difficult for a model to forecast accurately without a clear seasonality or trend in the historical data.
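To illustrate why the forecast flattens out, here is a minimal sketch using a simulated random walk as a stand-in for stock prices (an assumption, not the asker's XLES.L data) and base R's arima() with the same ARIMA(2,1,0) order as in the question. The names prices, fit, fc and tail_spread are illustrative.

```r
# Sketch: an ARIMA(2,1,0) fitted to a random walk has AR coefficients near zero,
# so its forecasts converge to a constant level within a few steps.
set.seed(2020)
prices <- 100 + cumsum(rnorm(1000))  # simulated random walk, no seasonality

fit <- arima(prices, order = c(2, 1, 0))
print(coef(fit))  # both AR coefficients are expected to be small

fc <- predict(fit, n.ahead = 20)$pred

# Spread of the forecasts for horizons 11..20: practically zero
tail_spread <- diff(range(fc[11:20]))
print(tail_spread)
```

With weak autocorrelation in the differenced series, there is simply nothing for the model to extrapolate beyond the current level.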

Related

How to decide the frequency while using the forecast function in R?

I have a series of daily data from 01-01-2014 to 31-01-2022. I want to predict the next 30 days. I am using auto.arima and it has some exogenous variables attached.
Here's the code:
datax$NMD1<-(datax$NMD1/1000000000)
#Here to make an Arima series out of NMD 1. Exogenous variables here.
ts1<- ts(datax, frequency = 1)
class(ts1)
colnames(ts1)
autoplot(ts1[,"NMD1"])
#defining the set of exogenous variables
# as.matrix() only uses its first argument, so select all columns at once
xset <- as.matrix(ts1[, c("1Y TD INTEREST RATE", "BSE", "Repo Rate", "MIBOR Rate", "1Y OIS Rate", "3M CD rate(PSU)", "2 Y GSec Rate")])
#Fitting the model
model1 <- auto.arima(ts1[,'NMD1'], xreg=xset, approximation = FALSE, allowmean = FALSE, allowdrift = FALSE)
summary(model1)
checkresiduals(model1)
fcast <- forecast(model1,xreg=xset, h=1)
print(summary(fcast))
autoplot(fcast)
My problems:
While my model seems to work fine, I don't understand what value of h I should use when forecasting. I also don't understand what frequency really means when defining a time series.
Please help.

Defining a seasonal ARIMA model in R

This is my first post as I'm struggling with coding in R.
I'm investigating power prices in the Danish electricity market and want to compare the forecasting performance of different model selection techniques on those prices. My data consists of hourly power prices within the timeframe of 2021.09.01 - 2021.11.25.
However, I'm having trouble defining my seasonally adjusted ARIMA model.
After loading the necessary R packages, I execute the following code:
expforecast <- function(yt, ar, ma, ini, n) {
  if (is.list(yt)) yt <- as.matrix(yt)
  t <- length(yt)
  n_for <- t - n - ini + 1
  e_arma <- rep(0, n_for)
  for (z in 1:n_for) {
    # expanding window: fit on the first ini + z - 1 observations
    y_t <- yt[1:(ini + z - 1)]
    model1 <- arima(y_t, order = c(ar, 0, ma), method = "ML")
    # predict() for arima fits takes n.ahead, not h
    y1_hat <- predict(model1, n.ahead = n)
    # n-step-ahead forecast error
    e_arma[z] <- yt[ini + z + n - 1] - y1_hat$pred[n]
  }
  return(e_arma)
}
El_spot <- read_excel("El-spot.xls")
View(El_spot)
NoOfHours <- as.numeric(ymd_hms("2021-11-25 00:00:00") - ymd_hms("2021-01-01 00:00:00"))*24
ymd_hms("2021-01-01 00:00:00") + hours(0:NoOfHours)
y <- ts(El_spot$c1, start = 1, frequency = 24*7)
Please bear in mind that I use a frequency of 24*7 = 168 to capture the daily and weekly seasonality (correct me if I'm wrong). With this frequency, the series covers roughly 13 weekly cycles.
Furthermore I run the following code to test for stationarity in my time-series:
decomp = stl(y, "periodic")
deseasonal_cnt <- seasadj(decomp)
plot(decomp)
adf.test(y, alternative = "stationary")
I receive a p-value of .01, so the test rejects the null hypothesis of a unit root and the series appears stationary. However, when I plot the ACF and PACF, I find a clear autocorrelation pattern.
Acf(y, main='')
Pacf(y, main='')
I hope some of you skilled people can help me with this and maybe recommend a seasonal ARIMA model to go forward with.

ets: Error in ets(timeseries, model = "MAM") : Nonseasonal data

I'm trying to create a forecast using an exponential smoothing method, but get the error "nonseasonal data". This is clearly not true - see code below.
Why am I getting this error? Should I use a different function (it should be able to perform simple, double, damped trend, seasonal, Winters method)?
library(forecast)
timelen<-48 # use 48 months
dates<-seq(from=as.Date("2008/1/1"), by="month", length.out=timelen)
# create seasonal data
time<-seq(1,timelen)
season<-sin(2*pi*time/12)
constant<-40
noise<-rnorm(timelen,mean=0,sd=0.1)
trend<-time*0.01
values<-constant+season+trend+noise
# create time series object
timeseries<-as.ts(x=values,start=min(dates),end=max(dates),frequency=1)
plot(timeseries)
# forecast MAM
ets<-ets(timeseries,model="MAM") # ANN works, why MAM not?
ets.forecast<-forecast(ets,h=24,level=0.9)
plot(ets.forecast)
Thanks & kind regards

You should simply use ts to create a time series from a numeric vector. See the help file for more details.
Your start and end values aren't correctly specified.
And a frequency of 1 is not a valid seasonality; it's the same as no seasonality at all.
Try:
timeseries <- ts(data=values, frequency=12)
ets <- ets(timeseries, model="MAM")
print(ets)
#### ETS(M,A,M)
#### Call:
#### ets(y = timeseries, model = "MAM")
#### ...
As for the question in your comments: ANN works because the third N means no seasonality, so the model can be fitted even to a non-seasonal time series.
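As a small base R sketch of the point about start and frequency (assuming the same 48 monthly values starting in January 2008 as in the question), the series can be given calendar-style start values as c(year, month). The last lines also show, as far as I know, that as.ts() silently ignores extra arguments such as frequency for a plain numeric vector, which is why ts() is the function to use here.

```r
# Sketch: monthly series with a proper seasonal frequency, base R only.
set.seed(42)
timelen <- 48
time_idx <- seq_len(timelen)
values <- 40 + sin(2 * pi * time_idx / 12) + 0.01 * time_idx +
  rnorm(timelen, mean = 0, sd = 0.1)

# start is given as c(year, period); frequency = 12 means 12 observations per year
timeseries <- ts(values, start = c(2008, 1), frequency = 12)
print(frequency(timeseries))
print(start(timeseries))
print(end(timeseries))

# as.ts() on a plain vector does not use the frequency argument
not_seasonal <- as.ts(values, frequency = 12)
print(frequency(not_seasonal))
```

A series built this way has frequency 12, so seasonal models such as ets(timeseries, model = "MAM") have a seasonal period to work with.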

arima model for multiple seasonalities in R

I'm learning to build a forecasting model for a time series that has multiple seasonalities. The dataset contains hourly data points, and I wish to include daily as well as weekly seasonality in my ARIMA model. The following is the subset of the dataset that I'm referring to:
data= c(4,4,1,2,6,21,105,257,291,172,72,10,35,42,77,72,133,192,122,59,29,25,24,5,7,3,3,0,7,15,91,230,284,147,67,53,54,55,63,73,114,154,137,57,27,31,25,11,4,4,4,2,7,18,68,218,251,131,71,43,55,62,63,80,120,144,107,42,27,11,10,16,8,10,7,1,4,3,12,17,58,59,68,76,91,95,89,115,107,107,41,40,25,18,14,15,6,12,2,4,1,6,9,14,43,67,67,94,100,129,126,122,132,118,68,26,19,12,9,5,4,2,5,1,3,16,89,233,304,174,53,55,53,52,59,92,117,214,139,73,37,28,15,11,8,1,2,5,4,22,103,258,317,163,58,29,37,46,54,62,95,197,152,58,32,30,17,9,8,1,3,1,3,16,109,245,302,156,53,34,47,46,54,65,102,155,116,51,30,24,17,10,7,4,8,0,11,0,2,225,282,141,4,87,44,60,52,74,135,157,113,57,44,26,29,17,8,7,4,4,2,10,57,125,182,100,33,27,41,39,35,50,69,92,66,30,11,10,11,9,6,5,10,4,1,7,9,17,24,21,29,28,48,38,30,21,26,25,35,10,9,4,4,4,3,5,4,4,4,3,5,10,16,28,47,63,40,49,28,22,18,27,18,10,5,8,7,3,2,2,4,1,4,19,59,167,235,130,57,45,46,42,40,49,64,96,54,27,17,18,15,7,6,2,3,1,2,21,88,187,253,130,77,47,49,48,53,77,109,147,109,45,41,35,16,13)
The code I'm trying to use is the following:
tsdata = ts (data, frequency = 24)
aicvalstemp = NULL
aicvals= NULL
for (i in 1:5) {
for (j in 1:5) {
xreg1 = fourier(tsdata,i,24)
xreg2 = fourier(tsdata,j,168)
xregs = cbind(xreg1,xreg2)
armodel = auto.arima(bike_TS_west, xreg = xregs)
aicvalstemp = cbind(i,j,armodel$aic)
aicvals = rbind(aicvals,aicvalstemp)
}
}
The cbind call in the above code fails because xreg1 and xreg2 have different numbers of rows. I even tried passing 1:length(data) to the fourier function, but that also gave me an error. If someone could fix the mistakes in the above code to produce a forecast of the next 24 hours using the ARIMA model with the minimum AIC, it would be really helpful. It would also be great if you could include data splitting in your code by creating training and testing data sets. Thanks for your help.

I don't understand the desire to fit a weekly "season" to these data, as there is no evidence for one in the data subset you provided. Also, you should really log-transform the data because they do not reflect a Gaussian process as is.
So, here's how you could fit models with some form of hourly signal.
## the data are not normal, so log transform to meet assumption of Gaussian errors
ln_dat <- log(tsdata)
## number of hours to forecast
hrs_out <- 24
## max number of Fourier terms
max_F <- 5
## empty list for model fits
mod_res <- vector("list", max_F)
## fit models with increasing Fourier terms
for (i in 1:max_F) {
  xreg <- fourier(ln_dat, K = i)
  mod_res[[i]] <- auto.arima(tsdata, xreg = xreg)
}
## table of AIC results
aic_tbl <- data.frame(F = seq(max_F), AIC = sapply(mod_res, AIC))
## number of Fourier terms in best model
F_best <- which.min(aic_tbl$AIC)
## forecast from best model (fourier() with h replaces the deprecated fourierf())
fore <- forecast(mod_res[[F_best]], xreg = fourier(ln_dat, K = F_best, h = hrs_out))
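The question also asked about splitting the data into training and test sets. Below is a minimal base R sketch of that idea, not the code above: it uses simulated hourly data instead of the posted series, a hand-written helper make_fourier (hypothetical, meant to mimic the sine/cosine pairs that fourier() produces), and a fixed AR(1) order in stats::arima instead of auto.arima. All of these are assumptions for illustration.

```r
# Sketch: hold out the last 24 hours and score a harmonic regression with AR(1) errors.
set.seed(123)
n <- 14 * 24                       # two weeks of simulated hourly data
t_idx <- seq_len(n)
x <- ts(50 + 30 * sin(2 * pi * t_idx / 24) + rnorm(n, sd = 5), frequency = 24)

# Train/test split: the last h observations form the test set
h <- 24
train <- window(x, end = time(x)[n - h])
test <- window(x, start = time(x)[n - h + 1])

# Hypothetical helper: K sine/cosine pairs for a given seasonal period
make_fourier <- function(t, period, K) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))
  out
}

X <- make_fourier(t_idx, period = 24, K = 2)
X_train <- X[seq_len(n - h), , drop = FALSE]
X_test <- X[(n - h + 1):n, , drop = FALSE]

# Fit on the training part only, then forecast the held-out hours
fit <- arima(train, order = c(1, 0, 0), xreg = X_train)
fc <- predict(fit, n.ahead = h, newxreg = X_test)

# Out-of-sample root mean squared error
rmse <- sqrt(mean((as.numeric(test) - as.numeric(fc$pred))^2))
print(rmse)
```

The same split could be combined with the AIC loop above by fitting each candidate on train only and comparing the forecasts against test.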

Time series modelling: "train" function with method "nnet" is not giving satisfactory result

I was trying to use the train function in R with nnet as the method on monthly consumption data, but the predicted values all turn out to be equal to some mean value.
I have data for 24 time points (each representing one month), and I have used the first 20 for training and the remaining 4 for testing the model. Here is my code:
a<-read.csv("...",header=TRUE)
tem<-a[,5]
hum<-a[,4]
con<- a[,3]
require(quantmod)
require(nnet)
require(caret)
y<-con
plot(con,type="l")
dat <- data.frame( y, x1=tem, x2=hum)
names(dat) <- c('y','x1','x2')
#Fit model
model <- train(y ~ x1+x2,
dat[1:20,],
method='nnet',
linout=TRUE,
trace = FALSE)
ps <- predict(model, dat[21:24,])
plot(1:24,y,type="l",col = 2)
lines(1:24,c(y[1:20],ps), col=3,type="o")
legend(5, 70, c("y", "pred"), cex=1.5, fill=2:3)
Any suggestions on how I can approach this problem differently? Is there a way to use a neural network more effectively, or is there a better method for this?

The problem is likely to be a lack of data. 24 data points is quite low for any machine learning problem. If the curve/shape/surface of the data is, e.g., a simple sine wave, then 24 would be enough.
But for any more complex function, the more data the better. Can you accurately model, e.g., sin^2 x * cos^0.3 x / sinh x with only 6 data points? No, because the available data does not capture enough detail.
If you can acquire daily data, use that instead.
