forecast() function in R: how does it work?

I have a question about the forecast() function from the forecast package.
I am using this function to forecast the closing price of a stock with an ARIMAX model (i.e. with xreg). My question is: when forecasting, does the closing price at time t depend on the external regressors at time t-1, or on the external regressors at time t?
In other words, today I don't yet know today's high price (for example), so today's closing price cannot depend on today's high price, only on yesterday's.
Does this function work like that, or in a different way?
I hope I have been clear. Thanks!

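Yes, you can set it up to work like this. Note, though, that forecast() does not lag anything for you: the model is a regression with ARIMA errors, and the value in row t of xreg is paired with observation t of the series. So if you want yesterday's value to explain today's, you have to build the lag yourself:
1. Lag the regressor, so that yesterday's value sits in today's row.
2. Drop the rows without a regressor value (the first observation has no lagged value, since its value is used for the second observation).
3. Build the regressor for the prediction period.
4. Fit the model and predict.
Below is an example I put together from a few sources. It shows how this can be done, and should also explain how forecast() handles the regressor in your case: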
library(quantmod)
library(forecast)
library(dplyr)

# get some finance data to play with (creates an xts object called AAPL)
quantmod::getSymbols("AAPL", from = "2017-01-01", to = "2018-03-01",
                     warnings = FALSE, auto.assign = TRUE)

# I prefer working with a data frame and converting to ts objects later
new_AAPL <- as.data.frame(AAPL) %>%
  # keep the close values and lag the high values by one day,
  # so that row t holds today's close and yesterday's high
  dplyr::transmute(AAPL.Close,
                   AAPL.High = dplyr::lag(AAPL.High)) %>%
  # keep only complete rows (the first row has no lagged high)
  dplyr::filter(dplyr::if_all(dplyr::everything(), ~ !is.na(.x)))

# set up the series and the regressor; trading days are irregular,
# so a plain observation index is used instead of calendar dates
AAPL.Close <- ts(new_AAPL$AAPL.Close)
AAPL.High  <- ts(new_AAPL$AAPL.High)

# future regressor: the last observed (unlagged) high value,
# i.e. "yesterday's" high for the day being forecast
AAPL.futureg <- tail(as.numeric(AAPL$AAPL.High), 1)

# fit an ARIMA model with the lagged high as external regressor
modArima <- forecast::auto.arima(AAPL.Close, xreg = AAPL.High)

# one-step-ahead forecast with the future regressor
forecast::forecast(modArima, h = 1, xreg = AAPL.futureg)
Here is where I got the information from:
https://www.codingfinance.com/post/2018-03-27-download-price/
https://stats.stackexchange.com/questions/41070/how-to-setup-xreg-argument-in-auto-arima-in-r
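The alignment rule can also be checked without downloading any data. The following is a base-R sketch, not the method above: it uses stats::arima (on which forecast's Arima() is built) and simulated prices in which the close depends on the previous day's high. All names and numbers are illustrative assumptions.

```r
set.seed(123)
n <- 300

# Simulated "high" price: a random walk around 100 (illustrative data only)
high_px <- 100 + cumsum(rnorm(n))

# Simulated "close": driven by YESTERDAY's high plus AR(1) noise
noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
close_px <- c(NA, 20 + 0.8 * high_px[-n] + noise[-1])

# Step 1: lag the regressor so that row t holds high_px[t - 1]
high_lag <- c(NA, high_px[-n])

# Step 2: drop the first row, which has no lagged value
keep <- !is.na(high_lag)
y_close <- close_px[keep]
x_high_lag <- high_lag[keep]

# Step 3: regression with AR(1) errors; arima() pairs x_high_lag[t] with y_close[t]
fit <- arima(y_close, order = c(1, 0, 0), xreg = x_high_lag)
coef(fit)

# Step 4: the regressor for tomorrow's forecast is TODAY's (last observed) high
pred <- predict(fit, n.ahead = 1, newxreg = high_px[n])
pred$pred
```

Because the lag was built into the data before fitting, the estimated coefficient on x_high_lag should come out close to the simulated 0.8, and the one-step forecast only needs a high value that is already known.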

Related

Hierarchical forecasting of time series including missing values (R)

I am trying to forecast a hierarchical time series including missing values.
I expect the same behavior as auto.arima gives for a single time series:
the missing values should not influence the result, and they should not be displayed either.
fit = ts %>% auto.arima()
forecast(fit, h=20) %>% autoplot()
But when I forecast the hierarchical time series, the NAs are automatically replaced by 0.
This changes the results dramatically.
Both of the following calls give the same output:
hts_fc <- forecast(object = hts
, h = 20
, fmethod = "arima"
)
hts_fc <- forecast(object = hts
, h = 20
, FUN = auto.arima
)
plot(hts_fc)
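For a single series, the difference between skipping NAs and filling them with 0 can be shown in base R. This sketch uses stats::arima (on which forecast's Arima() and auto.arima() are built) rather than the hts package, with simulated data: the Kalman-filter likelihood simply skips the NAs, whereas zero-filling distorts the fit.

```r
# Simulate an AR(1) series around a level of 50 (illustrative data only)
set.seed(42)
y <- 50 + arima.sim(model = list(ar = 0.6), n = 120)

# Punch a gap of missing values into the middle of the series
y_na <- y
y_na[40:55] <- NA

# Fit 1: NAs left in place; exact ML via the Kalman filter skips them
fit_na <- arima(y_na, order = c(1, 0, 0), method = "ML")

# Fit 2: NAs replaced by 0, mimicking what the hierarchical forecast does
y_zero <- y_na
y_zero[is.na(y_zero)] <- 0
fit_zero <- arima(y_zero, order = c(1, 0, 0), method = "ML")

# Compare the estimates and the next 5 forecasts
coef(fit_na)
coef(fit_zero)
pred_na <- predict(fit_na, n.ahead = 5)$pred
pred_zero <- predict(fit_zero, n.ahead = 5)$pred
```

The zero-filled fit treats the block of zeros as real observations, so it reports a much more persistent AR term and a distorted mean, while the fit with NAs stays close to the simulated values.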

ARIMA Issues in RStudio - ARIMA for Stocks

This is my first post on this platform. I'm a student in Business Administration, so please be patient with my beginner questions.
I'm currently building ARIMA models for some stocks, specifically for their closing prices. However, when I plot the forecasts, all I get is a straight line with a little bit of drift, nothing more: no clear patterns, no ups and downs in the forecast.
I'm not sure whether I made a mistake.
# package names must be quoted
install.packages(c("quantmod", "tseries", "timeSeries", "forecast",
                   "MASS", "ggplot2", "zoo", "xts"))
library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(MASS)
library(ggplot2)
library(zoo)
library(xts)
# load data
energy = getSymbols(Symbols = "XLES.L", auto.assign = FALSE, from = "2015-01-01", to = "2020-01-01")
# remove NAs
energy <- na.omit(energy$XLES.L.Close)
plot(energy)
# create TS
ts <- ts(energy, start = c(2015,01), frequency = 252)
plot(ts) #does not seem stationary
# check for stationarity
adf.test(ts) # --> not stationary, differencing required
#Create Arima Model
arima <- auto.arima(ts, d = 1)
arima
# Create Forecast (Out-Of-Sample for 20days/1month)
forecast_energy <- forecast(arima, h = 20)
plot(forecast_energy)
plot(forecast_energy, include = 50)
My questions are:
Why is it a straight line?
Is it necessary to create a time series with ts(), given that the imported data is already a time series (or is it not)?
Is what I did correct?
Here is the printed model:
> print(arima)
Series: ts
ARIMA(2,1,0)
Coefficients:
         ar1      ar2
      0.0125  -0.0502
s.e.  0.0283   0.0283
sigma^2 estimated as 20.19: log likelihood=-3682.99
AIC=7371.98 AICc=7372 BIC=7387.4
Can someone please help me :)
Best regards
Noob
An example of a simple signal that is able to break auto.arima:
library(forecast)
set.seed(1)
mynoise <- rnorm(252*5, 0, sd = 100) # high short-term noise, not integrated
mytrend <- 1:(252*5)                 # long-term trend
mysignal <- mynoise + mytrend
mymodel <- auto.arima(mysignal)
plot(forecast(mymodel, h = 50))
The differenced signal is u = 1 + e - lag(e), and likewise lag(u) = 1 + lag(e) - lag2(e).
So u is an MA(1) process with moving-average coefficient -1 plus a constant drift of 1, and that drift is small compared with the noise (sd = 100).
The process is therefore likely to be identified as an ARIMA(0,1,1) whose drift of 1 does not look very significant, and auto.arima estimates an ARIMA(0,1,1) with the moving-average term around -1.
That is not a total failure: it is decent for short-term predictions, but it makes silly long-term predictions.
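The algebra can be checked numerically with a base-R sketch that rebuilds the same signal (same seed, same construction). Differencing gives u_t = 1 + e_t - e_(t-1), an MA(1) with coefficient theta = -1, whose lag-1 autocorrelation is theta / (1 + theta^2) = -0.5. Because the differences telescope, the mean of u is exactly 1 + (e_n - e_1) / (n - 1).

```r
set.seed(1)
n <- 252 * 5
e <- rnorm(n, 0, sd = 100)   # same draws as mynoise above
y <- e + 1:n                 # same construction as mysignal

u <- diff(y)                 # u_t = 1 + e_t - e_(t-1)

# Sample lag-1 autocorrelation (element 1 of the acf array is lag 0)
r1 <- acf(u, lag.max = 3, plot = FALSE)$acf[2]
r1          # theory for an MA(1) with theta = -1: -0.5

# Estimated drift: telescopes to 1 + (e_n - e_1) / (n - 1)
mean(u)
```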
You are getting a straight-line forecast because your model does not find any seasonality or other exploitable pattern in the data. When that happens, the point forecasts settle at a constant level: for a differenced model without drift, such as your ARIMA(2,1,0), that level is close to the last observed value, and for a stationary model it is the series mean. That is why you get a straight line.
It is very difficult for a model to forecast accurately without clear seasonality or trend in the historical data.
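To see why the line goes flat, the ARIMA(2,1,0) forecast recursion can be iterated by hand with the coefficients printed in the question. With no drift term, the forecast differences follow d_(n+h) = ar1 * d_(n+h-1) + ar2 * d_(n+h-2), which die out within a few steps when the coefficients are this small. The last three prices below are hypothetical.

```r
# AR coefficients from the printed ARIMA(2,1,0) fit in the question
phi <- c(0.0125, -0.0502)

# Hypothetical last three closing prices (illustrative only)
y_last <- c(100.0, 101.5, 100.8)

h <- 20
# Two observed differences, followed by h slots for forecast differences
d <- c(diff(y_last), numeric(h))
for (k in 1:h) {
  i <- 2 + k   # position of the k-th forecast difference
  d[i] <- phi[1] * d[i - 1] + phi[2] * d[i - 2]
}

# Undo the differencing: cumulate forecast differences onto the last price
fc <- tail(y_last, 1) + cumsum(d[3:(2 + h)])
round(fc, 4)
```

After the first two or three steps the forecast differences are practically zero, so the plotted forecast is a horizontal line near the last observed price.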

LSTM time series forecasting, predictions stabilize

My code is in R, using the Keras and TensorFlow libraries. I'm creating an LSTM model to forecast 100 future values. My input shape is (100,200,1).
Let's say my input data is X. I make a prediction at time step t=201 and get the column Y of predictions. Then I create a new variable, Xnew = c(X[2:200],Y), by concatenating X (without its first column) with Y, and I use Xnew to predict the next time step.
What happens is that after a certain number of predicted time steps (around 15), the predictions become constant for every subsequent time step. Does anyone know why this happens?
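prdvec = function(dat, modname, numpreds, cnt, scl) {
  model = load_model_hdf5(modname)   # keras: load the trained model
  inpt = dat
  pred = list()
  for (i in 1:numpreds) {
    # scale the current window, reshape it to 3D (reshape_X_3d is the
    # poster's own helper, not shown) and predict one step ahead
    pred[[i]] <- predict(model, reshape_X_3d((inpt - cnt) / scl), batch_size = 1)
    # drop the oldest column and append the unscaled prediction
    inpt = cbind(inpt[, 2:ncol(inpt)], (pred[[i]] * scl + cnt))
    print(i)
    flush.console()
  }
  pred
}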
I have run into a similar problem. Perhaps when the LSTM is fed inputs it generated itself, its predictions tend to stabilize.
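One possible mechanism can be shown with a toy stand-in. The sketch below replaces the trained network with a hypothetical linear one-step predictor (an assumption: a real LSTM is nonlinear) and runs the same rolling-window recursion as prdvec(). When the predictor is contractive on its own outputs, the recursion converges to a fixed point, and the predictions become constant.

```r
# Hypothetical one-step predictor standing in for the trained network:
# a linear map of the last 5 inputs whose weights sum to less than 1
weights <- c(0.05, 0.1, 0.15, 0.2, 0.3)   # sum = 0.8
bias <- 2
one_step <- function(window) sum(weights * window) + bias

# Rolling-window recursion, same pattern as prdvec(): drop the oldest
# value, append the newest prediction, predict again
window <- c(3, 15, 7, 12, 9)
preds <- numeric(200)
for (i in seq_along(preds)) {
  preds[i] <- one_step(window)
  window <- c(window[-1], preds[i])
}

# Fixed point p solves p = sum(weights) * p + bias
fixed_point <- bias / (1 - sum(weights))   # 10
round(tail(preds, 3), 6)
```

Once the window is filled with the model's own outputs, no new information enters, so any predictor that shrinks deviations will drift to a constant.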

Imputed predictions for missing time-series data nearly stationary (flat line)

I have player-count data over time, but the counts are missing for several years. I'm trying to fill in (predict) the missing player counts over different intervals.
Data available here: https://1drv.ms/u/s!AvEZ_QPY7OZuhJAlKJN89rH185SUhA
I'm following the instructions below, which use KalmanRun to impute the missing values. I've tried three different ways of transforming the data: using an xts object, and two ways of converting it into a ts object.
https://stats.stackexchange.com/questions/104565/how-to-use-auto-arima-to-impute-missing-values
require(forecast)
library(xts)
library(anytime)
library(DescTools)
df_temp = read.csv("r_share.csv")
df_temp[['DateTime']] <- as.Date(strptime(df_temp[['DateTime']], format='%Y-%m-%d %H:%M:%S'))
Three approaches to converting the data; the xts one seems to work best, as it returns non-zero values that are interpretable.
#Convert df_temp to TimeSeries object
df_temp = xts(df_temp$Players, df_temp$DateTime)
#df_temp = as.ts(log(df_temp$Players), start = start(df_temp$DateTime), end = end(df_temp$DateTime), frequency = 365)
#df_temp = ts(df_temp$Players, start = c(2013, 02, 02), end = c(2016, 01, 31), frequency = 365)
Fitting and plotting:
fit <- auto.arima(df_temp, seasonal = TRUE)
id.na <- which(is.na(df_temp))
kr <- KalmanRun(index(df_temp), fit$model, update = FALSE)
#?KalmanRun$tol
for (i in id.na)
df_temp[i] <- fit$model$Z %*% kr$states[i,]
plot(df_temp)
The expected output is data that mimics the variability seen in the actual data and is different for each interval, whereas the actual output is relatively stationary and unchanging (both intervals have nearly the same prediction).
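One thing worth checking in the code above: the first argument of KalmanRun() is the observed series, but index(df_temp) passes the dates instead of the player counts. The following base-R sketch imputes gaps with the Kalman smoother on simulated data (an AR(1) stand-in for the player counts; makeARIMA() and KalmanSmooth() are from stats). Note that Kalman imputations are conditional means given the data on both sides of each gap, so they are always smoother than the real series and will not reproduce its variability.

```r
set.seed(7)
n <- 200

# Simulated stand-in for the player counts: AR(1) around 500 (illustrative only)
y <- 500 + arima.sim(model = list(ar = 0.8), n = n, sd = 20)

# Two gaps of missing values
y_obs <- as.numeric(y)
y_obs[c(60:75, 140:150)] <- NA

# ML fit; the Kalman-filter likelihood skips the NAs
fit <- arima(y_obs, order = c(1, 0, 0), method = "ML")
mu  <- coef(fit)[["intercept"]]
phi <- coef(fit)[["ar1"]]

# Fresh state-space form of the fitted AR(1), started from its stationary distribution
mod <- makeARIMA(phi = phi, theta = numeric(0), Delta = numeric(0))

# Kalman smoother on the demeaned data (the observed values, not their dates)
sm <- KalmanSmooth(y_obs - mu, mod)

# Fill the gaps with the smoothed state (the observation is the state itself here)
miss <- is.na(y_obs)
filled <- y_obs
filled[miss] <- mu + sm$smooth[miss, 1]
```

Inside each gap the filled values bend from the last observation before the gap toward the mean and back to the first observation after it, which is why two different gaps can look similarly flat.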
Does it need to be an arima() model?
Maybe you could try another model, Prophet, developed by Facebook.
Its guide and GitHub page are a good place to start.
If I understood correctly, you want something like this:
# Import library
library(prophet)
# Read data
df = read.csv("C:/Users/Downloads/r_share.csv",sep = ";")
# Transform to date
df["DateTime"] = as.Date(df$DateTime,format = "%d/%m/%Y")
# Change names for the model
colnames(df) = c("ds","y")
# call model
m = prophet(df)
# extend the dates one day beyond the last observation
future = make_future_dataframe(m,periods = 1)
# predict the points
forecast = predict(m,future)
# plot results
plot(m,forecast)

How to Create a R TimeSeries for Hourly data

I have hourly snapshots of an event from 2012-05-15 07:00 to 2013-05-17 18:00. How can I create a time series from this data and apply HoltWinters to it?
I tried the following:
EventData<-ts(Eventmatrix$X20030,start=c(2012,5,15),frequency=8000)
HoltWinters(EventData)
But I got this error: Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : time series has no or less than 2 periods
What value should I use for frequency?
I think you should consider using ets from the forecast package to perform exponential smoothing. Read this post for a comparison between HoltWinters and ets.
require(xts)
require(forecast)
time_index <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by = "hour")
set.seed(1)
value <- rnorm(n = length(time_index))
eventdata <- xts(value, order.by = time_index)
ets(eventdata)
If you want to know more about the syntax of ets, check the function's help page and Rob Hyndman's online book (chapter 7, section 6).
Please take a look at the following post, which might answer the question:
Decompose xts hourly time series
It explains how you can create an xts object using POSIXct objects. This xts object can have its frequency attribute set manually, and you will probably then be able to use HoltWinters.
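On the frequency question itself: frequency is the number of observations per seasonal cycle, so hourly data with a daily cycle uses 24. HoltWinters needs at least two full cycles, and with frequency = 8000 that would take 16,000 hourly observations, roughly twice what a year of hourly data provides. A minimal base-R sketch on the same 60-hour window as the ets example, with simulated values (the daily pattern is an assumption for illustration):

```r
# Hourly timestamps for the 60-hour window used in the ets example
time_index <- seq(from = as.POSIXct("2012-05-15 07:00", tz = "UTC"),
                  to   = as.POSIXct("2012-05-17 18:00", tz = "UTC"),
                  by   = "hour")

# Simulated values with a daily (24-hour) pattern plus noise: illustrative only
set.seed(1)
hour_of_day <- as.integer(format(time_index, "%H"))
value <- 10 + 2 * sin(2 * pi * hour_of_day / 24) + rnorm(length(time_index), sd = 0.3)

# frequency = 24: one seasonal cycle per day of hourly observations.
# start = c(1, 8): day 1, 8th hourly slot (07:00 if 00:00 is slot 1)
event_ts <- ts(value, start = c(1, hour_of_day[1] + 1), frequency = 24)

# Needs at least 2 full cycles (48 observations); here there are 60
hw <- HoltWinters(event_ts)
hw_pred <- predict(hw, n.ahead = 24)   # forecast the next day
```

For the full series in the question, the same ts(..., frequency = 24) call applies; a weekly cycle would instead need frequency = 168.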
