Forecasting a time series in R with 3 independent variables

I have a time series of revenue and need to forecast revenue for 3 years.
My dependent variable is Revenue, and my independent variables are GDP, company wealth, and the S&P 500 index.
How should I go about it?
Can a simple linear regression model work?

Look into the forecast package in R. If your time series is x and your independent variables are in a matrix mat, you can use the auto.arima() function to automatically fit an ARIMA model with covariates (a regression with ARIMA errors). Note that forecasting from such a model requires future values of the regressors.
library(forecast)
mod <- auto.arima(x, xreg = mat)
# Forecast 12 periods ahead; future values of the regressors must be supplied
# (here future_mat is a 12-row matrix with the same columns as mat)
forecast(mod, xreg = future_mat, h = 12)
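For the original question (forecasting revenue 3 years ahead from GDP, company wealth, and the S&P 500), a minimal sketch could look like the following; the variable names and the annual frequency are assumptions, and the future regressor values must come from external forecasts or scenarios.
library(forecast)
# Assumed annual series of equal length; names are illustrative
revenue <- ts(rev_vec, start = 2005, frequency = 1)
mat <- cbind(GDP = gdp_vec, Wealth = wealth_vec, SP500 = sp500_vec)
mod <- auto.arima(revenue, xreg = mat)
# Three years ahead: supply 3 future rows of the regressors
# (scenario values or separate forecasts of GDP, wealth and the index)
future_mat <- cbind(GDP = gdp_future, Wealth = wealth_future, SP500 = sp500_future)
fc <- forecast(mod, xreg = future_mat)
plot(fc)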

Related

Is there a way to do Dynamic Harmonic Regression in R using multiple variables?

The company I work for would like to forecast weekly transactions, given a certain weekly sales budget (i.e. predicted weekly sales) for a period of time.
We are a highly seasonal business, particularly around holidays, which do not always have a fixed recurring date, e.g. Easter.
ETS smoothing didn't seem applicable, and ARIMA modelling handles long seasonal periods poorly (e.g. daily/weekly data rather than monthly/quarterly) and also struggles with a non-integer seasonal period (i.e. 52 weeks in some years, 53 in others).
My best bet seemed to be Dynamic Harmonic Regression with ARIMA errors; however, I don't know how (or whether it is possible) to do this with multiple variables, specifically Sales and Transactions, but potentially also dummy variables representing holidays. Is this possible?
Datacamp has the following R code demonstrating Multivariate Dynamic Regression:
# Matrix of regressors
xreg <- cbind(MaxTemp = elec[, "Temperature"],
              MaxTempSq = elec[, "Temperature"]^2,
              Workday = elec[, "Workday"])
# Fit model
fit <- auto.arima(elec[,"Demand"], xreg = xreg)
# Forecast fit one day ahead
forecast(fit, xreg = cbind(20, 20^2, 1))
It also has the following code demonstrating Dynamic Harmonic Regression:
# Set up harmonic regressors of order 13
harmonics <- fourier(gasoline, K = 13)
# Fit regression model with ARIMA errors
fit <- auto.arima(gasoline, xreg = harmonics, seasonal = FALSE)
# Forecasts next 3 years (weekly data)
newharmonics <- fourier(gasoline, K = 13, h = 156)
fc <- forecast(fit, xreg = newharmonics)
Is it possible to combine these two methodologies? That is, Multivariate Dynamic Harmonic Regression? Or is there perhaps a different way to go about tackling my particular problem?
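A note on combining the two: in principle the Fourier terms and the other regressors can simply be bound into one xreg matrix, giving a dynamic harmonic regression with additional covariates. A minimal sketch under assumed names (transactions is a weekly ts, sales a matching regressor series, easter_dummy a 0/1 holiday indicator, and their *_future counterparts hold values for the forecast horizon):
library(forecast)
# Fourier terms for the weekly seasonality plus the other regressors in one matrix
harmonics <- fourier(transactions, K = 6)
xreg <- cbind(harmonics, Sales = sales, Easter = easter_dummy)
fit <- auto.arima(transactions, xreg = xreg, seasonal = FALSE)
# Future Fourier terms plus assumed future values of the other regressors
h <- 52
newxreg <- cbind(fourier(transactions, K = 6, h = h),
                 Sales = sales_future, Easter = easter_future)
fc <- forecast(fit, xreg = newxreg)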

How to remove seasonality and trend from GDP time series data in R

I am doing a time series analysis to forecast GDP for the next few years, and to get a good forecasting model I need to remove the trend and the seasonality.
I have used seasonally adjusted data, but it did not completely remove the trend and seasonality. I am using the multiplicative method to remove them.
Seasonally adjusted GDP:
decompose_GDP <- decompose(GDP, 'multiplicative')
adjustGDP <- GDP / decompose_GDP$seasonal
plot(adjustGDP)
Does anyone know of any other method to remove trend and seasonality from the time series?
You can try a categorical variable for the seasons plus splines for time. For example, the model can be GDP_t = X_t β + ε_t, where X contains indicator variables for the seasons and also splines for time (you can give them a specific degree of freedom). The GDP with seasonality and time trend removed is then the residuals of this model. The code can be as follows.
## Fit a model with season and time variables (ns() comes from the splines package)
library(splines)
model1 <- lm(gdp ~ cat_season + ns(time, df = n))  # n = chosen degrees of freedom
## Extract the GDP with season and time trend removed (the residuals)
GDP_withouttrend <- resid(model1)
## Plot the detrended, deseasonalised GDP
plot(GDP_withouttrend)
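For concreteness, here is a sketch of how the inputs above might be built from a quarterly GDP series; GDP is assumed to be a ts object with frequency 4, and the names and df value are illustrative.
library(splines)
gdp <- as.numeric(GDP)
time <- seq_along(gdp)               # simple time index
cat_season <- factor(cycle(GDP))     # quarter (1-4) as a categorical variable
model1 <- lm(gdp ~ cat_season + ns(time, df = 4))
plot(resid(model1), type = "l")      # GDP with seasonality and trend removed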
In case it may be useful for those who still read this old question: there are many R functions and packages (e.g., stl() in base R) available to decompose a time series into seasonality, trend, and remainder. I believe what is being looked for here is the remainder component. Here I use a package called Rbeast, developed by me, as an example. Rbeast does time series decomposition and changepoint detection at the same time.
library(Rbeast)
Y = covid19$newcases  # covid19 is a daily time series that ships with the Rbeast package
# the 'seasonality'/periodicity is 7 days (a week)
o = beast(Y)
plot(o)
remainder = Y - o$trend$Y - o$season$Y
library(Rbeast)
Ylog = log(covid19$newcases + 0.001)  # treat covid19 with a multiplicative decomposition model
# the first data point is 0; adding 0.001 nudges it slightly to avoid -Inf from the log
o = beast(Ylog, freq = 7)
plot(o)
remainder = exp(Ylog - o$trend$Y - o$season$Y)
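Alternatively, a minimal sketch using the stl() function mentioned above (GDP is assumed to be a quarterly ts object; for a multiplicative structure, apply it to log(GDP)):
fit_stl <- stl(GDP, s.window = "periodic")        # seasonal-trend decomposition using loess
remainder <- fit_stl$time.series[, "remainder"]   # series with trend and seasonality removed
plot(remainder)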

Forecast using auto.arima with exogenous variables

I would love to be able to use exogenous variables to help the ARIMA forecast, but I run into issues every way I try to use variables other than the one I am trying to forecast.
I would also love for the plot to look nicer than the default R output.
Error in auto.arima(datats1$Slots, seasonal = TRUE, xreg = datats1ts) :
  xreg is rank deficient
None of the usual causes of a rank-deficient data frame or matrix seem to apply: there are no linear combinations in the dataset.
# Load data
library(forecast)
datats1 <- read.csv("ProjectTS2.CSV")  # time series I want to forecast
xreg <- read.csv("ProjectTS4.CSV")     # data I want to use as exogenous regressors
datats1$Slots <- ts(datats1$slots, start = 2015, frequency = 365)
dfTS <- as.matrix(ts(xreg))
new <- auto.arima(datats1$Slots, seasonal = TRUE, xreg = dfTS)
# note: forecasting a model fitted with xreg also needs 30 future rows of regressors
seas_fcast <- forecast(new, h = 30)
ts.plot(seas_fcast, xlim = c(2018, 2018.2))
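Not a full answer, but a quick diagnostic sketch for the error itself: auto.arima() reports "rank deficient" when the columns of xreg are not linearly independent (for example a constant column, a duplicated column, or one column that equals a combination of others), so comparing the numerical rank of the matrix with its number of columns can help locate the problem.
# Compare the numerical rank of the regressor matrix with its column count
qr(dfTS)$rank
ncol(dfTS)
# Constant or duplicated columns are common culprits
apply(dfTS, 2, function(col) length(unique(col)))  # 1 indicates a constant column
cor(dfTS)                                          # values of +/-1 flag duplicated columns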

forecasting univariate time series with svm in R

I have a time series of 100 observations. I fit an SVM regression model and want to predict the value for y_{101}, but when I do it in R it returns 100 predictions (one for each training observation). How can I obtain a one-step-ahead forecast?
require(e1071)
y <- cumsum(rnorm(100))
x <- 1:100
DF <- data.frame(x = x, y = y)
aa <- svm(DF$y ~ DF$x, type = "eps-regression", kernel = "radial", cost = 10000, gamma = 10)
predict(aa, newdata = 101)
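One likely fix, sketched below: fit the model through the formula interface with a data argument (y ~ x, data = DF) rather than DF$y ~ DF$x, so that predict() can match a new data frame containing a column named x and return a single value.
library(e1071)
set.seed(1)
DF <- data.frame(x = 1:100, y = cumsum(rnorm(100)))
# The formula plus data = DF lets predict() find the predictor by name
fit <- svm(y ~ x, data = DF, type = "eps-regression",
           kernel = "radial", cost = 10000, gamma = 10)
predict(fit, newdata = data.frame(x = 101))  # single one-step-ahead prediction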

Back Test ARIMA model with Exogenous Regressors

Is there a way to create a holdout/back-test sample for the following ARIMA model with exogenous regressors? Let's say I want to estimate the model using the first 50 observations and then evaluate its performance on the remaining 20 observations, where the x-variables are pre-populated for all 70 observations. What I really want at the end is a graph that plots actual and fitted values in the development period and the validation/holdout period (also known as back testing in time series).
library(TSA)
xreg <- cbind(GNP, Time_Scaled_CO)  # two time series objects
fit_A <- arima(Charge_Off, order = c(1, 1, 0), xreg = xreg)  # Charge_Off is another ts object
plot(Charge_Off, col = "red")
lines(predict(fit_A, Data), col = "green")  # Data contains Charge_Off, GNP, Time_Scaled_CO
You don't seem to be using the TSA package at all, and you don't need to for this problem. Here is some code that should do what you want.
library(forecast)
xreg <- cbind(GNP, Time_Scaled_CO)
training <- window(Charge_Off, end=50)
test <- window(Charge_Off, start=51)
fit_A <- Arima(training,order=c(1,1,0),xreg=xreg[1:50,])
fc <- forecast(fit_A, h=20, xreg=xreg[51:70,])
plot(fc)
lines(test, col="red")
accuracy(fc, test)
See http://otexts.com/fpp/9/1 for an intro to using R with these models.
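Since the question also asks for actual versus fitted values in the development period, one option (a sketch building on the code above) is to overlay the in-sample fitted values on the forecast plot:
plot(fc)                            # point forecasts and intervals for the hold-out period
lines(fitted(fit_A), col = "blue")  # in-sample fitted values (development period)
lines(test, col = "red")            # actuals in the hold-out period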
