Holt-Winters for weekly volume and errors in R

I'm trying to use Holt-Winters and the prediction function on the weekly volume of a stock index over the last 10 years, but I keep getting an error. Can you help me, please?
This is what I'm trying to do now:
volumen<-read.csv(file.choose(), header = TRUE, sep = ";")
lines(volumen[,6])
HoltWinters(volumen)
This is the error I get on the third line:
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) :
the time series has no periods or has less than 2
For prediction I have the code below, but it cannot work while the previous error remains:
lines(predict(volumen.hw,n.ahead=12),col=2)
The data looks correct in RStudio. I decided to use file.choose() to make this code more universal, and I am reading a *.csv file. Could someone advise what the code should look like to apply the Holt-Winters method and produce a prediction?

It's hard to be 100% sure, but
HoltWinters(lynx)
generates the same message as you are getting, whereas
HoltWinters(lynx, gamma = FALSE)
generates
Holt-Winters exponential smoothing with trend and without seasonal component.
Call: HoltWinters(x = lynx, gamma = FALSE)
Smoothing parameters:
alpha: 1
beta : 0
gamma: FALSE
Coefficients:
  [,1]
a 3396
b   52
Which I learned from reading the examples in the HoltWinters documentation.
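The message comes from the seasonal initialisation: HoltWinters() calls decompose(), which needs a ts object whose frequency is greater than 1 and which covers at least two full periods. A data frame, or a series with frequency 1 such as lynx, does not qualify. A minimal sketch on the built-in monthly co2 series (used here as a stand-in, not the asker's data) shows both the failure and the fix:

```r
# Built-in monthly CO2 series: a ts object with 12 observations per year.
frequency(co2)

# Dropping the seasonal period reproduces the kind of error in the question.
flat <- ts(as.numeric(co2), frequency = 1)
msg <- tryCatch(
  { HoltWinters(flat); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# With the period kept, the seasonal model fits and can be used to predict.
fit <- HoltWinters(co2)
pred <- predict(fit, n.ahead = 12)
length(pred)
```

For weekly data the same idea would presumably mean building the series with ts(..., frequency = 52) from a single numeric column before calling HoltWinters().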

First of all, it would be nice if you posted your data here (if it is not private).
Secondly, as far as I know, you can only apply HoltWinters() or any other method in the forecast package to a vector or a time series, so passing the entire dataset (volumen) without selecting a column could cause this problem.
Finally, I recommend fitting Holt-Winters to an auxiliary series that contains only the data you want to study, and specifying the frequency of the time series:
n_train <- floor(0.9 * nrow(volumen))
# your_frequency is a placeholder, e.g. 52 for weekly data with a yearly cycle
aux_train <- ts(volumen$variable[1:n_train], frequency = your_frequency)
# number_of_forecasts is a placeholder for the forecast horizon
prediction <- forecast(HoltWinters(aux_train), h = number_of_forecasts)
accuracy(prediction, volumen$variable[-(1:n_train)])

I have finally won this battle: I deleted my code and started from scratch. Here is what I came up with:
dane2<-read.csv2(file.choose(), header = TRUE, sep = ";", dec=",")
dane2 <-ts(dane2[,5], start=c(2008,1),frequency=52)
past <- window(dane2, end = 2017)
future <- window(dane2, start = 2017)
model <- HoltWinters(past, seasonal = "additive")
model2 <- HoltWinters(past, seasonal = "multiplicative")
pred <- predict(model, n.ahead = 52)
pred2 <- predict(model2, n.ahead = 52)
dane2.hw<-HoltWinters(dane2)
predict(dane2.hw,n.ahead=52)
par(mfrow = c(2,1))
plot(model, predicted.values = pred)
lines(future, col="blue")
plot(model2, predicted.values = pred2)
lines(future, col="blue")
Now it works, so thank you for your answers.
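To put a number on how well such a model predicts the held-out part, the forecasts can be compared with the observations that were set aside. The following is only a sketch on the built-in monthly co2 series, not on the asker's file; the split dates are arbitrary. Note that it uses non-overlapping windows, whereas end = 2017 and start = 2017 in the code above appear to share one observation.

```r
# Split a built-in monthly series into non-overlapping training and test windows.
past   <- window(co2, end = c(1990, 12))
future <- window(co2, start = c(1991, 1))

# Fit on the training window only, then forecast the length of the test window.
model <- HoltWinters(past, seasonal = "additive")
pred  <- predict(model, n.ahead = length(future))

# Root mean squared error of the forecasts against the held-out values.
rmse <- sqrt(mean((as.numeric(future) - as.numeric(pred))^2))
rmse
```

Running the same comparison for the additive and multiplicative fits would give a simple basis for choosing between them.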

Related

How to decide the frequency while using the forecast function in R?

I have a series of daily data from 01-01-2014 to 31-01-2022. I want to predict the next 30 days. I am using auto.arima and it has some exogenous variables attached.
Here's the code:
datax$NMD1<-(datax$NMD1/1000000000)
#Here to make an Arima series out of NMD 1. Exogenous variables here.
ts1<- ts(datax, frequency = 1)
class(ts1)
colnames(ts1)
autoplot(ts1[,"NMD1"])
#defining the set of exogenous variables
#(as.matrix() only converts its first argument, so select all columns in one call)
xset<- as.matrix(ts1[, c("1Y TD INTEREST RATE", "BSE", "Repo Rate", "MIBOR Rate", "1Y OIS Rate", "3M CD rate(PSU)", "2 Y GSec Rate")])
#Fitting the model
model1 <- auto.arima(ts1[,'NMD1'], xreg=xset, approximation = FALSE, allowmean = FALSE, allowdrift = FALSE)
summary(model1)
checkresiduals(model1)
fcast <- forecast(model1,xreg=xset, h=1)
print(summary(fcast))
autoplot(fcast)
My problems:
While my model seems to work fine, I don't understand what value of h I should use when forecasting. I also don't understand what frequency really means when defining a time series.
Please help.
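As a rough guide, frequency is the number of observations per seasonal cycle (for daily data, 7 for a weekly pattern or 365.25 for a yearly one), and h is the number of future periods to predict. When a model has regressors, the horizon is in practice set by how many future rows of regressors are supplied; the forecast package documents that h is ignored when xreg is given. The sketch below uses base R's arima() and predict() rather than auto.arima() and forecast(), with simulated data and a hypothetical regressor named rate:

```r
set.seed(1)
n <- 70                                   # ten weeks of simulated daily data
x <- matrix(rnorm(n), ncol = 1,
            dimnames = list(NULL, "rate"))  # hypothetical regressor
y <- ts(2 + 0.5 * x[, 1] + rnorm(n), frequency = 7)

# Seven observations make up one seasonal cycle (a week).
frequency(y)

fit <- arima(y, order = c(1, 0, 0), xreg = x)

# To predict 30 days ahead, 30 future rows of the regressor are needed.
h <- 30
future_x <- matrix(rnorm(h), ncol = 1, dimnames = list(NULL, "rate"))
fc <- predict(fit, n.ahead = h, newxreg = future_x)
length(fc$pred)
```

In the question's setting this would mean supplying 30 rows of future (or assumed) values for each exogenous variable, rather than passing the historical xset again.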

R and multiple time series and Error in model.frame.default: variable lengths differ

I am new to R and I am using it to analyse time series data (I am also new to this).
I have quarterly data for 15 years and I am interested in exploring the interplay between drinking and smoking rates in young people - treating smoking as the outcome variable. I was advised to use the gls command in the nlme package as this would allow me to include AR and MA terms. I know I could use more complex approaches like ARIMAX but as a first step, I would like to use simpler models.
After loading the data, I specify the time series:
data.ts = ts(data=data$smoke, frequency=4, start=c(data[1, "Year"], data[1, "Quarter"]))
data.ts.dec = decompose(data.ts)
After decomposing the data and some tests (KPSS and ADF test), it is clear that the data are not stationary so I differenced the data:
diff_dv<-diff(data$smoke, difference=1)
plot.ts(diff_dv, main="differenced")
data.diff.ts = ts(diff_dv, frequency=4, start=c(data[1, "Year"], data[1, "Quarter"]))
The ACF and PACF plots suggest AR(2) should also be included so I set up the model as:
mod.gls = gls(diff_dv ~ drink+time , data = data,
correlation=corARMA(p=2), method="ML")
However, when I run this command I get the following:
"Error in model.frame.default: variable lengths differ".
I understand from previous posts that this is due to the differencing and the fact that the diff_dv is now shorter. I have attempted fixing this by modifying the code but neither approach works:
mod.gls = gls(diff_dv ~ drink+time , data = data[1:(length(data)-1), ],
correlation=corARMA(p=2), method="ML")
mod.gls = gls(I(c(diff(smoke), NA)) ~ drink+time+as.factor(quarterly) , data = data,
correlation=corARMA(p=2), method="ML")
Can anyone help with this? Is there a workaround which would allow me to run the -gls- command or is there an alternative approach which would be equivalent to the -gls- command?
As a side question, is it OK to include time as I do - a variable with values 1 to 60? A similar question is for the quarters which I included as dummies to adjust for possible seasonality - is this OK?
Your help is greatly appreciated!

Specify na.action = na.omit or na.action = na.exclude to omit the rows with NAs. Here is an example using the Ovary data set that ships with nlme. See ?na.fail for info on the differences between these two.
library(nlme)
# pad the differenced series with a leading NA so it keeps the original length
Ovary2 <- transform(Ovary, dfoll = c(NA, diff(follicles)))
gls(dfoll ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary2,
correlation = corAR1(form = ~ 1 | Mare), na.action = na.exclude)
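Applied to the setup in the question, the same idea might look like the sketch below. The data are simulated and the column names (smoke, drink, time) only mirror the question; they are assumptions, not the asker's data. The key points are that diff() returns one value fewer than its input, and that gls() fails on NA by default unless na.action is changed.

```r
library(nlme)

set.seed(42)
n <- 60                                    # 15 years of quarterly data
dat <- data.frame(time = 1:n, drink = rnorm(n, mean = 30, sd = 3))
dat$smoke <- 20 + 0.3 * dat$drink + cumsum(rnorm(n))

# diff() shortens the series by one, hence "variable lengths differ".
length(diff(dat$smoke))

# Pad with a leading NA so the differenced outcome lines up with the data frame.
dat$d_smoke <- c(NA, diff(dat$smoke))

# na.omit drops the padded first row before the AR(2) error model is fitted.
fit <- gls(d_smoke ~ drink + time, data = dat,
           correlation = corARMA(p = 2), method = "ML",
           na.action = na.omit)
length(residuals(fit))
```

Keeping the differenced variable inside the data frame avoids mixing a free-standing vector such as diff_dv with columns of a different length.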

Auto.Arima incorrectly predicts first point

I'm trying to complete a time series analysis of some reservoir data and am using auto.arima with a Fourier component to account for seasonality, as described here https://otexts.com/fpp2/dhr.html#dhr The code I have used is shown below and the dataset I used can be found here https://www.dropbox.com/sh/563nu3daeid0agb/AAB6NSddVUKgBCCbQtuqXPsZa?dl=0
library(forecast)
library(ggplot2)
Reservoir = read.csv("Reservoir1.csv", header = TRUE, sep = ",")
#impute missing data from data set
Reservoir = imputeTS::na_interpolation(Reservoir)
#Create Time Series
Reservoir = ts(Reservoir[,2],frequency = (365.25),start = c(2013,116))
plots = list()
for (i in seq (10)) {
fit = auto.arima(Reservoir, xreg = fourier(Reservoir, K = i), seasonal = FALSE)
plots[[i]] = autoplot(forecast(fit, xreg = fourier(Reservoir, K = i, h=10))) +
xlab(paste("K=",i,"AICC=",round(fit[["aicc"]],2))) + ylab("")
}
gridExtra::grid.arrange(plots[[1]],plots[[2]],plots[[3]],plots[[4]],plots[[5]],
plots[[6]],plots[[7]],plots[[8]],plots[[9]],plots[[10]],
nrow=5)
bestfit = auto.arima(Reservoir, xreg=fourier(Reservoir, K=9), seasonal=FALSE)
summary(bestfit)
checkresiduals(bestfit)
plot(Reservoir,col="red")
lines(fitted(bestfit),col="blue")
The model fits well, except for the incorrect first prediction. I'm lost as to why only this value would be so far off. Or, is this an acceptable error?

The residuals are the one-step forecast errors using all previous observations. At time 1, the residual is the forecast error with no previous observations, so it is simply based on the fitted model. In fact, it is an artificially "good" forecast because the differencing means there is no way for the model to know the location of the data until there is an observation. But the way ARIMA models are implemented in R makes the first prediction use a little more information than it should.

ARFIMA model and accuracy function

I am forecasting with data sets from the fpp2 and forecast packages. My intention is to forecast several time series automatically, so I wrapped the forecasting in a function. You can see the code below:
# CODE
library(fpp2)
library(dplyr)
library(forecast)
df<-qauselec
# Forecasting function
fct_fun <- function(Z, hrz = forecast_horizon) {
timeseries <- msts(Z, start = 1956, seasonal.periods = 4)
forecast <- arfima(timeseries)
}
acc_list <- lapply(X = df, fct_fun)
The next step is to check the accuracy of the model, which I am trying to do with this line of code:
accurancy_arfima <- lapply(acc_list, accuracy)
Until now, the accuracy function has worked perfectly with other models such as snaive and ets, but it does not work properly with arfima.
Can anybody help me resolve this problem with the accuracy function?

According to the R documentation, accuracy() returns a range of summary measures of forecast accuracy. If x is provided, the function measures test-set forecast accuracy based on x - f. If x is not provided, the function only produces training-set accuracy measures of the forecasts based on f["x"] - fitted(f).
Its usage is:
accuracy(f, x, test = NULL, d = NULL, D = NULL,
...)
So:
accuracy(acc_list[[1]]$fitted, df)
If you want to evaluate the accuracy separately for each observation, this will work:
a <- c()
for (i in 1:4) {
b <- accuracy(df[i], acc_list[[1]]$fitted[i])
a <- rbind(a,b)
}
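If accuracy() does not accept a particular model object, the usual training-set measures can also be computed by hand from the observed and fitted values. The helper below is a hypothetical sketch (accuracy_by_hand is not part of the forecast package) covering only four of the measures that accuracy() reports:

```r
# Hypothetical helper: summary measures from actual and fitted values.
accuracy_by_hand <- function(actual, fitted) {
  actual <- as.numeric(actual)
  e <- actual - as.numeric(fitted)        # errors: actual minus fitted
  c(ME   = mean(e),                        # mean error
    RMSE = sqrt(mean(e^2)),                # root mean squared error
    MAE  = mean(abs(e)),                   # mean absolute error
    MAPE = mean(abs(e / actual)) * 100)    # mean absolute percentage error
}

m <- accuracy_by_hand(actual = c(1, 2, 3, 4), fitted = c(1, 2, 3, 5))
m
```

With an ARFIMA fit one would presumably pass the original series as actual and the model's fitted values (the $fitted component used above) as fitted.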

Error in rep(1, n.ahead) : invalid 'times' argument in R

I'm working on a dataset that I want to forecast with ARIMA. I'm close to the last step, but I'm getting an error and couldn't find a reference that explains what I'm missing.
I keep getting an error message when I run the following command:
ForcastData<-forecast(fitModel,testData)
Error in rep(1, n.ahead) : invalid 'times' argument
Here is a brief overview of what I did: I converted my dataset from a data frame to a time series, ran the tests to check volatility, and checked whether the data is stationary.
That gave me DataAsStationary as clean data to apply ARIMA to. Since I want to train the model on one part of the data and test it on the other, I split the dataset into 70% training and 30% testing:
ind <-sample(2, nrow(DataAsStationary), replace = TRUE, prob = c(0.7,0.3))
traingData<- DataStationary1[ind==1,]
testData<- DataStationary1[ind==2,]
I used Automatic Selection Algorithm and found that Arima(2,0,3) is the best.
autoARIMAFastTrain1<- auto.arima(traingData, trace= TRUE, ic ="aicc", approximation = FALSE, stepwise = FALSE)
I should mention that I checked whether the residuals are uncorrelated (white noise) and dealt with it.
library(tseries)
library(astsa)
library(forecast)
After that I used the training dataset to fit the model:
fitModel <- Arima(traingData, order=c(2,0,3))
fitted(fitModel)
ForcastData<-forecast(fitModel,testData)
output <- cbind(testData, ForcastData)
accuracy(testData, ForcastData)
plot(output)
I couldn't find any resource about the error:
Error in rep(1, n.ahead) : invalid 'times' argument
Any suggestions would be really appreciated.
I also tried
ForcastData<-forecast.Arima(fitModel,testData)
but I get the error that
forecast.Arima is not found.
Any idea why I get this error?

You need to specify the arguments to forecast() a little differently; since you didn't post example data, I'll demonstrate with the gold dataset in the forecast package:
library(forecast)
data(gold)
trainingData <- gold[1:554]
testData <- gold[555:1108]
fitModel <- Arima(trainingData, order=c(2, 0, 3))
ForcastData <- forecast(fitModel, testData)
# Error in rep(1, n.ahead) : invalid 'times' argument
ForcastData <- forecast(object=testData, model=fitModel) # no error
accuracy(f=ForcastData) # you only need to give ForcastData; see help(accuracy)
                    ME     RMSE      MAE        MPE      MAPE     MASE       ACF1
Training set 0.4751156 6.951257 3.286692 0.09488746 0.7316996 1.000819 -0.2386402
You may want to spend some time with the forecast package documentation to see what the arguments for the various functions are named and in what order they appear.
Regarding your forecast.Arima not found error, you can see this answer to a different question regarding the forecast package -- essentially that function isn't meant to be called by the user, but rather called by the forecast function.
EDIT:
After receiving your comment, it seems the following might help:
library(forecast)
# Read in the data
full_data <- read.csv('~/Downloads/onevalue1.csv')
full_data$UnixHour <- as.Date(full_data$UnixHour)
# Split the sample
training_indices <- 1:floor(0.7 * nrow(full_data))
training_data <- full_data$Lane1Flow[training_indices]
test_data <- full_data$Lane1Flow[-training_indices]
# Use automatic model selection:
autoARIMAFastTrain1 <- auto.arima(training_data, trace=TRUE, ic ="aicc",
approximation=FALSE, stepwise=FALSE)
# Fit the model on the training data:
fit_model <- Arima(training_data, order=c(2, 0, 3))
# Do forecasting
forecast_data <- forecast(object=test_data, model=fit_model)
# And plot the forecasted values vs. the actual test data:
plot(x=test_data, y=forecast_data$fitted, xlab='Actual', ylab='Predicted')
# It could help more to look at the following plot:
plot(test_data, type='l', col=rgb(0, 0, 1, alpha=0.7),
xlab='Time', ylab='Value', xaxt='n', ylim=c(0, max(forecast_data$fitted)))
ticks <- seq(from=1, to=length(test_data), by=floor(length(test_data)/4))
times <- full_data$UnixHour[-training_indices]
axis(1, lwd=0, lwd.ticks=1, at=ticks, labels=times[ticks])
lines(forecast_data$fitted, col=rgb(1, 0, 0, alpha=0.7))
legend('topright', legend=c('Actual', 'Predicted'), col=c('blue', 'red'),
lty=1, bty='n')

I was able to run
ForcastData <- forecast(object=testData, model=fitModel)
without any error.
Now I want to plot testData and the forecast data to check whether my model is accurate,
so I did:
output <- cbind(testData, ForcastData)
plot(output)
which gave me the error:
Error in error(x, ...) :
improper length of one or more arguments to merge.xts
When I checked ForcastData, it printed:
> ForcastData
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2293201 -20.2831770 -308.7474 268.1810 -461.4511 420.8847
2296801 -20.1765782 -346.6400 306.2868 -519.4593 479.1061
2300401 -18.3975657 -348.8556 312.0605 -523.7896 486.9945
2304001 -2.2829565 -332.7483 328.1824 -507.6860 503.1201
2307601 2.7023277 -327.8611 333.2658 -502.8509 508.2555
2311201 4.5777316 -328.6756 337.8311 -505.0893 514.2447
2314801 4.3198927 -331.4470 340.0868 -509.1913 517.8310
2318401 3.8277285 -332.7898 340.4453 -510.9844 518.6398
2322001 1.4364973 -335.2403 338.1133 -513.4662 516.3392
2325601 -0.4013561 -337.0807 336.2780 -515.3080 514.5053
I thought I would get a list of results as long as my testData. I need a chart that shows two lines: the actual data (testData) and the expected data (ForcastData).
I have gone through a lot of documentation about forecast, but I can't find anything that explains how to do this.
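Two details seem to explain what is happening here. In forecast(), the second positional argument for an Arima fit is the horizon h, so forecast(fitModel, testData) tries to use the test data as the number of steps, which triggers the rep(1, n.ahead) error. And when h is not given, the default horizon is 10, which matches the ten rows printed above. With the forecast package, forecast(fitModel, h = length(testData)) would ask for one forecast per test observation. The sketch below shows the same idea in base R with arima() and predict() on the built-in LakeHuron series (not the asker's data), split in time order rather than by random sampling:

```r
# Built-in annual series, split in time order (no random sampling).
n_train <- floor(0.7 * length(LakeHuron))
train <- window(LakeHuron, end = time(LakeHuron)[n_train])
test  <- window(LakeHuron, start = time(LakeHuron)[n_train + 1])

# A simple AR(2) model stands in for the order chosen by auto.arima().
fit <- arima(train, order = c(2, 0, 0))

# Forecast exactly as many steps as there are test observations.
fc <- predict(fit, n.ahead = length(test))

# Actual values in blue, forecasts in red, on a shared time axis.
plot(LakeHuron, col = "blue", ylab = "Level")
lines(fc$pred, col = "red")
legend("topright", legend = c("Actual", "Forecast"),
       col = c("blue", "red"), lty = 1, bty = "n")
```

Because fc$pred is a ts that starts where the training data ends, lines() places it directly over the test period without any cbind() or merge step.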

Resources