I'm trying to create a forecast using exponential smoothing, but I get the error "nonseasonal data". That is clearly not the case, as the code below shows.
Why am I getting this error? Should I use a different function? It should be able to perform simple, double, damped-trend, seasonal, and Winters' exponential smoothing.
library(forecast)
timelen<-48 # use 48 months
dates<-seq(from=as.Date("2008/1/1"), by="month", length.out=timelen)
# create seasonal data
time<-seq(1,timelen)
season<-sin(2*pi*time/12)
constant<-40
noise<-rnorm(timelen,mean=0,sd=0.1)
trend<-time*0.01
values<-constant+season+trend+noise
# create time series object
timeseries<-as.ts(x=values,start=min(dates),end=max(dates),frequency=1)
plot(timeseries)
# forecast MAM
ets<-ets(timeseries,model="MAM") # ANN works, why MAM not?
ets.forecast<-forecast(ets,h=24,level=0.9)
plot(ets.forecast)
Thanks & kind regards
Use ts() rather than as.ts() to create a time series from a numeric vector. See the help file (?ts) for details.
Your start and end values aren't specified correctly: ts() expects numbers such as c(2008, 1), not Date objects.
And frequency = 1 does not describe any seasonality; it is the same as having no seasonality at all.
Try:
timeseries <- ts(data=values, frequency=12)
ets <- ets(timeseries, model="MAM")
print(ets)
#### ETS(M,A,M)
#### Call:
#### ets(y = timeseries, model = "MAM")
#### ...
As for the question in your code comment: ANN works because the third N means no seasonality, so the model can be fitted even to a non-seasonal time series.
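A minimal sketch of the fix on simulated data like the question's. The seed and the explicit start = c(2008, 1) are assumptions added for illustration, and the ets() call is guarded in case the forecast package is not installed.

```r
# Simulated monthly data, mirroring the question (48 months from Jan 2008)
set.seed(42)
timelen <- 48
time    <- seq_len(timelen)
values  <- 40 + sin(2 * pi * time / 12) + 0.01 * time + rnorm(timelen, sd = 0.1)

# start is given as c(year, period); frequency = 12 marks the data as monthly
timeseries <- ts(values, start = c(2008, 1), frequency = 12)

frequency(timeseries)  # 12
end(timeseries)        # 2011 12

# Fit the seasonal model only if the forecast package is available
if (requireNamespace("forecast", quietly = TRUE)) {
  fit <- forecast::ets(timeseries, model = "MAM")
  print(fit$method)
}
```

With frequency = 12 the series carries the seasonal period that a model with a seasonal component needs.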
This is my first post on this platform. I'm a student in Business Administration, so please bear with my beginner questions.
I'm currently creating ARIMA models for the closing prices of some stocks. However, when I plot the forecasts, all I get is a straight line with a little bit of drift. There are no clear patterns, no ups and downs in the forecast.
I'm not sure whether I made a mistake somewhere.
install.packages("quantmod")
install.packages("tseries")
install.packages("timeSeries")
install.packages("forecast")
install.packages("MASS")
install.packages("ggplot2")
install.packages("zoo")
install.packages("xts")
library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(MASS)
library(ggplot2)
library(zoo)
library(xts)
# load data
energy = getSymbols(Symbols = "XLES.L", auto.assign = F, from = "2015-01-01", to = "2020-01-01")
# remove NAs
energy <- na.omit(energy$XLES.L.Close)
plot(energy)
# create TS
ts <- ts(energy, start = c(2015,01), frequency = 252)
plot(ts) #does not seem stationary
# check for stationarity
adf.test(ts) # --> not stationary, differencing required
#Create Arima Model
arima <- auto.arima(ts, d = 1)
arima
# Create Forecast (Out-Of-Sample for 20days/1month)
forecast_energy <- forecast(arima, h = 20)
plot(forecast_energy)
plot(forecast_energy, include = 50)
My questions are:
Why is it a straight line?
Is it necessary to create a time series with the ts function, given that the imported data is already a time series (or is it not)?
Is what I did correct?
Here is the printed model (the plots are not shown):
> print(arima)
Series: ts
ARIMA(2,1,0)
Coefficients:
         ar1      ar2
      0.0125  -0.0502
s.e.  0.0283   0.0283
sigma^2 estimated as 20.19: log likelihood=-3682.99
AIC=7371.98 AICc=7372 BIC=7387.4
Can someone please help me :)
Best regards
Noob
Here is an example of a simple signal that is able to break auto.arima:
library(forecast)
set.seed(1)
mynoise <- rnorm(252*5,0,sd = 100) # high short term noise, non integrated
mytrend <- 1:(252*5) # long term trend
mysignal <- mynoise+mytrend
mymodel <- auto.arima(mysignal)
plot(forecast(mymodel,50))
The first difference of the signal is u[t] = 1 + e[t] - e[t-1], and likewise u[t-1] = 1 + e[t-1] - e[t-2], where e is the noise.
Let epsilon[t] = e[t] - e[t-1]; this is a moving-average process of order 1 with coefficient -1, not white noise.
So the differenced series looks like an MA(1) around a constant of 1, and that constant is hardly significant next to noise with sd = 100. auto.arima is therefore likely to estimate an ARIMA(0,1,1) with the moving-average coefficient close to -1.
This is not a total failure: it is decent for short-term predictions, but it makes silly long-term predictions.
You are getting a straight line because the model cannot find any seasonality or other repeating structure in the data. When that happens, the forecast quickly settles at a constant level derived from the historical data (for a differenced model like yours, close to the last observed value, plus drift if one is estimated), which is why you get a straight line.
It is very difficult for a model to forecast accurately when the historical data contains no clear seasonality or trend.
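To see why such forecasts are flat, here is a minimal sketch using base R only. It assumes a simulated random walk as a stand-in for the stock prices; the fitted model in the question is close to this case, since both AR coefficients (0.0125 and -0.0502) are within about two standard errors of zero.

```r
set.seed(1)
# Random walk: each value is the previous one plus unpredictable noise
x <- cumsum(rnorm(500))

# ARIMA(0,1,0) without drift is exactly the random-walk model
fit <- arima(x, order = c(0, 1, 0))
fc  <- predict(fit, n.ahead = 20)

# The point forecasts all repeat the last observation ...
range(fc$pred)
tail(x, 1)

# ... while the standard errors grow with the forecast horizon
head(fc$se, 3)
tail(fc$se, 3)
```

The information about uncertainty is in the widening prediction intervals, not in the shape of the point forecast.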
The complete R data and code for my question is here: https://pastebin.com/QtG6A7ZX.
I am new to R and still a beginner when it comes to time series analysis, so please forgive my ignorance.
I am attempting to model and forecast some enrollment data with 2 dummy-coded regressors. I have already used auto.arima to fit the model:
model <- auto.arima(enroll, xreg=x)
Before I forecast with this model, I am attempting to test its accuracy by selecting only a part of the time series (1:102 instead of 1:112), and likewise, a partial list of regressors.
Based on auto.arima, I fit the partial model as follows:
model_par <-arima((enroll_partial), c(1, 1, 1),seasonal = list(order = c(1, 0, 0), period = 5), xreg=x_par)
I have tried three different ways to forecast and get essentially the same error:
fcast_par <- forecast(model_par, h=10) #error
fcast_par <- forecast(model_par, h=10, xreg=x_par) #error
fcast_par <- forecast(model_par, h=10, xreg=forecast(x_par,h=10)) #error
'xreg' and 'newxreg' have different numbers of columns
I have tested using auto.arima with the partial data. That works, but gives me a different model and, although I specified 10 predictions, I get over 50:
model_par2 <- auto.arima(enroll_partial, xreg=x_par)
fcast_par <- forecast(model_par2, h=12, xreg=x_par)
fcast_par
So, my main question is, how do I specify an exact model and predict using more than 1 regressor given my data (see Paste Bin link above)?
The forecast() function is from the forecast package, and works with model functions that are from that package. This is why it is possible to produce forecasts from auto.arima() using forecast(model_par2,xreg=x_fcst).
The arima() function comes from the stats package, so there is no guarantee that it will work with forecast(). To specify your own ARIMA model, you can use the Arima() function, which behaves very similarly to arima(), and you will be able to produce forecasts from it with forecast() in the same way, passing the future regressors via xreg=x_fcst.
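A minimal sketch of the Arima() route on simulated data. The regressor names d1 and d2, the data, and the simple AR(1) order are all assumptions for illustration; substitute the orders you actually want (for example order = c(1, 1, 1) with seasonal = list(order = c(1, 0, 0), period = 5)).

```r
library(forecast)

set.seed(123)
n <- 112
# Two hypothetical dummy regressors (names are made up for this sketch)
x <- cbind(d1 = rep(c(1, 0, 0, 0, 0), length.out = n),
           d2 = rep(c(0, 1, 0, 0, 0), length.out = n))
y <- 50 + 10 * x[, "d1"] - 5 * x[, "d2"] + rnorm(n, sd = 3)

# Hold out the last 10 observations and their regressor rows
y_par  <- ts(y[1:102], frequency = 5)
x_par  <- x[1:102, ]
x_fcst <- x[103:112, ]

# Fit a user-specified model with regressors
fit <- Arima(y_par, order = c(1, 0, 0), xreg = x_par)

# The forecast horizon is taken from the number of rows in xreg
fc <- forecast(fit, xreg = x_fcst)
length(fc$mean)
```

The future regressor matrix must have the same columns as the one used for fitting, which is what the "different numbers of columns" error is about.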
You have two problems. One of them is that the various forecasting functions in R are making it (intentionally?) difficult on you.
The first problem is that you need to define the values of your regressors for the forecasting period. Extract the relevant data from x by using window():
x_fcst <- window(x,start=c(2017,4))
The second problem is that forecast() (which dispatches to forecast.Arima()) is a red herring here. You need to use predict() (which dispatches to predict.Arima() - note the capitalization in both cases!):
predict(model_par,newxreg=x_fcst,h=nrow(x_fcst))
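which yields the output below. Note that predict.Arima() takes n.ahead rather than h, so the h argument in the call above is silently ignored; that is why only a single standard error is returned: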
$pred
Time Series:
Start = c(2017, 3)
End = c(2019, 1)
Frequency = 5
[1] 52.00451 52.00451 52.00451 52.00451 52.00451 52.00451 52.00451 52.00451
[9] 52.00451
$se
Time Series:
Start = c(2017, 3)
End = c(2017, 3)
Frequency = 5
[1] 17.13345
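For comparison, here is a minimal base-R sketch of predict() with the horizon passed as n.ahead, which is the argument predict.Arima() actually uses. The data, the regressor names d1 and d2, and the AR(1) order are assumptions for illustration.

```r
set.seed(7)
n <- 60
# Two hypothetical dummy regressors
x <- cbind(d1 = rep(c(1, 0, 0, 0, 0), length.out = n),
           d2 = rep(c(0, 1, 0, 0, 0), length.out = n))
y <- 50 + 8 * x[, "d1"] - 4 * x[, "d2"] + rnorm(n, sd = 2)

train    <- 1:50
future_x <- x[-train, ]   # regressor values for the forecast period

fit <- arima(y[train], order = c(1, 0, 0), xreg = x[train, ])

# n.ahead must match the number of rows in newxreg
p <- predict(fit, n.ahead = nrow(future_x), newxreg = future_x)
length(p$pred)  # 10
length(p$se)    # 10
```

With n.ahead set, each forecast step gets its own prediction and its own standard error.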
You can also use auto.arima(). Confusingly enough, this time forecast() (which still dispatches to forecast.Arima()) does work:
model_par2 <- auto.arima(enroll_partial, xreg=x_par)
forecast(model_par2,xreg=x_fcst)
which yields
        Point Forecast     Lo 80     Hi 80      Lo 95     Hi 95
2017.40       39.91035 17.612358  62.20834   5.808514  74.01219
2017.60       59.51003 32.783451  86.23661  18.635254 100.38481
2017.80       69.81000 39.290834 100.32917  23.134962 116.48505
2018.00       57.49140 23.601444  91.38136   5.661183 109.32162
2018.20       55.45759 18.503034  92.41214  -1.059524 111.97470
2018.40       34.57866 -7.306747  76.46406 -29.479541  98.63686
2018.60       52.30199  6.702068  97.90192 -17.437074 122.04106
2018.80       61.61591 12.582055 110.64977 -13.374900 136.60672
2019.00       50.47661 -1.765945 102.71917 -29.421485 130.37471
And yes, you do get five times as many numbers as forecasts. The first column is the point forecast, and the others give the bounds of the prediction intervals. These are governed by the level parameter of forecast().
I have been working on a script in R that will predict a number.
# Load Forecast library
library(forecast)
# Load dataset
bwi <- read.csv(file="C:/Users/nsoria/Downloads/AMS Globales/TEC_BWI.csv", header=TRUE, sep=';', dec=",")
# Create time series starting in January 2015
ts_bwi <- ts(bwi$BWI, frequency = 12, start = c(2015,1))
# Pull out the seasonal, trend, and irregular components from the time series
model <- stl(ts_bwi, s.window = "periodic")
# Predict the next 5 months of SLA
pred <- forecast(model, h = 5)
# Plot the results
plot(pred)
This produces a plot in which the forecast line is not connected to the actual values.
Question: How can I connect the line from the last known value to the first forecast value?
Edit 01/01: Here is the link where the CSV is located to reproduce the case.
You need to append the forecasts to the actual series stored in the forecast object, as in the code below:
pred_mod<-pred
ts_real<-pred$x
pred_mod$x<-ts(c(ts_real,pred$mean),frequency=12,start=c(2015,1))
plot(pred_mod)
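An alternative is to leave the forecast object untouched and draw a "bridge" line that starts at the last observation. This is only a sketch in base R: it uses the built-in AirPassengers series and HoltWinters() as a stand-in forecaster, neither of which comes from the question, and the name bridge is made up here.

```r
y  <- AirPassengers                  # built-in monthly series, stand-in data
hw <- HoltWinters(y)                 # stand-in forecaster for this sketch
fc_vals <- as.numeric(predict(hw, n.ahead = 5))

# Prepend the last observation so the forecast line starts where the data ends
bridge <- ts(c(as.numeric(tail(y, 1)), fc_vals),
             start = end(y), frequency = frequency(y))

plot(y, xlim = range(time(y), time(bridge)), ylim = range(y, bridge))
lines(bridge, col = "blue", lwd = 2)
```

The same idea applies to a forecast object: take pred$mean as the forecast values and pred$x as the observed series.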
I have fitted a TBATS model around my seasonal time-series data and used the forecast package to obtain predictions. My R code is:
library("forecast")
data = read.csv("data.csv")
season_info <- msts(data,seasonal.periods=c(24,168))
model <- tbats(season_info)
forecasted <- forecast(model,h=24,level=90)
forecasted
Now, I have a variable called 'forecasted' that outputs as such:
> forecasted
         Point Forecast    Lo 90    Hi 90
6.940476       5080.641 4734.760 5426.523
6.946429       5024.803 4550.111 5499.496
6.952381       4697.625 4156.516 5238.733
6.958333       4419.105 3832.765 5005.446
6.964286       4262.782 3643.528 4882.037
6.970238       4187.629 3543.062 4832.196
6.976190       4349.196 3684.444 5013.947
6.982143       4484.108 3802.574 5165.642
6.988095       4247.858 3551.955 4943.761
6.994048       3851.379 3142.831 4559.927
7.000000       3575.951 2855.962 4295.941
7.005952       3494.943 2764.438 4225.449
7.011905       3501.354 2760.968 4241.739
7.017857       3445.563 2695.781 4195.345
I need to extract the values in the 'Point Forecast' column and store them in a CSV file. I read the online documentation for tbats and forecast, but it does not say how to extract just that column while ignoring the 'Lo' and 'Hi' columns.
I'm looking for this output in my CSV:
hour,forecasted_value
0,5080.641
1,5024.803
2,4697.625
...
They are stored in three parts. You can look at the object structure with str(ret):
library(forecast)
fit <- tbats(USAccDeaths)
ret <- forecast(fit)
ret$upper # Upper interval
ret$lower # Lower interval
ret$mean # Point forecast
You can obtain the output shown by using print():
library("forecast")
data = read.csv("data.csv")
season_info <- msts(data,seasonal.periods=c(24,168))
model <- tbats(season_info)
forecasted <- forecast(model,h=24,level=90)
dfForec <- print(forecasted)
This gives you a data.frame (as.data.frame(forecasted) returns the same thing without printing). You can then pick out the columns you want, e.g. dfForec[, 1] for only the point forecast, and use write.csv(dfForec[, 1, drop = FALSE], ...) to write it to a flat file.
Use the mean element of the forecast object to get your point forecasts:
library("forecast")
data = read.csv("data.csv")
season_info <- msts(data,seasonal.periods=c(24,168))
model <- tbats(season_info)
forecasted <- (forecast(model,h=24,level=90))$mean
or, if forecasted already holds the full forecast object:
forecasted$mean
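Putting the pieces together, here is a minimal sketch that writes the point forecasts to a CSV in the requested hour,forecasted_value layout. It assumes the built-in USAccDeaths series in place of data.csv and a temporary output path; both are placeholders.

```r
library(forecast)

# Any forecast object works here; USAccDeaths stands in for the real data
ret <- forecast(USAccDeaths, h = 24, level = 90)

# Keep only the point forecasts, numbered from 0 as in the desired output
out <- data.frame(hour = seq_along(ret$mean) - 1,
                  forecasted_value = as.numeric(ret$mean))

csv_path <- file.path(tempdir(), "forecast.csv")  # hypothetical output path
write.csv(out, csv_path, row.names = FALSE)

head(read.csv(csv_path), 3)
```

Setting row.names = FALSE keeps R's row labels out of the file, so the CSV contains exactly the two columns.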
I have data of the form SaleDateTime = '2015-01-02 23:00:00.000' SaleCount=4.
I'm trying to create an hourly forecast for the next 12 hours, using the code below.
I'm new to forecasting and could definitely appreciate some advice.
I'm trying to partition the data, train a model, plot the forecast with x axis of the form '2015-01-02 23:00:00.000', and test the accuracy of the model on a test time series.
I'm getting the error message shown below when I run accuracy(). Does anyone know why?
The plot produced by the code below has an x axis running from 0 to 400. Does anyone know how to show dates such as '2015-01-02 23:00:00.000' instead? I would also like to narrow the plot to, say, the last 3 months.
My understanding is that if you don't specify a model for forecast, then it tries to fit the best model it can to the data for the forecast. Is that correct?
How do I filter for the same timeseries range with the forecast as the ts1Test that I'm trying to run accuracy on, is it something like ts(fcast2, start=2001, end = 8567) ?
Since I'm using the zoo package is the as.POSIXct step unnecessary, could I just do eventdata <- zoo(Value, order.by = SaleDateTime) instead?
library("forecast")
library("zoo")
SampleData<-SampleData
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
##Partitioning data Training/Testing
ts1SampleTrain<-eventdata[1:2000,]
ts1Train<-ts(ts1SampleTrain$SaleCount, frequency=24)
ts1SampleTest<-eventdata[2001:28567,]
ts1Test<-ts(ts1SampleTest$SaleCount, frequency=24)
#Training Model
fcast2<-forecast(ts1Train,h=8567)
plot(fcast2)
accuracy(fcast2,ts1Test)
New Error:
Error in -.default(xx, ff[1:n]) : non-numeric argument to binary operator
To make the accuracy test run, make sure that your test data ts1Test has the same length as your forecast horizon, h in fcast2<-forecast(ts1Train,h=8567). Right now you have 26567 data points versus 8567.
Following your approach, this toy example works:
library(forecast)
library(zoo)
Value <- rnorm(1100)
rDateTime <- seq(as.POSIXct('2012-01-01 00:00:00'), along.with=Value, by='hour')
eventDate <- ts(zoo(Value, order.by=rDateTime), frequency = 24)
tsTrain <-eventDate[1:1000]
tsTest <- eventDate[1001:1100]
fcast<-forecast(tsTrain,h=100)
accuracy(fcast, tsTest)
                        ME         RMSE          MAE          MPE         MAPE         MASE         ACF1
Training set -2.821378e-04 9.932745e-01 7.990188e-01 1.003861e+02 1.007542e+02 7.230356e-01 4.638487e-02
Test set        0.02515008   1.02271839   0.86072703  99.79208174 100.14023919           NA           NA
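Another way to keep the horizon and the test set aligned is to split with window(), which preserves the time attributes, and to derive h from the test set. A minimal sketch, assuming the built-in AirPassengers series rather than the sales data:

```r
library(forecast)

y     <- AirPassengers
train <- window(y, end   = c(1958, 12))   # everything up to Dec 1958
test  <- window(y, start = c(1959, 1))    # the remaining 24 months

# Derive the horizon from the test set so the two can never disagree
fc  <- forecast(train, h = length(test))
acc <- accuracy(fc, test)
acc[, c("RMSE", "MAE")]
```

Because h is computed from the test set, changing the split point does not require touching the forecast call.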
Concerning your other two questions:
Use of POSIX timestamps and the zoo package: you don't need them to use forecast. ts(Value, frequency) would suffice.
Plotting a time series object with datetimes as labels: the following code snippet should get you started. Look at the axis function, which provides the desired behavior:
par(mar=c(6,2,1,1)) # bottom, left, top, right margins
plot(tsTrain, type="l", xlab="", xaxt="n")
axis(side=1, at=seq(1,1000,100), labels=format(rDateTime[seq(1,1000,100)], "%Y-%m-%d"), las=2)
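To restrict the plot to roughly the last three months with date labels, one option is to plot against the POSIXct timestamps directly and draw the axis with axis.POSIXct(). A minimal sketch on simulated hourly counts; the data, the 90-day cutoff, and the weekly tick spacing are assumptions for illustration.

```r
set.seed(3)
# Simulated hourly timestamps and counts standing in for SaleDateTime / SaleCount
rDateTime <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = 24 * 200)
Value <- rpois(length(rDateTime), lambda = 4)

# Keep only the last 90 days (about 3 months)
cutoff <- max(rDateTime) - as.difftime(90, units = "days")
keep   <- rDateTime >= cutoff

par(mar = c(7, 4, 1, 1))  # extra bottom margin for rotated date labels
plot(rDateTime[keep], Value[keep], type = "l",
     xaxt = "n", xlab = "", ylab = "SaleCount")

# One tick per week, labelled with the date
ticks <- seq(cutoff, max(rDateTime), by = "week")
axis.POSIXct(side = 1, at = ticks, format = "%Y-%m-%d", las = 2)
```

Changing the format string to "%Y-%m-%d %H:%M" adds the time of day to the labels.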