In R, what is the difference between class ts and class timeSeries? I think I am getting a problem in HoltWinters because of that. I'm getting:
data(LakeHuron)
x <- LakeHuron
before <- window(x, end=1935)
after <- window(x, start=1935)
a <- .2
b <- 0
g <- 0
model <- HoltWinters(before, alpha=a, beta=b, gamma=g)
"Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) :
time series has no or less than 2 periods"
even though gamma=0.
Running R 2.11.1 (win32 x86) on a Windows 7 x64 machine.
ts comes from the stats package included with base R. It is useful for regular time series such as the monthly, quarterly or annual series common in government statistics. ts is used by arima() and other time series methods provided by base R and its stats package. HoltWinters, which you used here, is one such example.
timeSeries is one of many add-on time series classes; this one comes from Rmetrics. Several CRAN Task Views discuss these more: TimeSeries, Econometrics as well as Finance.
Try the documentation on ts and/or HoltWinters to come to grips with the required format. ts uses either a fixed delta (e.g. 1/12 for monthly data) or a frequency.
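As a small illustration of the "fixed delta or frequency" point, here is a minimal sketch using only base R. The object names are made up for the example; both calls should describe the same monthly time base.

```r
# The same monthly time base, specified two ways (names are illustrative)
by_frequency <- ts(1:24, start = c(2020, 1), frequency = 12)
by_deltat    <- ts(1:24, start = c(2020, 1), deltat = 1/12)

# tsp() returns start, end and frequency as one numeric vector
tsp(by_frequency)
tsp(by_deltat)
```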
I've found the problem, studying the HoltWinters source code.
It turns out that when there is no seasonal component, the HoltWinters function expects gamma to be logical (FALSE), not a numeric zero.
So, passing gamma = as.logical(0) (i.e. gamma = FALSE) solves the problem.
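For reference, a minimal sketch of the non-seasonal call on the same data. It assumes that no trend is wanted either, so beta is also passed as FALSE (the question used beta = 0); that choice is an assumption of this sketch, not something stated above.

```r
# LakeHuron ships with base R (datasets package)
before <- window(LakeHuron, end = 1935)

# gamma = FALSE requests a non-seasonal model;
# beta = FALSE (an assumption here) reduces it to simple exponential smoothing
model <- HoltWinters(before, alpha = 0.2, beta = FALSE, gamma = FALSE)

# forecast the next five years
next_five <- predict(model, n.ahead = 5)
next_five
```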
Joris: thank you for the answer, that was illuminating.
They are two separate classes. ts is contained in the basic R installation, and the function HoltWinters() demands a ts time series.
timeSeries has a completely different structure. It is also specifically directed towards finance. The big difference from ts is that it allows for irregular time series. The class ts can only hold equispaced series.
Internally, ts stores an attribute "tsp" (visible as a slot below) which contains the start, end and frequency of the time series.
> test <- ts(1:10, frequency = 4, start = c(1959, 2))
> slotNames(test)
[1] ".Data" "tsp" ".S3Class"
> slot(test,"tsp")
[1] 1959.25 1961.50 4.00
It's this slot that HoltWinters() needs but lacks in timeSeries. There the information on the times is contained in two slots, a position slot and a format slot. Together they define the times as a timeDate object.
> data = as.matrix(MSFT[, 4])
> charvec = rownames(MSFT)
> Close = timeSeries(data, charvec, units = "Close")
> slotNames(Close)
[1] ".Data" "units" "positions" "format" "FinCenter" "recordIDs" "title" "documentation"
> head(slot(Close,"positions"))
[1] 970012800 970099200 970185600 970444800 970531200 970617600
> slot(Close,"format")
[1] "%Y-%m-%d"
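On the ts side, the same information is also reachable through the usual accessor functions rather than slot(). A minimal base R sketch, reusing the test object from above:

```r
test <- ts(1:10, frequency = 4, start = c(1959, 2))

tsp(test)        # start, end and frequency: 1959.25 1961.50 4.00
start(test)      # start as (year, quarter)
end(test)        # end as (year, quarter)
frequency(test)  # observations per unit of time
is.ts(test)      # a cheap check before calling HoltWinters()
```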
I have a question about the forecast() function from the forecast package.
I am using this function to forecast the closing price of a stock with an ARIMAX model (with xreg). My question is: when forecasting, does the closing price at time t depend on the external regressors at time t-1, or on the external regressors at time t?
In other words, today I still don't know the high price (for example), so today's closing price cannot depend on today's high price, only on yesterday's.
Does the function work like that, or in a different way?
I hope I have been clear. Thanks!
Yes, you can set up the function to work like this, though there are some steps to take:
Lag the regressor, since you want yesterday's value to explain today's.
Remove observations without a regressor value (the first value of the time series has no lagged regressor, because its own value is used for the second observation).
Build the regressor for the prediction.
Fit the model and predict.
Below I have put together something from a few links that shows how it can be done, and thus should explain how prediction with a regressor works with forecast in your case:
library(quantmod)
library(forecast)
library(dplyr)
# get some finance data to play with
quantmod::getSymbols("AAPL", from = "2017-01-01",
                     to = "2018-03-01", warnings = FALSE,
                     auto.assign = TRUE)
# I prefer working with a data frame and converting to ts objects later
new_AAPL <- as.data.frame(AAPL) %>%
  # select close values and lag the high values by one day
  dplyr::transmute(AAPL.Close,
                   AAPL.High = dplyr::lag(AAPL.High)) %>%
  # keep only complete rows (the first row has no lagged high)
  dplyr::filter(dplyr::if_all(dplyr::everything(), ~ !is.na(.x)))
# set up the new time series and regressor (watch the starting points)
AAPL.Close <- ts(new_AAPL$AAPL.Close, start = as.Date("2017-01-04"), frequency = 365)
AAPL.High <- ts(new_AAPL$AAPL.High, start = as.Date("2017-01-04"), frequency = 365)
# set up the future regressor (the last value of the original high values)
AAPL.futureg <- ts(tail(as.data.frame(AAPL)$AAPL.High, 1), start = as.Date("2018-03-02"), frequency = 365)
# I will use an ARIMA model here
modArima <- forecast::auto.arima(AAPL.Close, xreg = AAPL.High)
# forecast with the regressor
forecast::forecast(modArima, h = 1, xreg = AAPL.futureg)
Here is where I got the information from:
https://www.codingfinance.com/post/2018-03-27-download-price/
https://stats.stackexchange.com/questions/41070/how-to-setup-xreg-argument-in-auto-arima-in-r
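The lagging logic can also be tried offline. The following is a minimal sketch with simulated prices and base R's arima() and predict() in place of the forecast package; all names and the data-generating process are made up for illustration.

```r
set.seed(42)
n <- 200

# hypothetical daily "high" series (a random walk around 100)
high_px <- 100 + cumsum(rnorm(n))
# hypothetical close: driven by yesterday's high, so day 1 has no value
close_px <- c(NA, 0.9 * high_px[-n] + rnorm(n - 1, sd = 0.5))

# step 1: lag the regressor by one day
high_lag <- c(NA, high_px[-n])
# step 2: drop the first observation, which has no lagged regressor
keep <- !is.na(high_lag)
y_close <- close_px[keep]
x_high_lag <- high_lag[keep]

# step 3: fit a regression with AR(1) errors on the lagged regressor
fit <- arima(y_close, order = c(1, 0, 0), xreg = x_high_lag)

# step 4: tomorrow's regressor value is today's (last observed) high
fc <- predict(fit, n.ahead = 1, newxreg = high_px[n])
fc$pred
```

The point of the sketch is the alignment: the value passed as newxreg for the first forecast step is the last observed high, not a future one.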
The complete R data and code for my question is here: https://pastebin.com/QtG6A7ZX.
I am new to R and still a beginner when it comes to time series analysis, so please forgive my ignorance.
I am attempting to model and forecast some enrollment data with 2 dummy-coded regressors. I have already used auto.arima to fit the model:
model <- auto.arima(enroll, xreg=x)
Before I forecast with this model, I am attempting to test its accuracy by selecting only a part of the time series (1:102 instead of 1:112), and likewise, a partial list of regressors.
Based on auto.arima, I fit the partial model as follows:
model_par <- arima(enroll_partial, c(1, 1, 1), seasonal = list(order = c(1, 0, 0), period = 5), xreg = x_par)
I have tried three different ways to forecast and get essentially the same error:
fcast_par <- forecast(model_par, h=10) #error
fcast_par <- forecast(model_par, h=10, xreg=x_par) #error
fcast_par <- forecast(model_par, h=10, xreg=forecast(x_par,h=10)) #error
'xreg' and 'newxreg' have different numbers of columns
I have tested using auto.arima with the partial data. That works, but gives me a different model and, although I specified 10 predictions, I get over 50:
model_par2 <- auto.arima(enroll_partial, xreg=x_par)
fcast_par <- forecast(model_par2, h=12, xreg=x_par)
fcast_par
So, my main question is, how do I specify an exact model and predict using more than 1 regressor given my data (see Paste Bin link above)?
The forecast() function is from the forecast package, and works with model functions that are from that package. This is why it is possible to produce forecasts from auto.arima() using forecast(model_par2,xreg=x_fcst).
The arima() function comes from the stats package, so there is no guarantee that it works with forecast(). To specify your own ARIMA model, you can use the Arima() function from the forecast package, which behaves very similarly to arima(), but from which you will be able to produce forecasts using forecast(model_par2,xreg=x_fcst).
You have two problems. One of them is that the various forecasting functions in R are making it (intentionally?) difficult for you.
The first problem is that you need to define the values of your regressors for the forecasting period. Extract the relevant data from x by using window():
x_fcst <- window(x,start=c(2017,4))
The second problem is that forecast() (which dispatches to forecast.Arima()) is a red herring here. You need to use predict() (which dispatches to predict.Arima() - note the capitalization in both cases!):
predict(model_par,newxreg=x_fcst,h=nrow(x_fcst))
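which yields the output below. (Note that predict() for an arima fit takes n.ahead rather than h. The h argument appears to be silently ignored here, which would explain why the same value is repeated and only one standard error is returned; passing n.ahead=nrow(x_fcst) instead should give a proper multi-step forecast.)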
$pred
Time Series:
Start = c(2017, 3)
End = c(2019, 1)
Frequency = 5
[1] 52.00451 52.00451 52.00451 52.00451 52.00451 52.00451 52.00451 52.00451
[9] 52.00451
$se
Time Series:
Start = c(2017, 3)
End = c(2017, 3)
Frequency = 5
[1] 17.13345
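Since the original data sits behind a Pastebin link, here is a self-contained sketch of the same workflow with simulated data: two dummy regressors, a training window, and a matching block of future regressor rows. The data, names and model order are assumptions made for the example, not the asker's model.

```r
set.seed(1)
n <- 60   # training observations
h <- 10   # forecast horizon

# two hypothetical dummy regressors covering training AND forecast periods
xreg_all <- cbind(d1 = rep(c(1, 0, 0, 0, 0), length.out = n + h),
                  d2 = rep(c(0, 0, 1, 0, 0), length.out = n + h))
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n + h))
y_all <- 50 + 8 * xreg_all[, "d1"] - 5 * xreg_all[, "d2"] + noise

# split: the future regressors must have the same columns as the training ones
y_train  <- ts(y_all[1:n], frequency = 5)
x_train  <- xreg_all[1:n, , drop = FALSE]
x_future <- xreg_all[(n + 1):(n + h), , drop = FALSE]

fit <- arima(y_train, order = c(1, 0, 0), xreg = x_train)

# n.ahead must match the number of rows supplied in newxreg
pred <- predict(fit, n.ahead = h, newxreg = x_future)
pred$pred
pred$se
```

Passing the training regressors (or nothing) as newxreg is what triggers the "different numbers of columns" or length problems; the forecast period needs its own rows.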
You can also use auto.arima(). Confusingly enough, this time forecast() (which still dispatches to forecast.Arima()) does work:
model_par2 <- auto.arima(enroll_partial, xreg=x_par)
forecast(model_par2,xreg=x_fcst)
which yields
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2017.40 39.91035 17.612358 62.20834 5.808514 74.01219
2017.60 59.51003 32.783451 86.23661 18.635254 100.38481
2017.80 69.81000 39.290834 100.32917 23.134962 116.48505
2018.00 57.49140 23.601444 91.38136 5.661183 109.32162
2018.20 55.45759 18.503034 92.41214 -1.059524 111.97470
2018.40 34.57866 -7.306747 76.46406 -29.479541 98.63686
2018.60 52.30199 6.702068 97.90192 -17.437074 122.04106
2018.80 61.61591 12.582055 110.64977 -13.374900 136.60672
2019.00 50.47661 -1.765945 102.71917 -29.421485 130.37471
And yes, you do get five times as many predictions. The first column is an expectation forecast, and the others give prediction intervals. These are governed by the level parameter to forecast().
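To make the relationship between the columns concrete, here is a sketch of the usual normal-theory interval, built from a point forecast and its standard error with base R. It assumes normally distributed forecast errors; I believe this is what the Lo/Hi columns represent when no bootstrapping is used, but treat that as an assumption.

```r
# fit a simple AR(1) to a built-in series and predict three steps ahead
fit <- arima(LakeHuron, order = c(1, 0, 0))
p <- predict(fit, n.ahead = 3)

# two-sided normal quantiles for 80% and 95% coverage
level <- c(80, 95)
z <- qnorm(0.5 + level / 200)

intervals <- data.frame(
  point = as.numeric(p$pred),
  lo80  = as.numeric(p$pred - z[1] * p$se),
  hi80  = as.numeric(p$pred + z[1] * p$se),
  lo95  = as.numeric(p$pred - z[2] * p$se),
  hi95  = as.numeric(p$pred + z[2] * p$se)
)
intervals
```

So each row is still one forecast; the extra four columns only describe its uncertainty.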
Q: What is the right way to set the frequency in an xts object given a set of dates? Ideally, auto.arima() called on this xts object would yield the same results as when called on an analogous ts object.
Detail: I was surprised to find different results from an auto.arima() fit based on whether I passed a ts or xts object. I found the difference had to do with the frequency (which, in the case of xts, was being reset to 1 despite my setting it to 12 in the construction). Below, setting up sim_ts_12 and estimating the intended model was relatively straightforward. But in my initial attempts at working with xts (sim_xts and sim_xts_12_not) I estimated the wrong model. I finally estimated the right model using xts (sim_xts_12, sim_ts2xts_12), but both of those approaches seem wrong in some way. I'd expect working with xts to be simpler than ts. But that doesn't seem to be the case here. Am I missing something?
sim <- scan(file="./sim.dat")
sim_ts_12 <- ts(sim, start=c(2016,1), frequency=12)
sim_ts2xts_12 <- as.xts(sim_ts_12)
sim_xts <- xts(x=sim, order.by=seq.Date(from=as.Date("2016-01-01"), by="month", length.out = length(sim)))
sim_xts_12_not <- xts(x=sim, order.by=seq.Date(from=as.Date("2016-01-01"), by="month", length.out = length(sim)), frequency=12)
sim_xts_12 <- sim_xts
attr(sim_xts_12, 'frequency') <- 12
auto.arima(sim_ts_12) # ARIMA(0,1,1)(0,1,0)[12]
auto.arima(sim_ts2xts_12) # ARIMA(0,1,1)(0,1,0)[12]
auto.arima(sim_xts) # ARIMA(0,1,1) with drift
auto.arima(sim_xts_12_not) # ARIMA(0,1,1) with drift
auto.arima(sim_xts_12) # ARIMA(0,1,1)(0,1,0)[12]
txt <- "0.04767597 0.07217235 0.03954613 0.03698637 0.04283896
0.03534811 0.04198519 0.04129214 0.04576022 0.03966146
0.03656881 0.04396736 0.04459328 0.07062732 0.03477407
0.0340033 0.039136 0.0347761 0.03819997 0.03634627
0.03966617 0.03455635 0.03009606 0.03927688 0.03959629
0.06554147 0.02908742 0.02619443 0.03179742 0.02468108
0.02612955 0.02300656 0.02988827 0.01878513 0.01399028
0.02601922 0.0250159 0.05610426 0.01537538 0.01231939
0.01330564 0.008744173 0.01296571 0.005741129 0.01674992
0.003210812 -0.007936987 0.01018758"
sim.dat <- scan(text=txt, what=numeric() )
UPDATE, NOT A DUPLICATE: The possible duplicate question/answer does not address the best practice method for handling frequency in an xts. The question does not ask for it, nor does the answer address it. The answer handles ts.
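One possible workaround, offered as a sketch rather than a confirmed best practice, is to keep the dates separately and derive a ts with an explicit frequency from them just before modelling, mirroring sim_ts_12 above. The example uses base R only and placeholder values instead of the simulated data.

```r
# monthly dates as used for the xts index in the question
dates <- seq.Date(from = as.Date("2016-01-01"), by = "month", length.out = 48)
values <- seq_along(dates)  # placeholder for the 48 simulated values

# derive (year, month) of the first observation from the dates
first <- as.POSIXlt(dates[1])
sim_ts <- ts(values,
             start = c(first$year + 1900, first$mon + 1),
             frequency = 12)

frequency(sim_ts)  # the value auto.arima() would see
start(sim_ts)
end(sim_ts)
```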
I have hourly snapshots of an event from 2012-05-15 07:00 to 2013-05-17 18:00. How can I create a time series from this data and apply HoltWinters to it?
I tried the following
EventData<-ts(Eventmatrix$X20030,start=c(2012,5,15),frequency=8000)
HoltWinters(EventData)
But I got Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : time series has no or less than 2 periods
What value should I use for frequency?
I think you should consider using ets from the forecast package to perform exponential smoothing. Read this post for a comparison between HoltWinters and ets.
require(xts)
require(forecast)
time_index <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by = "hour")
set.seed(1)
value <- rnorm(n = length(time_index))
eventdata <- xts(value, order.by = time_index)
ets(eventdata)
Now if you want to know more about the syntax of ets, check the help page for this function and Rob Hyndman's online book (chapter 7, section 6).
Please take a look at the following post, which might answer the question:
Decompose xts hourly time series
It explains how you can create an xts object using POSIXct objects. The frequency attribute of this xts object can be set manually, and you will probably then be able to use HoltWinters.
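As a base R sketch of the frequency question: for hourly data with a daily cycle, frequency = 24 gives HoltWinters() enough complete periods to work with. The daily cycle, the simulated values and the fixed smoothing parameters are all assumptions of this example; the parameters are fixed only to keep the sketch deterministic and can be omitted to let HoltWinters() estimate them.

```r
set.seed(1)
n_hours <- 24 * 14  # two weeks of hypothetical hourly snapshots

# simulated values with a daily pattern plus noise
hour_of_day <- (seq_len(n_hours) - 1) %% 24
values <- 10 + 3 * sin(2 * pi * hour_of_day / 24) + rnorm(n_hours, sd = 0.5)

# one seasonal period = one day = 24 hourly observations
event_ts <- ts(values, frequency = 24)

# seasonal model without trend; parameters fixed for illustration only
hw <- HoltWinters(event_ts, alpha = 0.3, beta = FALSE, gamma = 0.2)

# forecast the next day
next_day <- predict(hw, n.ahead = 24)
next_day
```

With frequency=8000 and roughly one year of hourly data there are only a little over one period available, which is consistent with the "less than 2 periods" error.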
I have only started playing around with time series in R so I have fallen at the first hurdle! I have a vector of daily temperature readings (with no date stamp) and I am having problems creating such an object.
data<-rnorm(3650, m=10, sd=2)
data_ts<-as.ts(data, frequency=365, start=c(1919, 1))
attributes(data_ts)
dcomp<-decompose(data_ts, type=c("additive"))
I think this code should instruct R to make a ts object with daily measurements (frequency=365) starting on 1 January 1919. I don't understand the error message from the decompose command. I have a feeling I have not created the ts object correctly, because data_ts$tsp does not look correct!
data <- rnorm(3650, m=10, sd=2)
# change is below, use ts() to create time series
data_ts <- ts(data, frequency=365, start=c(1919, 1))
attributes(data_ts)
dcomp<-decompose(data_ts, type=c("additive"))
plot(dcomp)
This produces a four-panel plot of the decomposition: observed, trend, seasonal and random components.
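The difference between the two constructors can be checked directly. A minimal sketch (variable names are made up): as.ts() only coerces and ignores the extra frequency and start arguments, whereas ts() uses them.

```r
set.seed(1)
temps <- rnorm(3650, mean = 10, sd = 2)

via_as_ts <- as.ts(temps, frequency = 365, start = c(1919, 1))
via_ts    <- ts(temps, frequency = 365, start = c(1919, 1))

tsp(via_as_ts)  # 1 3650 1: frequency and start were silently dropped
tsp(via_ts)     # starts at 1919 with frequency 365

# with frequency 1 there is no seasonal period, so decompose() refuses
decomp_error <- tryCatch(decompose(via_as_ts),
                         error = function(e) conditionMessage(e))
decomp_error
```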