How to create an R time series for hourly data

I have hourly snapshots of an event from 2012-05-15-0700 to 2013-05-17-1800. How can I create a time series from this data and apply HoltWinters to it?
I tried the following:
EventData<-ts(Eventmatrix$X20030,start=c(2012,5,15),frequency=8000)
HoltWinters(EventData)
But I got this error: Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : time series has no or less than 2 periods
What value should I use for frequency?

I think you should consider using ets from the forecast package to perform exponential smoothing. Read this post for a comparison between HoltWinters and ets.
require(xts)
require(forecast)
time_index <- seq(from = as.POSIXct("2012-05-15 07:00"),
                  to = as.POSIXct("2012-05-17 18:00"), by = "hour")
set.seed(1)
value <- rnorm(n = length(time_index))
eventdata <- xts(value, order.by = time_index)
ets(eventdata)
To learn more about the syntax of ets, check the function's help page and Rob Hyndman's online book (chapter 7, section 6).

Please take a look at the following post which might answer the question:
Decompose xts hourly time series
It explains how to create an xts object from POSIXct objects. The frequency attribute of this xts object can be set manually, after which you will probably be able to use HoltWinters.
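As a rough base-R sketch of the frequency idea (not taken from either answer): for hourly data with a daily cycle, one common choice is frequency = 24, so that one period is one day. The simulated series below is an assumption standing in for Eventmatrix$X20030, and the time axis of the resulting ts object counts days from 1 rather than calendar dates.

```r
# Simulated hourly data with a daily cycle (an assumption; replace with your own values)
set.seed(1)
n_hours <- 24 * 14                      # two weeks of hourly observations
hours <- seq_len(n_hours)
value <- 10 + 3 * sin(2 * pi * hours / 24) + rnorm(n_hours, sd = 0.5)

# frequency = 24: one seasonal period is one day of hourly snapshots
event_ts <- ts(value, frequency = 24)

# HoltWinters needs at least two full periods, which 14 days easily provides
hw <- HoltWinters(event_ts)

# Forecast the next 24 hours
pred <- predict(hw, n.ahead = 24)
```

With frequency = 8000, about a year of hourly observations (fewer than 9,000 points) covers less than two periods, which is consistent with the "no or less than 2 periods" error in the question.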

Related

forecast() function in R: how does it work?

I have a question about the forecast() function from the forecast package.
I am using this function to forecast the closing price of a stock with an ARIMAX model (with xreg). My question is: when forecasting, does the closing price at time t depend on the external regressors at time t-1, or on the external regressors at time t?
In other words, today I don't yet know the high price (for example), so today's closing price cannot depend on today's high price, only on yesterday's.
Does the function work like that, or in a different way?
I hope I have been clear. Thanks!
Yes, you can set up the function to work like this, though there are some steps to take:
lag the regressor, since you want yesterday's value to explain today's
remove observations without a regressor (the first value of the time series has no regressor, because its own value serves as the regressor for the second value)
build the regressor for prediction
fit the model and predict
Below I put together an example from a few links that shows how it can be done, and which should explain how prediction with a regressor works with forecast in your case:
library(quantmod)
library(forecast)
library(dplyr)
# get some finance data to play with
quantmod::getSymbols("AAPL", from = '2017-01-01',
                     to = "2018-03-01", warnings = FALSE,
                     auto.assign = TRUE)
# I prefer working with df and then convert to ts objects later
new_AAPL <- as.data.frame(AAPL) %>%
  # select close values and lag high values
  dplyr::transmute(AAPL.Close,
                   AAPL.High = lag(AAPL.High)) %>%
  # keep only complete values
  dplyr::filter(across(everything(), ~!is.na(.x)))
# set up new time series, regressor (watch the starting points)
AAPL.Close <- ts(new_AAPL$AAPL.Close, start = as.Date("2017-01-04"), frequency = 365)
AAPL.High <- ts(new_AAPL$AAPL.High, start = as.Date("2017-01-04"), frequency = 365)
# set up the future regressor (last value of original high values)
AAPL.futureg <- ts(as.data.frame(AAPL)$AAPL.High[291], start = as.Date("2018-03-02"), frequency = 365)
# I will use an ARIMA model here
modArima <- forecast::auto.arima(AAPL.Close, xreg=AAPL.High)
# forecast with regressor
forecast::forecast(modArima, h = 1, xreg = AAPL.futureg)
Here is where I got the information from:
https://www.codingfinance.com/post/2018-03-27-download-price/
https://stats.stackexchange.com/questions/41070/how-to-setup-xreg-argument-in-auto-arima-in-r
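The same four steps can also be sketched without downloading data, using only base R's stats::arima instead of the forecast package. Everything here is an assumption made for illustration: the simulated high and close series, the AR(1) order, and the column name high_lag.

```r
set.seed(42)
n <- 200

# Simulated daily "high" prices (random walk) and "close" prices that, by
# construction, depend on the PREVIOUS day's high
high  <- 100 + cumsum(rnorm(n))
close <- c(NA, high[-n]) - 1 + rnorm(n, sd = 0.5)

# Step 1: lag the regressor so that yesterday's high explains today's close
high_lag <- c(NA, high[-n])

# Step 2: drop the first observation, which has no lagged regressor
keep <- !is.na(high_lag)
y <- close[keep]
x <- cbind(high_lag = high_lag[keep])

# Step 3: the regressor for the one-step forecast is the last observed high
future_x <- cbind(high_lag = high[n])

# Step 4: fit a regression with AR(1) errors and predict one step ahead
fit <- arima(y, order = c(1, 0, 0), xreg = x)
fc  <- predict(fit, n.ahead = 1, newxreg = future_x)
fc$pred
```

The point of the sketch is that the model only ever sees the regressor values you pass in, so whether the forecast uses time t or time t-1 information is decided by how xreg and newxreg are constructed, not by the forecasting function.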

Imputed predictions for missing time-series data nearly stationary (flat line)

I have time-series data of player counts that is missing counts over several years. I'm trying to fill in/predict the missing player counts over the different intervals.
Data available here: https://1drv.ms/u/s!AvEZ_QPY7OZuhJAlKJN89rH185SUhA
I'm following the instructions below, which use KalmanRun to impute the missing values. I've tried three different approaches to transforming the data: using an xts object, and two ways of converting it into time series data.
https://stats.stackexchange.com/questions/104565/how-to-use-auto-arima-to-impute-missing-values
require(forecast)
library(xts)
library(anytime)
library(DescTools)
df_temp = read.csv("r_share.csv")
df_temp[['DateTime']] <- as.Date(strptime(df_temp[['DateTime']], format='%Y-%m-%d %H:%M:%S'))
Three approaches to converting the data; xts seems to work best, returning non-zero data that is interpretable.
#Convert df_temp to TimeSeries object
df_temp = xts(df_temp$Players, df_temp$DateTime)
#df_temp = as.ts(log(df_temp$Players), start = start(df_temp$DateTime), end = end(df_temp$DateTime), frequency = 365)
#df_temp = ts(df_temp$Players, start = c(2013, 02, 02), end = c(2016, 01, 31), frequency = 365)
Fitting and plotting:
fit <- auto.arima(df_temp, seasonal = TRUE)
id.na <- which(is.na(df_temp))
kr <- KalmanRun(index(df_temp), fit$model, update = FALSE)
#?KalmanRun$tol
for (i in id.na)
df_temp[i] <- fit$model$Z %*% kr$states[i,]
plot(df_temp)
The expected output is data that mimics the variability seen in the actual data and is different for each interval, whereas the actual output is relatively stationary and unchanging (both intervals have nearly the same prediction).
Does it need to be an arima() model?
Maybe you could try another model, Prophet, developed by Facebook.
Here you can find the guide and GitHub page.
If I understood correctly, you want something like this:
# Import library
library(prophet)
# Read data
df = read.csv("C:/Users/Downloads/r_share.csv",sep = ";")
# Transform to date
df["DateTime"] = as.Date(df$DateTime,format = "%d/%m/%Y")
# Change names for the model
colnames(df) = c("ds","y")
# call model
m = prophet(df)
# make "future" just one day greater than past
future = make_future_dataframe(m,periods = 1)
# predict the points
forecast = predict(m,future)
# plot results
plot(m,forecast)

Frequency in xts vs ts for auto.arima

Q: What is the right way to set the frequency in an xts object given a set of dates? Ideally, auto.arima() called on this xts object would yield the same results as when called on an analogous ts object.
Detail: I was surprised to find different results from an auto.arima() fit depending on whether I passed a ts or an xts object. I found the difference had to do with the frequency (which, in the case of xts, was being reset to 1 despite my setting it to 12 in the construction). Below, setting up sim_ts_12 and estimating the intended model was relatively straightforward. But in my initial attempts at working with xts (sim_xts and sim_xts_12_not) I estimated the wrong model. I finally estimated the right model using xts (sim_xts_12, sim_ts2xts_12), but both of those approaches seem wrong in some way. I'd expect working with xts to be simpler than ts, but that doesn't seem to be the case here. Am I missing something?
sim <- scan(file="./sim.dat")
sim_ts_12 <- ts(sim, start=c(2016,1), frequency=12)
sim_ts2xts_12 <- as.xts(sim_ts_12)
sim_xts <- xts(x=sim, order.by=seq.Date(from=as.Date("2016-01-01"), by="month", length.out = length(sim)))
sim_xts_12_not <- xts(x=sim, order.by=seq.Date(from=as.Date("2016-01-01"), by="month", length.out = length(sim)), frequency=12)
sim_xts_12 <- sim_xts
attr(sim_xts_12, 'frequency') <- 12
auto.arima(sim_ts_12) # ARIMA(0,1,1)(0,1,0)[12]
auto.arima(sim_ts2xts_12) # ARIMA(0,1,1)(0,1,0)[12]
auto.arima(sim_xts) # ARIMA(0,1,1) with drift
auto.arima(sim_xts_12_not) # ARIMA(0,1,1) with drift
auto.arima(sim_xts_12) # ARIMA(0,1,1)(0,1,0)[12]
txt <- "0.04767597 0.07217235 0.03954613 0.03698637 0.04283896
0.03534811 0.04198519 0.04129214 0.04576022 0.03966146
0.03656881 0.04396736 0.04459328 0.07062732 0.03477407
0.0340033 0.039136 0.0347761 0.03819997 0.03634627
0.03966617 0.03455635 0.03009606 0.03927688 0.03959629
0.06554147 0.02908742 0.02619443 0.03179742 0.02468108
0.02612955 0.02300656 0.02988827 0.01878513 0.01399028
0.02601922 0.0250159 0.05610426 0.01537538 0.01231939
0.01330564 0.008744173 0.01296571 0.005741129 0.01674992
0.003210812 -0.007936987 0.01018758"
sim.dat <- scan(text=txt, what=numeric() )
UPDATE, NOT A DUPLICATE: The possible duplicate question/answer does not address the best practice method for handling frequency in an xts. The question does not ask for it, nor does the answer address it. The answer handles ts.

ets: Error in ets(timeseries, model = "MAM") : Nonseasonal data

I'm trying to create a forecast using an exponential smoothing method, but get the error "nonseasonal data". This is clearly not true - see code below.
Why am I getting this error? Should I use a different function (it should be able to perform simple, double, damped trend, seasonal, Winters method)?
library(forecast)
timelen<-48 # use 48 months
dates<-seq(from=as.Date("2008/1/1"), by="month", length.out=timelen)
# create seasonal data
time<-seq(1,timelen)
season<-sin(2*pi*time/12)
constant<-40
noise<-rnorm(timelen,mean=0,sd=0.1)
trend<-time*0.01
values<-constant+season+trend+noise
# create time series object
timeseries<-as.ts(x=values,start=min(dates),end=max(dates),frequency=1)
plot(timeseries)
# forecast MAM
ets<-ets(timeseries,model="MAM") # ANN works, why MAM not?
ets.forecast<-forecast(ets,h=24,level=0.9)
plot(ets.forecast)
Thanks&kind regards
You should use ts simply to create a time series from a numeric vector. See the help file for more details.
Your start and end values aren't correctly specified.
Also, setting the frequency to 1 does not define a valid seasonality; it is the same as no seasonality at all.
Try:
timeseries <- ts(data=values, frequency=12)
ets <- ets(timeseries, model="MAM")
print(ets)
#### ETS(M,A,M)
#### Call:
#### ets(y = timeseries, model = "MAM")
#### ...
As for the question in your comments: ANN works because the third N means no seasonality, so the model can be fitted even to a non-seasonal time series.
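To see the effect of the frequency setting without the forecast package, here is a small base-R sketch. It uses decompose() as a stand-in for a seasonal method (an assumption on my part, not what ets does internally) and shows one way to specify the start for monthly data as c(year, month).

```r
set.seed(1)
timelen <- 48                               # 48 months, as in the question
time <- seq_len(timelen)
values <- 40 + sin(2 * pi * time / 12) + time * 0.01 + rnorm(timelen, sd = 0.1)

# frequency = 1: R treats the series as having no seasonal period
ts_f1 <- ts(values, start = 2008, frequency = 1)

# frequency = 12 with start = c(year, month): monthly data starting January 2008
ts_f12 <- ts(values, start = c(2008, 1), frequency = 12)

# A seasonal decomposition fails on the frequency-1 series ...
msg_f1 <- tryCatch({
  decompose(ts_f1)
  "ok"
}, error = function(e) conditionMessage(e))
print(msg_f1)

# ... and works on the monthly series
parts <- decompose(ts_f12)
end(ts_f12)   # last observation: year 2011, month 12
```

Note that ts takes start as a number or a c(period, position) pair; passing a Date, as in the question, is silently converted to its underlying day count.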

Error when doing linear regression using zoo objects ... Error in `$<-.zoo`(`*tmp*`

I am new to R and slowly getting acquainted. My question refers to the following piece of code.
I am creating a zoo object with the following headers and then filtering by date. On the filtered dates I am subtracting two columns (Tom from Elena). Everything works fine until here.
Code below:
b <- read.zoo(b1, header = TRUE, index.column = 1, format = "%d/%m/%Y")
startDate = "2013/11/02"
endDate = "2013/12/20"
dates <- seq(as.Date(startDate), as.Date(endDate), by=1)
TE = b[dates]$Tom - b[dates]$Elena
I then regress the result of my subtraction (TE above) on Elena. However, I get an error message every time I try to run this regression:
TE$model <- lm(TE ~ b[dates]$Elena)
Error in $<-.zoo(*tmp*, "model", value = list(coefficients = c(-0.0597128230859905, :
not possible for univariate zoo series
I have tried creating a data frame and then doing the regression, but to no avail. Any help would be appreciated. Thanks.
You cannot add the outcome of a regression (a list of class lm) to a time series of class zoo.
I recommend saving the model in a separate object, e.g.,
fit <- lm(TE ~ b[dates]$Elena)
