I have a time series with a weekly period (7 days) spanning two years: 58 values, starting 2017-08-05 and ending 2018-09-08. I need to work with this time series in R (make predictions with a SARIMA model, etc.), but I have a problem defining the period/frequency. When I use the decompose function I get the error: "time series has no or less than 2 periods", and the Arima function does not work properly. Detailed information is below. How can I import my data into R with the required frequency?
My data (short example):
File: sessions2.csv
date count
11.11.2017 55053
18.11.2017 45256
25.11.2017 59091
2.12.2017 50030
9.12.2017 41769
16.12.2017 63042
23.12.2017 51838
30.12.2017 47652
6.1.2018 18731
13.1.2018 54470
20.1.2018 22514
27.1.2018 63818
3.2.2018 51605
10.2.2018 26312
17.2.2018 11111
data1.csv contains only values. For example:
53053
45256
59091
50045
41769
65042
51838
I tried in R:
sessions1 <- scan("data1.csv")
sessionsTS <- ts(sessions1, frequency=52, start=decimal_date(ymd("2017-11-11")))
Output sessionsTS and errors:
> sessionsTS
Time Series:
Start = 2017.59178082192
End = 2018.68418328598
Frequency = 52
What time format do these numbers (Start, End) represent? And how can I convert between these decimal dates and regular dates?
> sessionsComponents <- decompose(sessionsTS)
Error in decompose(sessionsTS) :
time series has no or less than 2 periods
> arima(sessionsTS, order = c(0, 1, 0), seasonal = list(order = c(2, 0, 0), period = 52), xreg = NULL, include.mean = TRUE)
Error in optim(init[mask], armaCSS, method = optim.method, hessian = FALSE, :
initial value in 'vmmin' is not finite
> fit <- Arima(sessionsTS, order = c(0, 1, 0), seasonal = list(order = c(2, 0, 0), period = 52))
Error in optim(init[mask], armaCSS, method = optim.method, hessian = FALSE, :
initial value in 'vmmin' is not finite
> sarima(sessionsTS,1,1,0,2,0,0,52)
Error in sarima(sessionsTS, 1, 1, 0, 2, 0, 0, 52) :
unused arguments (0, 0, 52)
Next I tried:
dataSeries <- read.table("sessions2.csv", header=TRUE, sep = ";", row.names=1)
dataTS <- as.xts(dataSeries , frequency=52, start=decimal_date(ymd("2017-11-11")))
> sessionsComponents2 <- decompose(dataTS)
Error in decompose(dataTS) : time series has no or less than 2 periods
> model = Arima(dataTS, order=c(0,1,0), seasonal = c(2,0,0))
> model
Series: dataTS
ARIMA(0,1,0)
In this case Arima is used without seasonality...
Many thanks for help.
Your data is sampled weekly, so if the period is also one week you would need to set frequency=1, but at that point there is no point in doing seasonal modeling. It makes sense to use a yearly period, as you have done by setting frequency=52, but then you don't have enough periods for estimation: you would need at least 104 observations (at least two full periods, as the error message explains).
So in short, you can't do what you want to do unless you get more data.
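To see the two-period requirement in action, here is a minimal sketch with simulated weekly counts (the numbers are made up, only the lengths matter): with 58 points at frequency=52, decompose() fails exactly as in your session, but with 104 points (two full years) it succeeds.

```r
# Simulated weekly counts -- made-up values, only to illustrate the rule
set.seed(1)
short <- ts(rnorm(58, mean = 50000, sd = 5000), frequency = 52)
long  <- ts(rnorm(104, mean = 50000, sd = 5000), frequency = 52)

# Fewer than 2 full periods: decompose() stops with an error
res_short <- try(decompose(short), silent = TRUE)
inherits(res_short, "try-error")  # TRUE

# Two full periods (104 = 2 * 52): decompose() runs
res_long <- decompose(long)
class(res_long)  # "decomposed.ts"
```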
A partial answer to your questions about ts() and the time format. If you do it like this:
tt <- read.table(text="
date count
11.11.2017 55053
18.11.2017 45256
25.11.2017 59091
2.12.2017 50030
9.12.2017 41769
16.12.2017 63042
23.12.2017 51838
30.12.2017 47652
6.1.2018 18731
13.1.2018 54470
20.1.2018 22514
27.1.2018 63818
3.2.2018 51605
10.2.2018 26312
17.2.2018 11111", header=TRUE)
tt$date <- as.Date(tt$date, format="%d.%m.%Y")
ts(tt$count, frequency=52, start=c(2017, 45))
# Time Series:
# Start = c(2017, 45)
# End = c(2018, 7)
# Frequency = 52
# [1] 55053 45256 59091 50030 41769 63042 51838 47652 18731
# 54470 22514 63818 51605 26312 11111
The start is at the 45th week of 2017, and the end is at the 7th week of 2018.
You can find the week numbers using format(tt$date, "%W"). See ?strptime for more details and for what %W means.
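Regarding the decimal Start/End numbers from your first attempt: they are decimal years (the year plus the elapsed fraction of that year). Assuming you are using lubridate (your original code already calls ymd() and decimal_date()), you can convert in both directions:

```r
library(lubridate)

# decimal_date() turns a Date into a decimal year...
decimal_date(ymd("2017-11-11"))  # roughly 2017.86

# ...and date_decimal() converts a decimal year back into a date-time
as.Date(date_decimal(2017.59178082192))  # 2017-08-05, the printed Start

# Week number for the ts(start = c(year, week)) form
format(as.Date("2017-11-11"), "%W")  # "45"
```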
Related
For the following time series data:
#1. dates of 15 day frequency:
dates = seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15) #96 times observation
#2. water content in crops corresponding to the times given.
water <- c(0.5702722, 0.5631781, 0.5560839, 0.5555985, 0.5519783, 0.5463459,
0.5511598, 0.546652, 0.5361545, 0.530012, 0.5360571, 0.5396569,
0.5683526, 0.6031535, 0.6417821, 0.671358, 0.7015542, 0.7177007,
0.7103561, 0.7036985, 0.6958607, 0.6775161, 0.6545367, 0.6380155,
0.6113306, 0.5846186, 0.5561815, 0.5251135, 0.5085149, 0.495352,
0.485819, 0.4730029, 0.4686458, 0.4616468, 0.4613918, 0.4615532,
0.4827496, 0.5149105, 0.5447824, 0.5776764, 0.6090217, 0.6297454,
0.6399422, 0.6428941, 0.6586344, 0.6507473, 0.6290631, 0.6011123,
0.5744375, 0.5313527, 0.5008027, 0.4770338, 0.4564025, 0.4464508,
0.4309046, 0.4351668, 0.4490393, 0.4701232, 0.4911582, 0.5162941,
0.5490387, 0.5737573, 0.6031149, 0.6400073, 0.6770058, 0.7048311,
0.7255012, 0.739107, 0.7338938, 0.7265202, 0.6940718, 0.6757214,
0.6460862, 0.6163091, 0.5743775, 0.5450822, 0.5057753, 0.4715266,
0.4469859, 0.4303232, 0.4187793, 0.4119401, 0.4201316, 0.426369,
0.4419331, 0.4757525, 0.5070846, 0.5248457, 0.5607567, 0.5859825,
0.6107531, 0.6201754, 0.6356589, 0.6336177, 0.6275579, 0.6214981)
I want to fit a double-logistic function curve to the data.
I found some examples and packages that can be of help,
https://greenbrown.r-forge.r-project.org/man/FitDoubleLogElmore.html
and an example here - Indexes overlap error when using dplyr to run a function.
However, the examples given only consider annual time series.
I have tried to fit the function as:
x <- ts(water, start = c(2016,17), end = c(2020, 16), frequency = 24)
smooth.water = FitDoubleLogBeck(x, weighting = T, hessian = F, plot = T, ninit = 10)
plot(water)
plot(smooth.water$predicted)
plot(water- smooth.water$predicted)
However, this function does not seem to fit the entire time series. How can I run the function so that it fits the whole series? Also, I noticed the output is different on different runs, and I am not sure why that happens.
FitDoubleLogBeck can deal only with one year of data, so you need to analyze the data year by year. To do this, take a one-year window and fit the data separately for each year.
As for the different results on different runs: the algorithm randomly chooses its initial parameters. The graph of a double-logistic curve is bell-shaped, but you are applying the algorithm to "sine"-like data while it expects a "bell". It then treats the water data as a cloud of points, so the results are meaningless and very sensitive to the initial parameter settings.
Code:
set.seed(123)
par(mfrow = c(1, 3))
# water vector taken from question above
x <- ts(water, start = c(2016,17), end = c(2020, 16), frequency = 24)
res <- sapply(2017:2019, function(year) {
  x2 <- as.vector(window(x, start = c(year, 1), end = c(year, 24)))
  smooth.water2 = FitDoubleLogBeck(x2, weighting = T, hessian = F, plot = T, ninit = 10)
  title(main = year)
  c(year = year, smooth.water2$params)
})
t(res)
t(res)
Output:
year mn mx sos rsp eos rau
[1,] 2017 -0.7709318 0.17234293 16.324163 -0.6133117 6.750885 -0.7618376
[2,] 2018 -0.8900971 0.09398673 7.529345 0.6701200 17.319465 0.8277409
[3,] 2019 -4.7669470 -0.34648434 15.930455 -0.2570877 10.690043 -0.2267284
I'm trying to build a forecast to predict future values of a keyword from Google Trends data.
My data is the daily indexed search volume from Jan 1 to Jun 30, 2020 for a keyword, saved in a CSV file:
Date | Keyword
2020-01-01 | 55
2020-01-02 | 79
2020-01-03 | 29
...
2020-06-29 | 19
2020-06-30 | 32
My R code seems to work okay until it generates the forecasts.
library(forecast)
data <- read.csv("<file path>.csv", header=TRUE)
#build time series data
#start and end periods of observed data
inds <- seq(as.Date("2020-01-01"), as.Date("2020-06-30"), by = "day")
#the frequency = 7 days (i.e. week)
sts <- ts(data$Keyword, start = c(2020, as.numeric(format(inds[1], "%j"))), frequency = 7)
#generate the forecast
model.ets <- ets(sts, model = "ANA")
fc.ets <- forecast(model.ets, h = 60)
plot(fc.ets)
The problem I'm having is that the forecast simply repeats the same pattern (doesn't seem to take into account the error, trend and/or seasonality to adjust the predictions).
I think I need to adjust the forecast() function but not sure how to do it.
In this case we have a daily series spanning less than a year that appears to display weekly seasonality. Note that, as explained at https://otexts.com/fpp2/ts-objects.html [2.1 - ts objects], the frequency given to the ts object is 52.18, i.e. 365.25/7, the number of weeks in a year (taking leap years into account). This seasonality rules out the use of ets models, which can't handle data with frequency greater than 24 unless used in combination with STL (Seasonal and Trend decomposition using Loess). As such, I would recommend exploring other models. The STL + ETS(A, Ad, N) point forecasts [2nd best model] look most realistic, but the range of their prediction intervals is much larger when compared against the TBATS(1, {0,0}, 0.92, {<52.18, 6>}) model [best model]. Please see and play around with the code below:
ts_ausAirBnb <- ts(ausAirBnb$airbnb_australia_, start = min(ausAirBnb$day), frequency = 52.18)
plot(decompose(ts_ausAirBnb))
snaivefit <- snaive(ts_ausAirBnb)
snaivefcast <- forecast(snaivefit, h = 60)
aafit <- auto.arima(ts_ausAirBnb)
aafcast <- forecast(aafit, h = 60)
stlffit <- stlf(ts_ausAirBnb, h = 60)
stlfcast <- forecast(stlffit, h = 60)
stlmfit <- stlm(ts_ausAirBnb)
stlmfcast <- forecast(stlmfit, h = 60)
tbatsfit <- tbats(ts_ausAirBnb)
tbatsfcast <- forecast(tbatsfit, h = 60)
nnetfit <- nnetar(ts_ausAirBnb)
nnetfcast <- forecast(nnetfit, h = 60)
autoplot(snaivefcast)
autoplot(aafcast)
autoplot(stlfcast)
autoplot(stlffit)
autoplot(stlmfcast)
autoplot(tbatsfcast)
autoplot(nnetfcast)
I am having some forecast::Arima-syntax issues. If I know that a seasonal ARIMA is statistically ok because it is the result of auto.arima, how can I fix the following Arima-function to have the same order as the auto.arima result:
library(forecast)
set.seed(1)
y <- sin((1:40)) * 10 + 20 + rnorm(40, 0, 2)
my_ts <- ts(y, start = c(2000, 1), freq = 12)
fit_auto <- auto.arima(my_ts, max.order = 2)
plot(forecast(fit_auto, h = 24))
# Arima(0,0,1)(1,0,0) with non-zero mean
fit_arima <- Arima(my_ts,
order = c(0, 0, 1),
seasonal = list(c(1, 0, 0)))
#Error in if ((order[2] + seasonal$order[2]) > 1 & include.drift) { :
# argument is of length zero
Thx & kind regards
The argument to seasonal must be either a numeric vector giving the seasonal order, or a list with two named elements: order, the numeric vector giving the seasonal order, and period, an integer giving the seasonal periodicity.
You gave a list containing only the seasonal order, so Arima complains that it cannot find the period value. If you give a numeric vector instead, period defaults to frequency(my_ts), as the function's documentation says. While it would make sense for the order given as a bare numeric vector or wrapped in a list to behave the same way, it doesn't; that is just a quirk of this function.
A rewrite of your call that works:
fit_arima <- Arima(my_ts,
order = c(0, 0, 1),
seasonal = c(1, 0, 0)) # vector, not a list
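Equivalently, if you do want the list form, include both named elements, order and period (here period = 12, matching frequency(my_ts)); a sketch using the same simulated series as the question:

```r
library(forecast)

set.seed(1)
y <- sin((1:40)) * 10 + 20 + rnorm(40, 0, 2)
my_ts <- ts(y, start = c(2000, 1), freq = 12)

# list form: both `order` and `period` must be named
fit_arima2 <- Arima(my_ts,
                    order = c(0, 0, 1),
                    seasonal = list(order = c(1, 0, 0), period = 12))
```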
I have a seasonal (7 days interval) time series, daily data for 30 days.
What is the best approach for a reasonable forecast?
The time series contains orders made with a app, it shows a seasonality of 1 week (lower sales at the beginning of the week).
I try the holt winters approach with this code:
(m <- HoltWinters(ts,seasonal = "mult"))
plot(m)
plot(fitted(m))
but it gives me an error like: Error in decompose(ts(x[1L:wind], start = start(x), frequency = f),seasonal) :
time series has no or less than 2 periods
What do you suggest?
EDIT:
data here
You must first create a ts object. Assuming your data is called df:
ts <- ts(df$install, frequency = 7)
(m <- HoltWinters(ts,seasonal = "mult"))
plot(m)
plot(fitted(m))
Then you can make prediction like (10 steps-ahead):
predict(m, n = 10)
Time Series:
Start = c(4, 5)
End = c(5, 7)
Frequency = 7
fit
[1,] 1028.8874
[2,] 1178.4244
[3,] 1372.5466
[4,] 1165.2337
[5,] 866.6185
[6,] 711.6965
[7,] 482.2550
[8,] 719.0593
[9,] 807.6147
[10,] 920.3250
The question about the best method is too difficult to answer in general. Usually one compares the performance of different models by their out-of-sample accuracy and chooses the one with the best result.
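A minimal sketch of such an out-of-sample comparison (using made-up daily orders with a weekly pattern as a stand-in for df$install): hold out the last week, fit on the rest, and compare the models with accuracy() from the forecast package.

```r
library(forecast)

# Made-up daily orders with a weekly pattern, standing in for df$install
set.seed(42)
install <- rep(c(500, 700, 900, 1100, 1000, 800, 600), 5) + rnorm(35, 0, 50)

full  <- ts(install, frequency = 7)
train <- window(full, end = c(4, 7))    # first 28 days (4 full weeks)
test  <- window(full, start = c(5, 1))  # last 7 days held out

hw_fc  <- forecast(HoltWinters(train, seasonal = "mult"), h = 7)
ets_fc <- forecast(ets(train), h = 7)

# Compare out-of-sample accuracy (RMSE, MAE, ...) on the held-out week
accuracy(hw_fc, test)
accuracy(ets_fc, test)
```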
You can use df$data to keep the dates that correspond to each day in the ts series.
ts_series <- ts(df$install, frequency = 7)
ts_dates <- as.Date(df$data, format = "%d/%m/%Y")
In a similar way, dates for the forecasted values can be kept in another sequence
m <- HoltWinters(ts_series, seasonal = "mult")
predict_values <- predict(m, 10)
predict_dates <- seq.Date(tail(ts_dates, 1) + 1, length.out = 10, by = "day")
With the dates sequence, the daily series can be plotted with properly formatted dates on the x axis. More control over the x-axis ticks can be obtained with the axis.Date function:
plot(ts_dates, ts_series, typ = "o"
, ylim = c(0, 4000)
, xlim = c(ts_dates[1], tail(predict_dates, 1))
, xlab = "Date", ylab = "install", las = 1)
lines(predict_dates, predict_values, lty = 2, col = "blue", lwd = 2)
grid()
In the following example, I am trying to use Holt-Winters smoothing on daily data, but I run into a couple of issues:
# generate some dummy daily data
mData = cbind(seq.Date(from = as.Date('2011-12-01'),
to = as.Date('2013-11-30'), by = 'day'), rnorm(731))
# convert to a zoo object
zooData = as.zoo(mData[, 2, drop = FALSE],
order.by = as.Date(mData[, 1, drop = FALSE], format = '%Y-%m-%d'),
frequency = 7)
# attempt Holt-Winters smoothing
hw(x = zooData, h = 10, seasonal = 'additive', damped = FALSE,
initial = 'optimal', exponential = FALSE, fan = FALSE)
# no missing values in the data
sum(is.na(zooData))
This leads to the following error:
Error in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma,
damped = damped, : You've got to be joking. I need more data! In
addition: Warning message: In ets(x, "AAA", alpha = alpha, beta =
beta, gamma = gamma, damped = damped, : Missing values encountered.
Using longest contiguous portion of time series
Emphasis mine.
Couple of questions:
1. Where are the missing values coming from?
2. I am assuming that the "need more data" arises from attempting to estimate 365 seasonal parameters?
Update 1:
Based on Gabor's suggestion, I have recreated a fractional index for the data where whole numbers are weeks.
I have a couple of questions.
1. Is this is an appropriate way of handling daily data when the periodicity is assumed to be weekly?
2. Is there is a more elegant way of handling the dates when working with daily data?
library(zoo)
library(forecast)
# generate some dummy daily data
mData = cbind(seq.Date(from = as.Date('2011-12-01'),
to = as.Date('2013-11-30'), by = 'day'), rnorm(731))
# convert to a zoo object with weekly frequency
zooDataWeekly = as.zoo(mData[, 2, drop = FALSE],
order.by = seq(from = 0, by = 1/7, length.out = 731))
# attempt Holt-Winters smoothing
hwData = hw(x = zooDataWeekly, h = 10, seasonal = 'additive', damped = FALSE,
initial = 'optimal', exponential = FALSE, fan = FALSE)
plot(zooDataWeekly, col = 'red')
lines(fitted(hwData))
hw requires a ts object not a zoo object. Use
zooDataWeekly <- ts(mData[,2], frequency=7)
Unless there is a good reason for specifying the model exactly, it is usually better to let R select the best model for you:
fit <- ets(zooDataWeekly)
fc <- forecast(fit)
plot(fc)
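Putting the fix together, the original hw() call also runs once the data is a ts object; a sketch using simulated values standing in for mData[, 2]:

```r
library(forecast)

set.seed(1)
values <- rnorm(731)  # stand-in for mData[, 2]
tsDataWeekly <- ts(values, frequency = 7)

# hw() now accepts the series because it is a ts, not a zoo object
hwData <- hw(tsDataWeekly, h = 10, seasonal = "additive", damped = FALSE)
plot(tsDataWeekly, col = "red")
lines(fitted(hwData))
```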