Related
For the following time series data:
#1. dates at a 15-day frequency:
dates = seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15) # 96 observations
#2. water content in crops corresponding to the times given.
water <- c(0.5702722, 0.5631781, 0.5560839, 0.5555985, 0.5519783, 0.5463459,
0.5511598, 0.546652, 0.5361545, 0.530012, 0.5360571, 0.5396569,
0.5683526, 0.6031535, 0.6417821, 0.671358, 0.7015542, 0.7177007,
0.7103561, 0.7036985, 0.6958607, 0.6775161, 0.6545367, 0.6380155,
0.6113306, 0.5846186, 0.5561815, 0.5251135, 0.5085149, 0.495352,
0.485819, 0.4730029, 0.4686458, 0.4616468, 0.4613918, 0.4615532,
0.4827496, 0.5149105, 0.5447824, 0.5776764, 0.6090217, 0.6297454,
0.6399422, 0.6428941, 0.6586344, 0.6507473, 0.6290631, 0.6011123,
0.5744375, 0.5313527, 0.5008027, 0.4770338, 0.4564025, 0.4464508,
0.4309046, 0.4351668, 0.4490393, 0.4701232, 0.4911582, 0.5162941,
0.5490387, 0.5737573, 0.6031149, 0.6400073, 0.6770058, 0.7048311,
0.7255012, 0.739107, 0.7338938, 0.7265202, 0.6940718, 0.6757214,
0.6460862, 0.6163091, 0.5743775, 0.5450822, 0.5057753, 0.4715266,
0.4469859, 0.4303232, 0.4187793, 0.4119401, 0.4201316, 0.426369,
0.4419331, 0.4757525, 0.5070846, 0.5248457, 0.5607567, 0.5859825,
0.6107531, 0.6201754, 0.6356589, 0.6336177, 0.6275579, 0.6214981)
I want to fit a double-logistic function curve to the data.
I found some examples and packages that can be of help,
https://greenbrown.r-forge.r-project.org/man/FitDoubleLogElmore.html
and an example here - Indexes overlap error when using dplyr to run a function.
However, the examples given only consider annual time series.
I have tried to fit the function as:
library(greenbrown)  # for FitDoubleLogBeck

x <- ts(water, start = c(2016, 17), end = c(2020, 16), frequency = 24)
smooth.water <- FitDoubleLogBeck(x, weighting = TRUE, hessian = FALSE, plot = TRUE, ninit = 10)
plot(water)
plot(smooth.water$predicted)
plot(water - smooth.water$predicted)
However, this function does not seem to fit the entire time series. How can I run the function so that it fits the whole series? Also, I noticed the output is different on different runs, and I am not sure what causes that.
FitDoubleLogBeck can only deal with one year of data, so you need to analyze the data year by year. To do that, take a one-year window and fit the data separately for each year.

As for the different results on different runs: the algorithm chooses its initial parameters randomly. The graph of a double-logistic curve is bell shaped, but you are applying the algorithm to "sine"-like data while it expects a "bell". It then treats the water data as a cloud of points, so the results are meaningless and very sensitive to the initial parameter setting.
Code:
library(greenbrown)  # provides FitDoubleLogBeck

set.seed(123)
par(mfrow = c(1, 3))
# water vector taken from the question above
x <- ts(water, start = c(2016, 17), end = c(2020, 16), frequency = 24)
res <- sapply(2017:2019, function(year) {
  x2 <- as.vector(window(x, start = c(year, 1), end = c(year, 24)))
  smooth.water2 <- FitDoubleLogBeck(x2, weighting = TRUE, hessian = FALSE,
                                    plot = TRUE, ninit = 10)
  title(main = year)
  c(year = year, smooth.water2$params)
})
t(res)
Output:
year mn mx sos rsp eos rau
[1,] 2017 -0.7709318 0.17234293 16.324163 -0.6133117 6.750885 -0.7618376
[2,] 2018 -0.8900971 0.09398673 7.529345 0.6701200 17.319465 0.8277409
[3,] 2019 -4.7669470 -0.34648434 15.930455 -0.2570877 10.690043 -0.2267284
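If you also want a single smoothed curve to compare against the whole series, you can collect the per-year fitted values and concatenate them. A rough sketch (it refits each year with plotting turned off and assumes, as in your own code, that the fit object exposes the fitted values in $predicted, with 24 values per year):

library(greenbrown)

# Collect the fitted values year by year (assumes $predicted holds 24 values per year)
fitted_years <- lapply(2017:2019, function(year) {
  x2 <- as.vector(window(x, start = c(year, 1), end = c(year, 24)))
  fit <- FitDoubleLogBeck(x2, weighting = TRUE, hessian = FALSE, plot = FALSE, ninit = 10)
  as.vector(fit$predicted)
})

# Rebuild one ts covering 2017-2019 at the original frequency and overlay it
smooth_all <- ts(unlist(fitted_years), start = c(2017, 1), frequency = 24)
plot(window(x, start = c(2017, 1), end = c(2019, 24)))
lines(smooth_all, col = "red")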
I'm trying to build a forecast to predict future values of a keyword from Google Trends data.
My data is the daily indexed search volume from Jan 1 to Jun 30, 2020 for a keyword, saved in a CSV file:
Date | Keyword
2020-01-01 | 55
2020-01-02 | 79
2020-01-03 | 29
...
2020-06-29 | 19
2020-06-30 | 32
My R code seems to work okay until it generates the forecasts.
library(forecast)
data <- read.csv("<file path>.csv", header=TRUE)
#build time series data
#start and end periods of observed data
inds <- seq(as.Date("2020-01-01"), as.Date("2020-06-30"), by = "day")
# frequency = 7: daily observations with a weekly seasonal cycle
sts <- ts(data$Keyword, start = c(2020, as.numeric(format(inds[1], "%j"))), frequency = 7)
#generate the forecast
model.ets <- ets(sts, model = "ANA")
fc.ets <- forecast(model.ets, h = 60)
plot(fc.ets)
The problem I'm having is that the forecast simply repeats the same pattern (doesn't seem to take into account the error, trend and/or seasonality to adjust the predictions).
I think I need to adjust the forecast() function, but I'm not sure how to do it.
In this case we have a daily series spanning less than a year that appears to display weekly seasonality. Please note, as explained here: https://otexts.com/fpp2/ts-objects.html [2.1 - ts objects], the frequency given to the ts object is 52.18, which is 365.25/7, the number of weeks in a year (taking leap years into account).

This seasonality rules out the use of ets models, which can't handle data with frequency greater than 24, unless they are used in combination with STL (Seasonal and Trend decomposition using Loess). As such, I would recommend exploring other models. The STL + ETS(A, Ad, N) point forecasts [2nd best model] look most realistic, but the range of the prediction intervals is much larger when compared against the TBATS(1, {0,0}, 0.92, {<52.18, 6>}) model [best model]. Please see and play around with the below:
library(forecast)
library(ggplot2)

ts_ausAirBnb <- ts(ausAirBnb$airbnb_australia_, start = min(ausAirBnb$day), frequency = 52.18)
plot(decompose(ts_ausAirBnb))

snaivefit <- snaive(ts_ausAirBnb)
snaivefcast <- forecast(snaivefit, h = 60)

aafit <- auto.arima(ts_ausAirBnb)
aafcast <- forecast(aafit, h = 60)

# stlf() already returns a forecast object (STL decomposition, then ETS on the seasonally adjusted series)
stlffcast <- stlf(ts_ausAirBnb, h = 60)

stlmfit <- stlm(ts_ausAirBnb)
stlmfcast <- forecast(stlmfit, h = 60)

tbatsfit <- tbats(ts_ausAirBnb)
tbatsfcast <- forecast(tbatsfit, h = 60)

nnetfit <- nnetar(ts_ausAirBnb)
nnetfcast <- forecast(nnetfit, h = 60)

autoplot(snaivefcast)
autoplot(aafcast)
autoplot(stlffcast)
autoplot(stlmfcast)
autoplot(tbatsfcast)
autoplot(nnetfcast)
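If you want to check the "best model" claim on your own data, one way (a sketch only; it assumes ts_ausAirBnb from above is long enough to spare a hold-out window, and the 60-observation split is an arbitrary choice) is to refit the leading candidates on a training window and compare their out-of-sample accuracy:

n     <- length(ts_ausAirBnb)
train <- window(ts_ausAirBnb, end = time(ts_ausAirBnb)[n - 60])
test  <- window(ts_ausAirBnb, start = time(ts_ausAirBnb)[n - 59])

# STL + ETS and TBATS refit on the training window only
fc_stlf  <- stlf(train, h = 60)
fc_tbats <- forecast(tbats(train), h = 60)

# The "Test set" rows show the out-of-sample error measures
accuracy(fc_stlf, test)
accuracy(fc_tbats, test)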
I have a time series with a weekly period (7 days) spanning two years: 58 values, starting 2017-08-05 and ending 2018-09-08. I need to work with this time series in R (make predictions with a SARIMA model, etc.), but I have a problem defining the period/frequency in R. When I use the decompose function I get the error "time series has no or less than 2 periods", and the Arima function does not work properly. Detailed information is below. How can I import my data so that it has the frequency I need?
My data (short example):
File: sessions2.csv
date count
11.11.2017 55053
18.11.2017 45256
25.11.2017 59091
2.12.2017 50030
9.12.2017 41769
16.12.2017 63042
23.12.2017 51838
30.12.2017 47652
6.1.2018 18731
13.1.2018 54470
20.1.2018 22514
27.1.2018 63818
3.2.2018 51605
10.2.2018 26312
17.2.2018 11111
data1.csv contains only values. For example:
53053
45256
59091
50045
41769
65042
51838
I tried in R:
sessions1 <- scan("data1.csv")
sessionsTS <- ts(sessions1, frequency=52, start=decimal_date(ymd("2017-11-11")))
Output sessionsTS and errors:
> sessionsTS
Time Series:
Start = 2017.59178082192
End = 2018.68418328598
Frequency = 52
What time format do these numbers (Start, End) represent, please? And how can I convert between dates and this decimal format?
> sessionsComponents <- decompose(sessionsTS)
Error in decompose(sessionsTS) :
time series has no or less than 2 periods
> arima(sessionsTS, order = c(0, 1, 0), seasonal = list(order = c(2, 0, 0), period = 52), xreg = NULL, include.mean = TRUE)
Error in optim(init[mask], armaCSS, method = optim.method, hessian = FALSE, :
initial value in 'vmmin' is not finite
> fit <- Arima(sessionsTS, order = c(0, 1, 0), seasonal = list(order = c(2, 0, 0), period = 52))
Error in optim(init[mask], armaCSS, method = optim.method, hessian = FALSE, :
initial value in 'vmmin' is not finite
> sarima(sessionsTS,1,1,0,2,0,0,52)
Error in sarima(sessionsTS, 1, 1, 0, 2, 0, 0, 52) :
unused arguments (0, 0, 52)
Next I tried:
dataSeries <- read.table("sessions2.csv", header=TRUE, sep = ";", row.names=1)
dataTS <- as.xts(dataSeries , frequency=52, start=decimal_date(ymd("2017-11-11")))
> sessionsComponents2 <- decompose(dataTS)
Error in decompose(dataTS) : time series has no or less than 2 periods
> model = Arima(dataTS, order=c(0,1,0), seasonal = c(2,0,0))
> model
Series: dataTS
ARIMA(0,1,0)
In this case Arima is used without seasonality...
Many thanks for help.
Your data is sampled weekly, so if the period is also one week you need to set frequency=1, but at that point there is no point in doing seasonal modeling. It makes sense to have a yearly period, as you have done by setting frequency=52, but then you don't have enough periods to do any estimation: you'd need at least 104 observations (at least two full periods, as the error message explains).
So in short, you can't do what you want unless you get more data.
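To see the two-period requirement concretely, here is a minimal illustration with simulated weekly data (the values are random; only the series lengths matter):

set.seed(1)
# 58 weekly observations at frequency 52: less than two full years
short <- ts(rnorm(58), frequency = 52, start = c(2017, 45))
# decompose(short)   # errors: "time series has no or less than 2 periods"

# 110 weekly observations: just over two full periods, so decompose() works
long <- ts(rnorm(110), frequency = 52, start = c(2017, 45))
plot(decompose(long))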
A partial answer to your questions about ts() and the time format. If you do it like this:
tt <- read.table(text="
date count
11.11.2017 55053
18.11.2017 45256
25.11.2017 59091
2.12.2017 50030
9.12.2017 41769
16.12.2017 63042
23.12.2017 51838
30.12.2017 47652
6.1.2018 18731
13.1.2018 54470
20.1.2018 22514
27.1.2018 63818
3.2.2018 51605
10.2.2018 26312
17.2.2018 11111", header=TRUE)
tt$date <- as.Date(tt$date, format="%d.%m.%Y")
ts(tt$count, frequency=52, start=c(2017, 45))
# Time Series:
# Start = c(2017, 45)
# End = c(2018, 7)
# Frequency = 52
# [1] 55053 45256 59091 50030 41769 63042 51838 47652 18731
# 54470 22514 63818 51605 26312 11111
The start is at the 45th week of 2017, and the end is at the 7th week of 2018.
You can find the week numbers using format(tt$date, "%W"). Look at ?strptime for more details and to see what %W means.
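As for the numbers printed as Start = 2017.59178... and End = 2018.68418...: those are decimal years (the year plus the elapsed fraction of the year), which is exactly what decimal_date() produced. A small sketch of converting back and forth, and of deriving the week-based start from the first date instead of hard-coding it:

library(lubridate)

# Date -> decimal year (what you passed as start=)
decimal_date(ymd("2017-08-05"))      # roughly 2017.592

# Decimal year -> date
date_decimal(2017.59178082192)       # roughly "2017-08-05"

# Week-based start for ts(), derived from the first date
first_date <- as.Date("2017-08-05")
c(as.numeric(format(first_date, "%Y")), as.numeric(format(first_date, "%W")))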
Hi, I accessed the dataset from the UCI repository http://archive.ics.uci.edu/ml/datasets/Air+Quality
I am trying to predict the temperature for the next 24 hours. Below is the code I have written.
Mark the missing values (coded as -200) as NA:
library(plyr)
AirQualityUCI[AirQualityUCI==-200.0]<-NA
Replace the NAs with the mean of each column:
for(i in 1:ncol(AirQualityUCI)){
AirQualityUCI[is.na(AirQualityUCI[,i]),i] <- mean(AirQualityUCI[,i], na.rm = TRUE)
}
Plot the time series:
plot(AirQualityUCI$T, type = "l")
How do I set the frequency in hours and predict the temperature for the next 24 hours?
Tempts <- ts(AirQualityUCI)
Temprforecasts <- HoltWinters(Tempts, beta=FALSE, gamma=FALSE)
library(forecast)
accuracy(Temprforecasts,24)
I get the error below:
Error in attr(x, "tsp") <- value :
invalid time series parameters specified
library(readxl)
AirQualityUCI <- read_excel("AirQualityUCI.xlsx")
library(plyr)
AirQualityUCI[AirQualityUCI==-200.0]<-NA
#First, limit to the one column you are interested in (make sure data is sorted by time variable before doing this)
library(data.table)
temp <- setDT(AirQualityUCI)[,c("T")]
#Replace NA with mean
temp$T <- ifelse(is.na(temp$T), mean(temp$T, na.rm=TRUE), temp$T)
#Create time series object...in this case freq = 365 * 24 (hours in year)
Tempts <- ts(temp, frequency = 365*24)
#Model
Temprforecasts <- HoltWinters(Tempts, beta = FALSE, gamma = FALSE)
#Generate next 24 hours forecast
library(forecast)
output.forecast <- forecast.HoltWinters(Temprforecasts, h = 24)
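To inspect the result you can plot the forecast object and look at the point forecasts; a short usage sketch building on the objects above:

# Plot the 24-hour-ahead forecast and extract the point predictions
plot(output.forecast)
output.forecast$mean

# In-sample accuracy measures for the fitted Holt-Winters model
accuracy(output.forecast)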
I just started with R and time series forecasting.
I am forecasting one variable (consumption) with one exogenous variable (income). This is quarterly data.
When I run the model with this R code,
train_exp <- exp_trial[,1][1:150]
train_inc <- exp_trial[,2][1:150]
model_train_exp <- arima(train_exp,order = c(0,2,6),seasonal = list(order=c(0,1,1),period = 4), xreg = train_inc)
This model fits without errors, but when I forecast it I get the error: 'xreg' and 'newxreg' have different numbers of columns
forcasted_arima <- forecast.Arima(model_train_exp, h=14)
There are so many arguments for forecast.Arima, and I am not familiar with them.
Can someone please tell me what should be the code for it?
The model was built using train_inc, so it needs future train_inc values in order to finish the prediction. Think of it this way: you built a model of the form train_exp_t = b1 + b2*train_exp_(t-1) + b3*train_inc_t. With that model in hand, if someone provides a value for train_exp_(t-1) (yesterday's consumption) and one for train_inc_t (today's income), the model will return train_exp_t (today's consumption). You need to provide it with some train_inc values to get predictions out.
Example
train_exp = rnorm(20)
train_inc = 1 + rnorm(20)
fit <- arima(train_exp, xreg=train_inc)
predict(fit, h=14)
# Error in predict.Arima(fit, h = 14) :
# 'xreg' and 'newxreg' have different numbers of columns
We get the same error that you got. But when we supply new values for train_inc it works!
new_train_inc <- rnorm(14)
predict(fit, newxreg=new_train_inc)
# $pred
# Time Series:
# Start = 21
# End = 34
# Frequency = 1
# [1] -0.2444872 -0.1583624 -0.2042488 -0.2143231 -0.1992276 -0.2047153 -0.2431517 -0.1887002 -0.2480745 -0.2118920
# [11] -0.1281492 -0.2067001 -0.2202669 -0.2166019
#
# $se
# Time Series:
# Start = 21
# End = 21
# Frequency = 1
# [1] 1.153433
If it still doesn't make sense, remember that you are predicting train_exp, not train_inc.
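The same idea carries over to the forecast package interface you started with: pass the future income values through xreg. A hedged sketch (future_inc is a hypothetical placeholder for the next 14 quarterly income values, which you must supply yourself or forecast separately):

library(forecast)

# Placeholder for the next 14 quarters of income; the model cannot invent these
future_inc <- rep(mean(train_inc), 14)

# Equivalent to predict(..., newxreg = ...), but returns a forecast object you can plot
forcasted_arima <- forecast(model_train_exp, h = 14, xreg = future_inc)
plot(forcasted_arima)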
If you would like a more formal discussion, see here at Cross Validated.