I am working with time series 551 of the monthly data of the M3 competition.
My data is:
library(forecast)
library(Mcomp)
# Time Series
# Subset the M3 data to contain the relevant series
ts.data<- subset(M3, 12)[[551]]
print(ts.data)
I want to implement time series cross-validation for the last 18 observations of the in-sample interval.
Some people would normally call this “forecast evaluation with a rolling origin” or something similar.
How can I achieve that? What does the in-sample interval mean? Which time series must I evaluate?
I'm quite confused; any help to shed some light on this would be welcome.
The tsCV function of the forecast package is a good place to start.
From its documentation,
tsCV(y, forecastfunction, h = 1, window = NULL, xreg = NULL, initial = 0, ...)
Let ‘y’ contain the time series y[1:T]. Then ‘forecastfunction’ is
applied successively to the time series y[1:t], for t=1,...,T-h,
making predictions f[t+h]. The errors are given by e[t+h] =
y[t+h]-f[t+h].
That is, tsCV first fits a model to y[1] and forecasts y[1 + h], then fits a model to y[1:2] and forecasts y[2 + h], and so on for T-h steps.
The tsCV function returns the forecast errors.
Applying this to the training data of the ts.data
# function to fit a model and forecast
fmodel <- function(x, h){
  forecast(Arima(x, order=c(1,1,1), seasonal = c(0, 0, 2)), h=h)
}
# time-series CV
cv_errs <- tsCV(ts.data$x, fmodel, h = 1)
# RMSE of the time-series CV
sqrt(mean(cv_errs^2, na.rm=TRUE))
# [1] 778.7898
In your case, it may be that you are supposed to
fit a model to ts.data$x and forecast ts.data$xx[1],
then fit a model to c(ts.data$x, ts.data$xx[1]) and forecast ts.data$xx[2],
and so on.
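To restrict the evaluation to the last 18 observations of the in-sample series, the rolling origin can also be written as an explicit loop. Below is a minimal sketch in base R. It assumes the built-in AirPassengers series as a stand-in for ts.data$x and uses a seasonal ARIMA chosen only for illustration; n_eval, origin and errs are names made up for this example.

```r
# Stand-in for ts.data$x (assumption: any monthly ts is handled the same way)
y <- AirPassengers
n <- length(y)
n_eval <- 18                      # evaluate on the last 18 in-sample observations
errs <- rep(NA_real_, n_eval)

for (i in seq_len(n_eval)) {
  origin <- n - n_eval + i - 1    # index of the last observation used for fitting
  train <- ts(y[seq_len(origin)], start = start(y), frequency = frequency(y))
  # Illustrative model only; swap in whatever model you want to evaluate
  fit <- arima(train, order = c(0, 1, 1),
               seasonal = list(order = c(0, 1, 1), period = 12))
  fc <- predict(fit, n.ahead = 1)$pred
  errs[i] <- y[origin + 1] - fc[1]   # one-step-ahead forecast error
}

# RMSE over the 18 rolling-origin forecasts
rmse <- sqrt(mean(errs^2))
rmse
```

Each pass refits the model on everything up to the origin and forecasts the next point, so the 18 errors are all genuine one-step-ahead errors. The same loop extends to longer horizons by raising n.ahead and stopping the origin earlier.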
This is my first question on stack overflow.
Situation: I have 2 time series. Both series have the same values, but the second series has 5 NAs at the start. Hence, the first series has 105 observations, whereas the second has 110. I fitted an ARIMA(0,1,0) to each series separately using the Arima function, and then used the forecast package to predict 10 steps into the future.
Issue: Even though the ARIMA coefficients for both series are the same, the 10-step projections differ. I am uncertain why this is the case. Has anyone come across this before? Any guidance is highly appreciated.
Tried: I tried setting a seed, creating the index manually, and using auto.arima for the model fitting. However, none of these steps helped me reconcile the difference.
I have added a picture to show what I see. Please note that I have hidden the middle part of the series so that you can see its start and end. The yellow highlighted cells are the projection outputs from the forecast package. I manually added the index (years) after extracting the results from R.
[Image: time series, projected and base, in Excel]
Rates <- read.csv("Rates_for_ARIMA.csv")
set.seed(123)
#ARIMA with NA
Simple_Arima <- Arima(
ts(Rates$Rates1),
order = c(0,1,0),
include.drift = TRUE)
fcasted_Arima <- forecast(Simple_Arima, h = 10)
fcasted_Arima$mean
#ARIMA Without NA
Rates2 <- as.data.frame(Rates$Rates2)
##Remove the final spaces from the CSV
Rates2 <- Rates2[-c(106,107,108,109,110),]
Simple_Arima2 <- Arima(
ts(Rates2),
order = c(0,1,0),
include.drift = TRUE)
fcasted_Arima2 <- forecast(Simple_Arima2, h = 10)
fcasted_Arima2$mean
The link to data is here, CSV format
Could you share your data and code such that others can see if there is any issue with it?
I tried to come up with an example and got the same results for both series, one that includes NAs and one that doesn't.
library(forecast)
library(xts)
set.seed(123)
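# the model must be passed as list(order = ...); n counts values before un-differencing,
# so n = 104 gives an integrated series of 105 observations
ts1 <- arima.sim(model = list(order = c(0, 1, 0)), n = 104)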
ts2 <- ts(c(rep(NA, 5), ts1), start = 1)
fit1 <- forecast::Arima(ts1, order = c(0, 1, 0))
fit2 <- forecast::Arima(ts2, order = c(0, 1, 0))
pred1 <- forecast::forecast(fit1, 10)
pred2 <- forecast::forecast(fit2, 10)
forecast::autoplot(pred1)
forecast::autoplot(pred2)
> all.equal(as.numeric(pred1$mean), as.numeric(pred2$mean))
[1] TRUE
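For the model in the question, ARIMA(0,1,0) with drift, the point forecast h steps ahead is the last observed value plus h times the estimated drift, and the drift estimate is essentially the mean of the first differences. The sketch below reproduces that by hand in base R on simulated data (not the asker's CSV) to illustrate why leading NAs alone should not change the forecasts. rw_drift_forecast is a hypothetical helper written for this example, and it assumes the NAs occur only at the start of the series.

```r
set.seed(123)
x <- cumsum(rnorm(105, mean = 0.5))   # simulated random walk with drift (made-up data)
x_na <- c(rep(NA, 5), x)              # same values preceded by 5 NAs

# Hand-rolled random-walk-with-drift forecast (hypothetical helper)
rw_drift_forecast <- function(y, h) {
  y <- y[!is.na(y)]                   # assumes NAs occur only at the start
  drift <- mean(diff(y))              # average step size
  y[length(y)] + drift * seq_len(h)   # last value plus h times the drift
}

f1 <- rw_drift_forecast(x, 10)
f2 <- rw_drift_forecast(x_na, 10)
same <- isTRUE(all.equal(f1, f2))
same
```

Both forecasts depend only on the last non-missing value and the observed differences. If two fits give different projections, it is worth checking that both series really end on the same observation, for example that no blank rows from the CSV remain at the end of either one.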
I am using the stats::stl function for the first time in order to identify and remove the technological signal from a crop yield series. I am not familiar with this method and I am new to programming, so I apologize in advance for any mistakes.
These are the original data I am working with:
dat <- data.frame(year= seq(1962,2014,1),yields=c(1100,1040,1130,1174,1250,1350,1450,1226,1070,1474,1526,1719,1849,1766,1342,2000,1750,1750,2270,1550,1220,2400,2750,3200,2125,3125,3737,2297,3665,2859,3574,4519,3616,3247,3624,2964,4326,4321,4219,2818,4052,3770,4170,2854,3598,4767,4657,3564,4340,4573,3834,4700,4168))
This is the ts with frequency =1 (annual) created as input for STL function:
time.series <- ts(data=dat$yields, frequency = 1, start=c(1962, 1), end=c(2014, 1))
plot(time.series, xlab="Years", ylab="Kg/ha", main="Crop yields")
When I try to run the function I get the following error message:
decomposed <- stl(time.series, s.window='periodic')
> Error in stl(time.series, s.window = "periodic") : series is not periodic or has less than two periods
I know that my series is annual, so I cannot really change the frequency of the ts, but that seems to be what causes the error: when I change the frequency I do get the seasonal, trend and remainder signals:
time.series <- ts(data=dat$yields, frequency = 12, start=c(1962, 1), end=c(2014, 1))
decomposed <- stl(time.series, s.window='periodic')
plot(decomposed)
I would like to know if there is a way to apply the stl function to annual data, i.e. with one observation per unit of time (frequency = 1).
Also, to remove the technological signal, is it only necessary to remove the trend and remainder signals from the original series, or am I mistaken?
Many thanks for your help.
Since you're using annual data, there is no seasonal component, so a seasonal decomposition of the time series is not appropriate. However, stats::stl estimates the trend with loess smoothing, a local polynomial regression that you can adjust to your liking. You can call loess directly and estimate your own trend as follows.
dat <- data.frame(year= seq(1962,2014,1),yields=c(1100,1040,1130,1174,1250,1350,1450,1226,1070,1474,1526,1719,1849,1766,1342,2000,1750,1750,2270,1550,1220,2400,2750,3200,2125,3125,3737,2297,3665,2859,3574,4519,3616,3247,3624,2964,4326,4321,4219,2818,4052,3770,4170,2854,3598,4767,4657,3564,4340,4573,3834,4700,4168))
dat$trend <- loess(yields ~ year, data = dat)$fitted
plot(y = dat$yields, x = dat$year, type = "l", xlab="Years", ylab="Kg/ha", main="Crop yields")
lines(y = dat$trend, x = dat$year, col = "blue", type = "l")
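Once the trend is estimated, removing it is a subtraction (or a division, if the variability grows with the level). The sketch below continues from the same data. The span value is just loess's default written out, and the column names detrended and relative are made up for this example. Whether the additive or the ratio form is appropriate for the technological signal is a modelling assumption, not something the data decides for you.

```r
dat <- data.frame(
  year = 1962:2014,
  yields = c(1100, 1040, 1130, 1174, 1250, 1350, 1450, 1226, 1070, 1474,
             1526, 1719, 1849, 1766, 1342, 2000, 1750, 1750, 2270, 1550,
             1220, 2400, 2750, 3200, 2125, 3125, 3737, 2297, 3665, 2859,
             3574, 4519, 3616, 3247, 3624, 2964, 4326, 4321, 4219, 2818,
             4052, 3770, 4170, 2854, 3598, 4767, 4657, 3564, 4340, 4573,
             3834, 4700, 4168)
)

# Smooth trend; a smaller span follows the data more closely
fit <- loess(yields ~ year, data = dat, span = 0.75)
dat$trend <- fitted(fit)

# Additive removal: yield anomalies in kg/ha around the trend
dat$detrended <- dat$yields - dat$trend

# Ratio alternative: yields relative to the trend (1 = on trend)
dat$relative <- dat$yields / dat$trend

head(dat)
```

With annual data there is no seasonal part, so what is left after removing the trend plays the role of the remainder in an stl decomposition.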
I have 3 time series and I want to predict future values for each of them.
I am using the vars package in R.
So this is the approach:
Decompose each (multiplicative) time series and extract the trend, seasonal, and random parts.
time_series1_components = decompose(time_series1,type="mult")
Do this for all the time series.
Apply the VAR Model on the random parts and predict the future values:
random_part1 = time_series1_components$random
random_part2 = time_series2_components$random
random_part3 = time_series3_components$random
merged_df = ts.union(random_part1, random_part2,random_part3, dframe = TRUE)
merged_mat <- data.matrix(merged_df)
merged_mat = na.exclude(merged_mat)
checklag = VARselect(merged_mat)
EstimateModel=VAR(merged_mat, p = 2, type = "const", season = NULL, exogen = NULL)
summary(EstimateModel)
roots(EstimateModel)
predict(EstimateModel)
Now I need to combine the predicted values of the random part with the trend and seasonality, and plot a graph showing the past values and the predicted values (highlighted separately).
How can I achieve this?
Any pointers will be helpful.
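One possible way to recombine the parts is sketched below in base R for a single series, under several loudly stated assumptions: AirPassengers stands in for time_series1, the horizon h is arbitrary, the trend is extended with a straight line (decompose does not extrapolate it), and rand_future is a placeholder for the VAR forecast of the random part. None of this is the only reasonable choice.

```r
y <- AirPassengers                 # stand-in for time_series1 (assumption)
f <- frequency(y)
n <- length(y)
h <- 12                            # forecast horizon (hypothetical)
comp <- decompose(y, type = "multiplicative")

# Seasonal factors repeat with period f, so continue the pattern by position
seas_future <- comp$seasonal[(n + seq_len(h) - 1) %% f + 1]

# decompose() leaves NAs at both ends of the trend and does not extrapolate it.
# Assumption: extend it with a straight line fitted to the last 24 known values.
known <- tail(which(!is.na(comp$trend)), 24)
trend_df <- data.frame(pos = known, trend = as.numeric(comp$trend)[known])
trend_fit <- lm(trend ~ pos, data = trend_df)
trend_future <- predict(trend_fit, newdata = data.frame(pos = n + seq_len(h)))

# Placeholder for the VAR forecast of the random part. With the vars package this
# would be something like predict(EstimateModel, n.ahead = h)$fcst[[1]][, "fcst"].
rand_future <- rep(1, h)

# Multiplicative recombination: trend * seasonal * random
fc <- ts(trend_future * seas_future * rand_future,
         start = tsp(y)[2] + 1 / f, frequency = f)

# Past values (solid black) and predicted values (dashed red)
ts.plot(y, fc, col = c("black", "red"), lty = c(1, 2))
```

For an additive decomposition the last step would be a sum instead of a product. The same steps would be repeated for each of the three series, taking the matching element of the VAR forecast each time.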
I have fit the model below to my time series data. The xreg consists of a time vector that goes from 1 through 1000 and of 12 indicator variables (1 or 0) that represent the month. The data that I'm dealing with has some strong weekly and monthly seasonal patterns.
fit <- arima(x, order = c(3, 0, 0),
             seasonal = list(order = c(1, 0, 1), period = 7),
             xreg = cbind(t, M1, M2, M3, M4, M5,
                          M6, M7, M8, M9, M10, M11, M12),
             include.mean = FALSE,
             transform.pars = TRUE,
             fixed = NULL, init = NULL,
             method = c("CSS-ML", "ML", "CSS"),
             optim.method = "BFGS",
             optim.control = list(), kappa = 1e6)
At this time I'm trying to figure out how I can predict 14 values for the month of January (M1=1).
So when I use the predict function in R, I think I need to specify in the newxreg portion that I want M1=1 and M2,...,M12=0 for my prediction - correct?
I've played around with the code, but I couldn't get it to work and I was not able to find very detailed information about the newxreg portion of the predict formula online.
Can anyone explain to me how I can get predictions for one particular month, say January?
And how do I need to note that in the newxreg part of the predict function?
Many thanks in advance!
I have finally found a way out and wanted to post it - in case it helps someone else.
So basically, newxreg should be a matrix that contains values of the regressors that you want predictions for.
So in my case, my regressors were all 1 or 0 (coded variables) to specify a particular month.
So what I did is I created a matrix of 0's and 1's to be used as my newxreg.
I defined a matrix mx, and then in the predict function I set newxreg=mx. I made sure that the number of rows of mx >= n.ahead, so that there is a row of regressor values for every forecast step.
pred <- predict(fit,n.ahead=n, newxreg=mx)
Hope this is helpful for others as well!
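For readers who want to see the shape of such a matrix, here is a minimal, self-contained sketch in base R. Everything in it is an assumption made for illustration: the data are simulated, the model is a plain AR(1) rather than the seasonal model from the question, and the daily series is constructed to end on 31 December so that the next 14 days fall in January. The points it illustrates are that newxreg needs the same columns, in the same order, as the xreg used for fitting (here the time index plus the 12 indicators), and one row per forecast step.

```r
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 731)  # ends 2021-12-31
n <- length(dates)
t <- seq_len(n)                               # time index, as in the question
month <- as.integer(format(dates, "%m"))

# 12 monthly indicator columns M1..M12
M <- sapply(1:12, function(m) as.numeric(month == m))
colnames(M) <- paste0("M", 1:12)
xreg <- cbind(t = t, M)

# Simulated response: linear trend + January effect + AR(1) noise
x <- 0.01 * t + 2 * M[, "M1"] + as.numeric(arima.sim(list(ar = 0.5), n = n))

fit <- arima(x, order = c(1, 0, 0), xreg = xreg, include.mean = FALSE)

# newxreg: one row per forecast step, same columns and order as xreg
h <- 14
mx <- cbind(t = n + seq_len(h),
            matrix(0, nrow = h, ncol = 12, dimnames = list(NULL, colnames(M))))
mx[, "M1"] <- 1                               # all 14 forecast days are in January

pred <- predict(fit, n.ahead = h, newxreg = mx)
pred$pred
```

Note that the time index in mx continues from where the training data stopped. If the forecast window crossed a month boundary, the indicator columns would change row by row instead of being constant.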