Auto.arima is not showing any order - r

I am trying to fit an ARIMA model using the auto.arima function in R. The result shows order (0,0,0) even though the data are non-stationary.
auto.arima(x,approximation=TRUE)
ARIMA(0,0,0) with non-zero mean
Can someone advise why I get this result? By the way, I am running this function on only 10 data points.

10 data points is a very low number of observations for estimating an ARIMA model. I doubt that you can make any sensible estimation based on this. Moreover, the estimated model may depend strongly on the part of a time series you looked at and adding only very few observations can change the characteristics of the estimated model significantly. For example:
When I take a time series with only 10 observations, I also get an ARIMA(0,0,0) model:
library(forecast)
vec1 <- ts(c(10.26063, 10.60462, 10.37365, 11.03608, 11.19136, 11.13591, 10.84063, 10.66458, 11.06324, 10.75535), frequency = 12)
fit1 <- auto.arima(vec1)
summary(fit1)
However, if I use about 30 observations, an ARIMA(1,0,0) model is estimated:
vec2 <- ts(c(10.260626, 10.604616, 10.373652, 11.036079, 11.191359, 11.135914, 10.840628, 10.664575, 11.063239, 10.755350,
10.158032, 10.653669, 10.659231, 10.483478, 10.739133, 10.400146, 10.205993, 10.827950, 11.018257, 11.633930,
11.287756, 11.202727, 11.244572, 11.452180, 11.199706, 10.970823, 10.386131, 10.184201, 10.209338, 9.544736), frequency = 12)
fit2 <- auto.arima(vec2)
summary(fit2)
If I use the whole time series (413 observations), the auto.arima function estimates an "ARIMA(2,1,4)(0,0,1)[12] with drift" model.
Thus, I would think that 10 observations are indeed not enough information for fitting a model.
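One rough way to see why so few observations carry so little information is to look at the approximate 95% white-noise bounds for a sample autocorrelation, which are about ±1.96/sqrt(n). This is only an illustrative sketch in base R; it is not how auto.arima selects its orders (that relies on unit-root tests and information criteria), but it shows how large an autocorrelation must be before it can be told apart from noise at each sample size used above.

```r
# Approximate 95% white-noise bounds for a sample autocorrelation,
# for the three sample sizes discussed above (10, 30 and 413 observations).
n <- c(10, 30, 413)
band <- qnorm(0.975) / sqrt(n)
round(band, 3)  # 0.620 0.358 0.096
```

With 10 observations, a sample autocorrelation has to exceed roughly 0.62 in absolute value to stand out from white noise, so a model without AR or MA terms is hard to reject.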

Related

How to lower model sensitivity for the starting value of a weighting function for a MIDAS regression in R

I'm using the midasr package (function midas_r) and I'm wondering whether it is possible to make a MIDAS model less sensitive to the starting values of its weighting function, so that I can minimize my error metric.
I ran a simulation with different starting values and observed that the forecasting results are quite sensitive to the initial values. There is around a 30% difference between the minimum and maximum Root Mean Square Forecast Error (RMSFE) across the simulation.
I simulated the starting value distribution as follows:
df <- setNames(data.frame(matrix(ncol = 2, nrow = n_simulation)), c('Starting_value', 'RMSFE'))
for (i in 1:n_simulation) {
  randomvalue_1 <- runif(1, -5.0, 5.0)
  randomvalue_2 <- runif(1, -5.0, 5.0)
  randomvalue_3 <- runif(1, -5.0, 5.0)
  random_vecteur <- c(randomvalue_1, randomvalue_2)
  mod1 <- midas_r(target_data ~ mls(daily_data, 1:2, 25, nealmon) + mls(target_data, 1:1, 1),
                  start = list(daily_data = random_vecteur), Ofunction = 'optim', method = 'Nelder-Mead')
  ## Calculate average forecasts
  avgf <- average_forecast(list(mod1),
                           data = list(daily_data = daily_data, target_data = target_data),
                           insample = 1:132, outsample = 133:180,
                           type = "rolling",
                           measures = c("MSE", "MAPE", "MASE"),
                           fweights = c("EW", "BICW", "MSFE", "DMSFE"))
  df$Starting_value[i] <- paste('(', paste(toString(random_vecteur), ')'))
  df$RMSFE[i] <- sqrt(avgf$accuracy$individual$MSE.out.of.sample[1])
}
Is there something I can do to lower the model sensitivity, or am I doing something wrong?
I tried to use the update function, update(Ofunction='nls'), as suggested in "Mixed Frequency Data Sampling Regression Models: The R Package midasr" (2016), but I still observe the sensitivity.
I'm willing to share my data if needed.
Thank you!

Auto.Arima fits well except for a single spike

I'm an engineering grad student and, as a small part of my thesis, I'm trying to analyze some groundwater data in R using the auto.arima function. The fitted values match my data well except for one spike, and I cannot figure out for the life of me why they go off the rails there. There are no oddities or missing values in the data. The data is the elevation of the groundwater, with one recorded point per day.
My raw unfitted data looks like this:
#load tseries and forecast libraries (auto.arima comes from forecast)
library(tseries)
library(forecast)
# RESERVOIR ONLY ANALYSIS#
#Daily Piezometric Data from PS13-01
PS1301 = read.csv("PS13-01.csv",TRUE,",")
#impute missing data from data set
PS1301 = imputeTS::na_interpolation(PS1301)
#Create Time Series
PS1301 = ts(PS1301[,2],frequency = (365),start = c(2013,116))
plot(PS1301, xlab='Time', ylab = 'Piezometric Head')
And then after running Auto.Arima it fits this:
#Auto Arima of only piezometers
#PS1301
AAPS1301 = auto.arima(PS1301)
AAPS1301
summary(AAPS1301)
## Series: PS1301
## ARIMA(2,1,0)(0,1,0)[365]
##
## Coefficients:
## ar1 ar2
## 0.3362 0.5722
## s.e. 0.0643 0.0625
##
## sigma^2 estimated as 0.02372: log likelihood=2779.3
## AIC=-5552.61 AICc=-5552.59 BIC=-5536.39
plot(PS1301,col="red")
lines(fitted(AAPS1301),col="blue")
Any help would be appreciated; I'm pretty unsure what to do from here. I feel like this has to be an error because of how good the fit is (visually) for the rest of the time series. I'm also more than happy to provide the raw data, but I am not sure how to put it in this post other than as a Dropbox link: https://www.dropbox.com/sh/563nu3daeid0agb/AAB6NSddVUKgBCCbQtuqXPsZa?dl=0
The problem here is that the seasonal period is very long (365) and R is trying to fit a diffuse prior to the corresponding state space model -- which becomes increasingly difficult with very long periods. There appears to be some numerical instability as a result, giving inaccurate fitted values at the 366th and 367th observations.
I am not convinced that using a seasonal ARIMA with such a long period makes any sense, but if you want to do it, use the CSS estimation method instead of full likelihood:
fit_css <- auto.arima(PS1301, method='CSS')
It is also much faster.
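As a rough illustration of the CSS route, here is a minimal base-R sketch on simulated daily data (the series x, its length and its parameters are made up for this example; they are not the piezometer data). It fits the same ARIMA(2,1,0)(0,1,0)[365] structure with stats::arima and method = "CSS", which is the estimation method that auto.arima passes through when called as above.

```r
# Simulated stand-in for a daily series with an annual cycle (hypothetical data).
set.seed(1)
n <- 365 * 4
seasonal <- 2 * sin(2 * pi * seq_len(n) / 365)
walk <- cumsum(as.numeric(arima.sim(list(ar = c(0.3, 0.2)), n = n, sd = 0.1)))
x <- ts(100 + seasonal + walk, frequency = 365)

# Conditional sum of squares avoids the exact-likelihood state space
# initialisation that becomes fragile with a seasonal period of 365.
fit_css <- arima(x, order = c(2, 1, 0),
                 seasonal = list(order = c(0, 1, 0), period = 365),
                 method = "CSS")
res <- residuals(fit_css)
fit_css
```

Note that CSS conditions on the first observations, so the earliest residuals are not informative; the point of the sketch is only that the fit completes quickly and without the state space initialisation step.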

Auto.arima() function does not result in white noise. How else should I go about modeling data

Here is the plot of the initial data (after performing a log transformation).
It is evident that there is both a linear trend and a seasonal pattern. I can address both of these by taking the first and twelfth (seasonal) difference: diff(diff(data), 12). After doing so, here is the plot of the resulting data.
This data does not look great. While the mean is constant, we see a funneling effect as time progresses. Here are the ACF/PACF.
Any suggestions for possible fits to try? I used the auto.arima() function, which suggested an ARIMA(2,0,2)xARIMA(1,0,2)(12) model. However, once I took the residuals from the fit, it was clear there was still some sort of structure in them. Here is the plot of the residuals from the fit as well as the ACF/PACF of the residuals.
There does not appear to be a seasonal pattern regarding which lags have spikes in the ACF/PACF of residuals. However, this is still something not captured by the previous steps. What do you suggest I do? How could I go about building a better model that has better model diagnostics (which at this point is just a better looking ACF and PACF)?
Here is my simplified code thus far:
library(TSA)
library(forecast)
beer <- read.csv('beer.csv', header = TRUE)
beer <- ts(beer$Production, start = c(1956, 1), frequency = 12)
# transform data
boxcox <- BoxCox.ar(beer) # 0 in confidence interval
beer.log <- log(beer)
firstDifference <- diff(diff(beer.log), 12) # get rid of linear and
# seasonal trend
acf(firstDifference)
pacf(firstDifference)
eacf(firstDifference)
plot(armasubsets(firstDifference, nar=12, nma=12))
# fitting the model
auto.arima(firstDifference, ic = 'bic') # from the forecast package
modelFit <- arima(firstDifference, order = c(1, 0, 0),
                  seasonal = list(order = c(2, 0, 0), period = 12))
# assessing model
resid <- modelFit$residuals
acf(resid, lag.max = 15)
pacf(resid, lag.max = 15)
Here is the data, if interested (I think you can use an html to csv converter if you would like): https://docs.google.com/spreadsheets/d/1S8BbNBdQFpQAiCA4J18bf7PITb8kfThorMENW-FRvW4/pubhtml
Jane,
There are a few things going on here.
Instead of logs, we used the Tsay variance test, which shows that the variance increased after period 118. Weighted least squares deals with that.
March becomes higher beginning at period 111. An alternative to an AR(12) term or seasonal differencing is to identify seasonal dummies. We found that 7 of the 12 months were unusual, along with a couple of level shifts, an AR(2) structure and 2 outliers.
Here is the fit and forecasts.
Here are the residuals.
ACF of residuals
Note: I am a developer of the software Autobox. All models are wrong. Some are useful.
Here is Tsay's paper
http://onlinelibrary.wiley.com/doi/10.1002/for.3980070102/abstract
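The seasonal-dummy idea mentioned above can be sketched in base R as a regression with AR(2) errors. This is only an assumption-laden illustration, not the Autobox procedure: it uses the built-in AirPassengers series on the log scale as a stand-in for the beer data, includes all eleven monthly dummies rather than a selected subset, and handles neither level shifts, outliers nor the variance change.

```r
# Stand-in monthly series (built-in data, not the beer series from the question).
y <- log(AirPassengers)

# Eleven monthly dummies (January is the baseline) plus a linear trend in years.
month <- factor(cycle(y))
dummies <- model.matrix(~ month)[, -1]
xreg <- cbind(trend = seq_along(y) / 12, dummies)

# Regression on trend and seasonal dummies with AR(2) errors, no differencing.
fit <- arima(y, order = c(2, 0, 0), xreg = xreg, method = "ML")

# Ljung-Box test on the residuals; fitdf = 2 accounts for the two AR terms.
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 2)
lb
```

A small p-value from the Ljung-Box test would indicate that structure remains in the residuals, which is the same check the question performs visually with the ACF and PACF.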

Forecasting using support vector regression in R

I want to forecast future energy consumption using support vector regression in R. I have this code but I'm not sure whether it is correct or not.
#gathering the data
data<-read.csv("C:\\2003_smd_hourly.csv",header=TRUE) #these are the values which are used to train the given model#
data
#data1<-read.csv("C:\\pr.csv",header=TRUE)#this file/ddata is used for checking the accuracy of prediction#
#data1
#y1<-data1[,15]
#x0<-data1[,2]
y<-data[,15] #sysload
x1<-data[,2] #houroftheday
x2<-data[,13] #drybulb temp(actualtemp)
x3<-data[,14] #dewpnttemp
#train<-sample(744,447)
#train
library(e1071)
model<-svm(y~x1+x2+x3,data=data[1:48,],cost=2.52*10^11,epsilon=0.0150,gamma=1)
model
#pr<-data[-train,]
#pr
predict1<-predict(model,newdata=data[49:72,])
predict1
par(mfrow=c(2,2))
plot(x1,y,col="red",pch=4)
#par(new=TRUE)
plot(x1,predict1,col="blue",pch=5) #plotting the values that have been predicted
#par(new=TRUE)
plot(x0,y1,col="black",pch=1)
error=y1-predict1
error
mae <- function(error)
{
mean(abs(error))
}
mae(error)
error <- y1 - predict1
error
rmse <- function(error)
{
sqrt(mean(error^2))
}
svrPredictionRMSE <- rmse(error)
svrPredictionRMSE
max(error)
min(error)
mape <- function(y1,predict1)
mape
mean(abs((y1 - predict1)/y1))*100
mape
E.g., the data can be found here: http://pastebin.com/MUfWFCPM
Use the newdata parameter for prediction (your newdata for testing should have the same set of features as the training data), e.g., with the mtcars dataset:
library(e1071)
model<-svm(mpg~wt+disp+qsec,data=mtcars[1:24,],cost=512,epsilon=0.01)
model
predict1<-predict(model,newdata=mtcars[25:32,])
predict1 # prediction for the new 8 data points
Pontiac Firebird        Fiat X1-9    Porsche 914-2     Lotus Europa
       28.514002        31.184527        23.022863        22.603601
  Ford Pantera L     Ferrari Dino    Maserati Bora       Volvo 142E
        6.228431        30.482475         6.801507        22.939945
If you want to predict what happens in the next two days, you have to train a model to predict two days ahead. Let's pick a simple example, then I'll move to an SVR. Suppose we use a linear AR-direct forecasting model and, through some method we determined that two lags are enough. So we have this model:
y_{t+h} = alpha + phi_1 y_{t} + phi_2 y_{t-1} + e_{t+h}
The literature in economics calls this an AR-direct forecast because it directly outputs y_{t+h}, as opposed to indirectly producing y_{t+h} by providing a recursive relationship across forecasts. Say that 'y' is the temperature in degrees Celsius, so you want to forecast the temperature in two days using temperature data up until -- and including -- today. Suppose we use daily temperatures of the last month.
We know that ordinary least squares is a consistent estimator of alpha, phi_1 and phi_2, so we can form a matrix, X, containing a column of ones, one column of temperatures lagged h times and a column of temperatures lagged h + 1 times. Then, compute a linear projection of our temperature vector, y, on X like so: estimated [alpha, phi_1, phi_2] = (X'X)^-1 X'y.
Now, we have estimated parameters for the whole sample. If I want to know y_{t+h}, I need a constant (we arbitrarily picked '1' to estimate the model, so we'll use '1'), the temperature today and the temperature yesterday. Suppose h=2 here:
predicted temperature in two days = alpha + phi_1 x temperature today + phi_2 x temperature yesterday
You see, the difference between training the model and applying the model lies in a simple shift: y_{t} = alpha + phi_1 y_{t-h} + phi_2 y_{t-h-1} + e_{t} is what we fitted in the training sample. The last in-sample prediction we made using this model is the temperature today, using the temperatures 2 and 3 days ago, respectively. We also produced least-squares forecasts for all other observed temperatures, except the first three observations -- to forecast with this model, we need two observations plus a two-day gap.
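The AR-direct procedure described above can be sketched in a few lines of base R. The temperature series here is simulated purely for illustration (the values, the seed and the variable names are assumptions), with h = 2 and two lags as in the text.

```r
# Simulated month of daily temperatures (hypothetical data).
set.seed(42)
temp <- 15 + as.numeric(arima.sim(list(ar = 0.7), n = 30))
h <- 2

# embed() lines up y_t, y_{t-1}, y_{t-2}, y_{t-3} in columns 1 to 4.
E <- embed(temp, h + 2)
y <- E[, 1]
X <- cbind(1, E[, h + 1], E[, h + 2])  # constant, y_{t-h}, y_{t-h-1}

# OLS estimate (X'X)^-1 X'y of [alpha, phi_1, phi_2].
beta <- solve(crossprod(X), crossprod(X, y))

# Forecast h days ahead from today's and yesterday's temperatures.
n <- length(temp)
forecast_h <- sum(beta * c(1, temp[n], temp[n - 1]))
forecast_h
```

The design matrix has 27 rows rather than 30 because the first three observations have no complete set of lags, which is the loss of observations mentioned above.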
Now, with SVMs and SVRs, the point is very similar. Your predicted output is a real-valued label in the case of a regression problem. Suppose we also want to forecast the temperature, also two days in advance, using the same data and using the same regressors. Then, the input space of our SVR is defined by two vectors -- the same two lagged vectors of temperatures we used.
When we train the SVR on the whole dataset, we produce forecasts for each observation in the dataset -- again, except for the first three observations.
For epsilon-insensitive SVR, let K() be the kernel we use, x_i a support vector (it's one point in the y_{t}, y_{t-1} space), n_sv the number of support vectors and b the bias (intercept) term:
y_{t+h} = sum_{i=1}^{n_sv} (alpha_i - alpha_i*) K(x_i, x) + b
Forecasting y_{t+h} is like asking what the real-valued label of x is: you input the last p (in this case, p=2) observations into the trained decision rule of the SVR and it gives you a label. If it were a support vector machine for classification, the training would result in a separating hyperplane and you would decide on the label of any point which has coordinates in the input space by asking 'on what side of the plane is it?'... It's the exact same thing here, except you are looking for a real value.
So, programming-wise, you just need to provide a vector with the right dimension to 'predict': predict(best_model_you_picked, newdata=appropriate_input_space_vector)
Note that if you trained your model on the 'whole sample', but some of the variables you used are lagged variables, the model is not fitted on the last few observations of the non-lagged variables... just like the AR model estimated by OLS does not use the last h observations to forecast in-sample.
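The same direct two-day-ahead setup can be sketched with e1071::svm. This is a minimal illustration under stated assumptions: the temperatures are simulated, the column names lag_h and lag_h1 are made up for the example, and the kernel and default hyperparameters are not tuned.

```r
library(e1071)

# Simulated daily temperatures (hypothetical data).
set.seed(42)
temp <- 15 + as.numeric(arima.sim(list(ar = 0.7), n = 60))
h <- 2

# Training set: target y_t with regressors y_{t-h} and y_{t-h-1}.
E <- embed(temp, h + 2)
train <- data.frame(y = E[, 1], lag_h = E[, h + 1], lag_h1 = E[, h + 2])

# Epsilon-insensitive SVR with a radial kernel.
model <- svm(y ~ lag_h + lag_h1, data = train,
             type = "eps-regression", kernel = "radial")

# The new input point is today's and yesterday's temperature,
# supplied under the same column names as in training.
n <- length(temp)
newx <- data.frame(lag_h = temp[n], lag_h1 = temp[n - 1])
pred <- predict(model, newdata = newx)
pred
```

Because the model is built from named columns of a data frame, the newdata argument only needs a data frame with those same columns, which is the point made in the first answer.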

how to use forecast function for simple moving average model in r?

I want to predict the future values for my simple moving average model. I used the following procedure:
x <- c(14,10,11,7,10,9,11,19,7,10,21,9,8,16,21,14,6,7)
df <- data.frame(x)
dftimeseries <- ts(df)
library(TTR)
smadf <- SMA(dftimeseries, 4) # lag is 4
library(forecast)
forecasteddf <- forecast(smadf, 4) # future 4 values
When I run the above code, my forecast values are the same for all of the next 4 days. Am I coding it correctly? Or am I conceptually wrong?
The same is the case with exponential moving average, weighted moving average, and ARIMA also.
For a moving average model, you can read here:
"Since the model assumes a constant underlying mean, the forecast for any number of periods in the future is the same...".
So, your results are to be expected given the characteristics of the moving average model.
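The flat forecast can be reproduced by hand in base R using the data from the question. This sketch assumes the usual convention that the simple moving average forecast for every horizon is the mean of the last k observations; it does not use the TTR or forecast packages.

```r
x <- c(14, 10, 11, 7, 10, 9, 11, 19, 7, 10, 21, 9, 8, 16, 21, 14, 6, 7)
k <- 4

# Trailing 4-period simple moving average (NA for the first k - 1 points).
sma <- stats::filter(x, rep(1 / k, k), sides = 1)

# The forecast for any horizon is the last moving average value.
sma_last <- mean(tail(x, k))  # (21 + 14 + 6 + 7) / 4 = 12
fc <- rep(sma_last, 4)
fc  # 12 12 12 12
```

No new observations arrive between forecast horizons, so the estimate of the underlying mean cannot change and all four forecasts coincide.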
The forecast function comes from the fpp2 package and the moving average function (sma) comes from the smooth package.
This is an example:
library(smooth)
library(fpp2)
library(readxl)
setwd("C:\\Users\\lferreira\\Desktop\\FORECASTING")
data <- read_xlsx("BASE_TESTE.xlsx")
ts <- ts(data$`1740`, start = c(2014, 1), frequency = 4)
fc <- forecast(sma(ts),h=3)
Error: The provided model is not Simple Moving Average!
