ARFIMA model and accurancy function - r

I am foresting with data sets from fpp2 package and forecast package. So my intention is to make automatic forecasting with a several time series. So for that reason I am forecasting with function. You can see code below:
# CODE
library(fpp2)
library(dplyr)
library(forecast)
df<-qauselec
# Forecasting function
fct_fun <- function(Z, hrz = forecast_horizon) {
timeseries <- msts(Z, start = 1956, seasonal.periods = 4)
forecast <- arfima(timeseries)
}
acc_list <- lapply(X = df, fct_fun)
So next step is to check accuracy of model. So for that reason I am trying with this line of code you can see below
accurancy_arfima <- lapply(acc_list, accuracy)
Until now this line of code or function accuracy worked perfectly with other models like snaive,ets etc. but with arfima can’t work properly.
So can anybody help me how to resolve this problem with accuracy function?

Follow R-documentation, Returns range of summary measures of the forecast accuracy. If x is provided, the function measures test set forecast accuracy based on x-f . If x is not provided, the function only produces training set accuracy measures of the forecasts based on f["x"]-fitted(f).
And usage summary can be seen :
accuracy(f, x, test = NULL, d = NULL, D = NULL,
...)
So :
accuracy(acc_list[[1]]$fitted, df)
If you want to evaluate separately accuracy, It will work.
a <- c()
for (i in 1:4) {
b <- accuracy(df[i], acc_list[[1]]$fitted[i])
a <- rbind(a,b)
}

Related

Grey-Markov method in R

In R, I have loaded the built-in time series: AirPassengers and split it in train- and testdata like this:
rm(list = ls())
data = AirPassengers
traindata = ts(data[1:(0.75*length(data))], frequency = 12)
testdata = ts(data[((0.75*length(data))+1):length(data)], frequency = 12)
from here I want to estimate future values of a time series with the traindata using the Grey-Markov method. I know the Grey-Markov method consist of a Grey GM(1, 1) forecasting model followed by a Markov chain forecasting model refinement. But is there a function in R that performs this Grey-Markov method on its own, just like, for example, the auto.arima function?

Is it possible to adapt standard prediction interval code for dlm in R with other distribution?

Using the dlm package in R I fit a dynamic linear model to a time series data set, consisting of 20 observations. I then use the dlmForecast function to predict future values (which I can validate against the genuine data for said period).
I use the following code to create a prediction interval;
ciTheory <- (outer(sapply(fut1$Q, FUN=function(x) sqrt(diag(x))), qnorm(c(0.05,0.95))) +
as.vector(t(fut1$f)))
However my data does not follow a normal distribution and I wondered whether it would be possible to
adapt the qnorm function for other distributions. I have tried qt, but am unable to apply qgamma.......
Just wondered if anyone knew how you would go about sorting this.....
Below is a reproduced version of my code...
library(dlm)
data <- c(20.68502, 17.28549, 12.18363, 13.53479, 15.38779, 16.14770, 20.17536, 43.39321, 42.91027, 49.41402, 59.22262, 55.42043)
mod.build <- function(par) {
dlmModPoly(1, dV = exp(par[1]), dW = exp(par[2]))
}
# Returns most likely estimate of relevant values for parameters
mle <- dlmMLE(a2, rep(0,2), mod.build); #nileMLE$conv
if(mle$convergence==0) print("converged") else print("did not converge")
mod1 <- dlmModPoly(dV = v, dW = c(0, w))
mod1Filt <- dlmFilter(a1, mod1)
fut1 <- dlmForecast(mod1Filt, n = 7)
Cheers

How to extract the baseline hazard function h0(t) from glmnet object in R?

Extract the baseline hazard function h0(t) from glmnet object
I want to know the hazard function at time t >> h(t,X) = h0(t) exp[Σ βi*Xi]. How can I extract the baseline hazard function h0(t) from glmnet object in R?
What I know is that function "basehaz()" in Survival Packages can extract the baseline hazard function from coxph object only.
I also found a function, glmnet.basesurv(time, event, lp, times.eval = NULL, centered = FALSE). But when I try to use this function, there is an error.
Error: could not find function "glmnet.basesurv"
Below is my code, using glmnet to fit the cox model and obtained the coefficients of selected variables. Is it possible to get the baseline hazard function h0(t) from this glmnet object?
Code
# Split data into training data and testing data
set.seed(101)
train_ratio = 2/3
sample <- sample.int(nrow(x), floor(train_ratio*nrow(x)), replace = F)
x.train <- x[sample, ]
x.test <- x[-sample, ]
y.train <- y[sample, ]
y.test <- y[-sample, ]
surv_obj <- Surv(y.train[,1],y.train[,2])
#
my_alpha = 0.5
fit = glmnet(x = x.train, y = surv_obj, family = "cox",alpha = my_alpha) # fit the model with elastic net method
plot(fit,xvar="lambda", main="cox model coefficient paths(glmnet.fit)\n\n") # Plot the paths for the fit
fit
# cross validation to find out best lambda
cv_fit = cv.glmnet(x = x.train,y = surv_obj , family = "cox",nfolds = 10,alpha = my_alpha)
tencrossfit <- cv_fit$glmnet.fit
plot(cv_fit, main="Cross-validated Deviance(10 folds cv.glmnet.fit)\n\n")
plot(tencrossfit, main="cox model coefficient paths(10 folds cv.glmnet.fit)\n\n")
max(cv_fit$cvm)
summary(cv_fit$cvm)
cv_fit$lambda.min
cv_fit$lambda.1se
coef.min = coef(cv_fit, s = "lambda.1se")
pred_min_value2 <- predict(cv_fit, s=cv_fit$lambda.min, newx=x.test,type="link")
I really appreciate any help you can provide.
The glmnet.basesurv function is part of the hdnom package (which is available on CRAN), not glmnet itself. So install that, and then call it.
I had similar question and after installing hdnom install.packages("hdnom"), if you check inside the function list library(help = "hdnom")
you can see that the function is actually glmnet_survcurve(). I made it working as hdnom:::glmnet_survcurve(), example is here:
S <- Surv(data$survtimed, data$outcome)
X_glm<-model.matrix(S~.,data[, c("factor1", "factor2")])
cox_model <- glmnet(X_glm, S, family="cox", alpha=1, lambda=0.2)
times = c (1,2) #for predict of survival and
linearpredictors at times = 1 and 2
predictions = hdnom:::glmnet_survcurve(cox_model, S[,1], S[,2], X_glm, survtime = times)
predictions$p[,1] #survival probability at time 1

arima model for multiple seasonalities in R

I'm learning to create a forecasting model for time series that has multiple seasonalities. Following is the subset of dataset that I'm refering to. This dataset includes hourly data points and I wish to include daily as well as weekly seasonalities in my arima model. Following is the subset of dataset:
data= c(4,4,1,2,6,21,105,257,291,172,72,10,35,42,77,72,133,192,122,59,29,25,24,5,7,3,3,0,7,15,91,230,284,147,67,53,54,55,63,73,114,154,137,57,27,31,25,11,4,4,4,2,7,18,68,218,251,131,71,43,55,62,63,80,120,144,107,42,27,11,10,16,8,10,7,1,4,3,12,17,58,59,68,76,91,95,89,115,107,107,41,40,25,18,14,15,6,12,2,4,1,6,9,14,43,67,67,94,100,129,126,122,132,118,68,26,19,12,9,5,4,2,5,1,3,16,89,233,304,174,53,55,53,52,59,92,117,214,139,73,37,28,15,11,8,1,2,5,4,22,103,258,317,163,58,29,37,46,54,62,95,197,152,58,32,30,17,9,8,1,3,1,3,16,109,245,302,156,53,34,47,46,54,65,102,155,116,51,30,24,17,10,7,4,8,0,11,0,2,225,282,141,4,87,44,60,52,74,135,157,113,57,44,26,29,17,8,7,4,4,2,10,57,125,182,100,33,27,41,39,35,50,69,92,66,30,11,10,11,9,6,5,10,4,1,7,9,17,24,21,29,28,48,38,30,21,26,25,35,10,9,4,4,4,3,5,4,4,4,3,5,10,16,28,47,63,40,49,28,22,18,27,18,10,5,8,7,3,2,2,4,1,4,19,59,167,235,130,57,45,46,42,40,49,64,96,54,27,17,18,15,7,6,2,3,1,2,21,88,187,253,130,77,47,49,48,53,77,109,147,109,45,41,35,16,13)
The code I'm trying to use is following:
tsdata = ts (data, frequency = 24)
aicvalstemp = NULL
aicvals= NULL
for (i in 1:5) {
for (j in 1:5) {
xreg1 = fourier(tsdata,i,24)
xreg2 = fourier(tsdata,j,168)
xregs = cbind(xreg1,xreg2)
armodel = auto.arima(bike_TS_west, xreg = xregs)
aicvalstemp = cbind(i,j,armodel$aic)
aicvals = rbind(aicvals,aicvalstemp)
}
}
The cbind command in the above command fails because the number of rows in xreg1 and xreg2 are different. I even tried using 1:length(data) argument in the fourier function but that also gave me an error. If someone can rectify the mistakes in the above code to produce a forecast of next 24 hours using an arima model with minimum AIC values, it would be really helpful. Also if you can include datasplitting in your code by creating training and testing data sets, it would be totally awesome. Thanks for your help.
I don't understand the desire to fit a weekly "season" to these data as there is no evidence for one in the data subset you provided. Also, you should really log-transform the data because they do not reflect a Gaussian process as is.
So, here's how you could fit models with a some form of hourly signals.
## the data are not normal, so log transform to meet assumption of Gaussian errors
ln_dat <- log(tsdata)
## number of hours to forecast
hrs_out <- 24
## max number of Fourier terms
max_F <- 5
## empty list for model fits
mod_res <- vector("list", max_F)
## fit models with increasing Fourier terms
for (i in 1:max_F) {
xreg <- fourier(ln_dat,i)
mod_res[[i]] <- auto.arima(tsdata, xreg = xreg)
}
## table of AIC results
aic_tbl <- data.frame(F=seq(max_F), AIC=sapply(mod_res, AIC))
## number of Fourier terms in best model
F_best <- which(aic_tbl$AIC==min(aic_tbl$AIC))
## forecast from best model
fore <- forecast(mod_res[[F_best]], xreg=fourierf(ln_dat,F_best,hrs_out))

Hierarchical Time Series

I used the hts package in R to fit an HTS model on train data, used "arima" option to forecast and computed the accuracy on the holdout/test data.
Here is my code:
library(hts)
data<-read.csv("C:/TS.csv")
ts_train <- ts(data[,-1],frequency=12, start=c(2000,1))
hts_train <- hts(ts_train, nodes=list(2, c(4, 2)))
data.test<-read.csv("C:/TStest.csv")
ts_test <- ts(data.test[,-1],frequency=12, start=c(2003,1))
hts_test <- hts(ts_test, nodes=list(2, c(4, 2)))
forecast <- forecast(hts_train, h=15, method="bu", fmethod="arima", keep.fitted = TRUE, keep.resid = TRUE)
accuracy<-accuracy.gts(forecast, hts_test)
Now, let's suppose I'm happy with the accuracy on the holdout sample and I'd like to lump the test data back with the train data and re-forecast using the full set.
I tried using this code:
data.full<-read.csv("C:/TS_full.csv")
ts_full <- ts(data.full[,-1],frequency=12, start=c(2000,1))
hts_full <- hts(ts_full, nodes=list(2, c(4, 2)))
forecast.full <- forecast(hts_full, h=15, method="bu", fmethod="arima", keep.fitted = TRUE, keep.resid = TRUE)
Now, I'm not sure that this is really the right way to do it as I don't know if ARIMA models that were used to estimate my train data are the same ARIMA models that I'm now using to forecast the full data set (I presume fmethod="arima" utilizes auto.arima) . I'd like them to remain the same models, otherwise the models evaluated by my out of sample accuracy measures are different from the models I used for the final forecast.
I see there is a FUN argument that represents "a user-defined function that returns an object which can be passed to the forecast function". Perhaps that argument can be used in the last line of my code somehow to make sure the models I fit on the train data are used to forecast the full data set?
Any suggestions on what sort of R code would help would be much appreciated.
The functions are not set up to do that. However, it is not too difficult to do what you want. Here is some sample code
library(hts)
data <- htseg2
# Split data into training and test sets
hts_train <- window(data, end=2004)
hts_test <- window(data, start=2005)
# Fit models and compute forecasts on all nodes using training data
train <- aggts(hts_train)
fmodels <- list()
fc <- matrix(0, ncol=ncol(train), nrow=3)
for(i in 1:ncol(train))
{
fmodels[[i]] <- auto.arima(train[,i])
fc[,i] <- forecast(fmodels[[i]],h=3)$mean
}
forecast <- combinef(fc, nodes=data$nodes)
accuracy <- accuracy.gts(forecast, hts_test)
# Forecast on full data set without re-estimating parameters
full <- aggts(data)
fcfull <- matrix(0, ncol=ncol(full), nrow=15)
for(i in 1:ncol(full))
{
fcfull[,i] <- forecast(Arima(full[,i], model=fmodels[[i]]),
h=15)$mean
}
forecast.full <- combinef(fcfull, nodes=data$nodes)
# Forecast on full data set with same models but re-estimated parameters
full <- aggts(data)
fcfull <- matrix(0, ncol=ncol(full), nrow=15)
for(i in 1:ncol(full))
{
fcfull[,i] <- forecast(Arima(full[,i],
order=fmodels[[i]]$arma[c(1,6,2)],
seasonal=fmodels[[i]]$arma[c(3,7,4)]),
h=15)$mean
}
forecast.full <- combinef(fcfull, nodes=data$nodes)

Resources