Generation of ARIMA.sim - r

I have fitted an ARIMA model to my data and need to simulate it 10 years into the future (approximately 3652 days, as the data are daily). auto.arima selected this as the best-fitting model. How do I simulate it into the future?
mydata.arima505 <- arima(d.y, order=c(5,0,5))

The forecast package has the simulate.Arima() function which does what you want. But first, use the Arima() function rather than the arima() function to fit your model:
library(forecast)
mydata.arima505 <- Arima(d.y, order=c(5,0,5))
future_y <- simulate(mydata.arima505, nsim = 100)
That will simulate 100 future observations conditional on the past observations, using the fitted model. For ten years of daily data, use nsim = 3652.
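To illustrate what "conditional on the past observations" means, here is a minimal base-R sketch that simulates future paths of an AR(1) by hand, starting from the last observed value. It assumes a pure AR(1) with a mean (stats::arima() names the coefficients `ar1` and `intercept`, where `intercept` is the series mean), and `simulate_ar1()` is a hypothetical helper. For a general ARIMA such as the (5,0,5) above, simulate() from forecast does this for you.

```r
# Minimal sketch: simulate future paths of a fitted AR(1), conditional on the
# last observation. simulate_ar1() is a hypothetical helper, not a package function.
set.seed(1)
y   <- arima.sim(model = list(ar = 0.6), n = 200) + 10
fit <- arima(y, order = c(1, 0, 0))

phi   <- unname(coef(fit)["ar1"])
mu    <- unname(coef(fit)["intercept"])  # stats::arima reports the mean here
sigma <- sqrt(fit$sigma2)

simulate_ar1 <- function(last, h) {
  path <- numeric(h)
  prev <- last
  for (t in seq_len(h)) {
    # (y_t - mu) = phi * (y_{t-1} - mu) + e_t,  e_t ~ N(0, sigma^2)
    prev <- mu + phi * (prev - mu) + rnorm(1, 0, sigma)
    path[t] <- prev
  }
  path
}

# 500 simulated futures of length 10, one per column
paths <- replicate(500, simulate_ar1(last = y[length(y)], h = 10))

# Averaging many simulated paths should come close to the point forecast
fc <- as.numeric(predict(fit, n.ahead = 10)$pred)
cbind(sim_mean = rowMeans(paths), forecast = fc)
```

Each column of `paths` is one possible future; quantiles across columns give simulation-based prediction intervals.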

If your question is how to simulate a specific ARIMA process, you can use the function arima.sim(). But I am not sure that is really what you want; usually you would use your model for predictions.
library(forecast)
# True Data Generating Process
y <- arima.sim(model=list(ar=0.4, ma = 0.5, order =c(1,0,1)), n=100)
# Fit an ARIMA model
fit <- auto.arima(y)
# Use the estimates for a simulation (assumes auto.arima() chose an ARMA(1,1);
# otherwise fit$coef["ar1"] or fit$coef["ma1"] is NA)
arima.sim(list(ar = fit$coef["ar1"], ma = fit$coef["ma1"]), n = 50)
# Use the model to make predictions
predicted_values <- predict(fit, n.ahead = 50)
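A base-R variant of the same idea, fitting the ARMA(1,1) explicitly with stats::arima() so that `ar1` and `ma1` are guaranteed to exist. It also shows the difference between the two outputs: arima.sim() draws one unconditional path around the estimated mean, while predict() gives forecasts conditional on the observed data, with standard errors. The sample sizes and the 1.96 multiplier (approximate 95% normal intervals) are illustrative choices.

```r
set.seed(42)
y   <- arima.sim(model = list(ar = 0.4, ma = 0.5), n = 300)
fit <- arima(y, order = c(1, 0, 1))
cf  <- coef(fit)

# Unconditional simulation from the estimated model: fitted coefficients,
# fitted innovation standard deviation, shifted by the estimated mean
sim <- arima.sim(model = list(ar = cf[["ar1"]], ma = cf[["ma1"]]),
                 n = 50, sd = sqrt(fit$sigma2)) + cf[["intercept"]]

# Conditional forecasts: point predictions and standard errors
pred  <- predict(fit, n.ahead = 50)
upper <- pred$pred + 1.96 * pred$se
lower <- pred$pred - 1.96 * pred$se
```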

Related

Estimate prediction accuracy of a Cox PH model

I would like to develop a Cox proportional hazards model in R, use it to make predictions on new data, and evaluate the accuracy of the model. For the evaluation I would like to use the Brier score.
# load various packages, needed at some point in the script
library("survival")
library("survminer")
library("prodlim")
library("randomForestSRC")
library("pec")
library("rpart")
library("mlr")
library("Hmisc")
library("ipred")
# load lung cancer data
data("lung")
head(lung)
# recode status variable
lung$status <- lung$status-1
# Delete rows with missing values
lung <- na.omit(lung)
# split data into training and testing
## 80% of the sample size
smp_size <- floor(0.8 * nrow(lung))
## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(lung)), size = smp_size)
# training and testing data
train.lung <- lung[train_ind, ]
test.lung <- lung[-train_ind, ]
# time and failure event
s <- Surv(train.lung$time, train.lung$status)
# create model
cox.ph2 <- coxph(s~age+meal.cal+wt.loss, data=train.lung)
# predict
pred <- predict(cox.ph2, newdata = train.lung)
# evaluate
sbrier(s, pred)
As the outcome of the prediction I would expect a time (as in "when does this individual experience failure?"). Instead I get values like these:
[1] 0.017576359 -0.135928959 -0.347553969 0.112509137 -0.229301199 -0.131861582 0.044589175 0.002634008
[9] 0.345966978 0.209488560 0.002418358
What does that mean?
Furthermore, sbrier does not work. Apparently it cannot work with the prediction pred (no surprise there).
How do I solve this? How do I make a prediction with cox.ph2? How can I evaluate the model afterwards?
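The predict() function won't return a time value. By default (type = "lp") it returns the linear predictor, i.e. the log relative hazard centred at the means of the training data, which is what you are seeing. Use the type argument to choose one of c("lp", "risk", "expected", "terms", "survival").
If you want the hazard ratios (relative risks, exp(lp)):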
predict(cox.ph2, newdata = test.lung, type = "risk")
Note that you want to predict the values on the test set not the training set.
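A runnable sketch of these prediction types, assuming the `lung` data bundled with the survival package and the same 80/20 split as in the question (restricted to complete cases on the variables used). If you want something on the time scale, survfit() on the fitted model gives a predicted survival curve per test subject; the 365-day horizon below is an arbitrary choice, and the names `lung2`, `train` and `test` are my own.

```r
library(survival)

# Same setup as in the question: 0/1 status, complete cases, 80/20 split
lung2 <- na.omit(lung[, c("time", "status", "age", "meal.cal", "wt.loss")])
lung2$status <- lung2$status - 1
set.seed(123)
idx   <- sample(seq_len(nrow(lung2)), size = floor(0.8 * nrow(lung2)))
train <- lung2[idx, ]
test  <- lung2[-idx, ]

fit <- coxph(Surv(time, status) ~ age + meal.cal + wt.loss, data = train)

# Linear predictor (the default type) and relative risk on the test set
lp   <- predict(fit, newdata = test, type = "lp")
risk <- predict(fit, newdata = test, type = "risk")  # equals exp(lp)

# Predicted survival curves, one per test subject, read off at 365 days:
# estimated P(T > 365) for each test subject
sf       <- survfit(fit, newdata = test)
surv_365 <- as.numeric(summary(sf, times = 365)$surv)
```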
I have read that you can use AFT (accelerated failure time) models in your case:
https://stats.stackexchange.com/questions/79362/how-to-get-predictions-in-terms-of-survival-time-from-a-cox-ph-model
You can also read this post:
Calculate the Survival prediction using Cox Proportional Hazard model in R
Hope it helps.
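On the Brier score: it compares predicted survival probabilities (not the linear predictor) with observed status at a chosen time. Rather than guess at the exact input format sbrier() expects, here is a hand-rolled sketch of the usual inverse-probability-of-censoring weighted (IPCW) Brier score at a single time point. It is written from scratch, not taken from ipred or pec; the horizon `t0 = 365`, the use of the training data to estimate the censoring distribution, and the variable names are all assumptions.

```r
library(survival)

lung2 <- na.omit(lung[, c("time", "status", "age", "meal.cal", "wt.loss")])
lung2$status <- lung2$status - 1          # 1 = death, 0 = censored
set.seed(123)
idx   <- sample(seq_len(nrow(lung2)), size = floor(0.8 * nrow(lung2)))
train <- lung2[idx, ]
test  <- lung2[-idx, ]
fit   <- coxph(Surv(time, status) ~ age + meal.cal + wt.loss, data = train)

t0 <- 365  # evaluation horizon in days (arbitrary choice)

# Predicted P(T > t0) for each test subject
S_t0 <- as.numeric(summary(survfit(fit, newdata = test), times = t0)$surv)

# Kaplan-Meier estimate of the censoring distribution G(t) = P(C > t),
# obtained by swapping the roles of events and censorings
cens <- survfit(Surv(time, 1 - status) ~ 1, data = train)
G    <- stepfun(cens$time, c(1, cens$surv))   # right-continuous step function

event_before <- test$time <= t0 & test$status == 1
alive_after  <- test$time > t0

# IPCW weights; subjects censored before t0 get weight 0
w <- numeric(nrow(test))
w[event_before] <- 1 / G(test$time[event_before] - 1e-8)  # approximates G(T_i-)
w[alive_after]  <- 1 / G(t0)

# Squared error: observed status at t0 is 0 (dead) or 1 (alive)
brier <- mean(w * ifelse(alive_after, (1 - S_t0)^2, S_t0^2))
brier
```

Lower is better; a model that always predicts 0.5 scores 0.25, which gives a rough reference point.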

Prediction interval for ACP model in R

I'm trying to teach myself a bit about modeling time series for 'counts' data. I found a pretty simple model, the Autoregressive Conditional Poisson model (ACP) (Heinen 2003), that has an accompanying R package {acp}. I'm having trouble finding information about how to construct n-step-ahead prediction intervals for predictions made from an ACP model. Inconveniently, forecast doesn't work with these ACP objects. Any thoughts on how to construct these?
Additionally, when using predict() with an ACP model, you have to include an argument, newydata, that is a data frame of the values you want to predict...? Maybe I'm misinterpreting this, but it seems like you need to already have y when predicting yhat. Why?
Below is the example code, copied from the {acp} package.
library(acp)
data(polio)
trend=(1:168/168)
cos12=cos((2*pi*(1:168))/12)
sin12=sin((2*pi*(1:168))/12)
cos6=cos((2*pi*(1:168))/6)
sin6=sin((2*pi*(1:168))/6)
#Autoregressive Conditional Poisson Model with explanatory covariates
polio_data<-data.frame(polio, trend , cos12, sin12, cos6, sin6)
mod1 <- acp(polio~-1+trend+cos12+sin12+cos6+sin6,data=polio_data, p = 1 ,q = 2)
summary(mod1)
#Static out-of-sample fit example
train<-data.frame(polio_data[c(1: 119),])
mod1t <- acp(polio~-1+trend+cos12+sin12+cos6+sin6,data=train, p = 1 ,q = 2)
xpolio_data<-data.frame(trend , cos12, sin12, cos6, sin6)
test<-xpolio_data[c(120:nrow(xpolio_data)),]
yfor<-polio_data[120:nrow(polio_data),1]
predict(mod1t,yfor,test)
#Autoregressive Conditional Poisson Model without explanatory covariates
polio_data<-data.frame(polio)
mod2 <- acp(polio~-1,data=polio_data, p = 3 ,q = 1)
summary(mod2)
The second argument in the predict() command is the vector of observed y values that confuses me.
Thanks!
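One generic way to get n-step-ahead intervals when no forecast() method exists is simulation. In its basic form without covariates, an ACP(1,1) has y_t given the past distributed as Poisson(lambda_t), with lambda_t = omega + alpha * y_{t-1} + beta * lambda_{t-1}. The recursion also suggests why predict() asks for observed y values: a one-step prediction for time t depends on y_{t-1}, so the static out-of-sample example presumably feeds in the observed `yfor` to produce one-step-ahead predictions. The sketch below uses made-up parameter values, omits covariates, and `simulate_acp()` is a hypothetical helper; with a real fit you would plug in the estimates and the last fitted intensity.

```r
# Hypothetical helper: simulate h-step-ahead paths from an ACP(1,1)
# lambda_t = omega + alpha * y_{t-1} + beta * lambda_{t-1},  y_t ~ Poisson(lambda_t)
simulate_acp <- function(y_last, lambda_last, h, omega, alpha, beta, nsim = 2000) {
  paths <- matrix(NA_real_, nrow = h, ncol = nsim)
  for (s in seq_len(nsim)) {
    y_prev   <- y_last
    lam_prev <- lambda_last
    for (k in seq_len(h)) {
      lam      <- omega + alpha * y_prev + beta * lam_prev
      y_prev   <- rpois(1, lam)   # the simulated count feeds the next step
      lam_prev <- lam
      paths[k, s] <- y_prev
    }
  }
  paths
}

set.seed(1)
# Made-up parameter values and starting state, for illustration only
paths <- simulate_acp(y_last = 3, lambda_last = 2.5, h = 6,
                      omega = 0.5, alpha = 0.3, beta = 0.5)

point <- rowMeans(paths)                                    # mean forecast
pi80  <- apply(paths, 1, quantile, probs = c(0.10, 0.90))   # 80% interval
```

Because the counts are discrete, the interval bounds are integers and the nominal coverage is only approximate.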

Probability predictions with model averaged Cumulative Link Mixed Models fitted with clmm in ordinal package

I found that the predict function is currently not implemented for cumulative link mixed models fitted with the clmm function in the ordinal R package. While predict is implemented for clmm2 in the same package, I chose clmm instead because the latter allows more than one random effect. I also fitted several clmm models and performed model averaging with the model.avg function in the MuMIn package. Ideally, I want to predict probabilities using the averaged model. However, although MuMIn supports clmm models, predict does not work with the averaged model either.
Is there a way to hack the predict function so that the function not only could predict probabilities from a clmm model, but also predict using model averaged coefficients from clmm (i.e. object of class "averaging")? For example:
require(ordinal)
require(MuMIn)
mm1 <- clmm(SURENESS ~ PROD + (1|RESP) + (1|RESP:PROD), data = soup,
link = "probit", threshold = "equidistant")
## alternative model with a logit link:
mm2 <- clmm(SURENESS ~ PROD + (1|RESP) + (1|RESP:PROD), data = soup,
link = "logit", threshold = "equidistant")
#create a model selection object
mm.sel<-model.sel(mm1,mm2)
##perform a model average
mm.avg<-model.avg(mm.sel)
#create new data and predict
new.data<-soup
##predict with individual model
predict(mm1, new.data)
I got the following error message:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "clmm"
##predict with model average
predict(mm.avg, new.data)
Another error is returned:
Error in predict.averaging(mm.avg, new.data) :
predict for models 'mm1' and 'mm2' caused errors
I've been using clmm as well, and yes, I can confirm predict.clmm is NOT (yet?) implemented. I haven't checked the source code for fake.predict.clmm yet; it might work. If it doesn't, you're stuck with doing things by hand or using predict.clmm2.
I found a potential solution (pasted below) but have not been able to make it work for my data.
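If you do end up doing it by hand, one option is to average the predicted category probabilities of the candidate models, weighted by their Akaike weights, instead of averaging coefficients. The sketch below uses MASS::polr() on the `housing` data as a fixed-effects stand-in for clmm() (polr has no random effects, but it does have a working predict()), with a logit and a probit model mirroring the two links of `mm1`/`mm2`. This is not necessarily what MuMIn computes for an "averaging" object.

```r
library(MASS)

# Two candidate cumulative link models differing only in the link function
fits <- list(
  logit  = polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
                method = "logistic"),
  probit = polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
                method = "probit")
)

# Akaike weights
aic <- sapply(fits, AIC)
w   <- exp(-0.5 * (aic - min(aic)))
w   <- w / sum(w)

# Per-model predicted category probabilities (rows = observations)
probs <- lapply(fits, predict, newdata = housing, type = "probs")

# Model-averaged probabilities: weighted sum of the per-model matrices
avg <- Reduce(`+`, Map(`*`, probs, w))
head(avg)
```

Averaging probabilities keeps each row a valid distribution over the response levels, which averaging thresholds and coefficients across different links would not guarantee.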
Solution here: https://gist.github.com/mainambui/c803aaf857e54a5c9089ea05f91473bc
I think the problem is the number of coefficients I am using but am not experienced enough to figure it out. Hopefully this helps someone out though.
This is the model and newdata that I am using, though it is actually a model averaged version. Same predictors though.
ma10 <- clmm(Location3 ~ Sex * Grass3 + Sex * Forb3 + (1|Tag_ID), data =
IP_all_dunes)
ma_1 <- model.avg(ma10, ma8, ma5)##top 3 models
new_ma<- data.frame(Sex = c("m","f","m","f","m","f","m","f"),
Grass3 = c("1","1","1","1","0","0","0","0"),
Forb3 = c("0","0","1","1","0","0","1","1"))
# Arguments:
# - modelAvg = a clmm model average (object of class "averaging")
# - newdata = numeric data whose columns line up, in order, with the coefficients
#   (the commented-out lines use a single clmm fit called `model` instead)
# Returns a dataframe of predicted probabilities for each row and response level
fake.predict.clmm <- function(modelAvg, newdata) {
  # Actual prediction function: P(Y = j) = F(theta_j - eta) - F(theta_{j-1} - eta)
  pred <- function(eta, theta, cat = 1:(length(theta) + 1), inv.link = plogis) {
    Theta <- c(-1000, theta, 1000)
    sapply(cat, function(j) inv.link(Theta[j + 1] - eta) - inv.link(Theta[j] - eta))
  }
  # Multiply each column of newdata by its coefficient
  #coefs <- c(model$beta, unlist(model$ST)) ## turn off if a model average is used
  beta <- modelAvg$coefficients[2, 3:12]  # hard-coded: positions of the betas in this model
  coefs <- c(beta, unlist(modelAvg$ST))
  xbetas <- sweep(newdata, MARGIN = 2, coefs, `*`)
  # Make predictions
  Theta <- modelAvg$coefficients[2, 1:2]  # hard-coded: positions of the two thresholds
  #pred.mat <- data.frame(pred(eta = rowSums(xbetas), theta = model$Theta))
  pred.mat <- data.frame(pred(eta = rowSums(xbetas), theta = Theta))
  #colnames(pred.mat) <- levels(model$model[,1])
  a <- attr(modelAvg, "modelList")
  colnames(pred.mat) <- levels(a[[1]]$model[, 1])
  pred.mat
}
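To check what the inner pred() helper computes, here is the same formula, P(Y = j) = F(theta_j - eta) - F(theta_{j-1} - eta), applied to a MASS::polr() fit and compared with polr's own predict(). polr is a fixed-effects stand-in here (clmm uses the same theta - eta sign convention; for a clmm fit this gives population-level probabilities with the random effects set to zero). It also shows something the hack relies on: newdata has to be a numeric design matrix, e.g. built with model.matrix(), before sweep() multiplies it by the coefficients; a data frame of character columns such as `new_ma` cannot be used directly. Using -Inf/Inf instead of -1000/1000 for the outer thresholds is my own small change.

```r
library(MASS)

# Category probabilities for a cumulative link model:
# P(Y = j) = F(theta_j - eta) - F(theta_{j-1} - eta), with theta_0 = -Inf, theta_J = Inf
cum_link_probs <- function(eta, theta, inv.link = plogis) {
  Theta <- c(-Inf, theta, Inf)
  sapply(seq_len(length(theta) + 1),
         function(j) inv.link(Theta[j + 1] - eta) - inv.link(Theta[j] - eta))
}

fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# Numeric design matrix for the new data, columns matched to the coefficients
X   <- model.matrix(~ Infl + Type + Cont, data = housing)[, names(coef(fit))]
eta <- drop(X %*% coef(fit))

manual  <- cum_link_probs(eta, fit$zeta)
builtin <- predict(fit, newdata = housing, type = "probs")
all.equal(unname(manual), unname(builtin), tolerance = 1e-6)
```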

Error in arima of R: too few non-missing observations

I am using R's arima() and auto.arima() to forecast sales. The data are weekly and cover three years.
My code looks like this:
x<-c(1571,1501,895,1335,2306,930,2850,1380,975,1080,990,765,615,585,838,555,1449,615,705,465,165,630,330,825,555,720,615,360,765,1080,825,525,885,507,884,1230,342,615,1161,
1585,723,390,690,993,1025,1515,903,990,1510,1638,1461.67,1082,1075,2315,1014,2140,1572,794,1363,1184,1248,1344,1056,816,720,896,608,624,560,512,304,640,640,704,1072,768,
816,640,272,1168,736,1003,864,658.67,768,841,1727,944,848,432,704,850.67,1205,592,1104,976,629,814,1626,933.33,1100.33,1730,2742,1552,1038,826,1888,1440,1372,824,1824,1392,1424,768,464,
960,320,384,512,478,1488,384,338.67,176,624,464,528,592,288,544,418.67,336,752,400,1232,477.67,416,810.67,1256,1040,823,240,1422,704,718,1193,1541,1008,640,752,
1008,864,1507,4123,2176,899,1717,935)
length_data<-length(x)
length_train<-round(length_data*0.80)
forecast_period<-length_data-length_train
train_data<-x[1:length_train]
train_data<-ts(train_data,frequency=52,start=c(1,1))
validation_data<-x[(length_train+1):length_data]
validation_data<-ts(validation_data,frequency=52,start=c(ceiling((length_train)/52),((length_train)%%52+1)))
library(forecast)
arima_output<-auto.arima(train_data) # fit the ARIMA Model
arima_validate <- Arima(x=validation_data,model=arima_output)
Error:
Error in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, :
too few non-missing observations
What I am doing wrong?
What does "too few non-missing observations" mean? I have searched for it online but did not find a good explanation.
Thanks for any kind of help!
arima_output is a seasonal ARIMA model:
> arima_output
Series: train_data
ARIMA(1,0,1)(0,1,0)[52]
Arima() then attempts to refit this particular model to validation_data. But to fit a model with a lag-52 seasonal difference, you need more than one full year (52 weeks) of observations, because the seasonal differencing alone uses up the first 52 values.
As an illustration, note that Arima() will happily refit, without errors, a time series that is twice as long as validation_data:
validation_data <- x[(length_train+1):length_data]
validation_data<-ts(rep(validation_data,2),frequency=52,
start=c(ceiling((length_train)/52),((length_train)%%52+1)))
arima_validate <- Arima(x=validation_data,model=arima_output)
One way of dealing with this would be to force auto.arima() to use a nonseasonal model, by specifying D=0:
validation_data <- x[(length_train+1):length_data]
validation_data<-ts(validation_data,frequency=52,
start=c(ceiling((length_train)/52),((length_train)%%52+1)))
arima_output<-auto.arima(train_data, D=0) # fit the ARIMA Model
arima_validate <- Arima(x=validation_data,model=arima_output)
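The arithmetic behind the error can be checked in base R: a lag-52 seasonal difference of a 32-observation series leaves nothing, so stats::arima() has no observations left to fit. The sketch uses simulated data of the same length as validation_data, not the question's sales figures.

```r
set.seed(1)
short <- ts(rnorm(32, mean = 1000, sd = 300), frequency = 52)

# Seasonal differencing at lag 52 needs more than 52 observations
d <- diff(short, lag = 52)
length(d)

# Fitting a model with a seasonal difference (D = 1) therefore fails
res <- tryCatch(
  arima(short, order = c(1, 0, 1),
        seasonal = list(order = c(0, 1, 0), period = 52)),
  error = function(e) conditionMessage(e)
)
res
```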
So this did turn out to be more of a CrossValidated question...
Your chosen model is ARIMA(1,0,1)(0,1,0)[52]. That is, it has a seasonal difference of lag 52. Your validation data has 32 observations. So you cannot take the seasonal differences on the validation data without knowing what the training data is.
One way around this is to fit the model to the full time series, and then extract what you want (presumably residuals from the validation portion).
You can also improve the readability of your code:
x <- ts(x, frequency=52, start=c(1,1))
length_data <- length(x)
length_train <- round(length_data*0.80)
train_data <- ts(head(x, length_train),
frequency=frequency(x), start=start(x))
validation_data <- ts(tail(x, length_data-length_train),
frequency=frequency(x), end=end(x))
library(forecast)
arima_train <- auto.arima(train_data)
arima_full <- Arima(x, model=arima_train)
res <- window(residuals(arima_full), start=start(validation_data))
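For readers without forecast, a rough base-R analogue of the Arima(x, model = arima_train) step is to refit the training specification to the full series with every coefficient held fixed (fixed = ..., transform.pars = FALSE), then keep the residuals for the validation period. The series below is simulated with the same length and weekly frequency as the question's data, and the (1,0,0)(0,1,0)[52] specification is an arbitrary choice for the sketch.

```r
set.seed(7)
n <- 158
x <- ts(1000 + 300 * sin(2 * pi * (1:n) / 52) + rnorm(n, 0, 100), frequency = 52)

n_train <- round(0.8 * n)   # 126 training weeks, 32 validation weeks
train   <- ts(x[1:n_train], frequency = 52, start = start(x))

spec_order    <- c(1, 0, 0)
spec_seasonal <- list(order = c(0, 1, 0), period = 52)

# Estimate on the training data only
fit_train <- arima(train, order = spec_order, seasonal = spec_seasonal)

# Apply the same coefficients to the full series, without re-estimating them
fit_full <- arima(x, order = spec_order, seasonal = spec_seasonal,
                  fixed = coef(fit_train), transform.pars = FALSE)

# One-step-ahead residuals for the validation weeks
val_res <- residuals(fit_full)[(n_train + 1):n]
```

Because the full series supplies the preceding year, the seasonal difference is available for every validation week.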

SARIMAX model in R

I would like to fit a SARIMAX model in R with temperature as an exogenous variable. Can I do that with the xreg argument of arima() in the TSA package?
I thought to fit the model as:
fit1 = arima(x, order=c(p,d,q), seasonal=list(order=c(P,D,Q), period=S), xreg=temp)
Is that correct, or do I have to use another R function?
If it isn't correct, which steps should I follow?
Thanks.
Check out the forecast package, it's great:
# some random data
x <- ts(rnorm(120,0,3) + 1:120 + 20*sin(2*pi*(1:120)/12), frequency=12)
temp = rnorm(length(x), 20, 30)
require(forecast)
# build the model (check ?auto.arima)
model = auto.arima(x, xreg = cbind(temp = temp))
# some random predictors (recent versions of forecast expect xreg as a numeric matrix)
temp.reg = cbind(temp = rnorm(10, 20, 30))
# forecasting
forec = forecast(model, xreg = temp.reg)
# quick way to visualize things
plot(forec)
# model diagnosis
tsdiag(model)
# model info
summary(forec)
I wouldn't suggest using auto.arima(). Depending on the model you want to fit, it may give poor results. For example, when I worked with some complex SARIMA models, the difference between the models fitted manually and with auto.arima() was noticeable: auto.arima() did not even return white-noise innovations (as expected of a good fit), while the manual fits did.
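For the manual route, a call like the one in the question also works with stats::arima(), which accepts xreg directly; predict() then takes the future regressor values through newxreg. A minimal sketch on simulated monthly data (the temperature series, the true coefficient of 2 and the SARIMA orders are all made up for illustration):

```r
set.seed(3)
n     <- 120
temp  <- 20 + 5 * sin(2 * pi * (1:n) / 12) + rnorm(n)   # simulated monthly temperature
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n))
x     <- ts(50 + 2 * temp + noise, frequency = 12)

# SARIMAX: ARIMA(1,0,0)(1,0,0)[12] errors plus a linear effect of temperature
fit <- arima(x, order = c(1, 0, 0),
             seasonal = list(order = c(1, 0, 0), period = 12),
             xreg = temp)
coef(fit)   # includes an "intercept" and a "temp" coefficient

# Forecasting needs future values of the regressor
temp_future <- 20 + 5 * sin(2 * pi * (n + 1:12) / 12)
fc <- predict(fit, n.ahead = 12, newxreg = temp_future)
```

Since the future temperatures are treated as known, the resulting standard errors ignore any uncertainty in the temperature forecast itself.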
