Refit an arima model with new training data in fable package - r

I have a function which takes a fitted model and then refits that model to new training data (this is for step-ahead cross validation). For lm models it works like this:
# create data
training_data <- data.frame(
  date = seq.Date(from = as.Date("2020-01-01"), by = 1, length.out = 365),
  x = 1:365,
  y = 1:365 + rnorm(n = 365)
)
# specify and fit model
lm_formula <- as.formula(y ~ x)
my_lm <- lm(lm_formula, data = training_data)
# refit on new training data
update(my_lm, data = new_training_data)
Is there a way to do the same thing for ARIMA models fitted with the fable package? I'm creating the models like this:
library(fable)
library(tsibble)   # needed for as_tsibble()
library(forecast)
arima_formula <- as.formula(y ~ x + PDQ(0, 0, 0))
my_arima <- as_tsibble(training_data) %>%
  model(ARIMA(arima_formula))
But I can't figure out a way to take the my_arima model that I've already fitted and pass it new_training_data, either using update or by extracting the formula and refitting as a new model. Note that although I've included the model formula in the reprex above, my function takes a fitted model rather than a formula. So just fitting a new model using arima_formula is not an option.
Thank you.
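One approach worth trying here (an editorial sketch, not part of the original question): fabletools, which is loaded with fable, provides a refit() generic that takes a fitted mable and new data, so you can reuse the fitted model object directly. fable's ARIMA method also documents a reestimate argument; with reestimate = TRUE the coefficients are re-estimated on the new data rather than held fixed.
# sketch: refit the fitted mable on new data (fabletools::refit)
# assumes new_training_data has the same columns as training_data
refitted_arima <- refit(my_arima, new_data = as_tsibble(new_training_data),
                        reestimate = TRUE)
report(refitted_arima)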

Related

Plotting semivariograms with non-nlme package models

I am trying to plot a semivariogram of the residuals of a generalised mixed effect model in R. Doing this for a mixed effect model with a normal distribution is straightforward with the nlme package, using the quakes dataset as an example.
library(nlme)
data(quakes)
head(quakes)
model1 <- lme(mag ~ depth, random = ~ 1 | stations, data = quakes)
summary(model1)
semivario <- Variogram(model1, form = ~ long + lat, resType = "normalized")
plot(semivario, smooth = TRUE)
I want to create a model with a non-normal distribution, which I can't do with nlme, so I have tried glmer and glmmPQL. I have turned 'mag' into a binomial variable and then tried to reapply the Variogram function to those models.
quakes$thresh <- ifelse(quakes$mag > 5, 0, 1)  # compare numerically, not as strings
library(MASS)
model2 <- glmmPQL(as.factor(thresh) ~ depth, random = ~ 1 | stations,
                  family = binomial, data = quakes)
summary(model2)
semivario <- Variogram(model2, form = ~ long + lat, resType = "normalized")
plot(semivario, smooth = TRUE)
library(lme4)
model3 <- glmer(as.factor(thresh) ~ depth + (1 | stations), data = quakes,
                family = binomial)
summary(model3)
semivario <- Variogram(model3, form = ~ long + lat, resType = "normalized")
plot(semivario, smooth = TRUE)
Neither of these appears to work for plotting the variogram: the glmmPQL fit complains that lat and long aren't found, and the glmer fit complains that no distance is specified.
How can I code a plot of the semivariogram for these models? Is the Variogram function from the nlme package unusable for them? And if so, what alternatives can I use?
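One workaround, sketched below (not from the original thread, and assuming Pearson residuals are an acceptable choice here): compute an empirical semivariogram of the model residuals directly with the gstat package, which does not care what class of model produced them.
# sketch: empirical semivariogram of glmer residuals via gstat
library(gstat)
library(sp)
quakes$resid <- residuals(model3, type = "pearson")  # residuals from the glmer fit
coordinates(quakes) <- ~ long + lat                  # promote to SpatialPointsDataFrame
semivario <- variogram(resid ~ 1, quakes)            # empirical semivariogram
plot(semivario)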

Is it possible to update a BSTS model?

I am working with time-series data which updates daily. Therefore, I thought that the Bayesian framework would fit perfectly because, theoretically, it is possible to update the model as new data comes in. Thus, I proceeded to model my data with the bsts package in R, which gives me quite promising results. Now, I have two questions:
1 - How can I save my current model and then update it with the new data?
Alternatively,
2 - Is it possible to extract the coefficient distribution so that I can plug it in as a prior of a new model based on new data?
Here is a minimal example:
library(bsts)
library(magrittr)
# Use data included in the bsts package for a reproducible example
data(iclaims)
plot(initial.claims)
# Let's assume I only have the data until 2011
t0 <- window(initial.claims, end = "2011-01-01")
t0
# Model data
ss <- AddLocalLinearTrend(list(), t0$iclaimsNSA)
ss <- AddSeasonal(ss, t0$iclaimsNSA, nseasons = 52)
model1 <- bsts(iclaimsNSA ~ .,
               state.specification = ss,
               data = t0,
               niter = 1000)
predict(model1, horizon = 30) %>%
  plot(.)
# What I would like to do next
saveRDS(model1, file = "model_t0")
# Next month/year
model_t0 <- readRDS("model_t0")  # readRDS pairs with saveRDS; load() is for save()
NEW_DATA <- window(initial.claims, start = "2011-01-01")
# Option 1, get the priors manually
priors <- get_priors(model_t0)  # imaginary function of what I want to do
updated_model <- bsts(iclaimsNSA ~ .,
                      state.specification = ss,
                      data = NEW_DATA,
                      niter = 1000,
                      prior = priors)
# Option 2, update the model directly
updated_model <- bsts(iclaimsNSA ~ .,
                      state.specification = ss,
                      data = NEW_DATA,
                      niter = 1000,
                      old_model = model_t0)
I have seen this kind of approach with the brms package, but unfortunately, that package cannot deal with time-series related data.
Kind regards
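As far as I know, bsts does not expose a built-in warm-start or update mechanism, so a common fallback (a sketch, not from the original thread) is simply to refit on the full series each time new data arrives, reusing the same state-specification recipe:
# sketch: refit on the combined series up to a later cutoff
model_t0 <- readRDS("model_t0")                    # previous fit, kept for comparison
t1 <- window(initial.claims, end = "2012-01-01")   # hypothetical later cutoff
ss1 <- AddLocalLinearTrend(list(), t1$iclaimsNSA)
ss1 <- AddSeasonal(ss1, t1$iclaimsNSA, nseasons = 52)
model_t1 <- bsts(iclaimsNSA ~ .,
                 state.specification = ss1,
                 data = t1,
                 niter = 1000)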

Caret returns different predictions with caret train object than it does with the extracted final model

I prefer to use caret when fitting models because of its relative speed and preprocessing capabilities. However, I'm slightly confused about how it makes predictions. When comparing predictions made directly from the train object with predictions made from the extracted final model, I'm seeing very different numbers. The predictions from the train object appear to be more accurate.
library(caret)
library(ranger)
x1 <- rnorm(100)
x2 <- rbeta(100, 1, 1)
y <- 2*x1 + x2 + 5*x1*x2
data <- data.frame(x1, x2, y)
fitRanger <- train(y ~ x1 + x2, data = data,
                   method = 'ranger',
                   tuneLength = 1,
                   preProcess = c('knnImpute', 'center', 'scale'))
predict.data <- data.frame(x1 = rnorm(10), x2 = rbeta(10, 1, 1))
prediction1 <- predict(fitRanger, newdata = predict.data)
prediction2 <- predict(fitRanger$finalModel, data = predict.data)$prediction
results <- data.frame(prediction1, prediction2)
results
I'm positive it has something to do with how I preprocess the data in the train object, but even when I preprocess the test data and use the ranger model to make predictions, the values are different:
predict.data.processed <- predict.data %>%
  preProcess(method = c('knnImpute',
                        'center',
                        'scale')) %>%
  .$data
results3 <- predict(fitRanger$finalModel, data = predict.data.processed)$prediction
results <- cbind(results, results3)
results
I want to extract the predictions from each individual tree in the ranger model, which I can't do in caret. Any thoughts?
In order to get the same predictions from the final model as from the caret train object, you should pre-process the data in the same way. Using your example with set.seed(1):
caret predict:
prediction1 <- predict(fitRanger,
                       newdata = predict.data)
ranger predict on the final model, with caret's pre-processing applied to predict.data first:
prediction2 <- predict(fitRanger$finalModel,
                       data = predict(fitRanger$preProcess,
                                      predict.data))$prediction
all.equal(prediction1,
          prediction2)
#output
TRUE
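On the other part of the question, extracting per-tree predictions: ranger's predict() method has a predict.all argument that returns one column per tree, so you can get this from the extracted final model; a sketch, applying the same stored pre-processing first:
# sketch: per-tree predictions from the underlying ranger model
tree_preds <- predict(fitRanger$finalModel,
                      data = predict(fitRanger$preProcess, predict.data),
                      predict.all = TRUE)$predictions
dim(tree_preds)  # one row per observation, one column per tree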

Pass model formula as argument in R

I need to cross-validate several glmer models on the same data so I've made a function to do this (I'm not interested in preexisting functions for doing this). I want to pass an arbitrary glmer model to my function as the only argument. Sadly, I can't figure out how to do this, and the interwebz won't tell me.
Ideally, I would like to do something like:
model = glmer(y ~ x + (1|z), data = train_folds, family = "binomial")
model2 = glmer(y ~ x2 + (1|z), data = train_folds, family = "binomial")
And then call cross_validation_function(model) and cross_validation_function(model2). The training data within the function is called train_folds.
However, I suspect I need to pass the model formula in a different way, perhaps using reformulate.
Here is an example of my function. The project is about predicting autism (ASD) from behavioral features. The data variable is da.
library(pacman)
p_load(tidyverse, stringr, lmerTest, MuMIn, psych, corrgram, ModelMetrics,
       caret, boot)
cross_validation_function <- function(model){
  # creating folds
  participants = unique(da$participant)
  folds <- createFolds(participants, 10)
  cross_val <- sapply(seq_along(folds), function(x) {
    train_folds = filter(da, !(as.numeric(participant) %in% folds[[x]]))
    predict_fold = filter(da, as.numeric(participant) %in% folds[[x]])
    # model to be tested should be passed as an argument here
    train_model <- model
    predict_fold <- predict_fold %>%
      mutate(predictions_perc = predict(train_model, predict_fold, allow.new.levels = T),
             predictions_perc = inv.logit(predictions_perc),
             predictions = ifelse(predictions_perc > 0.5, "ASD", "control"))
    conf_mat <- caret::confusionMatrix(data = predict_fold$predictions,
                                       reference = predict_fold$diagnosis,
                                       positive = "ASD")
    accuracy <- conf_mat$overall[1]
    sensitivity <- conf_mat$byClass[1]
    specificity <- conf_mat$byClass[2]
    fixed_ef <- fixef(train_model)
    output <- c(accuracy, sensitivity, specificity, fixed_ef)
  })
  cross_df <- t(cross_val)
  return(cross_df)
}
Solution developed from the comment: using as.formula(), strings can be converted into a formula, which can be passed as an argument to my function in the following way:
cross_validation_function <- function(model_formula){
  ...
  train_model <- glmer(model_formula, data = da, family = "binomial")
  ...
}
formula <- as.formula("y ~ x + (1|z)")
cross_validation_function(formula)
If your aim is to extract the model formula from a fitted model, then you can use attributes(model)$call[[2]]. You can then use this formula when fitting the model on the CV folds.
mod_formula <- attributes(model)$call[[2]]
train_model = glmer(mod_formula, data = train_data,
                    family = "binomial")

SARIMAX model in R

I would like to fit a SARIMAX model with temperature as an exogenous variable in R. Can I do that with the xreg argument present in the TSA package?
I thought of fitting the model as:
fit1 = arima(x, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), xreg = temp)
Is that correct, or do I have to use another R function? If it isn't correct, which steps should I follow?
Thanks.
Check out the forecast package, it's great:
# some random data
x <- ts(rnorm(120, 0, 3) + 1:120 + 20*sin(2*pi*(1:120)/12), frequency = 12)
temp <- rnorm(length(x), 20, 30)
require(forecast)
# build the model (check ?auto.arima); recent forecast versions expect a numeric matrix for xreg
model <- auto.arima(x, xreg = as.matrix(data.frame(temp = temp)))
# some random predictors
temp.reg <- as.matrix(data.frame(temp = rnorm(10, 20, 30)))
# forecasting
forec <- forecast(model, xreg = temp.reg)
# quick way to visualize things
plot(forec)
# model diagnosis
tsdiag(model)
# model info
summary(forec)
I wouldn't suggest using auto.arima(). Depending on the model you want to fit, it may return poor results; for example, when working with some complex SARIMA models, the difference between models specified manually and those chosen by auto.arima() was noticeable: auto.arima() did not even return white-noise innovations (as would be expected), while the manual fits, of course, did.
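If you would rather specify the orders yourself, forecast::Arima() takes the same seasonal and xreg arguments; a minimal sketch, reusing x and temp from above (the orders here are illustrative placeholders, not a recommendation):
# sketch: manually specified seasonal ARIMA with an exogenous regressor
manual_model <- Arima(x, order = c(1, 0, 0),
                      seasonal = list(order = c(0, 1, 1), period = 12),
                      xreg = as.matrix(data.frame(temp = temp)))
checkresiduals(manual_model)  # residuals should resemble white noise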
