I am trying to do a cross validation using the cvts function from the forecastHybrid package using an "an" model (ARIMA + NNETAR) with external regressors.
I have two variables with 100 observations each: Y and X.
Note that:
length(Y) == length(X)
TRUE
I did this:
crossv = cvts(Y,
FUN=hybridModel, models="an", a.args=list(xreg=X), n.args=list(xreg=X),
rolling = TRUE, windowSize = 84, maxHorizon = 1, horizonAverage = FALSE)
and got this error
Error in { :
task 1 failed - "variable lengths differ (found for 'xregg')"
If I try to pass it as a function instead:
CUSTOM=function(x){hybridModel(x, models="an", a.args=list(xreg=X),n.args=list(xreg=X))}
crossv2 = cvts(Y,
FUN=CUSTOM,
rolling = TRUE,
windowSize = 84,
maxHorizon = 1,
horizonAverage = FALSE)
I get:
Error in { : task 1 failed - "object 'X' not found"
I can, of course, do the cross-validation separately for nnetar and arima and then average the cross-validated forecasts, but any ideas why it's not working via cvts + hybridModel?
Thanks a lot.
I finally found the answer. We need to add another xreg argument to the function passed as FUN. Then a second xreg argument (the one belonging to cvts itself) is written outside that function. The values of the second xreg are then passed to the first xreg.
y = ts(rnorm(100), start = c(1999,1), frequency = 12)
x = ts(rnorm(100), start = c(1999,1), frequency = 12)
cv = cvts(y, FUN = function(z, xreg = xreg) {
forecastHybrid::hybridModel(z, models = "an")},
xreg = as.matrix(x), rolling = TRUE, windowSize = 84,
maxHorizon = 1, horizonAverage = FALSE)
It worked!
accuracy(cv)
                            ME      RMSE       MAE
Forecast Horizon 1  -0.2173456 0.9328878 0.7368945
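The first error is consistent with a 100-row regressor being handed to a model that is fitted on an 84-observation training window. Below is a minimal base R sketch of what a rolling cross-validation has to do with the regressor: subset it to the same rows as the training window, and supply the next row as the future regressor. It assumes a plain `stats::arima` fit with a hypothetical AR(1) order in place of `hybridModel`, and simulated data; it is not how `cvts` is implemented internally.

```r
set.seed(42)
n <- 100
window_size <- 84
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # simulated data, for illustration only

# One-step-ahead forecast origins: the training window ends at 84, 85, ..., 99.
origins <- window_size:(n - 1)
errors <- numeric(length(origins))

for (k in seq_along(origins)) {
  end <- origins[k]
  idx <- (end - window_size + 1):end          # rows of the current window
  # The regressor is windowed with the SAME index as the response.
  fit <- arima(y[idx], order = c(1, 0, 0), xreg = x[idx], method = "ML")
  # The future value of the regressor is needed to forecast one step ahead.
  fc <- predict(fit, n.ahead = 1, newxreg = x[end + 1])
  errors[k] <- y[end + 1] - fc$pred[1]
}

rmse <- sqrt(mean(errors^2))
```

Passing `xreg` to `cvts` itself, as in the answer, lets the package do this windowing instead of the model function seeing the full-length regressor.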
Related
I want to do an impulse response analysis using a Vector Error Correction Model, but I got the error message shown below.
library(urca)
library(vars)
log_mort = as.data.frame(log1p(morts)) # morts is a vector of 5 values
log_mort
colnames(log_mort) = c("IandP","Cancer","Respiratory","Circulatory","Covid")
jo.mort <- ca.jo(log_mort,ecdet = "none",type = "eigen",K =2 ,spec = "transitory")
summary(jo.mort)
vec2var.mort = vec2var(jo.mort, r = 2)
irf(vec2var.mort,ortho = TRUE, cumulative = FALSE, boot = TRUE, ci = 0.95,
runs = 100, seed = NULL)
Error message:
Error in chol.default(sigma.u) : the leading minor of order 5 is not positive definite
Please explain what happened and how to solve it.
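With `ortho = TRUE`, the impulse responses rely on a Cholesky factor of the residual covariance matrix, and `chol()` fails when that matrix is not positive definite. One possible cause, assuming the data really contain only a handful of rows for five series, is that there are too few observations to estimate a full-rank covariance. A base R sketch with simulated residuals (the numbers are made up and are not taken from `morts`):

```r
set.seed(1)
n_series <- 5

# Covariance matrix of centred residuals with `n_obs` rows and 5 columns.
resid_cov <- function(n_obs) {
  u <- scale(matrix(rnorm(n_obs * n_series), nrow = n_obs), scale = FALSE)
  crossprod(u) / n_obs
}

sigma_small <- resid_cov(3)    # fewer rows than series: rank is at most 2
sigma_large <- resid_cov(200)  # plenty of rows: full rank

rank_small <- qr(sigma_small)$rank
rank_large <- qr(sigma_large)$rank

# chol() needs a positive definite matrix; capture the outcome instead of stopping.
chol_small <- tryCatch(chol(sigma_small), error = function(e) conditionMessage(e))
chol_large <- chol(sigma_large)
```

If the rank of the residual covariance is below the number of series, more observations (or fewer series) are probably needed before orthogonalised impulse responses can be computed.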
I've been trying to run a simulation using the GillespieSSA package in R. My model has 3 state variables, 2 of which have initial values of 1, and the last of which, I want to initialise at a time step after the beginning of the simulation.
x0 <- c(Pi = 1, Pj = 0, I = 1)
When running the same simulation using deSolve, someone on here suggested I initialise the last state variable by using an event, and that worked great!
eventdat <- data.frame(var = "Pj", time = 10, value = 1, method = "rep")
results <- lsoda(N0, TT, Model, p = parms, events=list(data=eventdat), verbose = TRUE)
I've tried implementing the same strategy within the ssa() function, with no success.
# method = "OTL"; optimised tau leap
TestOutput2 <- ssa(x0, a, nu, parms1, TT, method = "OTL", simName, events=list(data=eventdat),
verbose = TRUE,
consoleInterval = 0,
censusInterval = 0.1,
maxWallTime = 30,
ignoreNegativeState = TRUE)
...
Error in ssa(x0, a, nu, parms1, TT, method, simName, events = list(data = eventdat), :
unused argument (events = list(data = eventdat))
I suspect that events aren't recognised by the ssa() function because of how ssa() implements tau-leaping? That said, I'm wondering if anyone has any ideas about how to initialise one state variable after the beginning of a simulation using the ssa() function?
Any help would be greatly appreciated!!
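One way to get the effect of an event without event support is to split the run in two: simulate up to the event time, change the state by hand, then continue from the modified state. The sketch below illustrates the idea with a small hand-rolled direct-method simulator in base R. The simulator, the reactions and the rate constants are all hypothetical and are not part of GillespieSSA.

```r
set.seed(123)

# Hand-rolled Gillespie direct method (NOT the GillespieSSA implementation).
# x: named state vector; rates: function(x) giving one propensity per reaction;
# nu: state-change matrix (rows = state variables, columns = reactions).
simulate_direct <- function(x, rates, nu, t_start, t_end) {
  t <- t_start
  times <- t
  states <- list(x)
  repeat {
    a <- rates(x)
    if (sum(a) <= 0) break                    # nothing can fire any more
    t <- t + rexp(1, rate = sum(a))           # waiting time to the next event
    if (t > t_end) break
    j <- sample.int(length(a), 1, prob = a)   # which reaction fires
    x <- x + nu[, j]
    times <- c(times, t)
    states[[length(states) + 1]] <- x
  }
  list(time = times, state = do.call(rbind, states), final = x)
}

# Hypothetical model: Pi and Pj each have a birth and a death reaction; I is constant.
x0 <- c(Pi = 1, Pj = 0, I = 1)
rates <- function(x) {
  unname(c(0.3 * x["Pi"], 0.1 * x["Pi"], 0.3 * x["Pj"], 0.1 * x["Pj"]))
}
nu <- matrix(c( 1,  0, 0,    # Pi birth
               -1,  0, 0,    # Pi death
                0,  1, 0,    # Pj birth
                0, -1, 0),   # Pj death
             nrow = 3, dimnames = list(names(x0), NULL))

# Stage 1: Pj is absent, so its reactions have zero propensity.
stage1 <- simulate_direct(x0, rates, nu, t_start = 0, t_end = 10)

# The "event": introduce Pj at t = 10, then continue from the modified state.
x_mid <- stage1$final
x_mid["Pj"] <- 1
stage2 <- simulate_direct(x_mid, rates, nu, t_start = 10, t_end = 20)
```

With `ssa()` the analogous approach would presumably be two calls, the second starting from the last state of the first with `Pj` set to 1; how that last state is read from the returned object should be checked against the installed package version.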
I am currently using the R package ParBayesianOptimization to tune parameters for ML methods. While searching for an optimal cost parameter for the svmLinear2 model (contained in caret), the optimization stopped with a sudden error after successfully completing 15 iterations.
Here is the error traceback:
Error in rbindlist(l, use.names, fill, idcol) :
Item 2 has 9 columns, inconsistent with item 1 which has 10 columns. To fill missing columns use fill=TRUE.
7. rbindlist(l, use.names, fill, idcol)
6. rbind(deparse.level, ...)
5. rbind(scoreSummary, data.table(Epoch = rep(Epoch, nrow(NewResults)),
   Iteration = 1:nrow(NewResults) + nrow(scoreSummary), inBounds = rep(TRUE,
   nrow(NewResults)), NewResults))
4. addIterations(optObj, otherHalting = otherHalting, iters.n = iters.n,
   iters.k = iters.k, parallel = parallel, plotProgress = plotProgress,
   errorHandling = errorHandling, saveFile = saveFile, verbose = verbose,
   ...)
3. ParBayesianOptimization::bayesOpt(FUN = ...
So somehow the data tables storing the summary information each iteration suddenly differ in the number of columns present. Is this a common bug with the ParBayesianOptimization package? Has anyone else encountered a similar problem? Did you find a fix - other than rewriting the addIterations function to fill the missing columns?
EDIT: I don't have an explanation for why the error suddenly occurs after a number of successful iterations. However, the issue has recurred with svmLinear and svmRadial. I was able to reproduce a similar case with the same error on the iris dataset:
library(data.table)
library(caret)
library(ParBayesianOptimization)
set.seed(1234)
bayes.opt.bounds = list()
bayes.opt.bounds[["svmRadial"]] = list(C = c(0,1000),
sigma = c(0,500))
svmRadScore = function(...){
grid = data.frame(...)
mod = caret::train(Species~., data=iris, method = "svmRadial",
trControl = trainControl(method = "repeatedcv",
number = 7, repeats = 5),
tuneGrid = grid)
return(list(Score = caret::getTrainPerf(mod)[, "TrainAccuracy"], Pred = 0))
}
bayes.create.grid.par = function(bounds, n = 10){
grid = data.table()
params = names(bounds)
grid[, c(params) := lapply(bounds, FUN = function(minMax){
return(runif(n, minMax[1], minMax[2]))}
)]
return(grid)
}
prior.grid.rad = bayes.create.grid.par(bayes.opt.bounds[["svmRadial"]])
svmRadOpt = ParBayesianOptimization::bayesOpt(FUN = svmRadScore,
bounds = bayes.opt.bounds[["svmRadial"]],
initGrid = prior.grid.rad,
iters.n = 100,
acq = "ucb", kappa = 1, parallel = FALSE,plotProgress = TRUE)
Using this example, the error occurred on the 9th epoch.
Thanks!
It appears that the scoring function returned NAs in place of accuracy measures leading to the error later downstream. This has been described by the library's creator at
https://github.com/AnotherSamWilson/ParBayesianOptimization/issues/33.
It looks like the SVM is trying a cost of 0 during the 9th iteration. Given the problem statement the SVM is solving, the cost parameter should probably be positive.
According to AnotherSamWilson, this error may commonly occur when the scoring function "returns something unexpected".
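A minimal sketch of one way to guard against this, assuming it is acceptable to replace a missing score with a penalty value: wrap the scoring function so it always returns a single finite `Score`, and keep the lower bounds strictly positive. The helper name `make_safe_scorer`, the `fallback` value and the toy scorer are hypothetical; the toy scorer only stands in for the caret-based one so that the example runs in base R.

```r
# Wrap a scoring function so that it always returns one finite Score.
# `fallback` is a hypothetical penalty value; choose one below any real score.
make_safe_scorer <- function(score_fun, fallback = 0) {
  function(...) {
    result <- tryCatch(score_fun(...), error = function(e) list(Score = NA_real_))
    score <- result$Score
    if (length(score) != 1 || !is.finite(score)) score <- fallback
    list(Score = score, Pred = 0)
  }
}

# Toy scorer standing in for the caret-based one: it returns NA when C is 0.
toy_score <- function(C, sigma) {
  list(Score = if (C <= 0) NA_real_ else 1 / (1 + abs(C - 10) + sigma), Pred = 0)
}

safe_score <- make_safe_scorer(toy_score)
bad  <- safe_score(C = 0,  sigma = 1)   # NA is replaced by the fallback
good <- safe_score(C = 10, sigma = 0)   # a valid score passes through unchanged

# Bounds that keep both parameters strictly positive (lower limits are assumptions).
positive_bounds <- list(C = c(1e-3, 1000), sigma = c(1e-3, 500))
```

The wrapped function and the positive bounds would then take the place of `svmRadScore` and `bayes.opt.bounds[["svmRadial"]]` in the `bayesOpt` call.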
I am making a forecasting model for multidimensional data that uses mean and naive methods to forecast dimensions with a small number of observations.
I am saving all results into a data frame. When I try to do that with the snaive model, I get an error:
Error in { : task 1 failed - "number of items to replace is not a
multiple of replacement length"
This is the part of code that is failing:
if(length(timeseries) < 54){
fc.resutl <- meanf(timeseries, h = 20, level = c(80, 95))
} else fc.result <- snaive(timeseries, h = 20, level = c(80, 95))
fc.result <- as.data.frame(fc.result)
loop.output <- rbind(loop.output, fc.result)
I tried to print results from meanf and snaive functions and both seem to be in same format:
Point Forecast Lo80 Hi80 Lo95 Hi95
If I change both to meanf, it works fine, so only snaive is returning an error. Any idea what could be the problem?
I checked the execution of the code line by line and found that the error is indeed in snaive(). The error traceback is:
9. .cbind.ts(list(e1, e2), c(deparse(substitute(e1))[1L],
   deparse(substitute(e2))[1L]),union = FALSE)
8. Ops.ts(r, tsLag(r, -lag))
7. diff.ts(y, lag = lag)
6. diff(y, lag = lag)
5. is.data.frame(x)
4. var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm)
3. sd(diff(y, lag = lag), na.rm = TRUE)
2. lagwalk(x, lag = frequency(x), h = h, drift = FALSE, level = level,
   fan = fan, lambda = lambda, biasadj = biasadj)
1. snaive(timeseries, h = 20, level = c(80, 95))
Sounds like you figured it out but you also have a spelling error in your code on line 2, "fc.resutl" should be "fc.result".
In the following example, I am trying to use Holt-Winters smoothing on daily data, but I run into a couple of issues:
# generate some dummy daily data
mData = cbind(seq.Date(from = as.Date('2011-12-01'),
to = as.Date('2013-11-30'), by = 'day'), rnorm(731))
# convert to a zoo object
zooData = as.zoo(mData[, 2, drop = FALSE],
order.by = as.Date(mData[, 1, drop = FALSE], format = '%Y-%m-%d'),
frequency = 7)
# attempt Holt-Winters smoothing
hw(x = zooData, h = 10, seasonal = 'additive', damped = FALSE,
initial = 'optimal', exponential = FALSE, fan = FALSE)
# no missing values in the data
sum(is.na(zooData))
This leads to the following error:
Error in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma,
damped = damped, : You've got to be joking. I need more data! In
addition: Warning message: In ets(x, "AAA", alpha = alpha, beta =
beta, gamma = gamma, damped = damped, : Missing values encountered.
Using longest contiguous portion of time series
Emphasis mine.
Couple of questions:
1. Where are the missing values coming from?
2. I am assuming that the "need more data" arises from attempting to estimate 365 seasonal parameters?
Update 1:
Based on Gabor's suggestion, I have recreated a fractional index for the data where whole numbers are weeks.
I have a couple of questions.
1. Is this an appropriate way of handling daily data when the periodicity is assumed to be weekly?
2. Is there a more elegant way of handling the dates when working with daily data?
library(zoo)
library(forecast)
# generate some dummy daily data
mData = cbind(seq.Date(from = as.Date('2011-12-01'),
to = as.Date('2013-11-30'), by = 'day'), rnorm(731))
# convert to a zoo object with weekly frequency
zooDataWeekly = as.zoo(mData[, 2, drop = FALSE],
order.by = seq(from = 0, by = 1/7, length.out = 731))
# attempt Holt-Winters smoothing
hwData = hw(x = zooDataWeekly, h = 10, seasonal = 'additive', damped = FALSE,
initial = 'optimal', exponential = FALSE, fan = FALSE)
plot(zooDataWeekly, col = 'red')
lines(fitted(hwData))
hw requires a ts object not a zoo object. Use
zooDataWeekly <- ts(mData[,2], frequency=7)
Unless there is a good reason for specifying the model exactly, it is usually better to let R select the best model for you:
fit <- ets(zooDataWeekly)
fc <- forecast(fit)
plot(fc)
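A self-contained sketch of the same idea, assuming daily data with a weekly season: build a `ts` with `frequency = 7` and fit an additive Holt-Winters model. To run without extra packages it uses base R's `stats::HoltWinters` as a stand-in for `forecast::hw`, and the smoothing parameters are fixed at arbitrary values (an assumption made only to skip the numerical optimisation).

```r
set.seed(2011)
dates <- seq.Date(from = as.Date("2011-12-01"), to = as.Date("2013-11-30"), by = "day")
values <- rnorm(length(dates))   # dummy daily data, as in the question

# Daily observations with a weekly season: one cycle = 7 observations.
daily_ts <- ts(values, frequency = 7)

# Base R Holt-Winters with fixed, arbitrary smoothing parameters
# (stand-in for forecast::hw; the parameter values are assumptions).
fit <- HoltWinters(daily_ts, alpha = 0.2, beta = 0.1, gamma = 0.1,
                   seasonal = "additive")

# Ten-step-ahead point forecasts.
fc <- predict(fit, n.ahead = 10)
```

The time index of `daily_ts` is in weeks, so the calendar dates have to be kept separately (here in `dates`) if they are needed for plotting.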