ParBayesianOptimization suddenly fails while logging epoch results - r

I am currently using the R package ParBayesianOptimization to tune parameters for ML methods. While searching for an optimal cost parameter for the svmLinear2 model (contained in caret), the optimization stopped with a sudden error after successfully completing 15 iterations.
Here is the error traceback:
Error in rbindlist(l, use.names, fill, idcol) :
Item 2 has 9 columns, inconsistent with item 1 which has 10 columns. To fill missing columns use fill=TRUE.
7.
rbindlist(l, use.names, fill, idcol)
6.
rbind(deparse.level, ...)
5.
rbind(scoreSummary, data.table(Epoch = rep(Epoch, nrow(NewResults)),
Iteration = 1:nrow(NewResults) + nrow(scoreSummary), inBounds = rep(TRUE,
nrow(NewResults)), NewResults))
4.
addIterations(optObj, otherHalting = otherHalting, iters.n = iters.n,
iters.k = iters.k, parallel = parallel, plotProgress = plotProgress,
errorHandling = errorHandling, saveFile = saveFile, verbose = verbose,
...)
3.
ParBayesianOptimization::bayesOpt(FUN = ...
So somehow the data tables storing the summary information each iteration suddenly differ in the number of columns present. Is this a common bug with the ParBayesianOptimization package? Has anyone else encountered a similar problem? Did you find a fix - other than rewriting the addIterations function to fill the missing columns?
EDIT: I don't have an explanation for why the error suddenly occurs after a number of successful iterations. However, the issue has recurred with svmLinear and svmRadial. I was able to reconstruct a similar case with the same error on the iris dataset:
library(data.table)
library(caret)
library(ParBayesianOptimization)
set.seed(1234)
bayes.opt.bounds = list()
bayes.opt.bounds[["svmRadial"]] = list(C = c(0,1000),
sigma = c(0,500))
svmRadScore = function(...){
grid = data.frame(...)
mod = caret::train(Species~., data=iris, method = "svmRadial",
trControl = trainControl(method = "repeatedcv",
number = 7, repeats = 5),
tuneGrid = grid)
return(list(Score = caret::getTrainPerf(mod)[, "TrainAccuracy"], Pred = 0))
}
bayes.create.grid.par = function(bounds, n = 10){
grid = data.table()
params = names(bounds)
grid[, c(params) := lapply(bounds, FUN = function(minMax){
return(runif(n, minMax[1], minMax[2]))}
)]
return(grid)
}
prior.grid.rad = bayes.create.grid.par(bayes.opt.bounds[["svmRadial"]])
svmRadOpt = ParBayesianOptimization::bayesOpt(FUN = svmRadScore,
bounds = bayes.opt.bounds[["svmRadial"]],
initGrid = prior.grid.rad,
iters.n = 100,
acq = "ucb", kappa = 1, parallel = FALSE,plotProgress = TRUE)
Using this example, the error occurred on the 9th epoch.
Thanks!

It appears that the scoring function returned NAs in place of accuracy measures, which led to the error further downstream. The library's creator has described this at
https://github.com/AnotherSamWilson/ParBayesianOptimization/issues/33.
It looks like the SVM is trying a cost of 0 during the 9th iteration. Given the problem the SVM is solving, the cost parameter should probably be strictly positive, so the lower bound for C should be greater than 0.
According to AnotherSamWilson, this error may commonly occur when the scoring function "returns something unexpected".
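One way to keep an unexpected score from reaching the optimizer is to wrap the scoring function so that an error or a non-finite result is replaced by a fixed fallback. The following is only a minimal sketch in base R: safe_score, toy_score and the fallback value of 0 are hypothetical names and choices made for illustration, and toy_score merely stands in for the caret-based scorer from the question.

```r
# Hypothetical helper: wrap a scoring function so that an error or a
# non-finite score is replaced by a fixed fallback value.
safe_score <- function(score_fun, fallback = 0) {
  function(...) {
    score <- tryCatch(score_fun(...)$Score, error = function(e) NA_real_)
    if (length(score) != 1 || !is.finite(score)) score <- fallback
    list(Score = score)
  }
}

# Toy scorer standing in for the caret-based one (assumption):
# it returns NA whenever the cost C is 0.
toy_score <- function(C, sigma) {
  list(Score = if (C > 0) 1 / (1 + sigma) else NA_real_)
}

guarded <- safe_score(toy_score)
guarded(C = 0, sigma = 1)$Score  # the fallback, not NA
guarded(C = 2, sigma = 1)$Score  # the ordinary score
```

Whether a fallback of 0 is sensible depends on the metric being maximized. Raising the lower bound of C above 0 removes the cause rather than masking it.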

Related

Parallel processing in R : Error in checkForRemoteErrors(val) : 7 nodes produced errors; first error: unused argument (MARGIN = base::quote(2))

I am trying to do parallel computing using the apply family of functions with the following lines of code.
The objective is to fit each column of the simul_ matrix to my specification, and when I check
dim((simul_[,1]))
I get "NULL" which causes the problem with the apply function.
The complete code is the following:
## Library
lib_vec = c("MSGARCH", "matrixStats", "parallel")
invisible(lapply(lib_vec, f_install_load_lib))
## seed
set.seed(1234)
## MSGARCH model specification from the MSGARCH package
MSGARCH_spec <- CreateSpec(variance.spec = list(model = c("sGARCH", "sGARCH")),
distribution.spec = list(distribution = c("norm",
"norm")),
switch.spec = list(do.mix = FALSE, K = NULL),
constraint.spec = list(fixed = list(),
regime.const = NULL),
prior = list(mean = list(), sd = list()))
## MSGARCH fitting: the sp500_logrets are just log-returns.
MSgarch_fit <- FitML(data = sp500_logrets, spec = MSGARCH_spec)
## Simulating MSGARCH log-returns
nsim <- 100 # number of simulations
nahead <- 1000 # size of each simulation
MS_simul <- simulate(MSgarch_fit, nsim = nsim, nahead = nahead, n.start = 500,
nburn = 100)
simul_ <- MS_simul$draw # retrieving the simulated data
## Parallel computing settings
n_cores <- detectCores()
cl <- makeCluster(n_cores[1] - 1)
## Fitting each simulation by parallel computing with the apply functions
fitt_ <- parSapply(cl, X = simul_, MARGIN = 2, FUN = FitML, spec = MSGARCH_spec)
stopCluster(cl)
The error I get is :
7 nodes produced errors; first error: unused argument (MARGIN = base::quote(2))
I think that
I am quite lost and would very much appreciate any help :)
The error is quite explicit:
7 nodes produced errors; first error: unused argument (MARGIN = base::quote(2))
There is no MARGIN parameter in parSapply() or sapply().
You may be confusing them with apply(), which does take MARGIN.
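The difference can be seen on a small matrix. This sketch uses only base R and the parallel package. The matrix m is a toy stand-in for simul_, and sum stands in for the model-fitting function; both are assumptions made for illustration. parApply() is the parallel counterpart of apply() and does accept a MARGIN argument.

```r
library(parallel)

# Toy matrix standing in for simul_: 5 rows, 3 simulated series (columns).
m <- matrix(1:15, nrow = 5)

# sapply() on a matrix visits every element, not every column.
n_calls <- length(sapply(m, sum))

# apply() with MARGIN = 2 visits each column.
col_sums <- apply(m, 2, sum)

# Parallel counterpart: parApply() accepts MARGIN, parSapply() does not.
cl <- makeCluster(2)
par_col_sums <- parApply(cl, m, 2, sum)
stopCluster(cl)
```

Here n_calls is 15 (one call per element), while col_sums and par_col_sums each hold one value per column.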

How to use cvts on an hybridModel with XREG?

I am trying to do a cross validation using the cvts function from the forecastHybrid package using an "an" model (ARIMA + NNETAR) with external regressors.
I have two variables with 100 observations : Y and X
Note that:
length(Y) == length(X)
TRUE
I did this:
crossv =cvts(Y,
FUN=hybridModel, models="an", a.args=list(xreg=X), n.args=list(xreg=X),
rolling = TRUE, windowSize = 84, maxHorizon = 1, horizonAverage = FALSE)
and got this error
Error in { :
task 1 failed - "variable lengths differ (found for 'xregg')"
If I try to pass it as a function:
CUSTOM=function(x){hybridModel(x, models="an", a.args=list(xreg=X),n.args=list(xreg=X))}
crossv2 = cvts(Y,
FUN=CUSTOM,
rolling = TRUE,
windowSize = 84,
maxHorizon = 1,
horizonAverage = FALSE)
I get:
Error in { : task 1 failed - "object 'X' not found"
I can, of course, do cross-validation separately for nnetar and arima and then average the cross-validation forecasts, but any ideas why it's not working via cvts + hybridModel?
Thanks a lot.
I finally found the answer. The function passed as FUN needs its own xreg argument, and a second xreg argument is supplied to cvts itself, outside the function. The values of that second xreg are then passed to the first.
library(forecastHybrid) # provides cvts() and hybridModel()
y = ts(rnorm(100), start = c(1999,1), frequency = 12)
x = ts(rnorm(100), start = c(1999,1), frequency = 12)
cv = cvts(y, FUN = function(z, xreg = xreg) {
forecastHybrid::hybridModel(z, models = "an")},
xreg = as.matrix(x), rolling = TRUE, windowSize = 84,
maxHorizon = 1, horizonAverage = FALSE)
It worked!
accuracy(cv)
                           ME      RMSE       MAE
Forecast Horizon 1 -0.2173456 0.9328878 0.7368945

Problem with Over- and Under-Sampling with ROSE in R

I have a dataset to classify between won cases (14399) and lost cases (8677). The dataset has 912 predicting variables.
I am trying to oversample the lost cases in order to reach almost the same number as the won cases (so having 14399 cases for each of the won and lost cases).
TARGET is the column with lost (0) and won (1) cases:
table(dat_train$TARGET)
0 1
8677 14399
Now I am trying to balance them using ROSE ovun.sample
dat_train_bal <- ovun.sample(dat_train$TARGET~., data = dat_train, p=0.5, seed = 1, method = "over")
I get this error:
Error in parse(text = x, keep.source = FALSE) :
<text>:1:17538: unexpected symbol
1: PPER_409030143+BP_RESPPER_9639064007+BP_RESPPER_7459058285+BP_RESPPER_9339059882+BP_RESPPER_9339058664+BP_RESPPER_5209073603+BP_RESPPER_5209061378+CRM_CURRPH_Initiation+Quotation+CRM_CURRPH_Ne
Can anyone help?
Thanks :-)
Reproducing your code with a toy example, I found an error in your formula: dat_train$TARGET~. needs to be corrected to TARGET~. because the data argument already tells ovun.sample where to look up the columns.
dframe <- tibble::tibble(val = sample(c("a", "b"), size = 100, replace = TRUE, prob = c(.1, .9))
, xvar = rnorm(100)
)
# Use oversampling
dframe_os <- ROSE::ovun.sample(formula = val ~ ., data = dframe, p=0.5, seed = 1, method = "over")
table(dframe_os$data$val)
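Base R shows why the two spellings of the formula differ. This does not reproduce ROSE's internal parsing; it only illustrates, on a hypothetical three-column data frame d, how the dot is expanded when the response is written with and without the data frame prefix.

```r
# Hypothetical toy data: a binary TARGET and two predictors.
d <- data.frame(TARGET = c(0, 1, 0, 1),
                x1 = 1:4,
                x2 = c(2.5, 1, 4, 3))

# With d$TARGET on the left, the dot expands to every column of d,
# including TARGET itself.
labels_bad <- attr(terms(d$TARGET ~ ., data = d), "term.labels")

# With the bare column name, TARGET is excluded from the right-hand side.
labels_good <- attr(terms(TARGET ~ ., data = d), "term.labels")
```

With 912 predictors, the expanded right-hand side becomes a very long string of column names, which is consistent with the position 17538 reported in the parse error.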

R snaive() - number of items to replace is not a multiple of replacement length

I am making a forecasting model for multidimensional data that uses mean and naive methods for forecasting dimensions with a small number of observations.
I am saving all results into a data frame. When I try to do that with the snaive model, I get an error:
Error in { : task 1 failed - "number of items to replace is not a
multiple of replacement length"
This is the part of code that is failing:
if(length(timeseries) < 54){
fc.resutl <- meanf(timeseries, h = 20, level = c(80, 95))
} else fc.result <- snaive(timeseries, h = 20, level = c(80, 95))
fc.result <- as.data.frame(fc.result)
loop.output <- rbind(loop.output, fc.result)
I tried to print results from meanf and snaive functions and both seem to be in same format:
Point Forecast Lo80 Hi80 Lo95 Hi95
If I change both to meanf, it works fine, so only snaive is returning an error. Any idea what could be the problem?
I checked the execution of the code line by line and found that the error is indeed in snaive(). The error traceback is:
9.
.cbind.ts(list(e1, e2), c(deparse(substitute(e1))[1L],
deparse(substitute(e2))[1L]),union = FALSE)
8.
Ops.ts(r, tsLag(r, -lag))
7.
diff.ts(y, lag = lag)
6.
diff(y, lag = lag)
5.
is.data.frame(x)
4.
var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm)
3.
sd(diff(y, lag = lag), na.rm = TRUE)
2.
lagwalk(x, lag = frequency(x), h = h, drift = FALSE, level = level,
fan = fan, lambda = lambda, biasadj = biasadj)
1.
snaive(timeseries, h = 20, level = c(80, 95))
Sounds like you figured it out, but you also have a spelling error on line 2 of your code: "fc.resutl" should be "fc.result".
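A typo of this kind can be avoided by assigning the value of the whole if/else expression once instead of assigning inside each branch. A minimal sketch in base R, where choose_method and its threshold argument are hypothetical names used only for illustration:

```r
# Hypothetical stand-in for the branch in the question: returns the name of
# the forecasting method chosen for a series with n_obs observations.
choose_method <- function(n_obs, threshold = 54) {
  # if/else is an expression in R, so its value can be assigned once.
  method <- if (n_obs < threshold) "meanf" else "snaive"
  method
}

choose_method(30)   # short series
choose_method(120)  # long series
```

Applied to the code in the question, the same pattern would read fc.result <- if (length(timeseries) < 54) meanf(timeseries, h = 20, level = c(80, 95)) else snaive(timeseries, h = 20, level = c(80, 95)).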

R Caret's rfe [Error in { : task 1 failed - "rfe is expecting 184 importance values but only has 2"]

I am using Caret's rfe for a regression application. My data (in data.table) has 176 predictors (including 49 factor predictors). When I run the function, I get this error:
Error in { : task 1 failed - "rfe is expecting 176 importance values but only has 2"
Then, I used model.matrix( ~ . - 1, data = as.data.frame(train_model_sell_single_bid)) to convert the factor predictors to dummy variables. However, I got a similar error:
Error in { : task 1 failed - "rfe is expecting 184 importance values but only has 2"
I'm using R version 3.1.1 on Windows 7 (64-bit), Caret version 6.0-41. I also have Revolution R Enterprise version 7.3 (64-bit) installed.
But the same error was reproduced on Amazon EC2 (c3.8xlarge) Linux instance with R version 3.0.1 and Caret version 6.0-24.
Datasets used (to reproduce my error):
https://www.dropbox.com/s/utuk9bpxl2996dy/train_model_sell_single_bid.RData?dl=0
https://www.dropbox.com/s/s9xcgfit3iqjffp/train_model_bid_outcomes_sell_single.RData?dl=0
My code:
library(caret)
library(data.table)
library(bit64)
library(doMC)
load("train_model_sell_single_bid.RData")
load("train_model_bid_outcomes_sell_single.RData")
subsets <- seq(from = 4, to = 184, by= 4)
registerDoMC(cores = 32)
set.seed(1015498)
ctrl <- rfeControl(functions = lmFuncs,
method = "repeatedcv",
repeats = 1,
#saveDetails = TRUE,
verbose = FALSE)
x <- as.data.frame(train_model_sell_single_bid[,!"security_id", with=FALSE])
y <- train_model_bid_outcomes_sell_single[,bid100]
lmProfile_single_bid100 <- rfe(x, y,
sizes = subsets,
preProc = c("center", "scale"),
rfeControl = ctrl)
It seems that you might have highly correlated predictors.
Prior to feature selection you should run:
correlations = cor(x) # assumes every column of x is numeric
crrltn = findCorrelation(correlations, cutoff = .90)
if (length(crrltn) != 0)
x <- x[,-crrltn]
If the problem persists after this, it may be caused by highly correlated predictors within the automatically generated folds. You can try to control the generated folds with:
set.seed(12213)
index <- createFolds(y, k = 10, returnTrain = TRUE)
and then give these as arguments to the rfeControl function:
lmctrl <- rfeControl(functions = lmFuncs,
method = "repeatedcv",
index = index,
verbose = TRUE)
set.seed(111333)
lrprofile <- rfe(x, y,
sizes = subsets,
rfeControl = lmctrl)
If you keep having the same problem, check whether any predictors are highly correlated within each fold:
for(i in seq_along(index)){
crrltn = cor(x[index[[i]],])
findCorrelation(crrltn, cutoff = .90, names = TRUE, verbose = TRUE)
}
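The check that findCorrelation performs can be approximated in base R to see which pairs of predictors exceed the cutoff. This is only a sketch of the idea, not caret's algorithm, and x_demo is hypothetical toy data in which column b is nearly a copy of column a.

```r
# Hypothetical toy predictors: b is almost identical to a, c is unrelated.
a <- 1:10
x_demo <- data.frame(
  a = a,
  b = a + rep(c(0.1, -0.1), 5),
  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
)

# Absolute pairwise correlations; keep only the upper triangle so that
# each pair is reported once and the diagonal is ignored.
cm <- cor(x_demo)
high <- which(abs(cm) > 0.90 & upper.tri(cm), arr.ind = TRUE)

# Translate row/column positions back into variable names.
pairs <- data.frame(var1 = rownames(cm)[high[, "row"]],
                    var2 = colnames(cm)[high[, "col"]])
pairs
```

Running the same check on x[index[[i]], ] for each fold shows which pairs become problematic only within a particular resample.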

Resources