R package mlr exhausts memory with multicore

I am trying to run a reproducible example with the mlr R package in parallel, for which I found the suggestion to use parallelStartMulticore (link). The project also uses packrat.
The code runs fine on workstations and small servers, but on an HPC with the torque batch system it runs into memory exhaustion. R processes seem to be spawned ad infinitum, unlike on regular Linux machines. I have tried switching to parallelStartSocket, which works fine, but then I cannot reproduce the results with RNG seeds.
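(For context: on a plain socket cluster, reproducible draws are normally obtained with parallel::clusterSetRNGStream. The sketch below uses base parallel only, to illustrate the mechanism; it is not a parallelMap/mlr fix.)
library(parallel)
cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed = 1)   # one L'Ecuyer-CMRG stream per worker
r1 <- parSapply(cl, 1:4, function(i) rnorm(1))
clusterSetRNGStream(cl, iseed = 1)   # reseeding reproduces the same draws
r2 <- parSapply(cl, 1:4, function(i) rnorm(1))
identical(r1, r2)                    # TRUE
stopCluster(cl)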
Here is a minimal example:
library(mlr)
library(parallelMap)
M <- data.frame(x = runif(1e2), y = as.factor(rnorm(1e2) > 0))
# Example with random forest
parallelStartMulticore(parallel::detectCores())
plyr::l_ply(
  seq(100),
  function(x) {
    message("Iteration number: ", x)
    set.seed(1, "L'Ecuyer")
    tsk <- makeClassifTask(data = M, target = "y")
    num_ps <- makeParamSet(
      makeIntegerParam("ntree", lower = 10, upper = 50),
      makeIntegerParam("nodesize", lower = 1, upper = 5)
    )
    ctrl <- makeTuneControlGrid(resolution = 2L, tune.threshold = TRUE)
    # define learner
    lrn <- makeLearner("classif.randomForest", predict.type = "prob")
    rdesc <- makeResampleDesc("CV", iters = 2L, stratify = TRUE)
    # Grid search in parallel
    res <- tuneParams(
      lrn, task = tsk, resampling = rdesc, par.set = num_ps,
      measures = list(auc), control = ctrl)
    # Fit optimal params
    lrn.optim <- setHyperPars(lrn, par.vals = res$x)
    m <- train(lrn.optim, tsk)
    # Test set
    pred_rf <- predict(m, newdata = M)
    pred_rf
  }
)
parallelStop()
The hardware of the HPC is an HP Apollo 6000 System ProLiant XL230a Gen9 Server blade 64-bit, with Intel Xeon E5-2683 processors. I do not know whether the issue comes from the torque batch system, the hardware, or some flaw in the code above. The sessionInfo() on the HPC:
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /cm/shared/apps/intel/parallel_studio_xe/2017/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] parallelMap_1.3 mlr_2.11 ParamHelpers_1.10 RLinuxModules_0.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 splines_3.4.0 munsell_0.4.3
[4] colorspace_1.3-2 lattice_0.20-35 rlang_0.1.1
[7] plyr_1.8.4 tools_3.4.0 parallel_3.4.0
[10] grid_3.4.0 packrat_0.4.8-1 checkmate_1.8.2
[13] data.table_1.10.4 gtable_0.2.0 randomForest_4.6-12
[16] survival_2.41-3 lazyeval_0.2.0 tibble_1.3.1
[19] Matrix_1.2-12 ggplot2_2.2.1 stringi_1.1.5
[22] compiler_3.4.0 BBmisc_1.11 scales_0.4.1
[25] backports_1.0.5

The "multicore" parallelMap backend uses parallel::mcmapply, which should create a new fork()ed child process for every evaluation inside tuneParams and then quickly kill that process. Depending on what you use to count memory usage / active processes, memory may be mis-reported: child processes that are already dead (and were only alive for a fraction of a second) may still be shown, or the killing of finished processes may for some reason not happen.
Possible problems:
Problem 1: The batch system does not correctly track memory usage and counts the parent process's memory separately for every child. Does /usr/bin/free actually report that 30 GB are gone while the script is running? As an easier test case, consider running (in an empty R session):
xxx <- 1:1e9
parallel::mclapply(1:4, function(x) {
  Sys.sleep(60)
}, mc.cores = 4)
which should use about 4 GB of memory. If, during the 60 seconds in which the child processes sleep, the reported memory usage is about 16 GB, it is this problem.
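A way to check this from a second shell on the compute node (or from within R) while the children sleep:
system("free -g")                        # system-wide view: should show roughly 4 GB in use, not 16 GB
system("ps -o pid,ppid,rss,comm -C R")   # per-process RSS double-counts pages still shared after fork()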
Problem 2: Memory reporting is accurate, but for some reason a lot of the memory space gets modified inside the child processes (triggering many copy-on-write page copies), e.g. because of garbage collection. Does calling gc() before the tuneParams() call help?
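For example, a minimal version of that check inside the l_ply() body from the question:
gc()   # collect garbage in the parent right before tuneParams() forks its workers
res <- tuneParams(
  lrn, task = tsk, resampling = rdesc, par.set = num_ps,
  measures = list(auc), control = ctrl)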
Problem 3: Some setting on the machine prevents the "parallel" package from killing child processes. The following:
parallel::mclapply(1:4, function(x) {
  xxx <<- 1:1e9 ; NULL
}, mc.cores = 4)
Sys.sleep(60)
should grab about 16 GB of memory, but release it right away. If the memory remains used during the Sys.sleep (and the remaining R session), it might be this problem.
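(An aside, not one of the three problems above: on a batch system parallel::detectCores() reports every core on the node, not the cores torque actually granted to the job, so starting one worker per physical core can oversubscribe the allocation. A sketch of capping the worker count, assuming torque exports PBS_NUM_PPN, which is typical but site-dependent:)
ppn <- suppressWarnings(as.integer(Sys.getenv("PBS_NUM_PPN", unset = NA)))  # cores granted by torque, if exported
parallelStartMulticore(if (is.na(ppn)) parallel::detectCores() else ppn)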

Related

Parallelization of Rcpp without inline / creating a local package

I am creating a package that I hope to eventually put onto CRAN. I have coded much of the package in C++ with the help of Rcpp and now would like to enable parallelization of this C++ code. I am using the foreach package; however, I am open to switching to snow or a different library if that would work better.
I started by trying to parallelize a simple function:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;

// [[Rcpp::export]]
arma::vec rNorm_c(int length) {
  return arma::vec(length, arma::fill::randn);
}

/*** R
n_workers <- parallel::detectCores(logical = F)
cl <- parallel::makeCluster(n_workers)
doParallel::registerDoParallel(cl)
n <- 10
library(foreach)
foreach(j = rep(n, n),
        .noexport = c("rNorm_c"),
        .packages = "Rcpp") %dopar% {rNorm_c(j)}
*/
I added the .noexport argument because without it I get the error Error in { : task 1 failed - "NULL value passed as symbol address". This led me to this SO post, which suggested doing so.
However, I now receive the error Error in { : task 1 failed - "could not find function "rNorm_c"", presumably because I have not followed the top answer's instructions to load the function separately on each node. I am unsure of how to do this.
This SO post demonstrates how to do this by writing the C++ code inline; however, since the C++ code for my package spans multiple functions, this is likely not the best solution. This SO post advises creating a local package for the workers to load and call into; however, since I am hoping to make this code available in a CRAN package, a local package does not seem feasible unless I were to publish two CRAN packages.
Any suggestions for how to approach this or references to resources for parallelization of Rcpp code would be appreciated.
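(A note on the package route: once the compiled function lives in an installed package, as in the EDIT below, the usual foreach pattern is to have every worker load that package via .packages instead of exporting the function. A sketch, assuming the package is installed under the name used in the EDIT:)
library(foreach)
cl <- parallel::makeCluster(parallel::detectCores(logical = FALSE))
doParallel::registerDoParallel(cl)
out <- foreach(j = rep(10, 10), .packages = "rnormParallelization") %dopar% {
  rNorm_c(j)   # found because the package is loaded on every worker
}
parallel::stopCluster(cl)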
EDIT:
I used the above function to create a package called rnormParallelization. In this package, I also included a couple of R functions, one of which made use of the snow package to parallelize a for loop using the rNorm_c function:
rNorm_samples_for <- function(num_samples, length){
  sample_mat <- matrix(NA, length, num_samples)
  for (j in 1:num_samples){
    sample_mat[ , j] <- rNorm_c(length)
  }
  return(sample_mat)
}

rNorm_samples_snow1 <- function(num_samples, length){
  clus <- snow::makeCluster(3)
  snow::clusterExport(clus, "rNorm_c")
  out <- snow::parSapply(clus, rep(length, num_samples), rNorm_c)
  snow::stopCluster(clus)
  return(out)
}
Both functions work as expected:
> rNorm_samples_for(2, 3)
            [,1]       [,2]
[1,] -0.82040308 -0.3284849
[2,] -0.05169948  1.7402912
[3,]  0.32073516  0.5439799
> rNorm_samples_snow1(2, 3)
            [,1]       [,2]
[1,] -0.07483493  1.3028315
[2,]  1.28361663 -0.4360829
[3,]  1.09040771 -0.6469646
However, the parallelized version is considerably slower:
> microbenchmark::microbenchmark(
+   rnormParallelization::rNorm_samples_for(1e3, 1e4),
+   rnormParallelization::rNorm_samples_snow1(1e3, 1e4)
+ )
Unit: milliseconds
                                                    expr       min        lq      mean    median        uq       max neval
   rnormParallelization::rNorm_samples_for(1000, 10000)  217.0871  249.3977  320.5456  285.9787  325.3447  802.7488   100
 rnormParallelization::rNorm_samples_snow1(1000, 10000) 1242.8315 1397.7643 1527.0406 1482.5867 1563.0916 3411.5774   100
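(A possible contributor, offered as an untested sketch rather than a diagnosis: rNorm_samples_snow1 creates a cluster, exports rNorm_c, and tears the cluster down on every call, so that overhead is included in every timing. A variant that reuses a long-lived cluster would isolate the parallel computation itself:)
rNorm_samples_snow2 <- function(clus, num_samples, length){
  snow::parSapply(clus, rep(length, num_samples), rNorm_c)
}
clus <- snow::makeCluster(3)
snow::clusterExport(clus, "rNorm_c")
microbenchmark::microbenchmark(rNorm_samples_snow2(clus, 1e3, 1e4), times = 10)
snow::stopCluster(clus)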
Here is my session info:
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rnormParallelization_1.0
loaded via a namespace (and not attached):
[1] microbenchmark_1.4-7 compiler_4.1.1 snow_0.4-4
[4] parallel_4.1.1 tools_4.1.1 Rcpp_1.0.7
GitHub repo with both of these scripts

Obtaining an error when running exact code from a blog

I am following a tutorial here. A few days ago I was able to run this code without error and apply it to my own data set (although it was always a little hit and miss whether this error appeared); now, however, every time I run the code I get the same error.
Error in solve.QP(Dmat, dvec, Amat, bvec = b0, meq = 2) :
constraints are inconsistent, no solution!
I understand that the solver cannot satisfy the constraints, but I am a little confused as to why it worked previously and now does not. The author of the article has this code working.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
                         sd = sd(value)), by = variable]
er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals <- sapply(er_vals, function(er) {
  op <- portfolio.optim(as.matrix(df), er)
  return(op$ps)
})
SessionInfo:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] lpSolve_5.6.13.1 data.table_1.12.0 tseries_0.10-46 rugarch_1.4-0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 MASS_7.3-51.1 mclust_5.4.2
[4] lattice_0.20-38 quadprog_1.5-5 Rsolnp_1.16
[7] TTR_0.23-4 tools_3.5.3 xts_0.11-2
[10] SkewHyperbolic_0.4-0 GeneralizedHyperbolic_0.8-4 quantmod_0.4-13.1
[13] spd_2.0-1 grid_3.5.3 KernSmooth_2.23-15
[16] yaml_2.2.0 numDeriv_2016.8-1 Matrix_1.2-15
[19] nloptr_1.2.1 DistributionUtils_0.6-0 ks_1.11.3
[22] curl_3.3 compiler_3.5.3 expm_0.999-3
[25] truncnorm_1.0-8 mvtnorm_1.0-8 zoo_1.8-4
tseries::portfolio.optim disallows short selling by default; see the argument shorts. With shorts = FALSE, asset weights may not go below 0, and since the weights must sum to 1, no individual asset weight can exceed 1 either. There is no leverage.
(Possibly, in an earlier version of tseries the default was shorts = TRUE; that would explain why it previously worked for you.)
Your target return (pm) cannot exceed the highest return of any of the input assets.
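A quick way to see the feasible range of target returns for the data loaded above:
colMeans(df)          # per-asset expected returns
range(colMeans(df))   # feasible targets for a fully invested, long-only portfolio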
Solution 1: Allow short selling, but remember that that gives a different efficient frontier. (For reference, see any lecture or book discussing Markowitz optimization; there is an analytical solution to the problem without the short-selling restriction.)
op <- portfolio.optim(as.matrix(df), er, shorts = T)
Solution 2: Limit the target returns between the worst and the best asset's return.
er_vals <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
Here's a plot of the obtained efficient frontiers.
Here's the full script that gives both solutions.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
                         sd = sd(value)), by = variable]
# er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
er_vals1 <- seq(from = 0, to = 0.15, length.out = 1000)
er_vals2 <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals1 <- sapply(er_vals1, function(er) {
  op <- portfolio.optim(as.matrix(df), er, shorts = TRUE)
  return(op$ps)
})
sd_vals2 <- sapply(er_vals2, function(er) {
  op <- portfolio.optim(as.matrix(df), er, shorts = FALSE)
  return(op$ps)
})
plot(x = sd_vals1, y = er_vals1, type = "l", col = "red",
     xlab = "sd", ylab = "er",
     main = "red: allowing short-selling;\nblue: disallowing short-selling")
lines(x = sd_vals2, y = er_vals2, type = "l", col = "blue")

Can't use SharpeRatio in PortfolioAnalytics to optimize a portfolio

I am trying to use SharpeRatio as an objective function to optimize my portfolio, but I get the following error:
objective name SharpeRatio generated an error or warning: Error in t(w) %*% M3 : requires numeric/complex matrix/vector arguments
I've searched and it seems that the issue is related to the weights, but I can't find a way to solve it.
The following code reproduces the error:
library(PortfolioAnalytics)
data(edhec)
asset_names <- colnames(edhec)
port_spec <- portfolio.spec(asset_names)
port_spec <- add.constraint(portfolio = port_spec, type = "weight_sum", min_sum = 0.99, max_sum = 1.01)
port_spec <- add.constraint(portfolio = port_spec, type = "long_only")
port_spec <- add.objective(portfolio = port_spec, type = "return", name = "SharpeRatio", FUN = "StdDev")
opt_DE <- optimize.portfolio(R = edhec, portfolio = port_spec, optimize_method = "DEoptim", search_size=5000, trace = TRUE, traceDE = 0)
As requested, sessionInfo():
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] FactoMineR_1.39 nFactors_2.3.3 lattice_0.20-35
[4] boot_1.3-20 psych_1.7.8 MASS_7.3-47
[7] PortfolioAnalytics_1.0.3636 PerformanceAnalytics_1.4.3541 foreach_1.4.4
[10] xts_0.10-1 zoo_1.8-0
loaded via a namespace (and not attached):
[1] cluster_2.0.6 leaps_3.0 mnormt_1.5-5 scatterplot3d_0.3-40
[5] quadprog_1.5-5 ROI_0.3-0 TTR_0.23-2 tools_3.4.3
[9] quantmod_0.4-12 parallel_3.4.3 grid_3.4.3 nlme_3.1-131
[13] registry_0.5 iterators_1.0.9 yaml_2.1.16 GenSA_1.1.7
[17] codetools_0.2-15 curl_3.1 slam_0.1-42 ROI.plugin.quadprog_0.2-5
[21] compiler_3.4.3 flashClust_1.01-2 DEoptim_2.2-4 foreign_0.8-69
I would recommend checking out the PortfolioAnalytics demo files. One of them in particular, the max Sharpe Ratio demo,
https://github.com/R-Finance/PortfolioAnalytics/blob/master/demo/demo_max_Sharpe.R
will be particularly useful to reference. After reading through some of the code and comments, you will see a few things. First, you specified conflicting arguments, e.g. type = "return", name = "SharpeRatio", FUN = "StdDev".
"return" is a type of objective, "StdDev" is the name of a "risk" objective, and "SharpeRatio" is what you are trying to solve for.
If you use the "ROI" method to optimize, you need to tell optimize.portfolio to maximize the Sharpe ratio by passing maxSR = TRUE. If you want to use the "DEoptim" optimization method instead, you need to relax your leverage constraints.
Examples of each can be found below. They are directly taken from the referenced demo file above.
library(PortfolioAnalytics)
# Examples of solving optimization problems to maximize mean return per unit StdDev
data(edhec)
R <- edhec[, 1:8]
funds <- colnames(R)
# Construct initial portfolio
init.portf <- portfolio.spec(assets=funds)
init.portf <- add.constraint(portfolio=init.portf, type="full_investment")
init.portf <- add.constraint(portfolio=init.portf, type="long_only")
init.portf <- add.objective(portfolio=init.portf, type="return", name="mean")
init.portf <- add.objective(portfolio=init.portf, type="risk", name="StdDev")
init.portf
# The default action if "mean" and "StdDev" are specified as objectives with
# optimize_method="ROI" is to maximize quadratic utility. If we want to maximize
# Sharpe Ratio, we need to pass in maxSR=TRUE to optimize.portfolio.
maxSR.lo.ROI <- optimize.portfolio(R=R, portfolio=init.portf,
                                   optimize_method="ROI",
                                   maxSR=TRUE, trace=TRUE)
maxSR.lo.ROI
# Although the maximum Sharpe Ratio objective can be solved quickly and accurately
# with optimize_method="ROI", it is also possible to solve this optimization
# problem using other solvers such as random portfolios or DEoptim. These
# solvers have the added flexibility of using different methods to calculate
# the Sharpe Ratio (e.g. we could specify annualized measures of risk and return).
# For random portfolios and DEoptim, the leverage constraints should be
# relaxed slightly.
init.portf$constraints[[1]]$min_sum=0.99
init.portf$constraints[[1]]$max_sum=1.01
# Use DEoptim
maxSR.lo.DE <- optimize.portfolio(R=R, portfolio=init.portf,
                                  optimize_method="DEoptim",
                                  search_size=2000,
                                  trace=TRUE)
Hopefully this helps; typically I find that many of the more complex packages in R will have demo files to help get you started.

Quantstrat WFA with intraday Data

I've been getting walk-forward analysis (WFA) to run on the full set of intraday GBPUSD 30-minute data, and have come across a couple of things that need addressing. The first is that I believe the save function needs changing to remove the time from the filename string (as shown here in a pull request on the R-Finance/quantstrat repo on GitHub). The walk.forward function throws this error:
Error in gzfile(file, "wb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "wb") :
cannot open compressed file 'wfa.GBPUSD.2002-10-21 00:30:00.2002-10-23 23:30:00.RData', probable reason 'Invalid argument'
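(The 'Invalid argument' on Windows appears to come from the ':' characters that the timestamps put into the .RData file name; Windows does not allow ':' in file names. A rough illustration of the kind of sanitisation needed, not the actual quantstrat code:)
filename <- "wfa.GBPUSD.2002-10-21 00:30:00.2002-10-23 23:30:00.RData"
filename <- gsub(":", ".", filename)   # ':' is not a legal character in Windows file names
filename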
The second is a rare edge case where it ends up calling runSum on a data set with fewer rows than the period you are testing (n). This is the traceback():
8: stop("Invalid 'n'")
7: runSum(x, n)
6: runMean(x, n)
5: (function (x, n = 10, ...)
   {
       ma <- runMean(x, n)
       if (!is.null(dim(ma))) {
           colnames(ma) <- "SMA"
       }
       return(ma)
   })(x = Cl(mktdata)[, 1], n = 25)
4: do.call(indFun, .formals)
3: applyIndicators(strategy = strategy, mktdata = mktdata, parameters = parameters,
       ...)
2: applyStrategy(strategy, portfolios = portfolio.st, mktdata = symbol[testing.timespan]) at custom.walk.forward.R#122
1: walk.forward(strategy.st, paramset.label = "WFA", portfolio.st = portfolio.st,
       account.st = account.st, period = "days", k.training = 3,
       k.testing = 1, obj.func = my.obj.func, obj.args = list(x = quote(result$apply.paramset)),
       audit.prefix = "wfa", anchored = FALSE, verbose = TRUE)
The extended GBPUSD data used in the creation of the Luxor demo includes an erroneous date (2002/10/27) with only 1 observation, which causes this problem. I can also foresee this being an issue when testing longer signal periods on instruments like crude oil, which have only a few trading hours on Sunday evenings (UTC).
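(One way to guard against such days is to drop them from the market data before running the walk-forward analysis. The sketch below assumes the 30-minute series is an xts object named GBPUSD and treats 25 bars as an arbitrary minimum per day; both are assumptions, not part of the Luxor demo.)
library(xts)
bars_per_day <- apply.daily(GBPUSD, nrow)   # number of 30-minute bars per calendar day
bad_days <- format(index(bars_per_day), "%Y-%m-%d")[coredata(bars_per_day) < 25]
GBPUSD <- GBPUSD[!format(index(GBPUSD), "%Y-%m-%d") %in% bad_days]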
Given that I have purely been following the Luxor demo with the same (extended) intraday data set, are these genuine issues, or have they been caused by package updates etc.?
What is the preferred way to report these things to the authors of quantstrat, and to find out if/when fixes are likely to be made?
SessionInfo():
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quantstrat_0.9.1739 foreach_1.4.3 blotter_0.9.1741 PerformanceAnalytics_1.4.4000 FinancialInstrument_1.2.0 quantmod_0.4-5 TTR_0.23-1
[8] xts_0.9.874 zoo_1.7-13
loaded via a namespace (and not attached):
[1] compiler_3.3.0 tools_3.3.0 codetools_0.2-14 grid_3.3.0 iterators_1.0.8 lattice_0.20-33
quantstrat is on GitHub here:
https://github.com/braverock/quantstrat
Issues and patches should be reported via GitHub issues.

Parallel execution of train in caret fails with function not found

Yesterday I updated my R packages, and since then parallel execution of the train function fails.
It seems that some functions called from within the workers are not available; at least flatTable and probFunction are affected.
I am experiencing this issue on my production machine and was able to reproduce it on a clean Windows 7 x64 VM.
I have added a minimal working example below. Dear users of Stack Overflow: any help is appreciated!
# R 3.0.2 x64, RStudio Version 0.98.490, Windows 7 x64
data(iris)
library(caret) # 6.0-21
library(doParallel) # 1.0.6
model <- "rf"
# Fail
?probFunction
?flatTable
fitControl <- trainControl(
    method = "repeatedcv"
  , number = 5          ## 5-fold CV
  , repeats = 1         ## repeated once
  , verboseIter = TRUE
)
#### Sequential Version ####
# Runs
train(Species ~ ., data = iris, method = model, trControl = fitControl)
#### Parallelized version ####
# Fails with
# Error in e$fun(obj, substitute(ex), parent.frame(), e$data) :
# worker initialization failed: Error in eval(expr, envir, enclos): could not find function "flatTable"
cl <- makeCluster(3)
registerDoParallel(cl)
train(Species ~ ., data = iris, method = model, trControl = fitControl)
stopCluster(cl)
# Fails with
# Error in { : task 1 failed - "could not find function "probFunction""
fitControl <- trainControl(
    method = "repeatedcv"
  , number = 5          ## 5-fold CV
  , repeats = 1         ## repeated once
  , verboseIter = TRUE
  , classProbs = TRUE
)
cl <- makeCluster(3)
registerDoParallel(cl)
train(Species ~ ., data = iris, method = model, trControl = fitControl)
stopCluster(cl)
#### Again sequential version ####
# Fails with
# Error in summary.connection(connection) : invalid connection
train(Species ~ ., data = iris, method = model, trControl = fitControl)
R Session Info
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.6-1 class_7.3-9 randomForest_4.6-7 doParallel_1.0.6 iterators_1.0.6
[6] foreach_1.4.1 caret_6.0-21 ggplot2_0.9.3.1 lattice_0.20-23
loaded via a namespace (and not attached):
[1] car_2.0-19 codetools_0.2-8 colorspace_1.2-4 compiler_3.0.2 dichromat_2.0-0
[6] digest_0.6.4 grid_3.0.2 gtable_0.1.2 labeling_0.2 MASS_7.3-29
[11] munsell_0.4.2 nnet_7.3-7 plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5
[16] reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.2
The error that you're getting is caused by a bug in caret 6.0-21 when using doParallel, doSNOW, and doMPI. It's been fixed in version 6.0-22 in R-forge, but hasn't been released to CRAN yet. If you don't want to wait for the new version to be released, you can:
1. Downgrade to caret 5.x
2. Install caret 6.0-22 from R-forge
3. Install and use doSNOW 1.0.10 from R-forge rather than doParallel (installing from R-forge is sketched below)
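For the two R-forge options, installation typically looks like the following (a sketch; the repos URL is the standard R-forge address, and a build toolchain may be needed for source installs):
install.packages("caret", repos = "http://R-Forge.R-project.org", type = "source")
install.packages("doSNOW", repos = "http://R-Forge.R-project.org", type = "source")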
The problem was caused by a change in CRAN policy that forbids the use of the ::: operator, even when referencing non-exported functions from within the same package.
Update
Caret 6.0-22 was released to CRAN on 2014-01-18. This should resolve the reported problem using caret with doSNOW and similar parallel backends.
The first error (could not find function ...) disappears with newer versions, as suggested by @Steve Weston, but the second error (Error in summary.connection(connection) : invalid connection) persists.
With caret version 6.0-84, I could fix it by adding allowParallel = F to the trainControl arguments for the last sequential run.
The last part of the code in the question changes to:
#### Again sequential version (new) ####
fitControl_new <- trainControl(
    method = "repeatedcv"
  , number = 5
  , repeats = 1
  , verboseIter = TRUE
  , classProbs = TRUE
  , allowParallel = F   ## add this argument to override the default TRUE
)
train(Species ~ ., data = iris, method = model, trControl = fitControl_new)

Resources