I'm trying to use the rfe function from the caret package in combination with a PLS-DA model.
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] splines grid parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] mclust_4.4 Kendall_2.2 doBy_4.5-13 survival_2.37-7 statmod_1.4.20
[6] preprocessCore_1.26.1 sva_3.10.0 mgcv_1.8-4 nlme_3.1-119 corpcor_1.6.7
[11] car_2.0-22 reshape2_1.4.1 gplots_2.16.0 DMwR_0.4.1 mi_0.09-19
[16] arm_1.7-07 lme4_1.1-7 Matrix_1.1-5 MASS_7.3-37 randomForest_4.6-10
[21] plyr_1.8.1 pls_2.4-3 caret_6.0-41 ggplot2_1.0.0 lattice_0.20-29
[26] pcaMethods_1.54.0 Rcpp_0.11.4 Biobase_2.24.0 BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] abind_1.4-0 bitops_1.0-6 boot_1.3-14 BradleyTerry2_1.0-5 brglm_0.5-9 caTools_1.17.1
[7] class_7.3-11 coda_0.16-1 codetools_0.2-10 colorspace_1.2-4 compiler_3.1.1 digest_0.6.8
[13] e1071_1.6-4 foreach_1.4.2 foreign_0.8-62 gdata_2.13.3 gtable_0.1.2 gtools_3.4.1
[19] iterators_1.0.7 KernSmooth_2.23-13 minqa_1.2.4 munsell_0.4.2 nloptr_1.0.4 nnet_7.3-8
[25] proto_0.3-10 quantmod_0.4-3 R2WinBUGS_2.1-19 ROCR_1.0-5 rpart_4.1-8 scales_0.2.4
[31] stringr_0.6.2 tools_3.1.1 TTR_0.22-0 xts_0.9-7 zoo_1.7-11
To practice I ran the following example using the iris data.
data(iris)
subsets <- 2:4
ctrl <- rfeControl(functions = caretFuncs, method = 'cv', number = 5, verbose=TRUE)
trctrl <- trainControl(method='cv', number=5)
mod <- rfe(Species ~., data = iris, sizes = subsets, rfeControl = ctrl, trControl = trctrl, method = 'pls')
All works well.
mod
Recursive feature selection
Outer resampling method: Cross-Validated (5 fold)
Resampling performance over subset size:
Variables Accuracy Kappa AccuracySD KappaSD Selected
2 0.6533 0.48 0.02981 0.04472
3 0.8067 0.71 0.06412 0.09618 *
4 0.7867 0.68 0.07674 0.11511
The top 3 variables (out of 3):
Sepal.Width, Petal.Length, Sepal.Length
However, if I try to replicate this on data I have generated, I get the following error, and I can't work out why. If you have any ideas, I'd be really interested to hear them.
x <- as.data.frame(matrix(0,10,10))
for(i in 1:9) {x[,i] <- rnorm(10,0,1)}
x[,10] <- as.factor(rbinom(10, 1, 0.5))
subsets <- 2:9
ctrl <- rfeControl(functions = caretFuncs, method = 'cv', number = 5, verbose=TRUE)
trctrl <- trainControl(method='cv', number=5)
mod <- rfe(V10 ~., data = x, sizes = subsets, rfeControl = ctrl, trControl = trctrl, method = 'pls')
Error in { : task 1 failed - "undefined columns selected"
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
3: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
4: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
5: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
I have worked out (after a lot of to-ing and fro-ing) that the levels of the response factor have to be character labels that are valid R variable names (e.g. 'Positive'/'Negative' rather than '0'/'1') to combine PLS-DA with RFE in caret.
For example...
x <- data.frame(matrix(rnorm(1000),100,10))
y <- as.factor(c(rep('Positive',40), rep('Negative',60)))
data <- data.frame(x,y)
subsets <- 2:9
ctrl <- rfeControl(functions = caretFuncs, method = 'cv', number = 5, verbose=TRUE)
trctrl <- trainControl(method='cv', number=5)
mod <- rfe(y ~., data, sizes = subsets, rfeControl = ctrl, trControl = trctrl, method = 'pls')
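As a side note (my own sketch, not part of the original question): instead of rebuilding the response with character labels, an existing 0/1 factor can be repaired by giving its levels valid R names, for example with make.names:
z <- as.factor(rbinom(100, 1, 0.5))            # levels "0" and "1", as in the failing example
z <- factor(z, labels = make.names(levels(z))) # relabel to valid R names
levels(z)
# [1] "X0" "X1"
A response relabelled this way can then be used with rfe. Note that the 10-row data set in the failing example would still give very small CV folds, so a larger sample is advisable in any case.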
I am following a tutorial here. A few days ago I was able to run this code without error, and to run it on my own data set (it was always a little hit and miss whether I got this error), but now I always obtain the same error when I run the code.
Error in solve.QP(Dmat, dvec, Amat, bvec = b0, meq = 2) :
constraints are inconsistent, no solution!
I understand that the solver cannot satisfy the constraints, but I am a little confused as to why it worked previously and now it does not. The author of the article has this code working.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals <- sapply(er_vals, function(er) {
op <- portfolio.optim(as.matrix(df), er)
return(op$ps)
})
sessionInfo():
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] lpSolve_5.6.13.1 data.table_1.12.0 tseries_0.10-46 rugarch_1.4-0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 MASS_7.3-51.1 mclust_5.4.2
[4] lattice_0.20-38 quadprog_1.5-5 Rsolnp_1.16
[7] TTR_0.23-4 tools_3.5.3 xts_0.11-2
[10] SkewHyperbolic_0.4-0 GeneralizedHyperbolic_0.8-4 quantmod_0.4-13.1
[13] spd_2.0-1 grid_3.5.3 KernSmooth_2.23-15
[16] yaml_2.2.0 numDeriv_2016.8-1 Matrix_1.2-15
[19] nloptr_1.2.1 DistributionUtils_0.6-0 ks_1.11.3
[22] curl_3.3 compiler_3.5.3 expm_0.999-3
[25] truncnorm_1.0-8 mvtnorm_1.0-8 zoo_1.8-4
tseries::portfolio.optim disallows short selling by default; see the shorts argument. With shorts = FALSE, asset weights may not go below 0, and since the weights must sum to 1, no individual asset weight can exceed 1 either: there is no leverage.
(Possibly, in an earlier version of tseries the default could have been shorts = TRUE, which would explain why it previously worked for you.)
Your target return (pm) cannot exceed the highest return of any of the input assets.
Solution 1: Allow short selling, but remember that this gives a different efficient frontier. (For reference, see any lecture or book discussing Markowitz optimization; the problem without the short-selling restriction has an analytical solution.)
op <- portfolio.optim(as.matrix(df), er, shorts = T)
Solution 2: Limit the target returns between the worst and the best asset's return.
er_vals <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
Below is the full script that gives both solutions and plots the resulting efficient frontiers.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
# er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
er_vals1 <- seq(from = 0, to = 0.15, length.out = 1000)
er_vals2 <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals1 <- sapply(er_vals1, function(er) {
op <- portfolio.optim(as.matrix(df), er, shorts = TRUE)
return(op$ps)
})
sd_vals2 <- sapply(er_vals2, function(er) {
op <- portfolio.optim(as.matrix(df), er, shorts = FALSE)
return(op$ps)
})
plot(x = sd_vals1, y = er_vals1, type = "l", col = "red",
xlab = "sd", ylab = "er",
main = "red: allowing short-selling;\nblue: disallowing short-selling")
lines(x = sd_vals2, y = er_vals2, type = "l", col = "blue")
I'm trying to use SharpeRatio as an objective function to optimize my portfolio, but I get the following error:
objective name SharpeRatio generated an error or warning: Error in t(w) %*% M3 : requires numeric/complex matrix/vector arguments
I've searched and it seems that the issue is related to the weights, but I can't find a way to solve it.
The following code replicates the error:
library(PortfolioAnalytics)
data(edhec)
asset_names <- colnames(edhec)
port_spec <- portfolio.spec(asset_names)
port_spec <- add.constraint(portfolio = port_spec, type = "weight_sum", min_sum = 0.99, max_sum = 1.01)
port_spec <- add.constraint(portfolio = port_spec, type = "long_only")
port_spec <- add.objective(portfolio = port_spec, type = "return", name = "SharpeRatio", FUN = "StdDev")
opt_DE <- optimize.portfolio(R = edhec, portfolio = port_spec, optimize_method = "DEoptim", search_size=5000, trace = TRUE, traceDE = 0)
As requested, sessionInfo():
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] FactoMineR_1.39 nFactors_2.3.3 lattice_0.20-35
[4] boot_1.3-20 psych_1.7.8 MASS_7.3-47
[7] PortfolioAnalytics_1.0.3636 PerformanceAnalytics_1.4.3541 foreach_1.4.4
[10] xts_0.10-1 zoo_1.8-0
loaded via a namespace (and not attached):
[1] cluster_2.0.6 leaps_3.0 mnormt_1.5-5 scatterplot3d_0.3-40
[5] quadprog_1.5-5 ROI_0.3-0 TTR_0.23-2 tools_3.4.3
[9] quantmod_0.4-12 parallel_3.4.3 grid_3.4.3 nlme_3.1-131
[13] registry_0.5 iterators_1.0.9 yaml_2.1.16 GenSA_1.1.7
[17] codetools_0.2-15 curl_3.1 slam_0.1-42 ROI.plugin.quadprog_0.2-5
[21] compiler_3.4.3 flashClust_1.01-2 DEoptim_2.2-4 foreign_0.8-69
I would recommend checking out the PortfolioAnalytics demo files. One of them in particular, the max Sharpe Ratio demo:
https://github.com/R-Finance/PortfolioAnalytics/blob/master/demo/demo_max_Sharpe.R
will be particularly useful to reference. After reading through some of the code and comments, you will see a few things. First, you specified conflicting arguments, e.g. type = "return", name = "SharpeRatio", FUN = "StdDev".
"return" is an objective type, "StdDev" is the name of a "risk" objective, and "SharpeRatio" is what you are trying to solve for.
If you use the "ROI" optimization method, you need to tell optimize.portfolio to maximize the Sharpe Ratio by passing maxSR=TRUE. If you want to use the "DEoptim" optimization method, you need to relax your leverage constraints.
Examples of each can be found below. They are directly taken from the referenced demo file above.
library(PortfolioAnalytics)
# Examples of solving optimization problems to maximize mean return per unit StdDev
data(edhec)
R <- edhec[, 1:8]
funds <- colnames(R)
# Construct initial portfolio
init.portf <- portfolio.spec(assets=funds)
init.portf <- add.constraint(portfolio=init.portf, type="full_investment")
init.portf <- add.constraint(portfolio=init.portf, type="long_only")
init.portf <- add.objective(portfolio=init.portf, type="return", name="mean")
init.portf <- add.objective(portfolio=init.portf, type="risk", name="StdDev")
init.portf
# The default action if "mean" and "StdDev" are specified as objectives with
# optimize_method="ROI" is to maximize quadratic utility. If we want to maximize
# Sharpe Ratio, we need to pass in maxSR=TRUE to optimize.portfolio.
maxSR.lo.ROI <- optimize.portfolio(R=R, portfolio=init.portf,
optimize_method="ROI",
maxSR=TRUE, trace=TRUE)
maxSR.lo.ROI
# Although the maximum Sharpe Ratio objective can be solved quickly and accurately
# with optimize_method="ROI", it is also possible to solve this optimization
# problem using other solvers such as random portfolios or DEoptim. These
# solvers have the added flexibility of using different methods to calculate
# the Sharpe Ratio (e.g. we could specify annualized measures of risk and return).
# For random portfolios and DEoptim, the leverage constraints should be
# relaxed slightly.
init.portf$constraints[[1]]$min_sum=0.99
init.portf$constraints[[1]]$max_sum=1.01
# Use DEoptim
maxSR.lo.DE <- optimize.portfolio(R=R, portfolio=init.portf,
optimize_method="DEoptim",
search_size=2000,
trace=TRUE)
Hopefully this helps; typically I find that many of the more complex packages in R will have demo files to help get you started.
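Applied to the code in the question, a minimal sketch (my adaptation of the demo above, not a tested drop-in) that keeps the relaxed weight_sum constraint and replaces the single "SharpeRatio" objective with separate return and risk objectives could look like this:
library(PortfolioAnalytics)
data(edhec)
asset_names <- colnames(edhec)
port_spec <- portfolio.spec(assets = asset_names)
# leverage constraint already relaxed to [0.99, 1.01], as DEoptim needs
port_spec <- add.constraint(portfolio = port_spec, type = "weight_sum",
                            min_sum = 0.99, max_sum = 1.01)
port_spec <- add.constraint(portfolio = port_spec, type = "long_only")
# separate objectives instead of name = "SharpeRatio", FUN = "StdDev"
port_spec <- add.objective(portfolio = port_spec, type = "return", name = "mean")
port_spec <- add.objective(portfolio = port_spec, type = "risk", name = "StdDev")
opt_DE <- optimize.portfolio(R = edhec, portfolio = port_spec,
                             optimize_method = "DEoptim",
                             search_size = 5000, trace = TRUE, traceDE = 0)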
I am completing the exercises from Applied Predictive Modeling, the textbook by the caret package's authors. I cannot get the train function to work with the M5P or M5Rules methods.
The code will run fine manually:
data("permeability")
trainIndex <- createDataPartition(permeability[, 1], p = 0.75,
list = FALSE)
fingerNZV <- nearZeroVar(fingerprints, saveMetrics = TRUE)
trainY <- permeability[trainIndex, 1]
testY <- permeability[-trainIndex, 1]
trainX <- fingerprints[trainIndex, !fingerNZV$nzv]
testX <- fingerprints[-trainIndex, !fingerNZV$nzv]
indx <- createFolds(trainY, k = 10, returnTrain = TRUE)
ctrl <- trainControl('cv', index = indx)
m5Tuner <- t(as.matrix(expand.grid(
N = c(1, 0),
U = c(1, 0),
M = floor(seq(4, 15, length.out = 3))
)))
startTime <- Sys.time()
m5Tune <- foreach(tuner = m5Tuner) %do% {
m5ctrl <- Weka_control(M = tuner[3],
N = tuner[1] == 1,
U = tuner[2] == 1)
mods <- lapply(ctrl$index,function(fold) {
d <- cbind(data.frame(permeability = trainY[fold]),
trainX[fold, ])
mod <- M5P(permeability ~ ., d, control = m5ctrl)
rmse <- RMSE(predict(mod, as.data.frame(trainX[-fold, ])),
trainY[-fold])
list(model = mod, rmse = rmse)
})
mean_rmse <- mean(sapply(mods, '[[', 'rmse'))
list(models = mods, mean_rmse = mean_rmse)
}
endTime <- Sys.time()
endTime - startTime
# Time difference of 59.17742 secs
The same data and controls (with 'rules' swapped in for 'M'; why can't I specify M as a tuning parameter?) will not finish:
m5Tuner <- expand.grid(
pruned = c("Yes", "No"),
smoothed = c("Yes", "No"),
rules = c("Yes", "No")
)
m5Tune <- train(trainX, trainY,
method = 'M5',
trControl = ctrl,
tuneGrid = m5Tuner,
control = Weka_control(M = 10))
The example from the book will not finish, either:
library(caret)
data(solubility)
set.seed(100)
indx <- createFolds(solTrainY, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = indx)
set.seed(100)
m5Tune <- train(x = solTrainXtrans, y = solTrainY,
method = "M5",
trControl = ctrl,
control = Weka_control(M = 10))
This may be a problem with the use of a parallel backend with RWeka, for me, at least. My example from above will not finish with %dopar%.
I have run sudo R CMD javareconf before each example and restarted RStudio.
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] APMBook_0.0.0.9000 RWeka_0.4-27
[3] caret_6.0-68 ggplot2_2.1.0
[5] lattice_0.20-30 AppliedPredictiveModeling_1.1-6
# dozens others loaded via namespace.
When using parallel processing with train and RWeka models, you should have gotten the error:
In train.default(trainX, trainY, method = "M5", trControl = ctrl, :
Models using Weka will not work with parallel processing with multicore/doMC
The Java interface to Weka does not work with multiple workers.
It takes a while, but the train call will complete if you don't have any workers registered with foreach.
Max
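For completeness, a minimal sketch (my own addition, not from the answer above) of switching foreach back to a sequential backend before fitting the Weka model from the question:
library(foreach)
registerDoSEQ()   # unregister any parallel backend; foreach now runs sequentially
m5Tune <- train(trainX, trainY,
                method = "M5",
                trControl = ctrl,
                tuneGrid = m5Tuner,
                control = Weka_control(M = 10))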
I'm using the caret package to train a model with the "nnet" method. It works, but I need to see the weights used in the hidden layers.
This is possible when we use the nnet function directly:
model<-nnet(Data[5:8], Data[4],size=10,maxit=100000,linout=T,decay=0.1)
model$wts
[1] 9.160050e-01 1.184379e+00 -1.201645e+00 1.041427e+00 -2.367287e-03 6.861753e+00 1.223522e+00 -1.875841e+01 -1.233203e-02
[10] 5.281464e-01 -1.605204e+00 1.497933e+00 -2.882815e+00 -1.511277e+01 2.732411e-01 -2.999315e+01 1.498460e-01 -9.405826e-01
[19] -2.800337e+00 9.600647e-02 1.588405e+00 -2.106175e+00 -8.807753e+00 2.762392e+01 2.091118e-01 3.265564e+01 6.516821e-01
[28] 1.304455e-01 -7.633166e+00 1.017017e-02 6.366411e+01 -2.902564e-02 1.376147e-01 -8.353788e+00 6.376588e-04 5.995577e+00
[37] 1.176301e+01 -8.569926e+00 1.971122e+01 -2.358067e-01 3.971781e+01 1.940421e-01 1.755913e-01 -5.817047e+00 1.988909e-03
[46] 1.408106e+00 -1.549250e+00 1.757245e+01 -5.760102e+01 1.001197e+00 -5.493371e+00 4.786298e+00 6.049659e+00 -1.762611e+01
[55] -9.598485e+00 -1.716196e+01 6.477683e+00 -1.971476e+01 4.468062e+00 2.125993e+01 4.683170e+01
How can I see the weights when using the caret package?
mynnetfit <- train(DC ~ T+c+f+h, data = Data1, method = "nnet",
maxit = 1000, tuneGrid = my.grid, trace = T, linout = 1, trControl = ctrl)
The model object mynnetfit has a finalModel component, which is of class nnet. You can then call coef(mynnetfit$finalModel) to get the weights of the nodes.
For example
library(caret)
## simulate data
set.seed(1)
dat <- LPH07_2(100, 20)
mod <- train(y ~ ., data=dat, method="nnet", trace=FALSE, linout=TRUE)
coef(mod$finalModel)
b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1 i9->h1
-0.7622230 8.5760791 9.6162685 -13.0549859 5.3306854 8.1679126 3.1832575 -5.4354694 4.8410017 -6.3811887
i10->h1 i11->h1 i12->h1 i13->h1 i14->h1 i15->h1 i16->h1 i17->h1 i18->h1 i19->h1
7.0813781 3.4709351 5.6444663 4.2530566 0.6594511 0.5579828 23.5802215 5.0381758 -0.4883967 -13.0613378
i20->h1 i21->h1 i22->h1 i23->h1 i24->h1 i25->h1 i26->h1 i27->h1 i28->h1 i29->h1
11.8905272 -0.2732984 -4.5190578 -2.3095693 0.8891562 1.7922645 -0.4666446 -1.0980723 -4.7742597 -5.1603453
i30->h1 i31->h1 i32->h1 i33->h1 i34->h1 i35->h1 i36->h1 i37->h1 i38->h1 i39->h1
-0.1285864 2.2160653 0.2990097 -5.1722264 -4.8375324 1.4537326 -1.6870400 -2.1019009 1.3542151 0.7036545
i40->h1 b->o h1->o
-2.1592154 10.7700684 -27.4712736
Yesterday I updated my R packages, and since then parallel execution of the train function fails.
It seems that some functions called from within the workers are not available; at least flatTable and probFunction are affected.
I am experiencing this issue on my production machine and was able to reproduce it on a clean Windows 7 x64 VM.
I have added a minimal working example below. Dear users of Stack Overflow: any help is appreciated!
# R 3.0.2 x64, RStudio Version 0.98.490, Windows 7 x64
data(iris)
library(caret) # 6.0-21
library(doParallel) # 1.0.6
model <- "rf"
# Fail
?probFunction
?flatTable
fitControl <- trainControl(
method = "repeatedcv"
, number = 5 ## 5-fold CV
, repeats = 1 ## repeated one times
, verboseIter =TRUE
)
#### Sequential Version ####
# Runs
train(Species ~ ., data = iris, method = model, trControl = fitControl)
#### Parallelized version ####
# Fails with
# Error in e$fun(obj, substitute(ex), parent.frame(), e$data) :
# worker initialization failed: Error in eval(expr, envir, enclos): could not find function "flatTable"
cl <- makeCluster(3)
registerDoParallel(cl)
train(Species ~ ., data = iris, method = model, trControl = fitControl)
stopCluster(cl)
# Fails with
# Error in { : task 1 failed - "could not find function "probFunction""
fitControl <- trainControl(
method = "repeatedcv"
, number = 5 ## 5-fold CV
, repeats = 1 ## repeated one times
, verboseIter =TRUE
, classProbs = TRUE
)
cl <- makeCluster(3)
registerDoParallel(cl)
train(Species ~ ., data = iris, method = model, trControl = fitControl)
stopCluster(cl)
#### Again sequential version ####
# Fails with
# Error in summary.connection(connection) : invalid connection
train(Species ~ ., data = iris, method = model, trControl = fitControl)
R Session Info
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.6-1 class_7.3-9 randomForest_4.6-7 doParallel_1.0.6 iterators_1.0.6
[6] foreach_1.4.1 caret_6.0-21 ggplot2_0.9.3.1 lattice_0.20-23
loaded via a namespace (and not attached):
[1] car_2.0-19 codetools_0.2-8 colorspace_1.2-4 compiler_3.0.2 dichromat_2.0-0
[6] digest_0.6.4 grid_3.0.2 gtable_0.1.2 labeling_0.2 MASS_7.3-29
[11] munsell_0.4.2 nnet_7.3-7 plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5
[16] reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.2
The error that you're getting is caused by a bug in caret 6.0-21 when using doParallel, doSNOW, and doMPI. It's been fixed in version 6.0-22 in R-forge, but hasn't been released to CRAN yet. If you don't want to wait for the new version to be released, you can:
Downgrade to caret 5.x
Install caret 6.0-22 from R-forge
Install and use doSNOW 1.0.10 from R-forge rather than doParallel
The problem was caused by a change in CRAN policy that forbids the use of the ::: operator, even when referencing non-exported functions from within the same package.
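For options 2 and 3 above, a minimal sketch (standard install.packages calls; availability of these versions on R-forge at the time is an assumption on my part):
# caret 6.0-22 from R-forge
install.packages("caret", repos = "http://R-Forge.R-project.org")
# or doSNOW 1.0.10 from R-forge, registered in place of doParallel
install.packages("doSNOW", repos = "http://R-Forge.R-project.org")
library(doSNOW)
cl <- makeCluster(3)
registerDoSNOW(cl)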
Update
Caret 6.0-22 was released to CRAN on 2014-01-18. This should resolve the reported problem using caret with doSNOW and similar parallel backends.
The first error (could not find function ...) disappears with newer versions, as suggested by @Steve Weston, but the second error (Error in summary.connection(connection) : invalid connection) persists.
With caret version 6.0-84, I could fix it by adding allowParallel = F to the trainControl arguments for the last sequential run.
The last part of the code in the question changes to:
#### Again sequential version (new) ####
fitControl_new <- trainControl(
method = "repeatedcv"
, number = 5
, repeats = 1
, verboseIter =TRUE
, classProbs = TRUE
, allowParallel = F ## add this argument to overwrite the default TRUE
)
train(Species ~ ., data = iris, method = model, trControl = fitControl_new)