caret deepnet produces same value for all predictions - r

I am very new to deep learning. I trained a neural net using the packages deepnet and caret. For this regression problem caretuses a sigmoid function as activation function and a linear one as output function.
I preprocessed the predictors using preprocess = "range" (which I thought normalizes the predictors).
library(caret)
library(deepnet)
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
# create data
dat <- as.data.frame(ChickWeight)
dat$vari <- sample(LETTERS, nrow(dat), replace = TRUE)
dat$Chick <- as.character(dat$Chick)
preds <- dat[1:100,2:5]
response <- dat[1:100,1]
vali <- dat[101:150,]
# change format of categorical predictors to one-hot encoded format
dmy <- dummyVars(" ~ .", data = preds)
preds_dummies <- data.frame(predict(dmy, newdata = preds))
# specifiy trainControl for tuning mtry and with specified folds
control <- caret::trainControl(search = "grid", method="repeatedcv", number=3,
repeats=2,
savePred = T)
# tune hyperparameters and build final model
tunegrid <- expand.grid(layer1 = c(5,50),
layer2 = c(0,5,50),
layer3 = c(0,5,50),
hidden_dropout = c(0, 0.1),
visible_dropout = c(0, 0.1))
model <- caret::train(x = preds_dummies,
y = response,
method="dnn",
metric= "RMSE",
tuneGrid=tunegrid,
trControl= control,
preProcess = "range"
)
When I predict using the validation set with the tuned neural network model, it produces only one prediction value despite of various input predictors.
# predict with validation set
# create dummies
dmy <- dummyVars(" ~ .", data = vali)
vali_dummies <- data.frame(predict(dmy, newdata = vali))
vali_dummies <- vali_dummies[,which(names(vali_dummies) %in% model$finalModel$xNames)]
# add empty columns for categorical preds of the one used in the model (to have the same matix)
not_included <- setdiff(model$finalModel$xNames, names(vali_dummies))
vali_add <- as.data.frame(matrix(rep(0, length(not_included)*nrow(vali_dummies)),
nrow = nrow(vali_dummies),
ncol = length(not_included))
)
# change names
names(vali_add) <- not_included
# add to vali_dummies
vali_dummies <- cbind(vali_dummies, vali_add)
# put it in the same order as preds_dummies (sort the columns)
vali_dummies <- vali_dummies[names(preds_dummies)]
# normalize also the validation set
pp = preProcess(vali_dummies, method = c("range"))
vali_dummies <- predict(pp, vali_dummies)
# save obs and pred for predictions with the outer CV out-of-fold test set
temp <- data.frame(obs = vali[,1],
pred = caret::predict.train(object = model, newdata = vali_dummies))
temp
When I am using the Boston data set from the MASS package where no categorical predictors are present, I get slightly different prediction values for all the different input predictors of the validation set.
How can I fix this and create a neural network which predicts "different" predictions when using numeric as well as categorical predictors? What else besides normalization should I try?

Related

How to calibrate probabilities in R?

I am trying to calibrate probabilities that I get with the predict function in the R package.
I have in my case two classes and mutiple predictors. I used the iris dataset as an example for you to try and help me out.
my_data <- iris %>% #reducing the data to have two classes only
dplyr::filter((Species =="virginica" | Species == "versicolor") ) %>% dplyr::select(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species)
my_data <- droplevels(my_data)
index <- createDataPartition(y=my_data$Species,p=0.6,list=FALSE)
#creating train and test set for machine learning
Train <- my_data[index,]
Test <- my_data[-index,]
#machine learning based on Train data partition with glmnet method
classCtrl <- trainControl(method = "repeatedcv", number=10,repeats=5,classProbs = TRUE,savePredictions = "final")
set.seed(355)
glmnet_ML <- train(Species~., Train, method= "glmnet", trControl=classCtrl)
glmnet_ML
#probabilities to assign each row of data to one class or the other on Test
predTestprob <- predict(glmnet_ML,Test,type="prob")
pred
#trying out calibration following "Applied predictive modeling" book from Max Kuhn p266-273
predTrainprob <- predict(glmnet_ML,Train,type="prob")
predTest <- predict(glmnet_ML,Test)
predTestprob <- predict(glmnet_ML,Test,type="prob")
Test$PredProb <- predTestprob[,"versicolor"]
Test$Pred <- predTest
Train$PredProb <- predTrainprob[,"versicolor"]
#logistic regression to calibrate
sigmoidalCal <- glm(relevel(Species, ref= "virginica") ~ PredProb,data = Train,family = binomial)
coef(summary(sigmoidalCal))
#predicting calibrated scores
sigmoidProbs <- predict(sigmoidalCal,newdata = Test[,"PredProb", drop = FALSE],type = "response")
Test$CalProb <- sigmoidProbs
#plotting to see if it works
calCurve2 <- calibration(Species ~ PredProb + CalProb, data = Test)
xyplot(calCurve2,auto.key = list(columns = 2))
According to me, the result given by the plot is not good which indicates a mistake in the calibration, the Calprob curve should follow the diagonal but it doe not.
Has anyone done anything similar ?

Accuracy results differ of console and Rmarkdown

I have multiple classification machine learning models with all different accuracy. When I run my xgBOOST (using library(caret)) in the console, I get an accuracy of 0.7586. But when I knit my Rmarkdown, the accuracy of the same model is 0.8621. I have no idea why this is different.
I followed the suggestions of this link, but nothing worked: https://community.rstudio.com/t/console-and-rmd-output-differ-same-program-used-but-the-calculation-gives-a-different-result/67873/3
I also followed the suggestions of problem, but nothing worked: Statistics Result in R Markdown is different from the Knit Output (All Format: Word, HTML, PDF)
At last I tried this, but also nothing worked: sample function gives different result in console and in knitted document when seed is set
Here is my code which I run the same in console and Rmarkdown but with different accuracy:
# Data
data <- data[!is.na(data$var1),]
# Change levels of var1
levels(data$var1)=c("No","Yes")
#Data Preparation and Preprocessing
# Create the training and test datasets
set.seed(100)
# Step 1: Get row numbers for the training data
trainRowNumbers <- createDataPartition(data$var1, p=0.8, list=FALSE)
# Step 2: Create the training dataset
trainset <- data[trainRowNumbers,]
# Step 3: Create the test dataset
testset <- data[-trainRowNumbers,]
# Store Y for later use.
y = trainset$var1
# Create the knn imputation model on the training data
preProcess_missingdata_model <- preProcess(as.data.frame(trainset), method= c("knnImpute"))
preProcess_missingdata_model
# Create the knn imputation model on the testset data
preProcess_missingdata_model_test <- preProcess(as.data.frame(testset), method = c("knnImpute"))
preProcess_missingdata_model_test
# Use the imputation model to predict the values of missing data points
library(RANN) # required for knnInpute
trainset <- predict(preProcess_missingdata_model, newdata = trainset)
anyNA(trainset)
# Use the imputation model to predict the values of missing data points
library(RANN) # required for knnInpute
testset <- predict(preProcess_missingdata_model_test, newdata = testset)
anyNA(testset)
# Append the Y variable
trainset$var1 <- y
# Run algorithms using 5-fold cross validation
control <- trainControl(method="cv",
number=5,
repeats = 5,
savePredictions = "final",
search = "grid",
classProbs = TRUE)
metric <- "Accuracy"
# Make Valid Column Names
colnames(trainset) <- make.names(colnames(trainset))
colnames(testset) <- make.names(colnames(testset))
# xgBOOST
set.seed(7)
fit.xgbDART <- train(var1~., data = trainset, method = "xgbTree", metric = metric, trControl = control, verbose = FALSE, tuneLength = 7, nthread = 1)
# estimate skill of xgBOOST on the testset dataset
predictions <- predict(fit.xgbDART, testset)
cm <- caret::confusionMatrix(predictions, testset$var1, mode='everything')
cm
My RNGKind is:
RNGkind()
[1] "L'Ecuyer-CMRG" "Inversion" "Rejection"
always add the function :
set.seed(544)
This function sets the starting number used to generate a sequence of random numbers – it ensures that you get the same result if you start with that same seed each time you run the same process. For example, if I use the sample() function immediately after setting a seed, I will always get the same sample.
This is my suggestion on where to use set.seed()
# Data
data <- data[!is.na(data$var1),]
# Change levels of var1
levels(data$var1)=c("No","Yes")
#Data Preparation and Preprocessing
# Create the training and test datasets
# Step 1: Get row numbers for the training data
set.seed(100)
trainRowNumbers <- createDataPartition(data$var1, p=0.8, list=FALSE)
# Step 2: Create the training dataset
trainset <- data[trainRowNumbers,]
# Step 3: Create the test dataset
testset <- data[-trainRowNumbers,]
# Store Y for later use.
y = trainset$var1
# Create the knn imputation model on the training data
set.seed(100)
preProcess_missingdata_model <- preProcess(as.data.frame(trainset), method= c("knnImpute"))
preProcess_missingdata_model
# Create the knn imputation model on the testset data
set.seed(100)
preProcess_missingdata_model_test <- preProcess(as.data.frame(testset), method = c("knnImpute"))
preProcess_missingdata_model_test
# Use the imputation model to predict the values of missing data points
library(RANN) # required for knnInpute
trainset <- predict(preProcess_missingdata_model, newdata = trainset)
anyNA(trainset)
# Use the imputation model to predict the values of missing data points
library(RANN) # required for knnInpute
testset <- predict(preProcess_missingdata_model_test, newdata = testset)
anyNA(testset)
# Append the Y variable
trainset$var1 <- y
# Run algorithms using 5-fold cross validation
set.seed(100)
control <- trainControl(method="cv",
number=5,
repeats = 5,
savePredictions = "final",
search = "grid",
classProbs = TRUE)
metric <- "Accuracy"
# Make Valid Column Names
colnames(trainset) <- make.names(colnames(trainset))
colnames(testset) <- make.names(colnames(testset))
# xgBOOST
set.seed(7)
fit.xgbDART <-
train(
var1 ~ .,
data = trainset,
method = "xgbTree",
metric = metric,
trControl = control,
verbose = FALSE,
tuneLength = 7,
nthread = 1
)
# estimate skill of xgBOOST on the testset dataset
predictions <- predict(fit.xgbDART, testset)
cm <- caret::confusionMatrix(predictions, testset$var1, mode='everything')

Listing model coefficients in descending order

I have a dataset with both continuous and categorical variables. I am running regression to predict one of the variables based on the other variables in the dataset. After comparing the results of ridge, lasso and elastic-net regression, the lasso regression is the best model to proceed with.
I used the 'coef' function to extract the model's coefficients, however, the result is a very long list with over 800 variables (as some of my categorical variables have many levels). Is there a way I can quickly rank the coefficients from largest to smallest? This is a glmnet model output
Reproducible problem with example code:
# Libraries Needed
library(caret)
library(glmnet)
library(mlbench)
library(psych)
# Data
data("BostonHousing")
data <- BostonHousing
str(data)
# Data Partition
set.seed(222)
ind <- sample(2, nrow(data), replace = T, prob = c(0.7, 0.3))
train <- data[ind==1,]
test <- data[ind==2,]
# Custom Control Parameters
custom <- trainControl(method = "repeatedcv",
number = 10,
repeats = 5,
verboseIter = T)
# Linear Model
set.seed(1234)
lm <- train(medv ~.,
train,
method='lm',
trControl = custom)
# Results
lm$results
lm
summary(lm)
plot(lm$finalModel)
# Ridge Regression
set.seed(1234)
ridge <- train(medv ~.,
train,
method = 'glmnet',
tuneGrid = expand.grid(alpha = 0,
lambda = seq(0.0001, 1, length=5)),#try 5 values for lambda between 0.0001 and 1
trControl=custom)
#increasing lambda = increasing penalty and vice versa
#increase lambda therefore will cause coefs to shrink
# Plot Results
plot(ridge)
plot(ridge$finalModel, xvar = "lambda", label = T)
plot(ridge$finalModel, xvar = 'dev', label=T)
plot(varImp(ridge, scale=T))
# Lasso Regression
set.seed(1234)
lasso <- train(medv ~.,
train,
method = 'glmnet',
tuneGrid = expand.grid(alpha=1,
lambda = seq(0.0001,1, length=5)),
trControl = custom)
# Plot Results
plot(lasso)
lasso
plot(lasso$finalModel, xvar = 'lambda', label=T)
plot(lasso$finalModel, xvar = 'dev', label=T)
plot(varImp(lasso, scale=T))
# Elastic Net Regression
set.seed(1234)
en <- train(medv ~.,
train,
method = 'glmnet',
tuneGrid = expand.grid(alpha = seq(0,1,length=10),
lambda = seq(0.0001,1,length=5)),
trControl = custom)
# Plot Results
plot(en)
plot(en$finalModel, xvar = 'lambda', label=T)
plot(en$finalModel, xvar = 'dev', label=T)
plot(varImp(en))
# Compare Models
model_list <- list(LinearModel = lm, Ridge = ridge, Lasso = lasso, ElasticNet=en)
res <- resamples(model_list)
summary(res)
bwplot(res)
xyplot(res, metric = 'RMSE')
# Best Model
en$bestTune
best <- en$finalModel
coef(best, s = en$bestTune$lambda)
For most models all you'd have to do would be:
sort(coef(model), decreasing=TRUE)
Since you're using glmnet it's a little bit more complicated. I'm going to replicate a minimal version of your example here (the other models, plots, etc. are not necessary in order for us to be able to reproduce your problem ...)
## Packages
library(caret)
library(glmnet)
library(mlbench) ## for BostonHousing data
# Data
data("BostonHousing")
data <- BostonHousing
# Data Partition
set.seed(222)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train <- data[ind==1,]
test <- data[ind==2,]
# Custom Control Parameters
custom <- trainControl(method = "repeatedcv",
number = 10,
repeats = 5,
verboseIter = TRUE)
# Elastic Net Regression
set.seed(1234)
en <- train(medv ~.,
train,
method = 'glmnet',
tuneGrid = expand.grid(alpha = seq(0,1,length=10),
lambda = seq(0.0001,1,length=5)),
trControl = custom)
# Best Model
best <- en$finalModel
coefs <- coef(best, s = en$bestTune$lambda)
(This could probably be made simpler: for example, do you really need the custom control parameters to show us the example? This would be even simpler without using caret - just using `glmnet - but I was afraid I might leave something out.)
Once you've got the coefficients, sorting does appear to work, albeit with a message about possible inefficiency:
sort(coefs, decreasing=TRUE)
## <sparse>[ <logic> ] : .M.sub.i.logical() maybe inefficient
## [1] 25.191049410 5.078589706 1.389548822 0.244605193 0.045600250
## [6] 0.008840485 0.004372752 -0.012701593 -0.028337745 -0.162794401
## [11] -0.335062819 -0.901475516 -1.395091095 -12.632336419
sort(as.numeric(coefs)) also appears to work fine.
If you want to sort the entire matrix (i.e. keeping the values for all penalization levels), you can take advantage of the fact that the penalization doesn't change the rank-order of the parameters:
coeftab <-coef(best)
lastvals <- coeftab[,ncol(coeftab)]
coeftab_s <- coeftab[order(lastvals,decreasing=TRUE),]
## plot, leaving out the intercept
matplot(t(coeftab_s)[,-1],type="l")

Caret returns different predictions with caret train object than it does with the extracted final model

I prefer to use caret when fitting models because of its relative speed and preprocessing capabilities. However, I'm slightly confused on how it makes predictions. When comparing predictions made directly from the train object and predictions made from the extracted final model, I'm seeing very different numbers. The predictions from the train object appear to be more accurate.
library(caret)
library(ranger)
x1 <- rnorm(100)
x2 <- rbeta(100, 1, 1)
y <- 2*x1 + x2 + 5*x1*x2
data <- data.frame(x1, x2, y)
fitRanger <- train(y ~ x1 + x2, data = data,
method = 'ranger',
tuneLength = 1,
preProcess = c('knnImpute', 'center', 'scale'))
predict.data <- data.frame(x1 = rnorm(10), x2 = rbeta(10, 1, 1))
prediction1 <- predict(fitRanger, newdata = predict.data)
prediction2 <- predict(fitRanger$finalModel, data = predict.data)$prediction
results <- data.frame(prediction1, prediction2)
results
I'm positive it has something to do with how I preprocess the data in the train object, but even when I preprocess the test data and use the Ranger model to make predictions, the values are different
predict.data.processed <- predict.data %>%
preProcess(method = c('knnImpute',
'center',
'scale')) %>% .$data
results3 <- predict(fitRanger$finalModel, data = predict.data.processed)$prediction
results <- cbind(results, results3)
results
I want to extract the predictions from each individual tree in the ranger model, which I can't do in caret. Any thoughts?
In order to get the same predictions from the final model as with caret train you should pre-process the data in the same way. Using your example with set.seed(1):
caret predict:
prediction1 <- predict(fitRanger,
newdata = predict.data)
ranger predict on the final model. caret pre process was used on predict.data
prediction2 <- predict(fitRanger$finalModel,
data = predict(fitRanger$preProcess,
predict.data))$prediction
all.equal(prediction1,
prediction2)
#output
TRUE

R caret held-out sample and testing set ROC

I am building two different classifiers to predict a binary out come. Then I want to compare the results of the two models by using a ROC curve and the area under it (AUC).
I split the data set into a training and testing set. On the training set I perform a form of cross-validation. From the held-out samples of the cross validation I am able to build a ROC curve per model. Then I use the models on the testing set and build another set of ROC curves.
The results are contradictory which is confusing me. I am not sure which result is the correct one or if I am doing something completely wrong. The held-out sample ROC curve shows that RF is the better model and the training set ROC curve shows that SVM is the better model.
Analysis
library(ggplot2)
library(caret)
library(pROC)
library(ggthemes)
library(plyr)
library(ROCR)
library(reshape2)
library(gridExtra)
my_data <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
str(my_data)
names(my_data)[1] <- "Class"
my_data$Class <- ifelse(my_data$Class == 1, "event", "noevent")
my_data$Class <- factor(emr$Class, levels = c("noevent", "event"), ordered = TRUE)
set.seed(1732)
ind <- createDataPartition(my_data$Class, p = 2/3, list = FALSE)
train <- my_data[ ind,]
test <- my_data[-ind,]
Next I train two models: Random Forest and SVM. Here I also use Max Kuhns function to get the averaged ROC curves from held-out samples for both models and save those results into a another data.frame along with the AUC from the curves.
#Train RF
ctrl <- trainControl(method = "repeatedcv",
number = 5,
repeats = 3,
classProbs = TRUE,
savePredictions = TRUE,
summaryFunction = twoClassSummary)
grid <- data.frame(mtry = seq(1,3,1))
set.seed(1537)
rf_mod <- train(Class ~ .,
data = train,
method = "rf",
metric = "ROC",
tuneGrid = grid,
ntree = 1000,
trControl = ctrl)
rfClasses <- predict(rf_mod, test)
#This is the ROC curve from held out samples. Source is from Max Kuhns 2016 UseR! code here: https://github.com/topepo/useR2016
roc_train <- function(object, best_only = TRUE, ...) {
lvs <- object$modelInfo$levels(object$finalModel)
if(best_only) {
object$pred <- merge(object$pred, object$bestTune)
}
## find tuning parameter names
p_names <- as.character(object$modelInfo$parameters$parameter)
p_combos <- object$pred[, p_names, drop = FALSE]
## average probabilities across resamples
object$pred <- plyr::ddply(.data = object$pred,
.variables = c("obs", "rowIndex", p_names),
.fun = function(dat, lvls = lvs) {
out <- mean(dat[, lvls[1]])
names(out) <- lvls[1]
out
})
make_roc <- function(x, lvls = lvs, nms = NULL, ...) {
out <- pROC::roc(response = x$obs,
predictor = x[, lvls[1]],
levels = rev(lvls))
out$model_param <- x[1,nms,drop = FALSE]
out
}
out <- plyr::dlply(.data = object$pred,
.variables = p_names,
.fun = make_roc,
lvls = lvs,
nms = p_names)
if(length(out) == 1) out <- out[[1]]
out
}
temp <- roc_train(rf_mod)
plot_data_ROC <- data.frame(Model='Random Forest', sens = temp$sensitivities, spec=1-temp$specificities)
#This is the AUC of the held-out samples roc curve for RF
auc.1 <- abs(sum(diff(1-temp$specificities) * (head(temp$sensitivities,-1)+tail(temp$sensitivities,-1)))/2)
#Build SVM
set.seed(1537)
svm_mod <- train(Class ~ .,
data = train,
method = "svmRadial",
metric = "ROC",
trControl = ctrl)
svmClasses <- predict(svm_mod, test)
#ROC curve into df
temp <- roc_train(svm_mod)
plot_data_ROC <- rbind(plot_data_ROC, data.frame(Model='Support Vector Machine', sens = temp$sensitivities, spec=1-temp$specificities))
#This is the AUC of the held-out samples roc curve for SVM
auc.2 <- abs(sum(diff(1-temp$specificities) * (head(temp$sensitivities,-1)+tail(temp$sensitivities,-1)))/2)
Next I will plot the results
#Plotting Final
#ROC of held-out samples
q <- ggplot(data=plot_data_ROC, aes(x=spec, y=sens, group = Model, colour = Model))
q <- q + geom_path() + geom_abline(intercept = 0, slope = 1) + xlab("False Positive Rate (1-Specificity)") + ylab("True Positive Rate (Sensitivity)")
q + theme(axis.line = element_line(), axis.text=element_text(color='black'),
axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text())
#ROC of testing set
rf.probs <- predict(rf_mod, test,type="prob")
pr <- prediction(rf.probs$event, factor(test$Class, levels = c("noevent", "event"), ordered = TRUE))
pe <- performance(pr, "tpr", "fpr")
roc.data <- data.frame(Model='Random Forest',fpr=unlist(pe#x.values), tpr=unlist(pe#y.values))
svm.probs <- predict(svm_mod, test,type="prob")
pr <- prediction(svm.probs$event, factor(test$Class, levels = c("noevent", "event"), ordered = TRUE))
pe <- performance(pr, "tpr", "fpr")
roc.data <- rbind(roc.data, data.frame(Model='Support Vector Machine',fpr=unlist(pe#x.values), tpr=unlist(pe#y.values)))
q <- ggplot(data=roc.data, aes(x=fpr, y=tpr, group = Model, colour = Model))
q <- q + geom_line() + geom_abline(intercept = 0, slope = 1) + xlab("False Positive Rate (1-Specificity)") + ylab("True Positive Rate (Sensitivity)")
q + theme(axis.line = element_line(), axis.text=element_text(color='black'),
axis.title = element_text(colour = 'black'), legend.text=element_text(), legend.title=element_text())
#AUC of hold out samples
data.frame(Rf = auc.1, Svm = auc.2)
#AUC of testing set. Source is from Max Kuhns 2016 UseR! code here: https://github.com/topepo/useR2016
test_pred <- data.frame(Class = factor(test$Class, levels = c("noevent", "event"), ordered = TRUE))
test_pred$Rf <- predict(rf_mod, test, type = "prob")[, "event"]
test_pred$Svm <- predict(svm_mod, test, type = "prob")[, "event"]
get_auc <- function(pred, ref){
auc(roc(ref, pred, levels = rev(levels(ref))))
}
apply(test_pred[, -1], 2, get_auc, ref = test_pred$Class)
The results from the held-out samples and from the testing set are totally different (I know they will be different but by this much?).
Rf Svm
0.656044 0.5983193
Rf Svm
0.6326531 0.6453428
From the held-out samples one would choose the RF model but from the testing set one would pick the SVM model.
Which is the "correct" or "better" way to chose the model?
Am I making a big mistake somewhere or not understanding something correctly?
If I understand correctly then you have 3 labeled data sets:
Training
Hold-out CV sample from training
"Testing" CV sample
While, yes, under a hold-out sample CV strategy you normally choose your model based on the hold-out sample, you also don't normally also have a larger validation data sample.
Clearly, if both the hold-out and the Testing data sets are (a) labeled and (b) as close to the level of orthogonality as possible from from the training data, then you'd choose your model based on whichever has the larger sample size.
In your case it looks like what you're calling the hold-out sample is just the repeated CV resampling from training. That being the case you have even more reason to prefer the results from the Testing data set validation. See Steffen's related note on repeated CV.
In theory Random Forest's bagging has a inherit form of cross-validation through the OOB stats and the CV conducted within the training phase should give you some measure of validation. However, in practice it's common to observe a lack of orthogonality and an increased likelihood of overfitting since the samples are coming from the training data itself and may be reinforcing the mistake of overfitting for accuracy.
I can explain that theoretically as above to some extent, then beyond that I just have to tell you that empirically I've found that the performance results from the so-called CV and OOB error calculated from the training data can be highly misleading and the true hold-out (Testing) data that was never touched during training is the far better validation.
Your true hold-out sample is the Testing data set, since none of its data is using during training. Use those results.

Resources