How to calculate AUC under twoClassSummary? (R / caret)

Here is my code:
train <- data.frame(***contains label, feature group 1 and feature group 2***)
formula <- label ~ feature group 1
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 5,
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)
fit <- train(formula,
             data = train,
             method = "glm",
             metric = "ROC",
             trControl = ctrl,
             na.action = na.omit)
pred <- predict(fit, train)
My question is: how do I calculate the AUC of pred?
I've tried prSummary, ROCR and pROC, but none of them worked; it seems that I cannot calculate the AUC when obs and pred have exactly the same levels.
I am also wondering: if I can train with AUC as the metric, why can't I show the AUC afterwards?
p.s.
> levels(train$label)
[1] "classA" "classB"
> levels(as.factor(pred))
[1] "classA" "classB"
By the way, what I am doing is fitting multiple algorithms with caret and ranking them by AUC, so that I can choose the optimal one (based on AUC).
Reproducible example:
train set: iris
feature g1: first 2 features
feature g2: last 2 features
seed: 123

This could be a possible answer, but I am not so sure it is right; please tell me if I am wrong.
response = as.factor(as.numeric(train$label))
predictor = as.vector(as.numeric(pred))
library(pROC)
result = as.numeric(roc(response, predictor)$auc)
By the way, since pROC runs very slowly, can anyone help me convert this to the ROCR package? Thanks a lot :)
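A hedged sketch of one way to do this (assuming the fit and train objects above, with label levels "classA"/"classB"): the AUC is normally computed from the predicted class probabilities rather than from hard class predictions, so a roc() built on as.numeric(pred) reflects only a single threshold. With probabilities, pROC and ROCR compute the same quantity, and the cross-validated AUC is already stored in fit$results$ROC.
## Class probabilities for the second level ("classB"); predict(fit, train) alone returns hard classes
probs <- predict(fit, train, type = "prob")[, "classB"]
## pROC: AUC on the training data
library(pROC)
roc_obj <- roc(response = train$label, predictor = probs, levels = c("classA", "classB"))
as.numeric(auc(roc_obj))
## ROCR equivalent of the same calculation
library(ROCR)
pred_obj <- prediction(predictions = probs, labels = train$label)
performance(pred_obj, measure = "auc")@y.values[[1]]
## The cross-validated AUC that train() optimised:
fit$results$ROC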

Related

How to fix "The metric "Accuracy" was not in the result set. AUC will be used instead"

I am trying to run a logistic regression on a classification problem. The dependent variable "SUBSCRIBEDYN" is a factor with two levels ("Yes" and "No").
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              verboseIter = FALSE,
                              classProbs = TRUE,
                              summaryFunction = prSummary)
set.seed(13)
simple.logistic.regression <- caret::train(SUBSCRIBEDYN ~ .,
                                           data = train_data,
                                           method = "glm",
                                           metric = "Accuracy",
                                           trControl = train.control)
simple.logistic.regression
However, it does not accept Accuracy as a metric
"The metric "Accuracy" was not in the result set. AUC will be used instead"
For a classification model with two levels, you should use metric = "ROC"; metric = "Accuracy" is used for multiple classes. However, after training the model you can retrieve the accuracy from the confusion matrix, for example using the function confusionMatrix().
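A minimal sketch of the suggested fix (using the train_data and SUBSCRIBEDYN from the question above): swap prSummary for twoClassSummary so that "ROC" is available as a metric, then recover accuracy afterwards from a confusion matrix.
library(caret)
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              classProbs = TRUE,
                              summaryFunction = twoClassSummary)  # reports ROC, Sens, Spec
set.seed(13)
simple.logistic.regression <- caret::train(SUBSCRIBEDYN ~ .,
                                           data = train_data,
                                           method = "glm",
                                           metric = "ROC",  # optimise AUC instead of Accuracy
                                           trControl = train.control)
## Accuracy can still be read off a confusion matrix afterwards
confusionMatrix(predict(simple.logistic.regression, train_data),
                train_data$SUBSCRIBEDYN)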

Additional metrics in caret - PPV, sensitivity, specificity

I used caret for logistic regression in R:
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     savePredictions = TRUE)
mod_fit <- train(Y ~ ., data = df, method = "glm", family = "binomial",
                 trControl = ctrl)
print(mod_fit)
The default metrics printed are accuracy and Cohen's kappa. I want to extract other metrics such as sensitivity, specificity and positive predictive value, but I cannot find an easy way to do it. The final model is provided, but it is trained on all the data (as far as I can tell from the documentation), so I cannot use it for predicting anew.
Confusion matrix calculates all required parameters, but passing it as a summary function doesn't work:
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
savePredictions = TRUE, summaryFunction = confusionMatrix)
mod_fit <- train(Y ~ ., data=df, method="glm", family="binomial",
trControl = ctrl)
Error: `data` and `reference` should be factors with the same levels.
13. stop("`data` and `reference` should be factors with the same levels.",
         call. = FALSE)
12. confusionMatrix.default(testOutput, lev, method)
11. ctrl$summaryFunction(testOutput, lev, method)
Is there a way to extract this information in addition to accuracy and kappa, or somehow find it in the train_object returned by the caret train?
Thanks in advance!
Caret already has summary functions to output all the metrics you mention:
defaultSummary outputs Accuracy and Kappa
twoClassSummary outputs AUC (area under the ROC curve - see last line of answer), sensitivity and specificity
prSummary outputs precision and recall
In order to get combined metrics you can write your own summary function that combines the outputs of these three:
library(caret)
MySummary <- function(data, lev = NULL, model = NULL) {
  a1 <- defaultSummary(data, lev, model)
  b1 <- twoClassSummary(data, lev, model)
  c1 <- prSummary(data, lev, model)
  out <- c(a1, b1, c1)
  out
}
Let's try it on the Sonar data set:
library(mlbench)
data("Sonar")
When defining the train control it is important to set classProbs = TRUE, since some of these metrics (ROC and prAUC) cannot be calculated from the predicted class alone; they require the predicted probabilities.
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     savePredictions = TRUE,
                     summaryFunction = MySummary,
                     classProbs = TRUE)
Now fit the model of your choice:
mod_fit <- train(Class ~ .,
                 data = Sonar,
                 method = "rf",
                 trControl = ctrl)
mod_fit$results
#output
mtry Accuracy Kappa ROC Sens Spec AUC Precision Recall F AccuracySD KappaSD
1 2 0.8364069 0.6666364 0.9454798 0.9280303 0.7333333 0.8683726 0.8121087 0.9280303 0.8621526 0.10570484 0.2162077
2 31 0.8179870 0.6307880 0.9208081 0.8840909 0.7411111 0.8450612 0.8074942 0.8840909 0.8374326 0.06076222 0.1221844
3 60 0.8034632 0.6017979 0.9049242 0.8659091 0.7311111 0.8332068 0.7966889 0.8659091 0.8229330 0.06795824 0.1369086
ROCSD SensSD SpecSD AUCSD PrecisionSD RecallSD FSD
1 0.04393947 0.05727927 0.1948585 0.03410854 0.12717667 0.05727927 0.08482963
2 0.04995650 0.11053858 0.1398657 0.04694993 0.09075782 0.11053858 0.05772388
3 0.04965178 0.12047598 0.1387580 0.04820979 0.08951728 0.12047598 0.06715206
In this output, ROC is in fact the area under the ROC curve (usually called AUC), while AUC is the area under the precision-recall curve across all cutoffs.
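A small follow-up sketch (assuming the mod_fit object trained above): the row of mod_fit$results that matches the selected tuning parameters can be pulled out directly, since mod_fit$bestTune stores the winning mtry.
## Keep only the row of results that corresponds to the chosen mtry
best <- merge(mod_fit$results, mod_fit$bestTune)
best[, c("Accuracy", "Kappa", "ROC", "Sens", "Spec", "AUC", "Precision", "Recall", "F")]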

Plot ROC curve for bootstrapped caret model

I have a model like the following:
library(mlbench)
data(Sonar)
library(caret)
set.seed(998)
my_data <- Sonar
fitControl <-
trainControl(
method = "boot632",
number = 10,
classProbs = T,
savePredictions = T,
summaryFunction = twoClassSummary
)
model <- train(
Class ~ .,
data = my_data,
method = "xgbTree",
trControl = fitControl,
metric = "ROC"
)
How do I plot the ROC curve for this model? As I understand it, the probabilities must be saved (which I did in trainControl), but because of the random sampling which bootstrapping uses to generate a 'test' set, I am not sure how caret calculates the ROC value and how to generate a curve.
To isolate the class probabilities for the best performing parameters, I am doing:
for (a in 1:length(model$bestTune)) {
  model$pred <-
    model$pred[model$pred[, paste(colnames(model$bestTune)[a])] == model$bestTune[1, a], ]
}
Please advise.
Thanks!
First an explanation:
If you are not going to check how each possible hyperparameter combination predicted on each sample in each resample, you can set savePredictions = "final" in trainControl to save space:
fitControl <-
trainControl(
method = "boot632",
number = 10,
classProbs = T,
savePredictions = "final",
summaryFunction = twoClassSummary
)
after running the model:
model <- train(
Class ~ .,
data = my_data,
method = "xgbTree",
trControl = fitControl,
metric = "ROC"
)
The results of interest are in model$pred.
Here you can check how many samples were tested in each resample:
nrow(model$pred[model$pred$Resample == "Resample01",])
#83
caret always provides predictions from rows that were not used to build the model.
nrow(my_data) #208
83 test samples out of 208 rows is plausible for the boot632 resampling.
Now to build the ROC curve. You have several options here:
- average the probability for each sample and use that (this is usual for CV, since all samples are held out the same number of times, but it can also be done with bootstrapping);
- plot everything as is, without averaging;
- plot a ROC curve for each resample.
I will show you the second approach; a sketch of the first one appears at the end of this answer.
Create a data frame of class probabilities and true outcomes:
for_lift = data.frame(Class = model$pred$obs, xgbTree = model$pred$R)
Plot the ROC curve:
pROC::plot.roc(pROC::roc(response = for_lift$Class,
                         predictor = for_lift$xgbTree,
                         levels = c("M", "R")),
               lwd = 1.5)
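If only the AUC value of these pooled predictions is needed (a small aside using the same for_lift data frame), pROC can return it directly:
pROC::auc(pROC::roc(response = for_lift$Class,
                    predictor = for_lift$xgbTree,
                    levels = c("M", "R")))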
You can also do this with ggplot2. To do so, I find it easiest to make a lift object using the caret function lift, specifying which class the probability column refers to:
lift_obj = lift(Class ~ xgbTree, data = for_lift, class = "R")
library(ggplot2)
ggplot(lift_obj$data) +
  geom_line(aes(1 - Sp, Sn, color = liftModelVar)) +
  scale_color_discrete(guide = guide_legend(title = "method"))
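And a sketch of the first option mentioned above (averaging the "R" class probability per row across resamples before computing a single curve); rowIndex in model$pred identifies the original row of my_data:
library(dplyr)
avg_pred <- model$pred %>%
  group_by(rowIndex) %>%
  summarise(obs = first(obs), R = mean(R))
pROC::plot.roc(pROC::roc(response = avg_pred$obs,
                         predictor = avg_pred$R,
                         levels = c("M", "R")),
               lwd = 1.5)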

How to build a custom model in caret to perform a PLS-[classifier] two-step classification model?

This question is a continuation of the same thread here. Below is a minimal working example taken from this book:
Wehrens R. Chemometrics with R multivariate data analysis in the
natural sciences and life sciences. 1st edition. Heidelberg; New York:
Springer. 2011. (page 250).
The example was taken from this book and its package ChemometricsWithR. It highlighted some pitfalls when modeling using cross-validation techniques.
The Aim:
A cross-validated methodology, using the same set of repeated CV folds, to perform the well-known strategy of PLS followed by a classifier, typically LDA or cousins such as logistic regression, SVM, C5.0 and CART, in the spirit of the caret package. So PLS would be needed every time before calling the waiting classifier, in order to classify the PLS score space instead of the observations themselves. The nearest approach in the caret package is doing PCA as a pre-processing step before modeling with any classifier. Below is a PLS-LDA procedure with only a single cross-validation split to test the performance of the classifier; there was no 10-fold CV or any repetition. The code below was taken from the mentioned book, but with some corrections; otherwise it throws an error:
library(ChemometricsWithR)
data(prostate)
prostate.clmat <- classvec2classmat(prostate.type) # convert Y to a dummy var
odd <- seq(1, length(prostate.type), by = 2) # training
even <- seq(2, length(prostate.type), by = 2) # holdout test
prostate.pls <- plsr(prostate.clmat ~ prostate, ncomp = 16, validation = "CV", subset=odd)
Xtst <- scale(prostate[even,], center = colMeans(prostate[odd,]), scale = apply(prostate[odd,],2,sd))
tst.scores <- Xtst %*% prostate.pls$projection # scores for the waiting trained LDA to test
prostate.ldapls <- lda(scores(prostate.pls)[,1:16],prostate.type[odd]) # LDA for scores
table(predict(prostate.ldapls, new = tst.scores[,1:16])$class, prostate.type[even])
predictionTest <- predict(prostate.ldapls, new = tst.scores[,1:16])$class
library(caret)
confusionMatrix(data = predictionTest, reference= prostate.type[even]) # from caret
Output:
Confusion Matrix and Statistics
Reference
Prediction bph control pca
bph 4 1 9
control 1 35 7
pca 34 4 68
Overall Statistics
Accuracy : 0.6564
95% CI : (0.5781, 0.7289)
No Information Rate : 0.5153
P-Value [Acc > NIR] : 0.0001874
Kappa : 0.4072
Mcnemar's Test P-Value : 0.0015385
Statistics by Class:
Class: bph Class: control Class: pca
Sensitivity 0.10256 0.8750 0.8095
Specificity 0.91935 0.9350 0.5190
Pos Pred Value 0.28571 0.8140 0.6415
Neg Pred Value 0.76510 0.9583 0.7193
Prevalence 0.23926 0.2454 0.5153
Detection Rate 0.02454 0.2147 0.4172
Detection Prevalence 0.08589 0.2638 0.6503
Balanced Accuracy 0.51096 0.9050 0.6643
However, the confusion matrix didn't match the one in the book; in any case, the code in the book broke, while this version worked for me.
Notes:
Although this was only a single CV split, the intention is to agree on this methodology first: the sd and mean of the training set were applied to the test set, which was then transformed into PLS scores based on a specific number of components (ncomp). I want this to happen in every round of the CV in caret. If the methodology in this code is correct, then it could serve as a good starting point for a minimal working example when modifying the code of the caret package.
Side Notes:
Scaling and centering can get very messy. I think some of the PLS functions in R do scaling internally, with or without centering (I am not sure), so building a custom model in caret should be handled with care to avoid either missing or duplicated scaling/centering (I am on my guard with these things).
Perils of multiple centering/scaling
The code below just shows how multiple centering/scaling can change the data; only centering is shown here, but the same problem applies to scaling too.
set.seed(1)
x <- rnorm(200, 2, 1)
xCentered1 <- scale(x, center=TRUE, scale=FALSE)
xCentered2 <- scale(xCentered1, center=TRUE, scale=FALSE)
xCentered3 <- scale(xCentered2, center=TRUE, scale=FALSE)
sapply (list(xNotCentered= x, xCentered1 = xCentered1, xCentered2 = xCentered2, xCentered3 = xCentered3), mean)
Output:
xNotCentered xCentered1 xCentered2 xCentered3
2.035540e+00 1.897798e-16 -5.603699e-18 -5.332377e-18
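A related sketch on avoiding the double-transformation pitfall: the test set should be centred/scaled with the training statistics only, which scale() exposes as attributes, so the transformation is applied exactly once.
set.seed(1)
xTrain <- rnorm(200, 2, 1)
xTest  <- rnorm(50, 2, 1)
xTrainScaled <- scale(xTrain, center = TRUE, scale = TRUE)
xTestScaled  <- scale(xTest,
                      center = attr(xTrainScaled, "scaled:center"),
                      scale  = attr(xTrainScaled, "scaled:scale"))
mean(xTestScaled)  # close to, but not exactly, zero: centred with the training mean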
Please drop a comment if I am missing something somewhere along the way. Thanks.
If you want to fit these types of models with caret, you would need to use the latest version on CRAN. The last update was created so that people can use non-standard models as they see fit.
My approach below is to jointly fit the PLS and other model (I used random forest in the example below) and tune them at the same time. So for each fold, a 2D grid of ncomp and mtry is used.
The "trick" is to attached the PLS loadings to the random forest object so that they can be used during prediction time. Here is the code that defines the model (classification only):
modelInfo <- list(label = "PLS-RF",
library = c("pls", "randomForest"),
type = "Classification",
parameters = data.frame(parameter = c('ncomp', 'mtry'),
class = c("numeric", 'numeric'),
label = c('#Components',
'#Randomly Selected Predictors')),
grid = function(x, y, len = NULL) {
grid <- expand.grid(ncomp = seq(1, min(ncol(x) - 1, len), by = 1),
mtry = 1:len)
grid <- subset(grid, mtry <= ncomp)
},
loop = NULL,
fit = function(x, y, wts, param, lev, last, classProbs, ...) {
## First fit the pls model, generate the training set scores,
## then attach what is needed to the random forest object to
## be used later
pre <- plsda(x, y, ncomp = param$ncomp)
scores <- pls:::predict.mvr(pre, x, type = "scores")
mod <- randomForest(scores, y, mtry = param$mtry, ...)
mod$projection <- pre$projection
mod
},
predict = function(modelFit, newdata, submodels = NULL) {
scores <- as.matrix(newdata) %*% modelFit$projection
predict(modelFit, scores)
},
prob = NULL,
varImp = NULL,
predictors = function(x, ...) rownames(x$projection),
levels = function(x) x$obsLevels,
sort = function(x) x[order(x[,1]),])
and here is the call to train:
library(ChemometricsWithR)
data(prostate)
set.seed(1)
inTrain <- createDataPartition(prostate.type, p = .90)
trainX <- prostate[inTrain[[1]], ]
trainY <- prostate.type[inTrain[[1]]]
testX <- prostate[-inTrain[[1]], ]
testY <- prostate.type[-inTrain[[1]]]
## These will take a while for these data
set.seed(2)
plsrf <- train(trainX, trainY, method = modelInfo,
               preProc = c("center", "scale"),
               tuneLength = 10,
               trControl = trainControl(method = "repeatedcv",
                                        repeats = 5))
## How does random forest do on its own?
set.seed(2)
rfOnly <- train(trainX, trainY, method = "rf",
                tuneLength = 10,
                trControl = trainControl(method = "repeatedcv",
                                         repeats = 5))
Just for kicks, I got:
> getTrainPerf(plsrf)
TrainAccuracy TrainKappa method
1 0.7940423 0.65879 custom
> getTrainPerf(rfOnly)
TrainAccuracy TrainKappa method
1 0.7794082 0.6205322 rf
and
> postResample(predict(plsrf, testX), testY)
Accuracy Kappa
0.7741935 0.6226087
> postResample(predict(rfOnly, testX), testY)
Accuracy Kappa
0.9032258 0.8353982
Max
Based on Max's valuable comments, I felt the need to bring in the iris data set as a referee, since it is famous for classification and, more importantly, its Species outcome has more than two classes, which makes it a good data set for testing the PLS-LDA custom model in caret:
data(iris)
names(iris)
head(iris)
dim(iris) # 150x5
set.seed(1)
inTrain <- createDataPartition(y = iris$Species,
## the outcome data are needed
p = .75,
## The percentage of data in the
## training set
list = FALSE)
## The format of the results
## The output is a set of integers for the rows of Iris
## that belong in the training set.
training <- iris[ inTrain,] # 114
testing <- iris[-inTrain,] # 36
ctrl <- trainControl(method = "repeatedcv",
repeats = 5,
classProbs = TRUE)
set.seed(2)
plsFitIris <- train(Species ~ .,
data = training,
method = "pls",
tuneLength = 4,
trControl = ctrl,
preProc = c("center", "scale"))
plsFitIris
plot(plsFitIris)
set.seed(2)
plsldaFitIris <- train(Species ~ .,
data = training,
method = modelInfo,
tuneLength = 4,
trControl = ctrl,
preProc = c("center", "scale"))
plsldaFitIris
plot(plsldaFitIris)
Now comparing the two models:
getTrainPerf(plsFitIris)
TrainAccuracy TrainKappa method
1 0.8574242 0.7852462 pls
getTrainPerf(plsldaFitIris)
TrainAccuracy TrainKappa method
1 0.975303 0.9628179 custom
postResample(predict(plsFitIris, testing), testing$Species)
Accuracy Kappa
0.750 0.625
postResample(predict(plsldaFitIris, testing), testing$Species)
Accuracy Kappa
0.9444444 0.9166667
So, finally, there was the expected difference and an improvement in the metrics. This supports Max's point that, for two-class problems, the Bayes probabilistic approach of the plsda function makes both models lead to the same results.
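A quick follow-up sketch (assuming the objects fitted above): per-class sensitivity, specificity and PPV on the iris hold-out set can be inspected with a full confusion matrix for the custom model.
confusionMatrix(predict(plsldaFitIris, testing), testing$Species)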
You need to wrap the CV around both PLS and LDA.
Yes, both plsr and lda center the data in their own way.
I had a closer look at caret::preProcess(): as it is defined now, you will not be able to use PLS as a preprocessing method, because it is supervised, while caret::preProcess() uses unsupervised methods only (there is no way to hand over the dependent variable). This would probably make patching rather difficult.
So inside the caret framework, you'll need to go for a custom model.
If the scenario is to build a custom model of the PLS-LDA type, according to the code kindly provided by Max (the maintainer of caret), then something is not correct in this code, but I could not figure out what. I used the same Sonar data set as in the caret vignette and tried to reproduce the result once using method = "pls" and once using the custom PLS-LDA model below; the results were exactly identical, even to the last digit, which is nonsensical. For benchmarking, one needs a well-known data set (I think a cross-validated PLS-LDA on the iris data set would fit here, as it is famous for this type of analysis and there should be a cross-validated treatment of it somewhere). Everything should be kept the same (the set.seed(xxx) and the number of K-fold CV repetitions) except the code in question, so as to compare fairly and judge the code below:
modelInfo <- list(label = "PLS-LDA",
library = c("pls", "MASS"),
type = "Classification",
parameters = data.frame(parameter = c("ncomp"),
class = c("numeric"),
label = c("#Components")),
grid = function(x, y, len = NULL) {
grid <- expand.grid(ncomp = seq(1, min(ncol(x) - 1, len), by = 1))
},
loop = NULL,
fit = function(x, y, wts, param, lev, last, classProbs, ...) {
## First fit the pls model, generate the training set scores,
## then attach what is needed to the lda object to
## be used later
pre <- plsda(x, y, ncomp = param$ncomp)
scores <- pls:::predict.mvr(pre, x, type = "scores")
mod <- lda(scores, y, ...)
mod$projection <- pre$projection
mod
},
predict = function(modelFit, newdata, submodels = NULL) {
scores <- as.matrix(newdata) %*% modelFit$projection
predict(modelFit, scores)$class
},
prob = function(modelFit, newdata, submodels = NULL) {
scores <- as.matrix(newdata) %*% modelFit$projection
predict(modelFit, scores)$posterior
},
varImp = NULL,
predictors = function(x, ...) rownames(x$projection),
levels = function(x) x$obsLevels,
sort = function(x) x[order(x[,1]),])
Based on Zach's request, the code below uses method = "pls" in caret, exactly the same concrete example as in the caret vignette on CRAN:
library(mlbench) # data set from here
data(Sonar)
dim(Sonar) # 208x60
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class,
## the outcome data are needed
p = .75,
## The percentage of data in the
## training set
list = FALSE)
## The format of the results
## The output is a set of integers for the rows of Sonar
## that belong in the training set.
training <- Sonar[ inTrain,] #157
testing <- Sonar[-inTrain,] # 51
ctrl <- trainControl(method = "repeatedcv",
repeats = 3,
classProbs = TRUE,
summaryFunction = twoClassSummary)
set.seed(108)
plsFitSon <- train(Class ~ .,
data = training,
method = "pls",
tuneLength = 15,
trControl = ctrl,
metric = "ROC",
preProc = c("center", "scale"))
plsFitSon
plot(plsFitSon) # might be slightly different from the vignette due to randomness
Now, the code below is a pilot run classifying the Sonar data with the custom PLS-LDA model in question; it is expected to produce numbers that differ from those obtained with PLS alone:
set.seed(108)
plsldaFitSon <- train(Class ~ .,
data = training,
method = modelInfo,
tuneLength = 15,
trControl = ctrl,
metric = "ROC",
preProc = c("center", "scale"))
Now comparing the results between the two models:
getTrainPerf(plsFitSon)
TrainROC TrainSens TrainSpec method
1 0.8741154 0.7638889 0.8452381 pls
getTrainPerf(plsldaFitSon)
TrainROC TrainSens TrainSpec method
1 0.8741154 0.7638889 0.8452381 custom
postResample(predict(plsFitSon, testing), testing$Class)
Accuracy Kappa
0.745098 0.491954
postResample(predict(plsldaFitSon, testing), testing$Class)
Accuracy Kappa
0.745098 0.491954
So the results are exactly the same, which cannot be right. It is as if the LDA model were never added.
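A small diagnostic sketch that may help narrow this down (assuming the two fitted objects above): check which class of model each final fit actually stored.
class(plsFitSon$finalModel)     # the object stored by the stock "pls" method
class(plsldaFitSon$finalModel)  # should be of class "lda" if the custom fit function was actually used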
