Train on Specificity - r

I use caret to train my model (binary classification task). How can I make sure that train() doesn't tune on the Accuracy metric, but on the Specificity (TN / (TN + FP)) metric instead?
What works with Accuracy:
control <- trainControl(method="cv", number=10)
metric <- "Accuracy"
set.seed(7)
fit.svm <- train(target_var ~., data=dataset, method="svmRadial", metric=metric, trControl=control)
It doesn't work to change:
metric = "Specificity"
Does anyone know how to train the model to optimise the Specificity?
KR,
Arnand

Try setting the summaryFunction argument to twoClassSummary inside trainControl(), along with classProbs = TRUE, and then use metric = "Spec" inside train():
control <- trainControl(method="cv",
number=10,
summaryFunction = twoClassSummary,
classProbs = TRUE)
fit.svm <- train(target_var ~.,
data=dataset,
method="svmRadial",
metric="Spec",
trControl=control)
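With twoClassSummary the resampling results report ROC, Sens and Spec instead of Accuracy/Kappa, so you can confirm that specificity drove the tuning. A minimal check, assuming the fit above completed (note that classProbs = TRUE also requires the outcome factor levels to be valid R variable names, e.g. "yes"/"no" rather than "0"/"1"):
fit.svm$results   # sigma, C plus resampled ROC, Sens and Spec (and their SDs)
fit.svm$bestTune  # the sigma/C combination that maximised Spec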

Related

Setting ntree and mtry explicitly in Random Forest with caret

I am trying to explicitly pass the number of trees and mtry into the Random Forest algorithm with caret:
library(caret)
library(randomForest)
repGrid <- expand.grid(.mtry = c(4), .ntree = c(350))
controlRep <- trainControl(method = "cv", number = 5)
rfClassifierRep <- train(label ~ .,
                         data = overallDataset,
                         method = "rf",
                         metric = "Accuracy",
                         trControl = controlRep,
                         tuneGrid = repGrid)
I get this error:
Error: The tuning parameter grid should have columns mtry
I tried doing the more sensible way first:
rfClassifierRep <- train(label ~ .,
                         data = overallDataset,
                         method = "rf",
                         metric = "Accuracy",
                         trControl = controlRep,
                         ntree = 350,
                         mtry = 4,
                         tuneGrid = repGrid)
But that resulted in an error stating that I had too many hyperparameters. This is why I have tried to make a 1x1 grid.
ntree cannot be part of tuneGrid for Random Forest; only mtry can (see the detailed catalog of tuning parameters per model here). ntree can only be passed directly to train. Conversely, since mtry is being tuned, it cannot be passed as a separate argument to train.
All in all, the correct combination here is:
repGrid <- expand.grid(.mtry = c(4))  # no ntree
rfClassifierRep <- train(label ~ .,
                         data = overallDataset,
                         method = "rf",
                         metric = "Accuracy",
                         trControl = controlRep,
                         ntree = 350,
                         # no mtry
                         tuneGrid = repGrid)
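A quick way to confirm both settings ended up where intended (a minimal check, assuming the call above succeeded) is to inspect the final randomForest object that caret stores:
rfClassifierRep$finalModel$ntree  # should be 350
rfClassifierRep$finalModel$mtry   # should be 4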

Error : The tuning parameter grid should have columns mtry, SVM Regression

I'm trying to tune an SVM regression model using the caret package. Below is the code:
control <- trainControl(method="cv", number=5)
tunegrid <- expand.grid(.mtry=c(6:12), .ntree=c(500, 600, 700, 800, 900, 1000))
set.seed(2)
custom <- train(CRTOT_03~., data=train, method="rf", metric="rmse", tuneGrid=tunegrid, trControl=control)
summary(custom)
plot(custom)
and I'm getting the error:
Error : The tuning parameter grid should have columns mtry
You are using Random Forests, not Support Vector Machines. You are getting the error because .mtry is the only parameter caret lets you put in the tuning grid for Random Forests. The ntree parameter is set by passing ntree to train directly, e.g.:
control <- trainControl(method="cv", number=5)
tunegrid <- expand.grid(.mtry = 6:12)
set.seed(2)
custom <- train(CRTOT_03~.,
data=train, method="rf",
metric="rmse",
tuneGrid=tunegrid,
ntree = 1000,
trControl=control)
ntree is passed straight through to randomForest().
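If you do want to compare several ntree values, caret won't tune it for method = "rf", but you can loop over it manually and keep the rest of the setup identical. A minimal sketch, reusing the control and tunegrid objects defined above:
ntrees <- c(500, 750, 1000)
fits <- lapply(ntrees, function(nt) {
  set.seed(2)  # same seed so the CV folds are comparable across fits
  train(CRTOT_03 ~ ., data = train, method = "rf",
        metric = "RMSE", tuneGrid = tunegrid,
        ntree = nt, trControl = control)
})
# compare the best resampled RMSE for each ntree value
sapply(fits, function(f) min(f$results$RMSE))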

classProbs for Random Forest Regression in R with caret

I am performing a Random Forest regression in R, but would like to get something similar to the class probabilities that Random Forest classification provides. In the past, I have run Random Forest classification models and have been able to get a sense of how accurate the prediction is. Is there a similar concept for Random Forest regression?
For the following code:
fitControl <- trainControl(## 5-fold CV
                           method = "cv",
                           number = 5,
                           search = "grid",
                           classProbs = TRUE,
                           savePredictions = "final")
grid <- expand.grid(.mtry = seq(1, 4, 1))
set.seed(123)
rf_model <- train(Sepal.Length ~ .,
                  data = iris[-5],
                  method = "rf",
                  importance = TRUE,
                  trControl = fitControl,
                  metric = "MAE",
                  tuneGrid = grid,
                  ntree = 100)

R: Feature Selection with Cross Validation using Caret on Logistic Regression

I am currently learning how to implement logistic regression in R.
I have taken a data set and split it into a training and test set and wish to implement forward selection, backward selection and best subset selection using cross validation to select the best features.
I am using caret to implement cross-validation on the training data set and then testing the predictions on the test data.
I have seen the rfe control in caret and have also had a look at the documentation on the caret website, as well as following the links in the question How to use wrapper feature selection with algorithms in R?. It isn't apparent to me how to change the type of feature selection, as it seems to default to backward selection. Can anyone help me with my workflow? Below is a reproducible example:
library("caret")
# Create an Example Dataset from German Credit Card Dataset
mydf <- GermanCredit
# Create Train and Test Sets 80/20 split
trainIndex <- createDataPartition(mydf$Class, p = .8,
                                  list = FALSE,
                                  times = 1)
train <- mydf[ trainIndex, ]
test <- mydf[-trainIndex, ]
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     savePredictions = TRUE)
mod_fit <- train(Class ~ ., data = train,
                 method = "glm",
                 family = "binomial",
                 trControl = ctrl,
                 tuneLength = 5)
# Check out Variable Importance
varImp(mod_fit)
summary(mod_fit)
# Test the new model on new and unseen Data for reproducibility
pred <- predict(mod_fit, newdata = test)
accuracy <- table(pred, test$Class)
sum(diag(accuracy)) / sum(accuracy)
You can specify this directly in the train() call for mod_fit. For backward stepwise selection the code below is sufficient:
trControl <- trainControl(method="cv",
number = 5,
savePredictions = T,
classProbs = T,
summaryFunction = twoClassSummary)
caret_model <- train(Class~.,
train,
method="glmStepAIC", # This method fits best model stepwise.
family="binomial",
direction="backward", # Direction
trControl=trControl)
Note that in trControl:
method = "cv",                      # no need for "repeatedcv" here; number defines the k in k-fold CV
classProbs = TRUE,
summaryFunction = twoClassSummary   # reports the ROC, sensitivity and specificity of the chosen model
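Since you also want to evaluate the selected model on the held-out data, the same prediction step from your example works unchanged. A minimal sketch, reusing the test split created above:
pred <- predict(caret_model, newdata = test)
confusionMatrix(pred, test$Class)   # caret's confusion matrix: accuracy, sensitivity, specificity, etc.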

Fitting models with class probabilities with caret in R?

I'm working on making some predictions with stacked ML algorithms in R, and I have successfully prepared the sub-models (see working code below):
library(caret)
library(caretEnsemble)  # needed for caretList() and caretStack()
trainSet <- read.csv("train.csv")
testSet <- read.csv("test.csv")
trainSet$Survived <- as.factor(trainSet$Survived)
algorithmList <- c('lda', 'rpart', 'glm', 'knn', 'svmRadial')
# create submodels
control <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions=TRUE, classProbs=TRUE)
set.seed(seed)
models <- caretList(Survived~ Pclass + Sex + Fare, data=trainSet, trControl=control, methodList=algorithmList)
results <- resamples(models)
summary(results)
dotplot(results)
but when I actually go to stack the sub-models:
# stack using glm
stackControl <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions=TRUE, classProbs=TRUE)
set.seed(seed)
stack.glm <- caretStack(models, method="glm", metric="Accuracy", trControl=stackControl)
print(stack.glm)
It gives me the error message:
Error in check_caretList_model_types(list_of_models) :
The following models were fit by caret::train with no class probabilities: lda, rpart, glm, knn, svmRadial.
Please re-fit them with trainControl(classProbs=TRUE)
But, as you can see, I believe I actually did fit them with classProbs=TRUE (see my 'control' variable) and don't understand why I'm getting this error message! Any ideas?
