classProbs for Random Forest Regression in R with caret

I am performing a Random Forest regression in R, but would like to get something similar to the class probabilities that Random Forest classification provides. In the past, I have run Random Forest classification models and have been able to see how accurate each prediction is. Is there a similar concept for Random Forest regression?
For the following code:
fitControl <- trainControl(## 5-fold CV
                           method = "cv",
                           number = 5,
                           search = "grid",
                           classProbs = TRUE,
                           savePredictions = "final")
grid <- expand.grid(mtry = seq(1, 4, 1))
set.seed(123)
rf_model <- train(Sepal.Length ~ .,
                  data = iris[, -5],
                  method = "rf",
                  importance = TRUE,
                  trControl = fitControl,
                  metric = "MAE",
                  tuneGrid = grid,
                  ntree = 100)
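As far as I know there is no direct equivalent of class probabilities in regression: caret reports cross-validated error metrics (RMSE, Rsquared, MAE) in rf_model$results instead. The closest analogue I can suggest is the spread of the individual tree predictions, which randomForest exposes via predict(..., predict.all = TRUE). A minimal sketch, assuming the model above and that the predictors are passed in the same form caret used:
library(randomForest)

# per-tree predictions for each row: $aggregate is the usual point prediction,
# $individual is an n-by-ntree matrix of single-tree predictions
per_tree <- predict(rf_model$finalModel,
                    newdata = iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")],
                    predict.all = TRUE)

pred_mean <- per_tree$aggregate                  # point prediction
pred_sd   <- apply(per_tree$individual, 1, sd)   # tree-to-tree spread, a rough uncertainty

head(data.frame(pred_mean, pred_sd))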

Related

How to check if Random Forest model performed better than chance? (T-Test)

I perform 15-fold cross-validation on a random forest, like this:
trainControl <- trainControl(method = "cv", number = 15, search = "random",
                             savePredictions = TRUE)
tuningGrid <- expand.grid(mtry = c(2, 4, 6, 8))
model <- train(Classification ~ ., data = data, method = "rf",
               trControl = trainControl, tuneGrid = tuningGrid, ntree = 25)
model
How can I use a t-test to show that the model performs better than chance, based on the 15 fold accuracies?
I don't know what to put for x in t.test().
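A minimal sketch of one way to fill in x, assuming the default Accuracy/Kappa summary so that model$resample holds one Accuracy value per fold, and taking the no-information rate (the frequency of the largest class) as the chance baseline:
fold_acc <- model$resample$Accuracy                    # 15 held-out accuracies, one per fold

chance <- max(prop.table(table(data$Classification)))  # no-information rate as the chance level

# one-sample, one-sided t-test: are the fold accuracies greater than chance?
t.test(fold_acc, mu = chance, alternative = "greater")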

How do I get the training accuracies for each fold in k-fold cross validation in R?

I would like to evaluate whether the logistic regression model I created is overfit. I'd like to compare the accuracy on each training fold to the accuracy on the corresponding test fold, but I don't know how to view these in R. This is the k-fold cross-validation code:
library(caret)
levels(habitatdata$outcome) <- c("absent", "present")   # rename factor levels
set.seed(12)
cvIndex <- createFolds(factor(habitatdata$outcome), 5, returnTrain = TRUE)   # create stratified folds
ctrlspecs <- trainControl(index = cvIndex,
                          method = "cv",
                          number = 5,
                          savePredictions = "all",
                          classProbs = TRUE)   # specify training methods
set.seed(123)
model1 <- train(outcome ~ ist + hwt,
                data = habitatdata,
                method = "glm",
                family = binomial,
                trControl = ctrlspecs)   # specify model
How do I view the training accuracies of each fold?
Look at model1$resample - it gives you a table with the held-out Accuracy (and Kappa) for each fold.
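For example (a small sketch; the column names are the ones caret stores when savePredictions = "all"):
model1$resample   # one row per fold with the held-out Accuracy and Kappa

# the same per-fold accuracy recomputed from the saved hold-out predictions, as a cross-check
aggregate(correct ~ Resample,
          data = transform(model1$pred, correct = pred == obs),
          FUN = mean)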

How to fix "The metric "Accuracy" was not in the result set. AUC will be used instead"

I am trying to run a logistic regression on a classification problem.
The dependent variable SUBSCRIBEDYN is a factor with two levels ("Yes" and "No").
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              verboseIter = FALSE,
                              classProbs = TRUE,
                              summaryFunction = prSummary)
set.seed(13)
simple.logistic.regression <- caret::train(SUBSCRIBEDYN ~ .,
                                           data = train_data,
                                           method = "glm",
                                           metric = "Accuracy",
                                           trControl = train.control)
simple.logistic.regression
However, it does not accept Accuracy as a metric
"The metric "Accuracy" was not in the result set. AUC will be used instead"
The warning appears because summaryFunction = prSummary computes AUC, Precision, Recall and F, so "Accuracy" is not among the available metrics. Either train on a metric that prSummary produces (metric = "AUC"), or switch to summaryFunction = twoClassSummary and metric = "ROC"; if you want Accuracy itself, drop the summaryFunction argument so caret's default summary (Accuracy and Kappa) is used. After training you can always retrieve the accuracy from the cross-validated predictions, for example with confusionMatrix().
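A minimal sketch of the first option, keeping prSummary but training on a metric it computes (train_data and the variable names are taken from the question; savePredictions is added so the hold-out predictions are available afterwards):
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              classProbs = TRUE,
                              savePredictions = "final",
                              summaryFunction = prSummary)
set.seed(13)
simple.logistic.regression <- caret::train(SUBSCRIBEDYN ~ .,
                                           data = train_data,
                                           method = "glm",
                                           metric = "AUC",   # a metric prSummary actually returns
                                           trControl = train.control)

# accuracy retrieved afterwards from the held-out predictions
caret::confusionMatrix(simple.logistic.regression$pred$pred,
                       simple.logistic.regression$pred$obs)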

How to compute AUC under ROC in R (caret, random forest, svm)

I am using the random forest and support vector machine methods from the caret package in R. I want to calculate the AUC under the ROC curve for both cases; however, I do not know how to do it in this particular case. My outcome is coded as 0 and 1. Here is an example of the code I am using:
set.seed(123)
cvCtrl <- trainControl(method = "cv", number = 10)
rf_model <- train(readm30 ~ ., data = train, method = "rf", trControl = cvCtrl)
Do you want to train the model with ROC? Then you need the following:
For trainControl:
control <- trainControl(method = "cv", number = 10,
                        savePredictions = "final",
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)
And in train:
train(outcome ~ .,
      data = data,
      method = method,
      trControl = control,
      metric = "ROC")
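One caveat for this particular question: classProbs = TRUE requires factor levels that are valid R variable names, so an outcome coded 0/1 has to be relabelled first. A hedged sketch putting the pieces together (the "No"/"Yes" labels and the rf_model name are my own choices):
train$readm30 <- factor(train$readm30, levels = c(0, 1), labels = c("No", "Yes"))

control <- trainControl(method = "cv", number = 10,
                        savePredictions = "final",
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)

rf_model <- train(readm30 ~ ., data = train, method = "rf",
                  trControl = control, metric = "ROC")

rf_model$results   # cross-validated ROC (AUC), Sens and Spec

# AUC recomputed from the held-out predictions with pROC
library(pROC)
auc(roc(response = rf_model$pred$obs,
        predictor = rf_model$pred$Yes,
        levels = c("No", "Yes")))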

Plot ROC curve for bootstrapped caret model

I have a model like the following:
library(mlbench)
data(Sonar)
library(caret)
set.seed(998)
my_data <- Sonar
fitControl <- trainControl(method = "boot632",
                           number = 10,
                           classProbs = TRUE,
                           savePredictions = TRUE,
                           summaryFunction = twoClassSummary)
model <- train(Class ~ .,
               data = my_data,
               method = "xgbTree",
               trControl = fitControl,
               metric = "ROC")
How do I plot the ROC curve for this model? As I understand it, the probabilities must be saved (which I did in trainControl), but because of the random sampling which bootstrapping uses to generate a 'test' set, I am not sure how caret calculates the ROC value and how to generate a curve.
To isolate the class probabilities for the best performing parameters, I am doing:
for (a in 1:length(model$bestTune)) {
  model$pred <- model$pred[model$pred[, colnames(model$bestTune)[a]] == model$bestTune[1, a], ]
}
Please advise.
Thanks!
First an explanation:
If you are not going to check how each possible hyper-parameter combination predicted on each sample in each re-sample, you can set savePredictions = "final" in trainControl to save space:
fitControl <- trainControl(method = "boot632",
                           number = 10,
                           classProbs = TRUE,
                           savePredictions = "final",
                           summaryFunction = twoClassSummary)
after running the model:
model <- train(Class ~ .,
               data = my_data,
               method = "xgbTree",
               trControl = fitControl,
               metric = "ROC")
The results of interest are in model$pred. Here you can check how many samples were held out in each re-sample:
nrow(model$pred[model$pred$Resample == "Resample01", ])
#83
caret always provides predictions for the rows that were not used to build the model:
nrow(my_data)
#208
83/208 makes sense for the held-out samples of boot632: on average about 1 - 1/e (roughly 37%) of the rows are out-of-bag in each bootstrap re-sample.
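The held-out counts for all re-samples can be seen at once from the same object:
table(model$pred$Resample)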
Now to build the ROC curve, you have several options:
- average the probability for each sample across re-samples and use that (this is the usual choice for CV, since every sample is held out the same number of times, but it can be done with boot as well; a sketch of this appears at the end of this answer);
- plot everything as-is, without averaging;
- plot a separate ROC curve for each re-sample.
I will show you the second approach:
Create a data frame of class probabilities and true outcomes:
for_lift = data.frame(Class = model$pred$obs, xgbTree = model$pred$R)
plot ROC:
pROC::plot.roc(pROC::roc(response = for_lift$Class,
                         predictor = for_lift$xgbTree,
                         levels = c("M", "R")),
               lwd = 1.5)
You can also do this with ggplot2. For that I find it easiest to make a lift object with caret's lift() function, specifying which class the probability column refers to:
lift_obj <- lift(Class ~ xgbTree, data = for_lift, class = "R")
library(ggplot2)
ggplot(lift_obj$data) +
  geom_line(aes(1 - Sp, Sn, color = liftModelVar)) +
  scale_color_discrete(guide = guide_legend(title = "method"))
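For completeness, a brief sketch of the first option from the list above (a single averaged curve): average the class-"R" probability per original row across re-samples, using the rowIndex column caret stores in model$pred, and feed the averages into pROC:
avg_pred <- aggregate(R ~ rowIndex, data = model$pred, FUN = mean)
avg_pred$obs <- my_data$Class[avg_pred$rowIndex]

pROC::plot.roc(pROC::roc(response = avg_pred$obs,
                         predictor = avg_pred$R,
                         levels = c("M", "R")),
               lwd = 1.5)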
