R programming - How to convert model output into data frame

Below is my script for running a KNN model using the cross-validation method.
## cross-validation
library(caret)
cvroc <- trainControl(method = "repeatedcv",
                      number = 10,   # number of folds
                      repeats = 3,   # number of repeats
                      classProbs = TRUE,
                      summaryFunction = twoClassSummary)
# KNN model
set.seed(222)
fit_roc <- train(admit ~ .,
                 data = training,
                 method = "knn",
                 tuneLength = 20,
                 trControl = cvroc,
                 preProc = c("center", "scale"),
                 metric = "ROC",
                 tuneGrid = expand.grid(k = 1:60))
fit_roc
(KNN model output)
Question:
My question is: how can I convert the output from the model into a data.frame?
I used the command below, but it gives an error.
aa <- data.frame(fit_roc)
Thanks!

Depending on which part of the output you want, you can extract it with fit_roc$<component>, for example:
fit_roc$results
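For reference, a quick sketch of the components that typically cover this (based on the documented structure of a caret train object, not on anything extra in the original post):
fit_roc$results     # a data.frame with one row per candidate k: ROC, Sens, Spec and their SDs
fit_roc$bestTune    # the selected value of k
fit_roc$resample    # per-resample performance of the final model
aa <- fit_roc$results   # already a data.frame, so wrapping it in data.frame() is unnecessary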

Related

How to check if Random Forest model performed better than chance? (T-Test)

I perform 15-fold cross-validation on a random forest, like this:
trainControl = trainControl(method = "cv", number = 15, search = "random",
                            savePredictions = TRUE)
tuningGrid <- expand.grid(mtry = c(2, 4, 6, 8))
model = train(Classification ~ ., data = data, method = "rf",
              trControl = trainControl, tuneGrid = tuningGrid, ntree = 25)
model
How can I use a t-test to show that the model performs better than chance, based on the 15 accuracies?
I don't know what to put for x in t.test.
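One way to set this up, sketched under the assumption of a roughly balanced two-class outcome (so chance accuracy is about 0.5; otherwise replace mu with the majority-class proportion):
fold_acc <- model$resample$Accuracy                 # one accuracy value per CV fold (15 here)
t.test(fold_acc, mu = 0.5, alternative = "greater") # H0: mean fold accuracy equals chance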

Can I use caret RFE result for subsequent random forest with CV?

I've been googling and reading a lot about my issue but couldn't find a clear answer.
In order to prevent data leakage, I use caret RFE for feature elimination:
rfFuncs$summary <- twoClassSummary
#---- rfe control
rfe_ctrl <- rfeControl(functions = rfFuncs,
                       method = "repeatedcv",
                       number = 5,    # folds
                       repeats = 3,   # repeats
                       verbose = TRUE,
                       returnResamp = "all",
                       saveDetails = TRUE)
#---- rfe
set.seed(164)
rfe_profile <- rfe(df_train[, features_name],
                   df_train[, target_name],
                   sizes = subsets,
                   rfeControl = rfe_ctrl,
                   metric = "ROC",
                   scale = TRUE)
#---- select final features
features_rfe <- predictors(rfe_profile)[1:30]
This works well and tells me how many predictors I should use. Let's say the result is that 60 features work best. However, according to the plots and further analysis (e.g. pickSizeTolerance, see the sketch below) I know I can use the first 30 features and still get good predictive performance.
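A minimal sketch of that kind of tolerance analysis, assuming rfe_profile$results contains a "ROC" column (which it does when twoClassSummary is used):
# smallest subset size whose ROC is within 1.5% of the best one
size_tol <- pickSizeTolerance(rfe_profile$results, metric = "ROC",
                              tol = 1.5, maximize = TRUE)
features_tol <- predictors(rfe_profile)[seq_len(size_tol)]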
Now, can I use these 30 features (= features_rfe) in a subsequent model training step (see code below), since I don't touch the test set until after this step? Or would I introduce data leakage with this step, because the model gets trained with pre-selected best features?
#---- random forest control
rf_ctrl <- trainControl(method = "cv",
                        number = 5,   # folds
                        verboseIter = TRUE,
                        returnResamp = "all",
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)
#---- random forest grid search
rf_grid <- expand.grid(mtry = c(1, 5, 10, 15, 20))
#---- random forest
set.seed(10)
rf_caret_train <- train(df_train[, features_rfe],
                        df_train[, target_name],
                        method = "rf",
                        trControl = rf_ctrl,
                        tuneGrid = rf_grid,
                        metric = "ROC",
                        verbose = TRUE)
#---- make predictions
rf_caret_test_raw <- predict(rf_caret_train, df_test[, features_rfe], type = "raw", scale = TRUE)
I hope you can help me.
Best,
Troji

Use F1 Score instead of Accuracy to Optimize SVM Parameters

I am using the e1071 'tune' function to optimize an SVM model. I would like to use F1 instead of accuracy as the value to optimize for. I found in this post: Optimize F-score in e1071 package that I need to define a new error.fun. The problem is that the function shown in that post was never confirmed to be the solution, and it does not work for me. If I knew the variable names for the predictions from each iteration of tune, I could write a function to calculate F1, but I don't know how to get those values. How can I calculate F1 and use it to optimize model parameters using 'tune' in e1071? My code is as follows:
tuned <- tune.svm(PriYN ~ ., data = dataset, kernel = "radial", probability = TRUE,
                  gamma = 10^(-5:-1), cost = 10^(-3:1),
                  tunecontrol = tune.control(cross = 10))
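One direct route in e1071 itself is the error.fun argument of tune.control, which receives the true and predicted values and whose result is minimized. A rough sketch (the positive class label "Yes" is an assumption about PriYN; adjust to your data):
f1_error <- function(true, pred) {
  # return 1 - F1 so that minimizing the error maximizes F1
  tp <- sum(pred == "Yes" & true == "Yes")
  fp <- sum(pred == "Yes" & true != "Yes")
  fn <- sum(pred != "Yes" & true == "Yes")
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1 <- 2 * precision * recall / (precision + recall)
  if (is.nan(f1)) f1 <- 0   # guard against folds with no positive predictions
  1 - f1
}
tuned <- tune.svm(PriYN ~ ., data = dataset, kernel = "radial", probability = TRUE,
                  gamma = 10^(-5:-1), cost = 10^(-3:1),
                  tunecontrol = tune.control(cross = 10, error.fun = f1_error))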
Using {caret}:
ctrl <- trainControl(method = "repeatedcv",        # choose your CV method
                     number = 5,                   # according to CV method
                     repeats = 2,                  # according to CV method
                     summaryFunction = prSummary,  # to tune on the F1 score
                     classProbs = TRUE,
                     verboseIter = TRUE
                     # sampling = "smote"          # you can also try the 'smote' resampling method
                     )
Then tune your model:
set.seed(2202)
svm_model <- train(target ~ ., data = training,
                   method = "svmRadial",
                   # preProcess = c("center", "scale"),
                   tuneLength = 10,
                   metric = "F",   # the metric used for tuning is the F1 score
                   trControl = ctrl)
svm_model
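If useful, the tuning can then be inspected along these lines (assuming the call above ran; the column names are the ones prSummary reports for svmRadial):
svm_model$results[, c("sigma", "C", "F")]   # F1 for each candidate sigma/C pair
svm_model$bestTune                          # the sigma/C pair selected on F1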

How to fix "The metric "Accuracy" was not in the result set. AUC will be used instead"

I am trying to run a logistic regression on a classification problem. The dependent variable "SUBSCRIBEDYN" is a factor with 2 levels ("Yes" and "No").
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              verboseIter = FALSE,
                              classProbs = TRUE,
                              summaryFunction = prSummary)
set.seed(13)
simple.logistic.regression <- caret::train(SUBSCRIBEDYN ~ .,
                                           data = train_data,
                                           method = "glm",
                                           metric = "Accuracy",
                                           trControl = train.control)
simple.logistic.regression
However, it does not accept Accuracy as a metric
"The metric "Accuracy" was not in the result set. AUC will be used instead"
For a classification model with 2 levels, you should use metric = "ROC" together with summaryFunction = twoClassSummary; metric = "Accuracy" goes with the default summary function and is the usual choice for multi-class problems. However, after training the model you can still retrieve the accuracy from the confusion matrix, for example with the function confusionMatrix().
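A short sketch of that last suggestion (test_data is assumed here; it does not appear in the original question):
preds <- predict(simple.logistic.regression, newdata = test_data)
confusionMatrix(preds, test_data$SUBSCRIBEDYN)   # the printed output includes Accuracy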

How to compute AUC under ROC in R (caret, random forest , svm)

I am using the random forest and support vector machine methods from the caret package in R. I want to calculate the AUC under the ROC curve for both cases; however, I do not know how to do it in this particular case. My outcome is coded as 0 and 1. Here is an example of the code I am using:
set.seed(123)
cvCtrl <- trainControl(method = "cv", number = 10)
rf_moded <- train(readm30 ~ ., data = train, method = "rf", trControl = cvCtrl)
Do you want to train the model with ROC? Then you need the following:
For trainControl:
control <- trainControl(method = "cv", number = 10,
                        savePredictions = "final",
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)
And in train:
train(outcome ~ .,
      data = data,
      method = method,
      trControl = control,
      metric = "ROC")
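One caveat worth noting for this particular question, since the outcome is coded as 0 and 1: classProbs = TRUE requires factor levels that are valid R variable names, so the outcome usually has to be recoded first (the column name readm30 is taken from the question):
train$readm30 <- factor(train$readm30, levels = c(0, 1), labels = c("No", "Yes"))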
