Below are my scripts for running a KNN model using the cross-validation method.
## cross validation
library(caret)
cvroc <- trainControl(method = "repeatedcv",
                      number = 10,  # number of folds
                      repeats = 3,  # number of repeats
                      classProbs = TRUE,
                      summaryFunction = twoClassSummary)
#KNN Model
set.seed(222)
fit_roc <- train(admit ~ .,
                 data = training,
                 method = "knn",
                 tuneLength = 20,
                 trControl = cvroc,
                 preProc = c("center", "scale"),
                 metric = "ROC",
                 tuneGrid = expand.grid(k = 1:60))  # tuneGrid overrides tuneLength above
fit_roc
(KNN model output not shown)
Question:
My question is: how can I convert the output from the model into a data.frame?
I used the command below, but it gives an error.
aa <- data.frame(fit_roc)
Thanks!
Depending on which part of the output you want, you can do the following. The point is fit_roc$<whatever you want to extract>, for example:
fit_roc$results
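These components are already data frames, so they can be assigned directly. A minimal sketch (results, resample, and bestTune are standard components of a caret train object):
aa <- fit_roc$results        # per-k ROC, Sens and Spec; already a data.frame
class(aa)                    # "data.frame"
head(fit_roc$resample)       # per-resample metrics, also a data.frame
fit_roc$bestTune             # the value of k selected by the ROC metric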
Related
I perform 15-fold cross-validation on a random forest, like this:
trainControl = trainControl(method="cv", number = 15, search = "random", savePredictions = T)
tuningGrid <- expand.grid(mtry=c(2,4,6,8))
model = train(Classification~., data=data, method="rf", trControl = trainControl,
tuneGrid = tuningGrid, ntree = 25)
model
How can I use a t-test to show that the model performs better than chance, based on the 15 fold accuracies?
I don't know what to put for x in t.test.
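One possible sketch, using the per-fold accuracies that caret stores in the train object; treating the majority-class proportion as "chance" is an assumption, so adjust mu to whatever baseline you consider appropriate:
fold_acc <- model$resample$Accuracy                     # the 15 per-fold accuracies
chance <- max(prop.table(table(data$Classification)))   # majority-class baseline (assumption)
t.test(fold_acc, mu = chance, alternative = "greater")  # one-sample, one-sided t-test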
I've been googling and reading a lot about my issue but couldn't find a clear answer.
In order to prevent data leakage, I use caret RFE for feature elimination:
rfFuncs$summary <- twoClassSummary
#---- rfe control
rfe_ctrl <- rfeControl(functions = rfFuncs,
                       method = "repeatedcv",
                       number = 5,   # folds
                       repeats = 3,  # repeats
                       verbose = TRUE,
                       returnResamp = "all",
                       saveDetails = TRUE)
#---- rfe
set.seed(164)
rfe_profile <- rfe(df_train[, features_name],
                   df_train[, target_name],
                   sizes = subsets,
                   rfeControl = rfe_ctrl,
                   metric = "ROC",
                   scale = TRUE)
#---- select final features
features_rfe <- predictors(rfe_profile)[1:30]
This works well and tells me how many predictors I should use. Let's say I get the result that 60 features work best. However, according to the plots and further analysis (e.g. pickSizeTolerance), I know I can use the first 30 features and still get good prediction performance.
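For reference, that tolerance-based choice could be made programmatically, mirroring the [1:30] selection above. A sketch (tol = 1.5 means accepting the smallest subset within 1.5% of the best ROC):
size_tol <- pickSizeTolerance(rfe_profile$results, metric = "ROC", tol = 1.5, maximize = TRUE)
features_rfe <- predictors(rfe_profile)[1:size_tol]   # e.g. the first 30 features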
Now, can I use these 30 features (=features_rfe) in a subsequent model training step (see code below) since I don't touch the test set until after this step? Or would I introduce data leakage with this step because the model gets trained with pre-selected best features?
#---- random forest control
rf_ctrl <- trainControl(method = "cv",
                        number = 5,          # folds
                        verboseIter = TRUE,  # trainControl's argument is verboseIter (not verbose)
                        returnResamp = "all",
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)
#---- random forest grid search
rf_grid <- expand.grid(mtry = c(1, 5, 10, 15, 20))
#---- random forest
set.seed(10)
rf_caret_train <- train(df_train[, features_rfe],
                        df_train[, target_name],
                        method = "rf",
                        trControl = rf_ctrl,
                        tuneGrid = rf_grid,
                        metric = "ROC",
                        verbose = TRUE)
#---- make predictions
rf_caret_test_raw <- predict(rf_caret_train, df_test[,features_rfe], type='raw', scale=TRUE)
I hope you can help me.
Best,
Troji
I am using the e1071 'tune' function to optimize an SVM model. I would like to use F1 instead of accuracy as the value to optimize for. I have found in this post, Optimize F-score in e1071 package, that I need to define a new error.fun. The problem I am having is that the function shown in that post was never confirmed as the solution, and it does not work for me. If I knew the variable names for the predictions from each iteration of tune, I could write a function to calculate F1, but I don't know how to get those values. How can I calculate F1 and use it to optimize model parameters using 'tune' in e1071? My code is as follows:
tuned <- tune.svm(PriYN ~ ., data = dataset, kernel = "radial", probability = TRUE,
                  gamma = 10^(-5:-1), cost = 10^(-3:1),
                  tunecontrol = tune.control(cross = 10))
Using {caret}:
ctrl <- trainControl(method = "repeatedcv",        # choose your CV method
                     number = 5,                   # according to CV method
                     repeats = 2,                  # according to CV method
                     summaryFunction = prSummary,  # TO TUNE ON F1 SCORE
                     classProbs = TRUE,
                     verboseIter = TRUE
                     #sampling = "smote"           # you can try the 'smote' resampling method
                     )
Then tune your model:
set.seed(2202)
svm_model <- train(target ~ ., data = training,
                   method = "svmRadial",
                   #preProcess = c("center", "scale"),
                   tuneLength = 10,
                   metric = "F",   # The metric used for tuning is the F1 SCORE
                   trControl = ctrl)
svm_model
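Alternatively, staying within e1071: tune.control() has an error.fun argument that is documented to take the true and the predicted values and return a value to be minimized, so one possible shape of such a function is sketched below. This is only a sketch, not a confirmed solution, and the positive-class label "Y" is an assumption to adjust to your data.
# Return 1 - F1 so that tune() minimizing the error maximizes F1
f1_error <- function(true, pred, positive = "Y") {   # 'positive' label is an assumption
  tp <- sum(pred == positive & true == positive)
  fp <- sum(pred == positive & true != positive)
  fn <- sum(pred != positive & true == positive)
  f1 <- 2 * tp / (2 * tp + fp + fn)                  # F1 = 2TP / (2TP + FP + FN)
  if (is.nan(f1)) f1 <- 0                            # guard against 0/0 when no positives occur
  1 - f1
}
tuned <- tune.svm(PriYN ~ ., data = dataset, kernel = "radial", probability = TRUE,
                  gamma = 10^(-5:-1), cost = 10^(-3:1),
                  tunecontrol = tune.control(cross = 10, error.fun = f1_error))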
I am trying to run a logistic regression on a classification problem.
The dependent variable "SUBSCRIBEDYN" is a factor with 2 levels ("Yes" and "No").
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              verboseIter = FALSE,
                              classProbs = TRUE,
                              summaryFunction = prSummary)
set.seed(13)
simple.logistic.regression <- caret::train(SUBSCRIBEDYN ~ .,
                                           data = train_data,
                                           method = "glm",
                                           metric = "Accuracy",
                                           trControl = train.control)
simple.logistic.regression
However, it does not accept Accuracy as a metric:
"The metric "Accuracy" was not in the result set. AUC will be used instead"
For a classification model with 2 levels, you should use metric="ROC". metric="Accuracy" is used for multiple classes. However, after training the model, you can use the confusion matrix to retrieve the accuracy, for example using the function confusionMatrix().
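For example, a minimal sketch (test_data is assumed to be a held-out data frame with the same columns as train_data):
pred <- predict(simple.logistic.regression, newdata = test_data)
confusionMatrix(pred, test_data$SUBSCRIBEDYN)   # reports Accuracy among other statistics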
I am using the random forest and support vector machine methods from the caret package in R. I want to calculate the AUC under the ROC curve for both cases; however, I do not know how to do it in this particular case. My outcome is coded as 0 and 1. Here is an example of the code I am using:
set.seed(123)
cvCtrl <- trainControl(method = "cv", number = 10)
rf_moded <- train(readm30 ~ ., data = train, method = "rf", trControl = cvCtrl)
Do you want to train the model with ROC? Then you need the following:
For trainControl:
control <- trainControl(method = "cv", number = 10,
                        savePredictions = "final", classProbs = TRUE,
                        summaryFunction = twoClassSummary)
And in train:
train(
  outcome ~ .,
  data = data,
  method = method,
  trControl = control,
  metric = "ROC"
)
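Note that with classProbs = TRUE the outcome must be a factor whose levels are valid R variable names, so an outcome coded as 0/1 (like readm30 above) needs to be recoded first, for example:
# Recode the 0/1 outcome into a factor with valid level names before calling train()
train$readm30 <- factor(train$readm30, levels = c(0, 1), labels = c("No", "Yes"))
After training with metric = "ROC", the cross-validated AUC for each candidate model is then reported in the ROC column of the results (e.g. rf_moded$results).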