I have a problem, which I reproduced on a standard R dataset: I built a penalized logistic regression model with caret, and I want to analyze it with the performance package (checking collinearity, heteroskedasticity of the errors, ...).
However, it gives me this error: check_model() not implemented for models of class train yet.
Below is the reproducible code:
library(caret)
library(performance)
data <- as.data.frame(mtcars)
data$vs <- as.factor(data$vs)
set.seed(10)
trc <- trainControl(method = "repeatedcv", number = 3, repeats = 4, classProbs = FALSE)
model <- caret::train(vs ~ ., data = data, trControl = trc, family = "binomial", method = "regLogistic")
performance::check_model(model)
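A minimal sketch of why the call fails and of one possible workaround. The performance checks dispatch on the model's class, and caret's train wrapper is not a supported class, whereas a plain glm is. Assuming an unpenalized logistic fit is an acceptable stand-in for diagnostics (the predictors mpg and wt are chosen here purely for illustration), you can refit outside caret and run, for example, the collinearity check on that fit:

```r
library(performance)

dat <- mtcars
dat$vs <- factor(dat$vs)

# check_model() and related checks dispatch on class(); "train" is not supported.
# Assumption: an unpenalized glm on two illustrative predictors stands in
# for the penalized caret model when running diagnostics.
fit <- glm(vs ~ mpg + wt, data = dat, family = binomial)

print(class(fit))
collin <- check_collinearity(fit)  # variance inflation factor per term
print(collin)
```

This does not diagnose the penalized fit itself; it only shows that the same checks run once the model is a class performance recognizes.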
I built a penalized logistic regression model with caret and then tried to create an explainer object with DALEX::explain() in order to analyze various aspects of the model.
Perhaps the problem lies in having a binary classification model.
Here is my reproducible code:
library(caret)
library(DALEX)
library(modelStudio)
data <- as.data.frame(mtcars)
data$vs <- as.factor(data$vs)
set.seed(10)
trc <- trainControl(method = "repeatedcv", number = 3, repeats = 4, classProbs = FALSE)
model <- caret::train(vs ~ ., data = data, trControl = trc, family = "binomial", method = "regLogistic")
explainer <- DALEX::explain(
  model = model,
  data = as.data.frame(data[, -which(colnames(data) %in% "vs")]),
  y = as.numeric(as.character(data$vs)),
  predict_function = predict,
  label = "regLogistic")
modelStudio::modelStudio(explainer)
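A hedged sketch of one common fix. With classProbs = FALSE and predict_function = predict, the explainer receives factor class labels rather than numeric scores, which DALEX generally expects. The sketch assumes you can relabel vs as "no"/"yes" (hypothetical labels; caret's classProbs = TRUE requires factor levels that are valid R names) and that explaining the predicted probability of "yes" is what you want:

```r
library(caret)
library(DALEX)

dat <- mtcars
# Hypothetical labels: classProbs = TRUE needs levels that are valid R names
dat$vs <- factor(ifelse(dat$vs == 1, "yes", "no"), levels = c("no", "yes"))

set.seed(10)
trc <- trainControl(method = "repeatedcv", number = 3, repeats = 4,
                    classProbs = TRUE)
model <- train(vs ~ ., data = dat, trControl = trc, method = "regLogistic")

# Return a numeric score, P(vs == "yes"), instead of a factor of classes
prob_yes <- function(m, newdata) predict(m, newdata, type = "prob")[, "yes"]

explainer <- DALEX::explain(
  model = model,
  data = dat[, setdiff(names(dat), "vs")],
  y = as.numeric(dat$vs == "yes"),
  predict_function = prob_yes,
  label = "regLogistic"
)
```

The resulting explainer can then be passed to modelStudio::modelStudio() as in the question.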
I ran a simple logit model using the caret package, and I'm looking for a way to obtain the average marginal effects (AMEs). The margins package does not seem to work here.
library(caret)
library(margins)
data("GermanCredit")
set.seed(100)
kfold.cross <- trainControl(method = "cv", number = 5, verboseIter = FALSE, classProbs = TRUE, savePredictions = TRUE)
GermanCredit.logit <- train(Class ~ Age + ForeignWorker, data = GermanCredit, method = "glm", family = binomial, trControl = kfold.cross, metric = "Accuracy")
GermanCredit.logitmfx <- margins(GermanCredit.logit)
This gives the following error message:
Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "c('train', 'train.formula')"
That makes sense to me, but I can't figure out which package I need in this case.
PS: $finalModel gives the coefficients on the log-odds scale (log odds ratios).
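margins relies on vcov() and related methods that caret's train class does not provide. As a sketch, assuming you only need the AME of the continuous predictor Age and no standard errors, you can work from $finalModel, which for method = "glm" is an ordinary glm fit. For a logit, dp/dx_k = p(1 - p) * beta_k, and the AME averages this over the rows used to fit the model. The binary ForeignWorker would instead call for a discrete change in predicted probability, which this sketch does not compute.

```r
library(caret)
data("GermanCredit", package = "caret")

set.seed(100)
kfold.cross <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                            savePredictions = TRUE)
GermanCredit.logit <- train(Class ~ Age + ForeignWorker, data = GermanCredit,
                            method = "glm", family = binomial,
                            trControl = kfold.cross, metric = "Accuracy")

fm <- GermanCredit.logit$finalModel  # ordinary glm refit on all rows
p <- fitted(fm)                      # P(second level of Class) for each row

# Logit derivative: dp/dAge = p * (1 - p) * beta_Age; the AME is its mean
ame_age <- mean(p * (1 - p)) * coef(fm)[["Age"]]
print(ame_age)
```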
I want to perform backward feature selection using the function fastbw() from the rms package, with the sample dataset PimaIndiansDiabetes:
library(mlbench)
data(PimaIndiansDiabetes)
library(caret)
trControl <- trainControl(method = "repeatedcv",
repeats = 3,
classProbs = TRUE,
number = 10,
savePredictions = TRUE,
summaryFunction = twoClassSummary)
caret_model <- train(diabetes~.,
data=PimaIndiansDiabetes,
method="glm",
trControl=trControl)
library(rms)
reduced_model <- fastbw(caret_model$finalModel)
This gives me an error:
Error in fastbw(caret_model$finalModel) : fit does not have design information
What does this mean, and how can I resolve it?
You're probably stuck. fastbw() works only with models from rms; as ?fastbw says:
fit: fit object with ‘Varcov(fit)’ defined (e.g., from ‘ols’,
‘lrm’, ‘cph’, ‘psm’, ‘glmD’)
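Because fastbw() needs an rms fit, one hedged workaround is to skip caret for the selection step and fit the same logistic model with rms::lrm() directly. The predictors are listed explicitly here because I am not certain rms formulas accept the "." shorthand:

```r
library(mlbench)
library(rms)
data(PimaIndiansDiabetes)

# lrm() stores the design information that fastbw() looks for
fit <- lrm(diabetes ~ pregnant + glucose + pressure + triceps + insulin +
             mass + pedigree + age,
           data = PimaIndiansDiabetes)

# Fast backward elimination with the default AIC-based rule
reduced <- fastbw(fit)
print(reduced)
```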
I tried your fit with method="lrm" (lrm is rms's logistic regression tool), but got
Error: Model lrm is not in caret's built-in library
I think you're going to have to find another way to do stepwise regression (see this question), e.g. using library(MASS) and then method="glmStepAIC" within caret, or stepAIC() from scratch.
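The caret route might look like the following sketch. It assumes MASS is installed (the glmStepAIC method calls MASS::stepAIC()), uses plain 5-fold CV instead of the question's repeated CV to keep it quick, and optimizes ROC because twoClassSummary is used. Expect stepAIC's trace to be printed for each resample.

```r
library(caret)
library(mlbench)
data(PimaIndiansDiabetes)

set.seed(1)
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# The stepwise AIC search is rerun inside every resample, so the
# cross-validated ROC also reflects the selection step
step_model <- train(diabetes ~ ., data = PimaIndiansDiabetes,
                    method = "glmStepAIC", trControl = ctrl, metric = "ROC")

print(formula(step_model$finalModel))  # predictors kept in the final fit
```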
It's not obvious to me why you're training a model and then doing stepwise regression ...
I have some code which fits several (cross-validated) models to some data, as below.
library(datasets)
library(caret)
library(caretEnsemble)
# load data
data("iris")
# establish cross-validation structure
set.seed(32)
trainControl <- trainControl(method="repeatedcv",
number=5, repeats=3, # 3x 5-fold CV
search="random")
algorithmList <- c('lda', # Linear Discriminant Analysis
'rpart' , # Classification and Regression Trees
'svmRadial') # SVM with RBF Kernel
# cross-validate models from algorithmList
models <- caretList(Species~., data=iris, trControl=trainControl, methodList=algorithmList)
So far so good. However, if I add 'gbm' to my algorithmList, I get a ton of extraneous log messages, because gbm seems to have verbose=TRUE as a default fit parameter.
According to the caret docs, if I were running train on method='gbm' by itself (not along with several models trained in a caretList), I could simply add verbose=FALSE to train(), which would flow through to gbm. But this throws an error when I try it in caretList.
So I would like to pass verbose=FALSE (or any other fit param, in theory) specifically to one particular model from caretList's methodList. How can I accomplish this?
OK, this is actually addressed well in the docs.
?caretList
includes:
tuneList: optional, a NAMED list of caretModelSpec objects. This is
much more flexible than methodList and allows the specification of
model-specific parameters
And I've confirmed my problem is solved if instead of:
algorithmList <- c('lda', # Linear Discriminant Analysis
'rpart' , # Classification and Regression Trees
'svmRadial', # SVM with RBF Kernel
'gbm') # Gradient-boosted machines
I use:
modelTypes <- list(lda       = caretModelSpec(method = "lda"),
                   rpart     = caretModelSpec(method = "rpart"),
                   svmRadial = caretModelSpec(method = "svmRadial"),
                   gbm       = caretModelSpec(method = "gbm", verbose = FALSE))
...then the models <- caretList(... line goes from:
models <- caretList(... methodList=algorithmList)
to:
models <-caretList(... tuneList = modelTypes)
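Putting the pieces together, a self-contained version of the fix might look like this sketch. The control object is renamed ctrl to avoid shadowing the trainControl() function name, and the gbm package must be installed.

```r
library(caret)
library(caretEnsemble)
data("iris")

set.seed(32)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                     search = "random")

# Each caretModelSpec() carries its own extra arguments to train();
# verbose = FALSE is forwarded only to the gbm fit
modelTypes <- list(
  lda       = caretModelSpec(method = "lda"),
  rpart     = caretModelSpec(method = "rpart"),
  svmRadial = caretModelSpec(method = "svmRadial"),
  gbm       = caretModelSpec(method = "gbm", verbose = FALSE)
)

models <- caretList(Species ~ ., data = iris, trControl = ctrl,
                    tuneList = modelTypes)
print(names(models))
```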
I am using the caret package to train a k-nearest neighbors (KNN) model. For this, I am running this code:
Control <- trainControl(method = "cv", summaryFunction = twoClassSummary, classProbs = TRUE)
tGrid <- data.frame(k = 1:100)
trainingInfo <- train(Formula, data = trainData, method = "knn", tuneGrid = tGrid,
                      trControl = Control, metric = "ROC")
As you can see, I am interested in obtaining the area under the ROC curve (AUC). This code works well, but it returns only the testing error (which the algorithm uses to tune the k parameter of the model), computed as the mean of the error over the cross-validation folds. In addition to the testing error, I would like to return the training error (the mean across folds of the error obtained on the training data). How can I do it?
Thank you
What you are asking is a bad idea on multiple levels. You will grossly over-estimate the area under the ROC curve. Consider the 1-NN model: you will have perfect predictions every time.
To do this, you will need to run train again and set both the index and indexOut arguments of trainControl to the training-fold indices:
library(caret)
set.seed(1)
dat <- twoClassSim(200)
set.seed(2)
folds <- createFolds(dat$Class, returnTrain = TRUE)
Control <- trainControl(method = "cv",
                        summaryFunction = twoClassSummary,
                        classProbs = TRUE,
                        index = folds,
                        indexOut = folds)
tGrid <- data.frame(k = 1:100)
set.seed(3)
a_bad_idea <- train(Class ~ ., data = dat,
                    method = "knn",
                    tuneGrid = tGrid,
                    trControl = Control, metric = "ROC")
Max
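To see the over-estimation concretely, here is a hedged sketch that rebuilds the simulated data and folds from the answer and puts the held-out ROC next to the training-row ROC for each k. The smaller grid (k = 1:25) is only there to keep it fast, and fit_knn() is a hypothetical helper. When indexOut is left unset, caret scores each resample on the rows not in index, which is the usual held-out estimate.

```r
library(caret)

set.seed(1)
dat <- twoClassSim(200)
set.seed(2)
folds <- createFolds(dat$Class, returnTrain = TRUE)
tGrid <- data.frame(k = 1:25)

# Hypothetical helper: same folds and grid, only the scoring rows differ
fit_knn <- function(index_out) {
  ctrl <- trainControl(method = "cv", index = folds, indexOut = index_out,
                       classProbs = TRUE, summaryFunction = twoClassSummary)
  set.seed(3)
  train(Class ~ ., data = dat, method = "knn", tuneGrid = tGrid,
        trControl = ctrl, metric = "ROC")
}

heldout  <- fit_knn(NULL)   # scored on rows outside each training fold
training <- fit_knn(folds)  # scored on the training fold itself

cmp <- data.frame(k = heldout$results$k,
                  heldout_ROC  = heldout$results$ROC,
                  training_ROC = training$results$ROC)
print(head(cmp))
```

For k = 1 each training row is its own nearest neighbor, so the training-row ROC is perfect, which is the 1-NN problem described above.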