Marginal effects after caret in R

I ran a simple logit model using the caret package, and I'm looking for a way to obtain the average marginal effects (AME). The margins package does not seem to work here.
library(caret)
library(margins)
data("GermanCredit")
set.seed(100)
kfold.cross <- trainControl(method="cv", number=5, verboseIter=FALSE, classProbs = TRUE, savePredictions = TRUE)
GermanCredit.logit <- train(Class~Age+ForeignWorker,data=GermanCredit,method="glm",family=binomial,trControl=kfold.cross, metric="Accuracy")
GermanCredit.logitmfx <- margins(GermanCredit.logit)
Gives the following error message:
Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "c('train', 'train.formula')"
Makes sense to me, but I can't figure out which package I need in this case.
PS: $finalModel only gives the coefficients on the log-odds scale (or odds ratios once exponentiated), not marginal effects.
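One possible workaround is to compute the AME by hand from the underlying glm fit. The sketch below is an assumption-laden illustration, not the margins package's method: `ame_logit` is a hypothetical helper, the built-in `infert` data stands in for GermanCredit, and it treats every predictor as continuous (so for a 0/1 dummy such as ForeignWorker it is only a derivative-based approximation). It gives point estimates only, no standard errors.

```r
# Hypothetical helper: average marginal effects for a binomial-logit glm,
# assuming the first coefficient is the intercept and all predictors enter linearly.
ame_logit <- function(fit) {
  beta <- coef(fit)[-1]                  # slopes, intercept dropped
  dens <- dlogis(fit$linear.predictors)  # p * (1 - p) at each observation
  mean(dens) * beta                      # average of dP/dx over the sample
}

# Stand-in model on a built-in dataset (not the question's data)
fit <- glm(case ~ age + spontaneous + induced, data = infert, family = binomial)
ame <- ame_logit(fit)
print(ame)
```

With method="glm", caret's `$finalModel` is an ordinary glm object, so `ame_logit(GermanCredit.logit$finalModel)` should work the same way under the assumptions above.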

Related

Using performance package on a Caret Object

I have a problem that I reproduced on a standard R dataset: having built a penalized logistic model in caret, I want to analyze it with the performance package (to check collinearity, heteroskedasticity of the errors, and so on).
In the end, however, it gives me this error: check_model() not implemented for models of class train yet.
Below is the reproducible code:
library(caret)
data <- as.data.frame(mtcars)
data$vs <- as.factor(data$vs)
set.seed(10)
trc <- trainControl(method = "repeatedcv", number = 3, repeats = 4, classProbs = FALSE)
model <- caret::train(vs~., data=data, trControl= trc, family="binomial", method = "regLogistic")
library(performance)
performance::check_model(model)

Getting error when predicting Random forest AUC

I am using the R packages caret and ranger to develop a classifier to predict the risk of dying, but I am having trouble calculating AUC.
I am aware that I need to set probability = TRUE when training the model. However, I get an error saying:
'formal argument "probability" matched by multiple actual arguments' and I can't run the model.
My code:
set.seed(40)
control.data <- trainControl(method="cv", number=10, sampling="up", verboseIter=TRUE)
rfGrid <- expand.grid(
.mtry=2:6,
.splitrule="gini",
.min.node.size=c(250,500))
fit.dataup <- train(mort_30 ~ C_SEX+V_AGE+Hemoglobin+Thrombocytes+Leukocytes+CRP,
data=data.train,
method="ranger",
max.depth=10,
num.trees=500,
trControl=control.data,
tuneGrid=rfGrid,
importance="impurity",
probability = TRUE,
verbose=TRUE)
Then I get this error message when trying to run it:
Model fit failed for Fold01: mtry=2, splitrule=gini, min.node.size=500 Error in ranger::ranger(dependent.variable.name = ".outcome", data =x,: formal argument "probability" matched by multiple actual arguments
What am I doing wrong?
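A likely cause, as far as I can tell from caret's ranger wrapper, is that train() already passes probability to ranger based on classProbs in trainControl, so supplying probability = TRUE again through ... collides with it. A minimal sketch under that assumption follows; the `toy` data frame and its column names are made up for illustration, and it needs both caret and ranger installed.

```r
library(caret)

set.seed(40)
# Made-up two-class data; class labels must be valid R names for classProbs
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 - toy$x2 + rnorm(n) > 0, "dead", "alive"))

# Ask for class probabilities here instead of passing probability = TRUE to train()
ctrl <- trainControl(method = "cv", number = 3,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

grid <- expand.grid(mtry = 1:2,
                    splitrule = "gini",
                    min.node.size = c(5, 10))

fit <- train(y ~ ., data = toy,
             method = "ranger",
             num.trees = 100,
             trControl = ctrl,
             tuneGrid = grid,
             metric = "ROC")   # AUC is reported as ROC by twoClassSummary

probs <- predict(fit, newdata = toy, type = "prob")
head(probs)
```

The cross-validated AUC for each tuning combination is then in the ROC column of `fit$results`.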

R: error when doing backward feature selection with rms::fastbw on caret model

I want to perform backward feature selection using the function fastbw from the rms package. I use a sample dataset PimaIndiansDiabetes as below:
library(mlbench)
data(PimaIndiansDiabetes)
library(caret)
trControl <- trainControl(method = "repeatedcv",
repeats = 3,
classProbs = TRUE,
number = 10,
savePredictions = TRUE,
summaryFunction = twoClassSummary)
caret_model <- train(diabetes~.,
data=PimaIndiansDiabetes,
method="glm",
trControl=trControl)
library(rms)
reduced_model <- fastbw(caret_model$finalModel)
This gives me an error:
Error in fastbw(caret_model$finalModel) : fit does not have design
information
May I know what this means and how to resolve it?
You're probably stuck. fastbw() works only with models from rms, i.e. ?fastbw says:
fit: fit object with ‘Varcov(fit)’ defined (e.g., from ‘ols’,
‘lrm’, ‘cph’, ‘psm’, ‘glmD’)
I tried your fit with method="lrm" (lrm is rms's logistic regression tool), but got
Error: Model lrm is not in caret's built-in library
I think you're going to have to find another way to do stepwise regression, e.g. see this question: i.e. using library(MASS) and then method="glmStepAIC" (within caret), or stepAIC (from scratch).
It's not obvious to me why you're training a model and then doing stepwise regression ...
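For the "from scratch" route mentioned above, a minimal sketch of backward elimination with MASS::stepAIC on a plain glm fit could look like this. It uses the birthwt data shipped with MASS as a stand-in for PimaIndiansDiabetes, and the choice of predictors is arbitrary; it selects by AIC, which is not the same criterion fastbw uses.

```r
library(MASS)

# Full logistic model on a dataset that ships with MASS (stand-in data)
full <- glm(low ~ age + lwt + smoke + ht + ui + ptl,
            data = birthwt, family = binomial)

# Backward elimination by AIC; trace = FALSE silences the step-by-step log
reduced <- stepAIC(full, direction = "backward", trace = FALSE)

formula(reduced)   # the terms that survived
reduced$anova      # record of the terms dropped at each step
```

Within caret, method="glmStepAIC" wraps the same idea, so the selection is repeated inside each resample.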

using ranger random forest with caret train function results in error: protect(): protection stack overflow

I am trying to train a classifier using the ranger function, but need to optimize the parameters by using the train function from the caret package.
When I implement this call:
grid <- expand.grid(mtry=c(1,2))
trControl <- trainControl(method='cv', number=10, classProbs = TRUE, summaryFunction=twoClassSummary, allowParallel = FALSE, verboseIter = TRUE)
rf.caret <- train(x=dataMat[,1:20000], y=factor(alldata$classLabels), method = 'ranger', save.memory=TRUE, metric='ROC', tuneGrid = grid, trControl=trControl)
I get the following error:
Error : protect(): protection stack overflow
on every fold of the cv and the command fails.
This is the same error I get if I use the ranger function outside of train with the formula interface. Do I need to do something special to use ranger's dependent.variable.name interface inside the train function?
alldata$classLabels is a factor of two levels made from a character vector, and dataMat is a 40x23,000 data.frame of doubles.

R-Caret, caretList, The metric "Accuracy" was not in the result set

Trying to learn r-Caret and caretList.
I am trying to follow the tutorial caretEnsemble Classification example
I have encountered a few errors and searched how to fix some of the basic set up.
However, I am getting the error:
Warning messages:
1: In train.default(x, y, weights = w, ...) :
The metric "Accuracy" was not in the result set. ROC will be used instead.
2: In train.default(x, y, weights = w, ...) :
The metric "Accuracy" was not in the result set. ROC will be used instead.
My setup is:
#Libraries
library(caret)
library(devtools)
library(caretEnsemble)
#Data
library(mlbench)
dat <- mlbench.xor(500, 2)
X <- data.frame(dat$x)
Y <- factor(ifelse(dat$classes=='1', 'Yes', 'No'))
#Split train/test
train <- runif(nrow(X)) <= .66
#Setup CV Folds
#returnData=FALSE saves some space
folds=5
repeats=1
myControl <- trainControl(method='cv',
number=folds,
repeats=repeats,
returnResamp='none',
classProbs=TRUE,
returnData=FALSE,
savePredictions=TRUE,
verboseIter=TRUE,
allowParallel=TRUE,
summaryFunction=twoClassSummary,
index=createMultiFolds(Y[train],
k=folds,
times=repeats)
)
#Make list of all models
all.models<-caretList(Y~., data=X, trControl=myControl, methodList=c("blackboost", "parRF"))
I edited the section of "train all models" using caretList so that it will work with caretEnsemble and caretStack further down the code (link provided above).
How do I get the accuracies so that I can use them in caretEnsemble and caretStack?
I assume you would like to use 'Accuracy' as the summary metric that should be used to select the optimal base learner models across their resamples and the metalearner later on via caretEnsemble or caretStack.
In this case you must not set summaryFunction = twoClassSummary in trainControl, because train will then use 'ROC' as the performance metric rather than 'Accuracy'. Instead, keep the default summaryFunction (that is, do not specify it explicitly in trainControl). train, which is called via caretList, will then automatically use 'Accuracy' as the performance metric because the response is categorical.
In addition, there are a few other things to note:
You should not set returnResamp = 'none' in trainControl, because then you won't be able to compare the models' individual accuracies later via summary(resamples(model.list)).
Even though you created an index to separate the data into a train and test set, you don't use it when passing the data to caretList. The correct caretList call should begin like this: caretList(Y[train] ~ ., data=X[train, ], ...
The tutorial you mentioned above is a bit outdated. You should also check out the package's current vignette and this tutorial from MachineLearningMastery. The latter also uses "Accuracy" as the performance metric in its example.
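To illustrate the point about summaryFunction, here is a small sketch that fits the same model twice with plain train(). The synthetic data and the object names (`dat`, `fit_acc`, `fit_roc`) are made up for illustration, and method="glm" is used only to keep the example light; the same trainControl objects could be handed to caretList.

```r
library(caret)

set.seed(1)
# Synthetic two-class data (illustrative only)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$Y <- factor(ifelse(dat$x1 + dat$x2 + rnorm(n) > 0, "Yes", "No"))

# Default summary function: resamples report Accuracy and Kappa
ctrl_acc <- trainControl(method = "cv", number = 5,
                         classProbs = TRUE, savePredictions = TRUE)
fit_acc <- train(Y ~ ., data = dat, method = "glm", trControl = ctrl_acc)

# twoClassSummary: resamples report ROC, Sens and Spec instead,
# so the metric has to be one of those (here "ROC")
ctrl_roc <- trainControl(method = "cv", number = 5,
                         classProbs = TRUE, savePredictions = TRUE,
                         summaryFunction = twoClassSummary)
fit_roc <- train(Y ~ ., data = dat, method = "glm", trControl = ctrl_roc,
                 metric = "ROC")

names(fit_acc$results)
names(fit_roc$results)
```

Only the first fit has an Accuracy column in its results, which is why asking for "Accuracy" together with twoClassSummary produces the warning quoted in the question.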
