I am using the R packages caret and ranger to develop a classifier that predicts the risk of dying, but I am having trouble calculating AUC.
I am aware that I need to set probability = TRUE when training the model. However, I get an error saying
'formal argument "probability" matched by multiple actual arguments' and I can't run the model.
My code:
set.seed(40)
control.data <- trainControl(method="cv", number=10, sampling="up", verboseIter=TRUE)
rfGrid <- expand.grid(
  .mtry=2:6,
  .splitrule="gini",
  .min.node.size=c(250,500))
fit.dataup <- train(mort_30 ~ C_SEX+V_AGE+Hemoglobin+Thrombocytes+Leukocytes+CRP,
                    data=data.train,
                    method="ranger",
                    max.depth=10,
                    num.trees=500,
                    trControl=control.data,
                    tuneGrid=rfGrid,
                    importance="impurity",
                    probability = TRUE,
                    verbose=TRUE)
Then I get this error message when trying to run it:
Model fit failed for Fold01: mtry=2, splitrule=gini, min.node.size=500 Error in ranger::ranger(dependent.variable.name = ".outcome", data =x,: formal argument "probability" matched by multiple actual arguments
What am I doing wrong?
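A likely explanation, sketched here rather than taken from the original thread: caret's "ranger" method already passes probability to ranger::ranger() itself, based on classProbs in trainControl, so supplying probability = TRUE again through train() makes the argument appear twice. The sketch below drops probability = TRUE and sets classProbs = TRUE instead. The simulated data from twoClassSim(), the small grid, and the object names are assumptions standing in for data.train and the poster's settings.

```r
library(caret)   # the ranger package must also be installed

set.seed(40)
# Simulated two-class data; a stand-in for data.train (assumption).
sim <- twoClassSim(300)

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,                  # caret forwards this to ranger as `probability`
                     summaryFunction = twoClassSummary)  # reports ROC (AUC), Sens, Spec

grid <- expand.grid(mtry = 2:3,
                    splitrule = "gini",
                    min.node.size = c(5, 10))

# Note: no probability = TRUE here.
fit <- train(Class ~ ., data = sim,
             method = "ranger",
             num.trees = 100,
             importance = "impurity",
             metric = "ROC",
             trControl = ctrl,
             tuneGrid = grid)

# Cross-validated AUC for each tuning combination
fit$results[, c("mtry", "min.node.size", "ROC")]
```

With classProbs = TRUE the outcome must be a factor whose levels are valid R names (for example "died" and "alive" rather than 0 and 1).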
I ran a simple logit model using the caret package, and I'm looking for a way to obtain the average marginal effects (AME). The margins package does not seem to work here.
library(caret)
library(margins)
data("GermanCredit")
set.seed(100)
kfold.cross <- trainControl(method="cv", number=5, verboseIter=FALSE, classProbs = TRUE, savePredictions = TRUE)
GermanCredit.logit <- train(Class~Age+ForeignWorker,data=GermanCredit,method="glm",family=binomial,trControl=kfold.cross, metric="Accuracy")
GermanCredit.logitmfx <- margins(GermanCredit.logit)
Gives the following error message:
Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "c('train', 'train.formula')"
Makes sense to me, but I can't figure out which package I need in this case.
PS: the $finalModel only gives the log odds (ratios).
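One possible workaround, offered as a sketch and not as the only route: margins() has methods for ordinary model objects such as glm, so the same specification can be refitted with glm() directly and passed to margins(). The object names logit_fit and ame are illustrative, and this uses caret only for the data and cross-validation, not for the marginal effects.

```r
library(caret)     # used here only for the GermanCredit data
library(margins)

data("GermanCredit")

# Same specification as the caret model, fitted with glm() directly.
logit_fit <- glm(Class ~ Age + ForeignWorker,
                 data = GermanCredit,
                 family = binomial)

# Average marginal effects; passing the data explicitly avoids lookup problems.
ame <- margins(logit_fit, data = GermanCredit)
summary(ame)
```

The cross-validated accuracy from train() and the marginal effects from glm() describe the same model, because caret's method="glm" fits that model on the full data for its final fit.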
I want to perform backward feature selection using the function fastbw from the rms package. I use a sample dataset PimaIndiansDiabetes as below:
library(mlbench)
data(PimaIndiansDiabetes)
library(caret)
trControl <- trainControl(method = "repeatedcv",
                          repeats = 3,
                          classProbs = TRUE,
                          number = 10,
                          savePredictions = TRUE,
                          summaryFunction = twoClassSummary)
caret_model <- train(diabetes~.,
                     data=PimaIndiansDiabetes,
                     method="glm",
                     trControl=trControl)
library(rms)
reduced_model <- fastbw(caret_model$finalModel)
This gives me an error:
Error in fastbw(caret_model$finalModel) : fit does not have design information
May I know what this means and how to resolve it?
You're probably stuck. fastbw() works only with models from rms, i.e. ?fastbw says:
fit: fit object with ‘Varcov(fit)’ defined (e.g., from ‘ols’,
‘lrm’, ‘cph’, ‘psm’, ‘glmD’)
I tried your fit with method="lrm" (lrm is rms's logistic regression tool), but got
Error: Model lrm is not in caret's built-in library
I think you're going to have to find another way to do stepwise regression, e.g. using library(MASS) and then method="glmStepAIC" (within caret), or stepAIC (from scratch).
It's not obvious to me why you're training a model and then doing stepwise regression ...
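As a minimal sketch of the "from scratch" route mentioned above, the full logistic model can be fitted with glm() and reduced with MASS::stepAIC(). The object names full and reduced are illustrative. Note that this selects by AIC, which is not the same criterion as fastbw's default.

```r
library(MASS)
library(mlbench)

data(PimaIndiansDiabetes)

# Full logistic regression on all predictors
full <- glm(diabetes ~ ., data = PimaIndiansDiabetes, family = binomial)

# Backward elimination by AIC; trace = FALSE suppresses the step-by-step log
reduced <- stepAIC(full, direction = "backward", trace = FALSE)

# Terms that survived the selection
formula(reduced)
```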
I am getting the following error: $ operator is invalid for atomic vectors. I am getting the error when trying to calculate the prediction error for a logistic regression model.
Here is the code and data I am using:
install.packages("ElemStatLearn")
library(ElemStatLearn)
# training data
train = vowel.train
# keeping only the first three columns (y, x.1, x.2)
train.new = train[1:3]
# test data
test = vowel.test
test.new = test[1:3]
# performing the logistic regression
train.new$y <- as.factor(train.new$y)
mylogit <- glm(y ~ ., data = train.new, family = "binomial")
train.logit.values <- predict(mylogit, newdata=test.new, type = "response")
# this is where the error occurs (below)
train.logit.values$se.fit
I tried to convert it to a list, but that did not seem to work. Is there a quick fix so that I can obtain either the prediction error or the misclassification rate?
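A sketch of what is probably going on: predict() on a glm returns a plain numeric vector unless se.fit = TRUE is requested, and `$` cannot be used on a numeric vector. Only with se.fit = TRUE is the result a list with a se.fit component. The example below uses simulated two-class data (an assumption, since ElemStatLearn may not be installable) and also computes a misclassification rate with a 0.5 cutoff.

```r
set.seed(1)

# Simulated binary data; a stand-in for the vowel data (assumption).
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(rbinom(n, 1, plogis(0.8 * dat$x1 - 0.5 * dat$x2)),
                labels = c("a", "b"))
train_df <- dat[1:150, ]
test_df  <- dat[151:200, ]

fit <- glm(y ~ ., data = train_df, family = binomial)

# Without se.fit: a numeric vector of P(y == "b"), so `$` is invalid here.
p <- predict(fit, newdata = test_df, type = "response")

# With se.fit = TRUE: a list with components fit, se.fit and residual.scale.
pr <- predict(fit, newdata = test_df, type = "response", se.fit = TRUE)
head(pr$se.fit)

# Misclassification rate on the test set using a 0.5 cutoff
pred_class <- ifelse(p > 0.5, "b", "a")
misclass <- mean(pred_class != test_df$y)
misclass
```

Binomial glm() treats the first factor level as failure and all other levels as success, so an outcome with more than two classes (as in the vowel data) is silently collapsed to two.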
I am trying to learn caret and caretList in R.
I am following the tutorial "caretEnsemble Classification example".
I ran into a few errors and searched for how to fix some of the basic setup.
However, I am still getting these warnings:
Warning messages:
1: In train.default(x, y, weights = w, ...) :
The metric "Accuracy" was not in the result set. ROC will be used instead.
2: In train.default(x, y, weights = w, ...) :
The metric "Accuracy" was not in the result set. ROC will be used instead.
My setup is:
#Libraries
library(caret)
library(devtools)
library(caretEnsemble)
#Data
library(mlbench)
dat <- mlbench.xor(500, 2)
X <- data.frame(dat$x)
Y <- factor(ifelse(dat$classes=='1', 'Yes', 'No'))
#Split train/test
train <- runif(nrow(X)) <= .66
#Setup CV Folds
#returnData=FALSE saves some space
folds=5
repeats=1
myControl <- trainControl(method='cv',
                          number=folds,
                          repeats=repeats,
                          returnResamp='none',
                          classProbs=TRUE,
                          returnData=FALSE,
                          savePredictions=TRUE,
                          verboseIter=TRUE,
                          allowParallel=TRUE,
                          summaryFunction=twoClassSummary,
                          index=createMultiFolds(Y[train],
                                                 k=folds,
                                                 times=repeats)
)
#Make list of all models
all.models<-caretList(Y~., data=X, trControl=myControl, methodList=c("blackboost", "parRF"))
I edited the section of "train all models" using caretList so that it will work with caretEnsemble and caretStack further down the code (link provided above).
How do I get the accuracies so that I can use them in caretEnsemble and caretStack?
I assume you would like to use 'Accuracy' as the summary metric that should be used to select the optimal base learner models across their resamples and the metalearner later on via caretEnsemble or caretStack.
In that case, do not set summaryFunction = twoClassSummary in trainControl, because train will then use 'ROC' as the performance metric instead of 'Accuracy'. Keep the default summaryFunction instead (that is, do not specify it in trainControl). train, which caretList calls internally, will then use 'Accuracy' automatically because the response is categorical.
In addition, there are a few other things to note:
You should not set returnResamp = 'none' in trainControl, because then you won't be able to compare the models' individual accuracies later via summary(resamples(model.list)).
Even though you created an index to separate the data into a train and test set you don't use it when passing the data to caretList. The correct caretList call should begin like this caretList(Y[train] ~ ., data=X[train, ], ...
The tutorial you mentioned above is a bit outdated. You should also check out the package's current vignette and this tutorial from MachineLearningMastery. The latter also uses "Accuracy" as the performance metric in its example.
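The effect of the default summary function can be sketched with plain caret, without caretEnsemble. The example below reuses the XOR data from the question, trains only on the training rows, and leaves summaryFunction and returnResamp at their defaults. The choice of method = "rpart" and the names train_idx, train_df and ctrl_acc are assumptions made to keep the example light.

```r
library(caret)
library(mlbench)

set.seed(1)
dat <- mlbench.xor(500, 2)
X <- data.frame(dat$x)
Y <- factor(ifelse(dat$classes == "1", "Yes", "No"))

# Logical index for the training rows, applied to both X and Y
train_idx <- runif(nrow(X)) <= 0.66
train_df  <- cbind(X[train_idx, ], Y = Y[train_idx])

# No summaryFunction and no returnResamp: the defaults report Accuracy and Kappa
# and keep the resampling results.
ctrl_acc <- trainControl(method = "cv",
                         number = 5,
                         classProbs = TRUE,
                         savePredictions = "final")

fit <- train(Y ~ ., data = train_df, method = "rpart", trControl = ctrl_acc)

fit$metric            # metric used to pick the final model
names(fit$results)    # includes Accuracy and Kappa
```

The same trainControl object can then be passed to caretList in place of myControl.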
I would like to fit a binomial GLM on a certain dataset. Using glm(...,family=binomial) everything works fine; however, I would like to do it with the caret train() function. Unfortunately, I get an unexpected error that I cannot get rid of.
library("marginalmodelplots")
library("caret")
MissUSA <- MissAmerica08[,c(2,4,6,7,8,10)]
formula <- cbind(Top10, 9-Top10)~.
glmfit <- glm(formula=formula, data=MissUSA, family=binomial())
trainfit <-train(form=formula,data=MissUSA,trControl=trainControl(method = "none"), method="glm", family=binomial())
The error I get is:
"Error : nrow(x) == length(y) is not TRUE"
caret doesn't support grouped data for a binomial outcome. You can expand the grouped counts into one row per trial, with a binary (Bernoulli) factor as the outcome. Also, if you do that, you do not need to use family=binomial() in the call to train.
Max
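The expansion described above can be sketched as follows. The grouped data frame, its column names, and the helper expand_one() are hypothetical and stand in for the MissUSA data, which is not reproduced here. Each group of 9 trials becomes 9 rows with a two-level factor outcome, and the Bernoulli fit yields the same coefficients as the grouped fit.

```r
library(caret)

# Hypothetical grouped data: `successes` out of `trials` at each value of x.
grouped <- data.frame(x = c(1, 2, 3, 4),
                      successes = c(1, 3, 5, 8),
                      trials = 9)

# Expand one grouped row into one row per trial.
expand_one <- function(i) {
  n_yes <- grouped$successes[i]
  n_no  <- grouped$trials[i] - n_yes
  data.frame(x = grouped$x[i],
             outcome = factor(rep(c("yes", "no"), times = c(n_yes, n_no)),
                              levels = c("no", "yes")))
}
bernoulli <- do.call(rbind, lapply(seq_len(nrow(grouped)), expand_one))

# Grouped and expanded fits estimate the same coefficients.
fit_grouped <- glm(cbind(successes, trials - successes) ~ x,
                   data = grouped, family = binomial)
fit_bern    <- glm(outcome ~ x, data = bernoulli, family = binomial)
all.equal(coef(fit_grouped), coef(fit_bern), tolerance = 1e-4)

# The expanded data works with train(); family is inferred from the factor outcome.
trainfit <- train(outcome ~ x, data = bernoulli,
                  method = "glm",
                  trControl = trainControl(method = "none"))
coef(trainfit$finalModel)
```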