How to run LOOCV in R with a naive Bayes model? - r

The following code produces an accuracy estimate:
data(iris)
x = iris[,-5]
y = iris$Species
train_control <- trainControl(method="LOOCV")
model <- train(x,y, trControl=train_control, method="nb")
But what I wish to get is the following output, with the probability for each class:
Model=naiveBayes(Species ~., data=iris)
Model

Please include the packages you are using, like:
library(caret)
It looks to me like caret::train with method "nb" uses NaiveBayes (from the klaR package), while naiveBayes is from the e1071 package.
In any case, model$finalModel contains the fitted model object.
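For example, a minimal sketch continuing from the question's code (assuming the klaR package is installed for method = "nb"; predict with type = "prob" is how caret exposes per-class probabilities):
model$finalModel                                  # the fitted klaR NaiveBayes object with its probability tables
head(predict(model, newdata = x, type = "prob"))  # per-observation class probabilities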

Related

Using performance package on a Caret Object

I have a problem that I reproduced on a standard R dataset, which is the following: I built a penalized logistic model with caret and want to analyze it through the performance package (to check collinearity, heteroskedasticity of the errors, and so on).
In the end, however, it gives me this error: check_model() not implemented for models of class train yet.
Below is the reproducible code:
library(caret)
library(performance)
data <- as.data.frame(mtcars)
data$vs <- as.factor(data$vs)
set.seed(10)
trc <- trainControl(method = "repeatedcv", number = 3, repeats = 4, classProbs = FALSE)
model <- caret::train(vs ~ ., data = data, trControl = trc, family = "binomial", method = "regLogistic")
performance::check_model(model)

Difference between Model and $FinalModel for classification in R?

I currently have this random forest model and am just seeing how well it predicts whether someone is diabetes positive or diabetes negative.
The model is calculated using the caret workflow.
When looking at variable importance, I was told to use the code
randomForest::importance(model$finalModel)
What is the purpose of $finalModel? What is $finalModel as compared to just the original model? Shouldn't it just be the original model that is passed in as the argument to view variable importance?
example below:
library(tidyverse)
library(mlbench)
library(caret)
library(car)
library(glmnet)
library(rpart.plot)
library(rpart)
data("PimaIndiansDiabetes2")
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
set.seed(123)
training.samples <- PimaIndiansDiabetes2$diabetes %>% createDataPartition(p = 0.8, list = FALSE)
train.data <- PimaIndiansDiabetes2[training.samples,]
test.data <- PimaIndiansDiabetes2[-training.samples,]
model_rf <- caret::train(
  diabetes ~ .,
  data = train.data,
  method = "rf",
  trControl = trainControl("cv", number = 10),
  importance = TRUE)
model_rf
model_rf$bestTune
model_rf$finalModel
# variable importance here
randomForest::importance(model_rf$finalModel)
From the documentation:
finalModel A fit object using the best parameters
Most of the time with train you pass several candidate values for the hyper-parameters, to find the ones that achieve the best performance (using trainControl).
Inside model_rf, under finalModel, you'll find the model built with those best parameters.
FYI caret also has a function for variable importance plotting: varImp.
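For example, a minimal sketch continuing from the code above (varImp works on the train object itself):
vi <- varImp(model_rf)  # model-specific importance, scaled to 0-100 by default
vi
plot(vi)                # dotplot of variable importance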

Why are the predict values of gbm (R package) negative?

I analyzed my data with the 'gbm' R package. My data come from a cohort study, so I fit a 'gbm' model with the 'coxph' distribution.
After constructing the model, I would like to see how well it predicts. However, as the code below shows, the predicted values are negative, and I have trouble understanding this.
Please let me know how to interpret these values.
Here's my code.
install.packages("survival")
install.packages("randomForestSRC")
install.packages("gbm")
library(survival)
library(randomForestSRC)
library(gbm)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
exposure <- names(data)[!names(data) %in% c("days", "status")]
formula <- as.formula(paste("Surv(days, status)~", paste(exposure, collapse="+")))
set.seed(123)
ex <- gbm(Surv(days, status)~.,
data=data,
distribution="coxph",
cv.folds=5,
shrinkage=.01,
n.trees=1000)
set.seed(123)
pred <- predict(ex, n.trees=1000, type="response")
Read the ?predict.gbm help page, particularly the parameter type. By default predictions are on the link scale.
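For distribution = "coxph" the link scale is the log (relative) hazard, so negative values are expected. A minimal sketch of one common follow-up, assuming a hazard-ratio interpretation is what you are after:
pred_link   <- predict(ex, n.trees = 1000)  # log relative hazard (link scale)
pred_hazard <- exp(pred_link)               # relative hazard, always positive
head(cbind(pred_link, pred_hazard))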

R caretEnsemble - passing a fit param to one specific model in caretList

I have some code which fits several (cross-validated) models to some data, as below.
library(datasets)
library(caret)
library(caretEnsemble)
# load data
data("iris")
# establish cross-validation structure
set.seed(32)
trainControl <- trainControl(method = "repeatedcv",
                             number = 5, repeats = 3, # 3x 5-fold CV
                             search = "random")
algorithmList <- c('lda',       # Linear Discriminant Analysis
                   'rpart',     # Classification and Regression Trees
                   'svmRadial') # SVM with RBF Kernel
# cross-validate models from algorithmList
models <- caretList(Species ~ ., data = iris, trControl = trainControl, methodList = algorithmList)
So far so good. However, if I add 'gbm' to my algorithmList, I get a ton of extraneous log messages because gbm seems to default to verbose=TRUE as a fit param.
According to the caret docs, if I were running train on method='gbm' by itself (not along with several models trained in a caretList), I could simply add verbose=FALSE to train(), which would flow through to gbm. But this throws an error when I try it in caretList.
So I would like to pass verbose=FALSE (or any other fit param, in theory) specifically to one particular model from caretList's methodList. How can I accomplish this?
OK, this is actually addressed well in the docs.
?caretList
includes:
tuneList: optional, a NAMED list of caretModelSpec objects. This is
much more flexible than methodList and allows the specification of
model-specific parameters
And I've confirmed my problem is solved if instead of:
algorithmList <- c('lda',        # Linear Discriminant Analysis
                   'rpart',      # Classification and Regression Trees
                   'svmRadial',  # SVM with RBF Kernel
                   'gbm')        # Gradient-boosted machines
I use:
modelTypes <- list(lda       = caretModelSpec(method = "lda"),
                   rpart     = caretModelSpec(method = "rpart"),
                   svmRadial = caretModelSpec(method = "svmRadial"),
                   gbm       = caretModelSpec(method = "gbm", verbose = FALSE))
...then the models <- caretList(... line goes from:
models <- caretList(... methodList=algorithmList)
to:
models <- caretList(... tuneList = modelTypes)
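Put together (a sketch; the elided arguments are simply the ones from the original caretList call above):
models <- caretList(Species ~ ., data = iris,
                    trControl = trainControl,
                    tuneList = modelTypes)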

Test for Heteroscedasticity on a caret nnet model

Is there a function within caret (or another package) that can perform a Breusch-Pagan / Cook-Weisberg test for heteroskedasticity on an 'nnet' model trained using caret?
E.g. something similar to library(car); ncvTest or library(lmtest); bptest for lm objects, but that works on nnet objects created from caret?
Example data
library(caret)
set.seed(4)
n <- 100
x1i <- rnorm(n)
x2i <- rnorm(n)
yi <- rnorm(n)
dat <- data.frame(yi, x1i, x2i)
mod <- train(yi ~., data=dat, method="nnet", trace=FALSE, linout=TRUE)
This produces a plot of fitted values vs. residuals (plot not shown).
No, there is not anything like that in the package right now.
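That said, a hand-rolled sketch of the Breusch-Pagan idea applied to the caret nnet fit above (this is not a caret or lmtest feature; the auxiliary regression and the chi-square reference are my own assumptions, not an established test for nnet objects):
res  <- dat$yi - predict(mod, newdata = dat)     # residuals of the nnet fit
aux  <- lm(I(res^2) ~ x1i + x2i, data = dat)     # regress squared residuals on predictors
stat <- nrow(dat) * summary(aux)$r.squared       # BP-style statistic: n * R^2
pval <- pchisq(stat, df = 2, lower.tail = FALSE) # df = number of auxiliary regressors
c(BP.statistic = stat, p.value = pval)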
