I'm trying to run a caret method that requires no tuning parameters, such as lda. The example below uses "lvq", which needs two parameters (size and k).
set.seed(7)
# load the library
library(caret)
# load the dataset
data(iris)
# prepare training scheme
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# design the parameter tuning grid
grid <- expand.grid(size=c(5,10,20,50), k=c(1,2,3,4,5))
# train the model
model <- train(Species~., data=iris, method="lvq", trControl=control, tuneGrid=grid)
# summarize the model
print(model)
plot(model)
I tried to work around it by assigning tuneGrid=NULL:
set.seed(7)
# load the library
library(caret)
# load the dataset
data(iris)
# prepare training scheme
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# design the parameter tuning grid
grid <- expand.grid(size=c(5,10,20,50), k=c(1,2,3,4,5))
# train the model
model <- train(Species~., data=iris, method="lda", trControl=control, tuneGrid=NULL)
# summarize the model
print(model)
plot(model)
But I get the error
There are no tuning parameters for this model
Caret contains a number of LDA methods like:
method = "lda" involves no tuning parameters.
method = "lda2" allows to tune dimen (number of discriminant vectors).
If you want to tune parameters (and that must be only number of discriminant vectors), you must use "lda2". "lda" do not allows tuning so to run it you must delete tuneGrid. Deleting tuneGrid you just switch off cross-validation.
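A minimal sketch of tuning "lda2", assuming the same iris setup (with 3 classes there are at most 2 discriminant vectors):
set.seed(7)
# prepare training scheme
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# lda2 exposes a single tuning parameter: dimen (number of discriminant vectors)
grid <- expand.grid(dimen=1:2)
# train the model
model <- train(Species~., data=iris, method="lda2", trControl=control, tuneGrid=grid)
print(model)
plot(model)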
I'll answer my own question: I think that simply deleting the tuneGrid=NULL argument works fine.
set.seed(7)
# load the library
library(caret)
# load the dataset
data(iris)
# prepare training scheme
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# no tuning grid is needed for lda
# train the model
model <- train(Species~., data=iris, method="lda", trControl=control)
# summarize the model
print(model)
Related
I'm trying to train my dataset using R. The following is the code I'll be using:
functionRankFeatureByImportance <- function(logwine_withoutQuality){
  #logwine_withoutQuality$quality <- factor(logwine_withoutQuality$quality)
  # ensure results are repeatable
  set.seed(7)
  # prepare training scheme
  control <- trainControl(method="repeatedcv", number=10, repeats=3)
  # train the model
  model <- train(logwine_withoutQuality[,-12], logwine_withoutQuality$quality,
                 method="lvq", preProcess="scale", trControl=control)
  # estimate variable importance
  importance <- varImp(model, scale=FALSE)
  # summarize importance
  print(importance)
  # plot importance
  plot(importance)
}
But when using this, I'm getting an error like the one below.
I'm unable to understand what my error is.
The following is an image of the dataset I'm using.
There aren't any null values in the dataframe.
I'd really appreciate it if someone could kindly help me solve this.
I have some code which fits several (cross-validated) models to some data, as below.
library(datasets)
library(caret)
library(caretEnsemble)
# load data
data("iris")
# establish cross-validation structure
set.seed(32)
trainControl <- trainControl(method="repeatedcv",
number=5, repeats=3, # 3x 5-fold CV
search="random")
algorithmList <- c('lda', # Linear Discriminant Analysis
'rpart' , # Classification and Regression Trees
'svmRadial') # SVM with RBF Kernel
# cross-validate models from algorithmList
models <- caretList(Species~., data=iris, trControl=trainControl, methodList=algorithmList)
So far so good. However, if I add 'gbm' to my algorithmList, I get a ton of extraneous log messages because gbm seems to have a verbose=TRUE default fit parameter.
According to the caret docs, if I were running train on method='gbm' by itself (not along with several models trained in a caretList), I could simply add verbose=FALSE to train(), which would flow through to gbm. But this throws an error when I try it in caretList.
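For reference, a standalone fit like the one below is what I mean (a minimal sketch on the same iris data; gbm_alone is just an illustrative name), and the verbose=FALSE passed to train() does flow through to gbm:
# by itself, verbose=FALSE silences gbm's iteration log
gbm_alone <- train(Species~., data=iris, method="gbm", trControl=trainControl, verbose=FALSE)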
So I would like to pass verbose=FALSE (or any other fit param, in theory) specifically to one particular model from caretList's methodList. How can I accomplish this?
OK, this is actually addressed well in the docs.
?caretList
includes:
tuneList: optional, a NAMED list of caretModelSpec objects. This is
much more flexible than methodList and allows the specification of
model-specific parameters
And I've confirmed my problem is solved if instead of:
algorithmList <- c('lda', # Linear Discriminant Analysis
'rpart' , # Classification and Regression Trees
'svmRadial', # SVM with RBF Kernel
'gbm') # Gradient-boosted machines
I use:
modelTypes <- list(lda       = caretModelSpec(method="lda"),
                   rpart     = caretModelSpec(method="rpart"),
                   svmRadial = caretModelSpec(method="svmRadial"),
                   gbm       = caretModelSpec(method="gbm", verbose=FALSE))
...then the models <- caretList(... line goes from:
models <- caretList(... methodList=algorithmList)
to:
models <- caretList(... tuneList = modelTypes)
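Putting it together, the full call looks like this (a minimal sketch reusing the iris data and the trainControl object defined above):
# gbm now trains quietly because verbose=FALSE is part of its caretModelSpec
models <- caretList(Species~., data=iris, trControl=trainControl, tuneList=modelTypes)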
I am using the random forest algorithm to predict a target variable "Y" that has 4 values.
The syntax below is used to create the model:
control <- trainControl(method="repeatedcv", number=2, repeats=1, search="random")
seed <- 7
metric <- "Accuracy"
set.seed(seed)
mtry <- sqrt(ncol(train))
model <- train(Target~., data=complete, method="rf", metric=metric, tuneLength=15, trControl=control)
But when I test the trained model on the test dataset, it only gives accuracy close to 50%. Is there any way in which accuracy can be increased to close to 70% and above?
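One thing to note: the mtry computed above is never passed to train(), so it has no effect. To actually try specific mtry values they would have to go in through a tuneGrid rather than tuneLength. A hedged sketch, assuming the same complete data, metric and control objects:
# pass the sqrt(number of predictors) heuristic explicitly via a tuning grid
mtry_val <- floor(sqrt(ncol(complete) - 1))  # -1 to exclude the target column
grid <- expand.grid(mtry=mtry_val)
model <- train(Target~., data=complete, method="rf", metric=metric, tuneGrid=grid, trControl=control)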
library(caret)
data(iris)
train_control <- trainControl(method="repeatedcv", number=10, repeats=10)
model <- train(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width, data=iris, trControl=train_control, method="lm")
I can get the coefficients of the final selected model with model$finalModel$coefficients. Is there any way to get the coefficients for all models?
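As far as I know there is no built-in accessor for the per-resample fits, but one workable sketch is to define the resampling folds yourself, so each fold's training rows are known, and refit the same formula on each of them (passing the same folds to trainControl(index=folds) would make train() use identical resamples):
# define the resampling folds explicitly (the seed is just illustrative)
set.seed(123)
folds <- createMultiFolds(iris$Sepal.Length, k=10, times=10)
# refit the linear model on each fold and extract its coefficients (one column per resample)
fold_coefs <- sapply(folds, function(idx)
  coef(lm(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width, data=iris[idx, ])))
fold_coefs[, 1:3]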
I am using the caret package to train a K-Nearest Neighbors algorithm. For this, I am running this code:
Control <- trainControl(method="cv", summaryFunction=twoClassSummary, classProb=T)
tGrid=data.frame(k=1:100)
trainingInfo <- train(Formula, data=trainData, method = "knn",tuneGrid=tGrid,
trControl=Control, metric = "ROC")
As you can see, I am interested in obtaining the AUC of the ROC curve. This code works well, but it returns the testing error (which the algorithm uses for tuning the k parameter of the model) as the mean of the error across the cross-validation folds. In addition to the testing error, I would like it to also return the training error (the mean across folds of the error obtained on the training data). How can I do it?
Thank you
What you are asking is a bad idea on multiple levels. You will grossly over-estimate the area under the ROC curve. Consider the 1-NN model: you will have perfect predictions every time.
To do this, you will need to run train again and modify the index and indexOut objects:
library(caret)
set.seed(1)
dat <- twoClassSim(200)
set.seed(2)
folds <- createFolds(dat$Class, returnTrain = TRUE)
Control <- trainControl(method="cv",
summaryFunction=twoClassSummary,
classProb=T,
index = folds,
indexOut = folds)
tGrid=data.frame(k=1:100)
set.seed(3)
a_bad_idea <- train(Class ~ ., data=dat,
method = "knn",
tuneGrid=tGrid,
trControl=Control, metric = "ROC")
Max