I'm trying to train a random forest model using caret in R. I want to tune the parameters to get the best values, using the expand.grid function. However, I keep getting this error:
Error: The tuning parameter grid should have columns mtry
This is my code. The data I use here is called scoresWithResponse:
ctrlCV = trainControl(method = 'cv', number = 10 , classProbs = TRUE , savePredictions = TRUE, summaryFunction = twoClassSummary )
rfGRID = expand.grid(interaction.depth = c(2, 3, 5, 6, 7, 8, 10),
n.trees = c(50,75,100,125,150,200,250),
shrinkage = seq(.005, .2,.005),
n.minobsinnode = c(5,7,10, 12 ,15, 20),
nodesize = c(1:10),
mtry = c(1:10))
RF_loop_trn = c()
RF_loop_tst = c()
for(i in (1:5)){
print(i)
IND = createDataPartition(y = scoresWithResponse$response, p=0.75, list = FALSE)
scoresWithResponse.trn = scoresWithResponse[IND, ]
scoresWithResponse.tst = scoresWithResponse[-IND,]
rfFit = train(response~., data = scoresWithResponse.trn,
importance = TRUE,
method = "rf",
metric="ROC",
trControl = ctrlCV,
tuneGrid = rfGRID,
classProbs = TRUE,
summaryFunction = twoClassSummary
)
RF_loop_trn[i] = auc(roc(scoresWithResponse.trn$response,predict(rfFit,scoresWithResponse.trn, type='prob')[,1]))
RF_loop_tst[i] = auc(roc(scoresWithResponse.tst$response,predict(rfFit,scoresWithResponse.tst, type='prob')[,1]))
}
After investigating for some time, I found several suggestions: reinstalling the caret package from GitHub, adding a . before each parameter in expand.grid, adding a dot only before the mtry parameter (something like .mtry), and passing mtry to the train function instead of expand.grid. I tried all of them and they all produce the same error.
Where and how should I add the mtry parameter? What is causing this error?
I can't comment yet, so I'll post this as an answer.
Have you tried passing the mtry argument directly to the train function? For example:
rfGRID = expand.grid(interaction.depth = c(2, 3, 5, 6, 7, 8, 10),
n.trees = c(50,75,100,125,150,200,250),
shrinkage = seq(.005, .2,.005),
n.minobsinnode = c(5,7,10, 12 ,15, 20),
nodesize = c(1:10)
)
rfFit = train(response~., data = scoresWithResponse.trn,
importance = TRUE,
method = "rf",
metric="ROC",
trControl = ctrlCV,
tuneGrid = rfGRID,
classProbs = TRUE,
summaryFunction = twoClassSummary,
mtry = 1000
)
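For what it's worth, caret's "rf" method exposes only mtry as a tuning parameter, so the grid passed to tuneGrid should contain an mtry column and nothing else. The gbm-style columns (interaction.depth, n.trees, shrinkage, n.minobsinnode) do not belong in a random forest grid. Below is a minimal sketch under some assumptions: it uses a two-class subset of the built-in iris data as a stand-in for scoresWithResponse, and it assumes the randomForest and pROC packages are installed. Fixed randomForest options such as ntree and nodesize are passed straight to train rather than through the grid.

```r
library(caret)

# Stand-in data (assumption): a two-class subset of iris replaces scoresWithResponse
iris2 <- droplevels(subset(iris, Species != "setosa"))

# modelLookup() lists the parameters caret will tune for a given method
rf_params <- modelLookup("rf")

ctrlCV <- trainControl(method = "cv", number = 5,
                       classProbs = TRUE,
                       summaryFunction = twoClassSummary)

# The grid holds only the tunable parameter, mtry
rfGRID <- expand.grid(mtry = 1:4)

set.seed(1)
rfFit <- train(Species ~ ., data = iris2,
               method = "rf",
               metric = "ROC",
               trControl = ctrlCV,
               tuneGrid = rfGRID,
               ntree = 100,    # fixed value, forwarded to randomForest()
               nodesize = 5)   # fixed value, forwarded to randomForest()

rfFit$bestTune
```

To compare several nodesize or ntree values, one option is to loop over them and call train once per value, since they cannot be columns of the grid.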
cv <- trainControl(
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = prSummary,
seeds = set.seed(123))
turn_grid_xgb <- expand.grid(
eta = c(0.1,0.3,0.5),
max_depth = 5,
min_child_weight = 1,
subsample = 0.8,
colsample_bytree = 0.8,
nrounds = (1:10)*200,
gamma = 0)
set.seed(123)
suppressWarnings({
xgb_1 <- train(label~., data = baked_train,
method = "xgbTree",
tuneGrid = turn_grid_xgb,
trControl = cv,
verbose = FALSE,
metric = "F")
})
Hi, when I run the above code, the following warnings are shown in the R console. Does anyone know how to get rid of them? I have tried suppressWarnings() and warning = FALSE in the chunk settings, and they are still there.
Thanks!
WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
[02:15:13] WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
[02:15:13] WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
To get rid of xgboost warnings you can set verbosity = 0 which will be passed on by caret::train to the xgboost call:
library(caret)
library(mlbench)
data(Sonar)
cv <- trainControl(
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = prSummary,
seeds = set.seed(123))
turn_grid_xgb <- expand.grid(
eta = 0.1,
max_depth = 5,
min_child_weight = 1,
subsample = 0.8,
colsample_bytree = 0.8,
nrounds = c(1,5)*200,
gamma = 0)
set.seed(123)
xgb_1 <- train(Class~., data = Sonar,
method = "xgbTree",
tuneGrid = turn_grid_xgb,
trControl = cv,
verbose = FALSE,
metric = "F",
verbosity = 0)
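A likely reason why suppressWarnings() has no effect here: as far as I can tell, these messages are printed by the underlying xgboost library rather than signalled as R warning conditions, and suppressWarnings() only intercepts the latter. The base-R sketch below illustrates the difference with a hypothetical function noisy(), which stands in for the xgboost call and is not part of any package.

```r
# noisy() is a hypothetical stand-in: it prints a message AND signals a real warning
noisy <- function() {
  cat("WARNING: this line is printed, not signalled\n")
  warning("this is a real R warning")
  42
}

# suppressWarnings() silences the signalled warning, but the printed line still appears
res <- suppressWarnings(noisy())

# Printed output has to be captured (or switched off at the source) instead
printed <- capture.output(res2 <- suppressWarnings(noisy()))
printed
```

That is why turning the messages off at the source, with verbosity = 0, works where suppressWarnings() does not.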
I am trying to run a 10-fold cross-validated lasso regression in R, but when I pass tuneGrid, I get an error that I don't know how to fix. Here is my code:
ctrlspecs<-trainControl(method="cv",number=10, savePredictions="all", classProb=TRUE)
lambdas<-c(seq(0,2,length=3))
foldlasso<-train(y1~x1,data=train_dat, method="glm", mtryGrid=expand.grid(alpha=1,lambda=lambdas),
trControl=ctrlspecs,tuneGrid=expand.grid(.alpha=1,.lambda=lambdas),na.action=na.omit)
Clean your code!!!
ctrlspecs <-
trainControl(
method = "cv",
number = 10,
savePredictions = "all",
classProb = TRUE
)
lambdas <- c(seq(0, 2, length = 3))
foldlasso <-
train(
y1~x1,
data=train_dat,
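# the lasso needs method = "glmnet"; plain "glm" has no alpha or lambda to tune
# (note: glmnet itself requires at least two predictor columns)
method = "glmnet",
tuneGrid = expand.grid(alpha = 1, lambda = lambdas),  # alpha = 1 is the lasso penalty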
trControl = ctrlspecs,
na.action = na.omit
)
I am conducting knn regression on my data, and would like to:
a) cross-validate through repeatedcv to find an optimal k;
b) when building knn model, using PCA at 90% level threshold to reduce dimensionality.
library(caret)
library(dplyr)
set.seed(0)
data = cbind(rnorm(15, 100, 10), matrix(rnorm(300, 10, 5), ncol = 20)) %>%
data.frame()
colnames(data) = c('True', paste0('Day',1:20))
tr = data[1:10, ] #training set
tt = data[11:15,] #test set
train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)
k = train(True ~ .,
method = "knn",
tuneGrid = expand.grid(k = 1:10),
trControl = train.control,
preProcess = c('scale','pca'),
metric = "RMSE",
data = tr)
My question is: I believe the PCA threshold currently defaults to 95% (not sure). How can I change it to 80%?
You can add the preProcOptions argument to trainControl:
train.control = trainControl(method = "repeatedcv", number = 5, repeats=3, preProcOptions = list(thresh = 0.80))
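To make the meaning of thresh concrete, here is a base-R sketch of what a cumulative-variance cutoff does: it keeps the smallest number of principal components whose cumulative share of variance reaches the threshold. This is an illustration on simulated data (an assumption), not caret's internal code, and caret's exact selection rule may differ in detail.

```r
set.seed(0)

# Simulated predictors (assumption): 50 rows, 20 columns
x <- matrix(rnorm(50 * 20, mean = 10, sd = 5), ncol = 20)

# Centre and scale, then run PCA
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the first 1, 2, ... components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching a given threshold
n_comp <- function(thresh) which(cum_var >= thresh)[1]

n80 <- n_comp(0.80)
n95 <- n_comp(0.95)
c(n80 = n80, n95 = n95)
```

Lowering the threshold from 0.95 to 0.80 can only keep the same number of components or fewer, which is the effect preProcOptions = list(thresh = 0.80) is meant to have.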
I am using the caret R package to train an SVM model. My code is as follows:
options(show.error.locations = TRUE)
svmTrain <- function(svmType, subsetSizes, data, seeds, metric){
svmFuncs$summary <- function(...) c(twoClassSummary(...), defaultSummary(...), prSummary(...))
data_x <- data.frame(data[,2:ncol(data)])
data_y <- unlist(data[,1])
FSctrl <- rfeControl(method = "cv",
number = 10,
rerank = TRUE,
verbose = TRUE,
functions = svmFuncs,
saveDetails = TRUE,
seeds = seeds
)
TRctrl <- trainControl(method = "cv",
savePredictions = TRUE,
classProbs = TRUE,
verboseIter = TRUE,
sampling = "down",
number = 10,
search = "random",
repeats = 3,
returnResamp = "all",
allowParallel = TRUE
)
svmProf <- rfe( x = data_x,
y = data_y,
sizes = subsetSizes,
metric = metric,
rfeControl = FSctrl,
method = svmType,
preProc = c("center", "scale"),
trControl = TRctrl,
selectSize = pickSizeBest(data, metric = "AUC", maximize = TRUE),
tuneLength = 5
)
}
data1a = openTable(3, 'a')
data1b = openTable(3, 'b')
data = rbind(data1a, data1b)
last <- roundToTens(ncol(data)-1)
subsetSizes <- c( 3:9, seq(10, last, 10) )
svmTrain <- svmTrain("svmRadial", subsetSizes, data, seeds, "AUC")
When I comment out the pickSizeBest line, the algorithm runs fine. However, when I leave it in, I get the following error:
Error in { (from svm.r#58) : task 1 failed - "Stopping"
Row 58 is svmProf <- rfe( x = data_x,..
I tried to find out whether I am using pickSizeBest the wrong way, but I cannot find the problem. Could somebody help me?
Many thanks!
EDIT: I just realized that pickSizeBest(data, ...) should not be given data. However, I still do not know what should go there instead.
I can't run your example, but I would suggest that you just pass the function pickSizeBest, i.e.:
[...]
trControl = TRctrl,
selectSize = pickSizeBest,
tuneLength = 5
[...]
The functionality is described here:
http://topepo.github.io/caret/recursive-feature-elimination.html#backwards-selection
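To illustrate why the function itself is passed rather than its result: as I read the linked documentation, selectSize is one element of the functions list given to rfeControl, and rfe calls it internally with a table of resampled performance values, one row per subset size. The sketch below calls pickSizeBest by hand on a made-up performance table (the numbers are hypothetical) and then registers it on a copy of caretFuncs, which I believe already uses pickSizeBest by default.

```r
library(caret)

# Hypothetical performance profile: one row per subset size tried by rfe
perf <- data.frame(Variables = c(3, 5, 10, 20),
                   AUC       = c(0.71, 0.78, 0.83, 0.80))

# pickSizeBest returns the subset size with the best value of the metric
best_size <- pickSizeBest(perf, metric = "AUC", maximize = TRUE)
best_size

# Registering the selector: assign the function itself, without calling it
svmFuncs <- caretFuncs
svmFuncs$selectSize <- pickSizeBest
```

Writing pickSizeBest(data, metric = "AUC", maximize = TRUE) in the call evaluates the function immediately on the raw data, which has neither an AUC column nor a Variables column, so it cannot produce a sensible size.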
I'm trying to use the rfe function from the caret package, but I can't make it work for the gbm model using the ROC metric.
I found some insights here:
Feature Selection in caret rfe + sum with ROC
http://www.cybaea.net/Blogs/Feature-selection-Using-the-caret-package.html
I ended up with this piece of code:
gbmFuncs <- treebagFuncs
gbmFuncs$fit <- function (x, y, first, last, ...) {
library("gbm")
n.levels <- length(unique(y))
if ( n.levels == 2 ) {
distribution = "bernoulli"
} else {
distribution = "gaussian"
}
gbm.fit(x, y, distribution = distribution, ...)
}
gbmFuncs$pred <- function (object, x) {
n.trees <- suppressWarnings(gbm.perf(object,
plot.it = FALSE,
method = "OOB"))
if ( n.trees <= 0 ) n.trees <- object$n.trees
predict(object, x, n.trees = n.trees, type = "link")
}
control <- rfeControl(functions = gbmFuncs, method = "cv", verbose = TRUE, returnResamp="final",
number = 5)
trainctrl <- trainControl(classProbs= TRUE,
summaryFunction = twoClassSummary)
gbmFit_bernoulli_sel <- rfe(data_model[x, -as.numeric(y)+2,
sizes=c(10, 15, 20, 30, 40, 50), rfeControl = control, verbose = FALSE,
interaction.depth = 14, n.trees = 10000, shrinkage = .01, metric="ROC",
trControl = trainctrl)
But I get this error :
Error in { :
task 1 failed - "unused argument (trControl = list(method = "boot", number = 25, repeats = 25, p = 0.75, initialWindow = NULL, horizon = 1, fixedWindow = TRUE, verboseIter = FALSE, returnData = TRUE, returnResamp = "final", savePredictions = FALSE, classProbs = TRUE, summaryFunction = function (data, lev = NULL, model = NULL)
{
require(pROC)
if (!all(levels(data[, "pred"]) == levels(data[, "obs"]))) stop("levels of observed and predicted data do not match")
rocObject <- try(pROC::roc(data$obs, data[, lev[1]]), silent = TRUE)
rocAUC <- if (class(rocObject)[1] == "try-error") NA else rocObject$auc
out <- c(rocAUC, sensitivity(data[, "pred"], data[, "obs"], lev[1]), specificity(data[, "pred"], data[, "obs"], lev[2]))
names(out) <- c("ROC", "Sens", "Spec")
out
EDIT
It works with this code:
caretFuncs$summary <- twoClassSummary
controlrfe <- rfeControl(functions = caretFuncs, method = "cv", number = 3, verbose = TRUE)
gbmGrid <- expand.grid(interaction.depth = 5, n.trees = 1000, shrinkage = .01)
confroltrain <- trainControl(method = "none", classProbs=T, summaryFunction = twoClassSummary, verbose = TRUE)
gbmFit_bernoulli_sel <- rfe(data_model[,-ncol(data_model)], data_model[,ncol(data_model)],
sizes=c(10,15), rfeControl = controlrfe, metric="ROC",
trControl = confroltrain, tuneGrid=gbmGrid, method="gbm")
I had to use the train function because, when I used gbmFuncs, I ran into a problem: apparently gbm.fit needs a numeric target variable, but the ROC metric evaluation needs a factor.
Thanks for your help.
You are trying to pass trControl to gbm.fit. Connect the (three) dots =]
Try removing trControl = trainctrl.
Max
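The "three dots" remark can be illustrated in base R. Extra arguments given to a wrapper are forwarded through ... to the inner fitting function, and if that function has no matching parameter, R stops with an "unused argument" error. The functions inner_fit and wrapper_fit below are hypothetical stand-ins for gbm.fit and the custom fit function, not real package code.

```r
# inner_fit() stands in for gbm.fit: it accepts n.trees but knows nothing about trControl
inner_fit <- function(x, y, n.trees = 100) {
  list(n.trees = n.trees)
}

# wrapper_fit() stands in for gbmFuncs$fit: everything in ... is forwarded unchanged
wrapper_fit <- function(x, y, ...) {
  inner_fit(x, y, ...)
}

# An argument the inner function understands is forwarded without trouble
ok <- wrapper_fit(1:3, 1:3, n.trees = 50)

# An argument it does not understand triggers the error seen in the question
err <- tryCatch(
  wrapper_fit(1:3, 1:3, trControl = list()),
  error = function(e) conditionMessage(e)
)
err
```

Dropping trControl from the rfe call, as suggested, removes the argument that the inner function cannot accept.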