caret xgbTree warning: `ntree_limit` is deprecated, use `iteration_range` instead

cv <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  summaryFunction = prSummary,
  seeds = set.seed(123))

turn_grid_xgb <- expand.grid(
  eta = c(0.1, 0.3, 0.5),
  max_depth = 5,
  min_child_weight = 1,
  subsample = 0.8,
  colsample_bytree = 0.8,
  nrounds = (1:10) * 200,
  gamma = 0)
set.seed(123)
suppressWarnings({
  xgb_1 <- train(label ~ ., data = baked_train,
                 method = "xgbTree",
                 tuneGrid = turn_grid_xgb,
                 trControl = cv,
                 verbose = FALSE,
                 metric = "F")
})
Hi, when I run the above code, the following warnings are shown in the R console. Does anyone know how to get rid of them? I have tried suppressWarnings() and warning = FALSE in the chunk options, and they are still there.
Thanks!!
WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
[02:15:13] WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
[02:15:13] WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.

These messages come from xgboost's C++ backend rather than from R's warning system, which is why suppressWarnings() and the chunk option cannot catch them. To get rid of them you can set verbosity = 0, which caret::train passes on to the xgboost call:
library(caret)
library(mlbench)
data(Sonar)
cv <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  summaryFunction = prSummary,
  seeds = set.seed(123))

turn_grid_xgb <- expand.grid(
  eta = 0.1,
  max_depth = 5,
  min_child_weight = 1,
  subsample = 0.8,
  colsample_bytree = 0.8,
  nrounds = c(1, 5) * 200,
  gamma = 0)

set.seed(123)
xgb_1 <- train(Class ~ ., data = Sonar,
               method = "xgbTree",
               tuneGrid = turn_grid_xgb,
               trControl = cv,
               verbose = FALSE,
               metric = "F",
               verbosity = 0)

Related

Error: The tuning parameter grid should have columns mtry

I'm trying to train a random forest model using caret in R. I want to tune the parameters to get the best values, using the expand.grid function. However, I keep getting this error:
Error: The tuning parameter grid should have columns mtry
This is my code. The data I use here is called scoresWithResponse:
ctrlCV = trainControl(method = 'cv', number = 10, classProbs = TRUE,
                      savePredictions = TRUE, summaryFunction = twoClassSummary)

rfGRID = expand.grid(interaction.depth = c(2, 3, 5, 6, 7, 8, 10),
                     n.trees = c(50, 75, 100, 125, 150, 200, 250),
                     shrinkage = seq(.005, .2, .005),
                     n.minobsinnode = c(5, 7, 10, 12, 15, 20),
                     nodesize = c(1:10),
                     mtry = c(1:10))

RF_loop_trn = c()
RF_loop_tst = c()

for(i in (1:5)){
  print(i)
  IND = createDataPartition(y = scoresWithResponse$response, p = 0.75, list = FALSE)
  scoresWithResponse.trn = scoresWithResponse[IND, ]
  scoresWithResponse.tst = scoresWithResponse[-IND, ]
  rfFit = train(response~., data = scoresWithResponse.trn,
                importance = TRUE,
                method = "rf",
                metric = "ROC",
                trControl = ctrlCV,
                tuneGrid = rfGRID,
                classProbs = TRUE,
                summaryFunction = twoClassSummary
                )
  # auc() and roc() are from the pROC package
  RF_loop_trn[i] = auc(roc(scoresWithResponse.trn$response, predict(rfFit, scoresWithResponse.trn, type = 'prob')[,1]))
  RF_loop_tst[i] = auc(roc(scoresWithResponse.tst$response, predict(rfFit, scoresWithResponse.tst, type = 'prob')[,1]))
}
After investigating for some time, I found several suggestions: re-downloading the caret package from GitHub, adding a . before each parameter in expand.grid, adding a dot only before the mtry parameter (i.e. .mtry), and adding mtry to the train function instead of expand.grid. I tried all of these and they all produce the same error.
Where and how should I add the mtry parameter? What is causing this error?
I can't comment yet, so I'll reply.
Have you tried inserting the mtry argument directly into the train function? For example:
rfGRID = expand.grid(interaction.depth = c(2, 3, 5, 6, 7, 8, 10),
                     n.trees = c(50, 75, 100, 125, 150, 200, 250),
                     shrinkage = seq(.005, .2, .005),
                     n.minobsinnode = c(5, 7, 10, 12, 15, 20),
                     nodesize = c(1:10)
                     )

rfFit = train(response~., data = scoresWithResponse.trn,
              importance = TRUE,
              method = "rf",
              metric = "ROC",
              trControl = ctrlCV,
              tuneGrid = rfGRID,
              classProbs = TRUE,
              summaryFunction = twoClassSummary,
              mtry = 1000
              )

Error: The tuning parameter grid should have columns parameter

I'm trying to run a 10-fold lasso regression in R, but when I use tuneGrid it shows this error and I don't know how to fix it. Here is my code:
ctrlspecs <- trainControl(method = "cv", number = 10, savePredictions = "all", classProb = TRUE)
lambdas <- c(seq(0, 2, length = 3))
foldlasso <- train(y1~x1, data = train_dat, method = "glm",
                   mtryGrid = expand.grid(alpha = 1, lambda = lambdas),
                   trControl = ctrlspecs,
                   tuneGrid = expand.grid(.alpha = 1, .lambda = lambdas),
                   na.action = na.omit)
Clean your code!!!
ctrlspecs <-
  trainControl(
    method = "cv",
    number = 10,
    savePredictions = "all",
    classProb = TRUE
  )

lambdas <- c(seq(0, 2, length = 3))

foldlasso <-
  train(
    y1 ~ x1,
    data = train_dat,
    method = "glm",
    mtryGrid = expand.grid(alpha = 1, lambda = lambdas),
    trControl = ctrlspecs,
    na.action = na.omit
  )
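A hedged side note (a sketch assuming train_dat, y1 and x1 from the question): the "tuning parameter grid should have columns parameter" error appears because method = "glm" has no real tuning parameters, so any tuneGrid is rejected. If an actual lasso fit is the goal, caret's "glmnet" method exposes alpha and lambda, which matches the original grid; also note that mtryGrid is not a train() argument.

# Hedged sketch: a 10-fold lasso via glmnet (alpha = 1), tuning only lambda.
lambdas <- seq(0, 2, length = 3)

foldlasso <- train(
  y1 ~ x1,
  data      = train_dat,
  method    = "glmnet",                                    # exposes alpha and lambda
  tuneGrid  = expand.grid(alpha = 1, lambda = lambdas),    # alpha = 1 gives the lasso
  trControl = ctrlspecs,
  na.action = na.omit
)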

How to interpret/tune a multinomial classification with caret-GBM?

Two questions:
1. Visualizing the error of a model
2. Calculating the log loss
(1) I'm trying to tune a multinomial GBM classifier, but I'm not sure how to interpret the outputs. I understand that LogLoss is meant to be minimized, but in the plot below it only appears to increase over the whole range of iterations and trees.
inTraining <- createDataPartition(final_data$label, p = 0.80, list = FALSE)
training <- final_data[inTraining,]
testing <- final_data[-inTraining,]
fitControl <- trainControl(method = "repeatedcv", number=10, repeats=3, verboseIter = FALSE, savePredictions = TRUE, classProbs = TRUE, summaryFunction= mnLogLoss)
gbmGrid1 <- expand.grid(.interaction.depth = (1:5)*2, .n.trees = (1:10)*25, .shrinkage = 0.1, .n.minobsinnode = 10)
gbmFit1 <- train(label~., data = training, method = "gbm", trControl=fitControl,
verbose = 1, metric = "ROC", tuneGrid = gbmGrid1)
plot(gbmFit1)
--
(2) On a related note, when I try to call mnLogLoss directly I get this error, which keeps me from quantifying the error:
mnLogLoss(testing, levels(testing$label)) : 'lev' cannot be NULL
I suspect you set the learning rate too high. So using an example dataset:
final_data = iris
final_data$label=final_data$Species
final_data$Species=NULL
inTraining <- createDataPartition(final_data$label, p = 0.80, list = FALSE)
training <- final_data[inTraining,]
testing <- final_data[-inTraining,]
fitControl <- trainControl(method = "repeatedcv", number=10, repeats=3,
verboseIter = FALSE, savePredictions = TRUE, classProbs = TRUE, summaryFunction= mnLogLoss)
gbmGrid1 <- expand.grid(.interaction.depth = 1:3, .n.trees = (1:10)*10, .shrinkage = 0.1, .n.minobsinnode = 10)
gbmFit1 <- train(label~., data = training, method = "gbm", trControl=fitControl,
verbose = 1, tuneGrid = gbmGrid1,metric="logLoss")
plot(gbmFit1)
A bit different from yours, but you can see the upward trend after about 20 trees. It really depends on your data, but with a high learning rate you arrive at a minimum very quickly, and anything after that introduces noise. You can see the illustration in Boehmke's book and also check out a more statistics-based discussion.
Let's lower the learning rate and you can see:
gbmGrid1 <- expand.grid(.interaction.depth = 1:3, .n.trees = (1:10)*10, .shrinkage = 0.01, .n.minobsinnode = 10)
gbmFit1 <- train(label~., data = training, method = "gbm", trControl=fitControl,
verbose = 1, tuneGrid = gbmGrid1,metric="logLoss")
plot(gbmFit1)
Note that you will most likely need more iterations to reach a loss as low as in the first fit.
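For the second question, a minimal sketch (using the gbmFit1 and testing objects above): mnLogLoss() expects a data frame with an obs column, a pred column, and one probability column per class, plus the class levels passed as lev, which is why calling it on the raw testing data fails with "'lev' cannot be NULL".

# Hedged sketch: calling mnLogLoss() directly on held-out data.
probs   <- predict(gbmFit1, testing, type = "prob")      # one probability column per class
eval_df <- data.frame(obs  = testing$label,              # observed classes
                      pred = predict(gbmFit1, testing),  # predicted classes
                      probs)
mnLogLoss(eval_df, lev = levels(testing$label))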

Change tuning parameters shown in the plot created by Caret in R

I'm using the caret package in R to train a model with the 'xgbTree' method.
After plotting the trained model (as shown in the picture below), the plot uses the tuning parameter eta = 0.2, which is not what I want: I also defined eta = 0.1 in expand.grid before training, and that is the best tune. How can I make the plot show the eta = 0.1 scenario instead of eta = 0.2? Thank you.
set.seed(100) # For reproducibility
xgb_trcontrol = trainControl(
  method = "cv",
  #repeats = 2,
  number = 10,
  #search = 'random',
  allowParallel = TRUE,
  verboseIter = FALSE,
  returnData = TRUE
)

xgbGrid <- expand.grid(nrounds = c(100, 200, 1000), # this is n_estimators in the python code above
                       max_depth = c(6:8),
                       colsample_bytree = c(0.6, 0.7),
                       ## The values below are default values in the sklearn-api.
                       eta = c(0.1, 0.2),
                       gamma = 0,
                       min_child_weight = c(5:8),
                       subsample = c(0.6, 0.7, 0.8, 0.9)
)

set.seed(0)
xgb_model8 = train(
  x, y_train,
  trControl = xgb_trcontrol,
  tuneGrid = xgbGrid,
  method = "xgbTree"
)
What happens is that the plotting device draws plots for all the values in your grid, and the last one to appear is the one for eta = 0.2. For example:
xgb_trcontrol = trainControl(method = "cv", number = 3, returnData = TRUE)

xgbGrid <- expand.grid(nrounds = c(100, 200, 1000),
                       max_depth = c(6:8),
                       colsample_bytree = c(0.6, 0.7),
                       eta = c(0.1, 0.2),
                       gamma = 0,
                       min_child_weight = c(5:8),
                       subsample = c(0.6, 0.7, 0.8, 0.9)
)

set.seed(0)
x = mtcars[,-1]
y_train = mtcars[,1]

xgb_model8 = train(
  x, y_train,
  trControl = xgb_trcontrol,
  tuneGrid = xgbGrid,
  method = "xgbTree"
)
You can save your plots like this:
pdf("plots.pdf")
plot(xgb_model8,metric="RMSE")
dev.off()
Or if you want to plot a specific parameter value, for example eta = 0.1, you also need to fix colsample_bytree, otherwise there are too many parameters to show in one plot:
library(ggplot2)

ggplot(subset(xgb_model8$results, eta == 0.1 & colsample_bytree == 0.6),
       aes(x = min_child_weight, y = RMSE, group = factor(subsample), col = factor(subsample))) +
  geom_line() + geom_point() + facet_grid(nrounds ~ max_depth)

Unable to run caret xgboost classification

I was trying to use xgboost for classification of the iris data, but I am facing this error:
"Error in frankv(predicted) : x is a list, 'cols' can not be 0-length
In addition: Warning message:
In train.default(x_train, y_train, trControl = ctrl, tuneGrid = xgbgrid, :
cannnot compute class probabilities for regression"
I am using the following code. Any help or explanation will be highly appreciated.
data(iris)
library(caret)
library(dplyr)
library(xgboost)
set.seed(123)
index <- createDataPartition(iris$Species, p=0.8, list = FALSE)
trainData <- iris[index,]
testData <- iris[-index,]
x_train = xgb.DMatrix(as.matrix(trainData %>% select(-Species)))
y_train = as.numeric(trainData$Species)
#### Generic control parametrs
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 5,
                     savePredictions = TRUE,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

xgbgrid <- expand.grid(nrounds = 10,
                       max_depth = 5,
                       eta = 0.05,
                       gamma = 0.01,
                       colsample_bytree = 0.75,
                       min_child_weight = 0,
                       subsample = 0.5,
                       objective = "binary:logitraw",
                       eval_metric = "error")

set.seed(123)
xgb_model = train(x_train,
                  y_train,
                  trControl = ctrl,
                  tuneGrid = xgbgrid,
                  method = "xgbTree")
There are a few issues:
The outcome variable should be a factor, not numeric.
The tuning grid contains parameters (objective, eval_metric) that caret's xgbTree grid does not use.
Since there are three classes, a two-class summary is inappropriate; use a multiclass summary with summaryFunction = multiClassSummary.
A working example:
data(iris)
library(caret)
library(dplyr)
library(xgboost)
set.seed(123)
index <- createDataPartition(iris$Species, p=0.8, list = FALSE)
trainData <- iris[index,]
testData <- iris[-index,]
x_train = xgb.DMatrix(as.matrix(trainData %>% select(-Species)))
y_train = as.factor(trainData$Species)
#### Generic control parametrs
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 5,
                     savePredictions = TRUE,
                     classProbs = TRUE,
                     summaryFunction = multiClassSummary)

xgbgrid <- expand.grid(nrounds = 10,
                       max_depth = 5,
                       eta = 0.05,
                       gamma = 0.01,
                       colsample_bytree = 0.75,
                       min_child_weight = 0,
                       subsample = 0.5)
set.seed(123)
xgb_model = train(x_train,
                  y_train,
                  trControl = ctrl,
                  method = "xgbTree",
                  tuneGrid = xgbgrid)
xgb_model
