I'm using the caret package in R to train a model with the 'xgbTree' method.
When I plot the trained model, as shown in the picture below, the plot is for the tuning parameter eta = 0.2. That is not the one I want: my expand.grid also includes eta = 0.1, and that value is the best tune. How can I make the plot function show the eta = 0.1 scenario instead of eta = 0.2? Thank you.
set.seed(100) # For reproducibility
xgb_trcontrol = trainControl(
method = "cv",
#repeats = 2,
number = 10,
#search = 'random',
allowParallel = TRUE,
verboseIter = FALSE,
returnData = TRUE
)
xgbGrid <- expand.grid(nrounds = c(100,200,1000), # this is n_estimators in the python code above
max_depth = c(6:8),
colsample_bytree = c(0.6,0.7),
## The values below are default values in the sklearn-api.
eta = c(0.1,0.2),
gamma=0,
min_child_weight = c(5:8),
subsample = c(0.6,0.7,0.8,0.9)
)
set.seed(0)
xgb_model8 = train(
x, y_train,
trControl = xgb_trcontrol,
tuneGrid = xgbGrid,
method = "xgbTree"
)
What happens is that the plotting device plots over all values of your grid, and the last one to appear is eta=0.2. For example:
xgb_trcontrol = trainControl(method = "cv", number = 3,returnData = TRUE)
xgbGrid <- expand.grid(nrounds = c(100,200,1000),
max_depth = c(6:8),
colsample_bytree = c(0.6,0.7),
eta = c(0.1,0.2),
gamma=0,
min_child_weight = c(5:8),
subsample = c(0.6,0.7,0.8,0.9)
)
set.seed(0)
x = mtcars[,-1]
y_train = mtcars[,1]
xgb_model8 = train(
x, y_train,
trControl = xgb_trcontrol,
tuneGrid = xgbGrid,
method = "xgbTree"
)
You can save your plots like this:
pdf("plots.pdf")
plot(xgb_model8,metric="RMSE")
dev.off()
Or, if you want to plot a specific parameter value, for example eta = 0.1, you also need to fix colsample_bytree; otherwise there are too many parameters to show at once:
library(ggplot2)
ggplot(subset(xgb_model8$results
,eta==0.1 & colsample_bytree==0.6),
aes(x=min_child_weight,y=RMSE,group=factor(subsample),col=factor(subsample))) +
geom_line() + geom_point() + facet_grid(nrounds~max_depth)
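If the aim is to plot the panel that matches the best tune rather than hard-coding eta, the filter values can be read from the fitted object. The sketch below uses a small made-up data frame (results_demo) standing in for xgb_model8$results, and takes the row with the lowest RMSE as a stand-in for xgb_model8$bestTune. Both names and all numbers are illustrative assumptions, not output from the model above.

```r
# Hypothetical stand-in for xgb_model8$results (values are made up).
results_demo <- expand.grid(
  eta = c(0.1, 0.2),
  colsample_bytree = c(0.6, 0.7),
  min_child_weight = 5:6
)
results_demo$RMSE <- c(2.9, 3.4, 3.0, 3.5, 2.7, 3.3, 3.1, 3.6)

# Stand-in for xgb_model8$bestTune: the row with the lowest RMSE.
best_demo <- results_demo[which.min(results_demo$RMSE), ]

# Keep only the rows that share the best eta and colsample_bytree;
# this is the subset you would hand to ggplot().
plot_data <- subset(
  results_demo,
  eta == best_demo$eta & colsample_bytree == best_demo$colsample_bytree
)
```

With the real model, results_demo would be replaced by xgb_model8$results and best_demo by xgb_model8$bestTune.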
cv <- trainControl(
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = prSummary,
seeds = set.seed(123))
turn_grid_xgb <- expand.grid(
eta = c(0.1,0.3,0.5),
max_depth = 5,
min_child_weight = 1,
subsample = 0.8,
colsample_bytree = 0.8,
nrounds = (1:10)*200,
gamma = 0)
set.seed(123)
suppressWarnings({
xgb_1 <- train(label~., data = baked_train,
method = "xgbTree",
tuneGrid = turn_grid_xgb,
trControl = cv,
verbose = FALSE,
metric = "F")
})
Hi, when I run the above code, the following warnings are shown in the R console. Does anyone know how to get rid of them? I have tried suppressWarnings() and warning = FALSE in the chunk settings, and they are still there.
Thanks!
WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
[02:15:13] WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
[02:15:13] WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
To get rid of xgboost warnings you can set verbosity = 0 which will be passed on by caret::train to the xgboost call:
library(caret)
library(mlbench)
data(Sonar)
cv <- trainControl(
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = prSummary,
seeds = set.seed(123))
turn_grid_xgb <- expand.grid(
eta = 0.1,
max_depth = 5,
min_child_weight = 1,
subsample = 0.8,
colsample_bytree = 0.8,
nrounds = c(1,5)*200,
gamma = 0)
set.seed(123)
xgb_1 <- train(Class~., data = Sonar,
method = "xgbTree",
tuneGrid = turn_grid_xgb,
trControl = cv,
verbose = FALSE,
metric = "F",
verbosity = 0)
I am trying to run a 10-fold lasso regression in R, but when I pass the tuneGrid, it shows an error and I don't know how to fix it. Here is my code:
ctrlspecs<-trainControl(method="cv",number=10, savePredictions="all", classProb=TRUE)
lambdas<-c(seq(0,2,length=3))
foldlasso<-train(y1~x1,data=train_dat, method="glm", mtryGrid=expand.grid(alpha=1,lambda=lambdas),
trControl=ctrlspecs,tuneGrid=expand.grid(.alpha=1,.lambda=lambdas),na.action=na.omit)
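Clean up your code first. Formatted, two problems are easier to see: method = "glm" has no alpha or lambda to tune (lasso needs method = "glmnet", which requires the glmnet package), and train() has no mtryGrid argument, so the grid should be passed through tuneGrid only:
ctrlspecs <-
trainControl(
method = "cv",
number = 10,
savePredictions = "all",
classProbs = TRUE # only relevant for classification outcomes
)
lambdas <- seq(0, 2, length.out = 3)
foldlasso <-
train(
y1 ~ x1,
data = train_dat,
method = "glmnet", # "glm" has no alpha/lambda tuning parameters
tuneGrid = expand.grid(alpha = 1, lambda = lambdas), # alpha = 1 is the lasso penalty
trControl = ctrlspecs,
na.action = na.omit
)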
I am trying to predict the times table by training a neural network. However, I don't really understand how the preProcess argument of the train function in caret works.
In the docs, it says:
The preProcess class can be used for many operations on predictors, including centering and scaling.
When we set preProcess like below,
tt.cv <- train(product ~ .,
data = tt.train,
method = 'neuralnet',
tuneGrid = tune.grid,
trControl = train.control,
linear.output = TRUE,
algorithm = 'backprop',
preProcess = 'range',
learningrate = 0.01)
Does it mean that the train function preprocesses (normalizes) the training data passed, in this case tt.train?
After the training is done, when we are trying to predict, do we pass normalized inputs to the predict function or are inputs normalized in the function because we set the preProcess parameter?
# Do we do
predict(tt.cv, tt.test)
# or
predict(tt.cv, tt.normalized.test)
Also, from the quote above, it seems that preProcess does not normalize the outputs during training. How do we go about normalizing the outputs? Or do we just normalize the training data beforehand, as below, and then pass it to the train function?
preProc <- preProcess(tt, method = 'range')
tt.preProcessed <- predict(preProc, tt)
tt.preProcessed.train <- tt.preProcessed[indexes,]
tt.preProcessed.test <- tt.preProcessed[-indexes,]
The whole code:
library(caret)
library(neuralnet)
# Create the dataset
tt = data.frame(multiplier = rep(1:10, times = 10), multiplicand = rep(1:10, each = 10))
tt = cbind(tt, data.frame(product = tt$multiplier * tt$multiplicand))
# Splitting
indexes = createDataPartition(tt$product,
times = 1,
p = 0.7,
list = FALSE)
tt.train = tt[indexes,]
tt.test = tt[-indexes,]
# Pre-process
preProc <- preProcess(tt, method = c('center', 'scale'))
tt.preProcessed <- predict(preProc, tt)
tt.preProcessed.train <- tt.preProcessed[indexes,]
tt.preProcessed.test <- tt.preProcessed[-indexes,]
# Train
train.control <- trainControl(method = "repeatedcv",
number = 10,
repeats = 3,
savePredictions = TRUE)
tune.grid <- expand.grid(layer1 = 8,
layer2 = 0,
layer3 = 0)
tt.cv <- train(product ~ .,
data = tt.train,
method = 'neuralnet',
tuneGrid = tune.grid,
trControl = train.control,
algorithm = 'backprop',
learningrate = 0.01,
stepmax = 100000,
preProcess = c('center', 'scale'),
lifesign = 'minimal',
threshold = 0.01)
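On the question of normalizing the output: the quoted documentation describes preProcess as operating on predictors, so one option is to scale the outcome by hand and undo the scaling after prediction. The following is a minimal base-R sketch of that idea, assuming min-max ("range") scaling to [0, 1] with the minimum and maximum taken from the training rows only. The helper names scale_range and unscale_range are hypothetical, not caret functions, and the fixed split is only for illustration.

```r
# Times-table data, as in the question.
tt <- data.frame(multiplier = rep(1:10, times = 10),
                 multiplicand = rep(1:10, each = 10))
tt$product <- tt$multiplier * tt$multiplicand

# Fixed split for illustration (the question uses createDataPartition).
train_idx <- seq(1, nrow(tt), by = 2)
y_train <- tt$product[train_idx]
y_test  <- tt$product[-train_idx]

# Hypothetical helpers: min-max scaling and its inverse.
scale_range   <- function(v, lo, hi) (v - lo) / (hi - lo)
unscale_range <- function(s, lo, hi) s * (hi - lo) + lo

# Learn the range on the training outcome only, then reuse it everywhere.
lo <- min(y_train)
hi <- max(y_train)
y_train_scaled <- scale_range(y_train, lo, hi)

# Predictions made on the scaled outcome are mapped back the same way;
# here the scaled test values stand in for model predictions.
pred_scaled   <- scale_range(y_test, lo, hi)
pred_original <- unscale_range(pred_scaled, lo, hi)
```

Taking lo and hi from the training rows alone keeps information from the test rows out of the scaling step.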
I am using Bayesian optimization to tune the parameters of an SVM for a regression problem. In the following code, what should the value of init_grid_dt = initial_grid be? I have the upper and lower bounds of the sigma and C parameters of the SVM, but I don't know what the initial grid should be.
In one of the examples on the web, the results of a random search were used as the input for the initial grid. The code is as follows:
ctrl <- trainControl(method = "repeatedcv", repeats = 5)
svm_fit_bayes <- function(logC, logSigma) {
## Use the same model code but for a single (C, sigma) pair.
txt <- capture.output(
mod <- train(y ~ ., data = train_dat,
method = "svmRadial",
preProc = c("center", "scale"),
metric = "RMSE",
trControl = ctrl,
tuneGrid = data.frame(C = exp(logC), sigma = exp(logSigma)))
)
list(Score = -getTrainPerf(mod)[, "TrainRMSE"], Pred = 0)
}
lower_bounds <- c(logC = -5, logSigma = -9)
upper_bounds <- c(logC = 20, logSigma = -0.75)
bounds <- list(logC = c(lower_bounds[1], upper_bounds[1]),
logSigma = c(lower_bounds[2], upper_bounds[2]))
## Create a grid of values as the input into the BO code.
## rand_search holds the results of an earlier random search and is not defined in this snippet.
initial_grid <- rand_search$results[, c("C", "sigma", "RMSE")]
initial_grid$C <- log(initial_grid$C)
initial_grid$sigma <- log(initial_grid$sigma)
initial_grid$RMSE <- -initial_grid$RMSE
names(initial_grid) <- c("logC", "logSigma", "Value")
library(rBayesianOptimization)
ba_search <- BayesianOptimization(svm_fit_bayes,
bounds = bounds,
init_grid_dt = initial_grid,
init_points = 0,
n_iter = 30,
acq = "ucb",
kappa = 1,
eps = 0.0,
verbose = TRUE)
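If no random-search results are available to seed the optimizer, one hedged alternative is to build the starting grid by sampling points uniformly inside the bounds. The sketch below does this in base R. The column names match the names used in bounds, the number of points (10) is an arbitrary choice, and no Value column is included because the points have not been evaluated yet. As far as I know, rBayesianOptimization accepts such a data frame as init_grid_dt and evaluates the points itself, but check the package documentation for your version.

```r
set.seed(1)

lower_bounds <- c(logC = -5, logSigma = -9)
upper_bounds <- c(logC = 20, logSigma = -0.75)

n_init <- 10  # arbitrary number of starting points (an assumption)

# One column per tuning parameter, named exactly like the entries of bounds.
initial_grid_demo <- data.frame(
  logC     = runif(n_init, lower_bounds[["logC"]], upper_bounds[["logC"]]),
  logSigma = runif(n_init, lower_bounds[["logSigma"]], upper_bounds[["logSigma"]])
)
```

Another option, again to be checked against the package documentation, is to leave init_grid_dt unset and give init_points a positive value so that the package draws the starting points itself.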
I am using caret to fit a model with "xgboost".
1- However, I get the following error:
"Error: The tuning parameter grid should have columns nrounds,
max_depth, eta, gamma, colsample_bytree, min_child_weight, subsample"
The code:
library(caret)
library(doParallel)
library(dplyr)
library(pROC)
library(xgboost)
## Create train/test indexes
## preserve class indices
set.seed(42)
my_folds <- createFolds(train_churn$churn, k = 10)
# Compare class distribution
i <- my_folds$Fold1
table(train_churn$churn[i]) / length(i)
my_control <- trainControl(
summaryFunction = twoClassSummary,
classProbs = TRUE,
verboseIter = TRUE,
savePredictions = TRUE,
index = my_folds
)
my_grid <- expand.grid(nrounds = 500,
max_depth = 7,
eta = 0.1,
gammma = 1,
colsample_bytree = 1,
min_child_weight = 100,
subsample = 1)
set.seed(42)
model_xgb <- train(
class ~ ., data = train_churn,
metric = "ROC",
method = "xgbTree",
trControl = my_control,
tuneGrid = my_grid)
2- I also want to get a prediction made by averaging the predictions of the models fitted on each fold.
I know it's a tad late, but check the spelling of gamma in your grid of tuning parameters. You misspelled it as gammma (with three m's).
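A quick way to catch this kind of typo is to compare the grid's column names with the names listed in the error message. This is a small base-R sketch; the expected vector is copied from the error text in the question, and the variable names missing_cols and unexpected_cols are only illustrative.

```r
# Column names required by the error message in the question.
expected <- c("nrounds", "max_depth", "eta", "gamma",
              "colsample_bytree", "min_child_weight", "subsample")

# The grid from the question, including the misspelled column.
my_grid <- expand.grid(nrounds = 500, max_depth = 7, eta = 0.1, gammma = 1,
                       colsample_bytree = 1, min_child_weight = 100,
                       subsample = 1)

missing_cols    <- setdiff(expected, names(my_grid))  # required but absent
unexpected_cols <- setdiff(names(my_grid), expected)  # present but not required
```

With caret loaded, the expected names can also be taken from modelLookup("xgbTree")$parameter instead of being typed by hand.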