How to use mlrMBO with mlr for hyperparameter optimisation and tuning

I'm trying to train ML algorithms (random forest, adaboost, xgboost) in R on a dataset with a multiclass classification target. For hyperparameter tuning I use the mlr package.
The goal of the code below is to tune the parameters mtry and nodesize while keeping ntree constant at 128 (with mlrMBO). However, I get the error message below. How can I define this correctly?
rdesc <- makeResampleDesc("CV", stratify = TRUE, iters = 10L)

traintask <- makeClassifTask(data = df_train,
                             target = "more_than_X_perc_damage")
testtask  <- makeClassifTask(data = df_test,
                             target = "more_than_X_perc_damage")

lrn <- makeLearner("classif.randomForest",
                   predict.type = "prob")

# parameter space
params_to_tune <- makeParamSet(
  makeIntegerParam("ntree", lower = 128, upper = 128),
  makeNumericParam("mtry", lower = 0, upper = 1,
                   trafo = function(x) ceiling(x * ncol(train_x))),
  makeNumericParam("nodesize", lower = 0, upper = 1,
                   trafo = function(x) ceiling(nrow(train_x)^x))
)

ctrl = makeTuneControlMBO(mbo.control = mlrMBO::makeMBOControl())

tuned_params <- tuneParams(learner = lrn,
                           task = traintask,
                           control = ctrl,
                           par.set = params_to_tune,
                           resampling = rdesc,
                           measures = acc)

rf_tuned_learner <- setHyperPars(learner = lrn,
                                 par.vals = tuned_params$x)
rf_tuned_model <- mlr::train(rf_tuned_learner, traintask)

# prediction performance
pred <- predict(rf_tuned_model, testtask)
performance(pred)
calculateConfusionMatrix(pred)

stats <- caret::confusionMatrix(pred$data$response, pred$data$truth)
acc_rf_tune <- stats$overall[1]  # accuracy
print(acc_rf_tune)
Error in (function (fn, nvars, max = FALSE, pop.size = 1000, max.generations = 100, :
Domains[,1] must be less than or equal to Domains[,2]
Thanks in advance!

You can do this by not including the hyperparameter you want to keep constant in the ParamSet and instead setting it to the value you want when creating the learner.
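A minimal sketch of that approach, reusing the objects from the question (traintask, rdesc, train_x): fix ntree = 128 via par.vals on the learner and drop it from the ParamSet.

# fix ntree at 128 on the learner instead of tuning it
lrn <- makeLearner("classif.randomForest",
                   predict.type = "prob",
                   par.vals = list(ntree = 128L))

# search space without ntree
params_to_tune <- makeParamSet(
  makeNumericParam("mtry", lower = 0, upper = 1,
                   trafo = function(x) ceiling(x * ncol(train_x))),
  makeNumericParam("nodesize", lower = 0, upper = 1,
                   trafo = function(x) ceiling(nrow(train_x)^x))
)

ctrl = makeTuneControlMBO(mbo.control = mlrMBO::makeMBOControl())
tuned_params <- tuneParams(learner = lrn,
                           task = traintask,
                           control = ctrl,
                           par.set = params_to_tune,
                           resampling = rdesc,
                           measures = acc)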

Related

Benchmarking multiple AutoTuning instances

I have been trying to use mlr3 to do some hyperparameter tuning for xgboost. I want to compare three different models:
xgboost tuned over just the alpha hyperparameter
xgboost tuned over alpha and lambda hyperparameters
xgboost tuned over alpha, lambda, and max_depth hyperparameters.
After reading the mlr3 book, I thought that using AutoTuner for the nested resampling and benchmarking would be the best way to go about doing this. Here is what I have tried:
task_mpcr <- TaskRegr$new(id = "mpcr", backend = data.numeric, target = "n_reads")
measure <- msr("poisson_loss")
xgb_learn <- lrn("regr.xgboost")

set.seed(103)
fivefold.cv = rsmp("cv", folds = 5)

param.list <- list(
  alpha     = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
  lambda    = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
  max_depth = p_int(lower = 2, upper = 10)
)

model.list <- list()
for (model.i in 1:length(param.list)) {
  param.list.subset <- param.list[1:model.i]
  search_space <- do.call(ps, param.list.subset)
  model.list[[model.i]] <- AutoTuner$new(
    learner = xgb_learn,
    resampling = fivefold.cv,
    measure = measure,
    search_space = search_space,
    terminator = trm("none"),
    tuner = tnr("grid_search", resolution = 10),
    store_tuning_instance = TRUE
  )
}

grid <- benchmark_grid(
  task = task_mpcr,
  learner = model.list,
  resampling = rsmp("cv", folds = 3)
)
bmr <- benchmark(grid, store_models = TRUE)
Note that I added Poisson loss as a measure for the count data I am working with.
For some reason after running the benchmark function, the Poisson loss of all my models is nearly identical per fold, making me think that no tuning was done.
I also cannot find a way to access the hyperparameters used to get the lowest loss per train/test iteration.
Am I misusing the benchmark function entirely?
Also, this is my first question on SO, so any formatting advice would be appreciated!
To see whether tuning has an effect, you can just add an untuned learner to the benchmark. Otherwise, the conclusion could be that tuning alpha is sufficient for your example.
I adapted the code so that it runs with an example task.
library(mlr3verse)

task <- tsk("mtcars")
measure <- msr("regr.rmse")
xgb_learn <- lrn("regr.xgboost")

param.list <- list(
  alpha  = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
  lambda = p_dbl(lower = 0.001, upper = 100, logscale = TRUE)
)

model.list <- list()
for (model.i in 1:length(param.list)) {
  param.list.subset <- param.list[1:model.i]
  search_space <- do.call(ps, param.list.subset)
  at <- AutoTuner$new(
    learner = xgb_learn,
    resampling = rsmp("cv", folds = 5),
    measure = measure,
    search_space = search_space,
    terminator = trm("none"),
    tuner = tnr("grid_search", resolution = 5),
    store_tuning_instance = TRUE
  )
  at$id = paste0(at$id, model.i)  # give each AutoTuner a unique id
  model.list[[model.i]] <- at
}

model.list <- c(model.list, list(xgb_learn))  # add the untuned baseline learner

grid <- benchmark_grid(
  task = task,
  learner = model.list,
  resampling = rsmp("cv", folds = 3)
)
bmr <- benchmark(grid, store_models = TRUE)
autoplot(bmr)

bmr_data = bmr$data$as_data_table()  # convert the benchmark result to a handy data.table
bmr_data$learner[[1]]$learner$param_set$values  # the final learner used by the AutoTuner is nested in $learner

# best configuration found during grid search
bmr_data$learner[[1]]$archive$best()
# transformed values (the ones actually passed to the learner)
bmr_data$learner[[1]]$archive$best()$x_domain
The last lines show how to access the individual runs of the benchmark. In my example there are 9 runs, resulting from 3 learners and 3 outer resampling folds.
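If you want the best configuration of every run at once (the hyperparameters-per-iteration part of the question), a small sketch assuming the bmr_data object from above; the inherits() check skips the untuned baseline learner, which has no tuning archive:

# collect the best (transformed) hyperparameters of each benchmark run
lapply(bmr_data$learner, function(l) {
  if (inherits(l, "AutoTuner")) l$archive$best()$x_domain else NULL
})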

Error with SVM hyperparameter tuning in mlrMBO Bayesian optimization

I am trying to optimize an SVM for a classification task; this process has worked for many other models I've tried it on. Yet when I use an SVM in my model-based optimization function, it returns an error: "Error in checkStuff(fun, design, learner, control) : Provided learner does not support factor parameters."
The relevant code is attached below. In my training task all independent variables are numeric; the only factor is my outcome of interest.
library(mlr)
library(mlrMBO)
library(dplyr)
library(PRROC)
library(ggplot2)
library(DiceKriging)
library(parallelMap)  # for parallelStartMulticore()

traindf <- read.csv("/Users/njr/Google Drive/HMS IR Research/NSQIP Research/Endovascular/randomtraining.csv")
testdf  <- read.csv("/Users/njr/Google Drive/HMS IR Research/NSQIP Research/Endovascular/randomtesting.csv")

traindf$Amputation <- as.factor(traindf$Amputation)
testdf$Amputation  <- as.factor(testdf$Amputation)

trn.task  = makeClassifTask(data = traindf, target = "Amputation", positive = "2")
test.task = makeClassifTask(data = testdf,  target = "Amputation", positive = "2")

set.seed(9)
svmlrn = makeLearner("classif.svm", predict.type = "prob")
svm_model <- mlr::train(svmlrn, task = trn.task)

res = makeResampleDesc("CV", iters = 10, stratify = TRUE)

par5 = makeParamSet(
  makeDiscreteParam("kernel", values = c("radial", "polynomial", "linear")),
  makeNumericParam("cost", -15, 15, trafo = function(x) 2^x),
  makeNumericParam("gamma", -15, 15, trafo = function(x) 2^x,
                   requires = quote(kernel == "radial")),
  makeIntegerParam("degree", lower = 1, upper = 4,
                   requires = quote(kernel == "polynomial"))
)

mbo.ctrl = makeMBOControl()
mbo.ctrl = setMBOControlInfill(mbo.ctrl, crit = crit.ei)
mbo.ctrl = setMBOControlTermination(mbo.ctrl, iters = 35, max.evals = 25)

design.mat = generateRandomDesign(n = 50, par.set = par5)
surrogate.lrn = makeLearner("regr.km", predict.type = "se")
ctrl = mlr::makeTuneControlMBO(learner = surrogate.lrn, mbo.control = mbo.ctrl,
                               mbo.design = design.mat)

parallelStartMulticore(cpus = 8L)
res.mbo = tuneParams(makeLearner("classif.svm"), trn.task, resampling = res,
                     par.set = par5, control = ctrl,
                     show.info = TRUE, measures = auc)
parallelStop()
This is the traceback:
6. stop("Provided learner does not support factor parameters.")
5. checkStuff(fun, design, learner, control)
4. initOptProblem(fun = fun, design = design, learner = learner, control = control, show.info = show.info, more.args = more.args)
3. mlrMBO::mbo(tff, design = control$mbo.design, learner = control$learner, control = mbo.control, show.info = FALSE)
2. sel.func(learner, task, resampling, measures, par.set, control, opt.path, show.info, resample.fun)
1. tuneParams(makeLearner("classif.svm"), trn.task, resampling = res, par.set = par5, control = ctrl, show.info = TRUE, measures = auc)
The problem is that your parameter set has a categorical parameter (kernel), and the surrogate model you're using (regr.km, a Kriging/Gaussian process model) doesn't support factor parameters. You could, for example, use a random forest as the surrogate model instead.
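A minimal sketch of that swap, leaving the rest of your setup unchanged (regr.randomForest supports standard-error prediction in mlr, which mlrMBO needs for the expected-improvement criterion):

# a random forest surrogate handles the discrete "kernel" parameter,
# unlike the Kriging model regr.km
surrogate.lrn = makeLearner("regr.randomForest", predict.type = "se")
ctrl = mlr::makeTuneControlMBO(learner = surrogate.lrn,
                               mbo.control = mbo.ctrl,
                               mbo.design = design.mat)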

Visualising classif.rpart tree from mlr package in r

How can I visualize the tree model generated by mlr's classif.rpart, or is it possible to print the tree rules?
makeatree <- makeLearner("classif.rpart", predict.type = "response")
set_cv <- makeResampleDesc("CV", iters = 3L)
gs <- makeParamSet(makeIntegerParam("minsplit", lower = 10, upper = 50),
                   makeIntegerParam("minbucket", lower = 5, upper = 50),
                   makeNumericParam("cp", lower = 0.001, upper = 0.2))
gscontrol <- makeTuneControlGrid()
stune <- tuneParams(learner = makeatree, resampling = set_cv, task = trainTask,
                    par.set = gs, control = gscontrol, measures = acc)
t.tree <- setHyperPars(makeatree, par.vals = stune$x)
t.rpart <- train(t.tree, trainTask)    # train the tuned learner first
tpmodel <- predict(t.rpart, testTask)  # then predict on the test task
Visualizing this model with rpart.plot() does not work, because it is not an rpart object. Thanks in advance.
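One way to get at the underlying rpart fit is mlr's getLearnerModel(), which unwraps it from the WrappedModel returned by train(); a minimal sketch using the rpart.plot package:

library(rpart.plot)

# extract the underlying rpart object from the mlr WrappedModel
rpart_fit <- getLearnerModel(t.rpart)

rpart.plot(rpart_fit)   # visualize the tree
rpart.rules(rpart_fit)  # print the split rules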

auc in mlr benchmark experiment for classification problem gives error (requires predict type to be: 'prob')

I am conducting a benchmark analysis using the mlr package and would like to use auc as my performance measure. I have specified predict.type = "prob" and am still getting the following error message:
0001: Error in FUN(X[[i]], ...) :
Measure auc requires predict type to be: 'prob'!
My code:
# define measures
meas <- list(acc, mlr::auc, brier)

## random forest
p_length <- ncol(training_complete) - 1
lrn_RF = makeLearner("classif.randomForest", predict.type = "prob",
                     par.vals = list("ntree" = 500L))
wcw_lrn_RF = makeWeightedClassesWrapper(lrn_RF, wcw.weight = 0.10)  # weighted class wrapper

parsRF = makeParamSet(
  makeIntegerParam("mtry", lower = 1, upper = floor(0.4 * p_length)),
  makeIntegerParam("nodesize", lower = 10, upper = 50)
)
tuneRF = makeTuneControlRandom(maxit = 100)
inner = makeResampleDesc("CV", iters = 2)
learnerRF = makeTuneWrapper(lrn_RF, resampling = inner, measures = meas,
                            par.set = parsRF, control = tuneRF, show.info = FALSE)

## extreme gradient boosting
lrn_xgboost <- makeLearner(
  "classif.xgboost",
  predict.type = "prob",  # was "response" before
  par.vals = list(objective = "binary:logistic", eval_metric = "error", nrounds = 200)
)
getParamSet("classif.xgboost")

pars_xgboost = makeParamSet(
  makeIntegerParam("nrounds", lower = 100, upper = 500),
  makeIntegerParam("max_depth", lower = 1, upper = 10),
  makeNumericParam("eta", lower = .1, upper = .5),
  makeNumericParam("lambda", lower = -1, upper = 0, trafo = function(x) 10^x)
)
tunexgboost = makeTuneControlRandom(maxit = 50)
inner = makeResampleDesc("CV", iters = 2)
learnerxgboost = makeTuneWrapper(lrn_xgboost, resampling = inner, measures = meas,
                                 par.set = pars_xgboost, control = tunexgboost,
                                 show.info = FALSE)

## benchmarking via the outer resampling loop
# learners to be compared
lrns = list(
  makeLearner("classif.featureless"),
  learnerRF,
  learnerxgboost
)

# outer resampling strategy
rdesc = makeResampleDesc("CV", iters = 5)

library(methods)
library(parallel)
library(parallelMap)
set.seed(123, "L'Ecuyer")

parallelStartSocket(parallel::detectCores(), level = "mlr.resample")
churn_benchmarking <- benchmark(learners = lrns,
                                tasks = trainTask,
                                resamplings = rdesc,
                                models = FALSE,
                                measures = meas)
parallelStop()
Any hint is highly appreciated!
I can see one problem. Your featureless learner is not providing probabilities.
Write makeLearner("classif.featureless", predict.type = "prob") instead.
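Applied to the benchmark above, the learner list then becomes (everything else stays the same):

# the featureless baseline must also return probabilities so that auc can be computed
lrns = list(
  makeLearner("classif.featureless", predict.type = "prob"),
  learnerRF,
  learnerxgboost
)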

Combining getOOBPreds with nested resampling and parameter tuning

In the R package mlr, I read in the tutorial that with the getOOBPreds function I can access the out-of-bag predictions of, say, a random forest model, but I cannot figure out how to use this in a nested resampling procedure designed to tune hyperparameters.
I understand that the inner loop should somehow use the out-of-bag predictions instead of an extra resampling, but I cannot see how to specify this.
Thanks for sharing insights / hints!
I tried as inner loop:
makeTuneWrapper(lrnr,
                resampling = "oob",
                par.set = params,
                control = ctrl,
                show.info = TRUE,
                measures = list(logloss, multiclass.brier, timetrain))
... but the value "oob" for the resampling parameter is not valid.
tentative MRE:
library(mlr)

# task
tsk = iris.task

# learner
lrnr <- makeLearner("classif.randomForestSRC", predict.type = "prob")

# hyperparameters
params <- makeParamSet(
  makeIntegerParam("mtry", lower = 2, upper = 10),
  makeIntegerParam("nodesize", lower = 1, upper = 100),
  makeIntegerParam("nsplit", lower = 1, upper = 20)
)

# validation strategy
rdesc_inner_oob <- makeResampleDesc("oob")  # FAILS
ctrl <- makeTuneControlRandom(maxit = 10L)

tuning_lrnr = makeTuneWrapper(lrnr,
                              # resampling = oob,  # ALSO WRONG
                              resampling = rdesc_inner_oob,
                              par.set = params,
                              control = ctrl,
                              measures = list(logloss, multiclass.brier, timetrain))

outer = makeResampleDesc("CV", iters = 3)
r = resample(learner = tuning_lrnr,
             task = tsk,
             resampling = outer,
             extract = getOOBPreds,
             show.info = TRUE,
             measures = list(multiclass.brier))
