Tuning GLMNET using mlr3

mlr3 is really cool. I am trying to tune the regularisation parameter with the following search space and trafo:
searchspace_glmnet_trafo = ParamSet$new(list(
  ParamDbl$new("regr.glmnet.lambda", log(0.01), log(10))
))
searchspace_glmnet_trafo$trafo = function(x, param_set) {
  x$regr.glmnet.lambda = exp(x$regr.glmnet.lambda)
  x
}
but I get the error:
Error in glmnet::cv.glmnet(x = data, y = target, family = "gaussian", :
Need more than one value of lambda for cv.glmnet
A minimal non-working example is below. Any help is greatly appreciated.
library(mlr3verse)
data("kc_housing", package = "mlr3data")
library(anytime)
dates = anytime(kc_housing$date)
kc_housing$date = as.numeric(difftime(dates, min(dates), units = "days"))
kc_housing$zipcode = as.factor(kc_housing$zipcode)
kc_housing$renovated = as.numeric(!is.na(kc_housing$yr_renovated))
kc_housing$has_basement = as.numeric(!is.na(kc_housing$sqft_basement))
kc_housing$id = NULL
kc_housing$price = kc_housing$price / 1000
kc_housing$yr_renovated = NULL
kc_housing$sqft_basement = NULL
lrnglm = lrn("regr.glmnet")
kc_housing
tsk = TaskRegr$new("sales", kc_housing, target = "price")
fencoder = po("encode", method = "treatment",
              affect_columns = selector_type("factor"))
pipe = fencoder %>>% lrnglm
glearner = GraphLearner$new(pipe)
glearner$train(tsk)
searchspace_glmnet_trafo = ParamSet$new(list(
  ParamDbl$new("regr.glmnet.lambda", log(0.01), log(10))
))
searchspace_glmnet_trafo$trafo = function(x, param_set) {
  x$regr.glmnet.lambda = exp(x$regr.glmnet.lambda)
  x
}
inst = TuningInstance$new(
  tsk, glearner,
  rsmp("cv"), msr("regr.mse"),
  searchspace_glmnet_trafo, term("evals", n_evals = 100)
)
gsearch = tnr("grid_search", resolution = 100)
gsearch$tune(inst)

lambda needs to be a vector parameter, not a single value (as the error message tells you).
I suggest not tuning cv.glmnet at all.
This algorithm performs an internal 10-fold CV optimisation and relies on its own sequence of lambda values.
Consult the help page of the learner for more information.
You can apply your own tuning (of the parameter s, not lambda) on glmnet::glmnet(). However, this algorithm is not (yet) available for use with {mlr3}.
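As a hedged work-around sketch that follows from the point above (not part of the original answer, and it only applies if your mlr3learners version types lambda so that a numeric vector is accepted): pass the whole candidate sequence as a fixed value instead of tuning a single lambda.
# Sketch: set lambda to the full candidate sequence on the graph learner, then train.
glearner$param_set$values$regr.glmnet.lambda = exp(seq(log(0.01), log(10), length.out = 100))
glearner$train(tsk)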

Related

Error in UUIDgenerate() : Too many DLL modules. in mlr3 package

I am using the mlr3 package for autotuning ML models (an mlr3pipelines graph, to be more precise).
It is very hard to reproduce the problem because the error occurs only occasionally: the same code sometimes returns an error and sometimes doesn't.
Here is the code snippet:
learners_l = list(
  ranger = lrn("classif.ranger", predict_type = "prob", id = "ranger"),
  log_reg = lrn("classif.log_reg", predict_type = "prob", id = "log_reg")
)
# create the complete graph
graph = po("removeconstants", ratio = 0.05) %>>%
  po("branch", options = c("nop_prep", "yeojohnson", "pca", "ica"), id = "prep_branch") %>>%
  gunion(list(po("nop", id = "nop_prep"), po("yeojohnson"), po("pca", scale. = TRUE), po("ica"))) %>>%
  po("unbranch", id = "prep_unbranch") %>>%
  learners_l %>>%
  po("classifavg", innum = length(learners_l))
graph_learner = as_learner(graph)
search_space = ps(
  prep_branch.selection = p_fct(levels = c("nop_prep", "yeojohnson", "pca", "ica")),
  pca.rank. = p_int(2, 6, depends = prep_branch.selection == "pca"),
  ica.n.comp = p_int(2, 6, depends = prep_branch.selection == "ica"),
  yeojohnson.standardize = p_lgl(depends = prep_branch.selection == "yeojohnson"),
  ranger.ranger.mtry.ratio = p_dbl(0.2, 1),
  ranger.ranger.max.depth = p_int(2, 6)
)
at_classif = auto_tuner(
  method = "random_search",
  learner = graph_learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  search_space = search_space,
  term_evals = 20
)
at_classif$train(task_classif)
You can use any task you want.
The error I get is:
INFO [15:05:33.610] [bbotk] Starting to optimize 6 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=20, k=0]'
INFO [15:05:33.653] [bbotk] Evaluating 1 configuration(s)
Error in UUIDgenerate() : Too many DLL modules.
uuid has a fixed-size buffer for loading its RNG functions, which fails if too many DLLs are already loaded. A simple work-around is to run
library(uuid)
UUIDgenerate()
before loading other packages, which forces the RNG functions to be loaded early.
(Issue #12 now tracks the underlying problem, and it should be fixed in uuid 1.0-3.)
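A minimal usage sketch of that work-around (assuming a fresh R session; mlr3verse stands in for whatever packages you attach afterwards):
# Load uuid and generate one UUID first so its RNG functions are registered
# before the rest of the stack fills up the DLL table.
library(uuid)
UUIDgenerate()
library(mlr3verse)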

Using target risk or target return in R package fPortfolio

I use the R package fPortfolio for portfolio optimization of a rolling portfolio (adaptive asset allocation), for which I use the backtesting function.
I aim to construct a portfolio for a set of assets with a predefined target return (and minimized risk), or with a predefined target risk (and maximized return).
Even allowing for short selling (as proposed in another post five years ago) does not seem to work; besides, I do not want to allow short selling in my approach.
I cannot figure out why changing the values for target return or target risk does not influence the solution at all.
Where do I go wrong?
require(quantmod)
require(fPortfolio)
require(PortfolioAnalytics)
tickers= c("SPY","TLT","GLD","VEIEX","QQQ","SHY")
getSymbols(tickers)
data.raw = as.timeSeries(na.omit(cbind(Ad(SPY),Ad(TLT),Ad(GLD),Ad(VEIEX),Ad(QQQ),Ad(SHY))))
data.arith = na.omit(Return.calculate(data.raw, method="simple"))
colnames(data.arith) = c("SPY","TLT","GLD","VEIEX","QQQ","SHY")
cvarSpec <- portfolioSpec(
  model = list(
    type = "CVAR",
    optimize = "maxReturn",
    estimator = "covEstimator",
    tailRisk = list(),
    params = list(alpha = 0.05, a = 1)),
  portfolio = list(
    weights = NULL,
    targetReturn = NULL,
    targetRisk = 0.08,
    riskFreeRate = 0,
    nFrontierPoints = 50,
    status = 0),
  optim = list(
    solver = "solveRglpk.CVAR",
    objective = NULL,
    params = list(),
    control = list(),
    trace = FALSE))
backtest = portfolioBacktest()
setWindowsHorizon(backtest) = "12m"
assets <- SPY ~ SPY + TLT + GLD + VEIEX + QQQ + SHY
portConstraints ="LongOnly"
myPortfolio = portfolioBacktesting(
  formula = assets,
  data = data.arith,
  spec = cvarSpec,
  constraints = portConstraints,
  backtest = backtest,
  trace = TRUE)
setSmootherLambda(myPortfolio$backtest) <- "1m"
myPortfolioSmooth <- portfolioSmoothing(myPortfolio)
backtestPlot(myPortfolioSmooth, cex = 0.6, font = 1, family = "mono")
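As a reference sketch only (not a fix for the backtesting behaviour), fPortfolio also exposes setter generics for attaching targets to a spec; the names below (setType, setOptimize, setTargetRisk, setTargetReturn) are assumed to match the standard fPortfolio API.
# Sketch: build the spec through setters instead of a hand-written list,
# then hand it to portfolioBacktesting() as before.
spec2 <- portfolioSpec()
setType(spec2) <- "CVAR"
setOptimize(spec2) <- "maxReturn"
setTargetRisk(spec2) <- 0.08        # max-return portfolio at a fixed risk level
# setTargetReturn(spec2) <- 0.0005  # or: min-risk portfolio at a fixed return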

mlrCPO - Task conversion TOCPO

I would like to build a CPO for the mlr::makeClassificationViaRegressionWrapper. The wrapper builds regression models that predict, for the positive class, whether a particular example belongs to it (1) or not (-1). It also calculates predicted probabilities using a softmax.
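For illustration only, a tiny sketch of that target encoding and one common reading of the softmax over the score pair (f, -f); the class labels and the score value below are made up:
# Hypothetical two-class target encoded for regression: positive -> 1, negative -> -1.
y <- factor(c("M", "R", "M"), levels = c("M", "R"))
y_regr <- ifelse(y == "M", 1, -1)
# Probability of the positive class from a single regression score f.
f <- 0.3
prob_pos <- exp(f) / (exp(f) + exp(-f))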
After reading the documentation and vignettes for makeCPOTargetOp, my attempt is as follows:
cpoClassifViaRegr = makeCPOTargetOp(
  cpo.name = 'ClassifViaRegr',
  dataformat = 'task', # Not sure - will this work if input is a df with unknown target values?
  # properties.data = c('numerics', 'factors', 'ordered', 'missings'), # Is this needed?
  properties.adding = 'twoclass', # See https://mlrcpo.mlr-org.com/articles/a_4_custom_CPOs.html#task-type-and-conversion
  properties.needed = character(0),
  properties.target = c('classif', 'twoclass'),
  task.type.out = 'regr',
  predict.type.map = c(response = 'response', prob = 'response'),
  constant.invert = TRUE,
  cpo.train = function(data, target) {
    getTaskDesc(data)
  },
  cpo.retrafo = function(data, target, control) {
    cat(class(target))
    td = getTaskData(target, target.extra = TRUE)
    target.name = paste0(control$positive, ".prob")
    data = td$data
    data[[target.name]] = ifelse(td$target == control$positive, 1, -1)
    makeRegrTask(id = paste0(getTaskId(target), control$positive, '.'),
                 data = data,
                 target = target.name,
                 weights = target$weights,
                 blocking = target$blocking)
  },
  cpo.train.invert = NULL, # Since constant.invert = TRUE
  cpo.invert = function(target, control.invert, predict.type) {
    if (predict.type == 'response') {
      factor(ifelse(target > 0, control.invert$positive, control.invert$negative))
    } else {
      levs = c(control.invert$positive, control.invert$negative)
      propVectorToMatrix(vnapply(target, function(x) exp(x) / sum(exp(x))), levs)
    }
  })
It seems to work as expected: the demo below shows that the inverted prediction is identical to the prediction obtained with the makeClassificationViaRegressionWrapper:
lrn = makeLearner("regr.lm")
# Wrapper -----------------------------------------------------------------
lrn2 = makeClassificationViaRegressionWrapper(lrn)
model = train(lrn2, sonar.task, subset = 1:140)
predictions = predict(model, newdata = getTaskData(sonar.task)[141:208, 1:60])
# CPO ---------------------------------------------------------------------
sonar.train = subsetTask(sonar.task, 1:140)
sonar.test = subsetTask(sonar.task, 141:208)
trafd = sonar.train %>>% cpoClassifViaRegr()
mod = train(lrn, trafd)
retr = sonar.test %>>% retrafo(trafd)
pred = predict(mod, retr)
invpred = invert(inverter(retr), pred)
identical(predictions$data$response, invpred$data$response)
The problem is that after the CPO has converted the task from twoclass to regr, there is no way for me to specify predict.type = 'prob'. In the case of the wrapper, the properties of the base regr learner are modified to accept predict.type = 'prob' (see here). But the CPO is unable to modify the learner in this way, so how can I tell my model to return predicted probabilities instead of the predicted response?
I was thinking I could specify an include.prob parameter, i.e. cpoClassifViaRegr(include.prob = TRUE). If set to TRUE, cpo.invert would return the predicted probabilities in addition to the predicted response. Would something like this work?
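For comparison, the wrapper route from the demo above does expose probabilities directly; a usage sketch, assuming the same sonar.task split as in the demo:
# Ask the wrapped learner for probabilities instead of responses.
lrn2p = setPredictType(makeClassificationViaRegressionWrapper(makeLearner("regr.lm")), "prob")
modp = train(lrn2p, sonar.task, subset = 1:140)
predict(modp, newdata = getTaskData(sonar.task)[141:208, 1:60])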

error : argument "x" is missing, with no default?

As I'm very new to XGBoost, I am trying to tune its parameters using the mlr library. However, after using setHyperPars(), training with train() throws an error (specifically when I run the xgmodel line): Error in colnames(x) : argument "x" is missing, with no default. I can't work out what this error means; the code is below:
library(mlr)
library(dplyr)
library(caret)
library(xgboost)
set.seed(12345)
n=dim(mydata)[1]
id=sample(1:n, floor(n*0.6))
train=mydata[id,]
test=mydata[-id,]
traintask = makeClassifTask (data = train,target = "label")
testtask = makeClassifTask (data = test,target = "label")
#create learner
lrn = makeLearner("classif.xgboost",
                  predict.type = "response")
lrn$par.vals = list(objective = "multi:softprob",
                    eval_metric = "merror")
#set parameter space
params = makeParamSet(makeIntegerParam("max_depth", lower = 3L, upper = 10L),
                      makeIntegerParam("nrounds", lower = 20L, upper = 100L),
                      makeNumericParam("eta", lower = 0.1, upper = 0.3),
                      makeNumericParam("min_child_weight", lower = 1L, upper = 10L),
                      makeNumericParam("subsample", lower = 0.5, upper = 1),
                      makeNumericParam("colsample_bytree", lower = 0.5, upper = 1))
#set resampling strategy
configureMlr(show.learner.output = FALSE, show.info = FALSE)
rdesc = makeResampleDesc("CV",stratify = T,iters=5L)
# set the search optimization strategy
ctrl = makeTuneControlRandom(maxit = 10L)
# parameter tuning
set.seed(12345)
mytune = tuneParams(learner = lrn, task = traintask,
                    resampling = rdesc, measures = acc,
                    par.set = params, control = ctrl,
                    show.info = FALSE)
# build model using the tuned parameters
#set hyperparameters
lrn_tune = setHyperPars(lrn,par.vals = mytune$x)
#train model
xgmodel = train(learner = lrn_tune,task = traintask)
Could anyone tell me what's wrong?
You have to be very careful when loading multiple packages that may involve methods with the same name - here caret and mlr, which both include a train method. Moreover, the order of the library statements is significant: here, as caret is loaded after mlr, it masks functions with the same name from it (and possibly every other package loaded previously), like train.
In your case, where you obviously want to use the train method from mlr (and not from caret), you should declare this explicitly in your code:
xgmodel = mlr::train(learner = lrn_tune,task = traintask)
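As an optional, hedged diagnostic (not part of the original answer), base R can show which package's train() currently wins on the search path:
# find() lists the search-path entries defining train(); the first one masks the rest.
find("train")
environment(train)  # namespace of the function that is actually called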

Tuning parms in rpart with MLR package?

I am trying to use the mlr package to tune the hyper-parameters of a decision tree built with the rpart package. While I can tune the basic parameters of the decision tree (e.g. minsplit, maxdepth and so on), I am not able to properly set the values of the parameter parms. Specifically, I would like to try different priors in the grid search.
Here is the code I have written (dat is the data frame I am using, and target is my class variable):
# Create a task
dat.task = makeClassifTask(id = "tree", data = dat, target = "target")
# Define the model
resamp = makeResampleDesc("CV", iters = 4L)
# Create the learner
lrn = makeLearner("classif.rpart")
# Create the grid params
control.grid = makeTuneControlGrid()
ps = makeParamSet(
  makeDiscreteParam("cp", values = seq(0.001, 0.006, 0.002)),
  makeDiscreteParam("minsplit", values = c(1, 5, 10, 50)),
  makeDiscreteParam("maxdepth", values = c(20, 30, 50)),
  makeDiscreteParam("parms", values = list(prior = list(c(.6, .4),
                                                        c(.5, .5))))
)
When I try to execute the tuning, with:
# Actual tuning, with accuracy as evaluation metric
tuned = tuneParams(lrn, task = dat.task,
                   resampling = resamp,
                   control = control.grid,
                   par.set = ps, measures = acc)
I get the error
Error in get(paste("rpart", method, sep = "."), envir = environment())(Y, : The parms list must have names
I also tried to define parms as an UntypedParam with
makeUntypedParam("parms", special.vals = list(prior=list(c(.6, .4), c(.5,.5))))
I tried this because getParamSet("classif.rpart") suggests that parms is an "untyped" parameter rather than a discrete one.
However, when I try this, I get the error:
Error in makeOptPath(par.set, y.names, minimize, add.transformed.x, include.error.message, :
OptPath can currently only be used for: numeric,integer,numericvector,integervector,logical,logicalvector,discrete,discretevector,character,charactervector
Can anybody help?
You have to define the parameter "parms" like this:
makeDiscreteParam("parms", values = list(a = list(prior = c(.6, .4)), b = list(prior = c(.5, .5))))
a and b can be arbitrary names; they simply label what the corresponding value contains.
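Putting it together, a minimal sketch of the corrected parameter set (the names prior_60_40 and prior_50_50 are arbitrary labels, as noted above):
ps = makeParamSet(
  makeDiscreteParam("cp", values = seq(0.001, 0.006, 0.002)),
  makeDiscreteParam("minsplit", values = c(1, 5, 10, 50)),
  makeDiscreteParam("maxdepth", values = c(20, 30, 50)),
  makeDiscreteParam("parms", values = list(
    prior_60_40 = list(prior = c(.6, .4)),
    prior_50_50 = list(prior = c(.5, .5))
  ))
)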
