r caret: train ONE model once the hyper-parameters are already known

I am using caret to train a ridge regression:
library(ISLR)
Hitters = na.omit(Hitters)
x = model.matrix(Salary ~ ., Hitters)[, -1] #Dropping the intercept column.
y = Hitters$Salary
set.seed(0)
train = sample(1:nrow(x), 7*nrow(x)/10)
library(caret)
set.seed(0)
# Values of lambda over which to check:
grid = 10 ^ seq(5, -2, length = 100)
train_control = trainControl(method = 'cv', number = 10)
tune.grid = expand.grid(lambda = grid, alpha = 0)
ridge.caret = train(x[train, ], y[train],
                    method = 'glmnet',
                    trControl = train_control,
                    tuneGrid = tune.grid)
ridge.caret$bestTune
# alpha is 0 and best lambda is 242.0128
So, I found my optimal lambda and alpha; their exact values are not really important for my question.
Now, how could I run just ONE ridge regression (using caret) with alpha = 0 and lambda = 242.0128 for the whole data set?
I discovered that I can specify the trainControl method as 'none'. See the code below. Did I correctly specify the tuneGrid (with just one row)? Is this how it should be done?
Thank you very much!
set.seed(12345)
ridge_full <- train(x, y,
                    method = 'glmnet',
                    trControl = trainControl(method = 'none'),
                    tuneGrid = expand.grid(lambda = ridge.caret$bestTune$lambda, alpha = 0))
coef(ridge_full$finalModel, s = ridge_full$bestTune$lambda)
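For reference, a minimal sanity check (assuming the glmnet package is installed): fit glmnet directly at the same alpha and lambda and compare coefficients side by side. Expect small differences, because caret interpolates coefficients when the exact lambda is not on the fitted path (see the coefficients question further down).
library(glmnet)
# Sanity check: compare the single caret fit above against a direct glmnet fit.
ridge_glmnet <- glmnet(x, y, alpha = 0, lambda = ridge.caret$bestTune$lambda)
cbind(caret  = as.numeric(coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)),
      glmnet = as.numeric(coef(ridge_glmnet)))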

Related

preprocessing (center and scale) only specific variables (numeric variables)

I have a dataframe that consists of numerical and non-numerical variables. I am trying to fit a logistic regression model predicting my variable "risk" based on all other variables, optimizing AUC using 6-fold cross-validation.
However, I want to center and scale all numerical explanatory variables. My code raises no errors or warnings, but I fail to figure out how to tell train() through preProcess (or in some other way) to center and scale just my numerical variables.
Here is the code:
test <- train(risk ~ .,
              method = "glm",
              data = df,
              family = binomial(link = "logit"),
              preProcess = c("center", "scale"),
              trControl = trainControl(method = "cv",
                                       number = 6,
                                       classProbs = TRUE,
                                       summaryFunction = prSummary),
              metric = "AUC")
You could try preprocessing all numerical variables in the original df first and then applying the train function to the scaled df:
library(dplyr)
library(caret)
df <- df %>%
  dplyr::mutate_if(is.numeric, scale)

test <- train(risk ~ .,
              method = "glm",
              data = df,
              family = binomial(link = "logit"),
              trControl = trainControl(method = "cv",
                                       number = 6,
                                       classProbs = TRUE,
                                       summaryFunction = prSummary),
              metric = "AUC")

Caret obtain train & cv predictions from model to plot

I've trained a simple model:
mySim <- train(Event ~ .,
               method = 'nnet',
               data = train,
               tuneGrid = tg)
optimising the two nnet parameters, decay (weight decay) and size (of the hidden layer). I'm new to caret, so what I would usually do is plot the train error and CV error for each model built. To do this, I'd need the predicted values from my train and validation passes.
This is the first time I've used cross-validation, so I'm a little unsure how to go about getting the predictions from the train and hold-out sets at each tuneGrid iteration.
If I have a grid search of length 3 (3 models to build) and 5-fold cross-validation, I assume I'm going to have 15 sets of train & hold-out predictions in total, 5 for each model.
The plot I'm essentially looking to build has a performance metric on the y-axis (say, cross-entropy loss, for the sake of classification with nnet) and the size grid-search values on the x-axis, increasing from 0 to the maximum.
Is there a way I can extract the predicted values for the train / hold-out sets during trainControl cross-validation?
I've looked through some of the attributes train returns, but I'm not sure if I'm missing something.
I know this question lacks code, but hopefully I've explained myself.
Update
Am I correct in assuming that setting the following parameters in trainControl will return the predictions, allowing me to create this plot?
returnResamp
savePredictions
caret::train keeps only the hold-out predictions. If you specify savePredictions = "all", it will save the hold-out predictions for all hyper-parameter combinations; however, it does not save the train-set predictions. You could generate them afterwards, with the knowledge of which indexes were used for the hold-outs; this info is in the model$pred slot of the object returned by train. The mlr package has an option to keep both hold-out and train predictions and metrics.
Here is an example of how to perform the requested operation with the mlr library:
library(mlr)
library(mlbench) #for the data set
I will use the Sonar data set:
data(Sonar)
create a task:
task <- makeClassifTask(data = Sonar, target = "Class")
create a learner:
lrn <- makeLearner("classif.nnet", predict.type = "prob")
get all tune-able parameters for a learner:
getParamSet("classif.nnet")
set which ones you would like to tune and the range:
ps <- makeParamSet(
  makeIntegerParam("size", lower = 3, upper = 5),
  makeNumericParam("decay", lower = 0.1, upper = 0.2))
define resampling:
cross_val <- makeResampleDesc("RepCV",
                              reps = 2, folds = 5, stratify = TRUE, predict = "both")
how the search will be performed (grid in this case):
ctrl <- mlr::makeTuneControlGrid(resolution = 4L)
get everything together:
res.mbo <- tuneParams(lrn, task, cross_val, par.set = ps, control = ctrl,
                      show.info = FALSE,
                      measures = list(auc,
                                      setAggregation(auc, test.sd),
                                      setAggregation(auc, train.mean),
                                      setAggregation(auc, train.sd)))
You can define many measures in a list (the first one is used to select hyper-parameters; all the others are just reported).
extract the results:
res <- mlr::generateHyperParsEffectData(res.mbo)$data
plot:
library(tidyverse)
res %>%
  gather(key, value, c(3, 5)) %>%
  mutate(key = as.factor(key)) %>%
  ggplot() +
  geom_point(aes(x = size, y = value, color = key)) +
  geom_smooth(aes(x = size, y = value, color = key)) +
  facet_wrap(~decay)
You will get a bunch of warnings from geom_smooth, since there are only 3 points per fit.
And here is an example of how to do it in caret, on just the hold-out samples:
library(caret)
create a tune control:
ctrl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 2,
  classProbs = TRUE,
  savePredictions = "all",
  returnResamp = "all",
  summaryFunction = twoClassSummary
)
create a grid of hyper-parameters:
grid <- expand.grid(size = c(4, 5, 6),
                    decay = seq(from = 0.1, to = 0.2, length.out = 4))
tune:
fit <- caret::train(Sonar[, 1:60], Sonar$Class,
                    method = 'nnet',
                    tuneGrid = grid,
                    metric = 'ROC',
                    trControl = ctrl)
plot:
fit$results %>%
  ggplot() +
  geom_point(aes(x = size, y = ROC)) +
  geom_smooth(aes(x = size, y = ROC)) +
  facet_wrap(~decay)
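As a minimal sketch on top of this fit (assuming Sonar's class labels M and R, and the column names caret uses for saved predictions), you can also compute a hold-out metric per hyper-parameter combination and resample directly from fit$pred:
library(dplyr)
library(pROC)
# fit$pred holds one row per hold-out observation, with the hyper-parameter
# columns (size, decay), the resample id, and the class probabilities.
fit$pred %>%
  group_by(size, decay, Resample) %>%
  summarise(auc = as.numeric(roc(obs, M, levels = c("M", "R"))$auc),
            .groups = "drop")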

Plotting ROC curve from two different algorithms using lift in caret

I have two models, trained like the following:
library(mlbench)
data(Sonar)
library(caret)
set.seed(998)
my_data <- Sonar
fitControl <-
  trainControl(
    method = "boot632",
    number = 10,
    classProbs = TRUE,
    savePredictions = "final",
    summaryFunction = twoClassSummary
  )
modelxgb <- train(
  Class ~ .,
  data = my_data,
  method = "xgbTree",
  trControl = fitControl,
  metric = "ROC"
)
modelsvm <- train(
  Class ~ .,
  data = my_data,
  method = "svmLinear2",
  trControl = fitControl,
  metric = "ROC"
)
I want to plot the ROC curves for both models on one ggplot.
I am doing the following to generate the points for the curve:
for_lift_xgb = data.frame(Class = modelxgb$pred$obs, xgbTree = modelxgb$pred$R)
for_lift_svm = data.frame(Class = modelsvm$pred$obs, svmLinear2 = modelsvm$pred$R)
lift_obj_xgb = lift(Class ~ xgbTree, data = for_lift_xgb, class = "R")
lift_obj_svm = lift(Class ~ svmLinear2, data = for_lift_svm, class = "R")
What would be the easiest way to plot both of these curves on a single plot, in different colors? I would also like to annotate the plot with the individual AUC values.
After building the models you can combine the predictions in a single data frame:
for_lift = data.frame(Class = modelxgb$pred$obs,
                      xgbTree = modelxgb$pred$R,
                      svmLinear2 = modelsvm$pred$R)
use it to build the lift object using the following:
lift = lift(Class ~ xgbTree + svmLinear2, data = for_lift, class = "R")
and plot with ggplot:
library(ggplot2)
ggplot(lift$data) +
  geom_line(aes(1 - Sp, Sn, color = liftModelVar)) +
  scale_color_discrete(guide = guide_legend(title = "method"))
You can combine and compare many models this way.
To add the AUC values to the plot, you can create a data frame with the model names, the corresponding AUC values, and the coordinates for plotting:
auc_ano <- data.frame(model = c("xgbTree", "svmLinear2"),
                      auc = c(pROC::roc(response = for_lift$Class,
                                        predictor = for_lift$xgbTree,
                                        levels = c("M", "R"))$auc,
                              pROC::roc(response = for_lift$Class,
                                        predictor = for_lift$svmLinear2,
                                        levels = c("M", "R"))$auc),
                      y = c(0.95, 0.9))
auc_ano
#output
       model       auc    y
1    xgbTree 0.9000756 0.95
2 svmLinear2 0.5041086 0.90
and pass it to geom_text:
ggplot(lift$data) +
  geom_line(aes(1 - Sp, Sn, color = liftModelVar)) +
  scale_color_discrete(guide = guide_legend(title = "method")) +
  geom_text(data = auc_ano, aes(label = round(auc, 4), color = model, y = y), x = 0.1)
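As an alternative sketch (assuming the pROC package is available), you can draw the same comparison with pROC::ggroc, which accepts a named list of roc objects:
library(pROC)
# Build one roc object per model and plot them together; the list names
# become the legend labels.
roc_list <- list(
  xgbTree    = roc(for_lift$Class, for_lift$xgbTree, levels = c("M", "R")),
  svmLinear2 = roc(for_lift$Class, for_lift$svmLinear2, levels = c("M", "R")))
ggroc(roc_list) +
  scale_color_discrete(guide = guide_legend(title = "method"))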

r: coefficients from glmnet and caret are different for the same lambda

I've read a few Q&As about this, but I am still not sure I understand why the coefficients from glmnet and caret models based on the same sample and the same hyper-parameters are slightly different. I would greatly appreciate an explanation!
I am using caret to train a ridge regression:
library(ISLR)
Hitters = na.omit(Hitters)
x = model.matrix(Salary ~ ., Hitters)[, -1] #Dropping the intercept column.
y = Hitters$Salary
set.seed(0)
train = sample(1:nrow(x), 7*nrow(x)/10)
library(caret)
set.seed(0)
train_control = trainControl(method = 'cv', number = 10)
grid = 10 ^ seq(5, -2, length = 100)
tune.grid = expand.grid(lambda = grid, alpha = 0)
ridge.caret = train(x[train, ], y[train],
                    method = 'glmnet',
                    trControl = train_control,
                    tuneGrid = tune.grid)
ridge.caret$bestTune
# alpha is 0 and best lambda is 242.0128
Now, I use the lambda (and alpha) found above to train a ridge regression on the whole data set. At the end, I extract the coefficients:
ridge_full <- train(x, y,
                    method = 'glmnet',
                    trControl = trainControl(method = 'none'),
                    tuneGrid = expand.grid(lambda = ridge.caret$bestTune$lambda,
                                           alpha = 0))
coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)
Finally, using exactly the same alpha and lambda, I try to fit the same ridge regression with the glmnet package and extract the coefficients:
library(glmnet)
ridge_full2 = glmnet(x, y, alpha = 0, lambda = ridge.caret$bestTune$lambda)
coef(ridge_full2)
The reason is that the exact lambda you specified was not used by caret. You can check this by:
ridge_full$finalModel$lambda
The closest values are 261.28915 and 238.07694.
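A quick sketch to locate them:
# The two lambda values on the fitted path closest to the requested s.
lam <- ridge_full$finalModel$lambda
lam[order(abs(lam - ridge.caret$bestTune$lambda))[1:2]]
# 238.07694 261.28915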
When you do
coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)
where s is 242.0128, the coefficients are interpolated from the coefficients that were actually calculated.
Whereas when you provide a lambda directly to the glmnet call, the model returns exact coefficients for that lambda, which differ only slightly from the interpolated ones caret returns.
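If you want exact rather than interpolated coefficients from caret's final model, a minimal sketch using glmnet's exact = TRUE option (which refits at the requested s and, in recent glmnet versions, requires resupplying x and y):
# Refit at the exact lambda instead of interpolating along the stored path.
coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda,
     exact = TRUE, x = x, y = y)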
Why this happens:
when you specify one alpha and one lambda for a fit on all of the data, caret will actually fit:
fit = function(x, y, wts, param, lev, last, classProbs, ...) {
  numLev <- if(is.character(y) | is.factor(y)) length(levels(y)) else NA
  theDots <- list(...)
  if(all(names(theDots) != "family")) {
    if(!is.na(numLev)) {
      fam <- ifelse(numLev > 2, "multinomial", "binomial")
    } else fam <- "gaussian"
    theDots$family <- fam
  }
  ## pass in any model weights
  if(!is.null(wts)) theDots$weights <- wts
  if(!(class(x)[1] %in% c("matrix", "sparseMatrix")))
    x <- Matrix::as.matrix(x)
  modelArgs <- c(list(x = x,
                      y = y,
                      alpha = param$alpha),
                 theDots)
  out <- do.call(glmnet::glmnet, modelArgs)
  if(!is.na(param$lambda[1])) out$lambdaOpt <- param$lambda[1]
  out
}
This was taken from the caret source code for the glmnet model.
In your example this translates to:
fit <- glmnet::glmnet(x, y,
                      alpha = 0)
lambda <- unique(fit$lambda)
these lambda values correspond to ridge_full$finalModel$lambda:
all.equal(lambda, ridge_full$finalModel$lambda)
#output
TRUE

R - How to let glmnet select lambda, while providing an alpha range in caret?

This question appears to have been asked before here, but it was correctly closed as off-topic. I'm now experiencing the same issue and figured that Stack Overflow is a better place for it.
I want to use glmnet's warm start for selecting lambda to speed up the model-building process, but I want to keep using tuneGrid from caret in order to supply a large sequence of alphas (glmnet's default alpha range is too narrow). The following attempt returns the error: Error: The tuning parameter grid should have columns alpha, lambda
fitControl <- trainControl(method = 'cv', number = 10, classProbs = TRUE,
                           summaryFunction = twoClassSummary)
tuneGridb <- expand.grid(.alpha = seq(0, 1, 0.05))
model.caretb <- caret::train(y ~ x1 + x2 + x3, data = train, method = "glmnet",
                             family = "binomial", trControl = fitControl,
                             tuneGrid = tuneGridb, metric = "ROC")
How can I supply a range of values for alpha via caret whilst using the glmnet default lambda selection process?
If you check the default grid-search method for the glmnet model in caret, you will notice that when a grid search is specified, but without an actual grid, caret will provide alpha values with:
alpha = seq(0.1, 1, length = len)
while lambda values will be provided by the glmnet "warm start" at alpha = 0.5:
init <- glmnet::glmnet(Matrix::as.matrix(x), y,
                       family = fam,
                       nlambda = len + 2,
                       alpha = .5)
lambda <- unique(init$lambda)
lambda <- lambda[-c(1, length(lambda))]
lambda <- lambda[1:min(length(lambda), len)]
so if you do:
library(caret)
library(mlbench)
data(Sonar)

fitControl <- trainControl(method = 'cv',
                           number = 10,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           search = "grid")

model.caret <- caret::train(Class ~ .,
                            data = Sonar,
                            method = "glmnet",
                            family = "binomial",
                            trControl = fitControl,
                            tuneLength = 20,
                            metric = "ROC")
you will not get a grid of 20 combinations but a grid of 400 combinations (20 lambda values for each of the 20 alpha values):
nrow(model.caret$results)
#output
400
I understand this is not exactly what you are after, but it is pretty close without resorting to a custom train function.
To get closer to the desired result, you can manually get the range of lambda values from glmnet for each desired alpha:
lambda <- unique(unlist(lapply(seq(0, 1, 0.05), function(x) {
  init <- glmnet::glmnet(Matrix::as.matrix(Sonar[, 1:60]), Sonar$Class,
                         family = "binomial",
                         nlambda = 100,
                         alpha = x)
  lambda <- c(min(init$lambda), max(init$lambda))
})))
create a grid of many lambda values:
tuneGridb <- expand.grid(.alpha = seq(0, 1, 0.05),
                         .lambda = seq(min(lambda), max(lambda), length.out = 100))
caret is smart enough to just pass the lambda values to glmnet and not fit a separate model for each one:
model.caret <- caret::train(Class ~ .,
                            data = Sonar,
                            method = "glmnet",
                            family = "binomial",
                            trControl = fitControl,
                            tuneGrid = tuneGridb,
                            metric = "ROC")
model.caret$bestTune
#output
  alpha       lambda
1     0 2.159367e-05
Ridge is the way to go in this case. Note that this best lambda was in fact the lowest lambda tested:
min(lambda)
#output
2.159367e-05
Perhaps it would be wise, then, to explore lambda values in the grid lower than the ones the glmnet "warm start" suggested.
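A minimal sketch of such an extended grid, pushing the lower bound two decades below the warm-start minimum (the factor of 100 is an arbitrary assumption):
# Extend the lambda grid below the warm-start minimum on a log scale.
lambda_ext <- 10 ^ seq(log10(min(lambda)) - 2, log10(max(lambda)), length.out = 100)
tuneGridb <- expand.grid(.alpha = seq(0, 1, 0.05),
                         .lambda = lambda_ext)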
