I keep running into an error while attempting to plot variable importance from an ensemble of models.
I have fitted an ensemble of models and am now trying to create a variable importance plot for each algorithm. I am using the varImp() function from caret to extract variable importance, then plot() to draw it. To fit the ensemble, I am using the caretEnsemble package.
Thank you for any help; please see the example code below.
# Caret ensemble is needed to produce list of models
library(caret)
library(caretEnsemble)
# Set algorithms I wish to fit
my_algorithms <- c("glmnet", "svmRadial", "rf", "nnet", "knn", "rpart")
# Define controls
my_controls <- trainControl(
method = "cv",
savePredictions = "final",
number = 3
)
# Run the models all at once with caretEnsemble
my_list_of_models <- caretEnsemble::caretList(Species ~ .,
data = iris,
trControl = my_controls,
methodList = my_algorithms)
# Subset models
list_of_algorithms <- my_list_of_models[my_algorithms]
# Create first for loop to extract variable importance via caret::varImp()
importance <- list()
for (algo in seq_along(list_of_algorithms)) {
importance[[algo]] <- varImp(list_of_algorithms[[algo]])
}
# Create second loop to go over extracted importance and plot it using plot()
importance_plots <- list()
for (imp in seq_along(importance)) {
importance_plots[[imp]] <- plot(importance[[imp]])
}
# Error occurs during the second for loop:
Error in data.frame(values = unlist(unname(x)), ind, stringsAsFactors = FALSE):arguments imply differing number of rows: 16,
I've come up with a solution to the problem above and decided to post it as my own answer. I've written a small function that plots variable importance without relying on caret's helper functions. I used dotplot and levelplot because the data.frame that caret returns differs depending on the algorithm. It may not work for other algorithms or for models that failed to fit.
# Libraries ---------------------------------------------------------------
library(caret) # To train ML algorithms
library(dplyr) # Required for %>% operators in custom function below
library(caretEnsemble) # To train multiple caret models
library(lattice) # Required for plotting, should be loaded alongside caret
library(gridExtra) # Required for plotting multiple plots
# Custom function ---------------------------------------------------------
# The function takes the list of varImp() results as input and is called inside a for loop
plot_importance <- function(importance_list, imp, algo_names) {
importance <- importance_list[[imp]]$importance
model_title <- algo_names[[imp]]
if (ncol(importance) < 2) { # Plot a dotplot if there is a single column
importance %>%
as.matrix() %>%
dotplot(main = model_title)
} else { # Plot a heatmap if there are 2 or more columns
importance %>%
as.matrix() %>%
levelplot(xlab = NULL, ylab = NULL, main = model_title, scales = list(x = list(rot = 45)))
}
}
# Tuning parameters -------------------------------------------------------
# Set algorithms I wish to fit
# Rather than using methodList as provided above, I've switched to tuneList because I need to control tuning parameters of random forest algorithm.
my_algorithms <- list(
glmnet = caretModelSpec(method = "glmnet"),
rpart = caretModelSpec(method = "rpart"),
svmRadial = caretModelSpec(method = "svmRadial"),
rf = caretModelSpec(method = "rf", importance = TRUE), # Importance is not computed for "rf" by default
nnet = caretModelSpec(method = "nnet"),
knn = caretModelSpec(method = "knn")
)
# Define controls
my_controls <- trainControl(
method = "cv",
savePredictions = "final",
number = 3
)
# Run the models all at once with caretEnsemble
my_list_of_models <- caretList(Species ~ .,
data = iris,
tuneList = my_algorithms,
trControl = my_controls
)
# Extract variable importance ---------------------------------------------
importance <- lapply(my_list_of_models, varImp)
# Plotting variable importance --------------------------------------------
# Loop over the extracted importance and plot it with plot_importance()
importance_plots <- list()
for (imp in seq_along(importance)) {
# importance_plots[[imp]] <- plot(importance[[imp]])
importance_plots[[imp]] <- plot_importance(importance_list = importance, imp = imp, algo_names = names(my_list_of_models))
}
# Multiple plots at once
do.call("grid.arrange", c(importance_plots))
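The branch on ncol() in plot_importance can be tried without fitting any models. The sketch below is only an illustration: the two data frames and their numbers are made up, standing in for the two shapes that, as I understand it, the importance element of a varImp() result can take (a single "Overall" column, or one column per class), and pick_plot is a hypothetical name.

```r
library(lattice)

vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Made-up importance values, for illustration only.
# Shape 1: a single "Overall" column.
imp_overall <- data.frame(Overall = c(20, 0, 100, 85), row.names = vars)
# Shape 2: one column per class.
imp_by_class <- data.frame(
  setosa     = c(30, 10, 100, 90),
  versicolor = c(25, 0, 95, 100),
  virginica  = c(35, 5, 100, 80),
  row.names  = vars
)

# Hypothetical helper: choose the plot type from the number of columns
pick_plot <- function(importance, title) {
  m <- as.matrix(importance)
  if (ncol(m) < 2) {
    dotplot(m, main = title)
  } else {
    levelplot(m, xlab = NULL, ylab = NULL, main = title)
  }
}

# Both calls return lattice "trellis" objects that can be printed or arranged later
plots <- list(
  pick_plot(imp_overall, "one column"),
  pick_plot(imp_by_class, "one column per class")
)
```

Because both branches return objects of the same kind, the resulting list can be handed to grid.arrange in the same way as above.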
I am very new to deep learning. I trained a neural net using the packages deepnet and caret. For this regression problem, caret uses a sigmoid function as the activation function and a linear one as the output function.
I preprocessed the predictors using preProcess = "range" (which I thought normalizes the predictors).
library(caret)
library(deepnet)
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
# create data
dat <- as.data.frame(ChickWeight)
dat$vari <- sample(LETTERS, nrow(dat), replace = TRUE)
dat$Chick <- as.character(dat$Chick)
preds <- dat[1:100,2:5]
response <- dat[1:100,1]
vali <- dat[101:150,]
# change format of categorical predictors to one-hot encoded format
dmy <- dummyVars(" ~ .", data = preds)
preds_dummies <- data.frame(predict(dmy, newdata = preds))
# specify trainControl for tuning, with the specified folds
control <- caret::trainControl(search = "grid", method="repeatedcv", number=3,
repeats=2,
savePredictions = TRUE)
# tune hyperparameters and build final model
tunegrid <- expand.grid(layer1 = c(5,50),
layer2 = c(0,5,50),
layer3 = c(0,5,50),
hidden_dropout = c(0, 0.1),
visible_dropout = c(0, 0.1))
model <- caret::train(x = preds_dummies,
y = response,
method="dnn",
metric= "RMSE",
tuneGrid=tunegrid,
trControl= control,
preProcess = "range"
)
When I predict on the validation set with the tuned neural network model, it produces a single prediction value regardless of the input predictors.
# predict with validation set
# create dummies
dmy <- dummyVars(" ~ .", data = vali)
vali_dummies <- data.frame(predict(dmy, newdata = vali))
vali_dummies <- vali_dummies[,which(names(vali_dummies) %in% model$finalModel$xNames)]
# add zero-filled columns for dummy variables the model used but the validation set lacks (to get the same matrix)
not_included <- setdiff(model$finalModel$xNames, names(vali_dummies))
vali_add <- as.data.frame(matrix(rep(0, length(not_included)*nrow(vali_dummies)),
nrow = nrow(vali_dummies),
ncol = length(not_included))
)
# change names
names(vali_add) <- not_included
# add to vali_dummies
vali_dummies <- cbind(vali_dummies, vali_add)
# put it in the same order as preds_dummies (sort the columns)
vali_dummies <- vali_dummies[names(preds_dummies)]
# normalize also the validation set
pp = preProcess(vali_dummies, method = c("range"))
vali_dummies <- predict(pp, vali_dummies)
# save obs and pred for predictions with the outer CV out-of-fold test set
temp <- data.frame(obs = vali[,1],
pred = caret::predict.train(object = model, newdata = vali_dummies))
temp
When I am using the Boston data set from the MASS package where no categorical predictors are present, I get slightly different prediction values for all the different input predictors of the validation set.
How can I fix this and create a neural network which predicts "different" predictions when using numeric as well as categorical predictors? What else besides normalization should I try?
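For reference, the "range" method rescales each predictor to the interval [0, 1]. The following is a minimal base-R sketch of that idea with made-up numbers, not caret's implementation; scale_range is a hypothetical helper. It shows the minimum and maximum being learned from the training values and then reused for new values, rather than being recomputed on the new data.

```r
# Made-up predictor values, for illustration only
train_x <- c(2, 4, 6, 10)
new_x   <- c(4, 12)

# Learn the range from the training values
rng <- range(train_x)

# Hypothetical helper: min-max scaling with a given (min, max) pair
scale_range <- function(x, rng) (x - rng[1]) / (rng[2] - rng[1])

train_scaled <- scale_range(train_x, rng)  # 0.00 0.25 0.50 1.00
new_scaled   <- scale_range(new_x, rng)    # 0.25 1.25 (can fall outside [0, 1])
```

Scaling the new values with their own range instead would map them to 0 and 1, which no longer matches the scale the model was trained on.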
I have a dataset with both continuous and categorical variables. I am running regression to predict one of the variables based on the other variables in the dataset. After comparing the results of ridge, lasso and elastic-net regression, the lasso regression is the best model to proceed with.
I used the 'coef' function to extract the model's coefficients; however, the result is a very long list with over 800 variables (as some of my categorical variables have many levels). Is there a way I can quickly rank the coefficients from largest to smallest? This is glmnet model output.
Reproducible problem with example code:
# Libraries Needed
library(caret)
library(glmnet)
library(mlbench)
library(psych)
# Data
data("BostonHousing")
data <- BostonHousing
str(data)
# Data Partition
set.seed(222)
ind <- sample(2, nrow(data), replace = T, prob = c(0.7, 0.3))
train <- data[ind==1,]
test <- data[ind==2,]
# Custom Control Parameters
custom <- trainControl(method = "repeatedcv",
number = 10,
repeats = 5,
verboseIter = T)
# Linear Model
set.seed(1234)
lm <- train(medv ~.,
train,
method='lm',
trControl = custom)
# Results
lm$results
lm
summary(lm)
plot(lm$finalModel)
# Ridge Regression
set.seed(1234)
ridge <- train(medv ~.,
train,
method = 'glmnet',
tuneGrid = expand.grid(alpha = 0,
lambda = seq(0.0001, 1, length=5)),#try 5 values for lambda between 0.0001 and 1
trControl=custom)
#increasing lambda = increasing penalty and vice versa
#increase lambda therefore will cause coefs to shrink
# Plot Results
plot(ridge)
plot(ridge$finalModel, xvar = "lambda", label = T)
plot(ridge$finalModel, xvar = 'dev', label=T)
plot(varImp(ridge, scale=T))
# Lasso Regression
set.seed(1234)
lasso <- train(medv ~.,
train,
method = 'glmnet',
tuneGrid = expand.grid(alpha=1,
lambda = seq(0.0001,1, length=5)),
trControl = custom)
# Plot Results
plot(lasso)
lasso
plot(lasso$finalModel, xvar = 'lambda', label=T)
plot(lasso$finalModel, xvar = 'dev', label=T)
plot(varImp(lasso, scale=T))
# Elastic Net Regression
set.seed(1234)
en <- train(medv ~.,
train,
method = 'glmnet',
tuneGrid = expand.grid(alpha = seq(0,1,length=10),
lambda = seq(0.0001,1,length=5)),
trControl = custom)
# Plot Results
plot(en)
plot(en$finalModel, xvar = 'lambda', label=T)
plot(en$finalModel, xvar = 'dev', label=T)
plot(varImp(en))
# Compare Models
model_list <- list(LinearModel = lm, Ridge = ridge, Lasso = lasso, ElasticNet=en)
res <- resamples(model_list)
summary(res)
bwplot(res)
xyplot(res, metric = 'RMSE')
# Best Model
en$bestTune
best <- en$finalModel
coef(best, s = en$bestTune$lambda)
For most models all you'd have to do would be:
sort(coef(model), decreasing=TRUE)
Since you're using glmnet it's a little bit more complicated. I'm going to replicate a minimal version of your example here (the other models, plots, etc. are not necessary in order for us to be able to reproduce your problem ...)
## Packages
library(caret)
library(glmnet)
library(mlbench) ## for BostonHousing data
# Data
data("BostonHousing")
data <- BostonHousing
# Data Partition
set.seed(222)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train <- data[ind==1,]
test <- data[ind==2,]
# Custom Control Parameters
custom <- trainControl(method = "repeatedcv",
number = 10,
repeats = 5,
verboseIter = TRUE)
# Elastic Net Regression
set.seed(1234)
en <- train(medv ~.,
train,
method = 'glmnet',
tuneGrid = expand.grid(alpha = seq(0,1,length=10),
lambda = seq(0.0001,1,length=5)),
trControl = custom)
# Best Model
best <- en$finalModel
coefs <- coef(best, s = en$bestTune$lambda)
(This could probably be made simpler: for example, do you really need the custom control parameters to show us the example? This would be even simpler without using caret, just using glmnet, but I was afraid I might leave something out.)
Once you've got the coefficients, sorting does appear to work, albeit with a message about possible inefficiency:
sort(coefs, decreasing=TRUE)
## <sparse>[ <logic> ] : .M.sub.i.logical() maybe inefficient
## [1] 25.191049410 5.078589706 1.389548822 0.244605193 0.045600250
## [6] 0.008840485 0.004372752 -0.012701593 -0.028337745 -0.162794401
## [11] -0.335062819 -0.901475516 -1.395091095 -12.632336419
sort(as.numeric(coefs)) also appears to work fine.
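Both calls above return bare numbers, so the variable names are lost. One way to keep them, sketched here with the Matrix package and a made-up one-column sparse matrix standing in for the coef() result (the values and the selection of row names are hypothetical), is to convert to a named vector first:

```r
library(Matrix)

# Made-up stand-in for coef(best, s = ...): a sparse one-column matrix with row names
coefs <- Matrix(c(25.2, 0, -12.6, 5.1, -0.9), ncol = 1, sparse = TRUE,
                dimnames = list(c("(Intercept)", "zn", "nox", "rm", "dis"), "s1"))

# Convert to a named numeric vector and drop coefficients shrunk to exactly zero
coef_vec <- as.matrix(coefs)[, 1]
coef_vec <- coef_vec[coef_vec != 0]

# Largest to smallest, names kept
ranked <- sort(coef_vec, decreasing = TRUE)

# Largest to smallest by absolute size
ranked_abs <- coef_vec[order(abs(coef_vec), decreasing = TRUE)]
```

Ranking by absolute size is often what is wanted when large negative coefficients matter as much as large positive ones.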
If you want to sort the entire matrix (i.e. keeping the values for all penalization levels), you can order the rows by their values at one penalization level, here the last column, relying on the rank-order of the parameters staying largely the same across penalization levels:
coeftab <- coef(best)
lastvals <- coeftab[, ncol(coeftab)]
coeftab_s <- coeftab[order(lastvals, decreasing = TRUE), ]
## plot, leaving out the intercept
matplot(t(coeftab_s)[,-1],type="l")
I am practicing SVM in R using the iris dataset and I want to get the feature weights/coefficients from my model, but I think I may have misinterpreted something given that my output gives me 32 support vectors. I was under the assumption I would get four given I have four variables being analyzed. I know there is a way to do it when using the svm() function, but I am trying to use the train() function from caret to produce my SVM.
library(caret)
# Define fitControl
fitControl <- trainControl(## 5-fold CV
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = twoClassSummary )
# Define Tune
grid<-expand.grid(C=c(2^-5,2^-3,2^-1))
##########
df <- iris
head(df)
df<-df[df$Species!='setosa',]
df$Species<-as.character(df$Species)
df$Species<-as.factor(df$Species)
# set random seed and run the model
set.seed(321)
svmFit1 <- train(x = df[-5],
y=df$Species,
method = "svmLinear",
trControl = fitControl,
preProc = c("center","scale"),
metric="ROC",
tuneGrid=grid )
svmFit1
I thought it was simply svmFit1$finalModel@coef, but I get 32 vectors when I believe I should get 4. Why is that?
So coef is not the weight W of the support vectors. Here's the relevant section of the ksvm class in the docs:
coef The corresponding coefficients times the training labels.
To get what you are looking for, you'll need to do the following:
coefs <- svmFit1$finalModel@coef[[1]]
mat <- svmFit1$finalModel@xmatrix[[1]]
coefs %*% mat
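The product works because the coefficient slot holds one value per support vector, while the xmatrix slot holds one row per support vector and one column per feature. Multiplying them sums the support vectors weighted by their coefficients, leaving one weight per feature. A tiny sketch with made-up numbers (3 support vectors, 2 hypothetical features f1 and f2):

```r
# Made-up values, for illustration only
coefs <- c(0.5, -1, 0.25)              # one coefficient per support vector
mat <- matrix(c(1, 2,
                3, 4,
                5, 6),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("f1", "f2")))  # support vectors in rows

# (1 x 3) %*% (3 x 2) gives a 1 x 2 matrix: one weight per feature
w <- coefs %*% mat
dim(w)   # 1 2
w        # f1 = -1.25, f2 = -1.5
```

So 32 coefficients in the question correspond to 32 support vectors, and the four feature weights only appear after the multiplication.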
See below for a reproducible example.
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 3.5.2
# Define fitControl
fitControl <- trainControl(
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = twoClassSummary
)
# Define Tune
grid <- expand.grid(C = c(2^-5, 2^-3, 2^-1))
##########
df <- iris
df<-df[df$Species != 'setosa', ]
df$Species <- as.character(df$Species)
df$Species <- as.factor(df$Species)
# set random seed and run the model
set.seed(321)
svmFit1 <- train(x = df[-5],
y=df$Species,
method = "svmLinear",
trControl = fitControl,
preProc = c("center","scale"),
metric="ROC",
tuneGrid=grid )
coefs <- svmFit1$finalModel@coef[[1]]
mat <- svmFit1$finalModel@xmatrix[[1]]
coefs %*% mat
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] -0.1338791 -0.2726322 0.9497457 1.027411
Created on 2019-06-11 by the reprex package (v0.2.1.9000)
Sources
https://www.researchgate.net/post/How_can_I_find_the_w_coefficients_of_SVM
http://r.789695.n4.nabble.com/SVM-coefficients-td903591.html
https://stackoverflow.com/a/1901200/6637133
As more folks start moving from Caret to Tidymodels, I thought I'd post a version of the above solution for Tidymodels (as of Aug 2020), because I don't see many discussions about this so far and it isn't that straightforward to do.
I outline the main steps here; please review the links at the end for details on why it was done this way.
1. Get Your Final Model
set.seed(2020)
# Assuming kernlab linear SVM
# Grid Search Parameters
tune_rs <- tune_grid(
model_wf,
train_folds,
grid = param_grid,
metrics = classification_measure,
control = control_grid(save_pred = TRUE)
)
# Finalise workflow with the parameters for best accuracy
best_accuracy <- select_best(tune_rs, "accuracy")
svm_wf_final <- finalize_workflow(
model_wf,
best_accuracy
)
# Fit your final model on all available data at the end of the experiment
final_model <- fit(svm_wf_final, data)
# fit takes a model spec and executes the model fit routine (Parsnip)
# model_spec, formula and data to fit upon
2. Extract the KSVM Object, Pull Required Info, Calculate Variable Importance
ksvm_obj <- pull_workflow_fit(final_model)$fit
# Pull_workflow_fit returns the parsnip model fit object
# $fit returns the object produced by the fitting fn (which is what we need! and is dependent on the engine)
coefs <- ksvm_obj@coef[[1]]
# first bit of info we need are the coefficients from the linear fit
mat <- ksvm_obj@xmatrix[[1]]
# xmatrix that we need to matrix multiply against
var_impt <- coefs %*% mat
# var importance
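The result is a one-row matrix with one signed weight per predictor. To turn it into a ranking, one option is to sort by absolute value. The sketch below reuses the weights printed in the caret example earlier in this thread purely as sample input; comparing magnitudes like this assumes the predictors were centered and scaled, as they were there.

```r
# Sample input: the weights printed in the caret reprex above
var_impt <- matrix(
  c(-0.1338791, -0.2726322, 0.9497457, 1.027411),
  nrow = 1,
  dimnames = list(NULL, c("Sepal.Length", "Sepal.Width",
                          "Petal.Length", "Petal.Width"))
)

# Rank predictors by the absolute size of their weight
ranked <- sort(abs(var_impt[1, ]), decreasing = TRUE)
names(ranked)
```

The sign of each weight is dropped in the ranking; keep var_impt itself if the direction of the effect matters.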
Ref:
Extracting the Weights of Support Vectors using Caret: Linear SVM and extracting the weights
Variable Importance (Last Section of this post): http://www.rebeccabarter.com/blog/2020-03-25_machine_learning/#finalize-the-workflow
After specifying a recipe to use in caret::train, I am trying to predict new samples. I have a couple of questions about this, as I cannot find answers in the caret or recipes documentation.
Should I use predict() or predict.train()? What's the difference?
Should I bake the test data with the prepared recipe first before using predict? When using preProcess directly in train() you are advised not to preProcess new data as the train object will automatically do that. Is this the same when using recipes?
Below is a reproducible example illustrating my process and the difference in predictions when using predict vs predict.train
library(recipes)
library(caret)
# Data ----
data("credit_data")
credit_train <- credit_data[1:3500,]
credit_test <- credit_data[-(1:3500),]
# Set up recipe ----
set.seed(0)
Rec.Obj = recipe(Status ~ ., data = credit_train) %>%
step_knnimpute(all_predictors()) %>%
step_center(all_numeric())%>%
step_scale(all_numeric())
# Control parameters ----
set.seed(0)
TC = trainControl("cv",number = 10, savePredictions = "final", classProbs = TRUE, returnResamp = "final")
set.seed(0)
Model.Output = train(Rec.Obj,
credit_train,
trControl = TC,
tuneLength = 1,
metric = "Accuracy",
method = "glm")
# Prepped recipe ----
set.seed(0)
prep.rec <-
prep(Rec.Obj, training = credit_train)
# Baked data for observation ----
set.seed(0)
bake.train <- bake(prep.rec, new_data = credit_train)
bake.test <- bake(prep.rec, new_data = credit_test)
# investigation of prediction methods ----
# no application of recipe to newdata
set.seed(0)
predict.norm = predict(Model.Output, credit_test, type = "raw")
predict.train = predict.train(Model.Output, credit_test, type = "raw")
identical(predict.norm,predict.train)
# evaluates to FALSE
# Apply recipe to new data (bake.test)
predict.norm.baked = predict(Model.Output, bake.test, type = "raw")
predict.train.baked = predict.train(Model.Output, bake.test, type = "raw")
identical(predict.norm.baked, predict.train.baked)
# evaluates to FALSE
# Comparison of both predict() funcs
identical(predict.norm, predict.norm.baked)
# evaluates to FALSE
The recipe is embedded into the train object. The answers are different for two reasons:
You are giving the recipe (inside of Model.Output) already-processed data to be re-processed. You should not give predict() baked data; just use predict() and give it the original test set.
Let S3 do its thing: predict.train is for the x/y interface and predict.train.recipe is for the recipe interface. Just using predict() will do the appropriate thing.
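The dispatch point can be illustrated in base R without caret. In this sketch the classes toy_train and toy_train_recipe and their methods are hypothetical stand-ins, not caret code; they only show how predict() picks a method from the object's class, and how calling a specific method by name bypasses that choice.

```r
# Hypothetical methods, for illustration only
predict.toy_train <- function(object, newdata, ...) "x/y method"
predict.toy_train_recipe <- function(object, newdata, ...) "recipe method"

# A model fitted through the recipe interface carries the more specific class first
fit_xy  <- structure(list(), class = "toy_train")
fit_rec <- structure(list(), class = c("toy_train_recipe", "toy_train"))

via_generic_xy  <- predict(fit_xy, NULL)             # dispatches to the x/y method
via_generic_rec <- predict(fit_rec, NULL)            # dispatches to the recipe method
forced          <- predict.toy_train(fit_rec, NULL)  # skips dispatch, wrong method for this object
```

Calling the generic lets the object's class decide, which is why the two calls in the question can disagree.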
In R, I am trying to use the bag function with the train function. I start by using train and rpart to fit a classification tree model on the simple iris data set. Now I want to create a bag of 10 such trees with the bag function. The documentation says that the aggregate parameter must be a function that chooses a value from all bagged models, so I created one called agg, which chooses the most frequent string. However, the bag function gives the following error:
Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars, :
task 1 failed - "attempt to apply non-function"
Here is my complete code:
# Use bagging to create a bagged classification tree from 10 classification trees created with rpart.
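library(caret) # createDataPartition(), train(), bag(), bagControl()
library(plyr) # assumed source of count(), which returns columns x and freq
data(iris)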
# Create training and testing data sets:
inTrain = createDataPartition(y=iris$Species, p=0.7, list=F)
train = iris[inTrain,]
test = iris[-inTrain,]
# Create regressor and outcome datasets for bag function:
regressors = train[,-5]
species = train[,5]
# Create aggregate function:
agg = function(x, type) {
y = count(x)
y = y[order(y$freq, decreasing=T),]
as.character(y$x[1])
}
# Create bagged trees with bag function:
treebag = bag(regressors, species, B=10,
bagControl = bagControl(fit = train(data=train, species ~ ., method="rpart"),
predict = predict,
aggregate = agg
)
)
This gives the error message stated above. I don't understand why it rejects the agg function.
From ?bag:
When using bag with train, classification models should use type =
"prob" inside of the predict function so that predict.train(object,
newdata, type = "prob") will work.
So I guess you might want to try:
bagControl = bagControl(fit = train(data=train, species ~ .,
method="rpart", type="prob"),
predict = predict,
aggregate = agg
)
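Separately from the error, the majority vote that agg is meant to perform can be written in base R without count(). This is only a sketch: majority_vote is a hypothetical name, and the shape of the input (a list with one vector of predicted classes per bagged model) is an assumption, not something taken from the bag documentation. The predictions are made up.

```r
# Hypothetical helper: majority vote across models, one result per sample.
# Assumes pred_list holds one vector of predicted classes per model,
# all of the same length. Ties go to the alphabetically first class.
majority_vote <- function(pred_list) {
  votes <- do.call(cbind, lapply(pred_list, as.character))  # samples x models
  apply(votes, 1, function(row) names(which.max(table(row))))
}

# Made-up predictions from three models for three samples
preds <- list(
  c("setosa", "virginica",  "versicolor"),
  c("setosa", "versicolor", "versicolor"),
  c("setosa", "virginica",  "virginica")
)

voted <- majority_vote(preds)
voted  # "setosa" "virginica" "versicolor"
```

If class probabilities are returned instead of class labels, averaging the probability matrices and taking the column with the highest mean would be the analogous step.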