Generate SHAP dependence plots in R

Is there a package that allows estimation of SHAP values for multiple observations for models that are not XGBoost or decision-tree based? I created a neural network using caret and nnet. I want to produce a beeswarm plot and SHAP dependence plots to explore the relationship between certain variables in my model and the outcome. The only success I have had is with the DALEX package, but DALEX only estimates SHAP values for single instances and cannot do a global analysis using SHAP values. Any insight or help would be appreciated!
I have tried different SHAP packages (fastshap, shapr), but these require decision-tree-based models. I tried creating an XGBoost model in caret, but it did not work well with the SHAP packages in R and I could not get the output I wanted.

I invested a little bit of time to push R in this regard:
shapviz plots SHAP values from any source, including XGBoost, LightGBM, H2O, kernelshap, and fastshap.
kernelshap calculates Kernel SHAP values for all models with numeric output, even multivariate output. This will be your friend when it comes to models outside the TreeSHAP comfort zone...
Put differently: kernelshap + shapviz = explain any model.
Here is an example using "caret" for linear regression, but nnet works identically.
library(caret)
library(kernelshap)
library(shapviz)
fit <- train(
  Sepal.Length ~ .,
  data = iris,
  method = "lm",
  tuneGrid = data.frame(intercept = TRUE),
  trControl = trainControl(method = "none")
)
# Explain the rows in `X` using background data `bg_X`
# (ideally 50-200 rows; iris has only 150 rows, so using the full data is fine here)
shap <- kernelshap(fit, X = iris[, -1], bg_X = iris)
sv <- shapviz(shap)

# Global analysis
sv_importance(sv)                                  # bar plot of mean absolute SHAP values
sv_importance(sv, kind = "bee")                    # beeswarm plot
sv_dependence(sv, "Species", color_var = "auto")   # dependence plot

# Single observations
sv_waterfall(sv, 1)
sv_force(sv, 1)
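
Since the question is about a neural network, here is a minimal sketch of the same pipeline with caret's "nnet" method; the size/decay values below are placeholders, not tuned settings:

# Hypothetical nnet regression fit; linout = TRUE requests a linear output unit
fit_nn <- train(
  Sepal.Length ~ .,
  data = iris,
  method = "nnet",
  linout = TRUE,
  trace = FALSE,
  tuneGrid = data.frame(size = 5, decay = 0.1),
  trControl = trainControl(method = "none")
)

shap_nn <- kernelshap(fit_nn, X = iris[, -1], bg_X = iris)
sv_nn <- shapviz(shap_nn)
sv_importance(sv_nn, kind = "bee")
sv_dependence(sv_nn, "Petal.Length", color_var = "auto")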

Related

Is there an R function to obtain the minimal depth distribution from a conditional random forest estimated with the party package?

I ran a conditional random forest regression model using the cforest function from the party package because I have both categorical and continuous predictor variables that are correlated with each other, and a continuous outcome variable.
Here is my code to run the conditional random forest model, obtain out-of-bag estimates, and estimate the permutation variable importance.
# 1. Fit the conditional random forest
crf <- party::cforest(Y ~ ., data = df,
                      controls = party::cforest_unbiased(ntree = 10000, mtry = 7))

# 2. Obtain out-of-bag estimates
pred_oob <- as.data.frame(predict(crf, OOB = TRUE, newdata = NULL))

# 3. Estimate permutation variable importance
vi <- permimp::permimp(crf, conditional = TRUE, threshold = 0.5, nperm = 1000,
                       OOB = TRUE, mincriterion = 0)
I would like to visualize the minimal depth distribution and calculate mean minimal depth similar to the output from the randomForestExplainer package. However, the randomForestExplainer package only accepts objects from the randomForest function in the randomForest package. Using that function is not an option for me due to the nature of my data (described above).
I have been combing the internet and have not been able to find a solution. Can someone point me to a way to visualize the minimal depth distribution for all predictors and calculate the mean minimal depth?

Does the *metafor* package in R provide a forest plot for robust random-effects models?

I have fit a robust random-effects meta-regression model using the metafor package in R.
My full data, as well as reproducible R code appear below.
Questions:
(1) What are the meaning and interpretation of grey-colored diamonds appearing over CIs?
(2) I won't get an overall mean effect when I have moderators, correct?
library(metafor)

d <- read.csv("https://raw.githubusercontent.com/izeh/m/master/d.csv", header = TRUE)  ## DATA

res <- robust(rma.uni(yi = dint, sei = SD, mods = ~ es.type, data = d,
                      slab = d$study.name),
              cluster = d$id)
forest(res)
(1) Quoting from help(forest.rma): "For models involving moderators, the fitted value for each study is added as a polygon to the plot." So, the grey-colored diamonds (polygons) are the fitted values, and the width of each diamond/polygon reflects the width of the CI for that fitted value.
(2) Correct: there is no longer a single overall effect once your model includes moderators.
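
If a pooled summary is still wanted, one option is to additionally fit an intercept-only model (a sketch reusing the data above, simply dropping the moderator); its forest plot then shows the usual overall summary polygon:

# Intercept-only model: no moderators, so a single pooled effect is estimated
res0 <- robust(rma.uni(yi = dint, sei = SD, data = d, slab = d$study.name),
               cluster = d$id)
forest(res0)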

Predicted probabilities in R ranger package

I am trying to build a model in R with random forest classification (by adapting code by Ned Horning). I first used the randomForest package but then found ranger, which promises faster calculations.
At first, I used the code below to get predicted probabilities for each class after fitting the model with randomForest as:
predProbs <- as.data.frame(predict(randfor, imageBlock, type='prob'))
The type of probability here is as follows:
We have 500 trees in the model, and if 250 of them say the observation is class 1, the probability is 250/500 = 50%.
In ranger, I realized that there is no type = 'prob' option.
I searched and tried some adjustments but couldn't get any progress. I need an object or so containing probabilities as mentioned above with ranger package.
Could anyone give some advice about the issue?
You need to train a "probabilistic classifier"-type ranger object:

library(ranger)
iris.ranger <- ranger(Species ~ ., data = iris, probability = TRUE)

Predicting with this object returns an (n_samples, n_classes) matrix of class probabilities:

probabilities <- predict(iris.ranger, data = iris)$predictions
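
To sanity-check the result (rows are observations, columns are the class levels, and each row sums to 1) and to recover hard class labels, something like the following works; the predicted_class line is an illustrative addition, not part of the original answer:

head(probabilities)

# Most probable class per observation, as character labels
predicted_class <- colnames(probabilities)[max.col(probabilities)]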

How to understand a boxplot comparing performance metrics of machine learning models?

I have created several models and plotted their accuracy using a boxplot. What do the circles outside the box mean?
I used the caret package for building the models and the following code to make the picture.
model_list <- list(rf = rf_model, gbm = gbm_model, xgboost = xgboost_model, treebag = treebag_model)
resamples <- resamples(model_list)
bwplot(resamples, metric = "Accuracy")

Variable importance in Caret

I am using the caret package in R to train a logistic regression model for a binary classification problem. I have been able to get the results, accuracy, etc., but I also want the importance of the variables (in decreasing order of importance). I used the varImp() function, but according to the documentation, the importance depends on the class:
"For most classification models, each predictor will have a separate variable importance for each class (the exceptions are classification trees, bagged trees and boosted trees)."
How can I get the variable importance for each class?
Thank you
For the first part, have you tried:
round(importance(myModel$finalModel), 2)
Note that importance() comes from the randomForest package, so this applies when finalModel is a random forest. For putting the result in decreasing order:
imp <- round(importance(myModel$finalModel), 2)
dfimp <- data.frame(feature = rownames(imp), MeanDecreaseGini = as.numeric(imp))
dfimp[order(dfimp$MeanDecreaseGini, decreasing = TRUE), ]
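
Alternatively, caret's own varImp() can be applied directly to the train object; a sketch, with myModel standing in for the fitted model from the question:

vi <- caret::varImp(myModel)  # per-class importances for many classifiers
vi$importance                 # raw importance table
plot(vi)                      # sorted dot plot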
