How do I create a SHAP plot in R for a GBM model? - r
I want to create a SHAP plot of feature importance for a GBM model:
library(caret)
ctrlCV = trainControl(method = 'repeatedcv', repeats = 5, number = 10,
                      classProbs = TRUE, savePredictions = TRUE,
                      summaryFunction = twoClassSummary)
gbmFit = train(CR ~ ., data = training_set,
               method = "gbm",
               trControl = ctrlCV,
               tuneGrid = gbmGRID,
               verbose = FALSE)
However, all the examples I found are for xgboost models, and packages like SHAPforxgboost and shapr are not working for me. For example:
shap_values <- shap.values(xgb_model = gbmFit, X_train = training_set)
produces an error:
error in `colnames<-`(`*tmp*`, value = c(colnames(x_train), "bias")) : attempt to set 'colnames' on an object with less than two dimensions
I need a plot like this:
How can I do that?
EDIT - my train set using dput():
structure(list(CR = c("nonComplete", "nonComplete", "nonComplete",
"nonComplete", "nonComplete", "nonComplete", "nonComplete", "nonComplete",
"nonComplete", "nonComplete"), gender = c(1, 0, 0, 0, 1, 0, 0,
1, 0, 1), CD4.T.cells = c(-0.0741098696855045, -0.094401270881699,
0.0410284948786532, -0.163302950330185, -0.0942478217207681,
-0.167314411991775, -0.118272811489486, -0.0366277340916379,
-0.0809646843667242, -0.140727850456348), CD8.T.cells = c(-0.178835447722468,
-0.253897294559596, -0.0372301980787381, -0.230579110769457,
-0.224125346052727, -0.196933050675633, -0.344608041139497, -0.0550538743643369,
-0.276178546845023, -0.235047665605314), T.helpers = c(-0.0384421660291032,
-0.0275306107582565, 0.186447606591857, -0.124972070102036, -0.15348122673842,
-0.106812144494277, -0.104757782473888, 0.0686746776877563, -0.0729755869081981,
-0.0783448555726869), NK.cells = c(-0.0924083910597563, -0.172356328661097,
-0.0172673823614314, 0.0280649471541352, -0.128925304635747,
-0.0875076743713435, -0.188649323737844, -0.0518877213975413,
-0.184546079512101, -0.100562282085102), Monocytes = c(-0.0680848706469295,
-0.173427291586957, -0.0106773958944477, -0.0015805672257001,
-0.0751114943036091, -0.0737177243152751, -0.211297995211542,
-0.0674023045286274, -0.149380203815874, -0.0352058106388986),
Neutrophils = c(-0.0391833488213571, -0.0275279418713283,
0.0156454755097513, 0.0285160860867748, -0.0633367938488132,
0.0252778805872529, -0.0827920017974784, 0.0432343965225797,
-0.0693846217599099, -0.0249227307025501), gd.T.Cells = c(-0.162246594987039,
-0.297759223265742, -0.0814825699645205, -0.0688779846190755,
-0.222281334925374, -0.264420103679214, -0.251924422671008,
-0.162709306032616, -0.292342418053931, -0.246818199922858
), Non.plasma.B.cells = c(-0.0384755654971015, -0.114370815587458,
0.161268251261644, -0.0571463865006043, -0.112851511342984,
-0.0822058328898433, -0.118367014322845, 0.114155959200915,
-0.0923514068231641, -0.115614038543851)), row.names = c("Pt1",
"Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt26",
"Pt27"), class = "data.frame")
I've faced this problem before, and for me it only worked for xgboost models. This should work for you, using the shapviz package:
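library(shapviz)
# model: the fitted model; data: data frame with the response in column 1
shp = shapviz(model, X_pred = data.matrix(data[, -1]), X = data)
sv_waterfall(shp, row_id = 1)            # explanation for a single row
sv_importance(shp, kind = 'beeswarm')    # beeswarm summary of feature importance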
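If the X_pred call above is rejected for a caret-trained gbm (as far as I know, that argument belongs to shapviz's xgboost/lightgbm interface), a model-agnostic route may help. The following is only a minimal sketch, assuming the kernelshap package is installed. shap_for_caret is a hypothetical helper name, not part of any package, and explaining the probability of the first class level is an arbitrary choice.

```r
library(caret)      # provides the predict() method for train objects
library(kernelshap) # model-agnostic Kernel SHAP
library(shapviz)    # plotting: sv_importance(), sv_waterfall()

# Hypothetical helper (not from any package): computes Kernel SHAP values
# for a caret classification model and wraps them in a shapviz object.
shap_for_caret <- function(fit, data, response, bg_n = 100) {
  # Keep only the feature columns
  X <- data[, setdiff(names(data), response), drop = FALSE]

  # Background sample used to integrate out "absent" features
  bg_X <- X[sample(nrow(X), min(bg_n, nrow(X))), , drop = FALSE]

  # Explain the predicted probability of the first class level
  # (assumption: change the column index to explain the other class)
  pred_fun <- function(model, newdata) {
    predict(model, newdata = newdata, type = "prob")[, 1]
  }

  ks <- kernelshap(fit, X = X, bg_X = bg_X, pred_fun = pred_fun)
  shapviz(ks)
}
```

With the objects from the question, usage would look like shp <- shap_for_caret(gbmFit, training_set, "CR"), followed by sv_importance(shp, kind = "beeswarm") for the summary plot or sv_waterfall(shp, row_id = 1) for one observation. Kernel SHAP calls predict many times, so it can be slow on large data; a smaller bg_n trades accuracy for speed.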
