How do I create a SHAP plot in R for a GBM model?

I want to create a SHAP plot of feature importance for a GBM model:
ctrlCV = trainControl(method = 'repeatedcv', repeats = 5, number = 10,
                      classProbs = TRUE, savePredictions = TRUE,
                      summaryFunction = twoClassSummary)
gbmFit = train(CR ~ ., data = training_set,
               method = "gbm",
               metric = "ROC",
               trControl = ctrlCV,
               tuneGrid = gbmGRID,
               verbose = FALSE)
However, all the examples I found are for xgboost models, and packages like SHAPforxgboost and shapr do not work for me. For example:
shap_values <- shap.values(xgb_model = gbmFit, X_train = training_set)
produces an error:
error in `colnames<-`(`*tmp*`, value = c(colnames(x_train), "bias")) : attempt to set 'colnames' on an object with less than two dimensions
I need a plot like this:
How can I do that?
EDIT: my training set, via dput():
structure(list(CR = c("nonComplete", "nonComplete", "nonComplete",
"nonComplete", "nonComplete", "nonComplete", "nonComplete", "nonComplete",
"nonComplete", "nonComplete"), gender = c(1, 0, 0, 0, 1, 0, 0,
1, 0, 1), CD4.T.cells = c(-0.0741098696855045, -0.094401270881699,
0.0410284948786532, -0.163302950330185, -0.0942478217207681,
-0.167314411991775, -0.118272811489486, -0.0366277340916379,
-0.0809646843667242, -0.140727850456348), CD8.T.cells = c(-0.178835447722468,
-0.253897294559596, -0.0372301980787381, -0.230579110769457,
-0.224125346052727, -0.196933050675633, -0.344608041139497, -0.0550538743643369,
-0.276178546845023, -0.235047665605314), T.helpers = c(-0.0384421660291032,
-0.0275306107582565, 0.186447606591857, -0.124972070102036, -0.15348122673842,
-0.106812144494277, -0.104757782473888, 0.0686746776877563, -0.0729755869081981,
-0.0783448555726869), NK.cells = c(-0.0924083910597563, -0.172356328661097,
-0.0172673823614314, 0.0280649471541352, -0.128925304635747,
-0.0875076743713435, -0.188649323737844, -0.0518877213975413,
-0.184546079512101, -0.100562282085102), Monocytes = c(-0.0680848706469295,
-0.173427291586957, -0.0106773958944477, -0.0015805672257001,
-0.0751114943036091, -0.0737177243152751, -0.211297995211542,
-0.0674023045286274, -0.149380203815874, -0.0352058106388986),
Neutrophils = c(-0.0391833488213571, -0.0275279418713283,
0.0156454755097513, 0.0285160860867748, -0.0633367938488132,
0.0252778805872529, -0.0827920017974784, 0.0432343965225797,
-0.0693846217599099, -0.0249227307025501), gd.T.Cells = c(-0.162246594987039,
-0.297759223265742, -0.0814825699645205, -0.0688779846190755,
-0.222281334925374, -0.264420103679214, -0.251924422671008,
-0.162709306032616, -0.292342418053931, -0.246818199922858
), Non.plasma.B.cells = c(-0.0384755654971015, -0.114370815587458,
0.161268251261644, -0.0571463865006043, -0.112851511342984,
-0.0822058328898433, -0.118367014322845, 0.114155959200915,
-0.0923514068231641, -0.115614038543851)), row.names = c("Pt1",
"Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt26",
"Pt27"), class = "data.frame")

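I've faced this problem before, and for me it only worked for xgboost models. Using the shapviz package, this should work for you:
library(shapviz)
# model: a fitted xgboost model; data: the training data with the response (CR) in column 1
shp = shapviz(model, X_pred = data.matrix(data[, -1]), X = data[, -1])
sv_waterfall(shp, row_id = 1)            # contributions for a single observation
sv_importance(shp, kind = 'beeswarm')    # SHAP summary (beeswarm) plot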
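shap.values() from SHAPforxgboost expects an xgboost model, so it cannot handle a caret/gbm fit. SHAP values themselves are model-agnostic, though: they only need a prediction function. As a minimal sketch (not the method of any particular package), the hypothetical helper `shapley_exact` below computes exact interventional Shapley values for one observation by enumerating every feature subset against a background sample. Its cost grows as 2^p, so it is only meant to show the idea or handle very small problems.

```r
# Exact interventional Shapley values for a single observation (sketch).
# pred_fun: function(data.frame) -> numeric vector of predictions
# x:        one-row data.frame holding the observation to explain
# bg:       data.frame of background rows with the same columns as x
# Cost is about 2^(p-1) * 2 * p calls to pred_fun, so keep p small.
shapley_exact <- function(pred_fun, x, bg) {
  p <- ncol(bg)
  # v(S): mean prediction when the features in S are fixed to x's values
  # and all other features keep their background values
  value <- function(S) {
    Z <- bg
    for (s in S) Z[[s]] <- rep(x[[s]], nrow(Z))
    mean(pred_fun(Z))
  }
  phi <- numeric(p)
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    for (k in 0:length(others)) {
      # every subset of size k drawn from the other features
      subsets <- if (k == 0) list(integer(0)) else
        combn(length(others), k, FUN = function(i) others[i], simplify = FALSE)
      # classic Shapley weight |S|! (p - |S| - 1)! / p!
      weight <- factorial(k) * factorial(p - k - 1) / factorial(p)
      for (S in subsets) phi[j] <- phi[j] + weight * (value(c(S, j)) - value(S))
    }
  }
  setNames(phi, colnames(bg))
}
```

For the caret fit in the question, a prediction function such as `function(d) predict(gbmFit, d, type = "prob")[, 2]` (the probability of the second class level; pick the class you care about) could be passed as `pred_fun`, with `bg` a sample of the predictor columns of `training_set`. With the ten predictors there, each explained row costs about 10,000 predict calls, so for real use approximation packages such as kernelshap or fastshap are more practical; shapviz can plot their output, but check each package's documentation for the exact calls.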

Related

How to add a custom bias/offset to a neural network built with neuralnet in R?

I have code for a neural network model that uses keras:
features <- layer_input(shape = c(ncol(feature_matrix)))
net <- features %>%
  layer_dense(units = q, activation = 'tanh') %>%
  layer_dense(units = 1, activation = k_exp)
volumes <- layer_input(shape = c(1))
offset <- volumes %>%
  layer_dense(units = 1, activation = 'linear', use_bias = FALSE, trainable = FALSE,
              weights = list(array(1, dim = c(1, 1))))
merged <- list(net, offset) %>%
  layer_multiply()
model <- keras_model(inputs = list(features, volumes), outputs = merged)
model %>% compile(loss = 'mse', optimizer = 'rmsprop')
fit <- model %>% fit(list(feature_matrix, offset_matrix), response_matrix, epochs = 100,
                     batch_size = 10000, validation_split = 0.1)
However, I cannot find a way to display the network architecture with keras. I want to redefine my neural network using the neuralnet package instead.
I have only just come across neuralnet and have no idea where to insert my custom bias/offset.
Its usage is:
neuralnet(formula, data, hidden = 1, threshold = 0.01,
stepmax = 1e+05, rep = 1, startweights = NULL,
learningrate.limit = NULL, learningrate.factor = list(minus = 0.5,
plus = 1.2), learningrate = NULL, lifesign = "none",
lifesign.step = 1000, algorithm = "rprop+", err.fct = "sse",
act.fct = "logistic", linear.output = TRUE, exclude = NULL,
constant.weights = NULL, likelihood = FALSE)
How do I do that?
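The neuralnet signature above has no argument for an offset, so it helps to be explicit about what the keras graph actually computes. The base-R sketch below reproduces its forward pass; `forward` and its weight arguments are hypothetical names. Because the offset layer's single weight is frozen at 1 with no bias, the output is volume * exp(f(x)), which equals exp(f(x) + log(volume)): a log-link model with log(volume) as a fixed offset.

```r
# Forward pass of the keras model above, written out in base R (sketch).
# X:      n x p feature matrix
# volume: length-n exposure vector (the second keras input)
# W1, b1: p x q hidden weights and length-q hidden biases (hypothetical values)
# w2, b2: length-q output weights and scalar output bias (hypothetical values)
forward <- function(X, volume, W1, b1, w2, b2) {
  H <- tanh(sweep(X %*% W1, 2, b1, "+"))  # layer_dense(units = q, activation = 'tanh')
  net <- exp(drop(H %*% w2) + b2)         # layer_dense(units = 1, activation = k_exp)
  offset <- 1 * volume                    # frozen dense layer: weight fixed at 1, no bias
  net * offset                            # layer_multiply()
}
```

If you do move to neuralnet, this form suggests accounting for log(volume) yourself (for example by modelling the response relative to volume), but such a reformulation generally changes the loss compared with MSE on the original scale, so results need not match the keras fit.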

Activation function used for mlpML in Caret

I am using the caret package in R to implement a multi-layer perceptron for classifying satellite images. I am using method = 'mlpML', and I would like to know which activation function it uses.
Here is my code:
controlparameters <- trainControl(method = "repeatedcv",
                                  number = 5,
                                  repeats = 5,
                                  savePredictions = TRUE,
                                  classProbs = TRUE)
mlp_grid <- expand.grid(layer1 = 13,
                        layer2 = 0,
                        layer3 = 0)
model <- train(as.factor(Species) ~ .,
               data = smotedata,
               method = 'mlpML',
               preProc = c('center', 'scale'),
               trControl = controlparameters,   # argument names are case-sensitive
               tuneGrid = mlp_grid,
               importance = TRUE)
I used a single hidden layer because it performed better than multiple layers.
Looking at the caret source code for mlpML, it turns out that it uses the mlp function of the RSNNS package.
According to the RSNNS mlp documentation, its default arguments are:
mlp(x, ...)
## Default S3 method:
mlp(x, y, size = c(5), maxit = 100,
initFunc = "Randomize_Weights", initFuncParams = c(-0.3, 0.3),
learnFunc = "Std_Backpropagation", learnFuncParams = c(0.2, 0),
updateFunc = "Topological_Order", updateFuncParams = c(0),
hiddenActFunc = "Act_Logistic", shufflePatterns = TRUE,
linOut = FALSE, outputActFunc = if (linOut) "Act_Identity" else
"Act_Logistic", inputsTest = NULL, targetsTest = NULL,
pruneFunc = NULL, pruneFuncParams = NULL, ...)
From this it is apparent that hiddenActFunc = "Act_Logistic", i.e. the activation function for the hidden layers is the logistic function.
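As a concrete reference, the logistic function that Act_Logistic denotes is 1 / (1 + exp(-x)), the same function as base R's plogis(). The sketch below (hypothetical helper names) applies it to a hidden unit's weighted input sum plus bias, which is roughly how SNNS defines a logistic unit's output.

```r
# Logistic (sigmoid) activation, the function SNNS calls "Act_Logistic":
# maps any real input into the interval (0, 1).
act_logistic <- function(net) 1 / (1 + exp(-net))

# Output of one hidden unit: logistic of the weighted inputs plus a bias.
hidden_unit <- function(x, w, bias) act_logistic(sum(w * x) + bias)
```

If you want a different hidden activation, caret forwards extra train() arguments to the model's fit function, so adding something like hiddenActFunc = "Act_TanH" to the call may switch it; confirm this against the mlpML source for your caret version.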

Error in h2o.deeplearning

I am working on H2O deep learning (autoencoders). I have initialized H2O and created training and testing datasets,
but when I call h2o.deeplearning I get the error below. Any help would be appreciated.
The error is
Error in h2o.deeplearning(x = features, training_frame = train_unsupervised, : unused arguments (training_frame = train_unsupervised, model_id = "model_nn")
Code Snippet:
model_nn <- h2o.deeplearning(x = features,
                             training_frame = train_unsupervised,
                             model_id = "model_nn",
                             autoencoder = TRUE,
                             reproducible = TRUE,
                             ignore_const_cols = FALSE,
                             seed = 42,
                             hidden = c(10, 2, 10),
                             epochs = 100,
                             activation = "Tanh")
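In R, an "unused argument(s)" error means the function actually being called declares no formal parameter with that name (and has no `...` to absorb it). The message therefore suggests that the h2o.deeplearning loaded in this session does not accept training_frame or model_id, for instance because an older h2o release is installed or another object shadows it; that is an inference, not a confirmed diagnosis. The sketch below reproduces the error with a hypothetical stand-in function and shows a generic check of a function's formals.

```r
# Hypothetical stand-in with a signature that lacks training_frame and
# model_id and has no ... to absorb extra arguments.
fake_deeplearning <- function(x, autoencoder = FALSE, hidden = c(200, 200),
                              epochs = 10) NULL

# TRUE/FALSE for each argument name, depending on whether f declares it.
accepts_args <- function(f, arg_names) {
  setNames(arg_names %in% names(formals(f)), arg_names)
}

# Calling with undeclared arguments fails before the body runs.
err <- tryCatch(
  fake_deeplearning(x = 1:3, training_frame = "train", model_id = "model_nn"),
  error = function(e) conditionMessage(e)
)
print(err)
print(accepts_args(fake_deeplearning, c("x", "training_frame", "model_id")))
```

On your machine, `accepts_args(h2o::h2o.deeplearning, c("training_frame", "model_id"))` together with `packageVersion("h2o")` should show whether the installed version is the cause.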

Error when using neural networks (caret package)

Code:
library(nnet)
library(caret)
#K-folds resampling method for fitting model
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     allowParallel = TRUE) # 10 separate 10-fold cross-validations
nnetGrid <- expand.grid(decay = seq(0.0002, .0008, length = 4),
                        size = seq(6, 10, by = 2),
                        bag = FALSE)
set.seed(100)
nnetFitcv <- train(R ~ .,
                   data = trainSet,
                   method = "avNNet",
                   tuneGrid = nnetGrid,
                   trControl = ctrl,
                   preProc = c("center", "scale"),
                   linout = TRUE,
                   ## Reduce the amount of printed output
                   trace = FALSE,
                   ## Expand the number of iterations to find
                   ## parameter estimates...
                   maxit = 2000,
                   ## and the number of parameters used by the model
                   MaxNWts = 5 * (34 + 1) + 5 + 1)
Error:
Error in train.default(x, y, weights = w, ...) :
final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
2: In train.default(x, y, weights = w, ...) :
missing values found in aggregated results
data:
dput(head(trainSet))
structure(list(fy = c(317.913756282, 365.006253069, 392.548100067,
305.350697829, 404.999341917, 326.558279739), fu = c(538.962896683,
484.423120589, 607.974981919, 566.461909098, 580.287855801, 454.178316794
), E = c(194617.707566, 181322.455065, 206661.286272, 182492.029532,
189867.929239, 181991.379749), eu = c(0.153782620813, 0.208857408687,
0.29933255604, 0.277013319499, 0.251278125174, 0.20012525805),
imp_local = c(1555.3450957, 1595.41614044, 763.56392418,
1716.78277731, 1045.72429616, 802.742305814), imp_global = c(594.038972858,
1359.48216529, 1018.89209367, 850.887850177, 1381.3557372,
1714.66351462), teta1c = c(0.033375064111, 0.021482368218,
0.020905367537, 0.006956337817, 0.034913536977, 0.03009770223
), k1c = c(4000921.55552, 4499908.41979, 9764999.26902, 9273400.46159,
6163057.88855, 12338543.5703), k2_2L = c(98633499.5682, 53562216.5496,
51597126.6866, 79496746.0098, 54060378.6334, 88854286.5457
), k2_3L = c(53752551.0262, 125020222.794, 124021434.482,
125817803.431, 75021821.6702, 35160224.288), k2_4L = c(56725106.5978,
126865701.893, 145764489.664, 64837586.8755, 49128911.0832,
70088564.0166), bmaxc = c(3481281.32908, 4393584.00639, 2614830.02391,
3128593.72039, 3179348.29527, 4274637.35956), dfactorc = c(2.5474729895,
2.94296926288, 2.79505551368, 2.47882735165, 2.46407943564,
1.41121223341), amaxc = c(73832.9746763, 99150.5068997, 77165.4338508,
128546.996471, 53819.0447533, 54870.9707106), teta1s = c(0.015467320192,
0.013675755546, 0.031668366149, 0.028898297322, 0.019211801086,
0.013349768955), k1s = c(5049506.54552, 11250622.6842, 13852560.5089,
18813117.5726, 18362782.7372, 14720875.0829), k2_ab1s = c(276542468.441,
275768806.723, 211613299.608, 264475187.749, 162043062.526,
252936228.465), k2_ab2s = c(108971516.033, 114017918.32,
248886114.151, 213529935.615, 236891513.077, 142986118.909
), k2_ab3s = c(33306211.9166, 28220338.4744, 40462423.2281,
23450400.4429, 46044346.1128, 23695405.2598), bmaxab1 = c(4763935.86742,
4297372.01966, 3752983.00638, 4861240.46459, 4269771.8481,
4162098.23435), bmaxab2 = c(1864128.647, 1789714.6047, 2838412.50704,
2122535.96812, 2512362.60884, 1176995.61871), ab1 = c(66.4926766666,
42.7771212442, 45.4212664748, 50.3764074404, 35.4792060556,
34.1116517971), ab2 = c(21.0285105309, 23.5869838719, 18.8524808986,
10.1121885612, 10.9695055644, 12.1154127169), dfactors = c(2.47803921947,
0.874644748155, 0.749837099991, 1.96711589185, 2.5407774352,
1.28554379333), teta1f = c(0.037308451805, 0.035718600749,
0.012495093438, 0.000815957999, 0.002155991091, 0.02579104469
), k1f = c(14790480.9871, 17223538.1853, 19930679.8931, 3524230.46974,
15721827.0137, 13599317.0371), k2f = c(55614283.976, 54695745.7762,
86690362.7036, 99857853.7312, 63119072.711, 37510791.5472
), bmaxf = c(2094770.19484, 3633133.51482, 1361188.05421,
2001027.51219, 2534273.6726, 3765850.14143), dfactorf = c(0.745459795314,
2.04869176933, 0.853221909609, 1.76652410119, 0.523675021418,
1.0808768613), k2b = c(1956.92858062, 1400.78738327, 1771.23607857,
1104.05501369, 1756.6767193, 1509.9294956), amaxb = c(38588.0915097,
35158.1672213, 25711.062782, 21103.1603387, 27230.6973685,
43720.3558889999), dfactorb = c(0.822346959126, 2.34421354848,
0.79990635332, 2.99070447299, 1.76373031599, 1.38640223249
), roti = c(16.1560390049, 12.7223971386, 6.43238062144,
15.882552267, 16.0836252663, 18.2734832893), rotmaxbp = c(0.235615453341,
0.343204895932, 0.370304533553, 0.488746319999, 0.176135112774,
0.46921999001), R = c(0.022186087, 0.023768855, 0.023911029,
0.023935705, 0.023655335, 0.022402726)), .Names = c("fy",
"fu", "E", "eu", "imp_local", "imp_global", "teta1c", "k1c",
"k2_2L", "k2_3L", "k2_4L", "bmaxc", "dfactorc", "amaxc", "teta1s",
"k1s", "k2_ab1s", "k2_ab2s", "k2_ab3s", "bmaxab1", "bmaxab2",
"ab1", "ab2", "dfactors", "teta1f", "k1f", "k2f", "bmaxf", "dfactorf",
"k2b", "amaxb", "dfactorb", "roti", "rotmaxbp", "R"), row.names = c(7L,
8L, 20L, 23L, 28L, 29L), class = "data.frame")
The data has no duplicate rows, zero values, or NaNs. Any help is appreciated.
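I suspect the problem is caused by MaxNWts, the maximum allowable number of weights. The value you gave (5 * (34 + 1) + 5 + 1 = 181) is smaller than the number of weights in any network with more than 5 hidden units, and every size in your grid (6, 8 and 10) is larger than that, so every fit fails and train() has no results to choose from. For one hidden layer it should be at least:
MaxNWts = max(nnetGrid$size) * (n_inputs + 1) + n_outputs * (max(nnetGrid$size) + 1)
where n_inputs = ncol(trainSet) - 1 = 34 (every column except R) and n_outputs = 1.
So, in your case, it should be at least MaxNWts = 10 * (34 + 1) + 10 + 1 = 361.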
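The weight count can be checked directly. The sketch below (with a hypothetical helper `nnet_n_weights`) counts the weights of a single-hidden-layer nnet without skip-layer connections for each size in the grid, compares them with the limit used in the question, and takes the smallest MaxNWts that covers the whole grid.

```r
# Weights in a single-hidden-layer nnet without skip-layer connections:
# each hidden unit has one weight per input plus a bias, and each output
# unit has one weight per hidden unit plus a bias.
nnet_n_weights <- function(size, n_inputs, n_outputs = 1) {
  size * (n_inputs + 1) + n_outputs * (size + 1)
}

n_inputs <- 34                  # ncol(trainSet) - 1: every column except the response R
sizes    <- seq(6, 10, by = 2)  # the 'size' values in nnetGrid

needed <- nnet_n_weights(sizes, n_inputs)
names(needed) <- paste0("size=", sizes)
print(needed)                   # weights required by each grid size

old_limit <- 5 * (34 + 1) + 5 + 1  # MaxNWts used in the question
print(needed > old_limit)          # TRUE: nnet refuses to fit that size

MaxNWts <- max(needed)             # smallest limit that fits the whole grid
print(MaxNWts)
```

Passing MaxNWts = max(needed) (or a larger value) to train() should then let every grid size fit.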

Putting series summary of ugarchboot into a dataframe

I am looking at the ugarchboot function in rugarch, but I am having trouble getting the Series (summary) into a data frame.
library(rugarch)
data(dji30ret)
spec = ugarchspec(variance.model = list(model = "gjrGARCH", garchOrder = c(1, 1)),
                  mean.model = list(armaOrder = c(1, 1), arfima = FALSE, include.mean = TRUE,
                                    archm = FALSE, archpow = 1),
                  distribution.model = "std")
ctrl = list(tol = 1e-7, delta = 1e-9)
fit = ugarchfit(data = dji30ret[, "BA", drop = FALSE], out.sample = 0,
                spec = spec, solver = "solnp", solver.control = ctrl,
                fit.control = list(scale = 1))
bootpred = ugarchboot(fit, method = "Partial", n.ahead = 120, n.bootpred = 2000)
bootpred
as.data.frame(bootpred, which = "sigma", type = "q", qtile = c(0.01, 0.05))
## I am trying to get this into a data frame:
Series (summary):
min q.25 mean q.75 max forecast
t+1 -0.24531 -0.016272 0.000143 0.018591 0.16263 0.000743
t+2 -0.24608 -0.018006 -0.000290 0.017816 0.16160 0.000232
t+3 -0.24333 -0.017131 0.001007 0.017884 0.31861 0.000413
t+4 -0.26126 -0.018643 -0.000618 0.017320 0.34078 0.000349
t+5 -0.19406 -0.018545 -0.000453 0.016690 0.33356 0.000372
t+6 -0.23864 -0.017268 -0.000113 0.016001 0.18233 0.000364
t+7 -0.27024 -0.018031 -0.000514 0.017852 0.18436 0.000367
t+8 -0.13926 -0.016676 0.000539 0.017904 0.16271 0.000366
t+9 -0.32941 -0.017221 -0.000194 0.016718 0.13894 0.000366
t+10 -0.19013 -0.015845 0.001095 0.017064 0.14498 0.000366
Thank you for your help.
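One way is to build the data frame yourself from the raw bootstrap paths. The sketch below assumes you already have them as a numeric matrix with one row per bootstrap draw and one column per forecast horizon; `fitted(bootpred)` should give the bootstrapped series in that shape in recent rugarch versions, but that is an assumption, so check `?"uGARCHboot-class"` for yours. `summarise_boot` is a hypothetical helper, and it does not reproduce the printout's `forecast` column (the model's own point forecast).

```r
# Summarise simulated paths like the "Series (summary)" printout:
# one row per horizon (t+1, t+2, ...) with min, quartiles, mean and max.
# sims: numeric matrix, rows = bootstrap draws, columns = forecast horizons.
summarise_boot <- function(sims) {
  stats <- t(apply(sims, 2, function(s) {
    c(min  = min(s),
      q.25 = unname(quantile(s, 0.25)),
      mean = mean(s),
      q.75 = unname(quantile(s, 0.75)),
      max  = max(s))
  }))
  out <- as.data.frame(stats)
  rownames(out) <- paste0("t+", seq_len(ncol(sims)))
  out
}
```

Under that assumption, `summarise_boot(fitted(bootpred))` would give a data frame with one row per horizon, to which you can cbind() your own point forecasts.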
