How to obtain a heatmap or network diagram of variable interaction through a machine learning model - r

I am trying to use the "vivid" package in R to visualize variable interactions, because it can analyze a trained machine learning model directly and its plots look very good. However, when I run vivi() I get the error shown below, and I don't know what is wrong. Another function from the same package, pdpPairs(), runs fine and produces the PDP plot. I know this is hard to diagnose without more information, but I am hoping for an answer. Alternatively, is there any other way to analyze a trained machine learning model directly and get a heatmap or network diagram of variable interactions? That would be very useful to me.
viFit <- vivi(
fit = gbm_fit,
data = variable_data,
response = "Y1_learning",
gridSize = 10,
importanceType = NULL,
nmax = 100,
reorder = TRUE,
class = 1,
predictFun = NULL)
Agnostic variable importance method used.
Calculating interactions...
Error in mat[vars2] <- res[["value"]] : invalid subscript type 'list'
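On the "any other way" part: if the model is a plain gbm object, one option is Friedman's H-statistic via gbm::interact.gbm(), computed for every pair of predictors and drawn as a heatmap. The following is only a sketch on simulated data. The data frame d, the variables x1 to x3, and the model settings are invented for illustration, and it assumes the model was fitted with gbm() directly (not through a wrapper such as caret).

```r
library(gbm)

set.seed(1)

# Simulated stand-in data: x1 and x2 interact, x3 acts additively
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- d$x1 * d$x2 + d$x3 + rnorm(n, sd = 0.05)

fit <- gbm(y ~ ., data = d, distribution = "gaussian",
           n.trees = 200, interaction.depth = 3, shrinkage = 0.1)

# Fill a symmetric matrix with the pairwise H-statistics
vars <- c("x1", "x2", "x3")
H <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
for (p in combn(vars, 2, simplify = FALSE)) {
  h <- interact.gbm(fit, data = d, i.var = p, n.trees = 200)
  H[p[1], p[2]] <- h
  H[p[2], p[1]] <- h
}

print(round(H, 3))

# Plain heatmap, no clustering and no rescaling of the values
heatmap(H, Rowv = NA, Colv = NA, symm = TRUE, scale = "none")
```

The matrix H can also be passed to a graph package to draw a network, with edge widths taken from the H values.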

Related

Writing a prediction equation from plsr model

Greetings to everyone.
I successfully computed a PLSR model in R using the code below:
pls_modB_Kexch_2 <- plsr(Av.K_exc~., data = trainKexch.sar.veg, scale=TRUE,method= "s",validation='CV')
The regression coefficients for 11 components were:
(Intercept) = -4.692966e+05,
Easting = 6.068582e+03, Northings = 7.929767e+02,
sigma_vv = 8.024741e+05, sigma_vh = -6.375260e+05,
gamma_vv = -7.120684e+05, gamma_vh = 4.330279e+05,
beta_vv = -8.949598e+04, beta_vh = 2.045924e+05,
c11_db = 2.305016e+01, c22_db = -4.706773e+01,
c12_real = -1.877267e+00
It predicts new data sets well when applied within the R environment.
My challenge is presenting this model as an equation of the form y = sum(A*X) + B0, where A are the coefficients of the respective variables X,
or in any other mathematical form that can be presented academically.
I tried the direct approach of multiplying each coefficient by its variable and summing them up, but a quick manual trial gave me strange predictions. Am I missing something here? Please help.
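One likely cause of the strange manual predictions is the scale = TRUE argument. As far as I can tell, the pls package then reports coefficients that apply to predictors divided by their standard deviations (stored in the fitted object as $scale), not to the raw predictors. The sketch below checks this on the built-in mtcars data, which is only a stand-in for the original data set, and derives raw-scale coefficients A_raw and an intercept B0 (both names are mine) for the equation y = B0 + sum(A_raw * X).

```r
library(pls)

# Stand-in model: mtcars replaces the original training data
fit <- plsr(mpg ~ ., data = mtcars, ncomp = 3, scale = TRUE,
            validation = "none")

# Coefficients, including the intercept, for the chosen number of components
B <- drop(coef(fit, ncomp = 3, intercept = TRUE))

# Predictors in the same column order as the coefficients
X <- as.matrix(mtcars[, names(B)[-1]])

# Manual prediction: divide each predictor by its stored scale, then apply B
manual <- B[1] + sweep(X, 2, fit$scale, "/") %*% B[-1]

# Reference prediction from the package
builtin <- drop(predict(fit, newdata = mtcars, ncomp = 3))

# Equivalent equation on the raw predictor scale
A_raw <- B[-1] / fit$scale
B0 <- B[1]
raw_pred <- B0 + X %*% A_raw

print(all.equal(as.numeric(manual), as.numeric(builtin)))
```

If the check holds for your model as well, A_raw and B0 are the numbers to report in the written equation.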

Plot ctree using rpart.plot functionality

I have been trying to use the rpart.plot package to plot a ctree from the partykit library. The reason is that the default plot method is terrible when the tree is deep. In my case, my max_depth = 5.
I really enjoy rpart.plot's output as it allows for deep trees to visually display better. How the output looks for a simple example:
rpart
library(partykit)
library(rpart)
library(rpart.plot)
df_test <- cu.summary[complete.cases(cu.summary),]
multi.class.model <- rpart(Reliability~., data = df_test)
rpart.plot(multi.class.model)
I would like to get this output from the partykit model using ctree
ctree
multi.class.model <- ctree(Reliability~., data = df_test)
rpart.plot(multi.class.model)
>Error: the object passed to prp is not an rpart object
Is there some way one could coerce the ctree object to rpart so this would run?
To the best of my knowledge all the other packages for visualizing rpart trees are really rpart-specific and not based on the agnostic party class for representing trees/recursive partitions. Also, we haven't tried to implement an as.rpart() method for party objects because the rpart class is really not well-suited for this.
But you can try to tweak the partykit visualizations, which are customizable through panel functions for almost all aspects of the tree. One thing that might be helpful is to compute a simpleparty object, which has all sorts of simple summary information in the $info of each node. This can then be used in the node_terminal() panel function for printing information in the tree display. Consider the following simple example for predicting one of three school types in the German Socio-Economic Panel. To achieve the desired depth I switch significance testing essentially off:
library("partykit")
data("GSOEP9402", package = "AER")
ct <- ctree(school ~ ., data = GSOEP9402, maxdepth = 5, alpha = 0.5)
The default plot(ct) on a sufficiently big device gives you:
When turning the tree into a simpleparty you get a textual summary by default:
st <- as.simpleparty(ct)
plot(st)
This still has overlapping labels, so we could set up a small convenience function that extracts the interesting bits from the $info of each node and puts them into a longer character vector with narrower entries:
myfun <- function(i) c(
as.character(i$prediction),
paste("n =", i$n),
format(round(i$distribution/i$n, digits = 3), nsmall = 3)
)
plot(st, tp_args = list(FUN = myfun), ep_args = list(justmin = 20))
In addition to the arguments of the terminal panel function (tp_args) I have tweaked the arguments of the edge panel function (ep_args) to avoid some of the overplotting in the edges.
Of course, you could also change the entire panel function and roll your own...

How to generate the actual results of an IRF() function within the vars package?

Somehow, I am unable to generate the actual underlying values of the IRFs. See code of a simple VAR model.
irf5<-irf(var2, impulse = "libor", response = "y", n.ahead = 10, ortho = TRUE, boot = TRUE, CI = 0.95, runs = 100)
I can generate the resulting IRF plots just fine with this code:
plot(irf5)
But, I can't generate the underlying values. I'd like to do so to have precise figures. Visually interpreting IRFs is not that accurate. Using the summary() did not provide me this information.
I think you just need to type irf5 at the command line and run it (Ctrl+Enter). Since you can plot the IRF, there are no errors, so you can easily get the IRF values. In other words, with irf5<-irf(var2,....) you only create and store a new object; you never print it.
You should be able to get the values by using irf5$irf
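To make that concrete, here is a small sketch using the Canada data set that ships with vars, since var2 from the question is not available. The object returned by irf() is a list whose irf, Lower and Upper elements hold one matrix per impulse variable, so the point estimates and the bootstrap bounds can be collected into a table. The names var_fit, irf_obj and irf_table are mine.

```r
library(vars)

# Stand-in VAR model on the Canada data shipped with the package
data(Canada)
var_fit <- VAR(Canada, p = 2, type = "const")

set.seed(123)  # the bootstrap bounds are random
irf_obj <- irf(var_fit, impulse = "e", response = "prod",
               n.ahead = 10, ortho = TRUE, boot = TRUE,
               ci = 0.95, runs = 100)

# One row per horizon: point estimate plus lower and upper bounds
irf_table <- data.frame(
  horizon = 0:10,
  irf     = irf_obj$irf$e[, "prod"],
  lower   = irf_obj$Lower$e[, "prod"],
  upper   = irf_obj$Upper$e[, "prod"]
)

print(irf_table)
```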

R problem with randomForest classification with raster package

I am having an issue with randomForest and the raster package. First, I create the classifier:
library(raster)
library(randomForest)
# Set some user variables
fn = "image.pix"
outraster = "classified.pix"
training_band = 2
validation_band = 1
original_classes = c(125,126,136,137,151,152,159,170)
reclassd_classes = c(122,122,136,137,150,150,150,170)
# Get the training data
myraster = stack(fn)
training_class = subset(myraster, training_band)
# Reclass the training data classes as required
training_class = subs(training_class, data.frame(original_classes,reclassd_classes))
# Find pixels that have training data and prepare the data used to create the classifier
is_training = Which(training_class != 0, cells=TRUE)
training_predictors = extract(myraster, is_training)[,3:nlayers(myraster)]
training_response = as.factor(extract(training_class, is_training))
remove(is_training)
# Create and save the forest, use odd number of trees to avoid breaking ties at random
r_tree = randomForest(training_predictors, y=training_response, ntree = 201, keep.forest=TRUE) # Runs out of memory, does not allow more trees than this...
remove(training_predictors, training_response)
Up to this point, all is good. I can see that the forest was created correctly by looking at the error rates, confusion matrix, etc. When I try to classify some data, however, I run into trouble with the following, which returns all NA's in predictions:
# Classify the whole image
predictor_data = subset(myraster, 3:nlayers(myraster))
layerNames(predictor_data) = layerNames(myraster)[3:nlayers(myraster)]
predictions = predict(predictor_data, r_tree, type='response', progress='text')
And gives this warning:
Warning messages:
1: In `[<-.factor`(`*tmp*`, , value = c(1, 1, 1, 1, 1, 1, ... :
invalid factor level, NAs generated
(keeps going like this)...
However, calling predict.randomForest directly works fine and returns the expected predictions (this is not a good option for me because the image is large, and I cannot store the whole matrix in memory):
# Classify the whole image and write it to file
predictor_data = subset(myraster, 3:nlayers(myraster))
layerNames(predictor_data) = layerNames(myraster)[3:nlayers(myraster)]
predictor_data = extract(predictor_data, extent(predictor_data))
predictions = predict(r_tree, newdata=predictor_data)
How can I get it to work directly with the "raster" version? I know that this is possible, as shown in the examples of predict{raster}.
You could try nesting predict.randomForest within the writeRaster function and writing the matrix as a raster in chunks, as per the pdf included in the raster package. Before that, try the argument 'na.rm=TRUE' when calling predict in the raster function. You might also assign dummy values to the NAs in the predictor rasters, then later rewrite them as NAs using functions in the raster package.
As for memory problems when calling RFs, I've had a plethora of memory issues dealing with BRTs. They're immense on disk and in memory! (Should a model be more complex than the data?) I've not had them run reliably on 32-bit machines (WinXp or Linux). Sometimes tweaking Windows memory allotment to applications has helped, and moving to Linux has helped more, but I get the most from 64-bit Windows or Linux machines, since they impose a higher (or no) limit on the amount of memory applications can take. You may be able to increase the number of trees you can use by doing this.
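The chunked approach could look roughly like the sketch below. It uses a small synthetic three-layer stack and a toy forest in place of the real image and classifier, so the layer names b1 to b3, the class codes and the output file are all invented. It assumes the factor levels of the response are numeric class codes, which are converted back to numbers before writing.

```r
library(raster)
library(randomForest)

set.seed(1)

# Synthetic stand-in for the predictor layers of the image
predictor_data <- stack(lapply(1:3, function(i) {
  r <- raster(nrows = 20, ncols = 20)
  values(r) <- runif(ncell(r))
  r
}))
names(predictor_data) <- c("b1", "b2", "b3")

# Toy classifier whose factor levels are numeric class codes
train_x <- as.data.frame(getValues(predictor_data))
train_y <- factor(ifelse(train_x$b1 > 0.5, 150, 122))
r_tree <- randomForest(train_x, y = train_y, ntree = 51)

# Predict block by block and stream the results to disk
out_file <- tempfile(fileext = ".grd")
out <- raster(predictor_data)
out <- writeStart(out, filename = out_file, overwrite = TRUE)
bs <- blockSize(predictor_data)
for (i in seq_len(bs$n)) {
  v <- as.data.frame(getValues(predictor_data, row = bs$row[i],
                               nrows = bs$nrows[i]))
  ok <- complete.cases(v)          # rows without missing predictors
  p <- rep(NA_real_, nrow(v))
  if (any(ok)) {
    pred <- predict(r_tree, newdata = v[ok, , drop = FALSE])
    p[ok] <- as.numeric(as.character(pred))  # factor level -> class code
  }
  out <- writeValues(out, p, bs$row[i])
}
out <- writeStop(out)
```

Only one block of rows is held in memory at a time, which is the point of doing it this way for a large image.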

Bisecting K-Means on a Data Set in R

I am doing my first assignment in Data Science (Masters level) and do not come from a programming background. I have completed a K-Means model on my data (which is a simple test data set). Now I want to implement bisecting k-means to show how it can improve the clustering result. I am coding in R. Does anyone know how to code bisecting k-means in R, in a way suitable for someone fairly new to the field?
The code I am trying to use is:
bkmeansset <- ml_bisecting_kmeans(x, formula = NULL, k = 3, max_iter = 20,
seed = NULL, min_divisible_cluster_size = 1, features_col = "features",
prediction_col = "prediction", uid =
random_string("bisecting_bisecting_kmeans_"))
I am inputting a test set called "testset" and I am not sure where to put it in the arguments of the function. The error message I get is:
Error in UseMethod("ml_bisecting_kmeans") :
no applicable method for 'ml_bisecting_kmeans' applied to an object of class
"character"
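A note on the error: ml_bisecting_kmeans() comes from sparklyr and expects a Spark DataFrame (or a Spark connection or pipeline) as its first argument, so passing a character value for x fails in exactly this way. For a small test data set you may not need Spark at all. Below is a minimal base-R sketch of bisecting k-means built on stats::kmeans(). The function name bisecting_kmeans and the rule "always split the cluster with the largest within-cluster sum of squares" are my own choices, not a reference implementation.

```r
# Bisecting k-means: start with one cluster and repeatedly split the
# cluster with the largest within-cluster sum of squares into two.
bisecting_kmeans <- function(x, k, nstart = 10) {
  x <- as.matrix(x)
  cluster <- rep(1L, nrow(x))
  while (length(unique(cluster)) < k) {
    groups <- split(seq_len(nrow(x)), cluster)
    # Within-cluster sum of squared deviations from the cluster mean
    sse <- vapply(groups, function(idx) {
      sum(scale(x[idx, , drop = FALSE], scale = FALSE)^2)
    }, numeric(1))
    # Clusters with fewer than 3 points are not split further
    sse[lengths(groups) < 3] <- -Inf
    if (max(sse) <= 0) break  # nothing left that can be split
    target <- as.integer(names(sse)[which.max(sse)])
    idx <- which(cluster == target)
    halves <- kmeans(x[idx, , drop = FALSE], centers = 2, nstart = nstart)
    # Points in the second half get a new cluster label
    cluster[idx[halves$cluster == 2]] <- max(cluster) + 1L
  }
  cluster
}

# Usage on a built-in data set standing in for "testset"
set.seed(42)
cl <- bisecting_kmeans(iris[, 1:4], k = 3)
print(table(cl, iris$Species))
```

To compare with ordinary k-means, run kmeans(iris[, 1:4], centers = 3) on the same data and cross-tabulate its clusters in the same way.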
