By running the commands,
m <- h2o.getModel("depth_grid_model_4")
h2o.varimp(m)
I am able to view the model's performance as well as the variable importance.
How do I view the splits used in each tree of the GBM model?
Thanks
There is a tool to create visualizations for H2O-3 MOJO models. See the full documentation here:
http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/overview-summary.html#viewing-a-mojo
Use R to create and download a MOJO:
library(h2o)
h2o.init()
df <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
model <- h2o.gbm(model_id = "model",
                 training_frame = df,
                 x = c("Year", "Month", "DayofMonth", "DayOfWeek", "UniqueCarrier"),
                 y = "IsDepDelayed",
                 max_depth = 3,
                 ntrees = 5)
h2o.download_mojo(model, getwd(), FALSE)
Run the PrintMojo tool (packaged inside h2o.jar) from the command line to produce a .png file. You will need the latest stable H2O-3 release from http://www.h2o.ai/download/.
# (For MacOS: brew install graphviz)
java -cp h2o.jar hex.genmodel.tools.PrintMojo --tree 0 -i model.zip -o model.gv
dot -Tpng model.gv -o model.png
open model.png
A new Tree API was added in H2O 3.22.0.1. It lets you fetch trees from any tree-based H2O model into R/Python objects (for details see here):
tree <- h2o.getModelTree(model = airlines.model, tree_number = 1, tree_class = "NO")
Once you have the tree representation from h2o in R, plotting it is explained here: Finally, You Can Plot H2O Decision Trees in R.
You can export the model as POJO with h2o.download_pojo() and then look at the full details of each tree in the file.
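For example, a minimal sketch using the model m from above (the output path here is just the working directory):

```r
# Export the model as a plain-Java POJO; the generated .java file spells
# out every split (variable and threshold) of every tree.
h2o.download_pojo(m, path = getwd())
```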
I cannot get TensorBoard to open using RStudio and the Keras package.
I am trying to duplicate the TensorBoard example using the Keras package with RStudio, as shown here: https://tensorflow.rstudio.com/tools/tensorboard/tensorboard/
Either I have a problem or I don't understand what needs to be done.
I am using these instructions:
# launch TensorBoard (data won't show up until after the first epoch)
tensorboard("logs/run_a")
# fit the model with the TensorBoard callback
history <- model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  verbose = 1,
  callbacks = callback_tensorboard("logs/run_a"),
  validation_split = 0.2
)
I get this error after running tensorboard("logs/run_a"):
> tensorboard("logs/run_a")
Error in if (tensorboard_version() < "2.0") { :
argument is of length zero
I have tried these versions:
tensorboard("/Users/kevinwilliams/Documents/r-studio-and-git/MNIST/logs/run_a")
tensorboard("logs/run_a")
tensorboard(log_dir = "logs/run_a")
tensorboard(log_dir = "logs/run_a", launch_browser = TRUE)
TensorBoard will not open.
The file structure of "logs" and "logs/run_a" was created automatically by these commands.
Training and validation "events" are being saved to those locations.
Fitting the model runs with no error, but the output is sent to the RStudio Viewer, not to TensorBoard.
Keras V2.7.0
RStudio 1.4.1717
R 4.1.1
TensorBoard just reads event log files; you can even plot the logged metrics yourself with matplotlib:
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0, 1]) # 0.5 - 1
plt.legend(loc='lower right')
Can you check whether any logs or checkpoints are actually generated in the target folder you specified?
You can also use the tf.profiler services to communicate with sub-services that are already listening:
options = tf.profiler.experimental.ProfilerOptions(host_tracer_level = 3,
                                                   python_tracer_level = 1,
                                                   device_tracer_level = 1)
tf.profiler.experimental.start(log_dir, options = options)
tf.profiler.experimental.server.start(6009)
tf.profiler.experimental.stop()
In Databricks (SparkR), I run the batch algorithm of the self-organizing map from the kohonen package in parallel, as it gives me considerable reductions in computation time compared to my local machine. However, after fitting the model I would like to download/export the trained model (a list) to my local machine to continue working with the results (create plots etc.) in ways that are not available in Databricks. I know how to save and download a SparkDataFrame to csv:
sdftest # a SparkDataFrame
write.df(sdftest, path = "dbfs:/FileStore/test.csv", source = "csv", mode = "overwrite")
However, I am not sure how to do this for a 'regular' R list object.
Is there any way to save the output created in Databricks to my local machine in .RData format? If not, is there a workaround that would still allow me to continue working with the model results locally?
EDIT :
library(kohonen)
# Load data
sdf.cluster <- read.df("abfss://cluster.csv", source = "csv", header="true", inferSchema = "true")
# Collect the SparkDataFrame into an R data.frame, as kohonen::som is not available for SparkDataFrames
rdf.cluster <- SparkR::collect(sdf.cluster)
# Change rdf to matrix as is required by kohonen::som
rdf.som <- as.matrix(rdf.cluster)
# Parallel Batch SOM from Kohonen
som.grid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal",
                    neighbourhood.fct = "gaussian")
set.seed(1)
som.model <- som(rdf.som, grid = som.grid, rlen = 10, alpha = c(0.05, 0.01),
                 keep.data = TRUE, dist.fcts = "euclidean", mode = "online")
Any help is very much appreciated!
If all your models can fit into the driver's memory, you can use spark.lapply. It is a distributed version of base lapply which requires a function and a list. Spark will apply the function to each element of the list (like a map) and collect the returned objects.
Here is an example of fitting kohonen models, one for each iris species:
library(SparkR)
library(kohonen)
fit_model <- function(df) {
  library(kohonen)
  grid_size <- ceiling(nrow(df) ^ (1/2.5))
  som_grid <- somgrid(xdim = grid_size, ydim = grid_size, topo = 'hexagonal', toroidal = T)
  som_model <- som(data.matrix(df), grid = som_grid)
  som_model
}
models <- spark.lapply(split(iris[-5], iris$Species), fit_model)
models
The models variable contains a list of kohonen models fitted in parallel:
$setosa
SOM of size 5x5 with a hexagonal toroidal topology.
Training data included.
$versicolor
SOM of size 5x5 with a hexagonal toroidal topology.
Training data included.
$virginica
SOM of size 5x5 with a hexagonal toroidal topology.
Training data included.
Then you can save/serialise the R object as usual:
saveRDS(models, file="/dbfs/kohonen_models.rds")
Note that any file stored under the /dbfs/ path will be available through Databricks DBFS, accessible with the CLI or the API.
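For example, one way to pull the saved file down to a local machine is the Databricks CLI (a sketch; it assumes the CLI is installed and configured for your workspace):

```shell
# Copy the serialised model from DBFS to the local working directory
databricks fs cp dbfs:/kohonen_models.rds ./kohonen_models.rds
```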
I want to convert my artificial neural network implementations to the new TensorFlow 2 platform, where Keras is now an integral part (tf.keras). Are there any recommended sources that explain the implementation of ANNs using TensorFlow 2 / tf.keras within R?
Furthermore, why is there a separate keras package from F. Chollet available, when Keras is, as mentioned, now part of TensorFlow?
Sorry for such basic questions, but my own searches were unfortunately unsuccessful.
From original tensorflow documentation I extract the following Python code:
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
added = keras.layers.add([x1, x2])
out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
My own R conversion is:
library(tensorflow)
k <- tf$keras
l <- k$layers
input1 <- k$layers$Input(shape = c(16,?))
x1 <- k$layers$Dense(units = 8, activation = "relu") (input1)
input2 <- k$layers$Input(shape = c(32,?))
x2 <- k$layers$Dense(units = 8, activation = "relu") (input2)
added <- k$layers$add(inputs = c(x1,x2))
I hope my question is not too basic, but I am having trouble translating a Python tuple (or scalar) into its R equivalent. So my question is: how must the shape argument in the input layers be converted to R?
I think the following page should provide the answer to your question: https://blogs.rstudio.com/ai/posts/2019-10-08-tf2-whatchanges/.
In essence, your code should stay the same if you are using Keras version 2.2.4.1 or above. For more details, refer to the linked site above.
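As an illustration of the shape question, here is a minimal sketch using the keras R package (assumed installed): in R, the Python tuple (16,) becomes simply c(16), so no placeholder is needed for the missing second element.

```r
library(keras)

# Python shape=(16,) translates to shape = c(16) in R
input1 <- layer_input(shape = c(16))
x1 <- layer_dense(units = 8, activation = "relu")(input1)

input2 <- layer_input(shape = c(32))
x2 <- layer_dense(units = 8, activation = "relu")(input2)

# Merge the two branches and add the output layer
added <- layer_add(list(x1, x2))
out <- layer_dense(units = 4)(added)
model <- keras_model(inputs = list(input1, input2), outputs = out)
```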
First of all thanks for implementing XGBoost in h2o!
Unfortunately I am unable to predict from an h2o xgboost model that's loaded from disk (which I'm sure you can appreciate is really frustrating).
I am using the latest stable release of h2o (3.10.5.2), and I am using the R client.
I have included an example below that should enable you to reproduce the issue.
Thanks in advance
### Start h2o
require(h2o)
local_h2o = h2o.init()
### Source the base data set
data(mtcars)
h2o_mtcars = as.h2o(x = mtcars, destination_frame = 'h2o_mtcars')
### Fit a model to be saved
mdl_to_save = h2o.xgboost(model_id = 'mdl_to_save', y = 1, x = 2:11, training_frame = h2o_mtcars) ## This class doesn't work
#mdl_to_save = h2o.glm(model_id = 'mdl_to_save', y = 1, x = 2:11, training_frame = h2o_mtcars) ## This class works
### Take some reference predictions
ref_preds = h2o.predict(object = mdl_to_save, newdata = h2o_mtcars)
### Save the model to disk
silent = h2o.saveModel(object = mdl_to_save, path = 'INSERT_PATH', force = TRUE)
### Delete the model to make sure there can't be any strange locking issues
h2o.rm(ids = 'mdl_to_save')
### Load it back up
loaded_mdl = h2o.loadModel(path = 'INSERT_PATH/mdl_to_save')
### Score the model
### The h2o.predict statement below is what causes the error: java.lang.NullPointerException
lod_preds = h2o.predict(object = loaded_mdl, newdata = h2o_mtcars)
all.equal(ref_preds,lod_preds)
At the time I write this (January 2018), this is still a bug for xgboost. See this ticket for more information.
In the meantime, you can download the model as a pojo or mojo file
h2o.download_pojo(model, path = "/media/somewhere/tmp")
Loading the model back isn't that easy, unfortunately, but you can pass new data, provided in JSON format, to the saved POJO model with the function:
h2o.predict_json()
See this question for more details.
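A hypothetical sketch of the call (the POJO path and the JSON fields are placeholders for illustration, not taken from the example above):

```r
# Score a single row, encoded as JSON, against the exported POJO directory
preds <- h2o.predict_json(model = "/media/somewhere/tmp",
                          json = '{"cyl": 6, "disp": 160, "hp": 110}')
```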
Link to data (1170 obs, 9 variables, .Rd file)
Simply read it in using readRDS(file).
I'm trying to set up a GLMM using the glmmPQL function from the MASS package, including a random-effects part and accounting for spatial autocorrelation. However, R (version 3.3.1) crashes upon execution.
library(nlme)
# setup model formula
fo <- hail ~ prec_nov_apr + t_min_nov_apr + srad_nov_apr + age
# setup corSpatial object
correl = corSpatial(value = c(10000, 0.1), form = ~ry + rx, nugget = TRUE,
                    fixed = FALSE, type = "exponential")
correl = Initialize(correl, data = d)
# fit model
fit5 <- glmmPQL(fo, random = ~1 | date, data = d,
                correl = correl, family = binomial)
What I tried so far:
reduce the number of observations
play with the corSpatial parameters (range and nugget)
reduce the number of fixed predictors
execute the code on Windows, Linux (Debian), and Mac R installations
While I get no error message on my local PC (RStudio just crashes), running the script on a server returns the following error message:
R: malloc.c:3540: _int_malloc: Assertion `(fwd->size & 0x4) == 0' failed. Aborted
I'd use the INLA package to model this, as it allows you to use spatially correlated random effects. The required code is a bit too long to place here, so I've placed it in a document at http://rpubs.com/INBOstats/spde.