h2o implementation in R

I am learning the h2o package now.
I installed the h2o package from CRAN and couldn't run this code:
## To import small iris data file from H2O's package
irisPath = system.file("extdata", "iris.csv", package="h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
I am getting the following error:
Error in h2o.importFile(localH2O, path = irisPath, key = "iris.hex") :
unused argument (key = "iris.hex")
My second question: are there good resources for learning h2o in R apart from this one?
http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/Ruser/rtutorial.html
My third question: how does h2o work, in simple words?

The reason this code no longer works is that there was a breaking API change from H2O 2.0 to H2O 3.0 back in 2015. The docs you have discovered (probably via a Google search) are for a very old version of H2O 2.0. The up-to-date docs can always be found at http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html

Answering your error question:
H2O changed a bit from this documentation. Reading the iris file works as follows:
iris.hex = h2o.importFile(path = irisPath, destination_frame = "iris.hex")
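For context, here is a minimal end-to-end sketch of the H2O 3.x version of the snippet from the question. It assumes the h2o package is installed and that a local cluster can be started with default settings; the final `dim()` check is only an illustration and was not part of the original code.

```r
library(h2o)

# Start (or connect to) a local H2O cluster. In H2O 3.x the connection
# object no longer has to be passed to h2o.importFile().
h2o.init()

# Path to the small iris file shipped with the h2o package
irisPath <- system.file("extdata", "iris.csv", package = "h2o")

# The old 'key' argument is called 'destination_frame' in H2O 3.x
iris.hex <- h2o.importFile(path = irisPath, destination_frame = "iris.hex")

# Quick sanity check: rows and columns of the imported frame
print(dim(iris.hex))
```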
Your second and third questions are off-topic under SO rules, but below you will find a short list of resources:
H2O training materials: go to the h2o.ai website and then to the general documentation. You can find all the material presented at H2O World 2015 there. There is also a link to H2O University.
Check their blog. There are some gold nuggets in there.
Read the booklets they have available on GBM, GLM and Deep Learning. They contain examples in R and Python.
Kaggle. Search the scripts / kernels for h2o.
As for your third question, read their "Why H2O" pages.

How H2O works is a little hard to summarize here, but in a nutshell: H2O is an open source, enterprise-ready machine intelligence engine that is accessible from popular machine learning languages (R and Python) as well as from programming languages such as Java and Scala. Enterprise-ready means users can distribute execution across multiple machines when the data is extremely large. The Java-based core engine has built-in implementations of multiple algorithms, and every language interface goes through an interpreter to the H2O core engine, which may be a distributed cluster, to build models and score results. There is a lot in between, so I would suggest visiting the link below to learn more about the H2O architecture and how execution works from the various supported languages:
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/architecture.html
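To make the client/engine split a bit more concrete, here is a rough sketch from the R side. It assumes a local single-node cluster and uses R's built-in `iris` data; the choice of GBM and `ntrees = 10` is purely illustrative. The point is that the R objects are only handles: the data and the model live in the Java engine until you explicitly pull results back.

```r
library(h2o)

# R is only a client: this starts or connects to the Java-based H2O engine
h2o.init()

# Upload an ordinary R data.frame; iris_hf is a handle to a frame
# that lives inside the H2O cluster, not a copy held in R
iris_hf <- as.h2o(iris)

# The model is built inside the H2O engine, not in the R process
predictors <- setdiff(colnames(iris_hf), "Species")
model <- h2o.gbm(x = predictors, y = "Species",
                 training_frame = iris_hf, ntrees = 10)

# Scoring also happens in the cluster and returns another H2O frame
preds <- h2o.predict(model, iris_hf)

# Only now is anything copied back into R memory
preds_df <- as.data.frame(preds)
head(preds_df)
```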

You can find out more about using H2O in R, from installation through to applying the h2o machine learning library, by going through this link.
It also helps if you want to run h2o machine learning on top of the SparkR framework.
If you want to get an idea of a working h2o prototype starting from the very basics, then follow this link. It gives a basic flavor of a working prototype with a code walk-through (a quick learning tutorial).
Apart from the above, it also covers the following key points:
How to convert an H2O data frame to an R or Spark data frame and vice versa
What the pros and cons of Spark MLlib versus the H2O machine learning library are
What the strengths of h2o are compared to other ML libraries
How to apply ML algorithms to R and Spark data frames, etc.

Related

How to use R script inside H2O Flow?

I found this information in the H2O Flow documentation:
H2O Flow supports REST API, R scripts, and CoffeeScript
H2O Flow Documentation
In H2O Flow there are special cells for Scala code, but I didn't find any way to use R code inside the flow.
Sorry, R and Python are not supported inside the H2O Flow Web UI.
Try installing RStudio for a nice R IDE.

Run a R Model using SparkR

Thanks in advance for your input. I am a newbie to ML.
I've developed an R model (using RStudio on my local machine) and want to deploy it on a Hadoop cluster that has RStudio installed. I want to use SparkR to leverage high-performance computing. I just want to understand the role of SparkR here.
Will SparkR enable the R model to run the algorithm within Spark ML on the Hadoop Cluster?
OR
Will SparkR enable only the data processing and still the ML algorithm will run within the context of R on the Hadoop Cluster?
Appreciate your input.
These are general questions, but they actually have a very simple and straightforward answer: no (to both); SparkR will do neither.
From the Overview section of the SparkR docs:
SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.
SparkR cannot even read native R models.
The idea behind using SparkR for ML tasks is that you develop your model specifically in SparkR (and if you try, you'll also discover that it is much more limited in comparison to the plethora of models available in R through the various packages).
Even conveniences like, say, confusionMatrix from the caret package, are not available, since they operate on R dataframes and not on Spark ones (see this question & answer).
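To illustrate what "developing the model specifically in SparkR" looks like, here is a minimal sketch. It assumes a working local Spark installation with the SparkR package available; the `local[*]` master and the choice of a Gaussian GLM on `iris` are only for illustration. Note that the model is fitted with SparkR's own `spark.glm` on a Spark DataFrame, not with a native R modeling function.

```r
library(SparkR)

# Start a local Spark session (the master setting is an assumption for this sketch)
sparkR.session(master = "local[*]")

# Convert the R data.frame to a Spark DataFrame; SparkR replaces the dots
# in column names with underscores (Sepal.Length becomes Sepal_Length)
df <- createDataFrame(iris)

# Fit a GLM with Spark ML through SparkR, not with R's own glm()
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")
summary(model)

# Predictions come back as a Spark DataFrame, not an R data.frame
preds <- predict(model, df)
head(select(preds, "Sepal_Length", "prediction"))

sparkR.session.stop()
```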

Running h2o in R script in Azure machine learning

I know how to access packages in R scripts in Azure machine learning by either using the Azure supported ones or by zipping up the packages.
My problem now is that Azure machine learning does not support the h2o package and when I tried using the zipped file - it gave an error.
Has anyone figured out how to use h2o in R in Azure machine learning?
Since there was no reply to my question, I did some research and came up with the following:
H2O cannot be run in a straightforward manner in Azure machine learning embedded R scripts. A workaround is to use an Azure environment created specifically for H2O. The options available are:
Spinning up an H2O Artificial Intelligence Virtual Machine solution
Using an H2O application for HDInsight
For more reading, you can go to: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/azure.html

Is there a way to use saved model between different versions of H2O?

I have saved a trained model (deep net, but it is more general I think) in H2O. Now I want to load it by another instance of H2O and use it for scoring, but the problem is, that the version of H2O used for training (3.10.0.3) was different than the one I started the production cluster with (3.10.0.6). The error message is quite self-explanatory
ERROR MESSAGE:
Found version 3.10.0.3, but running version 3.10.0.6
Is there a way to migrate the saved model between versions? Or am I stuck with using the same version of H2O for training and scoring?
Yes, you are stuck using the same version for training and scoring. No migration route.
(You can export a model as a POJO, which can be bundled with the version of h2o-genmodel.jar that it needs. But that requires writing Java code to get the data in and results out, which is not ideal if you are using R code for data preparation.)
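For reference, a rough sketch of the POJO export from R. It assumes a running local cluster; the tiny deep learning model on `iris` and the use of `tempdir()` as the output directory are placeholders for your real model and path. With the default settings, `h2o.download_pojo` should write the model's `.java` source and the matching `h2o-genmodel.jar` into the given directory.

```r
library(h2o)
h2o.init()

# Small stand-in model; in practice this is your already trained model
iris_hf <- as.h2o(iris)
predictors <- setdiff(colnames(iris_hf), "Species")
model <- h2o.deeplearning(x = predictors, y = "Species",
                          training_frame = iris_hf,
                          hidden = c(10), epochs = 5)

# Export the model as Java source, together with the h2o-genmodel.jar
# that matches the H2O version used for training
out_dir <- tempdir()
h2o.download_pojo(model, path = out_dir)

# Inspect what was written
list.files(out_dir, pattern = "\\.(java|jar)$")
```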
This has been discussed on the h2o-stream mailing list before, but I couldn't see a feature request ticket for it, so I just created one: https://0xdata.atlassian.net/browse/PUBDEV-3432

R 3.2.3 new H2o Package (2015) cannot achieve the same results from old package

I was using the H2o R package (2014 version) to perform a deep learning task on textual data. I did my research in early 2015 and obtained promising results with the deep learning method (function h2o.deeplearning; e.g. F-score and recall were always above 0.9). I found that my original R code no longer works (due to the change of the H2o package in Nov 2015), so I revised it. However, when I ran the same deep learning model (same settings), I could no longer achieve results that good. Has H2o changed any internal modeling settings since the revision of the package? I would like to reproduce my old results with the new package. Please kindly help.
H2O Deep Learning (2.0 and 3.0) is not reproducible by default -- you can change this by setting reproducible = TRUE, however that will slow things down quite a bit, as reproducibility requires the code to be run on a single core. Therefore the variability could be due to the randomness in the algorithm alone, rather than from the upgrade of H2O 2.0 to 3.0.
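As a small sketch of that setting in H2O 3.x, assuming a local cluster: the `iris` data, the network size and the seed value below are arbitrary placeholders. Training twice with `reproducible = TRUE` and the same seed should give matching metrics, which lets you separate run-to-run randomness from differences caused by the upgrade.

```r
library(h2o)
h2o.init()

iris_hf <- as.h2o(iris)
predictors <- setdiff(colnames(iris_hf), "Species")

# Helper (hypothetical, for this sketch only): train one small deep net
# with reproducibility forced on and a fixed seed
train_once <- function() {
  h2o.deeplearning(x = predictors, y = "Species",
                   training_frame = iris_hf,
                   hidden = c(10, 10), epochs = 10,
                   reproducible = TRUE,  # single-threaded, slower
                   seed = 1234)
}

m1 <- train_once()
m2 <- train_once()

# Compare the training log loss of the two runs
print(h2o.logloss(m1))
print(h2o.logloss(m2))
```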
If you want to use H2O Classic (2.0), then your old code will still work, as is. You might try running that first to see if you can track down the source of the variability. There is nothing wrong with using H2O Classic to finish a project that you started a while ago.
Implementation details for H2O 3.0 Deep Learning are available in the Deep Learning booklet.
There is more information on what has changed in H2O DL between H2O 2.0 and 3.0 here.

Resources