How to use R script inside H2O Flow?

I found this information in the H2O Flow documentation:
H2O Flow supports REST API, R scripts, and CoffeeScript
H2O Flow Documentation
In H2O Flow there are special cells for Scala code, but I couldn't find any way to use R code inside the flow.

Sorry, R and Python are not supported inside the H2O Flow Web UI.
Try installing RStudio for a nice R IDE.

Related

Spawn subprocess in R

I'm trying to spawn a sub-process in R using the subprocess library, as presented in this tutorial. The problem is that the program I'm trying to launch requires an additional command after the executable.
Example:
I would launch the command from the shell like this:
monetdb create mydb
where 'create' is the additional command and 'mydb' a parameter.
I tried giving 'create mydb' as parameters in R like this:
handle <- spawn_process('/usr/local/bin/monetdb', c('create mydb'))
However from the output I got with
process_read(handle, PIPE_STDOUT, timeout = 3000)
I conclude that the parameters don't work, as I'm getting the info message from monetdb on how to call it, just as if I had called only 'monetdb' without the create command from the shell:
Usage: monetdb [options] command [command-options-and-arguments]
The second thing I tried was to include the create command in the path, but this leads to a "No such file or directory" error.
Any hints are appreciated.
The monetdb binary is the control program for the MonetDB server daemon (monetdbd) and has little to do with the (now old) version of MonetDBLite used in R. The latter has been removed from CRAN, and a newer version of MonetDBLite is expected to arrive early next year.
Without knowing anything about the package you’re using, and going purely by the documentation, I think you need to separate the command line arguments you pass to the functions:
handle <- spawn_process('/usr/local/bin/monetdb', c('create', 'mydb'))
This also follows the “conventional” API of spawn/fork/exec functions.
In addition, using c(…) is (almost) only necessary when creating a vector of multiple elements. In your code (and in the tutorial) it’s unnecessary around a single character string.
Furthermore, contrary to what the tutorial claims, this functionality is actually already built into R via the system2 and pipe functions (although I don’t doubt that the subprocess package is more feature-complete, and likely easier to use).
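For comparison, here is a minimal sketch of the same call using base R's system2; argument splitting works the same way, with one vector element per argument:

# Base R equivalent: run monetdb with separate arguments,
# capturing standard output and standard error as character vectors.
out <- system2("/usr/local/bin/monetdb", args = c("create", "mydb"),
               stdout = TRUE, stderr = TRUE)
cat(out, sep = "\n")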
But if your ultimate goal is to use MonetDB in R then you’re probably better advised following the other answer, and using dedicated MonetDB R bindings rather than interacting with the daemon binary via subprocess communication.

Keras in R and tensorflow backend

I need to limit the number of threads running my neural network using these instructions here: https://github.com/keras-team/keras/issues/4740.
However, I am using keras in R, and I am not sure how to access the tensorflow implementation used by the keras I load in R with
library("keras")
I can call library(tensorflow); however, isn't that loading a copy of the library unrelated to the one loaded by keras? I cannot find any functionality in R that allows me to load the tensorflow backend associated with keras in RStudio, nor any links to anyone doing the same.
Can someone suggest a way to perform the operations in the link from R, given keras loaded with library("keras")? (In the link, the tensorflow backend for keras is used to set the number of threads per core.) It would also be good to know how to check which TensorFlow version keras loads into R.
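For reference, a minimal sketch of what the linked issue's approach might look like from R, assuming a TensorFlow 1.x backend as in the issue; the tensorflow package exposes the same Python TensorFlow that keras uses (both go through reticulate), so configuring it affects keras too:

library(keras)
library(tensorflow)  # same TensorFlow instance keras uses, via reticulate

# TF 1.x style session config, mirroring the linked GitHub issue
config <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                         inter_op_parallelism_threads = 1L)
k_set_session(tf$Session(config = config))  # make keras use this session

tf_version()  # shows which TensorFlow version keras is running on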

From R to scala: Importing Libraries

I come from R and am trying out Scala to explore its possibilities for data science. I don't have any background in programming or computer science; my background is pretty much statistical. So far I am only using Scala from the REPL, which I like because it reminds me of the R console.
I am encountering problems when I am trying to import new libraries. In R, within the R console, I would just type
library(tidyverse)
In Scala I am trying to do something similar, but it doesn't really work. Here is what I see:
Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172).
Type in expressions for evaluation. Or try :help.
scala> import org.apache.spark.mllib.linalg.vectors
<console>:11: error: object apache is not a member of package org
import org.apache.spark.mllib.linalg.vectors
^
What am I doing wrong?
Thanks
Apache Spark is not a simple package that you can import from the standard Scala library, but rather somewhat of an ecosystem of its own, consisting of JARs with Java/Scala APIs, cluster managers, distributed file systems, various launcher scripts, and interactive shells (for Scala, but e.g. also for Python).
It's not a single interactive script that you run on your computer. It's rather a complex conglomerate of cooperating programs running on a cluster.
You have several options:
Use SBT: declare Spark as a dependency in build.sbt and run your code in standalone mode, either from the SBT console or as a properly built project with run (see the build.sbt sketch below).
Essentially the same as 1., but using Ammonite with $ivy imports to manage dependencies.
Just go to the Spark website and follow installation instructions there. Among many other things, it should sooner or later give you a script that starts an interactive Scala REPL with all the dependencies that are needed to run Spark jobs.
I'd suggest going right to option 3. and downloading Spark from here.
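That said, should you later want option 1, here is a minimal build.sbt sketch; the version numbers are illustrative, and the Scala version must match the one your Spark build targets (2.11 for Spark 2.3.x):

// build.sbt: pull in Spark MLlib; %% appends the Scala binary version
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.1"

With that in place, import org.apache.spark.mllib.linalg.Vectors (note the capital V) resolves in the SBT console.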

Run an R Model using SparkR

Thanks in advance for your input. I am a newbie to ML.
I've developed an R model (using RStudio on my local machine) and want to deploy it on a Hadoop cluster that has RStudio installed. I want to use SparkR to leverage high-performance computing. I just want to understand the role of SparkR here.
Will SparkR enable the R model to run the algorithm within Spark ML on the Hadoop Cluster?
OR
Will SparkR enable only the data processing, while the ML algorithm still runs within the context of R on the Hadoop cluster?
Appreciate your input.
These are general questions, but they actually have a very simple & straightforward answer: no (to both); SparkR will do neither.
From the Overview section of the SparkR docs:
SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.
SparkR cannot even read native R models.
The idea behind using SparkR for ML tasks is that you develop your model specifically in SparkR (and if you try, you'll also discover that it is much more limited in comparison to the plethora of models available in R through the various packages).
Even conveniences like, say, confusionMatrix from the caret package, are not available, since they operate on R dataframes and not on Spark ones (see this question & answer).
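For illustration, a minimal sketch of what developing a model directly in SparkR looks like, using spark.glm (one of the handful of algorithms SparkR ships); it assumes a Spark 2.x installation with SPARK_HOME set:

library(SparkR)
sparkR.session()  # connect to Spark; assumes SPARK_HOME points at an installation

df <- as.DataFrame(iris)  # SparkR replaces the dots in iris's column names with underscores
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")
summary(model)  # coefficients and fit statistics, computed by Spark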

h2o implementation in R

I am learning the h2o package now.
I installed the h2o package from CRAN and couldn't run this code:
## To import small iris data file from H2O's package
irisPath = system.file("extdata", "iris.csv", package="h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
I am getting the below error,
Error in h2o.importFile(localH2O, path = irisPath, key = "iris.hex") :
unused argument (key = "iris.hex")
My second question is: are there good resources to learn h2o in R apart from this:
http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/Ruser/rtutorial.html
My third question: how does h2o work, in simple words?
The reason this code no longer works is that there was a breaking API change from H2O 2.0 to H2O 3.0 back in 2015. The docs you have discovered (probably via a Google search) are for a very old version of H2O 2.0. The up-to-date docs can always be found at http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html
Answering your error question:
The H2O API has changed a bit since that documentation. Reading the iris file now works as follows:
iris.hex = h2o.importFile(path = irisPath, destination_frame = "iris.hex")
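For completeness, the same thing as a runnable snippet against the 3.x API; h2o.init() starts (or connects to) a local H2O instance, and no connection object is passed around any more:

library(h2o)
h2o.init()  # start or attach to a local H2O instance

irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.importFile(path = irisPath, destination_frame = "iris.hex")
summary(iris.hex)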
Your second (and third) question is against SO rules. But below you will find a short list of resources:
H2O training materials: go to the h2o.ai website and then to the general documentation. You can find all the material presented at H2O World 2015 there. There is also a link to H2O University.
Check their blog. There are some gold nuggets in there.
Read the booklets they have available on GBM, GLM, Deep Learning. They contain examples in R and Python.
Kaggle. Search the scripts / kernels for h2o.
As for your third question, read their "Why H2O" pages.
To answer your question about how H2O works: it is a little hard to summarize here, but in a nutshell, H2O is an open-source, enterprise-ready machine intelligence engine accessible from popular machine learning languages (R, Python) as well as programming languages such as Java and Scala. Enterprise-ready means users can distribute execution across multiple machines when the data is extremely large. The Java-based core engine has built-in implementations of multiple algorithms, and every language interface goes through an interpreter to the H2O core engine, which may be a distributed cluster, to build models and score results. There is a lot in between, so I would suggest visiting the link below to learn more about H2O's architecture and execution from the various supported languages:
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/architecture.html
You can dig deeper into the H2O implementation in R, from installation through to using the h2o machine learning library, by going through this link.
It also helps you implement h2o machine learning on top of the SparkR framework.
If you want to get an idea of a working h2o prototype from the very basics, follow this link. It provides the basic flavor of a working prototype with a code walk-through (a quick learning tutorial).
Apart from the above points, it also covers the following:
How to convert an H2O data frame to an R or Spark data frame, and vice versa
The pros and cons of Spark MLlib versus the H2O machine learning library
The strengths of h2o compared to other ML libraries
How to apply ML algorithms to R and Spark data frames, etc.
