Trouble loading data into H2O from R on Windows

I have problems loading data into H2O from R on Windows. When I run basic commands such as h2o.clusterInfo or as.h2o(localH2O, dat, key = 'dat'), I get the error message Error in .... : unused argument (...), as in the screenshot. I use RTVS on Microsoft R Open 3.2.5.

The reason that code no longer works is that it's syntax from the H2O 2.0 API, which has been retired for about a year or longer. Since H2O 3.0, h2o.clusterInfo() no longer has arguments and as.h2o() no longer has the key argument. Check out the documentation for these functions inside your H2O R package, or here and here.
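A minimal sketch of the H2O 3.x style of the two calls, assuming the h2o 3.x R package is installed and a 64-bit Java is available. The wrapper name `load_into_h2o` and its `frame_id` argument are hypothetical; `destination_frame` is the H2O 3.x `as.h2o()` argument that takes over the role of the old `key`, and no connection object such as `localH2O` is passed any more.

```r
# Hypothetical helper showing H2O 3.x-style calls; nothing runs until it is called.
# Assumes the h2o 3.x package and a 64-bit Java installation.
load_into_h2o <- function(dat, frame_id = "dat") {
  h2o::h2o.init()          # start or connect to a local cluster
  h2o::h2o.clusterInfo()   # takes no arguments in H2O 3.x
  # `destination_frame` replaces the H2O 2.x `key` argument
  h2o::as.h2o(dat, destination_frame = frame_id)
}
```

Usage would then be something like `dat_hex <- load_into_h2o(dat, "dat")`.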

Related

Model training fails with h2o deepwater

While trying to train a LeNet model for multiclass classification using H2O Deep Water with the MXNet backend, I get the following errors:
Loading H2O mxnet bindings.
Found CUDA_HOME or CUDA_PATH environment variable, trying to connect to GPU devices.
Loading CUDA library.
Loading mxnet library.
Loading H2O mxnet bindings.
Done loading H2O mxnet bindings.
Constructing model.
Done constructing model.
Building network.
mxnet data input shape: (32,100)
[10:40:16] /home/jenkins/slave_dir_from_mr-0xb1/workspace/deepwater-master/thirdparty/mxnet/dmlc-core/include/dmlc/logging.h:235: [10:40:16] src/operator/./convolution-inl.h:349: Check failed: (dshape.ndim()) == (4) Input data should be 4D in batch-num_filter-y-x
[10:40:16] src/symbol.cxx:189: Check failed: (MXSymbolInferShape(GetHandle(), keys.size(), keys.data(), arg_ind_ptr.data(), arg_shape_data.data(), &in_shape_size, &in_shape_ndim, &in_shape_data, &out_shape_size, &out_shape_ndim, &out_shape_data, &aux_shape_size, &aux_shape_ndim, &aux_shape_data, &complete)) == (0)
The details of my setup:
* Ubuntu: 16.04
* RAM: 12 GB
* Graphics card: NVIDIA 920MX, driver version 384.90
* CUDA: 8.0.61
* cuDNN: 6.0
* R version: 3.4.3
* H2O version: 3.15.0.393; h2o R package: 3.16.0.2
* MXNet: 0.11.0
* Training data size: 400 MB (about 822 MB once converted to an H2O frame)
Things I have tried:
1.) Gave the Java heap enough memory when starting the H2O cluster (java -Xmx9g -jar h2o.jar)
2.) Built MXNet from source with GPU support
3.) Monitored the GPU and the system with nvidia-smi and System Monitor. At no point is all the RAM used up, so this is not an "out of memory" issue; around 2-3 GB are still free when the error appears.
4.) Tried tensorflow-gpu (built from source). pip list confirms that it is installed, but model creation in R fails with the error:
Error: java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: null
5.) The only way I got H2O Deep Water to work with all the backends, with and without GPU, is the Docker setup provided in the installation tutorials.
I want the same functionality on my laptop without using Docker. Also, is there any way to run Deep Water on the CPU only? The question "Is it possible to build Deep Water/TensorFlow model in H2O without CUDA" doesn't provide any helpful answers. Any help or advice will be greatly appreciated!
As the error log and the documentation of mxnet.sym.Convolution show, your data needs to be in [batch, channels, height, width] format. However, your data appears to have only two dimensions (based on this log line: mxnet data input shape: (32,100)). Reformatting the data to four dimensions, even by adding two dimensions of size 1 so that your input shape is (1,1,32,100), should resolve this issue.
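To illustrate the reshaping step in plain R: a 2-D matrix can be given two leading dimensions of size 1 without changing its values. This is only a sketch of the array manipulation; the random 32 x 100 matrix is a hypothetical stand-in for the data, and how the reshaped data is then handed to Deep Water is not covered here.

```r
# Hypothetical stand-in for the 32 x 100 input reported in the log.
x <- matrix(runif(32 * 100), nrow = 32, ncol = 100)

# R arrays are column-major, so this keeps x[i, j] at x4[1, 1, i, j].
x4 <- array(x, dim = c(1, 1, 32, 100))

dim(x4)
```

Because the two new dimensions have length 1, `x4[1, 1, , ]` drops back to the original matrix.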

Error with H2O in R - can't connect to local host

I can't get H2O to work in R. It shows the following error, and I have no clue what it means. Previously it gave me an error because I didn't have the 64-bit version of Java. I downloaded the 64-bit version, restarted my PC, and started the process again, and now it gives me this error.
Any suggestions?
library(h2o)
----------------------------------------------------------------------
Your next step is to start H2O:
> h2o.init()
For H2O package documentation, ask for help:
> ??h2o
After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai
----------------------------------------------------------------------
Attaching package: ‘h2o’
The following objects are masked from ‘package:stats’:
cor, sd, var
The following objects are masked from ‘package:base’:
%*%, %in%, &&, ||, apply, as.factor, as.numeric, colnames, colnames<-, ifelse,
is.character, is.factor, is.numeric, log, log10, log1p, log2, round, signif, trunc
> h2o.init(nthreads = -1)
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
C:\Users\ADM_MA~1\AppData\Local\Temp\RtmpygK1EJ/h2o_Adm_Mayur_started_from_r.out
C:\Users\ADM_MA~1\AppData\Local\Temp\RtmpygK1EJ/h2o_Adm_Mayur_started_from_r.err
java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)
Starting H2O JVM and connecting: ............................................................
[1] "localhost"
[1] 54321
[1] TRUE
[1] -1
[1] "Failed to connect to localhost port 54321: Connection refused"
[1] 127
Error in h2o.init(nthreads = -1) :
H2O failed to start, stopping execution.
In addition: Warning message:
running command 'curl 'http://localhost:54321'' had status 127
Based on the error message and the troubleshooting we carried out in the comments, it seems that you are using a version of Java (Java 9) that is too new for your version of H2O.
You seem to have two options:
* Verify that your version of H2O is up to date. If not, update it.
* Download a compatible version of Java, i.e. Java 8 (you can use it just for this one task rather than for everything, if you prefer).
Note that the main documentation page of H2O v3 says:
Java 7 or later. Note: Java 9 is not yet released and is not currently supported.
At the same time, they usually have several beta and alpha development branches going, so you might find one of those that works with Java 9.
So, in case anyone else is facing the same issue:
My recommendation (after spending over 10 hours trying to figure this out) is to check your version of Java.
If it's higher than 8, either keep it (and set JAVA_HOME in R to point to a Java 8 installation) or remove it.
I removed it because I didn't want to deal with setting JAVA_HOME in R, and to reduce work.
Make sure you install Java 7 or 8, and the 64-bit version: H2O doesn't work with 32-bit Java.
Then, voilà! Just type install.packages('h2o') in RStudio.
To be extra careful in my final attempt, I unloaded and uninstalled the package (because I had installed it before), installed it again, loaded it with library(h2o), and then h2o.init() worked just fine.
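The Java check can also be done from inside R. A minimal sketch, assuming `java` is on the PATH: the helpers `java_major_version()` and `check_java()` are hypothetical. The first parses the `version "..."` string that `java -version` prints (as in the log in the question) and treats old-style `1.8.x` numbering as 8; the second captures the command output, which Java writes to stderr.

```r
# Hypothetical helper: extract the major Java version from `java -version` output.
java_major_version <- function(lines) {
  line <- grep('version "', lines, value = TRUE)[1]
  if (is.na(line)) stop("no version string found")
  ver <- sub('.*version "([^"]+)".*', "\\1", line)  # e.g. "9" or "1.8.0_144"
  ver <- sub("[-_+].*$", "", ver)                    # drop update/build suffixes
  parts <- as.integer(strsplit(ver, ".", fixed = TRUE)[[1]])
  if (parts[1] == 1L) parts[2] else parts[1]         # "1.8" style means Java 8
}

# `java -version` writes to stderr, so capture both streams.
check_java <- function() {
  out <- system2("java", "-version", stdout = TRUE, stderr = TRUE)
  java_major_version(out)
}
```

If `check_java()` returns something above 8, the alternative mentioned in the answer is to point `JAVA_HOME` at a Java 8 installation (for example with `Sys.setenv()`) before calling `h2o.init()`; the directory is installation-specific.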

Create Graphs and Plots Using R in Microsoft R Server

I am working through the 'Create Graphs and Plots Using R (Data Science End-to-End Walkthrough)' procedure.
See https://msdn.microsoft.com/en-us/library/mt629162.aspx
I have an issue with the step 'Create a Map Plot'. When executing:
myplots <- rxExec(mapPlot, inDataSource, googMap, timesToRun = 1)
plot(myplots[[1]][["myplot"]])
I get this error:
Warning: namespace 'CompatibilityAPI' is not available and has been replaced
by .GlobalEnv when processing object 'inputObject'
====== DESKTOP-PHAA5KQ ( process 1 ) has started run
at 2017-01-24 11:39:07.56 ======
Warning: namespace 'CompatibilityAPI' is not available and has been replaced
by .GlobalEnv when processing object 'inputObject'
Loading required package: ggplot2
Loading required package: maps
# ATTENTION: maps v3.0 has an updated 'world' map. #
# Many country borders and names have changed since 1990. #
# Type '?world' or 'news(package="maps")'. See README_v3. #
Error in slot(from, what) :
no slot of name "maxColWidth" for this object of class "RxSqlServerData"
Calls: source ... anyStrings -> validityMethod -> as -> asMethod -> slot
Execution halted
Error in rxCompleteClusterJob(hpcServerJob, consoleOutput, autoCleanup) :
No results available - final job state: failed
> plot(myplots[[1]][["myplot"]])
Error in plot(myplots[[1]][["myplot"]]) : object 'myplots' not found
Thanks in advance for any suggestion.
I think this is a version issue.
This question gave me some context. Your IDE or client may not be using the same version of R as R-Services.
Check the version of R in your IDE and R-Services using:
R.Version()
For R-Services, navigate to C:\Program Files\Microsoft SQL Server\MSSQL13.YOUR_SERVER_NAME\R_SERVICES\bin then run R.exe as admin.
You might see that the versions are different. In my case I was running 3.3.2 in RStudio but have 3.2.2 in R-Services.
For RStudio, here's how to use different versions of R. Starting RStudio with the Control key held down allowed me to select the R-Services instance and run the code successfully.
You can also change your default library path so that whenever you open your IDE you're working with the server's version of R.
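A small sketch of what to compare, assuming you run the same lines once in your IDE and once in R.exe from the R_SERVICES\bin folder; the object name `r_info` is arbitrary. Differences in any of the three entries point to the version or library mismatch described above.

```r
# Collect the details that should match between the IDE and R Services.
r_info <- list(
  version = R.version.string,  # the R version of this session
  home    = R.home(),          # which R installation is running
  libs    = .libPaths()        # where packages are loaded from
)
str(r_info)
```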
In my case, R Services and RStudio had the same version, and the error message said that ggplot2 is required by ggmap. My solution was to uninstall both ggplot2 and ggmap, close RStudio, and reopen it with administrative rights.

Running R code on linux in parallel on computing cluster

I've recently moved my Windows R code to a Linux installation to run DEoptim on a function. On my Windows system it all worked fine using:
ans <- DEoptim1(Calibrate, lower, upper,
                DEoptim.control(trace = TRUE, parallelType = 1, parVar = parVarnames3,
                                packages = c("hydromad", "maptools", "compiler", "tcltk", "raster")))
where the function 'Calibrate' consists of multiple functions. On the Windows system I simply installed the required packages into the R library. The option parallelType=1 ran the code across several cores.
However, now I want to run this code on a Linux-based computing cluster. The function 'Calibrate' works fine standalone, as does DEoptim when I run the code on one core. However, when I specify parallelType=1, the code fails and returns:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
7 nodes produced errors; first error: there is no package called ‘raster’
This error occurs for whichever package I try to load, even though the
library(raster)
command works fine and 'raster' is clearly listed when I show all installed packages using:
library()
So my gut feeling is that, even though all the packages load fine, the problem is that I have used a personal library and the packages element of DEoptim.control is looking somewhere else. Here is an example of how the packages were installed:
install.packages("/home/antony/R/Pkges/raster_2.4-15.tar.gz", repos = NULL, type = "source", lib = "/home/antony/R/library")
I also set the library paths as follows:
.libPaths('/home/antony/R/library')
Does anybody have any idea what I am doing wrong, and how to set the 'packages' option in DEoptim.control so I can run DEoptim across multiple cores in parallel?
Many thanks, Antony
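The personal-library hypothesis can be checked directly. A minimal sketch, assuming DEoptim's parallelType=1 starts its workers as fresh local R processes through the parallel package (the checkForRemoteErrors message comes from that package): a `.libPaths()` call in the main session is not passed on to such workers, whereas the `R_LIBS` environment variable is read by each worker at startup. The temporary directory and the helper `worker_sees_lib()` are hypothetical stand-ins for /home/antony/R/library and a real diagnostic.

```r
library(parallel)

# Hypothetical stand-in for the personal library in the question
# (/home/antony/R/library); a temporary directory keeps the sketch runnable anywhere.
personal_lib <- file.path(tempdir(), "personal-lib")
dir.create(personal_lib, showWarnings = FALSE)
.libPaths(c(personal_lib, .libPaths()))  # affects this (main) session only

# Start a fresh one-worker cluster and check whether it searches personal_lib.
worker_sees_lib <- function() {
  cl <- makeCluster(1)
  on.exit(stopCluster(cl))
  # clusterEvalQ evaluates .libPaths() in the worker's own session
  worker_paths <- clusterEvalQ(cl, .libPaths())[[1]]
  normalizePath(personal_lib, winslash = "/") %in%
    normalizePath(worker_paths, winslash = "/")
}

before <- worker_sees_lib()  # FALSE: the .libPaths() call is not passed on to workers

# Workers read R_LIBS at startup and inherit this process's environment
# variables, so set it before any cluster is created.
Sys.setenv(R_LIBS = personal_lib)
after <- worker_sees_lib()   # TRUE: the worker now searches personal_lib
```

In the question's setup this would mean calling `Sys.setenv(R_LIBS = "/home/antony/R/library")` before `DEoptim1(...)`, or adding a line `R_LIBS_USER=/home/antony/R/library` to `~/.Renviron` so that every R process started under your account, workers included, sees the library. Note that `Sys.setenv()` as written replaces any existing `R_LIBS` value.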

Java runtime segfaults when saving large data.frame from JRI

I've followed the rtest.java example code from the rJava installation (/usr/lib/R/site-library/rJava/jri/examples/rtest.java on Debian and derivatives) for building data.frames from Java arrays.
This works well for small data frames (~10,000 rows); however, when I try it for real (i.e. > 1,000,000 rows), the Java runtime segfaults.
Oddly, I appear to be able to create the data.frame OK (making the usual rniPutXXXArray calls); the crash occurs when I come to save the data.frame (using an eval, after assigning the data.frame to an R symbol).
I can see some debug output when I call eval on the R engine, but when I go through the low-level interface (rniXXX) I get no debug output at all. Is there a way to switch on more debugging than I already have?
For what it's worth, here's the top of the segv message. I can of course provide more detail on request.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f1be6259ea5, pid=6898, tid=139758087001856
#
# JRE version: 7.0_03-b21
# Java VM: OpenJDK 64-Bit Server VM (22.0-b10 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea7 2.1.3
# Distribution: Debian GNU/Linux unstable (sid), package 7u3-2.1.3-1
# Problematic frame:
# C [libR.so+0x117ea5] SET_VECTOR_ELT+0x11f5
...
Please ask on stats-rosuda-devel, including the actual code you're using. Note that with RNI calls you're responsible for protecting the objects. Unfortunately the example code skips that aspect, so what probably happens is that, given the size of your objects, garbage collection runs before you are done with the construction; some of the objects get collected and are thus invalid, and R crashes on you. If you want to be safe, protect the columns and then the generic vector you create from them.
BTW: It is much safer to use the org.rosuda.REngine API instead of using RNI directly. It even provides REXP.createDataFrame() method that does all the work for you.
