H2o with predict_json: "Error: Could not find or load main class water.util.H2OPredictor"? - r

I tried to use H2O's h2o.predict_json in R,
h2o.predict_json(modelpath, jsondata)
and got the error message:
Error: Could not find or load main class water.util.H2OPredictor
I am using h2o_3.20.0.8.
I searched the H2O documentation, but it didn't help.
> h2o.predict_json(modelpath, jsondata)
$error
[1] "Error: Could not find or load main class water.util.H2OPredictor"
Warning message:
In system2(java, args, stdout = TRUE, stderr = TRUE) :
running command ''java' -Xmx4g -cp .:/Library/Frameworks/R.framework/Versions/3.5/Resources/library/mylib/Models/h2o-genmodel.jar:/Library/Frameworks/R.framework/Versions/3.5/Resources/library/mylib/Models:genmodel.jar:/ water.util.H2OPredictor /Library/Frameworks/R.framework/Versions/3.5/Resources/library/mylib/Models/mymodel.zip '[{"da1":252,"da2":22,"da3":62,"da4":63,"da5":84.83}]' 2>&1' had status 1

It looks like you are missing your h2o-genmodel.jar file - this is what the error message Could not find or load main class water.util.H2OPredictor indicates. You may want to provide all the arguments explicitly to check that you have everything:
h2o.predict_json(model, json, genmodelpath, labels, classpath, javaoptions)
documentation here
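For instance, supplying the genmodel jar explicitly as the third argument (a sketch only, reusing the paths from your error output; adjust them to your setup) could look like this:
modelpath    <- "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/mylib/Models/mymodel.zip"
genmodelpath <- "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/mylib/Models/h2o-genmodel.jar"
jsondata     <- '[{"da1":252,"da2":22,"da3":62,"da4":63,"da5":84.83}]'
h2o.predict_json(modelpath, jsondata, genmodelpath)  # third argument is the genmodel path, per the signature above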

Related

Error while using write_xes function : Error in defaultvalues[[datatype]] : invalid subscript type 'list'

I want to export an eventlog object (built in R with the bupaR package's eventlog function) as an XES file. For that I am using the write_xes() function of the xesreadR package. But the function produces the error:
Error in defaultvalues[[datatype]] : invalid subscript type 'list'
> class(log)
[1] "eventlog" "tbl_df" "tbl" "data.frame"
write_xes(log,"myxes.xes")
According to the documentation it should save the log to the specified file. But instead it produces the error:
ERROR : Error in defaultvalues[[datatype]] : invalid subscript type
'list'
I have tried multiple things to troubleshoot this problem but haven't come up with a solution. Can somebody help me solve this error? Thank you!
The function is defined as follows:
write_xes ( eventlog, case_attributes = NULL, file = file.choose())
Thus, writing
write_xes(log,"myxes.xes")
means
write_xes(eventlog = log, case_attributes = "myxes.xes").
Instead, you should write
write_xes(eventlog = log, file = "myxes.xes")

Error on the stan file compilation using R 3.6.0. and Win 10

Error in compileCode(f, code, language = language, verbose = verbose) :
Compilation ERROR, function(s)/method(s) not created! Error in .shlib_internal(commandArgs(TRUE)) :
C++14 standard requested but CXX14 is not defined
Calls: <Anonymous> -> .shlib_internal
Execution halted
In addition: Warning message:
In system(cmd, intern = !verbose) :
running command 'C:/PROGRA~1/R/R-36~1.0/bin/x64/R CMD SHLIB file1a1860a0379.cpp 2> file1a1860a0379.cpp.err.txt' had status 1
Error in sink(type = "output") : invalid connection
A page in another language suggested that this can be overcome by executing the following R script, but it did not work in my case:
dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR))
dir.create(dotR)
M <- file.path(dotR, "Makevars")
if (!file.exists(M))
file.create(M)
cat("\nCXX14FLAGS=-O3 -Wno-unused-variable -Wno-unused-function",
"CXX14 = g++ -std=c++1y",
file = M, sep = "\n", append = TRUE)
The above R script is the same as the one on the following page:
https://github.com/stan-dev/rstan/issues/569
I tried to uninstall and reinstall according to the following page, but the above error still occurred.
RStan installation: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
Ref: https://github.com/stan-dev/stan/issues/1613
Ref: https://github.com/stan-dev/rstan/issues/633
For me, the issue was solved by manually adding the following line to the file .R/Makevars.win:
CXX14 = "C:\Rtools\mingw_64\bin\g++.exe"

NoSuchMethodException when calling sendKeys on object of class org.openqa.selenium.remote.RemoteWebElement via R package rJava

I am trying to use the Selenium WebDriver API directly from R using rJava. I am subject to a fairly restrictive IT environment, so I can't access a remote driver currently (hence why I'm not currently using the RSelenium package), and I don't have either Chrome or Firefox available--just phantomjs. I am able to get this working okay from the Scala REPL. I used sbt to get all the dependencies--build.sbt contains, for example:
retrieveManaged := true
libraryDependencies ++= Seq (
"org.seleniumhq.selenium" % "selenium-java" % "3.9.1",
"com.codeborne" % "phantomjsdriver" % "1.4.4"
)
(Note that I have phantomjs installed as /usr/local/bin/phantomjs, and it is
version 2.1.1).
I then copied all the jar files to a single-level folder via cp jars/*/*/*.jar alljars/, so that alljars/ contains the following:
animal-sniffer-annotations-1.14.jar httpcore-4.4.6.jar selenium-api-3.9.1.jar
byte-buddy-1.7.9.jar j2objc-annotations-1.1.jar selenium-chrome-driver-3.9.1.jar
checker-compat-qual-2.0.0.jar jline-2.14.5.jar selenium-edge-driver-3.9.1.jar
commons-codec-1.10.jar jsr305-1.3.9.jar selenium-firefox-driver-3.9.1.jar
commons-exec-1.3.jar okhttp-3.9.1.jar selenium-ie-driver-3.9.1.jar
commons-logging-1.2.jar okio-1.13.0.jar selenium-java-3.9.1.jar
error_prone_annotations-2.1.3.jar phantomjsdriver-1.4.4.jar selenium-opera-driver-3.9.1.jar
gson-2.8.2.jar scala-compiler-2.12.4.jar selenium-remote-driver-3.9.1.jar
guava-23.6-jre.jar scala-library-2.12.4.jar selenium-safari-driver-3.9.1.jar
httpclient-4.5.3.jar scala-reflect-2.12.4.jar selenium-support-3.9.1.jar
I start Scala via scala -cp "alljars/*" and can then do the following:
val drv = new org.openqa.selenium.phantomjs.PhantomJSDriver
drv.get("https://www.google.com")
val q = drv.findElementByName("q")
q.sendKeys("rJava selenium")
q.submit
drv.getTitle
I think the following is roughly the same thing in R using rJava:
library(rJava)
.jinit()
jars <- dir("alljars", pattern = "*.jar", full.names = TRUE)
.jaddClassPath(jars)
drv <- .jnew('org/openqa/selenium/phantomjs/PhantomJSDriver')
drv$get("https://www.google.com")
q <- drv$findElementByName("q")
q$sendKeys("rJava selenium")
q$submit()
drv$getTitle()
This fails at the point q$sendKeys("rJava selenium") with the following error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.NoSuchMethodException: No suitable method for the given parameters
In RStudio, if I type q$ and press TAB, sendKeys is definitely in the list of available methods. I tried to be explicit about this, and tried:
keys <- .jnew("java/lang/String", "rJava selenium")
keys <- .jcast(keys, "java/lang/CharSequence", check = TRUE)
q <- .jcast(q, "org/openqa/selenium/WebElement", check = TRUE)
.jcall(q, "V", "sendKeys", keys)
which resulted in the following error:
Error in .jcall(q, "V", "sendKeys", keys) :
method sendKeys with signature (Ljava/lang/CharSequence;)V not found
q has class org/openqa/selenium/remote/RemoteWebElement in R, and org/openqa/selenium/WebElement in Scala; but in both cases the return is void and the required argument is CharSequence according to the javadocs. I tried a few variations of this--java.lang.String instead of CharSequence, RemoteWebElement instead of WebElement, etc., but no joy.
I doubt this is a problem with rJava, but I'm stumped nonetheless and need help!
Oh good grief. I didn't know about .jmethods. Running this:
> .jmethods(q, "sendKeys")
[1] "public void org.openqa.selenium.remote.RemoteWebElement.sendKeys(java.lang.CharSequence[])"
So, basically, my problem was that I was passing String instead of String[]. That is, instead of:
q$sendKeys("rJava selenium")
I can use:
q$sendKeys(.jarray("rJava selenium"))
The more you know...
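For what it's worth, sendKeys() is a varargs method (CharSequence...), which rJava sees as an array parameter; .jarray() simply builds that Java array from an R character vector, so (assuming the same q element) the pieces can also be sent in sequence:
q$sendKeys(.jarray(c("rJava ", "selenium")))  # each element of the String[] is sent in order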

Can't create dplyr src backed by SparkSQL in dplyr.spark.hive package

Recently I found out about the great dplyr.spark.hive package that enables dplyr frontend operations with a Spark or Hive backend.
There is information on how to install this package in the package's README:
options(repos = c("http://r.piccolboni.info", unlist(options("repos"))))
install.packages("dplyr.spark.hive")
and there are also many examples of how to work with dplyr.spark.hive when one is already connected to hiveServer - check this.
But I am not able to connect to hiveServer, so I cannot benefit from the great power of this package...
I've tried the following commands, but they did not work out. Does anyone have a solution or a comment on what I am doing wrong?
> library(dplyr.spark.hive,
+ lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
>
> Sys.setenv(SPARK_HOME = "/opt/spark-1.5.0-bin-hadoop2.4")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
>
> my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
>
> my_db = src_SparkSQL(host = 'jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl',
+ port = 10000)
Error in .jfindClass(as.character(driverClass)[1]) : class not found
>
> my_db = src_SparkSQL(start.server = TRUE)
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
>
> my_db = src_SparkSQL(start.server = TRUE,
+ list(spark.num.executors='5', spark.executor.cores='5', master="yarn-client"))
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
EDIT 2
I have set more system environment variables, like this, but now I receive a warning telling me that some kind of Java logging configuration is not specified, though I think it is:
> library(dplyr.spark.hive,
+ lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
3: package ‘SparkR’ was built under R version 3.2.1
>
> Sys.setenv(SPARK_HOME = "/opt/spark-1.5.0-bin-hadoop2.4")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HADOOP_HOME="/usr/share/hadoop")
> Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop")
> Sys.setenv(PATH='/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/share/hadoop/bin:/opt/hive/bin')
>
>
> my_db = src_SparkSQL()
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
My log properties are not empty.
-bash-4.2$ wc /etc/hadoop/log4j.properties
179 432 6581 /etc/hadoop/log4j.properties
EDIT 3
My exact call to src_SparkSQL() is
> detach("package:SparkR", unload=TRUE)
Warning message:
package ‘SparkR’ was built under R version 3.2.1
> detach("package:dplyr", unload=TRUE)
> library(dplyr.spark.hive, lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> my_db = src_SparkSQL()
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
And then the process never finishes.
Whereas those settings do work for beeline with these parameters:
beeline -u "jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl" -n mkosinski --outputformat=tsv --incremental=true -f sql_statement.sql > sql_output
but I am not able to pass the user name and dbname to src_SparkSQL(),
so I have tried to manually use the code from inside that function, but I run into the same problem: the code below also never finishes.
host = 'tools-1.hadoop.srv'
port = 10000
driverclass = "org.apache.hive.jdbc.HiveDriver"
Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
library(RJDBC)
dr = JDBC(driverclass, Sys.getenv("HADOOP_JAR"))
url = paste0("jdbc:hive2://", host, ":", port)
class = "Hive"
con.class = paste0(class, "Connection") # class = "Hive"
# dbConnect_retry =
# function(dr, url, retry){
# if(retry > 0)
# tryCatch(
# dbConnect(drv = dr, url = url),
# error =
# function(e) {
# Sys.sleep(0.1)
# dbConnect_retry(dr = dr, url = url, retry - 1)})
# else dbConnect(drv = dr, url = url)}
#################
##con = new(con.class, dbConnect_retry(dr, url, retry = 100))
#################
con = new(con.class, dbConnect(dr, url, user = "mkosinski", dbname = "loghost"))
Maybe the URL should also contain /loghost, the dbname?
I now see that you tried multiple things with multiple errors. Let me comment error by error.
my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
The RJDBC object could not be created. Unless we solve this, nothing else will work, workarounds or not. Have you set HADOOP_JAR? For instance:
Sys.setenv(HADOOP_JAR = "../spark/assembly/target/scala-2.10/spark-assembly-1.5.0-hadoop2.6.0.jar")
Sorry, I seem to have skipped this in the instructions. Will fix.
my_db = src_SparkSQL(host = 'jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl',
+ port = 10000)
Error in .jfindClass(as.character(driverClass)[1]) : class not found
Same problem. Please note that the host and port arguments do not accept URL syntax, just a host name and a port number; the URL is formed internally.
my_db = src_SparkSQL(start.server = TRUE)
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
Stop thriftserver first or connect to existing one, but you still have to fix the class not found problem.
my_db = src_SparkSQL(start.server = TRUE,
+ list(spark.num.executors='5', spark.executor.cores='5', master="yarn-client"))
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
Same as above.
Plan:
1. Set HADOOP_JAR. Find the host and port of the running thriftserver, if not the defaults. Try src_SparkSQL with start.server = FALSE (see the sketch below). If happy, quit; else go to step 2.
2. Stop the existing thriftserver. Try src_SparkSQL again with start.server = TRUE.
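A minimal sketch of step 1, reusing the assembly jar path and host/port from your EDIT 2 (adjust these to your cluster):
Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = "tools-1.hadoop.srv")
Sys.setenv(HIVE_SERVER2_THRIFT_PORT = "10000")
my_db <- src_SparkSQL(start.server = FALSE)  # connect to the thriftserver that is already running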
Let me know how things go.
The problem was that I didn't specify the proper classPath needed inside the JDBC() function that creates the driver. Parameters for classPath in the dplyr.spark.hive package are passed via the HADOOP_JAR environment variable.
To use JDBC as a driver to hiveServer2 (through the Thrift protocol) one needs to add at least these 3 .jars with Java classes to create a proper driver:
hive-jdbc-1.0.0-standalone.jar
hadoop/common/lib/commons-configuration-1.6.jar
hadoop/common/hadoop-common-2.4.1.jar
The versions are arbitrary and should be compatible with the installed versions of local Hive, Hadoop and hiveServer2.
They need to be concatenated with .Platform$path.sep (as described here):
classPath = c("system_path1_to_hive/hive/lib/hive-jdbc-1.0.0-standalone.jar",
"system_path1_to_hadoop/hadoop/common/lib/commons-configuration-1.6.jar",
"system_path1_to_hadoop/hadoop/common/hadoop-common-2.4.1.jar")
Sys.setenv(HADOOP_JAR = paste0(classPath, collapse = .Platform$path.sep))
Then, when HADOOP_JAR is set, one has to be careful with the hiveServer2 URL. In my case it had to be
host = 'tools-1.hadoop.srv'
port = 10000
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
and finally the proper connection to hiveServer2 using the RJDBC package is:
Sys.setenv(HADOOP_HOME="/usr/share/hadoop/share/hadoop/common/")
Sys.setenv(HIVE_HOME = '/opt/hive/lib/')
host = 'tools-1.hadoop.srv'
port = 10000
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
driverclass = "org.apache.hive.jdbc.HiveDriver"
library(RJDBC)
.jinit()
dr2 = JDBC(driverclass,
classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
#"/opt/hive/lib/commons-configuration-1.6.jar",
"/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
"/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar"),
identifier.quote = "`")
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
dbConnect(dr2, url, username = "mkosinski") -> cont
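As a quick sanity check (my own addition, not something from the thread), the resulting connection implements the DBI interface, so it can be queried directly:
dbGetQuery(cont, "show tables")  # should list the tables visible in the loghost database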

Building R package with Reference objects fails

I recently rewrote a package to use the new(er) R reference class objects. I've exported the three classes using export() in the NAMESPACE file, so as far as I'm aware that should work. However, when I test build the package I get an error at the lazy loading stage:
** preparing package for lazy loading
Error in file(con, "rb") : invalid 'description' argument
ERROR: lazy loading failed for package ‘PACKAGE_NAME_HERE’
* removing ‘/Library/Frameworks/R.framework/Versions/3.0/Resources/library/PACKAGE_NAME_HERE’
I'm not sure what the problem is here. I don't know if it's relevant, but the reference classes do store data in files in the tmp directory by having some fields set as accessor functions - I don't know if that's what's being complained about here when it says (con, "rb"), which I guess is some connection thing. Does anybody have any ideas or advice for making sure reference classes get exported properly? My NAMESPACE is currently simple -
export(Main)
export(Mainseq)
export(Maintriplet)
These are the three reference classes I exported by using @export tags in roxygen2.
What is it I'm doing (or not doing) that is throwing the lazy load error?
(ASIDE - I have no compiled code - it's all R, although the reference class methods do call some internal functions that are not exported, but these are supposed to be internal so I don't think I need to export them.)
Thanks,
Ben.
EDIT:
My description file is as follows:
Package: HybRIDS
Type: Package
Title: Quick detection and dating of Recombinant Regions in DNA sequence data.
Version: 1.0
Date: 2013-03-13
Author: Ben J. Ward
Maintainer: Ben J. Ward <b.ward@uea.ac.uk>
Description: A simple R package for the quick detection and dating of Recombinant Regions in DNA sequence data.
License: GPL-2
Depends: ggplot2,grid,gridExtra,png,ape
I can't see what is wrong with this - the Depends are correct.
EDIT:
I've eliminated the first error with the description but I'm still getting the con error.
I think it's because the Mainseq class (which is nested in class Main) has some fields:
FullSequenceFile = "character",
FullSequence = function( value ) {
if( missing( value ) ){
as.character( read.dna( file = FullSequenceFile, format = "fasta", as.matrix = TRUE ) )
} else {
write.dna( value, file = FullSequenceFile, format = "fasta" )
}
},
InformativeSequenceFile = "character",
InformativeSequence = function( value ) {
if( missing( value ) ){
as.character( read.dna( file = InformativeSequenceFile, format = "fasta", as.matrix = TRUE ) )
} else {
write.dna( value, file = InformativeSequenceFile, format = "fasta" )
}
}
The idea is that upon initialisation, the two character fields are filled with a path to a temp file in tmpdir, and when the variables are read or edited the files containing the variable data are read from or written to. However, it seems the variables are being accessed before this path is available, because on package build the following happens:
** preparing package for lazy loading
Warning in file(con, "rb") :
cannot open file '/var/folders/kp/clkqvqn9739ffw2755zjwy74_skf_z/T//RtmpLB8ESC/FullSequenceaba52ac591f3': No such file or directory
Error in file(con, "rb") : cannot open the connection
