rJava: failing to load awt.events

Right now, I am trying to implement some event methods from java.awt.event, but I cannot get it to load, since I always get a ClassNotFoundException error.
> library(rJava)
> .jinit()
[1] 0
> jEvents <- .jnew("java.awt.event")
Error in .jnew("java.awt.event") : java.lang.ClassNotFoundException
Edit:
Even if I try a specific class, I get an error message:
> library(rJava)
> .jinit()
> jEvents <- .jnew("java.awt.event.ActionEvent")
Error in .jnew("java.awt.event.ActionEvent") :
java.lang.NoSuchMethodError: <init>

It seems you are trying to instantiate a package instead of a class:
https://docs.oracle.com/javase/7/docs/api/java/awt/event/package-summary.html
Maybe you are looking for:
java.awt.event.ActionEvent
You can try this one:
library(rJava)
.jinit()
> EVT <- J("java.awt.event.ActionEvent")
> aEVT <- new(EVT, "StringObject", 1001L, "Hello")
> aEVT
[1] "Java-Object{java.awt.event.ActionEvent[ACTION_PERFORMED,cmd=Hello,when=0,modifiers=] on Str}"
You have to call the constructor with the given parameters. Note that ActionEvent doesn't have a default constructor.
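For reference, the same object can be created with .jnew directly; a minimal sketch (class names use slashes in JNI notation, and 1001L is the value of the ACTION_PERFORMED constant):
library(rJava)
.jinit()
# the three arguments match the ActionEvent(Object source, int id, String command)
# constructor; the source string is cast to java.lang.Object so the
# constructor lookup matches exactly
src <- .jcast(.jnew("java/lang/String", "StringObject"), "java/lang/Object")
aEVT <- .jnew("java/awt/event/ActionEvent", src, 1001L, "Hello")
aEVT$paramString()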
You can find a nice resource here:
http://www.deducer.org/pmwiki/pmwiki.php?n=Main.Development#wwjoir

Related

rscopus package on R in MacBook - Invalid API key error

I am trying to use the Scopus API for the first time. I have the API key and the institution token. However, I am still getting an error when I try to use it in R on my Mac. Here is my code:
library(rscopus)
set_api_key(MY_KEY)
hdr=inst_token_header(MY_TOKEN)
key=get_api_key()
print(rscopus::get_api_key(), reveal=TRUE)
have_api_key()
auth_info = process_author_name(last_name="Muschelli", first_name="John", verbose=FALSE)
The error message is:
> library(rscopus)
>
> set_api_key(MY_KEY)
> hdr=inst_token_header(MY_TOKEN)
> key=get_api_key()
> print(rscopus::get_api_key(), reveal=TRUE)
[1] "MY_KEY"
> have_api_key()
[1] TRUE
>
> if (have_api_key()) {
+ auth = elsevier_authenticate(api_key=key)
+ }
HTTP specified is: https://api.elsevier.com/authenticate
Warning message:
In elsevier_authenticate(api_key = key) : Forbidden (HTTP 403).
> auth_info = process_author_name(last_name="Muschelli", first_name="John", verbose=FALSE)
$`service-error`
$`service-error`$status
$`service-error`$status$statusCode
[1] "AUTHENTICATION_ERROR"
$`service-error`$status$statusText
[1] "Invalid API Key: valid apikey credentials required."
Error in get_complete_author_info(...) : Service Error
I tried
if (have_api_key()) {
auth = elsevier_authenticate(api_key=key)
}
but I get the error:
HTTP specified is: https://api.elsevier.com/authenticate
Warning message:
In elsevier_authenticate(api_key = key) : Forbidden (HTTP 403).
I have tried using auth_token_header(MY_TOKEN) instead of inst_token_header(MY_TOKEN), but the code is still not working.
I have also taken the following step in my terminal:
export Elsevier_API=MY_KEY > ~/.bash_profile
source ~/.bash_profile
I am still getting the error. However, the combination of key and institution token works here: https://dev.elsevier.com/scopus.html
Can anyone please help me debug this issue?
Thank You!
I figured out the issue. The correct way to query is to pass the headers argument as well:
auth_info = process_author_name(last_name="Muschelli", first_name="John", verbose=FALSE, headers=hdr)
And now the code will work! :)
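Putting the pieces together, a minimal end-to-end sketch (MY_KEY and MY_TOKEN are placeholders for your own credentials):
library(rscopus)
set_api_key(MY_KEY)                 # placeholder for your API key
hdr <- inst_token_header(MY_TOKEN)  # placeholder for your institution token
# pass the header with every query so the institution token
# travels together with the API key
auth_info <- process_author_name(last_name = "Muschelli",
                                 first_name = "John",
                                 verbose = FALSE,
                                 headers = hdr)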

Use R Selenium to deal with pop-ups

How can I navigate to a link, click a link, and scrape data from there?
I tried this code, without success.
library("RSelenium")
startServer()
mybrowser <- remoteDriver()
mybrowser$open()
mybrowser$navigate("https://finance.yahoo.com/quote/SBUX/balance-sheet?p=SBUX")
# click 'Quarterly' button...
Something that is kind of close is this.
Testing updated code now; results below.
> rm(list=ls())
>
>
> library("RSelenium")
> startServer()
Error: startServer is now defunct. Users in future can find the function in
file.path(find.package("RSelenium"), "examples/serverUtils"). The
recommended way to run a selenium server is via Docker. Alternatively
see the RSelenium::rsDriver function.
> mybrowser <- remoteDriver()
> mybrowser$open()
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
> mybrowser$navigate("https://finance.yahoo.com/quote/SBUX/balance-sheet?p=SBUX")
Error in checkError(res) :
Undefined error in httr call. httr output: length(url) == 1 is not TRUE
> mybrowser$findElement("xpath", "//button[text() = '
+
+ OK
+ ']")$clickElement()
Error in checkError(res) :
Undefined error in httr call. httr output: length(url) == 1 is not TRUE
> mybrowser$findElement("xpath", "//span[text() = 'Quarterly']")$clickElement()
Error in checkError(res) :
Undefined error in httr call. httr output: length(url) == 1 is not TRUE
>
I think it might be the case that you are running into the consent pop-up on the website.
You can just "click" the OK button via:
mybrowser$findElement("xpath", "//button[text() = '
OK
']")$clickElement()
And then you can click "quarterly" via:
mybrowser$findElement("xpath", "//span[text() = 'Quarterly']")$clickElement()
(Hint: to identify these kinds of errors it can be helpful to check the current state of the browser via mybrowser$screenshot(TRUE).)
I am not sure it is up to date, but certain data is also available via an API; you could check the quantmod package for easier access.
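For example, a minimal quantmod sketch (this fetches the daily price series, not the balance sheet, and assumes the Yahoo source is still available):
library(quantmod)
# daily OHLCV prices for SBUX; auto.assign = FALSE returns the xts
# object instead of assigning it into the workspace
sbux <- getSymbols("SBUX", src = "yahoo", auto.assign = FALSE)
head(sbux)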
Full example (startServer() is defunct, as the output above shows, so this sketch starts the server and client via rsDriver() instead):
library("RSelenium")
driver <- rsDriver(browser = "firefox")  # starts a Selenium server and a browser client
mybrowser <- driver$client
mybrowser$navigate("https://finance.yahoo.com/quote/SBUX/balance-sheet?p=SBUX")
mybrowser$findElement("xpath", "//button[text() = '
OK
']")$clickElement()
mybrowser$findElement("xpath", "//span[text() = 'Quarterly']")$clickElement()

H2o with predict_json: "Error: Could not find or load main class water.util.H2OPredictor"?

I tried to use the H2o predict_json in R,
h2o.predict_json(modelpath, jsondata)
and got the error message:
Error: Could not find or load main class water.util.H2OPredictor
I am using h2o_3.20.0.8.
I searched the H2o documentation, but it didn't help.
> h2o.predict_json(modelpath, jsondata)
$error
[1] "Error: Could not find or load main class water.util.H2OPredictor"
Warning message:
In system2(java, args, stdout = TRUE, stderr = TRUE) :
running command ''java' -Xmx4g -cp .:/Library/Frameworks/R.framework/Versions/3.5/Resources/library/mylib/Models/h2o-genmodel.jar:/Library/Frameworks/R.framework/Versions/3.5/Resources/library/mylib/Models:genmodel.jar:/ water.util.H2OPredictor /Library/Frameworks/R.framework/Versions/3.5/Resources/library/mylib/Models/mymodel.zip '[{"da1":252,"da2":22,"da3":62,"da4":63,"da5":84.83}]' 2>&1' had status 1
It looks like you are missing your h2o-genmodel.jar file - this is what the error message Could not find or load main class water.util.H2OPredictor indicates. You may want to provide all the arguments, to check that you have everything:
h2o.predict_json(model, json, genmodelpath, labels, classpath, javaoptions)
documentation here
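For illustration, a sketch of such a call with the paths spelled out (the paths below are hypothetical placeholders; the json string is the one from the log above):
library(h2o)
# both paths are placeholders: point them at your own model zip and at the
# h2o-genmodel.jar that ships with your h2o version
res <- h2o.predict_json(
  model = "/path/to/mymodel.zip",
  json  = '[{"da1":252,"da2":22,"da3":62,"da4":63,"da5":84.83}]',
  genmodelpath = "/path/to/h2o-genmodel.jar"
)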

Can't create dplyr src backed by SparkSQL in dplyr.spark.hive package

Recently I found out about the great dplyr.spark.hive package that enables dplyr front-end operations with a Spark or Hive backend.
There are instructions on how to install this package in the package's README:
options(repos = c("http://r.piccolboni.info", unlist(options("repos"))))
install.packages("dplyr.spark.hive")
and there are also many examples of how to work with dplyr.spark.hive when one is already connected to hiveServer; check this.
But I am not able to connect to hiveServer, so I cannot benefit from the great power of this package...
I've tried the commands below, but they did not work out. Does anyone have a solution or a comment on what I am doing wrong?
> library(dplyr.spark.hive,
+ lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
>
> Sys.setenv(SPARK_HOME = "/opt/spark-1.5.0-bin-hadoop2.4")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
>
> my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
>
> my_db = src_SparkSQL(host = 'jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl',
+ port = 10000)
Error in .jfindClass(as.character(driverClass)[1]) : class not found
>
> my_db = src_SparkSQL(start.server = TRUE)
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
>
> my_db = src_SparkSQL(start.server = TRUE,
+ list(spark.num.executors='5', spark.executor.cores='5', master="yarn-client"))
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
EDIT 2
I have set more system variables, as shown below, but now I receive a warning telling me that some Java logging configuration is not specified, though I think it is:
> library(dplyr.spark.hive,
+ lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
3: package ‘SparkR’ was built under R version 3.2.1
>
> Sys.setenv(SPARK_HOME = "/opt/spark-1.5.0-bin-hadoop2.4")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HADOOP_HOME="/usr/share/hadoop")
> Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop")
> Sys.setenv(PATH='/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/share/hadoop/bin:/opt/hive/bin')
>
>
> my_db = src_SparkSQL()
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
My log properties are not empty.
-bash-4.2$ wc /etc/hadoop/log4j.properties
179 432 6581 /etc/hadoop/log4j.properties
EDIT 3
My exact call to src_SparkSQL() is:
> detach("package:SparkR", unload=TRUE)
Warning message:
package ‘SparkR’ was built under R version 3.2.1
> detach("package:dplyr", unload=TRUE)
> library(dplyr.spark.hive, lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> my_db = src_SparkSQL()
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
And then the process never finishes.
Meanwhile, the same settings work for beeline with these params:
beeline -u "jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl" -n mkosinski --outputformat=tsv --incremental=true -f sql_statement.sql > sql_output
but I am not able to pass a user name and dbname to src_SparkSQL(),
so I have tried to manually run the code from inside that function, but I hit the same problem: the code below also never finishes.
host = 'tools-1.hadoop.srv'
port = 10000
driverclass = "org.apache.hive.jdbc.HiveDriver"
Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
library(RJDBC)
dr = JDBC(driverclass, Sys.getenv("HADOOP_JAR"))
url = paste0("jdbc:hive2://", host, ":", port)
class = "Hive"
con.class = paste0(class, "Connection") # class = "Hive"
# dbConnect_retry =
# function(dr, url, retry){
# if(retry > 0)
# tryCatch(
# dbConnect(drv = dr, url = url),
# error =
# function(e) {
# Sys.sleep(0.1)
# dbConnect_retry(dr = dr, url = url, retry - 1)})
# else dbConnect(drv = dr, url = url)}
#################
##con = new(con.class, dbConnect_retry(dr, url, retry = 100))
#################
con = new(con.class, dbConnect(dr, url, user = "mkosinski", dbname = "loghost"))
Maybe the URL should also contain /loghost, the dbname?
I now see that you tried multiple things with multiple errors. Let me comment error by error.
my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
The RJDBC object could not be created. Unless we solve this, nothing else will work, workarounds or not. Have you set HADOOP_JAR with, for instance,
Sys.setenv(HADOOP_JAR = "../spark/assembly/target/scala-2.10/spark-assembly-1.5.0-hadoop2.6.0.jar")? Sorry, I seem to have skipped this in the instructions. Will fix.
my_db = src_SparkSQL(host = 'jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl',
+ port = 10000)
Error in .jfindClass(as.character(driverClass)[1]) : class not found
Same problem. Please note that the host and port arguments do not accept URL syntax, just a host name and a port number; the URL is formed internally.
my_db = src_SparkSQL(start.server = TRUE)
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
Stop the thriftserver first or connect to the existing one, but you still have to fix the class not found problem.
my_db = src_SparkSQL(start.server = TRUE,
+ list(spark.num.executors='5', spark.executor.cores='5', master="yarn-client"))
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
Same as above.
Plan:
1. Set HADOOP_JAR. Find the host and port of the running thriftserver, if not default. Try src_SparkSQL with start.server = FALSE (see the sketch after this list). If happy, quit; else go to step 2.
2. Stop the existing thriftserver. Try src_SparkSQL again with start.server = TRUE.
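A sketch of step 1, with the jar path, host, and port taken from earlier in this thread:
# set the assembly jar before creating the source
Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
# plain host name and port number; the JDBC URL is formed internally
my_db <- src_SparkSQL(host = "tools-1.hadoop.srv", port = 10000,
                      start.server = FALSE)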
Let me know how things go.
The problem was that I didn't specify the proper classPath needed inside the JDBC() function that creates the driver. The classPath parameter in the dplyr.spark.hive package is passed via the HADOOP_JAR environment variable.
To use JDBC as a driver to hiveServer2 (through the Thrift protocol), one needs to add at least these 3 .jars with Java classes to create a proper driver:
hive-jdbc-1.0.0-standalone.jar
hadoop/common/lib/commons-configuration-1.6.jar
hadoop/common/hadoop-common-2.4.1.jar
The versions are arbitrary and should be compatible with the installed versions of the local Hive, Hadoop and hiveServer2.
They need to be joined with .Platform$path.sep (as described here):
classPath = c("system_path1_to_hive/hive/lib/hive-jdbc-1.0.0-standalone.jar",
"system_path1_to_hadoop/hadoop/common/lib/commons-configuration-1.6.jar",
"system_path1_to_hadoop/hadoop/common/hadoop-common-2.4.1.jar")
Sys.setenv(HADOOP_JAR = paste0(classPath, collapse = .Platform$path.sep))
Then, when HADOOP_JAR is set, one has to be careful with the hiveServer2 URL. In my case it had to be:
host = 'tools-1.hadoop.srv'
port = 10000
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
and finally, the proper connection to hiveServer2 using the RJDBC package is:
Sys.setenv(HADOOP_HOME="/usr/share/hadoop/share/hadoop/common/")
Sys.setenv(HIVE_HOME = '/opt/hive/lib/')
host = 'tools-1.hadoop.srv'
port = 10000
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
driverclass = "org.apache.hive.jdbc.HiveDriver"
library(RJDBC)
.jinit()
dr2 = JDBC(driverclass,
classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
#"/opt/hive/lib/commons-configuration-1.6.jar",
"/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
"/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar"),
identifier.quote = "`")
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
dbConnect(dr2, url, username = "mkosinski") -> cont
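With the connection cont in hand, a quick sanity check can be run through the DBI interface, which RJDBC implements:
library(DBI)
# list the databases visible over the new connection
dbGetQuery(cont, "show databases")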

How to check platform in .onLoad in the R package

I am trying to check whether the package is being run on Windows during package loading, in order to load some additional files. For some reason this doesn't work (added to my zzz.R):
.onLoad <- function(libname, pkgname){
if(.Platform$OS.type == "windows") {
# do specific task here
}
}
How can I make it work? Is there a better way to do it?
EDIT to correct a wrong previous answer
After loading, loadNamespace looks for a hook function named .onLoad and calls it. loadNamespace is usually called implicitly when library is used to load a package, but it can be useful at times to call these functions directly. So the following (essentially your code) works for me on the Windows platform:
.onLoad <- function(libname, pkgname ) {
if(.Platform$OS.type == "windows") print("windows")
else print("others")
}
Then you test it using:
loadNamespace("loadpkg")
[1] "windows"
<environment: namespace:loadpkg>
Note also that you can get the same thing using library, but you should unload the namespace first (I guess it checks whether the package is already loaded and doesn't call all the hooks):
unloadNamespace("loadpkg")
library("loadpkg")
[1] "windows"
Don't use this; it is a wrong but accepted solution.
You should initialize function parameters:
.onLoad <- function(libname = find.package(pkg_name), pkgname = pkg_name){
if(.Platform$OS.type == "windows") {
# do specific task here
}
}
