I'm building a Jenkins job to test a library.
I'm doing most of the script work in the r-base Docker image.
When I try to install devtools via Rscript -e with the following code:
dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE) # create personal library
.libPaths(Sys.getenv("R_LIBS_USER")) # add to the path
install.packages("devtools")
install_allstate_github <- function(user, repo, ...) {
tmp_dir <- "$app_dir/ravenclaw"
devtools::install(tmp_dir, dependencies = TRUE)
}
install_allstate_github("SortingHat", "ravenclaw")
library(ravenclaw)
I get the following error
Warning message:
In dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE) :
'/export/home/compjenk/workspace/ISG-Sorting-Hat/Ravenclaw/r-libs' already exists
Installing package into ‘/export/home/compjenk/workspace/ISG-Sorting-Hat/Ravenclaw/r-libs’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository https://****:****@artifactory.allstate.com/artifactory/cran/src/contrib:
cannot open URL 'https://****:****@artifactory.allstate.com/artifactory/cran/src/contrib/PACKAGES'
Warning messages:
1: In gzfile(file, "rb") :
cannot open compressed file '/tmp/RtmpQGoABw/repos_https://****:****@artifactory.allstate.com/artifactory/cran/src/contrib.rds', probable reason 'No such file or directory'
2: package ‘devtools’ is not available (for R version 4.0.2)
Error in loadNamespace(name) : there is no package called ‘devtools’
Calls: install_allstate_github ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
I've set the corporate CRAN server and the proxy information in .Rprofile, but it doesn't seem to help.
.Rprofile
local({
  # Point R at the corporate CRAN mirror.
  r <- list("cran" = "https://****:****@artifactory.allstate.com/artifactory/cran/")
  options(repos = r)
  options(RCurlOptions = list(ssl.verifypeer = FALSE,
                              ssl.verify = FALSE,
                              proxy = "http://webproxy:8080"))
  #set_config( config( ssl_verifypeer = 0L ) )
  # Proxy settings for downloads.
  Sys.setenv(http_proxy = "http://webproxy:8080")
  Sys.setenv(https_proxy = "https://webproxy:8080")
})
So what am I missing or doing wrong? Why can't I install devtools?
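The warnings above show R failing to read the PACKAGES index under src/contrib of the configured repository, so one quick check is whether that exact URL is reachable from inside the container. The following is a minimal sketch using base R only. The helper names index_url() and can_reach_index() are hypothetical, and the repository URL in the usage comment is a placeholder, not the real Artifactory address.

```r
# Hypothetical helpers (not part of any package) for checking a repository.

# URL of the source-package index that install.packages() tries to read.
index_url <- function(repo) {
  paste0(utils::contrib.url(repo, type = "source"), "/PACKAGES")
}

# TRUE if the first line of the index can be read, FALSE on any error or warning.
can_reach_index <- function(repo) {
  con <- url(index_url(repo))
  on.exit(close(con), add = TRUE)
  tryCatch({
    readLines(con, n = 1L, warn = FALSE)
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)
}

# Usage (needs network access; placeholder URL):
# can_reach_index("https://example.org/cran/")
```

If the check fails inside the container but succeeds on the host, the proxy or credentials are the more likely culprit than the R code itself.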
Related
I am trying to install the ggplot2 package, but I am getting the following errors and warnings.
Warning: failed to download mirrors file (cannot open URL 'https://cran.r-project.org/CRAN_mirrors.csv'); using local file 'C:/PROGRA~1/R/R-42~1.1/doc/CRAN_mirrors.csv'
Error in contrib.url(repos, type) :
trying to use CRAN without setting a mirror
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
URL 'https://cran.r-project.org/CRAN_mirrors.csv': status was 'Couldn't connect to server'
> utils:::menuInstallPkgs()
--- Please select a CRAN mirror for use in this session ---
Warning: failed to download mirrors file (cannot open URL 'https://cran.r-project.org/CRAN_mirrors.csv'); using local file 'C:/PROGRA~1/R/R-42~1.1/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
URL 'https://cran.r-project.org/CRAN_mirrors.csv': status was 'SSL connect error'
> utils:::menuInstallPkgs()
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
> utils:::menuInstallPkgs()
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
> utils:::menuInstallPkgs()
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
> utils:::menuInstallLocal()
> utils:::menuInstallLocal()
Error in utils:::menuInstallLocal() :
Only '*.zip' and '*.tar.gz' files can be installed.
> utils:::menuInstallLocal()
ERROR: dependencies 'digest', 'glue', 'gtable', 'isoband', 'rlang', 'scales', 'tibble', 'withr' are not available for package 'ggplot2'
* removing 'C:/Users/S/AppData/Local/R/win-library/4.2/ggplot2'
Warning message:
In install.packages(files[tarballs], .libPaths()[1L], repos = NULL, :
installation of package ‘C:/Users/S/Documents/ggplot2_3.3.6.tar.gz’ had non-zero exit status
> utils:::menuInstallPkgs()
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
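The message "trying to use CRAN without setting a mirror" is raised when no repository is configured for the session. The following is a minimal sketch of setting one explicitly. The mirror URL https://cloud.r-project.org is my assumption (it is not from the post), and set_cran_mirror() is a hypothetical helper name. This only removes the mirror prompt; it would not by itself fix the connection failures shown in the log above.

```r
# Hypothetical helper: set the CRAN entry of the "repos" option for this session.
# The default URL is an assumption; substitute a mirror you can actually reach.
set_cran_mirror <- function(url = "https://cloud.r-project.org") {
  repos <- getOption("repos")
  repos["CRAN"] <- url
  options(repos = repos)
  invisible(getOption("repos"))
}

# Usage:
# set_cran_mirror()
# install.packages("ggplot2")   # needs network access
```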
I'm trying to downgrade my installed version of officer from the most recent (v0.3.12) to v0.3.8.
I tried running each of the lines below, and all of them failed.
What am I doing wrong?
require(devtools)
install_version("officer", version = "0.3.8")
---Error: Failed to install 'unknown package' from URL:
(converted from warning) package ‘officer’ is in use and will not be installed
require(devtools)
install_version("officer", version = "0.3.8", repos = "https://cran.r-project.org/web/packages/officer/index.html")
--- cannot open URL 'https://cran.r-project.org/web/packages/officer/index.html/src/contrib/PACKAGES'
Error in package_find_repo(package, repos) :
could not find package 'officer'
require(devtools)
install_version("officer", version = "0.3.8", repos = "https://davidgohel.github.io/officer/")
--- cannot open URL 'https://davidgohel.github.io/officer/src/contrib/PACKAGES'
Error in package_find_repo(package, repos) :
could not find package 'officer'
library(devtools)
devtools::install_github(davidgohel/officer, ref = "officer_0.3.8")
---Error in lapply(repo, github_remote, ref = ref, subdir = subdir, auth_token = auth_token, :
object 'davidgohel' not found
devtools::install_github("davidgohel/officer", ref = "0.3.8")
---Error in utils::download.file(url, path, method = method, quiet = quiet, :
cannot open URL 'https://api.github.com/repos/davidgohel/officer/tarball/0.3.8
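Two things stand out in the errors above: the first attempt fails because officer is loaded in the session, and the next two show that /src/contrib/PACKAGES is appended to whatever is passed as repos, so repos has to be a repository root rather than a package web page. The following is a minimal sketch under those observations. unload_if_loaded() and downgrade() are hypothetical helper names, the mirror URL is my assumption, and unloading can still fail if another loaded package imports officer, in which case a fresh R session is the simpler route.

```r
# Hypothetical helper: unload a package's namespace if it is currently loaded.
# Returns TRUE if something was unloaded, FALSE otherwise.
unload_if_loaded <- function(pkg) {
  if (pkg %in% loadedNamespaces()) {
    unloadNamespace(pkg)
    return(TRUE)
  }
  FALSE
}

# Sketch of a downgrade; needs network access and the 'remotes' package.
# The default repos value is an assumed CRAN mirror root.
downgrade <- function(pkg, version, repos = "https://cloud.r-project.org") {
  unload_if_loaded(pkg)
  remotes::install_version(pkg, version = version, repos = repos)
}

# Usage:
# downgrade("officer", "0.3.8")
```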
Error in compileCode(f, code, language = language, verbose = verbose) :
Compilation ERROR, function(s)/method(s) not created! Error in .shlib_internal(commandArgs(TRUE)) :
C++14 standard requested but CXX14 is not defined
Calls: <Anonymous> -> .shlib_internal
Execution halted
In addition: Warning message:
In system(cmd, intern = !verbose) :
running command 'C:/PROGRA~1/R/R-36~1.0/bin/x64/R CMD SHLIB file1a1860a0379.cpp 2> file1a1860a0379.cpp.err.txt' had status 1
Error in sink(type = "output") : invalid connection
A non-English page said that the error can be overcome by running the following R script, but it did not work in my case:
dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR))
dir.create(dotR)
M <- file.path(dotR, "Makevars")
if (!file.exists(M))
file.create(M)
cat("\nCXX14FLAGS=-O3 -Wno-unused-variable -Wno-unused-function",
"CXX14 = g++ -std=c++1y",
file = M, sep = "\n", append = TRUE)
The above R script is the same as the one on the following page:
https://github.com/stan-dev/rstan/issues/569
I tried uninstalling and reinstalling according to the following page, but the same error occurred.
Rstan installation: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
Ref: https://github.com/stan-dev/stan/issues/1613
Ref: https://github.com/stan-dev/rstan/issues/633
For me, the issue was solved by manually adding the following line to the file .R/Makevars.win:
CXX14 = "C:\Rtools\mingw_64\bin\g++.exe"
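The manual edit above can also be scripted, in the same spirit as the Makevars script quoted in the question. The following is a minimal sketch; add_makevars_line() is a hypothetical helper name, and the forward-slash compiler path in the usage comment is an assumption that must match the local Rtools installation.

```r
# Hypothetical helper: add a line to a Makevars file unless it is already there.
# By default it targets ~/.R/Makevars.win, creating the directory if needed.
add_makevars_line <- function(line,
                              path = file.path(Sys.getenv("HOME"), ".R", "Makevars.win")) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character()
  if (!(line %in% existing)) {
    writeLines(c(existing, line), path)
  }
  invisible(path)
}

# Usage (assumed Rtools location; adjust to your installation):
# add_makevars_line('CXX14 = "C:/Rtools/mingw_64/bin/g++.exe"')
```

Checking for the line before writing keeps the file from accumulating duplicates when the script is run more than once.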
On a new installation of R on Windows 10, I'm having a problem building vignettes for any package. I can manually use knitr to build my vignettes, but running any of these commands fails:
devtools::install_github("hadley/devtools", build_vignettes = TRUE, force = TRUE)
devtools::check(document = FALSE)
devtools::install(build_vignettes = TRUE)
Although my example uses devtools, I get an identical error from other packages, e.g.
devtools::install_github("CUD2V/pccc", build_vignettes = TRUE, force = TRUE)
I get output like the following:
devtools::install_github("hadley/devtools", build_vignettes = TRUE, force = TRUE)
Downloading GitHub repo hadley/devtools@master
from URL https://api.github.com/repos/hadley/devtools/zipball/master
Installing devtools
Downloading GitHub repo r-hub/rhub@master
from URL https://api.github.com/repos/r-hub/rhub/zipball/master
Installing rhub
Installation failed: install_packages(package_name, repos = remote$repos, type = remote$pkg_type, dependencies = NA, ..., quiet = quiet, out_dir = out_dir, skip_if_log_exists = skip_if_log_exists) : formal argument "repos" matched by multiple actual arguments
Installation failed: install_packages(package_name, repos = remote$repos, type = remote$pkg_type, dependencies = NA, ..., quiet = quiet, out_dir = out_dir, skip_if_log_exists = skip_if_log_exists) : formal argument "repos" matched by multiple actual arguments
Installation failed: install_packages(package_name, repos = remote$repos, type = remote$pkg_type, dependencies = NA, ..., quiet = quiet, out_dir = out_dir, skip_if_log_exists = skip_if_log_exists) : formal argument "repos" matched by multiple actual arguments
Installation failed: install_packages(package_name, repos = remote$repos, type = remote$pkg_type, dependencies = NA, ..., quiet = quiet, out_dir = out_dir, skip_if_log_exists = skip_if_log_exists) : formal argument "repos" matched by multiple actual arguments
Running command C:/PROGRA~1/R/R-34~1.3/bin/x64/Rcmd.exe
Arguments:
INSTALL
C:/Users/username/AppData/Local/Temp/RtmpIJj3Sj/devtools33e84f791ef2/r-hub-rhub-352458b
--library=C:/Users/username/Documents/R/win-library/3.4
--install-tests
ERROR: dependencies 'parsedate', 'prettyunits', 'rappdirs', 'whoami' are not available for package 'rhub'
* removing 'C:/Users/username/Documents/R/win-library/3.4/rhub'
In R CMD INSTALL
Installation failed: run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), stderr_line_callback = real_callback(stderr), stdout_callback = real_block_callback, stderr_callback = real_block_callback, echo_cmd = echo, echo = show, spinner = spinner, error_on_status = fail_on_status, timeout = timeout) : System command error
Running command C:/PROGRA~1/R/R-34~1.3/bin/x64/Rcmd.exe
Arguments:
build
C:\Users\username\AppData\Local\Temp\RtmpIJj3Sj\devtools33e814393089\hadley-devtools-7f5a683
--no-resave-data
--no-manual
* checking for file 'C:\Users\username\AppData\Local\Temp\RtmpIJj3Sj\devtools33e814393089\hadley-devtools-7f5a683/DESCRIPTION' ... OK
* preparing 'devtools':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ...Warning: running command '"C:/PROGRA~1/R/R-34~1.3/bin/x64/Rscript" --vanilla --default-packages= -e "tools::buildVignettes(dir = '.', tangle = TRUE)"' had status 1
ERROR
Error: file 'C:/Users/username/AppData/Local/Temp/RtmpU3YzGM/Rbuild2b8017d6415/devtools/DESCRIPTION' is not in valid DCF format
In addition: Warning message:
In read.dcf(dfile, keep.white = .keep_white_description_fields) :
cannot open compressed file 'C:/Users/username/AppData/Local/Temp/RtmpU3YzGM/Rbuild2b8017d6415/devtools/DESCRIPTION', probable reason 'Permission denied'
Execution halted
Installation failed: run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), stderr_line_callback = real_callback(stderr), stdout_callback = real_block_callback, stderr_callback = real_block_callback, echo_cmd = echo, echo = show, spinner = spinner, error_on_status = fail_on_status, timeout = timeout) : System command error
It appears that although the DESCRIPTION file specifies knitr, devtools::install_github and devtools::check are using tools::buildVignettes.
Also, if I just run
devtools::build_vignettes()
no error occurs and the vignettes are built.
I also have a macOS machine running the current version of R and a Windows 2012 Server running R 3.3.x, and both can build vignettes and install packages without a similar error.
How do I work around these build errors?
@alistaire noticed the dependency errors. After resolving those, I still get the same problem when building packages with vignettes, checking packages, etc.:
> devtools::install_github("hadley/devtools", build_vignettes = TRUE, force = TRUE)
Downloading GitHub repo hadley/devtools@master
from URL https://api.github.com/repos/hadley/devtools/zipball/master
Installing devtools
Running command C:/PROGRA~1/R/R-34~1.3/bin/x64/Rcmd.exe
Arguments:
build
C:\Users\username\AppData\Local\Temp\RtmpQpK5Yi\devtools25f4138c52c3\hadley-devtools-7f5a683
--no-resave-data
--no-manual
* checking for file 'C:\Users\username\AppData\Local\Temp\RtmpQpK5Yi\devtools25f4138c52c3\hadley-devtools-7f5a683/DESCRIPTION' ... OK
* preparing 'devtools':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ...Warning: running command '"C:/PROGRA~1/R/R-34~1.3/bin/x64/Rscript" --vanilla --default-packages= -e "tools::buildVignettes(dir = '.', tangle = TRUE)"' had status 1
ERROR
Error: file 'C:/Users/username/AppData/Local/Temp/RtmpKe9Atv/Rbuild24501e281f1c/devtools/DESCRIPTION' is not in valid DCF format
In addition: Warning message:
In read.dcf(dfile, keep.white = .keep_white_description_fields) :
cannot open compressed file 'C:/Users/username/AppData/Local/Temp/RtmpKe9Atv/Rbuild24501e281f1c/devtools/DESCRIPTION', probable reason 'Permission denied'
Execution halted
Installation failed: run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), stderr_line_callback = real_callback(stderr), stdout_callback = real_block_callback, stderr_callback = real_block_callback, echo_cmd = echo, echo = show, spinner = spinner, error_on_status = fail_on_status, timeout = timeout) : System command error
Although the message references permission denied, these errors appear to occur regardless of user permissions. I've run as my regular user and as Administrator.
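The failure in the log comes from read.dcf() being unable to open the temporary copy of DESCRIPTION, so it can help to confirm separately that the package's own DESCRIPTION parses and names knitr as the vignette builder. The following is a minimal sketch with base R; vignette_builder() is a hypothetical helper name, and the throwaway DESCRIPTION file is only there to make the example self-contained.

```r
# Hypothetical helper: return the VignetteBuilder field of a DESCRIPTION file,
# or NA if the field is absent. read.dcf() raises an error if the file cannot
# be opened or is not valid DCF.
vignette_builder <- function(description_path) {
  fields <- read.dcf(description_path, fields = "VignetteBuilder")
  unname(fields[1, "VignetteBuilder"])
}

# Self-contained demonstration with a throwaway DESCRIPTION file.
demo_description <- tempfile("DESCRIPTION")
writeLines(c("Package: demo",
             "Version: 0.0.1",
             "VignetteBuilder: knitr"),
           demo_description)
vignette_builder(demo_description)
```

If this succeeds on the source checkout while the build still fails on the temporary copy, the problem is more likely in how the temporary directory is accessed than in the file contents.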
Recently I found out about the great dplyr.spark.hive package, which enables dplyr frontend operations with a Spark or Hive backend.
The package's README explains how to install it:
options(repos = c("http://r.piccolboni.info", unlist(options("repos"))))
install.packages("dplyr.spark.hive")
There are also many examples of how to work with dplyr.spark.hive once one is already connected to hiveServer - check this.
But I am not able to connect to hiveServer, so I cannot benefit from the great power of this package...
I've tried the commands below, but they did not work. Does anyone have a solution, or a comment on what I am doing wrong?
> library(dplyr.spark.hive,
+ lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
>
> Sys.setenv(SPARK_HOME = "/opt/spark-1.5.0-bin-hadoop2.4")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
>
> my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
>
> my_db = src_SparkSQL(host = 'jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl',
+ port = 10000)
Error in .jfindClass(as.character(driverClass)[1]) : class not found
>
> my_db = src_SparkSQL(start.server = TRUE)
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
>
> my_db = src_SparkSQL(start.server = TRUE,
+ list(spark.num.executors='5', spark.executor.cores='5', master="yarn-client"))
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
EDIT 2
I have set more paths in environment variables as shown below, but now I receive a warning telling me that some kind of Java logging configuration is not specified, although I think it is:
> library(dplyr.spark.hive,
+ lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
3: package ‘SparkR’ was built under R version 3.2.1
>
> Sys.setenv(SPARK_HOME = "/opt/spark-1.5.0-bin-hadoop2.4")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HADOOP_HOME="/usr/share/hadoop")
> Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop")
> Sys.setenv(PATH='/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/share/hadoop/bin:/opt/hive/bin')
>
>
> my_db = src_SparkSQL()
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
My log4j properties file is not empty:
-bash-4.2$ wc /etc/hadoop/log4j.properties
179 432 6581 /etc/hadoop/log4j.properties
EDIT 3
My exact call to src_SparkSQL() is
> detach("package:SparkR", unload=TRUE)
Warning message:
package ‘SparkR’ was built under R version 3.2.1
> detach("package:dplyr", unload=TRUE)
> library(dplyr.spark.hive, lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> my_db = src_SparkSQL()
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
After that, the process never finishes.
The same settings do work for beeline with these parameters:
beeline -u "jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl" -n mkosinski --outputformat=tsv --incremental=true -f sql_statement.sql > sql_output
but I am not able to pass the user name and dbname to src_SparkSQL(),
so I tried to manually run the code from inside that function. I hit the same problem: the code below also never finishes.
host = 'tools-1.hadoop.srv'
port = 10000
driverclass = "org.apache.hive.jdbc.HiveDriver"
Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
library(RJDBC)
dr = JDBC(driverclass, Sys.getenv("HADOOP_JAR"))
url = paste0("jdbc:hive2://", host, ":", port)
class = "Hive"
con.class = paste0(class, "Connection") # class = "Hive"
# dbConnect_retry =
# function(dr, url, retry){
# if(retry > 0)
# tryCatch(
# dbConnect(drv = dr, url = url),
# error =
# function(e) {
# Sys.sleep(0.1)
# dbConnect_retry(dr = dr, url = url, retry - 1)})
# else dbConnect(drv = dr, url = url)}
#################
##con = new(con.class, dbConnect_retry(dr, url, retry = 100))
#################
con = new(con.class, dbConnect(dr, url, user = "mkosinski", dbname = "loghost"))
Maybe the url should also contain /loghost, the dbname?
I now see that you tried multiple things with multiple errors. Let me comment error by error.
my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
The RJDBC object could not be created. Unless we solve this, nothing else will work, workarounds or not. Have you set HADOOP_JAR with, for instance,
Sys.setenv(HADOOP_JAR = "../spark/assembly/target/scala-2.10/spark-assembly-1.5.0-hadoop2.6.0.jar")?
Sorry, I seem to have skipped this in the instructions. Will fix.
my_db = src_SparkSQL(host = 'jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl',
+ port = 10000)
Error in .jfindClass(as.character(driverClass)[1]) : class not found
Same problem. Please note that the host and port arguments do not accept URL syntax, just a host and a port. The URL is formed internally.
my_db = src_SparkSQL(start.server = TRUE)
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
Stop the thriftserver first or connect to the existing one, but you still have to fix the class not found problem.
my_db = src_SparkSQL(start.server = TRUE,
+ list(spark.num.executors='5', spark.executor.cores='5', master="yarn-client"))
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
Same as above.
Plan:
1. Set HADOOP_JAR. Find the host and port of the running thriftserver, if not the default. Try src_SparkSQL with start.server = FALSE. If happy, quit; else go to step 2.
2. Stop the existing thriftserver. Try src_SparkSQL again with start.server = TRUE.
Let me know how things go.
The problem was that I didn't specify the proper classPath needed by the JDBC function that creates the driver. In the dplyr.spark.hive package, the classPath is passed via the HADOOP_JAR environment variable.
To use JDBC as a driver to hiveServer2 (through the Thrift protocol), one needs to add at least these 3 .jar files with Java classes to create a proper driver:
hive-jdbc-1.0.0-standalone.jar
hadoop/common/lib/commons-configuration-1.6.jar
hadoop/common/hadoop-common-2.4.1.jar
The versions shown are examples; they should be compatible with the locally installed versions of hive, hadoop and hiveServer2.
The paths need to be joined with .Platform$path.sep (as described here):
classPath = c("system_path1_to_hive/hive/lib/hive-jdbc-1.0.0-standalone.jar",
              "system_path1_to_hadoop/hadoop/common/lib/commons-configuration-1.6.jar",
              "system_path1_to_hadoop/hadoop/common/hadoop-common-2.4.1.jar")
# Join the jar paths with the platform's path separator
Sys.setenv(HADOOP_JAR = paste0(classPath, collapse = .Platform$path.sep))
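Since a wrong or missing jar path surfaces only later as "class not found", it may be worth failing early when a jar does not exist. The following is a minimal sketch of that idea; build_classpath() is a hypothetical helper name and is not part of dplyr.spark.hive or RJDBC.

```r
# Hypothetical helper: join jar paths into a classpath string,
# stopping with a clear message if any jar file is missing.
build_classpath <- function(jars) {
  absent <- jars[!file.exists(jars)]
  if (length(absent) > 0) {
    stop("jar(s) not found: ", paste(absent, collapse = ", "))
  }
  paste(jars, collapse = .Platform$path.sep)
}

# Usage, with classPath defined as above:
# Sys.setenv(HADOOP_JAR = build_classpath(classPath))
```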
Then, once HADOOP_JAR is set, one has to be careful with the hiveServer2 URL. In my case it had to be
host = 'tools-1.hadoop.srv'
port = 10000
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
Finally, the proper connection to hiveServer2 using the RJDBC package is
Sys.setenv(HADOOP_HOME="/usr/share/hadoop/share/hadoop/common/")
Sys.setenv(HIVE_HOME = '/opt/hive/lib/')
host = 'tools-1.hadoop.srv'
port = 10000
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
driverclass = "org.apache.hive.jdbc.HiveDriver"
library(RJDBC)
.jinit()
dr2 = JDBC(driverclass,
classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
#"/opt/hive/lib/commons-configuration-1.6.jar",
"/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
"/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar"),
identifier.quote = "`")
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
dbConnect(dr2, url, username = "mkosinski") -> cont
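The URL that finally worked has the shape jdbc:hive2://host:port/dbname;option, which is easy to get wrong when assembled by hand. The following is a small sketch of building it from its parts; hive_url() is a hypothetical helper name, and the host, port, database and option in the example call are the ones from this post.

```r
# Hypothetical helper: assemble a HiveServer2 JDBC URL from its parts.
# 'params' is a character vector of "key=value" session options.
hive_url <- function(host, port, dbname, params = character()) {
  url <- sprintf("jdbc:hive2://%s:%d/%s", host, as.integer(port), dbname)
  if (length(params) > 0) {
    url <- paste0(url, ";", paste(params, collapse = ";"))
  }
  url
}

hive_url("tools-1.hadoop.srv", 10000, "loghost", "auth=noSasl")
# returns "jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl"
```

Formatting the port with %d avoids any chance of a large numeric port being printed in scientific notation.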