Unable to Install devtools in R on Jenkins

I'm building a Jenkins job to test a library.
I'm doing most of the script's work in the r-base Docker image.
When I try to install devtools via Rscript -e
dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE) # create personal library
.libPaths(Sys.getenv("R_LIBS_USER")) # add to the path
install.packages("devtools")
install_allstate_github <- function(user, repo, ...) {
  tmp_dir <- "$app_dir/ravenclaw"
  devtools::install(tmp_dir, dependencies = TRUE)
}
install_allstate_github("SortingHat", "ravenclaw")
library(ravenclaw)
I get the following error
Warning message:
In dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE) :
'/export/home/compjenk/workspace/ISG-Sorting-Hat/Ravenclaw/r-libs' already exists
Installing package into ‘/export/home/compjenk/workspace/ISG-Sorting-Hat/Ravenclaw/r-libs’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository https://****:****@artifactory.allstate.com/artifactory/cran/src/contrib:
cannot open URL 'https://****:****@artifactory.allstate.com/artifactory/cran/src/contrib/PACKAGES'
Warning messages:
1: In gzfile(file, "rb") :
cannot open compressed file '/tmp/RtmpQGoABw/repos_https://****:****@artifactory.allstate.com/artifactory/cran/src/contrib.rds', probable reason 'No such file or directory'
2: package ‘devtools’ is not available (for R version 4.0.2)
Error in loadNamespace(name) : there is no package called ‘devtools’
Calls: install_allstate_github ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
I've set the corporate CRAN server and the proxy information in .Rprofile, but it doesn't seem to help.
.Rprofile
local({
  r <- list("cran" = "https://****:****@artifactory.allstate.com/artifactory/cran/")
  options(repos = r)
  options(RCurlOptions = list(ssl.verifypeer = FALSE,
                              ssl.verify = FALSE,
                              proxy = "http://webproxy:8080"))
  #set_config( config( ssl_verifypeer = 0L ) )
  Sys.setenv(http_proxy = "http://webproxy:8080")
  Sys.setenv(https_proxy = "https://webproxy:8080")
})
So what am I missing or doing wrong? Why can't I install devtools?
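When debugging a setup like this, it can help to see which index location install.packages() derives from the repos option, so that the same URL can be tested separately through the proxy. The following is only a minimal sketch: the host repo.example.com is a placeholder, not the real Artifactory server, and the credentials are left out.

```r
# Hypothetical repository root; substitute the value from your own .Rprofile.
repos <- c(cran = "https://repo.example.com/artifactory/cran/")

# contrib.url() derives the directory install.packages() reads for source packages.
index_dir <- unname(utils::contrib.url(repos, type = "source"))

# The package index is the PACKAGES file inside that directory,
# which is the URL named in the "cannot open URL" warning.
index_file <- paste0(index_dir, "/PACKAGES")

print(index_dir)
print(index_file)
```

If that URL cannot be fetched from inside the container with the same proxy settings, the problem is likely in the network or credential configuration rather than in devtools itself.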

Related

Can't install the ggplot2 package

I am trying to install the ggplot2 package, and I am getting the following errors and warnings.
Warning: failed to download mirrors file (cannot open URL 'https://cran.r-project.org/CRAN_mirrors.csv'); using local file 'C:/PROGRA~1/R/R-42~1.1/doc/CRAN_mirrors.csv'
Error in contrib.url(repos, type) :
trying to use CRAN without setting a mirror
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
URL 'https://cran.r-project.org/CRAN_mirrors.csv': status was 'Couldn't connect to server'
> utils:::menuInstallPkgs()
--- Please select a CRAN mirror for use in this session ---
Warning: failed to download mirrors file (cannot open URL 'https://cran.r-project.org/CRAN_mirrors.csv'); using local file 'C:/PROGRA~1/R/R-42~1.1/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
URL 'https://cran.r-project.org/CRAN_mirrors.csv': status was 'SSL connect error'
> utils:::menuInstallPkgs()
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
> utils:::menuInstallPkgs()
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
> utils:::menuInstallPkgs()
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default
> utils:::menuInstallLocal()
> utils:::menuInstallLocal()
Error in utils:::menuInstallLocal() :
Only '*.zip' and '*.tar.gz' files can be installed.
> utils:::menuInstallLocal()
ERROR: dependencies 'digest', 'glue', 'gtable', 'isoband', 'rlang', 'scales', 'tibble', 'withr' are not available for package 'ggplot2'
* removing 'C:/Users/S/AppData/Local/R/win-library/4.2/ggplot2'
Warning message:
In install.packages(files[tarballs], .libPaths()[1L], repos = NULL, :
installation of package ‘C:/Users/S/Documents/ggplot2_3.3.6.tar.gz’ had non-zero exit status
> utils:::menuInstallPkgs()
Warning: unable to access index for repository https://repo.miserver.it.umich.edu/cran/src/contrib:
cannot open URL 'https://repo.miserver.it.umich.edu/cran/src/contrib/PACKAGES'
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
argument "pkgs" is missing, with no default

How to downgrade an installed package version?

I'm trying to downgrade the package version of officer that I have from the most recent (v 0.3.12) to v 0.3.8.
I tried running the below lines, all of which failed:
What am I doing wrong?
require(devtools)
install_version("officer", version = "0.3.8")
---Error: Failed to install 'unknown package' from URL:
(converted from warning) package ‘officer’ is in use and will not be installed
require(devtools)
install_version("officer", version = "0.3.8", repos = "https://cran.r-project.org/web/packages/officer/index.html")
--- cannot open URL 'https://cran.r-project.org/web/packages/officer/index.html/src/contrib/PACKAGES'
Error in package_find_repo(package, repos) :
could not find package 'officer'
require(devtools)
install_version("officer", version = "0.3.8", repos = "https://davidgohel.github.io/officer/")
--- cannot open URL 'https://davidgohel.github.io/officer/src/contrib/PACKAGES'
Error in package_find_repo(package, repos) :
could not find package 'officer'
library(devtools)
devtools::install_github(davidgohel/officer, ref = "officer_0.3.8")
---Error in lapply(repo, github_remote, ref = ref, subdir = subdir, auth_token = auth_token, :
object 'davidgohel' not found
devtools::install_github("davidgohel/officer", ref = "0.3.8")
---Error in utils::download.file(url, path, method = method, quiet = quiet, :
cannot open URL 'https://api.github.com/repos/davidgohel/officer/tarball/0.3.8
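Two things stand out in the errors above. The first attempt fails because officer is loaded in the running session, and the second and third show that install_version() appends /src/contrib/PACKAGES to repos, so repos has to be the root of a CRAN-like repository rather than a package web page. As a hedged sketch, older releases can also be addressed directly in CRAN's Archive area; cran_archive_url() below is a hypothetical helper, and it assumes version 0.3.8 is still present in the Archive.

```r
# Build the URL of a source tarball in CRAN's Archive area.
# Layout: <root>/src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz
cran_archive_url <- function(pkg, version, root = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", root, pkg, pkg, version)
}

url <- cran_archive_url("officer", "0.3.8")
print(url)

# In a fresh R session where officer is not loaded, one could then try:
# install.packages(url, repos = NULL, type = "source")
```

Installing from source this way does not resolve dependencies, so the packages officer depends on need to be installed already.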

Error on the stan file compilation using R 3.6.0 and Win 10

Error in compileCode(f, code, language = language, verbose = verbose) :
Compilation ERROR, function(s)/method(s) not created! Error in .shlib_internal(commandArgs(TRUE)) :
C++14 standard requested but CXX14 is not defined
Calls: <Anonymous> -> .shlib_internal
Execution halted
In addition: Warning message:
In system(cmd, intern = !verbose) :
running command 'C:/PROGRA~1/R/R-36~1.0/bin/x64/R CMD SHLIB file1a1860a0379.cpp 2> file1a1860a0379.cpp.err.txt' had status 1
Error in sink(type = "output") : invalid connection
A non-English page said that the error can be overcome by executing the following R script, but it did not work in my case:
dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR))
  dir.create(dotR)
M <- file.path(dotR, "Makevars")
if (!file.exists(M))
  file.create(M)
cat("\nCXX14FLAGS=-O3 -Wno-unused-variable -Wno-unused-function",
    "CXX14 = g++ -std=c++1y",
    file = M, sep = "\n", append = TRUE)
The above R script is the same as the one on the following page:
https://github.com/stan-dev/rstan/issues/569
I tried uninstalling and reinstalling according to the following page, but the same error occurred.
RStan installation: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
Ref: https://github.com/stan-dev/stan/issues/1613
Ref: https://github.com/stan-dev/rstan/issues/633
For me, the issue was solved by manually adding the following line to the file .R/Makevars.win:
CXX14 = "C:\Rtools\mingw_64\bin\g++.exe"
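The same manual edit could be scripted. The sketch below is an assumption-laden illustration, not an official rstan procedure: add_cxx14() is a hypothetical helper, the compiler path is the one quoted above and will differ between Rtools installations, and the demonstration writes to a scratch directory rather than the real .R folder.

```r
# Append a CXX14 entry to <dot_r_dir>/Makevars.win unless one is already present.
# add_cxx14() is a hypothetical helper, not part of rstan or Rtools.
add_cxx14 <- function(dot_r_dir, compiler) {
  if (!dir.exists(dot_r_dir)) dir.create(dot_r_dir, recursive = TRUE)
  makevars <- file.path(dot_r_dir, "Makevars.win")
  existing <- if (file.exists(makevars)) readLines(makevars, warn = FALSE) else character(0)
  has_cxx14 <- any(grepl("^[[:space:]]*CXX14[[:space:]]*=", existing))
  if (!has_cxx14) {
    # The leading newline guards against a file that does not end in one.
    cat("\n", sprintf("CXX14 = \"%s\"", compiler), "\n",
        file = makevars, sep = "", append = TRUE)
  }
  invisible(makevars)
}

# Demonstration against a scratch directory, so no real configuration is touched.
scratch <- file.path(tempdir(), "dotR-demo")
makevars <- add_cxx14(scratch, "C:\\Rtools\\mingw_64\\bin\\g++.exe")
writeLines(readLines(makevars))
```

To apply it to the real configuration, the first argument would be file.path(Sys.getenv("HOME"), ".R"), as in the script from the question. Calling the helper twice leaves a single CXX14 line in the file.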

Error building vignettes for R package

On a new installation of R on Windows 10, I'm having a problem building vignettes for any package. I can manually use knitr to build my vignettes, but I get errors when running any of these commands:
devtools::install_github("hadley/devtools", build_vignettes = TRUE, force = TRUE)
devtools::check(document = FALSE)
devtools::install(build_vignettes = TRUE)
While my example uses devtools, I get an identical error with other packages, e.g.
devtools::install_github("CUD2V/pccc", build_vignettes = TRUE, force = TRUE)
I get output like the following:
devtools::install_github("hadley/devtools", build_vignettes = TRUE, force = TRUE)
Downloading GitHub repo hadley/devtools@master
from URL https://api.github.com/repos/hadley/devtools/zipball/master
Installing devtools
Downloading GitHub repo r-hub/rhub@master
from URL https://api.github.com/repos/r-hub/rhub/zipball/master
Installing rhub
Installation failed: install_packages(package_name, repos = remote$repos, type = remote$pkg_type, dependencies = NA, ..., quiet = quiet, out_dir = out_dir, skip_if_log_exists = skip_if_log_exists) : formal argument "repos" matched by multiple actual arguments
Installation failed: install_packages(package_name, repos = remote$repos, type = remote$pkg_type, dependencies = NA, ..., quiet = quiet, out_dir = out_dir, skip_if_log_exists = skip_if_log_exists) : formal argument "repos" matched by multiple actual arguments
Installation failed: install_packages(package_name, repos = remote$repos, type = remote$pkg_type, dependencies = NA, ..., quiet = quiet, out_dir = out_dir, skip_if_log_exists = skip_if_log_exists) : formal argument "repos" matched by multiple actual arguments
Installation failed: install_packages(package_name, repos = remote$repos, type = remote$pkg_type, dependencies = NA, ..., quiet = quiet, out_dir = out_dir, skip_if_log_exists = skip_if_log_exists) : formal argument "repos" matched by multiple actual arguments
Running command C:/PROGRA~1/R/R-34~1.3/bin/x64/Rcmd.exe
Arguments:
INSTALL
C:/Users/username/AppData/Local/Temp/RtmpIJj3Sj/devtools33e84f791ef2/r-hub-rhub-352458b
--library=C:/Users/username/Documents/R/win-library/3.4
--install-tests
ERROR: dependencies 'parsedate', 'prettyunits', 'rappdirs', 'whoami' are not available for package 'rhub'
* removing 'C:/Users/username/Documents/R/win-library/3.4/rhub'
In R CMD INSTALL
Installation failed: run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), stderr_line_callback = real_callback(stderr), stdout_callback = real_block_callback, stderr_callback = real_block_callback, echo_cmd = echo, echo = show, spinner = spinner, error_on_status = fail_on_status, timeout = timeout) : System command error
Running command C:/PROGRA~1/R/R-34~1.3/bin/x64/Rcmd.exe
Arguments:
build
C:\Users\username\AppData\Local\Temp\RtmpIJj3Sj\devtools33e814393089\hadley-devtools-7f5a683
--no-resave-data
--no-manual
* checking for file 'C:\Users\username\AppData\Local\Temp\RtmpIJj3Sj\devtools33e814393089\hadley-devtools-7f5a683/DESCRIPTION' ... OK
* preparing 'devtools':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ...Warning: running command '"C:/PROGRA~1/R/R-34~1.3/bin/x64/Rscript" --vanilla --default-packages= -e "tools::buildVignettes(dir = '.', tangle = TRUE)"' had status 1
ERROR
Error: file 'C:/Users/username/AppData/Local/Temp/RtmpU3YzGM/Rbuild2b8017d6415/devtools/DESCRIPTION' is not in valid DCF format
In addition: Warning message:
In read.dcf(dfile, keep.white = .keep_white_description_fields) :
cannot open compressed file 'C:/Users/username/AppData/Local/Temp/RtmpU3YzGM/Rbuild2b8017d6415/devtools/DESCRIPTION', probable reason 'Permission denied'
Execution halted
Installation failed: run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), stderr_line_callback = real_callback(stderr), stdout_callback = real_block_callback, stderr_callback = real_block_callback, echo_cmd = echo, echo = show, spinner = spinner, error_on_status = fail_on_status, timeout = timeout) : System command error
It appears that although the DESCRIPTION file specifies knitr, devtools::install_github and devtools::check are using tools::buildVignettes.
Also, if I just run
devtools::build_vignettes
no error is raised and the vignettes are built.
I also have a macOS machine running the current version of R and a Windows Server 2012 machine running R 3.3.x, and both can build vignettes and install packages without a similar error.
How do I work around these build errors?
@alistaire noticed the dependency errors. After resolving those, I still get the same problem when building packages with vignettes, checking packages, etc.:
> devtools::install_github("hadley/devtools", build_vignettes = TRUE, force = TRUE)
Downloading GitHub repo hadley/devtools@master
from URL https://api.github.com/repos/hadley/devtools/zipball/master
Installing devtools
Running command C:/PROGRA~1/R/R-34~1.3/bin/x64/Rcmd.exe
Arguments:
build
C:\Users\username\AppData\Local\Temp\RtmpQpK5Yi\devtools25f4138c52c3\hadley-devtools-7f5a683
--no-resave-data
--no-manual
* checking for file 'C:\Users\username\AppData\Local\Temp\RtmpQpK5Yi\devtools25f4138c52c3\hadley-devtools-7f5a683/DESCRIPTION' ... OK
* preparing 'devtools':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ...Warning: running command '"C:/PROGRA~1/R/R-34~1.3/bin/x64/Rscript" --vanilla --default-packages= -e "tools::buildVignettes(dir = '.', tangle = TRUE)"' had status 1
ERROR
Error: file 'C:/Users/username/AppData/Local/Temp/RtmpKe9Atv/Rbuild24501e281f1c/devtools/DESCRIPTION' is not in valid DCF format
In addition: Warning message:
In read.dcf(dfile, keep.white = .keep_white_description_fields) :
cannot open compressed file 'C:/Users/username/AppData/Local/Temp/RtmpKe9Atv/Rbuild24501e281f1c/devtools/DESCRIPTION', probable reason 'Permission denied'
Execution halted
Installation failed: run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), stderr_line_callback = real_callback(stderr), stdout_callback = real_block_callback, stderr_callback = real_block_callback, echo_cmd = echo, echo = show, spinner = spinner, error_on_status = fail_on_status, timeout = timeout) : System command error
Although the message references permission denied, these errors appear to occur regardless of user permissions. I've run as my regular user and as Administrator.

Can't create dplyr src backed by SparkSQL in dplyr.spark.hive package

Recently I found out about the great dplyr.spark.hive package, which enables dplyr frontend operations with a Spark or Hive backend.
The package's README explains how to install it:
options(repos = c("http://r.piccolboni.info", unlist(options("repos"))))
install.packages("dplyr.spark.hive")
and there are also many examples of how to work with dplyr.spark.hive once one is connected to hiveServer - check this.
But I am not able to connect to hiveServer, so I cannot benefit from the great power of this package...
I've tried the commands below, but they did not work. Does anyone have a solution or a comment on what I am doing wrong?
> library(dplyr.spark.hive,
+ lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
>
> Sys.setenv(SPARK_HOME = "/opt/spark-1.5.0-bin-hadoop2.4")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
>
> my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
>
> my_db = src_SparkSQL(host = 'jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl',
+ port = 10000)
Error in .jfindClass(as.character(driverClass)[1]) : class not found
>
> my_db = src_SparkSQL(start.server = TRUE)
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
>
> my_db = src_SparkSQL(start.server = TRUE,
+ list(spark.num.executors='5', spark.executor.cores='5', master="yarn-client"))
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
EDIT 2
I have set more paths in environment variables like this, but now I receive a warning that a Java logging configuration is not specified, although I think it is:
> library(dplyr.spark.hive,
+ lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
3: package ‘SparkR’ was built under R version 3.2.1
>
> Sys.setenv(SPARK_HOME = "/opt/spark-1.5.0-bin-hadoop2.4")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HADOOP_HOME="/usr/share/hadoop")
> Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop")
> Sys.setenv(PATH='/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/share/hadoop/bin:/opt/hive/bin')
>
>
> my_db = src_SparkSQL()
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
My log properties are not empty.
-bash-4.2$ wc /etc/hadoop/log4j.properties
179 432 6581 /etc/hadoop/log4j.properties
EDIT 3
My exact call to src_SparkSQL() is
> detach("package:SparkR", unload=TRUE)
Warning message:
package ‘SparkR’ was built under R version 3.2.1
> detach("package:dplyr", unload=TRUE)
> library(dplyr.spark.hive, lib.loc = '/opt/wpusers/mkosinski/R/x86_64-redhat-linux-gnu-library/3.1')
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> my_db = src_SparkSQL()
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
And then the process never finishes.
Those settings do work for beeline with these parameters:
beeline -u "jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl" -n mkosinski --outputformat=tsv --incremental=true -f sql_statement.sql > sql_output
but I am not able to pass the user name and dbname to src_SparkSQL(),
so I have tried to manually use the code from inside that function, but I get the same problem: the code below also does not finish
host = 'tools-1.hadoop.srv'
port = 10000
driverclass = "org.apache.hive.jdbc.HiveDriver"
Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
library(RJDBC)
dr = JDBC(driverclass, Sys.getenv("HADOOP_JAR"))
url = paste0("jdbc:hive2://", host, ":", port)
class = "Hive"
con.class = paste0(class, "Connection") # class = "Hive"
# dbConnect_retry =
# function(dr, url, retry){
# if(retry > 0)
# tryCatch(
# dbConnect(drv = dr, url = url),
# error =
# function(e) {
# Sys.sleep(0.1)
# dbConnect_retry(dr = dr, url = url, retry - 1)})
# else dbConnect(drv = dr, url = url)}
#################
##con = new(con.class, dbConnect_retry(dr, url, retry = 100))
#################
con = new(con.class, dbConnect(dr, url, user = "mkosinski", dbname = "loghost"))
Maybe the URL should also contain /loghost, the dbname?
I now see that you tried multiple things with multiple errors. Let me comment error by error.
my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
The RJDBC object could not be created. Unless we solve this, nothing else will work, workarounds or not. Have you set HADOOP_JAR with, for instance,
Sys.setenv(HADOOP_JAR = "../spark/assembly/target/scala-2.10/spark-assembly-1.5.0-hadoop2.6.0.jar"). Sorry I seem to have skipped this in the instructions. Will fix.
my_db = src_SparkSQL(host = 'jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl',
+ port = 10000)
Error in .jfindClass(as.character(driverClass)[1]) : class not found
Same problem. Please note that the host and port arguments do not accept URL syntax, just a host and a port. The URL is formed internally.
my_db = src_SparkSQL(start.server = TRUE)
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
Stop thriftserver first or connect to existing one, but you still have to fix the class not found problem.
my_db = src_SparkSQL(start.server = TRUE,
+ list(spark.num.executors='5', spark.executor.cores='5', master="yarn-client"))
Error in start.server() :
Couldn't start thrift server:org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 37580. Stop it first.
In addition: Warning message:
running command 'cd /opt/tech/prj_bdc/pmozie_status/user_topics;/opt/spark-1.5.0-bin-hadoop2.4/sbin/start-thriftserver.sh ' had status 1
Same as above.
Plan:
1. Set HADOOP_JAR. Find the host and port of the running thriftserver, if not default. Try src_SparkSQL with start.server = FALSE. If happy, quit; else go to step 2.
2. Stop the existing thriftserver. Try src_SparkSQL again with start.server = TRUE.
Let me know how things go.
The problem was that I didn't specify the proper classPath needed by the JDBC function that creates the driver. In the dplyr.spark.hive package, the classPath is passed via the HADOOP_JAR environment variable.
To use JDBC as a driver to hiveServer2 (through the Thrift protocol), one needs to add at least these 3 .jars with Java classes to create a proper driver:
hive-jdbc-1.0.0-standalone.jar
hadoop/common/lib/commons-configuration-1.6.jar
hadoop/common/hadoop-common-2.4.1.jar
The versions are arbitrary and should be compatible with the installed versions of the local Hive, Hadoop and hiveServer2.
The paths need to be joined with .Platform$path.sep (as described here)
classPath = c("system_path1_to_hive/hive/lib/hive-jdbc-1.0.0-standalone.jar",
              "system_path1_to_hadoop/hadoop/common/lib/commons-configuration-1.6.jar",
              "system_path1_to_hadoop/hadoop/common/hadoop-common-2.4.1.jar")
Sys.setenv(HADOOP_JAR = paste0(classPath, collapse = .Platform$path.sep))
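Since a wrong jar path only surfaces later as "class not found", it may be worth checking the files before setting HADOOP_JAR. This is a minimal sketch under stated assumptions: build_classpath() is a hypothetical helper, and the demonstration uses empty placeholder files in a temporary directory instead of the real Hive and Hadoop jars.

```r
# build_classpath() is a hypothetical helper: it fails early if a jar is missing
# and otherwise joins the paths with the platform's path separator.
build_classpath <- function(jars) {
  missing_jars <- jars[!file.exists(jars)]
  if (length(missing_jars) > 0) {
    stop("Jar file(s) not found: ", paste(missing_jars, collapse = ", "))
  }
  paste(jars, collapse = .Platform$path.sep)
}

# Demonstration with empty placeholder files standing in for the real jars.
demo_dir <- file.path(tempdir(), "jars-demo")
dir.create(demo_dir, showWarnings = FALSE)
jars <- file.path(demo_dir,
                  c("hive-jdbc.jar", "commons-configuration.jar", "hadoop-common.jar"))
invisible(file.create(jars))

Sys.setenv(HADOOP_JAR = build_classpath(jars))
print(Sys.getenv("HADOOP_JAR"))
```

With the real jar paths in place of the placeholders, a typo in any of them stops with an explicit message instead of a later driver error.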
Then, once HADOOP_JAR is set, one has to be careful with the hiveServer2 URL. In my case it had to be
host = 'tools-1.hadoop.srv'
port = 10000
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
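The URL construction above can be wrapped in a small function so that the database name and session parameters are not forgotten. This is only an illustrative sketch: hive_jdbc_url() is a hypothetical helper, and the host, port, database and auth=noSasl values are the ones from this particular setup.

```r
# hive_jdbc_url() is a hypothetical helper that assembles a HiveServer2 JDBC URL.
hive_jdbc_url <- function(host, port, dbname = "", params = character(0)) {
  url <- sprintf("jdbc:hive2://%s:%d/%s", host, as.integer(port), dbname)
  if (length(params) > 0) {
    # Session parameters follow the database name, separated by semicolons.
    url <- paste0(url, ";", paste(params, collapse = ";"))
  }
  url
}

url <- hive_jdbc_url("tools-1.hadoop.srv", 10000, "loghost", "auth=noSasl")
print(url)
```

The result has the same shape as the URL that works with beeline in the question.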
And finally, the proper connection to hiveServer2 using the RJDBC package is
Sys.setenv(HADOOP_HOME="/usr/share/hadoop/share/hadoop/common/")
Sys.setenv(HIVE_HOME = '/opt/hive/lib/')
host = 'tools-1.hadoop.srv'
port = 10000
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
driverclass = "org.apache.hive.jdbc.HiveDriver"
library(RJDBC)
.jinit()
dr2 = JDBC(driverclass,
classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
#"/opt/hive/lib/commons-configuration-1.6.jar",
"/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
"/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar"),
identifier.quote = "`")
url = paste0("jdbc:hive2://", host, ":", port, "/loghost;auth=noSasl")
dbConnect(dr2, url, username = "mkosinski") -> cont
