Error with dbConnect to Snowflake via Rscript (but not R Studio) - r

I have successfully connected to and queried Snowflake from RStudio using an ODBC driver. When I try the code in Rgui.exe, it also works. However, in Rterm (or when calling Rscript from a batch script), it does not. Rterm returns the following error:
OOB curl_easy_perform() failed: SSL peer certificate or SSH remote key was not OK
My R code is:
library(DBI)
library(methods)
username <- keyring::key_list("my-snowflake")[1,2]
password <- keyring::key_get("my-snowflake", keyring::key_list("my-snowflake")[1,2])
### connect to EDW
con_snowflake <- dbConnect(
odbc::odbc(),
"EDW_sample",
uid=username,
pwd=password)

I switched from using ODBC to JDBC.
library(RJDBC)
jdbcDriver <- JDBC(driverClass="com.snowflake.client.jdbc.SnowflakeDriver", classPath = "..\\java\\snowflake-jdbc-3.7.2.jar")
con_snowflake <- dbConnect(jdbcDriver, "jdbc:snowflake://xxx.snowflakecomputing.com/", keyring::key_list("my-snowflake")[1,2], keyring::key_get("my-snowflake", keyring::key_list("my-snowflake")[1,2]), db="db_name", schema="schema_name")
### read in data
query = readr::read_file("...\\query.sql")
df <- DBI::dbGetQuery(con_snowflake, query)
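Because the same code behaves differently in RStudio and under Rscript, it can help to compare what each front end actually sees. The following is a minimal sketch using only base R. The function name `session_snapshot` and the list of variables are illustrative assumptions, not something from the original post.

```r
# Hypothetical helper: collect settings that commonly differ between
# RStudio, Rgui and Rscript sessions.
session_snapshot <- function(vars = c("PATH", "HOME", "R_LIBS_USER",
                                      "SSL_CERT_FILE", "CURL_CA_BUNDLE")) {
  # Named character vector; unset variables come back as NA.
  env <- Sys.getenv(vars, unset = NA, names = TRUE)
  c(
    sprintf("interactive: %s", interactive()),
    sprintf("working dir: %s", getwd()),
    sprintf("lib path: %s", .libPaths()),
    sprintf("%s=%s", names(env), env)
  )
}

# Write the snapshot to a file so two runs can be compared side by side.
out_file <- file.path(tempdir(), "session_snapshot.txt")
writeLines(session_snapshot(), out_file)
```

Running this once from RStudio and once from Rscript, then diffing the two files, may show whether the sessions differ in search path or certificate-related settings.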

Related

dbConnect works in Rstudio.exe and RGui.exe but fails as an executable in RScript.exe

I am trying to create an Rscript file that can be run like an executable. I have R code that connects to a Microsoft Azure SQL Server database which uses Active Directory password authentication, queries the database, and writes a CSV report. I created a DSN for the database and have used the following code to successfully connect to the database in both 32-bit and 64-bit environments of both RStudio.exe and RGui.exe:
library(DBI)
library(tidyverse)
library(profvis)
show("Library Installed...")
pause(2)
CON <- dbConnect(odbc::odbc(), "My_DSN", uid = "UserName", pwd = "Password", timeout = 10)
show("Database Connected...")
pause(2)
SQL <- "SELECT * FROM Table"
DATA <- dbGetQuery(CON, SQL)
show("Data Extracted...")
pause(2)
NAME = unique(DATA$Name)
DATA.INDIVIDUAL = list()
for (i in NAME){
DATA.INDIVIDUAL[[i]] <- DATA %>% filter(Name == i) %>% select("Field1", "Field2", "Field3")
write.csv(DATA.INDIVIDUAL[[i]], paste("C:/My Documents/", i, "/Report.csv", sep = ""), row.names = FALSE)
show(paste("Exported",i))
pause(2)
}
I have also connected by explicitly naming the database with
library(DBI)
con <- dbConnect(odbc::odbc(), uid = "UserName", pwd = "Password", Driver = "ODBC Driver 17 for SQL Server", Server = "ServerName", Database = "DBName", Authentication = "ActiveDirectoryPassword")
show("Database Connected...")
However, when I use RScript.exe to run the same code (both versions), NULL is printed to the RScript command line (before the output "Database Connected..." is printed) and the application exits without completing the rest of my code. Why would Rstudio and RGui connect but RScript fail to connect? Why is there no error, just NULL printed?
Appreciate any help!
For future reference, I found the cause. The problem is in the ODBC driver: in applications without a user interface, CoInitialize is not called.
See
https://github.com/r-dbi/odbc/issues/343
Edit: updating the driver solved the problem
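On the "no error, just NULL" symptom: a script can be made to stop with a visible message whenever the connection step does not yield a usable object. This is only a sketch under assumptions. `connect_or_stop` and `fake_connect` are hypothetical names, and the stand-in connector replaces the real `dbConnect()` call so the example runs without a database.

```r
# Hypothetical wrapper: run a connection function and fail loudly.
# `connect` is any zero-argument function that returns a connection,
# e.g. function() DBI::dbConnect(odbc::odbc(), "My_DSN").
connect_or_stop <- function(connect) {
  con <- tryCatch(
    connect(),
    error = function(e) {
      message("Connection failed: ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(con)) {
    stop("No connection object was returned; aborting.", call. = FALSE)
  }
  con
}

# Stand-in connector used here so the sketch runs without a database.
fake_connect <- function() stop("driver not available")
result <- tryCatch(connect_or_stop(fake_connect),
                   error = function(e) conditionMessage(e))
print(result)
```

This does not fix a driver-level problem such as the one described above, but it makes the failure point explicit in the Rscript output.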

Athena Connection with R

I am new to Athena and want to connect to it from R.
Sys.getenv()
URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC42_2.0.14.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")
This is the error message
Error in .jfindClass(as.character(driverClass)[1]) :
java.lang.ClassNotFoundException
Referred this article
https://aws.amazon.com/blogs/big-data/running-r-on-amazon-athena/
con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.ap-south-1.amazonaws.com:443/',
s3_staging_dir="s3://aws-athena-query-results-ap-south-1-region/",
user=("xxx"),
password=("xxx"))
I need help; I have been struggling with this for two days.
Thanks in advance. I have downloaded the jar files and Java.
You are using a newer driver version. The driver is now developed by Simba, so the driver class name has changed.
The driver class is now com.simba.athena.jdbc.Driver.
You may also want to check out AWR.Athena - A nice R package to interact with Athena.
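When a ClassNotFoundException appears, one way to confirm which driver class a given jar actually ships is to list its entries, since a jar is a zip archive. The sketch below is an illustration only. Both helper names are hypothetical, and the demonstration works on entry names rather than on a downloaded jar.

```r
# Hypothetical helper: turn jar entry names into fully qualified class names.
entries_to_classes <- function(entries) {
  cls <- entries[grepl("\\.class$", entries)]
  gsub("/", ".", sub("\\.class$", "", cls), fixed = TRUE)
}

# Hypothetical helper: list classes in a jar whose name ends in "Driver".
find_driver_classes <- function(jar) {
  entries <- utils::unzip(jar, list = TRUE)$Name
  classes <- entries_to_classes(entries)
  classes[grepl("Driver$", classes)]
}

# Demonstration on entry names only (no jar file needed).
demo <- entries_to_classes(c("com/simba/athena/jdbc/Driver.class",
                             "META-INF/MANIFEST.MF"))
print(demo)
```

With a real file, `find_driver_classes(fil)` would be called on the downloaded jar, and the result compared with the `driverClass` argument passed to `JDBC()`.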
If you are still having difficulty with the JDBC drivers for Athena, you could always try RAthena or noctua. These two packages use an AWS SDK to make the connection to AWS Athena instead.
RAthena uses the Python boto3 SDK (similar to pyathena) through reticulate.
noctua uses the R paws SDK.
Code Example:
library(DBI)
# connect to AWS
# using ~/.aws/credentials to store aws credentials
con <- dbConnect(RAthena::athena(),
s3_staging_dir = "s3://mybucket/")
# upload some data into aws athena
dbWriteTable(con, "iris", iris)
# query iris in aws athena
dbGetQuery(con, "select * from iris")
NOTE: noctua works exactly the same way as the code example above, except that the driver is noctua::athena()

Error in dbDriver("PostgreSQL") : could not find function "dbDriver"

I have a shiny-server set up on an Amazon Web Services instance. I am trying to get my app.R onto it but am getting this error:
Error in dbDriver("PostgreSQL") : could not find function "dbDriver"
Calls: runApp ... sourceUTF8 -> eval -> eval -> ..stacktraceon.. -> get_query
Execution halted
I think it has to do with the library install of the package DBI, but I've tried installing it again on the instance and haven't been successful.
Not sure what to try next..
Here's the whole image of the error and I can add any other information required:
Also I can confirm that the shiny-server is installed correctly because this page loads normally:
This is how I've tried to install my packages in the instance:
sudo su - -c "R -e \"install.packages(c('shiny', 'shinythemes', 'shinycssloaders', 'dplyr', 'xlsx','ggplot2','ggthemes','DT','stringr','RPostgreSQL','tidyr','dbplyr', 'DBI','splitstackshape'), repos='http://cran.rstudio.com/')\""
and dbDriver is a function in the DBI package
This is part of what my app.R code contains:
required_packages <- c("shiny", "shinythemes", "shinycssloaders", "dplyr", "xlsx","ggplot2","ggthemes","DT","stringr","RPostgreSQL","tidyr","dbplyr","DBI","splitstackshape"
,"magrittr","tidyverse","shinyjs","data.table","plotly")
absent_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(absent_packages)) install.packages(absent_packages)
set.seed(1)
get_query <- function(querystring){
# create a connection
# loads the PostgreSQL driver
drv <- dbDriver("PostgreSQL")
# creates a connection to the postgres database
# note that "con" will be used later in each connection to the database
con <- dbConnect(drv, dbname = "postgres", host = "/var/run/postgresql", port = 5432, user = "postgres", password = "pw")
on.exit(dbDisconnect(con))
#rstudioapi::askForPassword("Database password")
query <- eval(parse(text = querystring))
return(query)
}
And these are the tables and connection info to the postgreSQL database on the same instance:
If I add DBI:: in front of dbConnect() and dbDisconnect() and used RPostgres::Postgres() as the driver in the dbConnect function I get this error:
Installing a package does not mean it is loaded into your namespace. Further, the use of dbDriver is deprecated, as shown in ?dbDriver:
These methods are deprecated, please consult the documentation of the individual backends for the construction of driver instances.
I suggest either explicitly loading DBI or using DBI:: with each call to its functions (not a bad idea anyway):
library(DBI)
get_query <- function(querystring){
# create a connection
# save the password that we can "hide" it as best as we can by collapsing it
# creates a connection to the postgres database
# note that "con" will be used later in each connection to the database
con <- DBI::dbConnect(RPostgres::Postgres(), dbname = "postgres", host = "/var/run/postgresql", port = 5432, user = "postgres", password = "pw")
on.exit(DBI::dbDisconnect(con))
#rstudioapi::askForPassword("Database password")
query <- eval(parse(text = querystring))
return(query)
}
(Again, you don't need to do both library(DBI) and use DBI::, you choose.)
I used RPostgres::Postgres() here, but this applies also to many other drivers, including RPostgreSQL::PostgreSQL(), RSQLite::SQLite(), and odbc::odbc() (several others exist).
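The difference between "installed" and "attached" can be checked directly. The following is a minimal sketch. `attach_required` is a hypothetical helper, and base packages are used in the demonstration so that it runs anywhere.

```r
# Hypothetical helper: attach every package in `pkgs`, stopping with a clear
# message if any of them is not installed.
attach_required <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    stop("Not installed: ", paste(pkgs[!installed], collapse = ", "),
         call. = FALSE)
  }
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  # TRUE for each package that is now on the search path.
  invisible(paste0("package:", pkgs) %in% search())
}

# Base packages stand in for the app's real package vector here.
attached <- attach_required(c("stats", "utils"))
print(attached)
```

In the app, the vector of required package names (including "DBI") would be passed instead, so that functions such as dbConnect() are visible without a DBI:: prefix.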
Further points, though I don't know what else you have going on here to be certain:
making a connection each time you call this function can get "expensive"; consider connecting outside of this function and passing in your con object; if this is a one-or-two-times thing, then you might be alright as-is;
the use of eval(parse(...)) seems wrong ... executing user-provided queries is flat-out dangerous, look up "SQL Injection" if you are not familiar. Why not just DBI::dbGetQuery(con, querystring)?
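The two points above (reuse one connection, and send SQL through DBI rather than eval(parse(...))) can be sketched as follows. This is an illustration under assumptions. It uses an in-memory SQLite database via the RSQLite package as a stand-in for PostgreSQL, and the reshaped `get_query()` signature is a suggestion rather than the original author's code.

```r
library(DBI)

# Hypothetical variant of get_query(): the caller supplies the connection,
# and values are bound as parameters rather than pasted into the SQL text.
get_query <- function(con, sql, params = NULL) {
  DBI::dbGetQuery(con, sql, params = params)
}

# In-memory SQLite stands in for PostgreSQL so the sketch is self-contained.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "iris", iris)
res <- get_query(con,
                 "SELECT COUNT(*) AS n FROM iris WHERE Species = ?",
                 params = list("setosa"))
DBI::dbDisconnect(con)
print(res$n)
```

Placeholder syntax depends on the backend: SQLite accepts `?`, whereas RPostgres expects `$1`, `$2`, and so on.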

Connecting to Hive in R

I am trying to connect to hive in R. I have loaded RJDBC and rJava libraries on my R env.
I am using a Linux server with hadoop (hortonworks sandbox 2.1) and R (3.1.1) installed in the same box. This is the script I am using to connect:
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/default")
I get this error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hive.service.auth.HiveAuthFactory
I have checked that my classpath contains all the jar files in /usr/lib/hive and /usr/lib/hadoop, but cannot be sure whether anything else is missing. Any idea what is causing the problem?
I am fairly new to R (and programming for that matter) so any specific steps are much appreciated.
I successfully connected to Hive from R with just RJDBC and a few configuration lines. I prefer RJDBC to rHive because rHive needs complex installations on all the nodes of the cluster (and I don't really understand why).
Here is my R solution :
#loading libraries
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar", "/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/libthrift-0.9.2.jar", "/usr/lib/hive/lib/hive-service.jar", "/usr/lib/hive/lib/httpclient-4.2.5.jar", "/usr/lib/hive/lib/httpcore-4.2.5.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
#init of the connection to Hive server
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
#working with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases
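Since jar names with version numbers differ between installations, it may be worth verifying that every classpath entry exists before calling .jinit(). Below is a minimal sketch using base R. `missing_jars` is a hypothetical helper and the paths in the demonstration are temporary stand-ins.

```r
# Hypothetical helper: return the classpath entries that do not exist on disk.
missing_jars <- function(cp) {
  cp[!file.exists(cp)]
}

# Demonstration with one real (temporary) file and one made-up path.
real_jar <- tempfile(fileext = ".jar")
file.create(real_jar)
cp <- c(real_jar, file.path(tempdir(), "does-not-exist.jar"))
missing <- missing_jars(cp)
if (length(missing) > 0) {
  message("Missing from classpath: ", paste(basename(missing), collapse = ", "))
}
```

A missing entry does not raise an error when the classpath is set, so a check like this can surface the problem before it shows up as a NoClassDefFoundError at connection time.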
You can simply connect to HiveServer2 from R using the RHive package.
Below are the commands that I had to use.
library(RHive)
Sys.setenv(HIVE_HOME="/usr/local/hive")
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
rhive.env(ALL=TRUE)
rhive.init()
rhive.connect("localhost")

Connectivity between R and Hive

I am trying to establish a connection between RStudio (on my machine) and Hive (which is setup on a different server). Here's my R code:
install.packages("RJDBC",dep=TRUE)
require(RJDBC)
drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
classPath = list.files("C:/Users/37/Downloads/hive-jdbc-0.10.0.jar",
pattern="jar$",full.names=T),
identifier.quote="'")
Here is the error I get while executing the above commands:
Error in .jfindClass(as.character(driverClass)[1]) : class not found
conn <- dbConnect(drv, "jdbc:hive2://65.11.23.453:10000/default", "admin", "admin")
I downloaded the jar files from here and placed them in the CLASSPATH. Please advise if I am doing anything wrong and how I could get this to work.
Thanks.
If you are on Cloudera, check your version and download the jars for it.
Example
CDH 5.9.1
hadoop-common-2.6.0-cdh5.9.1.jar
hive-jdbc-1.1.1-standalone.jar
Copy the jars into a folder on the R host and execute:
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/home/youruser/rlibs/hadoop-common-2.6.0-cdh5.9.1.jar", "/home/youruser/rlibs/hive-jdbc-1.1.1-standalone.jar")
.jinit(classpath=cp)
#initialize the connection
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/home/youruser/rlibs/hive-jdbc-1.1.1-standalone.jar", identifier.quote="`")
conn <- dbConnect(drv,"jdbc:hive2://HiveServerHostInYourCluster:10000/default;", "YourUserHive", "xxxx")
#working with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases
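One detail in the question's code may be worth checking: list.files() expects a directory, and per its documentation a path that is not a directory is skipped, so pointing it at the jar file itself yields an empty classpath. The sketch below illustrates this with a temporary directory. The file and directory names are made up.

```r
# Create a temporary directory holding one (empty) stand-in jar file.
jar_dir <- file.path(tempdir(), "jars_demo")
dir.create(jar_dir, showWarnings = FALSE)
jar_file <- file.path(jar_dir, "hive-jdbc-demo.jar")
file.create(jar_file)

# Pointing list.files() at the file itself finds nothing ...
from_file <- list.files(jar_file, pattern = "jar$", full.names = TRUE)
# ... while pointing it at the containing directory finds the jar.
from_dir <- list.files(jar_dir, pattern = "jar$", full.names = TRUE)

print(length(from_file))
print(basename(from_dir))
```

If that is what happened, passing either the jar path directly or the result of list.files() on its folder as classPath would be the things to try.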
I tried this sample code and it worked for me:
library(RJDBC)
#Load Hive JDBC driver
hivedrv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver",
c(list.files("/home/amar/hadoop/hadoop",pattern="jar$",full.names=T),
list.files("/home/amar/hadoop/hive/lib",pattern="jar$",full.names=T)))
#Connect to Hive service
hivecon <- dbConnect(hivedrv, "jdbc:hive://ip:port/default")
query = "select * from mytable LIMIT 10"
hres <- dbGetQuery(hivecon, query)
The same error happened to me earlier when I was trying to use RJDBC to connect to Cassandra. It was solved by putting the Cassandra JDBC dependencies on the Java classpath.
See this answer:
For anyone who finds this post there are a couple things you can try to fix the problem:
1.) Reinstall rJava from source: install.packages("rJava", repos="http://rforge.net/", type="source")
2.) Enable debug output from the Java class loader and try to connect again
.jclassLoader()$setDebug(1L)
3.) I've had to both call Sys.setenv(JAVA_HOME = '/Path/to/java') beforehand and use dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/server/libjvm.dylib') to find the right JVM library.
4.) As stated in "rJava load error in RStudio/R after "upgrading" to OSX Yosemite", you can also create a link from libjvm.dylib to /usr/local/lib
sudo ln -f -s $(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib /usr/local/lib
If all of these fail, uninstalling and reinstalling R has also worked for me in the past.
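Before trying the heavier steps, a quick check of JAVA_HOME from within R can rule out the simplest cause. This is a sketch using base R only. `check_java_home` is a hypothetical helper, and it only tests that the directory exists, not that it contains a working JVM.

```r
# Hypothetical helper: report whether JAVA_HOME is set and points to a directory.
check_java_home <- function() {
  jh <- Sys.getenv("JAVA_HOME", unset = "")
  if (!nzchar(jh)) return("JAVA_HOME is not set")
  if (!dir.exists(jh)) return(paste("JAVA_HOME does not exist:", jh))
  paste("JAVA_HOME looks usable:", jh)
}

print(check_java_home())
```

Running the same check under Rscript and under RStudio can also show whether the two sessions see different values.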
This has helped me so far.
1) First check whether the Hive service is running; if not, restart it.
sudo service hive-server2 status
sudo service hive-server2 restart
2) Install rJava and RJDBC in R.
options(java.parameters = '-Xmx8g')  # set JVM options before the JVM is started
library(rJava)
library(RJDBC)
hadoop_jar_dirs <- c('/usr/lib/hadoop/lib',
'/usr/lib/hadoop',
'/usr/lib/hive/lib')
clpath <- c()
for (d in hadoop_jar_dirs) {
clpath <- c(clpath, list.files(d, pattern = 'jar', full.names = TRUE))
}
.jinit(classpath = clpath)
.jaddClassPath(clpath)
hive_jdbc_jar <- '/usr/lib/hive/lib/hive-jdbc-2.1.1.jar'
hive_driver <- 'org.apache.hive.jdbc.HiveDriver'
hive_url <- 'jdbc:hive2://localhost:10000/default'
drv <- JDBC(hive_driver, hive_jdbc_jar)
conn <- dbConnect(drv, hive_url)
show_databases <- dbGetQuery(conn, "show databases")
show_databases
Make sure to give correct path to hadoop_jar_dirs, hive_jdbc_jar and hive_driver.
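The jar-collecting loop above can also be written without growing a vector, and with an anchored pattern so that only files ending in .jar are picked up (the unanchored pattern 'jar' would also match a name such as jarring-notes.txt). The following is a sketch. `collect_jars` is a hypothetical helper and the demonstration uses a temporary directory.

```r
# Hypothetical helper: gather all *.jar files from several directories.
collect_jars <- function(dirs) {
  as.character(unlist(
    lapply(dirs, list.files, pattern = "\\.jar$", full.names = TRUE)
  ))
}

# Demonstration with a temporary directory containing a jar and a non-jar.
demo_dir <- file.path(tempdir(), "classpath_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("hive-demo.jar", "jarring-notes.txt")))
clpath <- collect_jars(demo_dir)
print(basename(clpath))
```

The result can be passed to .jinit(classpath = clpath) in the same way as the vector built by the loop.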
I wrote a package for dealing with this (and kerberos):
devtools::install_github('nfultz/hiveuberjar')
require(DBI)
con <- dbConnect(hiveuberjar::HiveUber(), url="jdbc://host:port/schema")
dbListTables(con)
dbGetQuery(con, "Select count(*) from nfultz.iris limit 10")

Resources