ODBC Hive connection makes RStudio crash // Connections pane issue?

I use RStudio Server 1.1.453 / R 3.5.2, and when I try to initiate a Hive connection through ODBC, RStudio crashes.
The code I run:
library(odbc)
library(DBI)

con <- DBI::dbConnect(odbc(),
                      Driver = "/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so",
                      Host = "myserver",
                      Port = 10000,
                      Schema = "default",
                      UseSASL = 0,
                      AuthMech = 3,
                      UID = "myuser",
                      PWD = "mypassword",
                      TrustedCerts = "/home/centos/truststore.pem",
                      AllowSelfSignedServerCert = 1,
                      SSL = 1)
dbGetQuery(con, "show databases")
The crash message (pretty generic, isn't it?):
[screenshot of the generic RStudio crash dialog]
The weirdest thing is that if I run the same code directly in a terminal in another R session, or inside a reprex() call, I can query Hive tables right after the connection is made.
So, my questions:
As an intuitive first step, I'd like to avoid any interaction with the RStudio Connections pane. Is there a way to initiate such a connection without it appearing or producing results in the Connections pane?
Is there any other solution I could test?
How can I log what RStudio tries to do when I run the code?
Thanks
Note: I have no issue establishing an Impala connection with the implyr package.

I found a workaround with the callr package, which is not so bad considering I only run basic queries in Hive from RStudio (CREATE TABLE and the like).
callr::r(function() {
  library(odbc)
  con <- dbConnect(odbc::odbc(),
                   Driver = '/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so',
                   Host = 'server',
                   Port = 10000,
                   Schema = 'default',
                   AuthMech = 3,
                   UID = '',
                   PWD = '',
                   TrustedCerts = '/home/centos/truststore.pem',
                   AllowSelfSignedServerCert = 1,
                   SSL = 1)
  dbGetQuery(con, 'SHOW DATABASES')
})
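Another thing that might be worth testing, given the suspicion about the Connections pane: RStudio wires the pane in through the `connectionObserver` option, so clearing it before connecting should stop odbc from notifying the pane at all. A minimal, untested sketch:

```r
# The Connections pane hooks in through the "connectionObserver" option;
# clearing it should make odbc::dbConnect() skip the pane notification.
old_observer <- getOption("connectionObserver")
options(connectionObserver = NULL)

# con <- DBI::dbConnect(odbc::odbc(), Driver = "...", Host = "...")  # connect as usual

# options(connectionObserver = old_observer)  # restore the pane later if desired
```

If the crash goes away with the observer cleared, that would confirm the pane integration is the culprit.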

Related

Connecting to Azure Databricks from R using jdbc and sparklyr

I'm trying to connect my on-premise R environment to an Azure Databricks backend using sparklyr and jdbc. I need to perform operations in databricks and then collect the results locally. Some limitations:
No RStudio available, only a terminal
No databricks-connect. Only odbc or jdbc.
The configuration with odbc + dplyr is working, but it seems too complicated, so I would like to use jdbc and sparklyr. RJDBC also works, but it would be great to have the tidyverse available for data manipulation; for that reason I would like to use sparklyr.
I have the jar file for Databricks (DatabricksJDBC42.jar) in my current directory. I downloaded it from: https://www.databricks.com/spark/jdbc-drivers-download. This is what I have so far:
library(sparklyr)

config <- spark_config()
config$`sparklyr.shell.driver-class-path` <- "./DatabricksJDBC42.jar"

# something in the configuration should be wrong
sc <- spark_connect(master = "https://adb-xxxx.azuredatabricks.net/?o=xxxx",
                    method = "databricks",
                    config = config)

spark_read_jdbc(sc, "table",
                options = list(
                  url = "jdbc:databricks://adb-{URL}.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/{ORG_ID}/{CLUSTER_ID};AuthMech=3;UID=token;PWD={PERSONAL_ACCESS_TOKEN}",
                  dbtable = "table",
                  driver = "com.databricks.client.jdbc.Driver"))
This is the error:
Error: java.lang.IllegalArgumentException: invalid method toDF for object 17/org.apache.spark.sql.DataFrameReader fields 0 selected 0
My intuition is that sc might not be working. Maybe a problem with the master parameter?
PS: this is the solution that works via RJDBC
databricks_jdbc <- function(address, port, organization, cluster, token) {
  location <- Sys.getenv("DATABRICKS_JAR")
  driver <- RJDBC::JDBC(driverClass = "com.databricks.client.jdbc.Driver",
                        classPath = location)
  con <- DBI::dbConnect(driver,
                        sprintf("jdbc:databricks://%s:%s/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/%s/%s;AuthMech=3;UID=token;PWD=%s",
                                address, port, organization, cluster, token))
  con
}
DATABRICKS_JAR is an environment variable with the path "./DatabricksJDBC42.jar"
Then I can use DBI::dbSendQuery(), etc.
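For completeness, a hypothetical call to the helper above (every identifier below is a placeholder); the sprintf line mirrors the JDBC URL the helper builds internally:

```r
# Placeholder values only; this mirrors the URL built inside databricks_jdbc().
url <- sprintf(
  "jdbc:databricks://%s:%s/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/%s/%s;AuthMech=3;UID=token;PWD=%s",
  "adb-xxxx.azuredatabricks.net", "443", "{ORG_ID}", "{CLUSTER_ID}", "{PERSONAL_ACCESS_TOKEN}"
)

# Sys.setenv(DATABRICKS_JAR = "./DatabricksJDBC42.jar")
# con <- databricks_jdbc("adb-xxxx.azuredatabricks.net", "443",
#                        "{ORG_ID}", "{CLUSTER_ID}", "{PERSONAL_ACCESS_TOKEN}")
# DBI::dbGetQuery(con, "SELECT 1")
```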
Thanks. I tried multiple configurations for master. So far I know that the JDBC string "jdbc:databricks:..." itself works, since the JDBC connection succeeds as shown in the code of the PS section.
Configure RStudio with Azure Databricks: go to the cluster -> Apps -> set up Azure RStudio.
For more information, refer to this third-party link; it has detailed information about connecting Azure Databricks with R.
Alternative approach in Python:
Code:
Server_name = "vamsisql.database.windows.net"
Database = "<database_name>"
Port = "1433"
user_name = "<user_name>"
Password = "<password>"
jdbc_Url = "jdbc:sqlserver://{0}:{1};database={2}".format(Server_name, Port, Database)
conProp = {
    "user": user_name,
    "password": Password,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
df = spark.read.jdbc(url=jdbc_Url, table="<table_name>", properties=conProp)
display(df)

Unknown error when trying to connect to postgresql from R with RPostgresql [duplicate]

I'm new to R and I'm trying to connect to PostgreSQL using RStudio.
I've installed RPostgreSQL and tried the following code:
> library("DBI", lib.loc="~/R/win-library/3.2")
> library("RPostgreSQL", lib.loc="~/R/win-library/3.2")
> con <- dbConnect(dbDriver("PostgreSQL"), dbname="Delta", user="postgres")
Error in postgresqlNewConnection(drv, ...) :
RS-DBI driver: (could not connect postgres#local on dbname "Delta"
I'm not able to connect to the database for some reason. I've been trying to solve this issue for a long time and couldn't figure it out.
My solution to this problem is to use RPostgres https://github.com/rstats-db/RPostgres.
Assuming you have a connection url, the following code will work:
library(DBI)
library(RPostgres)

con <- dbConnect(RPostgres::Postgres(),
                 host = url$host,
                 port = url$port,
                 dbname = url$dbname,
                 user = url$user,
                 password = url$password)
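Note that the snippet assumes a `url` object holding already-parsed connection parameters; a minimal sketch of its shape (all values hypothetical):

```r
# Hypothetical parsed connection parameters; substitute your own.
url <- list(
  host     = "localhost",
  port     = 5432,        # default PostgreSQL port
  dbname   = "Delta",
  user     = "postgres",
  password = "secret"
)
```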
My solution using the odbc package
db <- DBI::dbConnect(odbc::odbc(),
                     Driver = "{PostgreSQL ODBC Driver(ANSI)}",
                     Database = "db_name",
                     UserName = "user",
                     Password = "pass",
                     Servername = "localhost",
                     Port = 5432)
Running into this unclear error, I found RPostgreSQL will work after adjusting the PostgreSQL password encryption from the libpq 10 default of scram-sha-256 to md5. See this SO post: How can I solve Postgresql SCRAM authentication problem?
I arrived at this fix by using an ODBC driver connection, specifically replacing DBI + RPostgreSQL with DBI + odbc, which raised a much clearer error:
SCRAM authentication requires libpq version 10 or above.
Changing the authentication worked for both odbc and RPostgreSQL connections in R.
Do note: the user's password must be reset (even to the exact same one) after the change, since the stored hash has to be re-encrypted:
-- SUPERUSER
ALTER USER postgres WITH PASSWORD 'new or same password';
-- LOGIN USER
ALTER USER myuser WITH PASSWORD 'new or same password';
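For reference, the server-side switch itself might look like this (a sketch only; it requires superuser rights, and the matching pg_hba.conf entries must also use md5):

```sql
-- Sketch: change the hashing method used for newly set passwords,
-- then reload the server configuration.
ALTER SYSTEM SET password_encryption = 'md5';
SELECT pg_reload_conf();
```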
I had the same problem. I installed DBI after installing RPostgreSQL and loaded the DBI library separately. Worked for me.

R - handle error when accessing a database

I'm trying to automate data download from a database with RJDBC using a for loop. The database I'm using automatically closes the connection every 10 minutes, so what I want to do is catch the error, remake the connection, and continue the loop. The problem is that it doesn't seem to be a regular R error, so tryCatch and similar commands don't work. I just get this text on the console:
Error in .jcheck() : No running JVM detected. Maybe .jinit() would help.
How do I handle this, in terms of:
if (output == ERROR) {remake connection and run dbQuery} else {run dbQuery}
Thanks for any help.
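Once the JVM is up, RJDBC failures generally do surface as regular R conditions, so a catch-and-retry wrapper along these lines may be worth a try (`make_connection` and `do_query` are placeholders for your dbConnect()/dbGetQuery() calls):

```r
# Sketch: run the query; on error, rebuild the connection once and retry.
with_retry <- function(make_connection, do_query, con) {
  tryCatch(
    do_query(con),
    error = function(e) {
      message("Query failed (", conditionMessage(e), "); reconnecting...")
      do_query(make_connection())
    }
  )
}
```

Wrapping each iteration of the loop in a call like this keeps the loop alive across the 10-minute disconnects, at the cost of one wasted query per timeout.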
You could use the pool package to abstract away the logic of connection management.
This does exactly what you expect regarding connection management with DBI.
It should work with RJDBC, which is an implementation of DBI, but I didn't test it with this driver.
library(pool)
library(RJDBC)

conn <- dbPool(
  drv = RJDBC::JDBC(...),
  dbname = "mydb",
  host = "hostaddress",
  username = "test",
  password = "test"
)
on.exit(poolClose(conn))

dbGetQuery(conn, "select... ")

how to read data from Cassandra (DBeaver) to R

I am using the Cassandra CQL system in the DBeaver database tool, and I want to connect this Cassandra instance to R to read data. Unfortunately the connection with the RCassandra package takes a very long time (I waited for more than 2 hours); it does not seem to connect at all and is still loading. Does anyone have any idea about this?
The code is as follows:
library(RCassandra)
rc <- RC.connect(host ="********", port = 9042)
RC.login(rc, username = "*****", password = "******")
After the RC.login step, it keeps loading for more than 2 hours.
I have also tried the RJDBC package as posted here: How to read data from Cassandra with R?
library(RJDBC)
drv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
            list.files("C:/Program Files/DBeaver/jre/lib",
                       pattern = "jar$", full.names = TRUE))
But this throws an error:
Error in .jfindClass(as.character(driverClass)[1]) : class not found
None of the answers from the above link work for me. I am using the latest R version, 3.4.0 (2017-04-21), and a new version of DBeaver, 4.0.4.
For your first approach, which I am less familiar with, should you not have a line that sets the keyspace to use?
such as:
library(RCassandra)
c <- RC.connect(host ="52.0.15.195", port = 9042)
RC.login(c, username = "*****", password = "******")
RC.use(c, "some_db")
Did you check the logs to be sure you are not getting a silent error while connecting?
For your second approach, your R program is not seeing the driver on the Java (JVM) classpath.
See this entry for help on how to fix it.
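One way to put the driver on the classpath is to point RJDBC's classPath argument at the actual Cassandra JDBC jar(s) rather than DBeaver's bundled JRE directory. A sketch, with a hypothetical driver directory:

```r
# Hypothetical directory holding the Cassandra JDBC driver jar and its
# dependencies (downloaded separately -- not DBeaver's jre/lib folder).
jars <- list.files("~/cassandra-jdbc", pattern = "\\.jar$", full.names = TRUE)
classpath <- paste(jars, collapse = .Platform$path.sep)

# drv <- RJDBC::JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
#                    classPath = classpath)
# con <- DBI::dbConnect(drv, "jdbc:cassandra://host:9160/keyspace")
```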

Connect to MSSQL using DBI

I cannot connect to MSSQL using the DBI package.
I am trying the way shown in the package itself:
m <- dbDriver("RODBC") # error
Error: could not find function "RODBC"
# open the connection using user, password, etc., as
# specified in the file $HOME/.my.cnf
con <- dbConnect(m, dsn = "data.source", uid = "user", pwd = "password")
Any help appreciated. Thanks.
As an update to this question: RStudio has since created the odbc package (or the GitHub version here), which handles ODBC connections to a number of databases through DBI. For SQL Server you use:
con <- DBI::dbConnect(odbc::odbc(),
                      driver = "SQL Server",
                      server = <serverURL>,
                      database = <databasename>,
                      uid = <username>,
                      pwd = <passwd>)
You can also set a dsn or supply a connection string.
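For example, a full connection string can be passed through odbc's .connection_string argument (all values below are placeholders):

```r
# Placeholders throughout; fill in your own server and credentials.
conn_str <- paste0(
  "Driver={SQL Server};",
  "Server=myserver.example.com;",
  "Database=mydb;",
  "UID=myuser;",
  "PWD=mypassword;"
)
# con <- DBI::dbConnect(odbc::odbc(), .connection_string = conn_str)
```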
It looks like there used to be a RODBC driver for DBI, but not any more:
http://cran.r-project.org/src/contrib/Archive/DBI.RODBC/
A bit of tweaking got this to install on a version 3 R, but I don't have any ODBC sources to test it on. Still, m = dbDriver("RODBC") doesn't error:
> m = dbDriver("RODBC")
> m
<ODBCDriver:(29781)>
>
Suggest you ask on the R-sig-db mailing list to maybe find out what happened to this code and/or the author...
Solved.
I used the RODBC library. It has great functionality to connect to SQL Server and run SQL queries from R.
Loading the library:
library(RODBC)
# conn_string holds the connection string with user ID, database name, password, etc.
dbhandle <- odbcDriverConnect(conn_string)
Running a SQL query:
sqlQuery(channel = dbhandle, query)
That's it.
